History log of /src/sys/ufs
Revision  Date  Author  Comments
 1.2 26-Nov-2002  lukem Remove KDIR=, since SYS_INCLUDE=symlinks and KDIR are not supported any more.
 1.1 12-Jun-1998  cgd branches: 1.1.26;
Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.1.26.1 11-Dec-2002  thorpej Sync with HEAD.
 1.5 08-Jun-1994  mycroft Clean up deleted files.
 1.4 17-May-1994  cgd copyright foo
 1.3 20-May-1993  cgd add rcs ids, and clean up headers where necessary
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.51 17-Sep-2025  perseant Add working in-kernel roll forward.
 1.50 28-Nov-2022  chs the UFS_EXTATTR option was supposed to affect only UFS1 file systems,
but when the UFS2 extattr code was merged, the UFS_EXTATTR option was
mistakenly changed to affect UFS2 file systems as well. this commit
changes UFS_EXTATTR back to affecting only UFS1 file systems as originally
intended. in UFS2 (or rather UFS2ea in NetBSD), extattrs are a
native feature and are always supported.
 1.49 24-Sep-2020  riastradh lfs: Include lfs_debug.c only if DEBUG is enabled.
 1.48 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.47 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.46 11-Apr-2020  jdolecek remove noncompilable WAPBL_DEBUG_INODES

PR kern/49554 by Thomas Klausner
 1.45 17-Jun-2019  christos branches: 1.45.8;
Don't include any of the ufs code if all the dependent filesystems are missing.
 1.44 16-Jun-2019  christos Include the fs scaffolding when none of the ffs/mfs/ext2fs/chfs is included
so a MODULAR kernel links.
 1.43 12-Aug-2016  jdolecek branches: 1.43.16;
add support for extended attributes in ext2fs for ext3/ext4; read-only for now
 1.42 24-Jun-2016  christos GSoC 2016 (Hrishikesh Goyal): Htree index support from FreeBSD
 1.41 03-Jun-2016  christos add extents.
 1.40 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.39 11-Jan-2015  hannken Change chfs from hashlist to vcache.
 1.38 16-Nov-2014  manu branches: 1.38.2;
Remove unused extended attributes kernel options

As Masao Uebayashi pointed out to me, UFS_EXTATTR_AUTOSTART, LFS_EXTATTR_AUTOSTART
and UFS_EXTATTR_AUTOCREATE are not used anywhere in the code. Remove them
as they have been obsolete for a long time:
UFS_EXTATTR_AUTOSTART was replaced by mount -o extattr
LFS_EXTATTR_AUTOSTART was created to match obsolete UFS_EXTATTR_AUTOSTART
UFS_EXTATTR_AUTOCREATE was replaced by sysctl vfs.ffs.extattr_autocreate
 1.37 10-Oct-2014  uebayasi To make sure that I'm not doing wrong, try to define ffs/ufs/vfs dependencies
a little more strictly.
 1.36 16-May-2014  dholland branches: 1.36.2;
Move lfs_getpages and lfs_putpages to their own file.
 1.35 08-May-2014  hannken Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41
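
The vnode cache behaviour described in revision 1.35 above can be modelled in a few lines. This is a minimal user-space sketch, not the kernel code: only the names vcache_get(), vcache_remove() and vfs_loadvnode() come from the log message, while the VCache class, its method signatures and the dictionary-backed storage are assumptions made for illustration.

```python
import threading


class VCache:
    """Toy model of a vnode cache keyed by (mount, key). Hypothetical API."""

    def __init__(self, loadvnode):
        self._loadvnode = loadvnode   # stand-in for the vfs_loadvnode() operation
        self._lock = threading.Lock()
        self._entries = {}            # (mount, key) -> vnode / fs node pair

    def get(self, mount, key):
        """Model of vcache_get(): return the cached pair, loading on a miss."""
        with self._lock:
            if (mount, key) not in self._entries:
                # The lock is held across the load, so the load is exclusive:
                # no other thread can load the same pair at the same time.
                self._entries[(mount, key)] = self._loadvnode(mount, key)
            return self._entries[(mount, key)]

    def remove(self, mount, key):
        """Model of vcache_remove(): drop the pair from the cache."""
        with self._lock:
            self._entries.pop((mount, key), None)
```

A single global lock is the simplest way to show the "guaranteed exclusive" load; the log message does not say how the kernel achieves it, so that choice is an assumption of this sketch.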
 1.34 18-Mar-2014  riastradh branches: 1.34.2;
Merge riastradh-drm2 to HEAD.
 1.33 20-Jul-2013  dholland Collect the pieces of lfs rename into lfs_rename.c, and sprinkle static.
 1.32 08-Jun-2013  dholland branches: 1.32.2; 1.32.4;
G/C
 1.31 06-Jun-2013  dholland Apparently we also need to cut and paste ffs_snapgone() in order to be
able to link the ufs code.

Instead of actually cutting and pasting it (as it depends on ffs-only
things) implement it as panic. Probably we'll be able to demonstrate
later that it's unreachable.

XXX: Someone should add snapgone to struct ufs_ops in ufs/ufsmount.h,
XXX: and fix ufs/ufs_lookup.c to not hardwire ffs.
 1.30 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.29 09-May-2012  riastradh branches: 1.29.2;
Adapt ffs, lfs, and ext2fs to use genfs_rename.

ok dholland, rmind
 1.28 19-Apr-2012  ttoth chfs/debug.c deleted from files.ufs
 1.27 24-Nov-2011  ahoka branches: 1.27.2;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.26 24-Mar-2011  bouyer branches: 1.26.4;
Add a new libquota library, which contains some blocks to build and/or
parse quota plists; as well as a getfsquota() function to retrieve quotas
for a single id from a single filesystem (whatever filesystem this is:
a local quota-enabled fs or NFS). This is built on functions getufsquota()
(for local filesystems with UFS-like quotas) and getnfsquota();
which are also available to userland programs.
move functions from quota2_subr.c to libquota or libprop as appropriate,
and adjust in-tree quota tools.
move some declarations from kernel headers to either sys/quota.h or
quota/quota.h as appropriate. ufs/ufs/quota.h still installed because
it's needed by other installed ufs headers.
ufs/ufs/quota1.h still installed as a quick&dirty way to get a code
using the old quotactl() to compile (just include ufs/ufs/quota1.h instead of
ufs/ufs/quota.h - old code won't compile without this change and this is
on purpose).
Discussed on tech-kern@ and tech-net@ (long thread, but not much about
libquota itself ...)
 1.25 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.24 02-Mar-2010  pooka branches: 1.24.2; 1.24.4; 1.24.6;
fs_lfs.h is no longer necessary
 1.23 02-Mar-2010  pooka Remove fs_mfs.h from users because it is now unnecessary and don't
generate fs_mfs.h anymore.
 1.22 02-Mar-2010  pooka Make mfs_initminiroot() mandatory. Allows removing #ifdef MFS.
 1.21 02-Mar-2010  pooka Don't generate unused fs_thefs.h headers.
 1.20 02-Mar-2010  pooka Remove last #ifdef FFS. Do this by making lfs include ffs.
Could use UFS_OPS, but:

1) the lfs kernel module depends on full ffs already anyway
2) lfs is being split from ufs, so this will automatically
go away soon
3) chances of anyone wanting an lfs-only kernel are pretty slim
4) i'm too lazy to figure out how to test ffs_snapgone() is
still called properly if I change the call ;)
 1.19 22-Feb-2009  ad branches: 1.19.2;
PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.18 31-Jul-2008  simonb branches: 1.18.2; 1.18.8;
Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.17 12-Dec-2007  lukem branches: 1.17.6; 1.17.10; 1.17.12; 1.17.14; 1.17.16;
defflag LFS_KERNEL_RFW (in opt_lfs.h).
Note: lfs_rfw.c doesn't compile if you define the option; locking API fallout?
 1.16 13-Nov-2006  jmmv branches: 1.16.24; 1.16.34; 1.16.36; 1.16.38;
Let ext2fs be built even when none of ffs, lfs and mfs are present.
 1.15 20-Jul-2006  perseant branches: 1.15.4; 1.15.6;
Separate the (non-working) LFS kernel roll-forward code into its own file,
lfs_rfw.c.
 1.14 05-Apr-2006  uwe Tell config to generate fs_ffs.h as vfs_bio.c checks for defined(FFS).
Include that header in vfs_bio.c so that bioops are not redefined.
 1.13 11-Dec-2005  christos branches: 1.13.4; 1.13.6; 1.13.8; 1.13.10; 1.13.12;
merge ktrace-lwp.
 1.12 13-Sep-2005  christos split out lfs_itimes(). It is used in fsck_lfs.
 1.11 28-Aug-2005  thorpej Experimental support for extended attributes on UFS1 file systems, using a
backing file per attribute type indexed by inode number to hold the extended
attributes.

This is working pretty well on my test systems, except for the "autostart"
feature. I need someone with a better handle on the VFS locking protocol
to go over that.

This is a work-in-progress. There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.

Adapted from FreeBSD.
 1.10 10-Jul-2005  thorpej Defflag UFS_DIRHASH.
 1.9 26-Feb-2005  perry branches: 1.9.4;
nuke trailing whitespace
 1.8 21-Feb-2005  hannken Make `options FFS_NO_SNAPSHOT' only disable snapshot creation
while not trashing existing snapshots.

Approved by: core@
 1.7 18-Feb-2005  dsl change ffs_snapshot to !ffs_no_snapshot
 1.6 10-Feb-2005  dsl Make ffs snapshots be enabled by 'option FFS_SNAPSHOT'
 1.5 31-Jan-2005  hannken Add file system snapshots to kernel configs.

- Ffs internal snapshots get compiled in unconditionally.

- File system snapshot device fss(4) added to all kernel configs that
have a disk. Device is commented out on all non-GENERIC kernels.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.4 23-Jan-2005  rumble branches: 1.4.2;
Bring in Ian Dowse's Dirhash from FreeBSD. Hash tables of
directories are created on the fly and used to increase
performance by circumventing ufs_lookup's linear search.

Dirhash is enabled by the UFS_DIRHASH option, but not
by default.
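
The idea behind Dirhash in revision 1.4 above, building a hash table for a directory on the fly so that lookups avoid a linear scan, can be sketched as follows. This is an illustrative model only: the Directory class and its method names are hypothetical, and the real implementation hashes on-disk directory blocks rather than a Python list.

```python
class Directory:
    """Toy directory: a list of (name, inode number) pairs in on-disk order."""

    def __init__(self, entries):
        self.entries = list(entries)
        self._hash = None             # built lazily, on the first hashed lookup

    def lookup_linear(self, name):
        """Linear search: cost grows with the number of entries."""
        for entry_name, ino in self.entries:
            if entry_name == name:
                return ino
        return None

    def lookup_hashed(self, name):
        """Build the hash table on first use, then answer from it."""
        if self._hash is None:
            self._hash = {}
            for entry_name, ino in self.entries:
                # Keep the first match, as the linear search would.
                self._hash.setdefault(entry_name, ino)
        return self._hash.get(name)
```

The first hashed lookup pays for one full scan; later lookups in the same directory are answered from the table.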
 1.3 25-May-2004  hannken branches: 1.3.4;
Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.2 28-Sep-2002  dbj branches: 1.2.6;
Add support for the Apple UFS variation on ffs
This is the bulk of PR #17345

The general approach is to use a run time determinable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
 1.1 16-Apr-2002  thorpej branches: 1.1.6; 1.1.8;
Cleanup how file system configuration information is declared, grouping
related information together, with the file system code itself.

This is just low-hanging fruit -- more to come.
 1.1.8.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.1.8.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.1.8.1 16-Apr-2002  jdolecek file files.ufs was added on branch kqueue on 2002-06-23 17:52:05 +0000
 1.1.6.3 18-Oct-2002  nathanw Catch up to -current.
 1.1.6.2 20-Jun-2002  nathanw Catch up to -current.
 1.1.6.1 16-Apr-2002  nathanw file files.ufs was added on branch nathanw_sa on 2002-06-20 03:50:21 +0000
 1.2.6.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.2.6.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.2.6.6 15-Feb-2005  skrll Sync with HEAD.
 1.2.6.5 04-Feb-2005  skrll Sync with HEAD.
 1.2.6.4 24-Jan-2005  skrll Sync with HEAD.
 1.2.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.2.6.1 03-Aug-2004  skrll Sync with HEAD
 1.3.4.1 29-Apr-2005  kent sync with -current
 1.4.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.4.2.1 12-Feb-2005  yamt sync with head.
 1.9.4.3 21-Jan-2008  yamt sync with head
 1.9.4.2 30-Dec-2006  yamt sync with head.
 1.9.4.1 21-Jun-2006  yamt sync with head.
 1.13.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.13.10.1 19-Apr-2006  elad sync with head.
 1.13.8.2 11-Aug-2006  yamt sync with head
 1.13.8.1 11-Apr-2006  yamt sync with head
 1.13.6.1 22-Apr-2006  simonb Sync with head.
 1.13.4.1 09-Sep-2006  rpaulo sync with head
 1.15.6.1 10-Dec-2006  yamt sync with head.
 1.15.4.1 18-Nov-2006  ad Sync with head.
 1.16.38.1 13-Dec-2007  bouyer Sync with HEAD
 1.16.36.1 13-Dec-2007  yamt sync with head.
 1.16.34.1 26-Dec-2007  ad Sync with head.
 1.16.24.1 09-Jan-2008  matt sync with HEAD
 1.17.16.1 19-Oct-2008  haad Sync with HEAD.
 1.17.14.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.17.12.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.17.10.2 11-Mar-2010  yamt sync with head
 1.17.10.1 04-May-2009  yamt sync with head.
 1.17.6.1 28-Sep-2008  mjf Sync with HEAD.
 1.18.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.18.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.19.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.24.6.5 18-Feb-2011  bouyer quota2_subr.c is not used outside of ufs_quota2.c in kernel, so make it
compiled conditionally on QUOTA2 again
 1.24.6.4 15-Feb-2011  bouyer Implement COMPAT_50 quotactl(2)
 1.24.6.3 09-Feb-2011  bouyer Reimplement quotactl commands for quota1
 1.24.6.2 09-Feb-2011  bouyer Various build fixes
 1.24.6.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem; a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more, on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
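
The quota file layout described in revision 1.24.6.1 above (a header with default limits, a hash table of in-use entries, a free list, and entries chained by offset) can be modelled as below. This is a rough in-memory sketch under stated assumptions, not the on-disk format: the QuotaFile class, the NIL sentinel, the slot numbering and the single limit per entry are all hypothetical. Only UQUAD_MAX as the "no limit" value and the "0 means nothing allowed" rule come from the log message.

```python
UQUAD_MAX = 2**64 - 1   # "no limit", as named in the log message
NIL = 0                 # hypothetical end-of-list marker (slot 0 is in the header block)


class QuotaFile:
    def __init__(self, hash_size=4, entries_per_block=2, default_limit=UQUAD_MAX):
        self.default_limit = default_limit   # header: default quota limit
        self.hash = [NIL] * hash_size        # header: heads of in-use chains
        self.free = NIL                      # header: head of the free list
        self.entries = {}                    # slot -> {"id", "limit", "next"}
        self.entries_per_block = entries_per_block
        self.blocks = 1                      # block 0 holds header + hash table

    def _grow(self):
        """Free list is empty: allocate one block and chain its slots as free."""
        first = self.blocks * self.entries_per_block
        for slot in range(first, first + self.entries_per_block):
            self.entries[slot] = {"id": None, "limit": 0, "next": self.free}
            self.free = slot
        self.blocks += 1

    def _find(self, qid):
        """Walk the hash chain for qid; return its slot or NIL."""
        slot = self.hash[qid % len(self.hash)]
        while slot != NIL and self.entries[slot]["id"] != qid:
            slot = self.entries[slot]["next"]
        return slot

    def set_limit(self, qid, limit):
        slot = self._find(qid)
        if slot != NIL:
            self.entries[slot]["limit"] = limit
            return
        if self.free == NIL:
            self._grow()
        slot = self.free                      # take a slot off the free list
        self.free = self.entries[slot]["next"]
        bucket = qid % len(self.hash)         # link it at the head of its chain
        self.entries[slot] = {"id": qid, "limit": limit, "next": self.hash[bucket]}
        self.hash[bucket] = slot

    def limit_for(self, qid):
        """Ids without an entry fall back to the default limit."""
        slot = self._find(qid)
        return self.default_limit if slot == NIL else self.entries[slot]["limit"]

    def allowed(self, qid, usage):
        """0 means nothing allowed; UQUAD_MAX means no limit."""
        limit = self.limit_for(qid)
        return limit == UQUAD_MAX or usage <= limit
```

For example, after `qf.set_limit(1000, 0)` the call `qf.allowed(1000, 1)` is false, while an id with no entry falls back to the default limit stored in the header. Soft limits and grace times are left out of this sketch.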
 1.24.4.1 06-Jun-2011  jruoho Sync with HEAD.
 1.24.2.1 21-Apr-2011  rmind sync with head
 1.26.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.26.4.2 23-May-2012  yamt sync with head.
 1.26.4.1 17-Apr-2012  yamt sync with head
 1.27.2.2 02-Jun-2012  mrg sync to latest -current.
 1.27.2.1 29-Apr-2012  mrg sync to latest -current.
 1.29.2.3 03-Dec-2017  jdolecek update from HEAD
 1.29.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.29.2.1 23-Jun-2013  tls resync from head
 1.32.4.1 23-Jul-2013  riastradh sync with HEAD
 1.32.2.2 18-May-2014  rmind sync with head
 1.32.2.1 28-Aug-2013  rmind sync with head
 1.34.2.1 10-Aug-2014  tls Rebase.
 1.36.2.1 18-Nov-2014  snj Pull up following revision(s) (requested by manu in ticket #251):
sys/arch/acorn26/conf/GENERIC: revision 1.81
sys/arch/acorn32/conf/GENERIC: revision 1.116
sys/arch/alpha/conf/GENERIC: revision 1.362
sys/arch/amd64/conf/ALL: revision 1.23
sys/arch/amd64/conf/GENERIC: revision 1.404
sys/arch/amd64/conf/XEN3_DOM0: revision 1.112
sys/arch/amd64/conf/XEN3_DOMU: revision 1.60
sys/arch/amiga/conf/GENERIC.in: revision 1.129
sys/arch/amiga/conf/GENERIC: revision 1.311
sys/arch/amigappc/conf/GENERIC: revision 1.24
sys/arch/arc/conf/GENERIC: revision 1.184
sys/arch/bebox/conf/GENERIC: revision 1.145
sys/arch/cats/conf/GENERIC: revision 1.155
sys/arch/cesfic/conf/GENERIC: revision 1.65
sys/arch/cobalt/conf/GENERIC: revision 1.147
sys/arch/dreamcast/conf/GENERIC: revision 1.121
sys/arch/emips/conf/GENERIC: revision 1.15
sys/arch/epoc32/conf/GENERIC: revision 1.8
sys/arch/ews4800mips/conf/GENERIC: revision 1.51
sys/arch/hp300/conf/GENERIC: revision 1.190
sys/arch/hpcmips/conf/GENERIC: revision 1.229
sys/arch/hpcsh/conf/GENERIC: revision 1.106
sys/arch/hppa/conf/GENERIC: revision 1.6
sys/arch/i386/conf/ALL: revision 1.389
sys/arch/i386/conf/GENERIC: revision 1.1118
sys/arch/i386/conf/XEN3_DOM0: revision 1.93
sys/arch/i386/conf/XEN3_DOMU: revision 1.65
sys/arch/ibmnws/conf/GENERIC: revision 1.46
sys/arch/iyonix/conf/GENERIC: revision 1.88
sys/arch/landisk/conf/GENERIC: revision 1.45
sys/arch/luna68k/conf/GENERIC: revision 1.119
sys/arch/mac68k/conf/GENERIC: revision 1.220
sys/arch/macppc/conf/GENERIC: revision 1.320
sys/arch/macppc/conf/MAMBO: revision 1.24
sys/arch/macppc/conf/POWERMAC_G5: revision 1.25
sys/arch/mipsco/conf/GENERIC: revision 1.88
sys/arch/mmeye/conf/GENERIC: revision 1.120
sys/arch/mvme68k/conf/GENERIC: revision 1.94
sys/arch/mvmeppc/conf/GENERIC: revision 1.24
sys/arch/netwinder/conf/GENERIC: revision 1.126
sys/arch/news68k/conf/GENERIC: revision 1.125
sys/arch/newsmips/conf/GENERIC: revision 1.129
sys/arch/next68k/conf/GENERIC: revision 1.139
sys/arch/ofppc/conf/GENERIC: revision 1.157
sys/arch/pmax/conf/GENERIC64: revision 1.21
sys/arch/pmax/conf/GENERIC: revision 1.185
sys/arch/prep/conf/GENERIC: revision 1.174
sys/arch/rs6000/conf/GENERIC: revision 1.33
sys/arch/sandpoint/conf/GENERIC: revision 1.88
sys/arch/sbmips/conf/GENERIC: revision 1.101
sys/arch/sgimips/conf/GENERIC32_IP12: revision 1.28
sys/arch/sgimips/conf/GENERIC32_IP2x: revision 1.104
sys/arch/sgimips/conf/GENERIC32_IP3x: revision 1.106
sys/arch/shark/conf/GENERIC: revision 1.121
sys/arch/sparc/conf/GENERIC: revision 1.248
sys/arch/sparc/conf/TADPOLE3GX: revision 1.65
sys/arch/sparc64/conf/GENERIC: revision 1.177
sys/arch/sparc64/conf/NONPLUS64: revision 1.44
sys/arch/sun2/conf/GENERIC: revision 1.94
sys/arch/sun3/conf/GENERIC: revision 1.171
sys/arch/vax/conf/GENERIC: revision 1.193
sys/arch/vax/conf/VAX780: revision 1.19
sys/arch/x68k/conf/GENERIC: revision 1.179
sys/arch/zaurus/conf/GENERIC: revision 1.65
sys/ufs/files.ufs: revision 1.38
Remove unused extended attributes kernel options

As Masao Uebayashi pointed out to me, UFS_EXTATTR_AUTOSTART, LFS_EXTATTR_AUTOSTART
and UFS_EXTATTR_AUTOCREATE are not used anywhere in the code. Remove them
as they have been obsolete for a long time:
UFS_EXTATTR_AUTOSTART was replaced by mount -o extattr
LFS_EXTATTR_AUTOSTART was created to match obsolete UFS_EXTATTR_AUTOSTART
UFS_EXTATTR_AUTOCREATE was replaced by sysctl vfs.ffs.extattr_autocreate
 1.38.2.4 05-Oct-2016  skrll Sync with HEAD
 1.38.2.3 09-Jul-2016  skrll Sync with HEAD
 1.38.2.2 06-Jun-2015  skrll Sync with HEAD
 1.38.2.1 06-Apr-2015  skrll Sync with HEAD
 1.43.16.2 21-Apr-2020  martin Sync with HEAD
 1.43.16.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.45.8.1 20-Apr-2020  bouyer Sync with HEAD
 1.7 08-Jun-1994  mycroft Clean up deleted files.
 1.6 25-Apr-1994  cgd i hate RISC.
 1.5 25-Apr-1994  cgd some prototype cleanup, eliminate/replace bogus types (e.g. quad and
u_quad) -> use better types (e.g. quad_t & u_quad_t in inodes),
some cleanup.
 1.4 01-Oct-1993  mycroft Add FS_CLEANFREQ.
 1.3 20-May-1993  cgd branches: 1.3.4;
add rcs ids, and clean up headers where necessary
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.4.3 02-Oct-1993  mycroft Make fs_qbmask and fs_qfmask quad's rather than quad_t's.
 1.3.4.2 01-Oct-1993  mycroft Add FS_CLEANFREQ.
 1.3.4.1 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h.
dinode.h: Change di_size to a u_quad_t.
fs.h: Change fs_bmask and fs_fmask to quad_ts.
ufs_vfsops.c, ufsmount.h: Make ufs_quotactl() take an int rather than a uid_t.
ufs_vnops.c: va_size and va_bytes are now quads.
 1.5 09-Mar-1994  mycroft Clean up deleted files.
 1.4 22-May-1993  cgd add Yuval Yarom's changes (originally for BSD/386) for advisory record
locking on NFS files. Note that this DOES NOT support network locking,
only local advisory locks.
 1.3 20-May-1993  cgd add rcs ids, and clean up headers where necessary
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.6 08-Jun-1994  mycroft Clean up deleted files.
 1.5 14-Apr-1994  cgd fs types are names now.
 1.4 06-Feb-1994  mycroft Use b_actf, not av_forw.
 1.3 17-Dec-1993  mycroft Canonicalize all #includes.
 1.2 20-May-1993  cgd branches: 1.2.4;
add rcs ids, and clean up headers where necessary
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.2.4.1 14-Nov-1993  mycroft Canonicalize all #includes.
 1.7 08-Jun-1994  mycroft Clean up deleted files.
 1.6 06-Feb-1994  mycroft Use b_actf, not av_forw.
 1.5 23-Dec-1993  cgd mfs_print return type back to 'int'
 1.4 17-Dec-1993  mycroft Canonicalize all #includes.
 1.3 24-Aug-1993  mycroft branches: 1.3.2;
Make mfs_print() return a void to prevent a warning from GCC.
 1.2 20-May-1993  cgd add rcs ids, and clean up headers where necessary
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.2.1 14-Nov-1993  mycroft Canonicalize all #includes.
 1.4 08-Jun-1994  mycroft Clean up deleted files.
 1.3 20-May-1993  cgd add rcs ids, and clean up headers where necessary
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.8 01-Mar-1995  mycroft Clean up deleted files.
 1.7 23-Dec-1993  cgd mfs_print return type back to 'int'
 1.6 07-Sep-1993  cgd ws forgot two backslashes (so it tossed his 'cookies')
 1.5 07-Sep-1993  ws Changes to VFS readdir semantics
NFS changes for better cookie support
ISOFS changes for better Rockridge support and support for generation numbers
 1.4 24-Aug-1993  mycroft Make mfs_print() return a void to prevent a warning from GCC.
 1.3 20-May-1993  cgd add rcs ids, and clean up headers where necessary
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.6 01-Mar-1995  mycroft Clean up deleted files.
 1.5 27-Apr-1994  cgd SHUT UP!
 1.4 26-Apr-1994  pk More prototyping.
 1.3 20-May-1993  cgd add rcs ids, and clean up headers where necessary
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.7 01-Mar-1995  mycroft Clean up deleted files.
 1.6 18-May-1994  cgd use a cast b_data for everything
 1.5 27-Mar-1994  cgd expand uid_t/gid_t/off_t
 1.4 17-Dec-1993  mycroft Canonicalize all #includes.
 1.3 20-May-1993  cgd branches: 1.3.4;
add rcs ids, and clean up headers where necessary
 1.2 09-Apr-1993  cgd fix from Chris Torek (patch 106):
386BSD inherits a bug from the 4.3 Reno port for contiguous block allocation.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.4.1 14-Nov-1993  mycroft Canonicalize all #includes.
 1.7 09-Mar-1994  mycroft Clean up deleted files.
 1.6 17-Dec-1993  mycroft Canonicalize all #includes.
 1.5 28-Jul-1993  cgd branches: 1.5.2;
incorporate changes from 0-9-base to 0-9-ALPHA
 1.4 22-May-1993  cgd branches: 1.4.2;
add Yuval Yarom's changes (originally for BSD/386) for advisory record
locking on NFS files. Note that this DOES NOT support network locking,
only local advisory locks.
 1.3 20-May-1993  cgd add rcs ids, and clean up headers where necessary
 1.2 11-May-1993  deraadt dangling pointer patch for lockf. From pk@cs.few.eur.nl
patch dated Apr 26.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.4.2.2 22-Jul-1993  cgd oops; i committed the wrong version of ufs_lockf.c last time...
 1.4.2.1 21-Jul-1993  cgd "canonical" fix for the lockf "dangling pointer" problem, from torek.
 1.5.2.1 14-Nov-1993  mycroft Canonicalize all #includes.
 1.10 01-Mar-1995  mycroft Clean up deleted files.
 1.9 17-May-1994  cgd copyright foo
 1.8 25-Apr-1994  cgd some prototype cleanup, eliminate/replace bogus types (e.g. quad and
u_quad) -> use better types (e.g. quad_t & u_quad_t in inodes),
some cleanup.
 1.7 27-Mar-1994  cgd expand uid_t/gid_t/off_t
 1.6 14-Feb-1994  mycroft PARANOID --> DIAGNOSTIC for inexpensive tests.
 1.5 17-Dec-1993  mycroft Canonicalize all #includes.
 1.4 20-Nov-1993  cgd do something better with lookup return values; suggested by BSDI's msdosfs mod
 1.3 20-May-1993  cgd branches: 1.3.4;
add rcs ids, and clean up headers where necessary
 1.2 02-Apr-1993  cgd make when PARANOID wouldn't work, for mis-remembered field name
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.4.3 20-Nov-1993  cgd do something better with lookup return values; suggested by BSDI's msdosfs mod
 1.3.4.2 14-Nov-1993  mycroft Canonicalize all #includes.
 1.3.4.1 14-Nov-1993  mycroft PARANOID --> DIAGNOSTIC. These are not expensive tests.
 1.7 01-Mar-1995  mycroft Clean up deleted files.
 1.6 25-Apr-1994  cgd some prototype cleanup, eliminate/replace bogus types (e.g. quad and
u_quad) -> use better types (e.g. quad_t & u_quad_t in inodes),
some cleanup.
 1.5 21-Apr-1994  cgd Convert mount, vnode, and buf structs to use <sys/queue.h>. Also,
some knf and structure frobbing to do along with it.
 1.4 17-Dec-1993  mycroft Canonicalize all #includes.
 1.3 01-Aug-1993  mycroft branches: 1.3.2;
Add RCS identifiers (this time on the correct side of the branch), and
incorporate recent changes in netbsd-0-9 branch.
 1.2 20-May-1993  cgd branches: 1.2.2;
add rcs ids, and clean up headers where necessary
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.2.2.1 31-Jul-1993  cgd give names, err, wmesg's, to my "pain" -- i.e. convert sleep() to tsleep()
 1.3.2.1 14-Nov-1993  mycroft Canonicalize all #includes.
 1.7 01-Mar-1995  mycroft Clean up deleted files.
 1.6 17-Dec-1993  mycroft Canonicalize all #includes.
 1.5 11-Sep-1993  jtc Removed functions moved to libkern: scanc, skpc, locc.
 1.4 01-Sep-1993  glass branches: 1.4.2;
sun3 has scanc support, so it doesn't need the ufs_subr.c version
this crud will go away with the usage of libkern
 1.3 27-Jun-1993  andrew ANSIfications.
 1.2 20-May-1993  cgd add rcs ids, and clean up headers where necessary
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.4.2.2 14-Nov-1993  mycroft Canonicalize all #includes.
 1.4.2.1 14-Sep-1993  mycroft Incorporate changes from main branch.
 1.4 01-Mar-1995  mycroft Clean up deleted files.
 1.3 17-Dec-1993  mycroft Canonicalize all #includes.
 1.2 20-May-1993  cgd branches: 1.2.4;
add rcs ids, and clean up headers where necessary
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.2.4.1 14-Nov-1993  mycroft Canonicalize all #includes.
 1.19 01-Mar-1995  mycroft Clean up deleted files.
 1.18 24-May-1994  cgd MIN -> min, MAX -> max
 1.17 18-May-1994  cgd use a cast b_data for everything
 1.16 18-May-1994  cgd put sync printing in one place
 1.15 17-May-1994  cgd copyright foo
 1.14 13-May-1994  cgd new kernel malloc. much better (but slower) diagnostic checking
 1.13 05-May-1994  cgd lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.
 1.12 25-Apr-1994  cgd some prototype cleanup, eliminate/replace bogus types (e.g. quad and
u_quad) -> use better types (e.g. quad_t & u_quad_t in inodes),
some cleanup.
 1.11 23-Apr-1994  cgd make fs types consistent over new kernels. also, some proto foo.
 1.10 21-Apr-1994  cgd Convert mount, vnode, and buf structs to use <sys/queue.h>. Also,
some knf and structure frobbing to do along with it.
 1.9 14-Apr-1994  cgd fs types are names now.
 1.8 09-Mar-1994  ws Make FFS optional
 1.7 24-Feb-1994  paulus Remove the last dependencies on DEV_BSIZE in the ufs code.
 1.6 27-Jan-1994  cgd quiet a compiler warning
 1.5 17-Dec-1993  mycroft Canonicalize all #includes.
 1.4 12-Nov-1993  cgd new specfs.h and fifo.h locations
 1.3 07-Sep-1993  ws branches: 1.3.2;
Changes to VFS readdir semantics
NFS changes for better cookie support
ISOFS changes for better Rockridge support and support for generation numbers
 1.2 20-May-1993  cgd add rcs ids, and clean up headers where necessary
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.2.3 14-Nov-1993  mycroft Canonicalize all #includes.
 1.3.2.2 12-Nov-1993  cgd new specfs.h and fifo.h locations, and include file syntax updates
 1.3.2.1 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h.
dinode.h: Change di_size to a u_quad_t.
fs.h: Change fs_bmask and fs_fmask to quad_ts.
ufs_vfsops.c, ufsmount.h: Make ufs_quotactl() take an int rather than a uid_t.
ufs_vnops.c: va_size and va_bytes are now quads.
 1.2 01-Mar-1995  mycroft Clean up deleted files.
 1.1 09-Mar-1994  ws Make FFS optional
 1.5 01-Mar-1995  mycroft Clean up deleted files.
 1.4 27-Mar-1994  cgd expand uid_t/gid_t/off_t
 1.3 20-May-1993  cgd branches: 1.3.4;
add rcs ids, and clean up headers where necessary
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.4.1 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h.
dinode.h: Change di_size to a u_quad_t.
fs.h: Change fs_bmask and fs_fmask to quad_ts.
ufs_vfsops.c, ufsmount.h: Make ufs_quotactl() take an int rather than a uid_t.
ufs_vnops.c: va_size and va_bytes are now quads.
 1.11 07-Dec-2021  andvar fix various typos, mainly in comments.
 1.10 19-Apr-2018  christos s/static inline/static __inline/g for consistency.
 1.9 11-Jan-2015  hannken branches: 1.9.16;
Change chfs from hashlist to vcache.
 1.8 19-Oct-2012  ttoth branches: 1.8.14;
CHFS comments
 1.7 10-Aug-2012  ttoth branches: 1.7.2;
chfs bugfix [node was obsoleted twice]
 1.6 13-Apr-2012  ttoth branches: 1.6.2;
prepare for chfs's makefs
 1.5 12-Apr-2012  ttoth using chtype on media instead of vtype
debug.c deleted
 1.4 28-Nov-2011  ahoka branches: 1.4.2;
cleanup, some style and remove leftover code
 1.3 24-Nov-2011  ahoka disable dbg messages (they break the build on amd64)
 1.2 24-Nov-2011  ahoka fix build failure on amd64 due to incorrect format string
 1.1 24-Nov-2011  ahoka Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.4.2.1 29-Apr-2012  mrg sync to latest -current.
 1.6.2.3 30-Oct-2012  yamt sync with head
 1.6.2.2 17-Apr-2012  yamt sync with head
 1.6.2.1 13-Apr-2012  yamt file chfs.h was added on branch yamt-pagecache on 2012-04-17 00:08:54 +0000
 1.7.2.2 03-Dec-2017  jdolecek update from HEAD
 1.7.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.8.14.1 06-Apr-2015  skrll Sync with HEAD
 1.9.16.1 22-Apr-2018  pgoyette Sync with HEAD
 1.2 19-Oct-2012  ttoth CHFS comments
 1.1 24-Nov-2011  ahoka branches: 1.1.6; 1.1.10;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.1.10.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.1.6.3 30-Oct-2012  yamt sync with head
 1.1.6.2 17-Apr-2012  yamt sync with head
 1.1.6.1 24-Nov-2011  yamt file chfs_args.h was added on branch yamt-pagecache on 2012-04-17 00:08:54 +0000
 1.6 19-Jul-2021  andvar Release mutexes in a few more places on the failure path. Reviewed them in chfs code after fixing PR kern/56242.
ok riastradh
 1.5 19-Oct-2012  ttoth branches: 1.5.54;
CHFS comments
 1.4 10-Aug-2012  ttoth branches: 1.4.2;
chfs bugfix [node was obsoleted twice]
 1.3 12-Apr-2012  ttoth branches: 1.3.2;
using chtype on media instead of vtype
debug.c deleted
 1.2 24-Nov-2011  agc branches: 1.2.2;
i missed a file - quick workaround for compilation bugs on amd64
 1.1 24-Nov-2011  ahoka Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.2.2.1 29-Apr-2012  mrg sync to latest -current.
 1.3.2.3 30-Oct-2012  yamt sync with head
 1.3.2.2 17-Apr-2012  yamt sync with head
 1.3.2.1 12-Apr-2012  yamt file chfs_build.c was added on branch yamt-pagecache on 2012-04-17 00:08:54 +0000
 1.4.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.5.54.1 01-Aug-2021  thorpej Sync with HEAD.
 1.2 19-Oct-2012  ttoth CHFS comments
 1.1 24-Nov-2011  ahoka branches: 1.1.6; 1.1.10;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.1.10.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.1.6.3 30-Oct-2012  yamt sync with head
 1.1.6.2 17-Apr-2012  yamt sync with head
 1.1.6.1 24-Nov-2011  yamt file chfs_erase.c was added on branch yamt-pagecache on 2012-04-17 00:08:54 +0000
 1.12 07-Dec-2021  andvar fix typos in word "instead", mainly in log messages.
 1.11 19-Jul-2021  andvar NFC - if/else blocks start with the same mutex_exit, just move it up.
 1.10 16-Jul-2021  andvar Fix incorrect function name, some grammar and typos in comments. Remove trailing tab symbol.
No functional change intended.
 1.9 01-Jun-2017  chs branches: 1.9.26;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.8 11-Jan-2015  hannken Convert a bogus mnt_vnodelist traversal to vfs_vnode_iterator.
 1.7 08-Sep-2014  joerg branches: 1.7.2;
Timestamps are bad sources of entropy, so just use cprng_fast32.
 1.6 01-Sep-2014  he Plug memory leak in error returns and normal operation in
chfs_gcollect_pristine().
 1.5 20-Oct-2013  christos branches: 1.5.4;
remove unused
 1.4 19-Oct-2012  ttoth branches: 1.4.2;
CHFS comments
 1.3 10-Aug-2012  ttoth branches: 1.3.2;
chfs bugfix [node was obsoleted twice]
 1.2 24-Nov-2011  agc branches: 1.2.6;
quick workaround to make this compile, with thanks to Hisashi Fujinaka for the
nudge.
 1.1 24-Nov-2011  ahoka Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.2.6.4 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.6.3 30-Oct-2012  yamt sync with head
 1.2.6.2 17-Apr-2012  yamt sync with head
 1.2.6.1 24-Nov-2011  yamt file chfs_gc.c was added on branch yamt-pagecache on 2012-04-17 00:08:54 +0000
 1.3.2.3 03-Dec-2017  jdolecek update from HEAD
 1.3.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.4.2.1 18-May-2014  rmind sync with head
 1.5.4.1 08-Sep-2014  msaitoh Pull up following revision(s) (requested by he in ticket #74):
sys/ufs/chfs/chfs_vnode.c: revision 1.11
sys/ufs/chfs/chfs_readinode.c: revision 1.9
sys/ufs/chfs/chfs_scan.c: revision 1.5
sys/ufs/chfs/chfs_gc.c: revision 1.6
sys/ufs/chfs/ebh.c: revision 1.4
Plug leak in chfs_scan_eraseblock() of the allocated buffer.
Make sure to release it both on success and failure returns.
OK'ed by ttoth@
Plug memory leak in a corner case in chfs_get_data_nodes().
Plug memory leaks in error returns in chfs_readvnode().
Plug memory leak in error returns and normal operation in
chfs_gcollect_pristine().
Plug memory leak in add_peb_to_free() and add_peb_to_in_use()
in case there's a duplicate in the tree.
 1.7.2.2 28-Aug-2017  skrll Sync with HEAD
 1.7.2.1 06-Apr-2015  skrll Sync with HEAD
 1.9.26.1 01-Aug-2021  thorpej Sync with HEAD.
 1.4 11-Jan-2015  hannken Change chfs from hashlist to vcache.
 1.3 27-Feb-2014  hannken branches: 1.3.6;
The current implementation of vn_lock() is racy. Modification of
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.

- Pass down LK_RETRY to the lock operation (hint for deadfs only).

- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.

- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.

With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.

Addresses PR kern/37706 (Forced unmount of file systems is unsafe)

Discussed on tech-kern.

Welcome to 6.99.33
 1.2 19-Oct-2012  ttoth branches: 1.2.2;
CHFS comments
 1.1 24-Nov-2011  ahoka branches: 1.1.6; 1.1.10;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.1.10.3 03-Dec-2017  jdolecek update from HEAD
 1.1.10.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.10.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.1.6.4 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.6.3 30-Oct-2012  yamt sync with head
 1.1.6.2 17-Apr-2012  yamt sync with head
 1.1.6.1 24-Nov-2011  yamt file chfs_ihash.c was added on branch yamt-pagecache on 2012-04-17 00:08:54 +0000
 1.2.2.1 18-May-2014  rmind sync with head
 1.3.6.1 06-Apr-2015  skrll Sync with HEAD
 1.10 11-Jan-2015  hannken Change chfs from hashlist to vcache.
 1.9 26-May-2014  dholland branches: 1.9.4;
Fix previous. Anyone have a brown paper bag?
 1.8 26-May-2014  dholland Remove lfs-only inode flags.
 1.7 22-Jan-2013  dholland branches: 1.7.10;
Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.6 19-Oct-2012  ttoth CHFS comments
 1.5 18-Apr-2012  joerg branches: 1.5.2;
Don't depend on implicit enum casts, be explicit.
 1.4 13-Apr-2012  ttoth branches: 1.4.2;
prepare for chfs's makefs
 1.3 12-Apr-2012  ttoth using chtype on media instead of vtype
debug.c deleted
 1.2 28-Feb-2012  christos Make this compile again. From Paul Fleischer.
 1.1 24-Nov-2011  ahoka branches: 1.1.2; 1.1.4;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.1.4.1 02-Mar-2012  riz Pull up following revision(s) (requested by tron in ticket #63):
sys/ufs/chfs/chfs_inode.h: revision 1.2
sys/ufs/chfs/chfs_malloc.c: revision 1.2
sys/arch/i386/conf/ALL: revision 1.333
sys/ufs/chfs/chfs_pool.c: revision 1.2
Make this compile again. From Paul Fleischer.
Add Chip File System.
 1.1.2.4 29-Apr-2012  mrg sync to latest -current.
 1.1.2.3 06-Mar-2012  mrg sync to -current
 1.1.2.2 06-Mar-2012  mrg sync to -current
 1.1.2.1 04-Mar-2012  mrg sync to latest -current.
 1.4.2.5 23-Jan-2013  yamt sync with head
 1.4.2.4 30-Oct-2012  yamt sync with head
 1.4.2.3 23-May-2012  yamt sync with head.
 1.4.2.2 17-Apr-2012  yamt sync with head
 1.4.2.1 13-Apr-2012  yamt file chfs_inode.h was added on branch yamt-pagecache on 2012-04-17 00:08:54 +0000
 1.5.2.4 03-Dec-2017  jdolecek update from HEAD
 1.5.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.2.2 25-Feb-2013  tls resync with head
 1.5.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.7.10.1 10-Aug-2014  tls Rebase.
 1.9.4.1 06-Apr-2015  skrll Sync with HEAD
 1.7 07-Dec-2021  andvar fix various typos, mainly in comments.
 1.6 17-Jun-2019  ryoon Fix build without DIAGNOSTIC
 1.5 09-Nov-2017  christos branches: 1.5.4;
use PR_WAITOK everywhere.
 1.4 19-Oct-2012  ttoth branches: 1.4.30;
CHFS comments
 1.3 10-Aug-2012  ttoth branches: 1.3.2;
chfs bugfix [node was obsoleted twice]
 1.2 28-Feb-2012  christos branches: 1.2.2;
Make this compile again. From Paul Fleischer.
 1.1 24-Nov-2011  ahoka branches: 1.1.2; 1.1.4;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.1.4.1 02-Mar-2012  riz Pull up following revision(s) (requested by tron in ticket #63):
sys/ufs/chfs/chfs_inode.h: revision 1.2
sys/ufs/chfs/chfs_malloc.c: revision 1.2
sys/arch/i386/conf/ALL: revision 1.333
sys/ufs/chfs/chfs_pool.c: revision 1.2
Make this compile again. From Paul Fleischer.
Add Chip File System.
 1.1.2.3 06-Mar-2012  mrg sync to -current
 1.1.2.2 06-Mar-2012  mrg sync to -current
 1.1.2.1 04-Mar-2012  mrg sync to latest -current.
 1.2.2.3 30-Oct-2012  yamt sync with head
 1.2.2.2 17-Apr-2012  yamt sync with head
 1.2.2.1 28-Feb-2012  yamt file chfs_malloc.c was added on branch yamt-pagecache on 2012-04-17 00:08:54 +0000
 1.3.2.2 03-Dec-2017  jdolecek update from HEAD
 1.3.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.4.30.1 27-Feb-2018  martin Pull up following revision(s) (requested by mrg in ticket #593):
sys/dev/marvell/mvxpsec.c: revision 1.2
sys/arch/m68k/m68k/pmap_motorola.c: revision 1.70
sys/opencrypto/crypto.c: revision 1.102
sys/arch/sparc64/sparc64/pmap.c: revision 1.308
sys/ufs/chfs/chfs_malloc.c: revision 1.5
sys/arch/powerpc/oea/pmap.c: revision 1.95
sys/sys/pool.h: revision 1.80,1.82
sys/kern/subr_pool.c: revision 1.209-1.216,1.219-1.220
sys/arch/alpha/alpha/pmap.c: revision 1.262
sys/kern/uipc_mbuf.c: revision 1.173
sys/uvm/uvm_fault.c: revision 1.202
sys/sys/mbuf.h: revision 1.172
sys/kern/subr_extent.c: revision 1.86
sys/arch/x86/x86/pmap.c: revision 1.266 (via patch)
sys/dev/dtv/dtv_scatter.c: revision 1.4

Allow only one pending call to a pool's backing allocator at a time.
Candidate fix for problems with hanging after kva fragmentation related
to PR kern/45718.

Proposed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2017/10/23/msg022472.html
Tested by bouyer@ on i386.

This makes one small change to the semantics of pool_prime and
pool_setlowat: they may fail with EWOULDBLOCK instead of ENOMEM, if
there is a pending call to the backing allocator in another thread but
we are not actually out of memory. That is unlikely because nearly
always these are used during initialization, when the pool is not in
use.

Define the new flag too for previous commit.

pool_grow can now fail even when sleeping is ok. Catch this case in pool_get
and retry.

Assert that pool_get failure happens only with PR_NOWAIT.
This would have caught the mistake I made last week leading to null
pointer dereferences all over the place, a mistake which I evidently
poorly scheduled alongside maxv's change to the panic message on x86
for null pointer dereferences.

Since pr_lock is now used to wait for two things (PR_GROWING and
PR_WANTED) we need to loop for the condition we wanted.
make the KASSERTMSG/panic strings consistent as '%s: [%s], __func__, wchan'
Handle the ERESTART case from pool_grow()

don't pass 0 to the pool flags
Guess pool_cache_get(pc, 0) means PR_WAITOK here.
Earlier on in the same context we use kmem_alloc(sz, KM_SLEEP).

use PR_WAITOK everywhere.
use PR_NOWAIT.

Don't use 0 for PR_NOWAIT

use PR_NOWAIT instead of 0

panic ex nihilo -- PR_NOWAITing for zerot

Add assertions that either PR_WAITOK or PR_NOWAIT are set.
- fix an assert; we can reach there if we are nowait or limitfail.
- when priming the pool and failing with ERESTART, don't decrement the number
of pages; this avoids the issue of returning an ERESTART when we get to 0,
and is more correct.
- simplify the pool_grow code, and don't wakeup things if we ENOMEM.

In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. This implements the requirement that
pmap_enter(PMAP_CANFAIL) must not fail when replacing an existing
mapping with the first mapping of a new page, which is an unintended
consequence of the changes from the rmind-uvmplock branch in 2011.

The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.

This somewhat indirectly fixes PR 52706, as well as the failing assertion
about "uvm_page_locked_p(old_pg)". (but only on x86, various other platforms
will need their own changes to handle this issue.)
In uvm_fault_upper_enter(), if pmap_enter(PMAP_CANFAIL) fails, assert that
the pmap did not leave around a now-stale pmap mapping for an old page.
If such a pmap mapping still existed after we unlocked the vm_map,
the UVM code would not know later that it would need to lock the
lower layer object while calling the pmap to remove or replace that
stale pmap mapping. See PR 52706 for further details.
hopefully work around the irregular "fork fails in init" problem.
if a pool is growing, and the grower is PR_NOWAIT, mark this.
if another caller wants to grow the pool and is also PR_NOWAIT,
busy-wait for the original caller, which should either succeed
or hard-fail fairly quickly.

implement the busy-wait by unlocking and relocking this pool's
mutex and returning ERESTART. other methods (such as having
the caller do this) were significantly more code and this hack
is fairly localised.
ok chs@ riastradh@

Don't release the lock in the PR_NOWAIT allocation. Move flags setting
after acquiring the mutex. (from Tobias Nygren)
apply the change from arch/x86/x86/pmap.c rev. 1.266 commitid vZRjvmxG7YTHLOfA:

In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. If we are replacing an existing mapping,
reuse the pv structure where possible.

This implements the requirement that pmap_enter(PMAP_CANFAIL) must not fail
when replacing an existing mapping with the first mapping of a new page,
which is an unintended consequence of the changes from the rmind-uvmplock
branch in 2011.

The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.

This somewhat indirectly fixes PR 52706 on the remaining platforms where
this problem existed.
 1.5.4.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.5 07-Dec-2021  andvar fix various typos, mainly in comments.
 1.4 09-Dec-2013  wiz Fix typo ("then" instead of "than")
 1.3 19-Oct-2012  ttoth branches: 1.3.2;
CHFS comments
 1.2 10-Aug-2012  ttoth branches: 1.2.2;
chfs bugfix [node was obsoleted twice]
 1.1 24-Nov-2011  ahoka branches: 1.1.6;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.1.6.4 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.6.3 30-Oct-2012  yamt sync with head
 1.1.6.2 17-Apr-2012  yamt sync with head
 1.1.6.1 24-Nov-2011  yamt file chfs_nodeops.c was added on branch yamt-pagecache on 2012-04-17 00:08:54 +0000
 1.2.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.3.2.1 18-May-2014  rmind sync with head
 1.5 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.4 17-Jun-2019  ryoon Fix build without DIAGNOSTIC
 1.3 29-Jan-2018  sevan branches: 1.3.4;
Drop commented-out include of a hardcoded path in root's home directory.
 1.2 28-Feb-2012  christos branches: 1.2.2;
Make this compile again. From Paul Fleischer.
 1.1 24-Nov-2011  ahoka branches: 1.1.2; 1.1.4;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.1.4.1 02-Mar-2012  riz Pull up following revision(s) (requested by tron in ticket #63):
sys/ufs/chfs/chfs_inode.h: revision 1.2
sys/ufs/chfs/chfs_malloc.c: revision 1.2
sys/arch/i386/conf/ALL: revision 1.333
sys/ufs/chfs/chfs_pool.c: revision 1.2
Make this compile again. From Paul Fleischer.
Add Chip File System.
 1.1.2.3 06-Mar-2012  mrg sync to -current
 1.1.2.2 06-Mar-2012  mrg sync to -current
 1.1.2.1 04-Mar-2012  mrg sync to latest -current.
 1.2.2.2 17-Apr-2012  yamt sync with head
 1.2.2.1 28-Feb-2012  yamt file chfs_pool.c was added on branch yamt-pagecache on 2012-04-17 00:08:54 +0000
 1.3.4.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.1 24-Nov-2011  ahoka branches: 1.1.6;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.1.6.2 17-Apr-2012  yamt sync with head
 1.1.6.1 24-Nov-2011  yamt file chfs_pool.h was added on branch yamt-pagecache on 2012-04-17 00:08:54 +0000
 1.13 08-Apr-2022  andvar s/postion/position/
 1.12 10-Dec-2021  andvar s/occured/occurred/ in comments, log messages and man pages.
 1.11 15-Jul-2021  andvar Make sure that mutex is released before conditional return statements. Fixes PR kern/56242
ok riastradh
 1.10 01-Jun-2017  chs branches: 1.10.26;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.9 01-Sep-2014  he branches: 1.9.2;
Plug memory leak in a corner case in chfs_get_data_nodes().
 1.8 20-Oct-2013  christos branches: 1.8.4;
remove unused
 1.7 01-Dec-2012  mbalmer branches: 1.7.2;
Fix a typo in debug output.
 1.6 19-Oct-2012  ttoth CHFS comments
 1.5 22-Aug-2012  ttoth branches: 1.5.2;
chfs: fixed truncating
 1.4 13-Aug-2012  ttoth chfs fixes
1. nodes are obsoleted only once during truncating a file
2. frags don't stay in pool_cache
 1.3 10-Aug-2012  ttoth chfs bugfix [node was obsoleted twice]
 1.2 24-Nov-2011  agc branches: 1.2.6;
quick workaround to make this compile, with thanks to Hisashi Fujinaka for the
nudge.
 1.1 24-Nov-2011  ahoka Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.2.6.5 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.6.4 16-Jan-2013  yamt sync with (a bit old) head
 1.2.6.3 30-Oct-2012  yamt sync with head
 1.2.6.2 17-Apr-2012  yamt sync with head
 1.2.6.1 24-Nov-2011  yamt file chfs_readinode.c was added on branch yamt-pagecache on 2012-04-17 00:08:54 +0000
 1.5.2.4 03-Dec-2017  jdolecek update from HEAD
 1.5.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.2.2 25-Feb-2013  tls resync with head
 1.5.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.7.2.1 18-May-2014  rmind sync with head
 1.8.4.1 08-Sep-2014  msaitoh Pull up following revision(s) (requested by he in ticket #74):
sys/ufs/chfs/chfs_vnode.c: revision 1.11
sys/ufs/chfs/chfs_readinode.c: revision 1.9
sys/ufs/chfs/chfs_scan.c: revision 1.5
sys/ufs/chfs/chfs_gc.c: revision 1.6
sys/ufs/chfs/ebh.c: revision 1.4
Plug leak in chfs_scan_eraseblock() of the allocated buffer.
Make sure to release it both on success and failure returns.
OK'ed by ttoth@
Plug memory leak in a corner case in chfs_get_data_nodes().
Plug memory leaks in error returns in chfs_readvnode().
Plug memory leak in error returns and normal operation in
chfs_gcollect_pristine().
Plug memory leak in add_peb_to_free() and add_peb_to_in_use()
in case there's a duplicate in the tree.
 1.9.2.1 28-Aug-2017  skrll Sync with HEAD
 1.10.26.1 01-Aug-2021  thorpej Sync with HEAD.
 1.10 16-Jul-2021  andvar Fix incorrect function name, some grammar and typos in comments. Remove trailing tab symbol.
No functional change intended.
 1.9 15-Jul-2021  andvar Make sure that mutex is released before conditional return statements. Fixes PR kern/56242
ok riastradh
 1.8 17-Jun-2019  ryoon branches: 1.8.14;
Fix build without DIAGNOSTIC
 1.7 01-Jun-2017  chs branches: 1.7.10;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.6 07-Feb-2015  christos fix buf leak. Reported by:
http://www.m00nbsd.net/ae123a9bae03f7dde5c6d654412daf5a.html#Report-4
 1.5 01-Sep-2014  he branches: 1.5.2;
Plug leak in chfs_scan_eraseblock() of the allocated buffer.
Make sure to release it both on success and failure returns.
OK'ed by ttoth@
 1.4 19-Oct-2012  ttoth branches: 1.4.12;
CHFS comments
 1.3 10-Aug-2012  ttoth branches: 1.3.2;
chfs bugfix [node was obsoleted twice]
 1.2 24-Nov-2011  agc branches: 1.2.6;
quick workaround to make this compile, with thanks to Hisashi Fujinaka for the
nudge.
 1.1 24-Nov-2011  ahoka Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.2.6.3 30-Oct-2012  yamt sync with head
 1.2.6.2 17-Apr-2012  yamt sync with head
 1.2.6.1 24-Nov-2011  yamt file chfs_scan.c was added on branch yamt-pagecache on 2012-04-17 00:08:54 +0000
 1.3.2.2 03-Dec-2017  jdolecek update from HEAD
 1.3.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.4.12.2 16-Feb-2015  martin Pull up following revision(s) (requested by maxv in ticket #520):
sys/ufs/chfs/ebh.c: revision 1.6
sys/dev/sdmmc/sdmmc_mem.c: revision 1.33
sys/dev/ic/aic7xxx.c: revision 1.132
sys/fs/nfs/common/krpc_subr.c: revision 1.2
sys/modules/lua/lua.c: revision 1.16
sys/fs/udf/udf_subr.c: revision 1.128
sys/ufs/chfs/chfs_scan.c: revision 1.6
sys/dev/ic/an.c: revision 1.62

Fix six memory leaks and two inconsistencies.
 1.4.12.1 08-Sep-2014  msaitoh Pull up following revision(s) (requested by he in ticket #74):
sys/ufs/chfs/chfs_vnode.c: revision 1.11
sys/ufs/chfs/chfs_readinode.c: revision 1.9
sys/ufs/chfs/chfs_scan.c: revision 1.5
sys/ufs/chfs/chfs_gc.c: revision 1.6
sys/ufs/chfs/ebh.c: revision 1.4
Plug leak in chfs_scan_eraseblock() of the allocated buffer.
Make sure to release it both on success and failure returns.
OK'ed by ttoth@
Plug memory leak in a corner case in chfs_get_data_nodes().
Plug memory leaks in error returns in chfs_readvnode().
Plug memory leak in error returns and normal operation in
chfs_gcollect_pristine().
Plug memory leak in add_peb_to_free() and add_peb_to_in_use()
in case there's a duplicate in the tree.
 1.5.2.2 28-Aug-2017  skrll Sync with HEAD
 1.5.2.1 06-Apr-2015  skrll Sync with HEAD
 1.7.10.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.8.14.1 01-Aug-2021  thorpej Sync with HEAD.
 1.15 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.14 11-Jun-2020  ad uvm_availmem(): give it a boolean argument to specify whether a recent
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
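
The trade-off described in revision 1.14 above (a recent cached value versus the very latest total) can be sketched in userland as follows. This is an assumption-laden illustration, not the UVM implementation: `availmem`, `fetch_exact`, `exact_total`, `cached_total` and `expensive_fetches` are hypothetical stand-ins.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for counters that are expensive to total up. */
static long exact_total = 100;
static long cached_total = 100;
static int expensive_fetches = 0;

/* Pretend this is the costly operation that disturbs other CPUs. */
static long fetch_exact(void)
{
	expensive_fetches++;
	return exact_total;
}

/*
 * Return the available-memory figure. When "cached" is true the last
 * known value is good enough; otherwise refresh it first.
 */
static long availmem(bool cached)
{
	if (!cached)
		cached_total = fetch_exact();
	return cached_total;
}
```

Callers on hot paths would pass `true` and accept a slightly stale figure; only callers that need the exact total pay for the refresh.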
 1.13 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.12 23-Apr-2020  ad PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)

- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
somewhere. Use it to decide whether to do direct-mapped copy, rather than
poking around directly in the vnode in ubc_uiomove(), which is ugly and
doesn't work for tmpfs. It would be nicer to contain all this in UVM but
the filesystem provides the needed locking here (VV_MAPPED) and to
reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS(). Pass in UBC_ISMAPPED where
appropriate.
 1.11 31-Dec-2019  ad branches: 1.11.6;
Rename uvm_free() -> uvm_availmem().
 1.10 21-Dec-2019  ad uvmexp.free -> uvm_free()
 1.9 20-Oct-2013  christos branches: 1.9.30;
remove unused
 1.8 19-Oct-2012  ttoth branches: 1.8.2;
CHFS comments
 1.7 22-Aug-2012  ttoth branches: 1.7.2;
chfs: fixed truncating
 1.6 13-Aug-2012  ttoth chfs fixes
1. nodes are obsoleted only once during truncating a file
2. frags don't stay in pool_cache
 1.5 10-Aug-2012  ttoth chfs bugfix [node was obsoleted twice]
 1.4 12-Apr-2012  ttoth branches: 1.4.2;
using chtype on media instead of vtype
debug.c deleted
 1.3 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.2 24-Nov-2011  agc branches: 1.2.2;
quick workaround to make this compile, with thanks to Hisashi Fujinaka for the
nudge.
 1.1 24-Nov-2011  ahoka Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.2.2.2 29-Apr-2012  mrg sync to latest -current.
 1.2.2.1 05-Apr-2012  mrg sync to latest -current.
 1.4.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.2.3 30-Oct-2012  yamt sync with head
 1.4.2.2 17-Apr-2012  yamt sync with head
 1.4.2.1 12-Apr-2012  yamt file chfs_subr.c was added on branch yamt-pagecache on 2012-04-17 00:08:54 +0000
 1.7.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.8.2.1 18-May-2014  rmind sync with head
 1.9.30.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.11.6.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.24 28-May-2025  andvar fix few typos in comments.
 1.23 19-Mar-2022  hannken branches: 1.23.10;
Remove now unused VV_LOCKSWORK, all file systems support locking.

Remove unused predicates vn_locked() and vn_anylocked().

Welcome to 9.99.95
 1.22 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.21 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.20 27-Dec-2019  msaitoh branches: 1.20.2;
s/inital/initial/
 1.19 20-Jun-2019  pgoyette Split the ufs code out of the ffs module and into its own module.

Adapt chfs and ext2fs modules accordingly.
 1.18 28-May-2018  chs branches: 1.18.2;
add a genfs method to allow a file system to limit the range of pages
that are given to a single GOP_WRITE() call. needed by ZFS.
 1.17 14-Nov-2017  riastradh branches: 1.17.2;
Fix up chfs_mountfs error branches.
 1.16 17-Feb-2017  hannken branches: 1.16.4;
Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.15 11-Jan-2015  hannken branches: 1.15.2; 1.15.4;
Change chfs from hashlist to vcache.
 1.14 09-Nov-2014  maxv branches: 1.14.2;
Do not uselessly include <sys/malloc.h>.
 1.13 20-Oct-2014  christos simplify.
 1.12 20-Oct-2014  maxv Memory leak.

Found by my code scanner.

ok christos@
 1.11 16-Apr-2014  maxv branches: 1.11.2;
An (un)privileged user can easily make the kernel dereference a NULL
pointer.

The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).

ok christos@
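
The check described in revision 1.11 above can be sketched as follows. This is a userland illustration only: `example_mount` and `struct example_args` are hypothetical, and returning `EINVAL` is an assumption about how such a handler might reject the call, not a statement about the actual chfs change.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical mount arguments; not the real chfs structure. */
struct example_args {
	const char *fspec;
};

/*
 * Hypothetical file-system mount handler. The caller is allowed to
 * pass data == NULL, so a handler that needs arguments must check
 * before dereferencing.
 */
static int example_mount(void *data)
{
	struct example_args *args = data;

	if (args == NULL)
		return EINVAL;	/* assumed error code */
	if (args->fspec == NULL)
		return EINVAL;
	return 0;
}
```

The point is only the ordering: the NULL test comes before any field of the argument structure is read.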
 1.10 23-Mar-2014  hannken branches: 1.10.2;
Change all vfsops to use C99 designated initializers.

No functional changes intended.
 1.9 20-Oct-2013  christos remove unused
 1.8 30-Sep-2013  hannken Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>
 1.7 22-Jan-2013  dholland branches: 1.7.2;
Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.6 19-Oct-2012  ttoth CHFS comments
 1.5 10-Aug-2012  ttoth branches: 1.5.2;
chfs bugfix [node was obsoleted twice]
 1.4 30-Apr-2012  rmind - Replace some malloc(9) uses with kmem(9).
- G/C M_IPMOPTS, M_IPMADDR and M_BWMETER.
 1.3 12-Apr-2012  ttoth branches: 1.3.2;
using chtype on media instead of vtype
debug.c deleted
 1.2 24-Nov-2011  agc branches: 1.2.2; 1.2.4; 1.2.8; 1.2.10;
quick workaround to make this compile, with thanks to Hisashi Fujinaka for the
nudge.
 1.1 24-Nov-2011  ahoka Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.2.10.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.2.8.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.2.4.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.2.2.2 02-Jun-2012  mrg sync to latest -current.
 1.2.2.1 29-Apr-2012  mrg sync to latest -current.
 1.3.2.6 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.3.2.5 23-Jan-2013  yamt sync with head
 1.3.2.4 30-Oct-2012  yamt sync with head
 1.3.2.3 23-May-2012  yamt sync with head.
 1.3.2.2 17-Apr-2012  yamt sync with head
 1.3.2.1 12-Apr-2012  yamt file chfs_vfsops.c was added on branch yamt-pagecache on 2012-04-17 00:08:54 +0000
 1.5.2.4 03-Dec-2017  jdolecek update from HEAD
 1.5.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.2.2 25-Feb-2013  tls resync with head
 1.5.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.7.2.1 18-May-2014  rmind sync with head
 1.10.2.1 10-Aug-2014  tls Rebase.
 1.11.2.2 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.11.2.1 29-Dec-2014  martin Pull up following revision(s) (requested by maxv in ticket #353):
sys/ufs/chfs/chfs_vfsops.c: revision 1.12
sys/compat/common/vfs_syscalls_30.c: revision 1.35
sys/compat/linux/common/linux_uselib.c: revision 1.31
sys/compat/linux/common/linux_uselib.c: revision 1.32
Resource leak.
Memory leaks.
Reject non-regular files.
 1.14.2.2 28-Aug-2017  skrll Sync with HEAD
 1.14.2.1 06-Apr-2015  skrll Sync with HEAD
 1.15.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.15.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.15.2.1 20-Jul-2016  pgoyette Adapt machine-independent code to the new {b,c}devsw reference-counting
(using localcount(9)). All callers of {b,c}devsw_lookup() now call
{b,c}devsw_lookup_acquire() which retains a reference on the 'struct
{b,c}devsw'. This reference must be released by the caller once it is
finished with the structure's content (or other data that would disappear
if the 'struct {b,c}devsw' were to disappear).
 1.16.4.1 27-Apr-2017  pgoyette Restore all work from the former pgoyette-localcount branch (which is
now abandoned due to cvs merge botch).

The branch now builds, and installs via anita. There are still some
problems (cgd is non-functional and all atf tests time-out) but they
will get resolved soon.
 1.17.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.18.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.18.2.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.20.2.1 17-Jan-2020  ad Sync with head.
 1.23.10.1 02-Aug-2025  perseant Sync with HEAD
 1.20 07-Dec-2021  andvar fix typos in word "instead", mainly in log messages.
 1.19 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.18 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.17 18-Sep-2019  christos branches: 1.17.2;
fix compilation
 1.16 18-Sep-2019  christos Add newly created vnodes to the namei cache. The rest of the filesystems
already did that (or they don't support writing). Discussed in tech-kern.
 1.15 01-Apr-2017  riastradh branches: 1.15.14;
KASSERT(mutex_owned(vp->v_interlock)) in vnode iterator selector.
 1.14 11-Jan-2015  hannken branches: 1.14.2; 1.14.4;
Change chfs from hashlist to vcache.
 1.13 11-Jan-2015  hannken Convert a bogus mnt_vnodelist traversal to vfs_vnode_iterator.
 1.12 09-Nov-2014  maxv branches: 1.12.2;
Do not uselessly include <sys/malloc.h>.
 1.11 01-Sep-2014  he Plug memory leaks in error returns in chfs_readvnode().
 1.10 23-Jan-2014  hannken branches: 1.10.4;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30
 1.9 17-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29
 1.8 19-Oct-2012  ttoth branches: 1.8.2;
CHFS comments
 1.7 13-Aug-2012  ttoth branches: 1.7.2;
chfs fixes
1. nodes are obsoleted only once during truncating a file
2. frags don't stay in pool_cache
 1.6 10-Aug-2012  ttoth chfs bugfix [node was obsoleted twice]
 1.5 13-Apr-2012  ttoth branches: 1.5.2;
prepare for chfs's makefs
 1.4 12-Apr-2012  ttoth using chtype on media instead of vtype
debug.c deleted
 1.3 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.2 24-Nov-2011  agc branches: 1.2.2;
quick workaround to make this compile, with thanks to Hisashi Fujinaka for the
nudge.
 1.1 24-Nov-2011  ahoka Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks to all who made it possible.
 1.2.2.2 29-Apr-2012  mrg sync to latest -current.
 1.2.2.1 05-Apr-2012  mrg sync to latest -current.
 1.5.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.2.3 30-Oct-2012  yamt sync with head
 1.5.2.2 17-Apr-2012  yamt sync with head
 1.5.2.1 13-Apr-2012  yamt file chfs_vnode.c was added on branch yamt-pagecache on 2012-04-17 00:08:55 +0000
 1.7.2.3 03-Dec-2017  jdolecek update from HEAD
 1.7.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.8.2.1 18-May-2014  rmind sync with head
 1.10.4.2 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.10.4.1 08-Sep-2014  msaitoh Pull up following revision(s) (requested by he in ticket #74):
sys/ufs/chfs/chfs_vnode.c: revision 1.11
sys/ufs/chfs/chfs_readinode.c: revision 1.9
sys/ufs/chfs/chfs_scan.c: revision 1.5
sys/ufs/chfs/chfs_gc.c: revision 1.6
sys/ufs/chfs/ebh.c: revision 1.4
Plug leak in chfs_scan_eraseblock() of the allocated buffer.
Make sure to release it both on success and failure returns.
OK'ed by ttoth@
Plug memory leak in a corner case in chfs_get_data_nodes().
Plug memory leaks in error returns in chfs_readvnode().
Plug memory leak in error returns and normal operation in
chfs_gcollect_pristine().
Plug memory leak in add_peb_to_free() and add_peb_to_in_use()
in case there's a duplicate in the tree.
 1.12.2.2 28-Aug-2017  skrll Sync with HEAD
 1.12.2.1 06-Apr-2015  skrll Sync with HEAD
 1.14.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.14.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.15.14.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.15.14.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.17.2.1 17-Jan-2020  ad Sync with head.
 1.3 19-Oct-2012  ttoth CHFS comments
 1.2 10-Aug-2012  ttoth branches: 1.2.2;
chfs bugfix [node was obsoleted twice]
 1.1 24-Nov-2011  ahoka branches: 1.1.6;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks for all who made it possible.
 1.1.6.3 30-Oct-2012  yamt sync with head
 1.1.6.2 17-Apr-2012  yamt sync with head
 1.1.6.1 24-Nov-2011  yamt file chfs_vnode_cache.c was added on branch yamt-pagecache on 2012-04-17 00:08:55 +0000
 1.2.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.48 27-Mar-2022  christos add a kauth vnode check for creating links
 1.47 07-Dec-2021  andvar fix various typos, mainly in comments.
 1.46 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted, for example).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), which uses a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
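The interest-tracking point above can be sketched as a cached bitmask: the node keeps the OR of what every watcher asked for and skips the list walk when an event matches nobody. The event bits, `struct watcher`, `struct node` and both functions below are hypothetical, chosen only to illustrate the idea; they are not the kernel's klist or NOTE_* definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event bits; the real NOTE_* values are not reproduced here. */
#define EV_DELETE 0x01u
#define EV_WRITE  0x02u
#define EV_OPEN   0x04u

struct watcher {
	unsigned int interest;	/* events this watcher asked for */
	unsigned int delivered;	/* events recorded so far */
};

struct node {
	struct watcher *watchers;
	size_t nwatchers;
	unsigned int interest_union;	/* OR of every watcher's interest */
	unsigned int list_walks;	/* how often the list was traversed */
};

/* Recompute the cached union whenever a watcher is added or removed. */
static void node_recompute_interest(struct node *n)
{
	size_t i;

	assert(n != NULL);
	n->interest_union = 0;
	for (i = 0; i < n->nwatchers; i++)
		n->interest_union |= n->watchers[i].interest;
}

/* Deliver an event, skipping the list walk if nobody asked for it. */
static void node_notify(struct node *n, unsigned int event)
{
	size_t i;

	if ((n->interest_union & event) == 0)
		return;		/* cheap early out: no traversal */
	n->list_walks++;
	for (i = 0; i < n->nwatchers; i++) {
		if (n->watchers[i].interest & event)
			n->watchers[i].delivered |= event;
	}
}
```

With a single watcher interested only in `EV_DELETE`, a write notification returns before touching the list.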
 1.45 18-Jul-2021  dholland Use macros for the canned parts of device and fifo vnode op tables.

Add GENFS_SPECOP_ENTRIES and GENFS_FIFOOP_ENTRIES macros that contain
the portion of the vnode ops table declaration that is
(conservatively) the same in every fs. Use these in every fs that
supports devices and/or fifos with separate ops tables.

Note that ptyfs works differently (it has one type of vnode with
open-coded dispatch to the specfs code, which I haven't changed in
this commit) and rump/librump/rumpvfs/rumpfs.c has an indirect dynamic
dispatch that already does more or less the same thing, which I also
haven't changed.

Also note that this anticipates a few bits in the next changeset here
and there, and adds missing but unreachable calls in some cases (e.g.
most fses weren't defining whiteout on devices and fifos, but it isn't
reachable there), and it changes parsepath on devices and fifos to
genfs_badop from genfs_parsepath (but it's not reachable there
either).

It appears that devices in kernfs were missing kqfilter, so it's
possible that if you try to use kqueue on /kern/rootdev that it'll
explode.

And finally note that the ops declaration tables aren't
order-dependent. (Other than vop_default_desc has to come first.)
Otherwise this wouldn't work.
 1.44 05-Jul-2021  dholland whitespace
 1.43 29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
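The default behaviour described above (consume the path up to the first '/') can be sketched in a few lines. `first_component_len` is a hypothetical stand-in that works on a plain string; it is not the actual genfs_parsepath signature.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the default parsepath behaviour:
 * return how many bytes of the remaining path form the next
 * component, i.e. everything before the first '/' or the end.
 */
static size_t first_component_len(const char *path)
{
	size_t len = 0;

	assert(path != NULL);
	while (path[len] != '\0' && path[len] != '/')
		len++;
	return len;
}
```

Because only the string is passed in, a file system that wanted different splitting rules could return a different length without seeing any other lookup state.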
 1.42 05-Sep-2020  riastradh branches: 1.42.6;
Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.41 23-May-2020  ad Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.40 20-May-2020  christos fix accessx confusion (thanks hannken@)
 1.39 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.38 23-Apr-2020  ad PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)

- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
somewhere. Use it to decide whether to do direct-mapped copy, rather than
poking around directly in the vnode in ubc_uiomove(), which is ugly and
doesn't work for tmpfs. It would be nicer to contain all this in UVM but
the filesystem provides the needed locking here (VV_MAPPED) and to
reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS(). Pass in UBC_ISMAPPED where
appropriate.
 1.37 04-Apr-2020  ad branches: 1.37.2;
Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.36 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.35 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.34 17-Jun-2019  ryoon branches: 1.34.4;
Fix build without DIAGNOSTIC
 1.33 26-May-2017  riastradh branches: 1.33.10;
Make VOP_RECLAIM do the last unlock of the vnode.

VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.

We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
 1.32 26-Apr-2017  riastradh Change VOP_REMOVE and VOP_RMDIR to preserve lock/ref on dvp.

No change to vp -- the plan is to replace the node by the
componentname in the vop parameters, and let all directory vops do
lookups internally.

Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2017/04/17/msg021825.html
 1.31 11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.30 30-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().
 1.29 20-Aug-2016  hannken branches: 1.29.2;
Remove now obsolete operation vcache_remove().

Welcome to 7.99.36
 1.28 20-Apr-2015  riastradh branches: 1.28.2;
Make VOP_LINK return directory still locked and referenced.

Ride 7.99.10 bump.
 1.27 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.26 28-Mar-2015  maxv Remove the 'cred' argument from breadn(), and update the man page
accordingly.

ok hannken@
 1.25 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.24 11-Jan-2015  hannken Change chfs from hashlist to vcache.
 1.23 11-Jan-2015  hannken Return immediately from successful cache_lookup().
No need to unlock an unlocked vnode.
 1.22 25-Jul-2014  dholland branches: 1.22.4;
Add VOP_FALLOCATE and VOP_FDISCARD to every vnode ops table I can
find.

The filesystem ones all call genfs_eopnotsupp - right now I am only
implementing the plumbing and we can implement fallocate and/or
fdiscard for files later.

The device ones call spec_fallocate (which is also genfs_eopnotsupp)
and spec_fdiscard, which dispatches to the device-level op.

The fifo ones all call vn_fifo_bypass, which also ends up being
EOPNOTSUPP.
 1.21 07-Feb-2014  hannken branches: 1.21.2;
Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.20 23-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30
 1.19 17-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29
 1.18 20-Oct-2013  christos remove unused
 1.17 23-Jun-2013  dholland branches: 1.17.2;
Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.16 19-Jun-2013  dholland blkoff -> chfs_blkoff
blksize -> chfs_blksize
 1.15 18-Mar-2013  plunky C99 section 6.7.2.3 (Tags) Note 3 states that:

A type specifier of the form

enum identifier

without an enumerator list shall only appear after the type it
specifies is complete.

which means that we cannot pass an "enum vtype" argument to
kauth_access_action() without fully specifying the type first.
Unfortunately there is a complicated include file loop which
makes that difficult, so convert this minimal function into a
macro (and capitalize it).

(ok elad@)
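A minimal sketch of why the macro form sidesteps the problem: a macro body is only expanded at its use sites, so it may be defined before the enum is complete, whereas a function declaration taking the enum by value could not. All names here (`ACC_*`, `ACT_*`, `ACCESS_ACTION`, `enum node_type`) are hypothetical and do not reproduce the real kauth definitions.

```c
#include <assert.h>

/* Hypothetical access bits and result actions, for illustration only. */
#define ACC_READ    0x4u
#define ACC_WRITE   0x2u
#define ACC_EXEC    0x1u

#define ACT_READ    0x10u
#define ACT_WRITE   0x20u
#define ACT_EXECUTE 0x40u
#define ACT_SEARCH  0x80u

/*
 * Defined before enum node_type is complete. A prototype taking
 * "enum node_type" at this point would name an incomplete enum;
 * the macro is only expanded where it is used, after the enum
 * has been defined.
 */
#define ACCESS_ACTION(acc, type)					\
	((((acc) & ACC_READ) ? ACT_READ : 0u) |				\
	 (((acc) & ACC_WRITE) ? ACT_WRITE : 0u) |			\
	 (((acc) & ACC_EXEC) ?						\
	    (((type) == NT_DIR) ? ACT_SEARCH : ACT_EXECUTE) : 0u))

/* The enum is completed later, as if by a header included afterwards. */
enum node_type { NT_REG, NT_DIR };

static unsigned int check_access(unsigned int acc, enum node_type type)
{
	return ACCESS_ACTION(acc, type);
}
```

The cost is the usual one for function-like macros: arguments are evaluated more than once, so callers should pass expressions without side effects.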
 1.14 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.13 05-Nov-2012  dholland Excise struct componentname from the namecache.

This uglifies the interface, because several operations need to be
passed the namei flags and cache_lookup also needs for the time being
to be passed cnp->cn_nameiop. Nonetheless, it's a net benefit.

The glop should be able to go away eventually but requires structural
cleanup elsewhere first.

This change requires a kernel bump.
 1.12 05-Nov-2012  dholland Disentangle the namecache from the internals of namei.

- Move the namecache's hash computation to inside the namecache code,
instead of being spread out all over the place. Remove cn_hash from
struct componentname and delete all uses of it.

- It is no longer necessary (if it ever was) for cache_lookup and
cache_lookup_raw to clear MAKEENTRY from cnp->cn_flags for the cases
that cache_enter already checks for.

- Rearrange the interface of cache_lookup (and cache_lookup_raw) to
make it somewhat simpler, to exclude certain nonexistent error
conditions, and (most importantly) to make it not require write access
to cnp->cn_flags.

This change requires a kernel bump.
 1.11 19-Oct-2012  ttoth CHFS comments
 1.10 23-Aug-2012  ttoth branches: 1.10.2;
chfs: uappnd flag patch
 1.9 10-Aug-2012  ttoth chfs bugfix [node was obsoleted twice]
 1.8 22-Jul-2012  rmind Move some of the tests for MAKEENTRY into cache_enter(9). Make some
variables in vfs_cache.c static, __read_mostly, etc.

No objection on tech-kern@.
 1.7 29-Apr-2012  chs change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
 1.6 18-Apr-2012  joerg Don't depend on implicit enum casts, be explicit.
 1.5 17-Apr-2012  christos it is not an error if the kernel needs to clear the setuid/
setgid bit on write/chown/chgrp
 1.4 12-Apr-2012  ttoth branches: 1.4.2;
using chtype on media instead of vtype
debug.c deleted
 1.3 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.2 24-Nov-2011  agc branches: 1.2.2; 1.2.4;
quick workaround to make this compile, with thanks to Hisashi Fujinaka for the
nudge.
 1.1 24-Nov-2011  ahoka Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks for all who made it possible.
 1.2.4.2 12-Aug-2012  martin Pull up following revision(s) (requested by manu in ticket #484):
sys/fs/nilfs/nilfs_vnops.c: revision 1.18
sys/ufs/ufs/ufs_lookup.c: revision 1.117
sys/nfs/nfs_vnops.c: revision 1.295
sys/ufs/chfs/chfs_vnops.c: revision 1.8
sys/ufs/ext2fs/ext2fs_lookup.c: revision 1.70
sys/fs/unionfs/unionfs_vnops.c: revision 1.6
sys/kern/vfs_cache.c: revision 1.89
sys/fs/efs/efs_vnops.c: revision 1.26
sys/fs/hfs/hfs_vnops.c: revision 1.26
sys/fs/adosfs/adlookup.c: revision 1.16
sys/fs/puffs/puffs_vnops.c: revision 1.168
sys/fs/tmpfs/tmpfs_vnops.c: revision 1.98
sys/fs/ntfs/ntfs_vnops.c: revision 1.52
sys/fs/cd9660/cd9660_lookup.c: revision 1.20
sys/fs/msdosfs/msdosfs_lookup.c: revision 1.24
sys/fs/smbfs/smbfs_vnops.c: revision 1.80
sys/fs/udf/udf_vnops.c: revision 1.72
sys/fs/filecorefs/filecore_lookup.c: revision 1.14
sys/fs/puffs/puffs_node.c: revision 1.25
Move some of the tests for MAKEENTRY into cache_enter(9). Make some
variables in vfs_cache.c static, __read_mostly, etc.
No objection on tech-kern@.
 1.2.4.1 07-May-2012  riz Pull up following revision(s) (requested by chs in ticket #204):
sys/fs/sysvbfs/sysvbfs_vnops.c: revision 1.44
sys/ufs/ffs/ffs_vfsops.c: revision 1.277
sys/fs/v7fs/v7fs_vnops.c: revision 1.11
sys/ufs/chfs/chfs_vnops.c: revision 1.7
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.61
sys/miscfs/genfs/genfs_io.c: revision 1.54
sys/kern/vfs_wapbl.c: revision 1.52
sys/uvm/uvm_pager.h: revision 1.43
sys/ufs/ffs/ffs_vnops.c: revision 1.121
sys/kern/vfs_subr.c: revision 1.434
sys/fs/msdosfs/msdosfs_vnops.c: revision 1.83
sys/fs/ntfs/ntfs_vnops.c: revision 1.51
sys/fs/udf/udf_subr.c: revision 1.119
sys/miscfs/specfs/spec_vnops.c: revision 1.135
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.103
sys/fs/udf/udf_vnops.c: revision 1.71
sys/ufs/ufs/ufs_readwrite.c: revision 1.104
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
mark all wapbl I/O as BPRIO_TIMECRITICAL.
this is the second part of addressing PR 46325.
 1.2.2.3 02-Jun-2012  mrg sync to latest -current.
 1.2.2.2 29-Apr-2012  mrg sync to latest -current.
 1.2.2.1 05-Apr-2012  mrg sync to latest -current.
 1.4.2.7 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.2.6 23-Jan-2013  yamt sync with head
 1.4.2.5 16-Jan-2013  yamt sync with (a bit old) head
 1.4.2.4 30-Oct-2012  yamt sync with head
 1.4.2.3 23-May-2012  yamt sync with head.
 1.4.2.2 17-Apr-2012  yamt sync with head
 1.4.2.1 12-Apr-2012  yamt file chfs_vnops.c was added on branch yamt-pagecache on 2012-04-17 00:08:55 +0000
 1.10.2.5 03-Dec-2017  jdolecek update from HEAD
 1.10.2.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.10.2.3 23-Jun-2013  tls resync from head
 1.10.2.2 25-Feb-2013  tls resync with head
 1.10.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.17.2.1 18-May-2014  rmind sync with head
 1.21.2.1 10-Aug-2014  tls Rebase.
 1.22.4.4 28-Aug-2017  skrll Sync with HEAD
 1.22.4.3 05-Oct-2016  skrll Sync with HEAD
 1.22.4.2 06-Jun-2015  skrll Sync with HEAD
 1.22.4.1 06-Apr-2015  skrll Sync with HEAD
 1.28.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.29.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.33.10.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.33.10.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.34.4.3 29-Feb-2020  ad Sync with head.
 1.34.4.2 19-Jan-2020  ad Set IMNT_SHRLOOKUP and use it for the in-cache case. Need to check what
more can be done with tmpfs though, it can probably do the whole lookup.
 1.34.4.1 17-Jan-2020  ad Sync with head.
 1.37.2.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.42.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.7 18-Oct-2014  snj src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.6 25-Jan-2014  skrll More alignment spellos
 1.5 19-Oct-2012  ttoth branches: 1.5.2;
CHFS comments
 1.4 16-Jan-2012  ahoka branches: 1.4.4; 1.4.8;
use enum instead of macros
add some input validation
cleanup
 1.3 16-Jan-2012  ahoka cleanup macros
 1.2 24-Nov-2011  agc branches: 1.2.2;
quick workaround for compilation bug on amd64
 1.1 24-Nov-2011  ahoka Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks for all who made it possible.
 1.2.2.1 18-Feb-2012  mrg merge to -current.
 1.4.8.3 03-Dec-2017  jdolecek update from HEAD
 1.4.8.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.8.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.4.4.4 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.4.3 30-Oct-2012  yamt sync with head
 1.4.4.2 17-Apr-2012  yamt sync with head
 1.4.4.1 16-Jan-2012  yamt file chfs_wbuf.c was added on branch yamt-pagecache on 2012-04-17 00:08:55 +0000
 1.5.2.1 18-May-2014  rmind sync with head
 1.7 07-Dec-2021  andvar fix various typos, mainly in comments.
 1.6 19-Jul-2021  andvar Release mutexes in a few more places on the failure path. Reviewed them in chfs code after fixing PR kern/56242.
ok riastradh
 1.5 19-Oct-2012  ttoth branches: 1.5.54;
CHFS comments
 1.4 10-Aug-2012  ttoth branches: 1.4.2;
chfs bugfix [node was obsoleted twice]
 1.3 12-Apr-2012  ttoth branches: 1.3.2;
using chtype on media instead of vtype
debug.c deleted
 1.2 24-Nov-2011  agc branches: 1.2.2;
quick workaround to make this compile, with thanks to Hisashi Fujinaka for the
nudge.
 1.1 24-Nov-2011  ahoka Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks for all who made it possible.
 1.2.2.1 29-Apr-2012  mrg sync to latest -current.
 1.3.2.3 30-Oct-2012  yamt sync with head
 1.3.2.2 17-Apr-2012  yamt sync with head
 1.3.2.1 12-Apr-2012  yamt file chfs_write.c was added on branch yamt-pagecache on 2012-04-17 00:08:55 +0000
 1.4.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.5.54.1 01-Aug-2021  thorpej Sync with HEAD.
 1.2 12-Apr-2012  ttoth using chtype on media instead of vtype
debug.c deleted
 1.1 24-Nov-2011  ahoka branches: 1.1.2;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks for all who made it possible.
 1.1.2.1 29-Apr-2012  mrg sync to latest -current.
 1.2 12-Apr-2012  ttoth branches: 1.2.2;
using chtype on media instead of vtype
debug.c deleted
 1.1 24-Nov-2011  ahoka branches: 1.1.2;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks for all who made it possible.
 1.1.2.1 29-Apr-2012  mrg sync to latest -current.
 1.2.2.2 17-Apr-2012  yamt sync with head
 1.2.2.1 12-Apr-2012  yamt file debug.h was added on branch yamt-pagecache on 2012-04-17 00:08:55 +0000
 1.11 08-Jan-2025  andvar s/eraseing/erasing/ and a couple more typos in debug messages and comments.
 1.10 30-Dec-2022  andvar branches: 1.10.6;
s/succes/success/ in comments.
 1.9 07-Dec-2021  andvar fix various typos, mainly in comments.
 1.8 09-Aug-2021  andvar s/fist/first/
 1.7 07-Feb-2018  ozaki-r Remove unnecessary assertions

KASSERT(!rw_lock_held()) just before rw_destroy() is useless because
rw_destroy does more strict check and provides better information on
failure.
 1.6 07-Feb-2015  christos fix leak. Reported by:
http://www.m00nbsd.net/ae123a9bae03f7dde5c6d654412daf5a.html#Report-4
 1.5 18-Oct-2014  snj branches: 1.5.2;
src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.4 01-Sep-2014  he Plug memory leak in add_peb_to_free() and add_peb_to_in_use()
in case there's a duplicate in the tree.
 1.3 10-Aug-2012  ttoth branches: 1.3.2; 1.3.14;
chfs bugfix [node was obsoleted twice]
 1.2 25-Nov-2011  ahoka branches: 1.2.6;
Don't shadow some stupid function defined globally in random platforms.
 1.1 24-Nov-2011  ahoka Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks for all who made it possible.
 1.2.6.3 30-Oct-2012  yamt sync with head
 1.2.6.2 17-Apr-2012  yamt sync with head
 1.2.6.1 25-Nov-2011  yamt file ebh.c was added on branch yamt-pagecache on 2012-04-17 00:08:55 +0000
 1.3.14.2 16-Feb-2015  martin Pull up following revision(s) (requested by maxv in ticket #520):
sys/ufs/chfs/ebh.c: revision 1.6
sys/dev/sdmmc/sdmmc_mem.c: revision 1.33
sys/dev/ic/aic7xxx.c: revision 1.132
sys/fs/nfs/common/krpc_subr.c: revision 1.2
sys/modules/lua/lua.c: revision 1.16
sys/fs/udf/udf_subr.c: revision 1.128
sys/ufs/chfs/chfs_scan.c: revision 1.6
sys/dev/ic/an.c: revision 1.62

Fix six memory leaks and two inconsistencies.
 1.3.14.1 08-Sep-2014  msaitoh Pull up following revision(s) (requested by he in ticket #74):
sys/ufs/chfs/chfs_vnode.c: revision 1.11
sys/ufs/chfs/chfs_readinode.c: revision 1.9
sys/ufs/chfs/chfs_scan.c: revision 1.5
sys/ufs/chfs/chfs_gc.c: revision 1.6
sys/ufs/chfs/ebh.c: revision 1.4
Plug leak in chfs_scan_eraseblock() of the allocated buffer.
Make sure to release it both on success and failure returns.
OK'ed by ttoth@
Plug memory leak in a corner case in chfs_get_data_nodes().
Plug memory leaks in error returns in chfs_readvnode().
Plug memory leak in error returns and normal operation in
chfs_gcollect_pristine().
Plug memory leak in add_peb_to_free() and add_peb_to_in_use()
in case there's a duplicate in the tree.
 1.3.2.1 03-Dec-2017  jdolecek update from HEAD
 1.5.2.1 06-Apr-2015  skrll Sync with HEAD
 1.10.6.1 02-Aug-2025  perseant Sync with HEAD
 1.5 08-Jan-2025  andvar s/eraseing/erasing/ and a couple more typos in debug messages and comments.
 1.4 06-Aug-2016  dholland branches: 1.4.52;
typo in comment
 1.3 19-Oct-2012  ttoth branches: 1.3.14;
CHFS comments
 1.2 13-Apr-2012  ttoth branches: 1.2.2; 1.2.4;
prepare for chfs's makefs
 1.1 24-Nov-2011  ahoka branches: 1.1.2;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks for all who made it possible.
 1.1.2.1 29-Apr-2012  mrg sync to latest -current.
 1.2.4.2 03-Dec-2017  jdolecek update from HEAD
 1.2.4.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.2.2.3 30-Oct-2012  yamt sync with head
 1.2.2.2 17-Apr-2012  yamt sync with head
 1.2.2.1 13-Apr-2012  yamt file ebh.h was added on branch yamt-pagecache on 2012-04-17 00:08:55 +0000
 1.3.14.1 05-Oct-2016  skrll Sync with HEAD
 1.4.52.1 02-Aug-2025  perseant Sync with HEAD
 1.2 16-Sep-2021  andvar fix typos in word "successfully", mainly s/succesfully/successfully/.
 1.1 24-Nov-2011  ahoka branches: 1.1.6;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks for all who made it possible.
 1.1.6.2 17-Apr-2012  yamt sync with head
 1.1.6.1 24-Nov-2011  yamt file ebh_media.h was added on branch yamt-pagecache on 2012-04-17 00:08:55 +0000
 1.2 19-Oct-2012  ttoth CHFS comments
 1.1 24-Nov-2011  ahoka branches: 1.1.6; 1.1.10;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks for all who made it possible.
 1.1.10.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.1.6.3 30-Oct-2012  yamt sync with head
 1.1.6.2 17-Apr-2012  yamt sync with head
 1.1.6.1 24-Nov-2011  yamt file ebh_misc.h was added on branch yamt-pagecache on 2012-04-17 00:08:55 +0000
 1.3 11-Aug-2021  andvar s/enrty/entry/
 1.2 19-Oct-2012  ttoth CHFS comments
 1.1 24-Nov-2011  ahoka branches: 1.1.6; 1.1.10;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks for all who made it possible.
 1.1.10.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.1.6.3 30-Oct-2012  yamt sync with head
 1.1.6.2 17-Apr-2012  yamt sync with head
 1.1.6.1 24-Nov-2011  yamt file media.h was added on branch yamt-pagecache on 2012-04-17 00:08:55 +0000
 1.4 09-Aug-2016  kre Revert previous - which itself (incorrectly) reverted the previous
changes, breaking the build.
 1.3 09-Aug-2016  christos More htree writing support (Hrishikesh Goyal GSoC 2016)
 1.2 03-Jun-2016  joerg Install new header and fix rump to include the corresponding source as
well.
 1.1 12-Jun-1998  cgd branches: 1.1.198; 1.1.218;
Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.1.218.1 09-Jul-2016  skrll Sync with HEAD
 1.1.198.1 03-Dec-2017  jdolecek update from HEAD
 1.53 04-May-2025  rillig ext2fs.h: avoid adjacent commas in unsupported features

A question mark still stands out enough to be noticeable.

Noticed by lint.
 1.52 27-Aug-2023  christos branches: 1.52.6;
- fix cgload/cgsave inconsistencies
- add a constant for the rev 0 group descriptor size
 1.51 26-Aug-2023  riastradh ext2fs: Nix trailing whitespace.
 1.50 26-Aug-2023  riastradh ext2fs.h: Restore e2fs_cgload/cgsave for libsa and userland use.

Stop-gap until they can be taught to handle the new version that was
moved to ext2fs_vfsops.c, presumably.
 1.49 25-Aug-2023  christos Support INCOMPAT_64BIT on ext4 (Vladimir 'phcoder' Serbinenko)
 1.48 20-Aug-2016  jdolecek add support for GDT_CSUM AKA uninit_bg feature
 1.47 15-Aug-2016  jdolecek EXT2F_INCOMPAT_FLEX_BG feature actually doesn't require any explicit
code changes, all magic is done by setting the block offsets appropriately
in group descriptors by newfs; add it to the list of supported INCOMPAT flags
 1.46 15-Aug-2016  jdolecek bump link limit to 65000 for files, and add support for EXT2F_ROCOMPAT_DIR_NLINK to make link count unlimited for directories
 1.45 14-Aug-2016  jdolecek switch ext2fs_htree_has_idx() over to EXT2F_HAS_COMPAT_FEATURE() and remove EXT2F_HAS_COMPAT_FEATURE() - this also fixes it for BE machines, as EXT2F_HAS_COMPAT_FEATURE() did extra byte swap; also remove XXX comment about IN_E3INDEX
 1.44 14-Aug-2016  jdolecek add EXT2F_HAS_ROCOMPAT_FEATURE() macro, and change the current EXT2F_HAS_{COMPAT|INCOMPAT}_FEATURE() to take fs as first parameter
 1.43 12-Aug-2016  macallan sprinkle ()s in macros with comparisons, shuts up compiler warnings
 1.42 12-Aug-2016  jdolecek add support for extended attributes in ext2fs for ext3/ext4; read-only for now
 1.41 05-Aug-2016  jdolecek add defines for the missing ext4 feature flags
 1.40 04-Aug-2016  jdolecek rename struct ext2fs_dinode attribute e2di_dacl to correct
e2di_size_high; even Linux ext2 filesystem code actually uses it
unconditionally this way and ext4 code finally also calls it that way
in their struct definition too; if there was any trace of this for other
purpose it's long gone
 1.39 03-Aug-2016  jdolecek support arbitrary ext3/ext4 inode size, add all the new ext4 fields to ext2fs_dinode, and add support for loading the extra inode data
 1.38 24-Jun-2016  christos branches: 1.38.2;
GSoC 2016 (Hrishikesh Goyal): Htree index support from FreeBSD
 1.37 03-Jun-2016  christos Add ext4 extent support from GSoC 2016 (Hrishikesh Goyal), from the FreeBSD
ext2 code.
 1.36 23-Jun-2013  dholland branches: 1.36.10;
Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.35 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.34 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.33 21-Nov-2012  jakllsch Write support for the Ext4 Read-only Compatible Feature "huge_file".

Primarily, this feature extends the inode block count field to 48 bits.
Additionally, this feature allows this field to be represented in file
system block size units rather than DEV_BSIZE units.
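As a rough illustration of the two points above, the C sketch below joins a 32-bit low word and a 16-bit high word into a 48-bit block count, then converts the count to device-block units when it is stored in file system blocks. The function name, its parameters, and the 512-byte device block size are assumptions made for this example; they are not the identifiers used in the ext2fs sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed device block size for this sketch (512 bytes is the traditional value). */
#define SKETCH_DEV_BSIZE 512u

/*
 * Hypothetical helper: build the 48-bit block count from its two halves
 * and return it in SKETCH_DEV_BSIZE units.
 *
 * nblock_lo:   low 32 bits of the count
 * nblock_hi:   high 16 bits of the count
 * in_fs_units: true if the count is stored in file system blocks
 * fs_bsize:    file system block size in bytes (a multiple of 512)
 */
static uint64_t
sketch_huge_file_nblocks(uint32_t nblock_lo, uint16_t nblock_hi,
    bool in_fs_units, uint32_t fs_bsize)
{
	uint64_t n = ((uint64_t)nblock_hi << 32) | nblock_lo;

	assert(fs_bsize >= SKETCH_DEV_BSIZE && fs_bsize % SKETCH_DEV_BSIZE == 0);

	/* Scale file-system-block counts up to device-block counts. */
	if (in_fs_units)
		n *= fs_bsize / SKETCH_DEV_BSIZE;
	return n;
}
```

With a 4096-byte file system block, a stored count of 2 in file system units corresponds to 16 device blocks in this sketch.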
 1.32 21-Nov-2012  jakllsch Add various newer Ext2 superblock feature bits and inode flag bits.
 1.31 19-Nov-2012  jakllsch snprintb EXT2F_ROCOMPAT_SPARSESUPER as such.
 1.30 01-Sep-2012  christos branches: 1.30.2;
really print the incompatible bits.
 1.29 27-Nov-2009  tsutsui branches: 1.29.12;
Add definitions for more reserved inodes.
 1.28 19-Oct-2009  bouyer Remove clauses 3 & 4 from my licence. Many thanks to Soren Jacobsen
for the boring work!
 1.27 12-Sep-2009  tsutsui Migrate from u_intNN_t to uintNN_t.
 1.26 25-Dec-2007  perry branches: 1.26.10;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.25 17-Nov-2007  tsutsui branches: 1.25.2; 1.25.6;
Misc cosmetics.
 1.24 15-Nov-2007  tsutsui Add some definitions for resizefs features.
 1.23 13-Nov-2007  tsutsui - add some more constant definitions
- add some comments
- typo and some cosmetics
 1.22 16-Feb-2006  perry branches: 1.22.38; 1.22.40; 1.22.44; 1.22.46;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.21 29-Jan-2006  dsl branches: 1.21.2; 1.21.4;
Make almost everything #include <sys/bswap.h> instead of <machine/bswap.h>
The bswap.h and endian.h files are all rather incestuous, but I want to
get the constant folding stuff into one place - sys/bswap.h
 1.20 24-Dec-2005  perry branches: 1.20.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.19 11-Dec-2005  christos merge ktrace-lwp.
 1.18 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.17 26-Feb-2005  perry branches: 1.17.4;
nuke trailing whitespace
 1.16 09-Feb-2005  ws Add support for large files (>2GB).
Like Linux, automagically convert old filesystems to use this,
if they are already at revision 1.
For revision 0, just punt (unlike Linux; makes me a bit too nervous.)

There should be an option to fsck_ext2fs to upgrade revision 0 to revision 1.

Reviewed by Manuel (bouyer@).
 1.15 22-Mar-2004  bouyer branches: 1.15.8; 1.15.10;
Fix disclaimer in my copyright. Pointed out by Thomas Klausner.
 1.14 05-Oct-2003  bouyer Remove references to University of California from my copyright notices.
 1.13 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.12 24-Jan-2003  fvdl branches: 1.12.2;
Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.11 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.10 28-Jan-2000  bouyer branches: 1.10.6;
Correct (minor) bogons in filetype option support, and add support
for sparse_super option
 1.9 26-Jan-2000  bouyer First cut at ext2fs rev 1 support (as of mke2fs 1.18): supports the filetype
option read/write and the sparse option read-only.
 1.8 17-Feb-1999  bouyer branches: 1.8.2; 1.8.8; 1.8.14;
Some new fields in the ext2fs superblock, from Tim Shepard.
While I'm there, reformat this file a bit.
 1.7 15-Jan-1999  bouyer Move the bswap functions from libutil to libc (this bumps the
minor of libc and the major of libutil). For little-endian architectures
merge the bnswap() assembly versions with nto* and hton* using symbols
aliasing. Use symbol renaming for the bswap function in this case to avoid
namespace pollution.
Declare bswap* in machine/bswap.h, not machine/endian.h. For little-endian
machines, common code for inline macros goes in machine/byte_swap.h.
Sync libkern with libc.
Adjust #include in kernel sources for machine/bswap.h.
 1.6 30-Sep-1998  bouyer Not time to #include <machine/bswap.h> yet; it will come with the move
of bswap*() from libutil to libc. Sorry!
 1.5 29-Sep-1998  bouyer #include opt_uvm.h only if _KERNEL and !_LKM
Make ext2fs_init() call ufs_init(). It was doing the init by itself,
testing for extern done != 0. This bug was hidden by the fact that
ext2fs_init() is called before ffs_init().
 1.4 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.3 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.2 09-Oct-1997  bouyer Add byte-swapping functions (bswap16, bswap32, bswap64) to libkern.
Only assembly version for i386 bswap16 and bswap32 for now (bswap64 uses
bswap32). Contributions of assembly versions of these are welcome.
Add byte-swapping of ext2fs metadata for big-endian systems.
Tested on i386 and sparc.
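To make the byte-swapping idea concrete, here is a minimal C sketch of reading a little-endian 32-bit on-disk field on a host of either byte order. ext2fs stores its metadata little-endian, so only big-endian hosts need the swap. All names below are hypothetical stand-ins chosen for this example, not the libkern bswap32() or the ext2fs helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse the four bytes of a 32-bit value. */
static uint32_t
sketch_bswap32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
	    ((x & 0x00ff0000u) >> 8) | ((x & 0xff000000u) >> 24);
}

/* Runtime probe: nonzero if the host stores the most significant byte first. */
static int
sketch_host_is_big_endian(void)
{
	const uint16_t probe = 1;

	return *(const unsigned char *)&probe == 0;
}

/*
 * Hypothetical helper: read a little-endian 32-bit field from raw
 * on-disk bytes and return it in host byte order.
 */
static uint32_t
sketch_read_le32(const unsigned char *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));	/* avoids unaligned access */
	return sketch_host_is_big_endian() ? sketch_bswap32(v) : v;
}
```

On a little-endian host such as i386 the read is a plain copy; on a big-endian host such as sparc the same call swaps the bytes, so callers see the same value either way.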
 1.1 11-Jun-1997  bouyer branches: 1.1.4;
The ext2fs layer, based on the ffs/ufs one. Uses a few functions from
sys/ufs/ufs/
 1.1.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.8.14.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.8.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.2.1 01-Feb-2000  he Apply patch (requested by bouyer):
Add support for ext2fs revision 1, with read-only support for
the 'sparse_super' and 'filetype' options. Should fix PR#9088.
 1.10.6.1 11-Dec-2002  thorpej Sync with HEAD.
 1.12.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.12.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.12.2.4 15-Feb-2005  skrll Sync with HEAD.
 1.12.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.12.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.12.2.1 03-Aug-2004  skrll Sync with HEAD
 1.15.10.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.15.10.1 12-Feb-2005  yamt sync with head.
 1.15.8.1 29-Apr-2005  kent sync with -current
 1.17.4.4 21-Jan-2008  yamt sync with head
 1.17.4.3 07-Dec-2007  yamt sync with head
 1.17.4.2 15-Nov-2007  yamt sync with head.
 1.17.4.1 21-Jun-2006  yamt sync with head.
 1.20.2.2 18-Feb-2006  yamt sync with head.
 1.20.2.1 01-Feb-2006  yamt sync with head.
 1.21.4.1 22-Apr-2006  simonb Sync with head.
 1.21.2.1 09-Sep-2006  rpaulo sync with head
 1.22.46.2 18-Feb-2008  mjf Sync with HEAD.
 1.22.46.1 19-Nov-2007  mjf Sync with HEAD.
 1.22.44.2 18-Nov-2007  bouyer Sync with HEAD
 1.22.44.1 13-Nov-2007  bouyer Sync with HEAD
 1.22.40.1 09-Jan-2008  matt sync with HEAD
 1.22.38.2 21-Nov-2007  joerg Sync with HEAD.
 1.22.38.1 14-Nov-2007  joerg Sync with HEAD.
 1.25.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.25.2.1 26-Dec-2007  ad Sync with head.
 1.26.10.2 11-Mar-2010  yamt sync with head
 1.26.10.1 16-Sep-2009  yamt sync with head
 1.29.12.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.29.12.2 16-Jan-2013  yamt sync with (a bit old) head
 1.29.12.1 30-Oct-2012  yamt sync with head
 1.30.2.4 03-Dec-2017  jdolecek update from HEAD
 1.30.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.30.2.2 23-Jun-2013  tls resync from head
 1.30.2.1 25-Feb-2013  tls resync with head
 1.36.10.2 05-Oct-2016  skrll Sync with HEAD
 1.36.10.1 09-Jul-2016  skrll Sync with HEAD
 1.38.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.52.6.1 02-Aug-2025  perseant Sync with HEAD
 1.59 27-Jun-2025  andvar s/quadradically/quadratically/ in comments.
 1.58 14-May-2024  andvar branches: 1.58.2;
fix typos recently committed by msaitoh in a few more places, as well as a few others.
mainly s/contigous/contiguous/ and s/miliseconds/milliseconds/ in comments.
 1.57 13-May-2024  msaitoh s/contigous/contiguous/ in comment.
 1.56 26-Aug-2023  christos fix incorrect test
 1.55 26-Aug-2023  christos Fix metadata_cksum (Vladimir Serbinenko)

Current code always assumes that CG uses crc16. Yet when metadata_cksum is
enabled then it uses truncated crc32c. This patch doesn't implement full
metadata_cksum, just allows volumes with metadata_cksum to be mounted
read-only.
 1.54 26-Aug-2023  riastradh ext2fs: Nix trailing whitespace.
 1.53 25-Aug-2023  christos Support INCOMPAT_64BIT on ext4 (Vladimir 'phcoder' Serbinenko)
 1.52 28-May-2017  hannken Change ext2fs to use vcache_new like we did for ffs:
- Change ext2fs_valloc to return an inode number.
- Make ext2fs_makeinode private to ext2fs_vnops.c and
pass vattr instead of mode.
 1.51 20-Aug-2016  jdolecek modify the comment to note code needs to brele() to have a shot on actually
working
 1.50 20-Aug-2016  jdolecek #if 0 the check for ext2fs_mapsearch() failure (similar to what was done
for ffs counterpart), it actually never fails, it panics instead
 1.49 20-Aug-2016  jdolecek add support for GDT_CSUM AKA uninit_bg feature
 1.48 13-Aug-2016  christos KNF, no functional changes...
 1.47 03-Aug-2016  jdolecek support arbitrary ext3/ext4 inode size, add all the new ext4 fields ext2fs_dinode, and add support for loading the extra inode data
 1.46 28-Mar-2015  maxv branches: 1.46.2;
Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.45 23-Jun-2013  dholland branches: 1.45.10;
fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.44 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.43 21-Nov-2012  jakllsch Write support for the Ext4 Read-only Compatible Feature "huge_file".

Primarily, this feature extends the inode block count field to 48 bits.
Additionally, this feature allows this field to be represented in file
system block size units rather than DEV_BSIZE units.
 1.42 06-Mar-2011  rmind branches: 1.42.4; 1.42.14;
{ffs_nodealloccg,ext2fs_nodealloccg,ext2fs_mapsearch}: use XOR and ffs()
to find free bits in the inode and block bitmaps, instead of the loop.

Obtained from FreeBSD (changes by jhb).
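The XOR-and-ffs() technique named above can be sketched as follows: XORing a bitmap byte with 0xff turns free (zero) bits into one bits, and ffs() then reports the lowest set bit directly instead of testing each bit in a loop. This is a simplified, byte-at-a-time illustration with a hypothetical function name; it is not the code from ffs_nodealloccg() or ext2fs_mapsearch().

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>	/* ffs() */

/*
 * Hypothetical helper: return the index of the first clear (free) bit in
 * a bitmap of nbytes bytes, or -1 if every bit is set.
 * Bit 0 is the least significant bit of map[0].
 */
static int
sketch_first_free_bit(const unsigned char *map, size_t nbytes)
{
	assert(map != NULL);

	for (size_t i = 0; i < nbytes; i++) {
		/* XOR with 0xff turns free (0) bits into 1 bits. */
		int inverted = map[i] ^ 0xff;

		if (inverted != 0) {
			/* ffs() is 1-based; subtract 1 for a 0-based bit index. */
			return (int)(i * 8) + ffs(inverted) - 1;
		}
	}
	return -1;
}
```

The outer loop still walks bytes, but fully allocated bytes are skipped with a single comparison and the bit position within a byte comes from one ffs() call.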
 1.41 19-Oct-2009  bouyer branches: 1.41.4; 1.41.6;
Remove clauses 3 & 4 from my licence. Many thanks to Soren Jacobsen
for the boring work!
 1.40 12-Sep-2009  tsutsui Whitespace nits.
 1.39 07-May-2009  elad Introduce several actions/requests for authorizing file-system related
operations, specifically quota and block allocation from reserved space.

Modify ufs_quotactl() to accommodate passing "mp" earlier by vfs_busy()ing
it a little bit higher.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/04/26/msg004936.html

Note that the umapfs request mentioned in this thread was NOT added as
there is still on-going discussion regarding the proper implementation.
 1.38 11-Jan-2009  christos branches: 1.38.2;
merge christos-time_t
 1.37 23-Nov-2008  mrg add support for 32 bit uid/gid fields in ext2, but only do so
when the revision is > REV0.
 1.36 16-May-2008  hannken branches: 1.36.4; 1.36.6; 1.36.8;
Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.35 08-Oct-2007  ad branches: 1.35.6; 1.35.18; 1.35.20; 1.35.22; 1.35.24; 1.35.26;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.34 04-Jan-2007  elad branches: 1.34.6; 1.34.18; 1.34.20; 1.34.22;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.33 09-Dec-2006  chs several ext2fs fixes provided by Barry Bouwsma:
- set ip->i_e2fs_dtime to time_second, not time_uptime.
- don't allow ipref to go negative
- fs->e2fs.e2fs_icount is a valid inode number, allow it.
 1.32 16-Nov-2006  christos branches: 1.32.2;
__unused removal on arguments; approved by core.
 1.31 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.30 07-Jun-2006  kardel branches: 1.30.6; 1.30.8;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.29 14-May-2006  elad branches: 1.29.2;
integrate kauth.
 1.28 11-Dec-2005  christos branches: 1.28.4; 1.28.6; 1.28.8; 1.28.10; 1.28.12;
merge ktrace-lwp.
 1.27 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.26 30-Aug-2005  xtraeme branches: 1.26.2;
* Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.25 19-Aug-2005  christos 64 bit inode changes.
 1.24 29-May-2005  christos branches: 1.24.2;
- sprinkle const
- avoid shadow variables.
 1.23 26-Feb-2005  perry nuke trailing whitespace
 1.22 22-Mar-2004  bouyer branches: 1.22.8; 1.22.10;
Fix disclaimer in my copyright. Pointed out by Thomas Klausner.
 1.21 05-Oct-2003  bouyer Remove references to University of California from my copyright notices.
 1.20 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.19 29-Jun-2003  fvdl branches: 1.19.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.18 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.17 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.16 24-Apr-2003  gmcgarry More ufs2 merge fall-out. Pass the correct pointer to the dinode
for clearing. Fixes PR#21241.
 1.15 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.14 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.13 08-Nov-2001  lukem add RCSID
 1.12 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.11 24-Aug-2001  wiz branches: 1.11.4;
heirarchy -> hierarchy
 1.10 05-Jul-2001  toshii branches: 1.10.2;
Fix typo. s/extention/extension/
 1.9 28-Jun-2000  mrg branches: 1.9.2;
remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.8 19-May-2000  thorpej NULL != 0
 1.7 30-Mar-2000  augustss Remove register declarations.
 1.6 10-Feb-1999  bouyer branches: 1.6.8; 1.6.14;
Make sure a buffer obtained from bread() is always brelse()'d in case of
error. Closes PR kern/1448 from Wolfgang Solfrank.
 1.5 23-Oct-1998  thorpej For consistency w/ FFS/LFS, define EXT2_DINODE_SIZE, and use it instead
of pointer arithmetic and/or sizeof(struct ext2fs_dinode).
 1.4 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.3 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.2 09-Oct-1997  bouyer Add byte-swapping functions (bswap16, bswap32, bswap64) to libkern.
Only assembly version for i386 bswap16 and bswap32 for now (bswap64 uses
bswap32). Contributions of assembly versions of these are welcome.
Add byte-swapping of ext2fs metadata for big-endian systems.
Tested on i386 and sparc.
 1.1 11-Jun-1997  bouyer branches: 1.1.4;
The ext2fs layer, based on the ffs/ufs one. Uses a few functions from
sys/ufs/ufs/
 1.1.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.6.14.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.6.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.9.2.4 18-Oct-2002  nathanw Catch up to -current.
 1.9.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.9.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.9.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.10.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.10.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.2.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.11.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.19.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.19.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.19.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.19.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.19.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.19.2.2 03-Aug-2004  skrll Sync with HEAD
 1.19.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.22.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.22.8.1 29-Apr-2005  kent sync with -current
 1.24.2.4 27-Oct-2007  yamt sync with head.
 1.24.2.3 26-Feb-2007  yamt sync with head.
 1.24.2.2 30-Dec-2006  yamt sync with head.
 1.24.2.1 21-Jun-2006  yamt sync with head.
 1.26.2.1 20-Oct-2005  yamt adapt ufs.
 1.28.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.28.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.28.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.28.8.2 26-Jun-2006  yamt sync with head.
 1.28.8.1 24-May-2006  yamt sync with head.
 1.28.6.2 01-Jun-2006  kardel Sync with head.
 1.28.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.28.4.1 09-Sep-2006  rpaulo sync with head
 1.29.2.1 19-Jun-2006  chap Sync with head.
 1.30.8.2 10-Dec-2006  yamt sync with head.
 1.30.8.1 22-Oct-2006  yamt sync with head
 1.30.6.2 12-Jan-2007  ad Sync with head.
 1.30.6.1 18-Nov-2006  ad Sync with head.
 1.32.2.1 12-Jan-2007  bouyer Pull up following revision(s) (requested by chs in ticket #346):
sys/ufs/ext2fs/ext2fs_inode.c: revision 1.56
sys/ufs/ext2fs/ext2fs_alloc.c: revision 1.33
several ext2fs fixes provided by Barry Bouwsma:
- set ip->i_e2fs_dtime to time_second, not time_uptime.
- don't allow ipref to go negative
- fs->e2fs.e2fs_icount is a valid inode number, allow it.
 1.34.22.1 14-Oct-2007  yamt sync with head.
 1.34.20.1 06-Nov-2007  matt sync with HEAD
 1.34.18.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.34.6.2 19-Oct-2007  ad ext2fs_vfree: the in-core root inode can have nlinks == 0. Don't try to
free it during reclaim. XXX Needs further investigation.
 1.34.6.1 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.35.26.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.35.24.4 11-Mar-2010  yamt sync with head
 1.35.24.3 16-Sep-2009  yamt sync with head
 1.35.24.2 16-May-2009  yamt sync with head
 1.35.24.1 04-May-2009  yamt sync with head.
 1.35.22.1 18-May-2008  yamt sync with head.
 1.35.20.4 30-Dec-2008  christos fix dev_t printfs
 1.35.20.3 27-Dec-2008  christos merge with head.
 1.35.20.2 01-Nov-2008  christos Sync with head.
 1.35.20.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.35.18.2 17-Jan-2009  mjf Sync with HEAD.
 1.35.18.1 02-Jun-2008  mjf Sync with HEAD.
 1.35.6.2 30-Dec-2007  ad Fix remaining problems with ext2fs on this branch.
 1.35.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.36.8.1 29-Nov-2008  snj Pull up following revision(s) (requested by mrg in ticket #147):
sys/ufs/ext2fs/ext2fs_alloc.c: revision 1.37
sys/ufs/ext2fs/ext2fs_bswap.c: revision 1.14
sys/ufs/ext2fs/ext2fs_dinode.h: revision 1.17
sys/ufs/ext2fs/ext2fs_lookup.c: revision 1.56
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.83
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.140
sys/ufs/ufs/inode.h: revision 1.55
add support for 32 bit uid/gid fields in ext2, but only do so
when the revision is > REV0.
 1.36.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.36.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.38.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.41.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.41.4.1 21-Apr-2011  rmind sync with head
 1.42.14.3 03-Dec-2017  jdolecek update from HEAD
 1.42.14.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.42.14.1 25-Feb-2013  tls resync with head
 1.42.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.42.4.2 23-Jan-2013  yamt sync with head
 1.42.4.1 16-Jan-2013  yamt sync with (a bit old) head
 1.45.10.3 28-Aug-2017  skrll Sync with HEAD
 1.45.10.2 05-Oct-2016  skrll Sync with HEAD
 1.45.10.1 06-Apr-2015  skrll Sync with HEAD
 1.46.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.58.2.1 02-Aug-2025  perseant Sync with HEAD
 1.43 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.42 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
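The truncation hazard described above can be shown with a small C sketch. The functions here are stand-ins written for this example (the sketch_ prefix marks them as hypothetical) and the example assumes a 32-bit unsigned int; they are not the libkern definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a min defined on unsigned int, as uimin is described above. */
static unsigned int
sketch_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* A 64-bit variant that does not truncate, for comparison. */
static uint64_t
sketch_min64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical caller: clamp a 64-bit length to a limit.
 * Passing len to sketch_uimin() converts it to unsigned int first,
 * which silently drops the high 32 bits.
 */
static uint64_t
sketch_clamp_truncating(uint64_t len, unsigned int limit)
{
	assert(sizeof(unsigned int) == 4);	/* assumption of this sketch */
	return sketch_uimin((unsigned int)len, limit);
}
```

For a length of 0x100000001 and a limit of 4096, the truncating version returns 1 while the 64-bit version returns 4096. A function-like min macro of the kind quoted in the message avoids the truncation but evaluates its arguments more than once.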
 1.41 13-Aug-2016  christos branches: 1.41.14; 1.41.16;
KNF, no functional changes...
 1.40 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.39 23-Jun-2013  dholland branches: 1.39.10;
Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.38 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.37 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.36 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.35 21-Nov-2012  jakllsch Write support for the Ext4 Read-only Compatible Feature "huge_file".

Primarily, this feature extends the inode block count field to 48 bits.
Additionally, this feature allows this field to be represented in file
system block size units rather than DEV_BSIZE units.
 1.34 19-Oct-2009  bouyer branches: 1.34.12; 1.34.22;
Remove clauses 3 & 4 from my licence. Many thanks to Soren Jacobsen
for the boring work!
 1.33 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.32 08-Oct-2007  ad branches: 1.32.18; 1.32.20; 1.32.22; 1.32.24;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.31 16-Nov-2006  christos branches: 1.31.8; 1.31.22; 1.31.24; 1.31.26;
__unused removal on arguments; approved by core.
 1.30 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.29 14-May-2006  elad branches: 1.29.8; 1.29.10;
integrate kauth.
 1.28 11-Dec-2005  christos branches: 1.28.4; 1.28.6; 1.28.8; 1.28.10; 1.28.12;
merge ktrace-lwp.
 1.27 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.26 30-Aug-2005  xtraeme branches: 1.26.2;
* Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.25 26-Feb-2005  perry branches: 1.25.4;
nuke trailing whitespace
 1.24 14-Feb-2005  chs fix typo in previous.
 1.23 09-Feb-2005  ws Add support for large files (>2GB).
Like Linux, automagically convert old filesystems to use this,
if they are already at revision 1.
For revision 0, just punt (unlike Linux; makes me a bit too nervous.)

There should be an option to fsck_ext2fs to upgrade revision 0 to revision 1.

Reviewed by Manuel (bouyer@).
 1.22 22-Mar-2004  bouyer branches: 1.22.8; 1.22.10;
Fix disclaimer in my copyright. Pointed out by Thomas Klausner.
 1.21 05-Oct-2003  bouyer Remove references to University of California from my copyright notices.
 1.20 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.19 24-Jan-2003  fvdl branches: 1.19.2;
Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.18 26-Sep-2002  jdolecek use ufs_balloc_range() rather than local (mostly identical, but with some
bugs) ext2fs variant
 1.17 05-May-2002  chs use the correct size when zeroing an array.
 1.16 26-Jan-2002  chs fix an error case.
 1.15 30-Nov-2001  chs pick up changes from ufs_balloc_range().
 1.14 10-Nov-2001  chs track some changes in the ufs code:
update UVM's notion of the file size in *_write() rather than
*_balloc().
 1.13 08-Nov-2001  lukem add RCSID
 1.12 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.11 15-Sep-2001  chs branches: 1.11.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.10 04-Jul-2001  chs branches: 1.10.2; 1.10.4;
in ext2fs_balloc_range(), clear PG_RDONLY on pages which now have backing store.
 1.9 30-May-2001  mrg use _KERNEL_OPT
 1.8 10-Dec-2000  chs branches: 1.8.2;
redo ext2fs_balloc_range(), accounting for differences between ext2fs and ffs.
 1.7 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.6 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.5 28-May-2000  mycroft Pull in indirect block unwind code from FFS.
 1.4 30-Mar-2000  augustss branches: 1.4.2;
Remove register declarations.
 1.3 01-Mar-1998  fvdl branches: 1.3.10; 1.3.14; 1.3.20;
Merge with Lite2 + local changes
 1.2 09-Oct-1997  bouyer Add byte-swapping functions (bswap16, bswap32, bswap64) to libkern.
Only assembly version for i386 bswap16 and bswap32 for now (bswap64 uses
bswap32). Contributions of assembly versions of these are welcome.
Add byte-swapping of ext2fs metadata for big-endian systems.
Tested on i386 and sparc.
 1.1 11-Jun-1997  bouyer branches: 1.1.4;
The ext2fs layer, based on the ffs/ufs one. Uses a few functions from
sys/ufs/ufs/
 1.1.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.3.20.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.3.14.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.3.14.2 08-Dec-2000  bouyer Sync with HEAD.
 1.3.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.3.10.1 06-Aug-1999  chs UBCify.
 1.4.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.8.2.8 18-Oct-2002  nathanw Catch up to -current.
 1.8.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.8.2.6 28-Feb-2002  nathanw Catch up to -current.
 1.8.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.8.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.8.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.8.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.8.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.10.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.10.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.10.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.10.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.10.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.11.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.19.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.19.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.19.2.4 15-Feb-2005  skrll Sync with HEAD.
 1.19.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.19.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.19.2.1 03-Aug-2004  skrll Sync with HEAD
 1.22.10.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.22.10.1 12-Feb-2005  yamt sync with head.
 1.22.8.1 29-Apr-2005  kent sync with -current
 1.25.4.3 27-Oct-2007  yamt sync with head.
 1.25.4.2 30-Dec-2006  yamt sync with head.
 1.25.4.1 21-Jun-2006  yamt sync with head.
 1.26.2.1 20-Oct-2005  yamt adapt ufs.
 1.28.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.28.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.28.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.28.8.1 24-May-2006  yamt sync with head.
 1.28.6.1 01-Jun-2006  kardel Sync with head.
 1.28.4.1 09-Sep-2006  rpaulo sync with head
 1.29.10.2 10-Dec-2006  yamt sync with head.
 1.29.10.1 22-Oct-2006  yamt sync with head
 1.29.8.1 18-Nov-2006  ad Sync with head.
 1.31.26.1 14-Oct-2007  yamt sync with head.
 1.31.24.1 06-Nov-2007  matt sync with HEAD
 1.31.22.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.31.8.2 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.31.8.1 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.32.24.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.32.22.2 11-Mar-2010  yamt sync with head
 1.32.22.1 04-May-2009  yamt sync with head.
 1.32.20.1 18-May-2008  yamt sync with head.
 1.32.18.1 02-Jun-2008  mjf Sync with HEAD.
 1.34.22.3 03-Dec-2017  jdolecek update from HEAD
 1.34.22.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.34.22.1 25-Feb-2013  tls resync with head
 1.34.12.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.34.12.2 23-Jan-2013  yamt sync with head
 1.34.12.1 16-Jan-2013  yamt sync with (a bit old) head
 1.39.10.2 05-Oct-2016  skrll Sync with HEAD
 1.39.10.1 06-Apr-2015  skrll Sync with HEAD
 1.41.16.1 10-Jun-2019  christos Sync with HEAD
 1.41.14.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.31 26-Aug-2023  riastradh ext2fs: Nix trailing whitespace.
 1.30 14-Aug-2016  jdolecek whitespace cleanup
 1.29 14-Aug-2016  jdolecek check correct inode extents flag - IN_E4EXTENTS is defined as 0x8000, correct flag EXT2_EXTENTS is 0x80000
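To illustrate the flag mix-up fixed in revision 1.29 above, here is a minimal sketch. The two constants carry the values quoted in the log entry; the helper names are hypothetical and do not come from the ext2fs sources. A flags word with only 0x80000 set fails a test against 0x8000, so an inode using extents would not be recognised.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted in the log entry for revision 1.29. */
#define IN_E4EXTENTS 0x8000u   /* the flag that was tested by mistake */
#define EXT2_EXTENTS 0x80000u  /* the flag that should be tested */

/* Hypothetical helper: the corrected check. */
static bool
sketch_uses_extents(uint32_t inode_flags)
{
	return (inode_flags & EXT2_EXTENTS) != 0;
}

/* Hypothetical helper: the mistaken check, kept only for comparison. */
static bool
sketch_uses_extents_buggy(uint32_t inode_flags)
{
	return (inode_flags & IN_E4EXTENTS) != 0;
}
```

The two masks differ by one hex digit, which is why the mistake is easy to miss by eye.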
 1.28 13-Aug-2016  christos KNF, no functional changes...
 1.27 03-Jun-2016  christos Add ext4 extent support from GSoC 2016 (Hrishikesh Goyal), from the FreeBSD
ext2 code.
 1.26 22-Jan-2013  dholland branches: 1.26.14;
Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.25 19-Oct-2009  bouyer branches: 1.25.12; 1.25.22;
Remove clauses 3 & 4 from my licence. Many thanks to Soren Jacobsen
for the boring work!
 1.24 27-Mar-2008  ad branches: 1.24.4;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.23 02-Jan-2008  ad branches: 1.23.6;
Merge vmlocking2 to head.
 1.22 08-Oct-2007  ad branches: 1.22.4; 1.22.6; 1.22.10;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.21 11-Dec-2005  christos branches: 1.21.30; 1.21.44; 1.21.46; 1.21.48;
merge ktrace-lwp.
 1.20 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.19 24-Mar-2005  bouyer branches: 1.19.2;
getblk() can return NULL if we are the pagedaemon. Check for this.
 1.18 26-Feb-2005  perry branches: 1.18.2;
nuke trailing whitespace
 1.17 15-Dec-2004  mycroft branches: 1.17.2; 1.17.4;
Remove some unnecessary (int32_t) casts that would cause us to screw up the
top bit in block addresses.

Also, change some daddr_t->int32_t casts (mostly as arguments to ufs_rw32(),
where they would get promoted anyway) to u_int32_t.
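The top-bit problem mentioned in revision 1.17 above can be sketched as follows, assuming a 32-bit on-disk block number that is widened to a 64-bit signed address type (as daddr_t became in revision 1.10). The function names are hypothetical. Going through int32_t sign-extends bit 31 on common two's-complement platforms, so block numbers at or above 2^31 turn negative.

```c
#include <stdint.h>

/*
 * Widening through a signed 32-bit type: the conversion to int32_t is
 * implementation-defined for values above INT32_MAX, and on common
 * platforms it yields a negative number that is then sign-extended.
 */
static int64_t
sketch_widen_signed(uint32_t disk_blkno)
{
	return (int64_t)(int32_t)disk_blkno;
}

/* Widening directly from the unsigned type keeps the value intact. */
static int64_t
sketch_widen_unsigned(uint32_t disk_blkno)
{
	return (int64_t)disk_blkno;
}
```

For small block numbers both versions agree, so the difference only shows up on large volumes.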
 1.16 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.15 22-Mar-2004  bouyer branches: 1.15.2; 1.15.4;
Fix disclaimer in my copyright. Pointed out by Thomas Klausner.
 1.14 25-Jan-2004  hannken Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.13 05-Oct-2003  bouyer Remove references to University of California from my copyright notices.
 1.12 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.11 18-May-2003  yamt branches: 1.11.2;
make is_sequential a callback in order to achieve better lfs write clustering.

since lfs always rewrites blocks into the new segment,
the current on-disk location of a block doesn't affect write clustering.

ok'ed by Konrad Schroder.
 1.10 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.9 10-Nov-2001  chs branches: 1.9.10;
update to track ufs_bmap.c:
don't call ufs_getlbns() for direct blocks.
 1.8 08-Nov-2001  lukem add RCSID
 1.7 06-Nov-2001  simonb Remove some variables that are set but never used.
 1.6 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.5 30-Mar-2000  augustss branches: 1.5.6; 1.5.10; 1.5.14;
Remove register declarations.
 1.4 01-Mar-1998  fvdl branches: 1.4.14;
Merge with Lite2 + local changes
 1.3 09-Oct-1997  bouyer Add byte-swapping functions (bswap16, bswap32, bswap64) to libkern.
Only assembly version for i386 bswap16 and bswap32 for now (bswap64 uses
bswap32). Contributions of assembly versions of these are welcome.
Add byte-swapping of ext2fs metadata for big-endian systems.
Tested on i386 and sparc.
 1.2 24-Jul-1997  bouyer branches: 1.2.2;
ufs_getlbns needs an array of NIADDR+1 struct indir's, and not NIADDR.
This fixes a panic due to stack corruption when reading large files.
 1.1 11-Jun-1997  bouyer The ext2fs layer, based on the ffs/ufs one. Uses a few functions from
sys/ufs/ufs/
 1.2.2.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.4.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.5.14.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.5.10.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.5.6.4 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.5.6.3 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.5.6.2 14-Nov-2001  nathanw Catch up to -current.
 1.5.6.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.9.10.1 06-Apr-2005  tron Pull up revision 1.19 (requested by bouyer in ticket #5728):
getblk() can return NULL if we are the pagedaemon. Check for this.
 1.11.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.11.2.7 01-Apr-2005  skrll Sync with HEAD.
 1.11.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.11.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.11.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.11.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.11.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.11.2.1 03-Aug-2004  skrll Sync with HEAD
 1.15.4.1 10-May-2005  riz Pull up revision 1.19 (requested by bouyer in ticket #1355):
getblk() can return NULL if we are the pagedaemon. Check for this.
 1.15.2.1 11-May-2005  riz Pull up revision 1.19 (requested by bouyer in ticket #1355):
getblk() can return NULL if we are the pagedaemon. Check for this.
 1.17.4.2 26-Mar-2005  yamt sync with head.
 1.17.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.17.2.1 29-Apr-2005  kent sync with -current
 1.18.2.1 27-Mar-2005  tron Pull up revision 1.19 (requested by bouyer in ticket #58):
getblk() can return NULL if we are the pagedaemon. Check for this.
 1.19.2.3 21-Jan-2008  yamt sync with head
 1.19.2.2 27-Oct-2007  yamt sync with head.
 1.19.2.1 21-Jun-2006  yamt sync with head.
 1.21.48.1 14-Oct-2007  yamt sync with head.
 1.21.46.2 09-Jan-2008  matt sync with HEAD
 1.21.46.1 06-Nov-2007  matt sync with HEAD
 1.21.44.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.21.30.2 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.21.30.1 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.22.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.22.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.22.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.23.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.24.4.1 11-Mar-2010  yamt sync with head
 1.25.22.2 03-Dec-2017  jdolecek update from HEAD
 1.25.22.1 25-Feb-2013  tls resync with head
 1.25.12.1 23-Jan-2013  yamt sync with head
 1.26.14.2 05-Oct-2016  skrll Sync with HEAD
 1.26.14.1 09-Jul-2016  skrll Sync with HEAD
 1.25 26-Aug-2023  riastradh ext2fs: Nix trailing whitespace.
 1.24 20-Aug-2016  jdolecek add support for GDT_CSUM AKA uninit_bg feature
 1.23 15-Aug-2016  jdolecek adjust ext2fs_makeinode() so that the direnter is optional, use the function (with the direnter off) in ext2fs_mkdir() instead of the code copy; adjust ext2fs_makeinode() to initialize extra_isize and set creation time, if supported by the filesystem
 1.22 04-Aug-2016  jdolecek rename struct ext2fs_dinode attribute e2di_dacl to correct
e2di_size_high; even Linux ext2 filesystem code actually uses it
unconditionally this way and ext4 code finally also calls it that way
in their struct definition too; if there was any trace of this for other
purpose it's long gone
 1.21 03-Aug-2016  jdolecek support arbitrary ext3/ext4 inode size, add all the new ext4 fields to ext2fs_dinode, and add support for loading the extra inode data
 1.20 02-Aug-2016  jdolecek do not bswap fragment address, support in ext* for them was never actually implemented in linux kernels
 1.19 22-Jan-2013  dholland branches: 1.19.14; 1.19.18;
Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.18 19-Nov-2012  jakllsch - Add e2di_version, e2di_nblock_high, e2di_facl_high fields to ext2fs_dinode.

- Update i_e2fs_ aliases to match.

- ext2fs_bswap support for these ext2fs_dinode fields.

(e2di_version and e2di_facl_high replace previously reserved fields.
e2di_nblock_high was formerly e2di_nfrag and e2di_fsize, however these
are currently defined in e2fsprogs as only being relevant for HURD.)
 1.17 18-Nov-2012  jakllsch correct comment to match code
 1.16 19-Oct-2009  bouyer branches: 1.16.12; 1.16.22;
Remove clauses 3 & 4 from my licence. Many thanks to Soren Jacobsen
for the boring work!
 1.15 14-Apr-2009  lukem fix -Wsign-compare issue on bigendian platforms
 1.14 23-Nov-2008  mrg branches: 1.14.4;
add support for 32 bit uid/gid fields in ext2, but only do so
when the revision is > REV0.
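The revision-gated 32-bit id handling described in revision 1.14 above can be sketched like this. It assumes the id is stored as two 16-bit halves and that revision 0 is numbered 0; the function and parameter names are hypothetical, not the ones used in the ext2fs sources.

```c
#include <stdint.h>

#define SKETCH_E2FS_REV0 0u  /* assumed numeric value of the original revision */

/*
 * Combine the low and high 16-bit halves of a uid or gid.
 * On a revision 0 file system the high half is ignored.
 */
static uint32_t
sketch_e2fs_id(uint16_t id_low, uint16_t id_high, uint32_t fs_rev)
{
	if (fs_rev > SKETCH_E2FS_REV0)
		return ((uint32_t)id_high << 16) | id_low;
	return id_low;
}
```

Ignoring the high half on revision 0 avoids interpreting whatever older tools left in that field as part of the id.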
 1.13 17-Nov-2007  tsutsui branches: 1.13.14; 1.13.18; 1.13.24; 1.13.26; 1.13.28;
Some KNF and cosmetics.
 1.12 17-Nov-2007  tsutsui Also bswap recently added e2fs_reserved_ngdb in e2fs_sb_bswap().
 1.11 11-Dec-2005  christos branches: 1.11.44; 1.11.46; 1.11.50; 1.11.52;
merge ktrace-lwp.
 1.10 30-Oct-2005  simonb Only include <sys/systm.h> if _KERNEL is defined.
 1.9 30-Aug-2005  xtraeme branches: 1.9.2;
* Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.8 05-Oct-2003  bouyer branches: 1.8.16;
Remove references to University of California from my copyright notices.
 1.7 08-Nov-2001  lukem branches: 1.7.16;
add RCSID
 1.6 24-Jul-2000  mycroft branches: 1.6.2; 1.6.6; 1.6.10;
Need string.h for memset() prototype.
 1.5 15-May-2000  bouyer branches: 1.5.4;
Sync copyright notice.
 1.4 28-Jan-2000  bouyer Correct (minor) bogons in filetype option support, and add support
for sparse_super option
 1.3 26-Jan-2000  bouyer First cut at ext2fs rev 1 support (as of mke2fs 1.18): supports the filetype
option read/write and the sparse option read-only.
 1.2 09-Aug-1998  perry branches: 1.2.6; 1.2.12;
bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.1 09-Oct-1997  bouyer branches: 1.1.2;
Add byte-swapping functions (bswap16, bswap32, bswap64) to libkern.
Only assembly version for i386 bswap16 and bswap32 for now (bswap64 uses
bswap32). Contributions of assembly versions of these are welcome.
Add byte-swapping of ext2fs metadata for big-endian systems.
Tested on i386 and sparc.
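A portable C sketch of the three byte-swapping operations named in revision 1.1 above, with the 64-bit swap built from two 32-bit swaps as the log describes. The sketch_ prefix marks these as illustrative names; they are not the libkern implementations, whose exact form is not shown in this log.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static uint16_t
sketch_bswap16(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t
sketch_bswap32(uint32_t x)
{
	return (x << 24) | ((x & 0x0000ff00u) << 8) |
	    ((x & 0x00ff0000u) >> 8) | (x >> 24);
}

/* Reverse eight bytes by swapping each 32-bit half and exchanging them. */
static uint64_t
sketch_bswap64(uint64_t x)
{
	uint32_t lo = (uint32_t)x;
	uint32_t hi = (uint32_t)(x >> 32);

	return ((uint64_t)sketch_bswap32(lo) << 32) | sketch_bswap32(hi);
}
```

ext2fs metadata is stored little-endian on disk, so a big-endian host would apply swaps of this kind when reading and writing each field.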
 1.1.2.2 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.1.2.1 09-Oct-1997  thorpej file ext2fs_bswap.c was added on branch marc-pcmcia on 1997-10-14 16:06:14 +0000
 1.2.12.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.2.6.1 01-Feb-2000  he Apply patch (requested by bouyer):
Add support for ext2fs revision 1, with read-only support for
the 'sparse_super' and 'filetype' options. Should fix PR#9088.
 1.5.4.1 26-Jul-2000  mycroft Approved by thorpej:
Fix compilation error in fsck_ext2fs on big-endian systems, due to missing
prototypes.

syssrc/sys/ufs/ext2fs/ext2fs_bswap.c 1.5 -> 1.6
 1.6.10.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.6.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.6.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.7.16.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.7.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.7.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.7.16.1 03-Aug-2004  skrll Sync with HEAD
 1.8.16.2 07-Dec-2007  yamt sync with head
 1.8.16.1 21-Jun-2006  yamt sync with head.
 1.9.2.1 02-Nov-2005  yamt sync with head.
 1.11.52.1 19-Nov-2007  mjf Sync with HEAD.
 1.11.50.1 18-Nov-2007  bouyer Sync with HEAD
 1.11.46.1 09-Jan-2008  matt sync with HEAD
 1.11.44.1 21-Nov-2007  joerg Sync with HEAD.
 1.13.28.1 29-Nov-2008  snj Pull up following revision(s) (requested by mrg in ticket #147):
sys/ufs/ext2fs/ext2fs_alloc.c: revision 1.37
sys/ufs/ext2fs/ext2fs_bswap.c: revision 1.14
sys/ufs/ext2fs/ext2fs_dinode.h: revision 1.17
sys/ufs/ext2fs/ext2fs_lookup.c: revision 1.56
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.83
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.140
sys/ufs/ufs/inode.h: revision 1.55
add support for 32 bit uid/gid fields in ext2, but only do so
when the revision is > REV0.
 1.13.26.2 28-Apr-2009  skrll Sync with HEAD.
 1.13.26.1 19-Jan-2009  skrll Sync with HEAD.
 1.13.24.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.13.18.2 11-Mar-2010  yamt sync with head
 1.13.18.1 04-May-2009  yamt sync with head.
 1.13.14.1 17-Jan-2009  mjf Sync with HEAD.
 1.14.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.16.22.3 03-Dec-2017  jdolecek update from HEAD
 1.16.22.2 25-Feb-2013  tls resync with head
 1.16.22.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.16.12.2 23-Jan-2013  yamt sync with head
 1.16.12.1 16-Jan-2013  yamt sync with (a bit old) head
 1.19.18.1 06-Aug-2016  pgoyette Sync with HEAD
 1.19.14.1 05-Oct-2016  skrll Sync with HEAD
 1.37 13-Jan-2017  christos Fix unsigned
 1.36 12-Aug-2016  jdolecek add support for extended attributes in ext2fs for ext3/ext4; read-only for now
 1.35 06-Aug-2016  jdolecek some more inode flags
 1.34 04-Aug-2016  jdolecek make E2MAXSYMLINKLEN just alias for EXT2_MAXSYMLINKLEN, they are the same
 1.33 04-Aug-2016  jdolecek move i_e2fs_* defines from ufs/inode.h to ext2fs/ext2fs_dinode.h, where they belong; they don't seem to be used anywhere other than ext2fs code any more
 1.32 04-Aug-2016  jdolecek rename struct ext2fs_dinode attribute e2di_dacl to correct
e2di_size_high; even Linux ext2 filesystem code actually uses it
unconditionally this way and ext4 code finally also calls it that way
in their struct definition too; if there was any trace of this for other
purpose it's long gone
 1.31 04-Aug-2016  nonaka include stddef.h for offsetof.

fix newfs_ext2fs build failure on evbppc.
 1.30 04-Aug-2016  nonaka pass isize to e2fs_i_bswap() if BYTE_ORDER != LITTLE_ENDIAN.
 1.29 03-Aug-2016  jdolecek get and set expanded timestamp if the inode contains the extra information, add support for create time
 1.28 03-Aug-2016  jdolecek support arbitrary ext3/ext4 inode size, add all the new ext4 fields to ext2fs_dinode, and add support for loading the extra inode data
 1.27 02-Aug-2016  jdolecek adjust the comments for the on-disk ext2fs inode to indicate which of the ext* versions each field was implemented for in the linux kernel; makes it a bit easier to locate

split e2di_linux_reserved3 into e2di_extra_isize and e2di_checksum_high, tag as ext4
 1.26 22-Jan-2013  dholland branches: 1.26.14; 1.26.18;
Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.25 21-Nov-2012  jakllsch Add various newer Ext2 superblock feature bits and inode flag bits.
 1.24 19-Nov-2012  jakllsch - Add e2di_version, e2di_nblock_high, e2di_facl_high fields to ext2fs_dinode.

- Update i_e2fs_ aliases to match.

- ext2fs_bswap support for these ext2fs_dinode fields.

(e2di_version and e2di_facl_high replace previously reserved fields.
e2di_nblock_high was formerly e2di_nfrag and e2di_fsize, however these
are currently defined in e2fsprogs as only being relevant for HURD.)
 1.23 18-Nov-2012  jakllsch stylistic adjustment in comments
 1.22 27-Nov-2009  tsutsui branches: 1.22.12; 1.22.22;
Add definitions for more reserved inodes.
 1.21 19-Oct-2009  bouyer Remove clauses 3 & 4 from my licence. Many thanks to Soren Jacobsen
for the boring work!
 1.20 12-Sep-2009  tsutsui Migrate from u_intNN_t to uintNN_t.
 1.19 02-Mar-2009  tsutsui Don't use e2fs_inode_size in superblock on E2FS_REV0 file system.
 1.18 01-Mar-2009  christos PR/40936: Frederik Sausmikat: ext2fs: add support for inodes > 128 bytes
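Revisions 1.18 and 1.19 above concern inodes larger than 128 bytes and the rule that the superblock's inode size field is not used on a revision 0 file system. A minimal sketch of that selection, and of locating an inode by size rather than by indexing a struct array, might look like this. All names are hypothetical, and revision 0 is assumed to be numbered 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_REV0_DINODE_SIZE 128u  /* fixed inode size on revision 0 */

/* Pick the on-disk inode size: fixed for revision 0, else from the superblock. */
static uint32_t
sketch_dinode_size(uint32_t fs_rev, uint16_t sb_inode_size)
{
	return fs_rev == 0 ? SKETCH_REV0_DINODE_SIZE : sb_inode_size;
}

/* Byte offset of the n-th inode inside one inode-table block. */
static size_t
sketch_inode_offset(uint32_t index_in_block, uint32_t inode_size)
{
	assert(inode_size != 0);
	return (size_t)index_in_block * inode_size;
}
```

Computing the offset from the runtime size matters because indexing a struct array always steps by the compiled-in struct size, which no longer matches the disk once inodes grow beyond it.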
 1.17 23-Nov-2008  mrg branches: 1.17.4;
add support for 32 bit uid/gid fields in ext2, but only do so
when the revision is > REV0.
 1.16 17-Nov-2007  tsutsui branches: 1.16.14; 1.16.18; 1.16.24; 1.16.26; 1.16.28;
Misc cosmetics.
 1.15 15-Nov-2007  tsutsui Add some definitions for resizefs features.
 1.14 11-Dec-2005  christos branches: 1.14.44; 1.14.46; 1.14.50; 1.14.52;
merge ktrace-lwp.
 1.13 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.12 26-Feb-2005  perry branches: 1.12.4;
nuke trailing whitespace
 1.11 22-Mar-2004  bouyer branches: 1.11.8; 1.11.10;
Fix disclaimer in my copyright. Pointed out by Thomas Klausner.
 1.10 05-Oct-2003  bouyer Remove references to University of California from my copyright notices.
 1.9 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.8 06-Jan-2003  wiz branches: 1.8.2;
writable, not writeable.
 1.7 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.6 26-Jan-2000  bouyer branches: 1.6.6;
First cut at ext2fs rev 1 support (as of mke2fs 1.18): supports the filetype
option read/write and the sparse option read-only.
 1.5 23-Oct-1998  thorpej branches: 1.5.12;
For consistency w/ FFS/LFS, define EXT2_DINODE_SIZE, and use it instead
of pointer arithmetic and/or sizeof(struct ext2fs_dinode).
 1.4 13-Sep-1998  christos Fix copyright '\t' -> ' '
 1.3 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.2 09-Oct-1997  bouyer Add byte-swapping functions (bswap16, bswap32, bswap64) to libkern.
Only assembly version for i386 bswap16 and bswap32 for now (bswap64 uses
bswap32). Contributions of assembly versions of these are welcome.
Add byte-swapping of ext2fs metadata for big-endian systems.
Tested on i386 and sparc.
 1.1 11-Jun-1997  bouyer branches: 1.1.4;
The ext2fs layer, based on the ffs/ufs one. Uses a few functions from
sys/ufs/ufs/
 1.1.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.5.12.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.6.6.2 07-Jan-2003  thorpej Sync with HEAD.
 1.6.6.1 11-Dec-2002  thorpej Sync with HEAD.
 1.8.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.8.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.8.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.8.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.8.2.1 03-Aug-2004  skrll Sync with HEAD
 1.11.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.11.8.1 29-Apr-2005  kent sync with -current
 1.12.4.2 07-Dec-2007  yamt sync with head
 1.12.4.1 21-Jun-2006  yamt sync with head.
 1.14.52.1 19-Nov-2007  mjf Sync with HEAD.
 1.14.50.1 18-Nov-2007  bouyer Sync with HEAD
 1.14.46.1 09-Jan-2008  matt sync with HEAD
 1.14.44.1 21-Nov-2007  joerg Sync with HEAD.
 1.16.28.2 16-Jan-2011  bouyer Pull up following revision(s) (requested by tsutsui in ticket #1486):
sbin/fsck_ext2fs/setup.c: revision 1.26
sbin/newfs_ext2fs/mke2fs.c: revision 1.10
sbin/newfs_ext2fs/mke2fs.c: revision 1.11
sbin/newfs_ext2fs/mke2fs.c: revision 1.12
sbin/fsck_ext2fs/inode.c: revision 1.24
sys/lib/libsa/ext2fs.c: revision 1.6
sbin/newfs_ext2fs/extern.h: revision 1.3
sbin/fsck_ext2fs/inode.c: revision 1.25
sys/lib/libsa/ext2fs.c: revision 1.7
sbin/fsck_ext2fs/inode.c: revision 1.26
sys/ufs/ext2fs/ext2fs_inode.c: revision 1.68
sbin/fsck_ext2fs/inode.c: revision 1.27
sbin/fsck_ext2fs/inode.c: revision 1.28
sys/ufs/ext2fs/ext2fs_dinode.h: revision 1.18
sys/ufs/ext2fs/ext2fs_dinode.h: revision 1.19
sbin/newfs_ext2fs/newfs_ext2fs.c: revision 1.5
sbin/newfs_ext2fs/newfs_ext2fs.8: revision 1.2
sbin/newfs_ext2fs/newfs_ext2fs.c: revision 1.6
sbin/newfs_ext2fs/newfs_ext2fs.8: revision 1.3
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.142
sbin/newfs_ext2fs/newfs_ext2fs.c: revision 1.7
sbin/newfs_ext2fs/newfs_ext2fs.8: revision 1.4
sbin/newfs_ext2fs/newfs_ext2fs.c: revision 1.8
PR/40936: Frederik Sausmikat: ext2fs: add support for inodes > 128 bytes
Support variable inode sizes.
catch up with variable inode size.
Don't use e2fs_inode_size in superblock on E2FS_REV0 file system.
- accept only EXT2_REV0_DINODE_SIZE inodesize on -O 0
- use inodesize to get offset of inode, not struct ext2fs_dinode array
Replace a magic number with a new EXT2_REV0_DINODE_SIZE macro.
Use EXT2_DINODE_SIZE() to get offset of inode, not struct ext2fs_dinode array.
Fix botched logic in inodesize check.
Use inodesize to get offset of inode in one more place.
- add a sanity check for e2fs_inode_size in readsb()
- use EXT2_DINODE_SIZE() rather than sizeof(struct ext2fs_dinode) or
struct ext2fs_dinode array/pointer to see e2fs_ipb and inode offsets
Sort options.
New sentence, new line.
Sort options in usage.
- unsigned -> unsigned int
- remove unnecessary casts from malloc(3) and free(3)
- fix a bogus indent
Use "size > INT32_MAX" rather than "size >= 0x80000000U" to check 2GB limit.
Add missed byteswap ops against ext2fs_dinode members.
Handle 32 bit uid field on E2FS_REV1.
 1.16.28.1 29-Nov-2008  snj Pull up following revision(s) (requested by mrg in ticket #147):
sys/ufs/ext2fs/ext2fs_alloc.c: revision 1.37
sys/ufs/ext2fs/ext2fs_bswap.c: revision 1.14
sys/ufs/ext2fs/ext2fs_dinode.h: revision 1.17
sys/ufs/ext2fs/ext2fs_lookup.c: revision 1.56
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.83
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.140
sys/ufs/ufs/inode.h: revision 1.55
add support for 32 bit uid/gid fields in ext2, but only do so
when the revision is > REV0.
 1.16.26.2 03-Mar-2009  skrll Sync with HEAD.
 1.16.26.1 19-Jan-2009  skrll Sync with HEAD.
 1.16.24.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.16.18.3 11-Mar-2010  yamt sync with head
 1.16.18.2 16-Sep-2009  yamt sync with head
 1.16.18.1 04-May-2009  yamt sync with head.
 1.16.14.1 17-Jan-2009  mjf Sync with HEAD.
 1.17.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.22.22.3 03-Dec-2017  jdolecek update from HEAD
 1.22.22.2 25-Feb-2013  tls resync with head
 1.22.22.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.22.12.2 23-Jan-2013  yamt sync with head
 1.22.12.1 16-Jan-2013  yamt sync with (a bit old) head
 1.26.18.2 20-Mar-2017  pgoyette Sync with HEAD
 1.26.18.1 06-Aug-2016  pgoyette Sync with HEAD
 1.26.14.2 05-Feb-2017  skrll Sync with HEAD
 1.26.14.1 05-Oct-2016  skrll Sync with HEAD
 1.23 10-Mar-2024  christos PR/58018: Damir Holovati: ext2fs readdir (d_type conversion error)
 1.22 07-Aug-2016  kre branches: 1.22.20; 1.22.46;

If using constants from dirent.h it ought to be included.
Hopefully fixes i386 build.
 1.21 06-Aug-2016  jdolecek actually pass the d_type from the on-disk directory entry to the lookup results
 1.20 24-Jun-2016  christos GSoC 2016 (Hrishikesh Goyal): Htree index support from FreeBSD
 1.19 09-May-2012  riastradh branches: 1.19.2; 1.19.16;
Adapt ffs, lfs, and ext2fs to use genfs_rename.

ok dholland, rmind
 1.18 19-Oct-2009  bouyer branches: 1.18.12; 1.18.16;
Remove clauses 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the boring work!
 1.17 12-Sep-2009  tsutsui Use proper macro, some KNF, fix typo.
 1.16 12-Sep-2009  tsutsui Migrate from u_intNN_t to uintNN_t.
 1.15 25-Dec-2007  perry branches: 1.15.10;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.14 17-Nov-2007  tsutsui branches: 1.14.2; 1.14.6;
Misc cosmetics.
 1.13 16-Feb-2006  perry branches: 1.13.38; 1.13.40; 1.13.44; 1.13.46;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.12 24-Dec-2005  perry branches: 1.12.2; 1.12.4; 1.12.6;
__inline__ -> inline
 1.11 11-Dec-2005  christos merge ktrace-lwp.
 1.10 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.9 26-Feb-2005  perry branches: 1.9.4;
nuke trailing whitespace
 1.8 22-Mar-2004  bouyer branches: 1.8.8; 1.8.10;
Fix disclaimer in my copyright. Pointed out by Thomas Klausner.
 1.7 05-Oct-2003  bouyer Remove references to University of California from my copyright notices.
 1.6 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.5 01-Dec-2002  matt branches: 1.5.6;
Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.4 28-Jan-2000  bouyer branches: 1.4.6;
Correct (minor) bogons in filetype option support, and add support
for sparse_super option
 1.3 26-Jan-2000  bouyer First cut at ext2fs rev 1 support (as of mke2fs 1.18): supports the filetype
option read/write and the sparse option read-only.
 1.2 13-Sep-1998  christos branches: 1.2.12;
Fix copyright '\t' -> ' '
 1.1 11-Jun-1997  bouyer The ext2fs layer, based on the ffs/ufs one. Uses a few functions from
sys/ufs/ufs/
 1.2.12.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.4.6.1 11-Dec-2002  thorpej Sync with HEAD.
 1.5.6.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.5.6.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.5.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.5.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.5.6.1 03-Aug-2004  skrll Sync with HEAD
 1.8.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.8.8.1 29-Apr-2005  kent sync with -current
 1.9.4.3 21-Jan-2008  yamt sync with head
 1.9.4.2 07-Dec-2007  yamt sync with head
 1.9.4.1 21-Jun-2006  yamt sync with head.
 1.12.6.1 22-Apr-2006  simonb Sync with head.
 1.12.4.1 09-Sep-2006  rpaulo sync with head
 1.12.2.1 18-Feb-2006  yamt sync with head.
 1.13.46.2 18-Feb-2008  mjf Sync with HEAD.
 1.13.46.1 19-Nov-2007  mjf Sync with HEAD.
 1.13.44.1 18-Nov-2007  bouyer Sync with HEAD
 1.13.40.1 09-Jan-2008  matt sync with HEAD
 1.13.38.1 21-Nov-2007  joerg Sync with HEAD.
 1.14.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.14.2.1 26-Dec-2007  ad Sync with head.
 1.15.10.2 11-Mar-2010  yamt sync with head
 1.15.10.1 16-Sep-2009  yamt sync with head
 1.18.16.1 02-Jun-2012  mrg sync to latest -current.
 1.18.12.1 23-May-2012  yamt sync with head.
 1.19.16.2 05-Oct-2016  skrll Sync with HEAD
 1.19.16.1 09-Jul-2016  skrll Sync with HEAD
 1.19.2.1 03-Dec-2017  jdolecek update from HEAD
 1.22.46.1 23-Aug-2024  martin Pull up following revision(s) (requested by riastradh in ticket #796):

sys/ufs/ext2fs/ext2fs_dir.h: revision 1.23

PR/58018: Damir Holovati: ext2fs readdir (d_type conversion error)
 1.22.20.1 23-Aug-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1873):

sys/ufs/ext2fs/ext2fs_dir.h: revision 1.23

PR/58018: Damir Holovati: ext2fs readdir (d_type conversion error)
 1.3 13-Aug-2016  christos branches: 1.3.14;
KNF, no functional changes...
 1.2 09-Aug-2016  christos KNF
 1.1 03-Jun-2016  christos branches: 1.1.2;
Add ext4 extent support from GSoC 2016 (Hrishikesh Goyal), from the FreeBSD
ext2 code.
 1.1.2.3 05-Oct-2016  skrll Sync with HEAD
 1.1.2.2 09-Jul-2016  skrll Sync with HEAD
 1.1.2.1 03-Jun-2016  skrll file ext2fs_extents.c was added on branch nick-nhusb on 2016-07-09 20:25:24 +0000
 1.3.14.2 03-Dec-2017  jdolecek update from HEAD
 1.3.14.1 13-Aug-2016  jdolecek file ext2fs_extents.c was added on branch tls-maxphys on 2017-12-03 11:39:21 +0000
 1.5 26-Aug-2023  riastradh ext2fs: Nix trailing whitespace.
 1.4 09-Aug-2016  kre branches: 1.4.14;

Revert previous - which itself (incorrectly) reverted the previous
changes, breaking the build.
 1.3 09-Aug-2016  christos More htree writing support (Hrishikesh Goyal GSoC 2016)
 1.2 10-Jun-2016  dholland branches: 1.2.2;
needs <stdbool.h>
 1.1 03-Jun-2016  christos Add ext4 extent support from GSoC 2016 (Hrishikesh Goyal), from the FreeBSD
ext2 code.
 1.2.2.3 05-Oct-2016  skrll Sync with HEAD
 1.2.2.2 09-Jul-2016  skrll Sync with HEAD
 1.2.2.1 10-Jun-2016  skrll file ext2fs_extents.h was added on branch nick-nhusb on 2016-07-09 20:25:24 +0000
 1.4.14.2 03-Dec-2017  jdolecek update from HEAD
 1.4.14.1 09-Aug-2016  jdolecek file ext2fs_extents.h was added on branch tls-maxphys on 2017-12-03 11:39:21 +0000
 1.57 26-Aug-2023  riastradh ext2fs: Nix trailing whitespace.
 1.56 28-May-2017  hannken Change ext2fs to use vcache_new like we did for ffs:
- Change ext2fs_valloc to return an inode number.
- Make ext2fs_makeinode private to ext2fs_vnops.c and
pass vattr instead of mode.
 1.55 20-Aug-2016  jdolecek add support for GDT_CSUM AKA uninit_bg feature
 1.54 19-Aug-2016  jdolecek fix bug introduced in rev 1.82 of ext2fs_lookup.c, when ext2fs_add_entry()
was introduced splitting code from ext2fs_direnter() - code used
incorrect new entry size, leading to incomplete entry copy or buffer
overflow; fixed by passing the right size from ext2fs_direnter()
 1.53 15-Aug-2016  jdolecek adjust ext2fs_makeinode() so that the direnter is optional, use the function (with the direnter off) in ext2fs_mkdir() instead of the code copy; adjust ext2fs_makeinode() to initialize extra_isize and set creation time, if supported by the filesystem
 1.52 09-Aug-2016  kre Undo revert now Christos has added the missing glue...
 1.51 09-Aug-2016  kre Revert previous. This work isn't complete enough to include yet,
and the build of current really does need to go back to a working state.
 1.50 09-Aug-2016  christos More htree writing support (Hrishikesh Goyal GSoC 2016)
 1.49 24-Jun-2016  christos GSoC 2016 (Hrishikesh Goyal): Htree index support from FreeBSD
 1.48 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.47 25-May-2014  hannken branches: 1.47.4;
Remove ext2fs_checkpath(). It is a relic from the pre-genfs_rename era.
 1.46 21-Nov-2012  jakllsch branches: 1.46.10;
Write support for the Ext4 Read-only Compatible Feature "huge_file".

Primarily, this feature extends the inode block count field to 48 bits.
Additionally, this feature allows this field to be represented in file
system block size units rather than DEV_BSIZE units.
 1.45 17-Nov-2012  jakllsch Match prototype types to function types (u_int64_t vs. uint64_t).
 1.44 09-May-2012  riastradh branches: 1.44.2;
Adapt ffs, lfs, and ext2fs to use genfs_rename.

ok dholland, rmind
 1.43 12-Jul-2011  dholland branches: 1.43.2; 1.43.6;
Pass the ufs_lookup_results pointer around instead of fetching it from
the inode in the guts of ufs. Now, in VOPs where i_crap is used it is
used (directly) only immediately on entry to the VOP call and then
passed around by reference.

Except for rename, which needs explicit sorting out. The code in
ufs_wapbl_rename is unchanged in behavior but I'm increasingly
inclined to think it's wrong.
 1.42 21-Oct-2009  pooka update i_uid and i_gid after chown
 1.41 19-Oct-2009  bouyer Remove clauses 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the boring work!
 1.40 12-Sep-2009  tsutsui Reduce diffs a bit between ext2fs_reload() and ffs_reload().
 1.39 28-Jun-2008  rumble branches: 1.39.6; 1.39.14;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.38 08-Dec-2007  pooka branches: 1.38.12; 1.38.16; 1.38.18; 1.38.20;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.37 26-Nov-2007  pooka branches: 1.37.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.36 31-Jul-2007  pooka branches: 1.36.2; 1.36.4; 1.36.10; 1.36.12;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.35 12-Jul-2007  dsl branches: 1.35.2;
Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.34 13-Jul-2006  martin branches: 1.34.14;
Fix alignment problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.33 14-May-2006  elad branches: 1.33.4;
integrate kauth.
 1.32 27-Dec-2005  chs branches: 1.32.4; 1.32.6; 1.32.8; 1.32.10; 1.32.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.
 1.31 13-Dec-2005  christos add fwd declaration for struct proc. Fixes vax build.
 1.30 11-Dec-2005  christos merge ktrace-lwp.
 1.29 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.28 12-Sep-2005  christos branches: 1.28.2;
- access the ffs and ext2fs itimes functions through a pointer, so that
if the filesystem is not compiled in the kernel still links. Probably
a better solution is to use weak symbols.
- move the filesystem-specific itime macros to the filesystem header files.
 1.27 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.26 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.25 09-Feb-2005  ws branches: 1.25.6;
Add support for large files (>2GB).
Like Linux, automagically convert old filesystems to use this,
if they are already at revision 1.
For revision 0, just punt (unlike Linux; makes me a bit too nervous.)

There should be an option to fsck_ext2fs to upgrade revision 0 to revision 1.

Reviewed by Manuel (bouyer@).
 1.24 20-May-2004  atatat branches: 1.24.4; 1.24.6;
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.

This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.

linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.23 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.22 22-Mar-2004  bouyer branches: 1.22.2;
Fix disclaimer in my copyright. Pointed out by Thomas Klausner.
 1.21 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.20 05-Oct-2003  bouyer Remove references to University of California from my copyright notices.
 1.19 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.18 29-Jun-2003  fvdl branches: 1.18.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.17 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.16 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.15 26-May-2003  fvdl free the ext2fs dinode struct in ext2fs_reclaim. From Ted Unangst.
 1.14 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.13 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.12 26-Sep-2002  jdolecek use ufs_balloc_range() rather than local (mostly identical, but with some
bugs) ext2fs variant
 1.11 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.10 15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.9 27-Nov-2000  chs branches: 1.9.2; 1.9.6; 1.9.8;
Initial integration of the Unified Buffer Cache project.
 1.8 16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
 1.7 26-Feb-1999  wrstuden branches: 1.7.4; 1.7.8;
Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.6 01-Sep-1998  thorpej Use the pool allocator and "nointr" pool page allocator for ext2fs inodes.
 1.5 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.4 23-Jun-1998  sommerfe Don't include opt_fifo.h if not kernel...
 1.3 22-Jun-1998  sommerfe defopt for options FIFO
 1.2 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.1 11-Jun-1997  bouyer The ext2fs layer, based on the ffs/ufs one. Uses a few functions from
sys/ufs/ufs/
 1.7.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.7.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.4.1 06-Aug-1999  chs UBCify.
 1.9.8.1 01-Oct-2001  fvdl Catch up with -current.
 1.9.6.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototill work
 1.9.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.9.2.3 11-Dec-2002  thorpej Sync with HEAD.
 1.9.2.2 18-Oct-2002  nathanw Catch up to -current.
 1.9.2.1 21-Sep-2001  nathanw Catch up to -current.
 1.18.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.18.2.7 15-Feb-2005  skrll Sync with HEAD.
 1.18.2.6 29-Oct-2004  skrll Remove the struct lwp * argument from ext2fs_checkpath that is no longer
(read: was never) required.
 1.18.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.18.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.18.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.18.2.2 03-Aug-2004  skrll Sync with HEAD
 1.18.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.22.2.1 23-May-2004  tron Pull up revision 1.24 (requested by atatat in ticket #374):
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.24.6.1 12-Feb-2005  yamt sync with head.
 1.24.4.1 29-Apr-2005  kent sync with -current
 1.25.6.5 21-Jan-2008  yamt sync with head
 1.25.6.4 07-Dec-2007  yamt sync with head
 1.25.6.3 03-Sep-2007  yamt sync with head.
 1.25.6.2 30-Dec-2006  yamt sync with head.
 1.25.6.1 21-Jun-2006  yamt sync with head.
 1.28.2.1 20-Oct-2005  yamt adapt ufs.
 1.32.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.32.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.32.10.2 20-Apr-2006  christos use struct kauth_cred instead of kauth_cred_t so that we don't need kauth.h
 1.32.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.32.8.2 11-Aug-2006  yamt sync with head
 1.32.8.1 24-May-2006  yamt sync with head.
 1.32.6.1 01-Jun-2006  kardel Sync with head.
 1.32.4.1 09-Sep-2006  rpaulo sync with head
 1.33.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.34.14.2 20-Aug-2007  ad Sync with HEAD.
 1.34.14.1 15-Jul-2007  ad Sync with head.
 1.35.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.36.12.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.36.12.1 31-Jul-2007  pooka file ext2fs_extern.h was added on branch matt-mips64 on 2007-07-31 21:14:20 +0000
 1.36.10.2 27-Dec-2007  mjf Sync with HEAD.
 1.36.10.1 08-Dec-2007  mjf Sync with HEAD.
 1.36.4.1 09-Jan-2008  matt sync with HEAD
 1.36.2.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.36.2.1 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.37.2.1 26-Dec-2007  ad Sync with head.
 1.38.20.1 03-Jul-2008  simonb Sync with head.
 1.38.18.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.38.16.3 11-Mar-2010  yamt sync with head
 1.38.16.2 16-Sep-2009  yamt sync with head
 1.38.16.1 04-May-2009  yamt sync with head.
 1.38.12.1 29-Jun-2008  mjf Sync with HEAD.
 1.39.14.1 21-Apr-2010  matt sync to netbsd-5
 1.39.6.1 27-Oct-2009  bouyer Pull up following revision(s) (requested by pooka in ticket #1112):
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.91
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.152
sys/ufs/ext2fs/ext2fs_extern.h: revision 1.42
update i_uid and i_gid after chown
 1.43.6.1 02-Jun-2012  mrg sync to latest -current.
 1.43.2.2 16-Jan-2013  yamt sync with (a bit old) head
 1.43.2.1 23-May-2012  yamt sync with head.
 1.44.2.4 03-Dec-2017  jdolecek update from HEAD
 1.44.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.44.2.2 25-Feb-2013  tls resync with head
 1.44.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.46.10.1 10-Aug-2014  tls Rebase.
 1.47.4.4 28-Aug-2017  skrll Sync with HEAD
 1.47.4.3 05-Oct-2016  skrll Sync with HEAD
 1.47.4.2 09-Jul-2016  skrll Sync with HEAD
 1.47.4.1 06-Apr-2015  skrll Sync with HEAD
 1.2 13-Aug-2016  christos branches: 1.2.14;
KNF, no functional changes...
 1.1 24-Jun-2016  christos branches: 1.1.2;
GSoC 2016 (Hrishikesh Goyal): Htree index support from FreeBSD
 1.1.2.3 05-Oct-2016  skrll Sync with HEAD
 1.1.2.2 09-Jul-2016  skrll Sync with HEAD
 1.1.2.1 24-Jun-2016  skrll file ext2fs_hash.c was added on branch nick-nhusb on 2016-07-09 20:25:24 +0000
 1.2.14.2 03-Dec-2017  jdolecek update from HEAD
 1.2.14.1 13-Aug-2016  jdolecek file ext2fs_hash.c was added on branch tls-maxphys on 2017-12-03 11:39:21 +0000
 1.1 24-Jun-2016  christos branches: 1.1.2; 1.1.18;
GSoC 2016 (Hrishikesh Goyal): Htree index support from FreeBSD
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 24-Jun-2016  jdolecek file ext2fs_hash.h was added on branch tls-maxphys on 2017-12-03 11:39:21 +0000
 1.1.2.2 09-Jul-2016  skrll Sync with HEAD
 1.1.2.1 24-Jun-2016  skrll file ext2fs_hash.h was added on branch nick-nhusb on 2016-07-09 20:25:24 +0000
 1.11 26-Aug-2023  riastradh ext2fs: Nix trailing whitespace.
 1.10 04-May-2022  andvar s/entires/entries/ in local variable definition.
 1.9 23-Aug-2016  christos branches: 1.9.14;
CID 1371645: remove dead code
 1.8 20-Aug-2016  jdolecek whitespace fix
 1.7 19-Aug-2016  jdolecek fix bug introduced in rev 1.82 of ext2fs_lookup.c, when ext2fs_add_entry()
was introduced splitting code from ext2fs_direnter() - code used
incorrect new entry size, leading to incomplete entry copy or buffer
overflow; fixed by passing the right size from ext2fs_direnter()
 1.6 14-Aug-2016  jdolecek switch ext2fs_htree_has_idx() over to EXT2F_HAS_COMPAT_FEATURE() and remove EXT2F_HAS_COMPAT_FEATURE() - this also fixes it for BE machines, as EXT2F_HAS_COMPAT_FEATURE() did extra byte swap; also remove XXX comment about IN_E3INDEX
 1.5 13-Aug-2016  christos KNF, no functional changes...
 1.4 09-Aug-2016  kre Undo revert now Christos has added the missing glue...
 1.3 09-Aug-2016  kre Revert previous. This work isn't complete enough to include yet,
and the build of current really does need to go back to a working state.
 1.2 09-Aug-2016  christos More htree writing support (Hrishikesh Goyal GSoC 2016)
 1.1 24-Jun-2016  christos branches: 1.1.2;
GSoC 2016 (Hrishikesh Goyal): Htree index support from FreeBSD
 1.1.2.3 05-Oct-2016  skrll Sync with HEAD
 1.1.2.2 09-Jul-2016  skrll Sync with HEAD
 1.1.2.1 24-Jun-2016  skrll file ext2fs_htree.c was added on branch nick-nhusb on 2016-07-09 20:25:24 +0000
 1.9.14.2 03-Dec-2017  jdolecek update from HEAD
 1.9.14.1 23-Aug-2016  jdolecek file ext2fs_htree.c was added on branch tls-maxphys on 2017-12-03 11:39:21 +0000
 1.1 24-Jun-2016  christos branches: 1.1.2; 1.1.18;
GSoC 2016 (Hrishikesh Goyal): Htree index support from FreeBSD
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 24-Jun-2016  jdolecek file ext2fs_htree.h was added on branch tls-maxphys on 2017-12-03 11:39:21 +0000
 1.1.2.2 09-Jul-2016  skrll Sync with HEAD
 1.1.2.1 24-Jun-2016  skrll file ext2fs_htree.h was added on branch nick-nhusb on 2016-07-09 20:25:24 +0000
 1.91 26-Aug-2023  riastradh ext2fs: Nix trailing whitespace.
 1.90 17-Aug-2021  andvar fix multiple repetitive typos in comments, messages and documentation. A large number of files are affected, mainly because of copy-pasted code.
 1.89 23-Apr-2020  ad PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)

- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
somewhere. Use it to decide whether to do direct-mapped copy, rather than
poking around directly in the vnode in ubc_uiomove(), which is ugly and
doesn't work for tmpfs. It would be nicer to contain all this in UVM but
the filesystem provides the needed locking here (VV_MAPPED) and to
reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS(). Pass in UBC_ISMAPPED where
appropriate.
 1.88 26-May-2017  riastradh branches: 1.88.20;
Eliminate crusty debugging sludge.

We have a mostly sane vnode lifecycle now. If this needs debugging,
it should be done once at the call site of VOP_RECLAIM.
 1.87 11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.86 14-Aug-2016  jdolecek branches: 1.86.2;
switch code to use the EXT2_HAS_{COMPAT|ROCOMPAT|INCOMPAT}_FEATURE() macros instead of open coding the checks
 1.85 13-Aug-2016  christos KNF, no functional changes...
 1.84 04-Aug-2016  jdolecek rename struct ext2fs_dinode attribute e2di_dacl to correct
e2di_size_high; even Linux ext2 filesystem code actually uses it
unconditionally this way and ext4 code finally also calls it that way
in their struct definition too; if there was any trace of this for other
purpose it's long gone
 1.83 03-Aug-2016  jdolecek support arbitrary ext3/ext4 inode size, add all the new ext4 fields ext2fs_dinode, and add support for loading the extra inode data
 1.82 28-Mar-2015  maxv branches: 1.82.2;
Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.81 23-Jun-2013  dholland branches: 1.81.10;
Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.80 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.79 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.78 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.77 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.76 21-Nov-2012  jakllsch Write support for the Ext4 Read-only Compatible Feature "huge_file".

Primarily, this feature extends the inode block count field to 48 bits.
Additionally, this feature allows this field to be represented in file
system block size units rather than DEV_BSIZE units.
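To illustrate the huge_file description in revision 1.76 above, here is a minimal C sketch of how a 48-bit block count might be assembled and normalized to DEV_BSIZE units. The helper name, its parameters, and the 512-byte value used for DEV_BSIZE are assumptions made for this sketch; it is not the ext2fs code.

```c
#include <stdint.h>

#define SKETCH_DEV_BSIZE 512u	/* assumed sector size for this sketch */

/*
 * Hypothetical helper: join the low 32 bits and the high 16 bits of an
 * inode block count into one 48-bit value and return it in
 * SKETCH_DEV_BSIZE units.  If in_fs_blocks is nonzero, the stored count
 * is taken to be in file system blocks of fs_bsize bytes.
 */
static uint64_t
sketch_nblock_sectors(uint32_t lo, uint16_t hi, int in_fs_blocks,
    uint32_t fs_bsize)
{
	uint64_t nblock = ((uint64_t)hi << 32) | lo;

	if (in_fs_blocks)
		nblock *= fs_bsize / SKETCH_DEV_BSIZE;
	return nblock;
}
```

With a 4096-byte file system block, a stored count of 2 in block units corresponds to 16 sectors of 512 bytes.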
 1.75 27-Jan-2012  para branches: 1.75.6;
converting readdir in ffs ext2fs from malloc(9) to kmem(9)
while there allocate ufs mount structs from kmem(9) too
preceding kmem-vmem-pool-patch

releng@ acknowledged
 1.74 16-Jun-2011  hannken branches: 1.74.2; 1.74.6;
Rename uvm_vnp_zerorange(struct vnode *, off_t, size_t) to
ubc_zerorange(struct uvm_object *, off_t, size_t, int) changing
the first argument to an uvm_object and adding a flags argument.

Modify tmpfs_reg_resize() to zero the backing store (aobj) instead
of the vnode. Ubc_purge() no longer panics when unmounting tmpfs.

Keep uvm_vnp_zerorange() until the next kernel version bump.
 1.73 28-Jul-2010  hannken branches: 1.73.6;
ext2fs,ffs: free on disk inodes in the reclaim routine.
Remove now unneeded vnode flag VI_FREEING.

Welcome to 5.99.38.

Ok: Andrew Doran <ad@netbsd.org>
 1.72 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.71 07-Feb-2010  bouyer branches: 1.71.2; 1.71.4;
- ufs_balloc_range(): on error, only PG_RELEASED the pages that were
allocated to extend the file to the new size. Releasing all pages
may release pages that contain previously-written data not yet flushed
to disk. Should fix PR kern/35704
- {ffs,lfs,ext2fs}_truncate(): Even if the inode's size is the same as
the new length, call uvm_vnp_setsize(). *_truncate() may have been
called by *_write() in the error path (e.g. block allocation failure
because of quota or file system full), and at this point v_writesize
has been set to the desired size of the file and not reverted to the
old size. Not adjusting v_writesize to the real size causes
genfs_do_io() to write to disk past the real end of the file.
 1.70 19-Oct-2009  bouyer Remove clauses 3 & 4 from my licence. Many thanks to Soren Jacobsen
for the boring work!
 1.69 12-Sep-2009  tsutsui Migrate from u_intNN_t to uintNN_t.
 1.68 01-Mar-2009  christos PR/40936: Frederik Sausmikat: ext2fs: add support for inodes > 128 bytes
 1.67 17-Dec-2008  cegger branches: 1.67.2;
kill MALLOC and FREE macros.
 1.66 16-May-2008  hannken branches: 1.66.6; 1.66.8; 1.66.14;
Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.65 27-Mar-2008  ad branches: 1.65.2; 1.65.4; 1.65.6;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.64 09-Jan-2008  ad branches: 1.64.6;
Go back to freeing on disk inodes in the inactive routine. It would be
better not to do this, but it rules out potential side effects with softdep.
 1.63 02-Jan-2008  ad Merge vmlocking2 to head.
 1.62 08-Dec-2007  pooka branches: 1.62.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.61 26-Nov-2007  pooka branches: 1.61.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.60 08-Oct-2007  ad branches: 1.60.4;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.59 05-Jun-2007  yamt branches: 1.59.6; 1.59.8; 1.59.10;
improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.58 07-Apr-2007  hannken Remove calls to now obsolete vn_start_write() and vn_finished_write().
 1.57 04-Mar-2007  christos branches: 1.57.2; 1.57.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.56 09-Dec-2006  chs branches: 1.56.2;
several ext2fs fixes provided by Barry Bouwsma:
- set ip->i_e2fs_dtime to time_second, not time_uptime.
- don't allow ipref to go negative
- fs->e2fs.e2fs_icount is a valid inode number, allow it.
 1.55 07-Jun-2006  kardel branches: 1.55.6; 1.55.8; 1.55.10;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.54 14-May-2006  elad branches: 1.54.2;
integrate kauth.
 1.53 17-Mar-2006  christos don't use MALLOC with a non-constant size; use malloc instead.
 1.52 11-Dec-2005  christos branches: 1.52.4; 1.52.6; 1.52.8; 1.52.10; 1.52.12;
merge ktrace-lwp.
 1.51 11-Nov-2005  yamt - ignore truncation for VCHR/VBLK/VFIFO as it used to be
before yamt-vop merge. PR/32049 from Atsushi Onoe.
- reject setattr which attempts to change size of VLNK/VSOCK.
 1.50 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.49 26-Sep-2005  yamt branches: 1.49.2;
always use nanotime rather than time.
it's bad to mix nanotime and time because it sometimes
makes timestamps go backwards.
 1.48 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.47 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.46 28-Jun-2005  kml branches: 1.46.2;
Ensure that we change the size of the vnode at the same time as
we change the size of the inode, and use ext2fs_size uniformly.
This fixes a crash that occurs when I create a directory, then
move it, all on an ext2 filesystem.
 1.45 26-Feb-2005  perry nuke trailing whitespace
 1.44 09-Feb-2005  ws Add support for large files (>2GB).
Like Linux, automagically convert old filesystems to use this,
if they are already at revision 1.
For revision 0, just punt (unlike Linux; makes me a bit too nervous.)

There should be an option to fsck_ext2fs to upgrade revision 0 to revision 1.

Reviewed by Manuel (bouyer@).
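As a rough sketch of the large-file handling described in revision 1.44 above: the 64-bit size is split into a low and a high 32-bit half (revision 1.84 names the high half e2di_size_high), and a file counts as "large" once it passes the 2GB limit. The function names below are hypothetical and only illustrate the arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper: join the two 32-bit halves of a file size. */
static uint64_t
sketch_ext2_size(uint32_t size_lo, uint32_t size_high)
{
	return ((uint64_t)size_high << 32) | size_lo;
}

/* Hypothetical helper: true once the size no longer fits in 31 bits. */
static int
sketch_is_large_file(uint64_t size)
{
	return size > INT32_MAX;
}
```

A size of exactly 2^31 - 1 bytes still fits the old format; one byte more does not.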
 1.43 15-Aug-2004  mycroft branches: 1.43.4; 1.43.6;
Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.42 14-Aug-2004  mycroft Push atime/mtime updates even further -- into the reclaim path, so they happen
rarely in the normal case. (Note: This happens at reboot/shutdown time because
all file systems are unmounted.)

Also, for IN_MODIFY, use IN_ACCESSED, not IN_MODIFIED; otherwise "ls -l" of
your device node or FIFO would cause the time stamps to get written too
quickly.
 1.41 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.40 22-Mar-2004  bouyer Fix disclaimer in my copyright. Pointed out by Thomas Klausner.
 1.39 25-Jan-2004  hannken Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.38 05-Nov-2003  hannken Clean up the usage of vn_start_write(). At least one occurrence clobbered
previous error conditions.
If "(flags & (V_WAIT|V_PCATCH)) == V_WAIT" the return value is always zero.
Ignore the return value in these cases.

From Darrin B. Jewell.
 1.37 15-Oct-2003  hannken Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then syncs the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.36 05-Oct-2003  bouyer Remove references to University of California from my copyright notices.
 1.35 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.34 29-Jun-2003  fvdl branches: 1.34.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.33 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.32 02-Apr-2003  he In the inode, i_din.e2fs_din is now a pointer, so there is no longer
a need to take the address here.
 1.31 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.30 25-Jan-2003  fvdl The oldblks and newblks arrays are used to store direct copies of
on-disk block pointers, so they should be int32_t. Error found
by Izumi Tsutsui.
 1.29 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.28 26-Sep-2002  jdolecek use ufs_balloc_range() rather than local (mostly identical, but with some
bugs) ext2fs variant
 1.27 08-Nov-2001  lukem add RCSID
 1.26 06-Nov-2001  simonb Remove some bogus checks for unsigned variables < 0.
 1.25 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.24 19-Jun-2001  wiz branches: 1.24.4; 1.24.8;
`accessible' only has one `a'.
 1.23 18-Feb-2001  chs branches: 1.23.2;
skip truncating a file to 0 before freeing it if it's already zero-length.
 1.22 07-Feb-2001  tsutsui Fix nested extern declaration of prtactive.
 1.21 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.20 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.19 30-May-2000  mycroft Adjust where IN_MODIFIED and IN_ACCESSED are cleared (as in the FFS code).
 1.18 29-May-2000  mycroft Pull in IN_ACCESSED changes and some MNT_LAZY `bug fixes' from FFS.
 1.17 28-May-2000  mycroft Pull in indirect block unwind code from FFS.
 1.16 28-May-2000  mycroft Add a new function to remove extra buffers when truncating a file. This is
more generic than the vinvalbuf(V_SAVEMETA) case, avoiding synchronous
operations when truncating to a non-zero length.
 1.15 13-May-2000  perseant branches: 1.15.2;
Change the semantics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be waited for, or not, accordingly.

Closes PR#8996.
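One way to read the flag semantics described in revision 1.15 above is sketched below in C. The flag values, the function name, and the dirops_are_async parameter are assumptions made for illustration; each file system's real update routine decides for itself how to treat the hint.

```c
/* Assumed flag values for this sketch only. */
#define SKETCH_UPDATE_WAIT	0x01	/* caller would like a synchronous update */
#define SKETCH_UPDATE_DIROP	0x02	/* hint: call comes from a dirop routine */

/*
 * Hypothetical policy: wait when asked to, unless the call is a
 * directory operation and this file system handles dirops
 * asynchronously (as the log says LFS and FFS+softdep do).
 */
static int
sketch_update_should_wait(int flags, int dirops_are_async)
{
	if ((flags & SKETCH_UPDATE_WAIT) == 0)
		return 0;
	if ((flags & SKETCH_UPDATE_DIROP) != 0 && dirops_are_async)
		return 0;
	return 1;
}
```

The point of the design is that callers no longer need an "if (!DOINGSOFTDEP(vp))" wrapper; they pass the hint and the file system decides.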
 1.14 30-Mar-2000  augustss Remove register declarations.
 1.13 24-Mar-1999  mrg branches: 1.13.4; 1.13.8; 1.13.14;
completely remove Mach VM support. all that is left is all the
header files as UVM still uses (most of) these.
 1.12 05-Mar-1999  mycroft Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.11 05-Mar-1999  mycroft Permit the access and modify time pointers passed to VOP_UPDATE to be null,
meaning the current time.
 1.10 23-Oct-1998  thorpej For consistency w/ FFS/LFS, define EXT2_DINODE_SIZE, and use it instead
of pointer arithmetic and/or sizeof(struct ext2fs_dinode).
 1.9 29-Sep-1998  bouyer #include opt_uvm.h only if _KERNEL and !_LKM
Make ext2fs_init() call ufs_init(). it was doing the init by itself,
testing for extern done != 0. This bug was hidden by the fact that
ext2fs_init() is called before ffs_init().
 1.8 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.7 09-Jun-1998  mikel ffs_ -> ext2fs_ in warning; art@openbsd.org
 1.6 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.5 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.3 09-Oct-1997  bouyer Add byte-swapping functions (bswap16, bswap32, bswap64) to libkern.
Only i386 has assembly versions of bswap16 and bswap32 for now (bswap64 uses
bswap32). Contributions of assembly versions of these are welcome.
Add byte-swapping of ext2fs metadata for big-endian systems.
Tested on i386 and sparc.
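For readers unfamiliar with the byte-swapping helpers mentioned in revision 1.3 above, a portable C sketch follows. The sketch_ prefix is used so the names do not clash with any system-provided bswap functions; as in the log message, the 64-bit version is built from the 32-bit one. This is an illustration, not the libkern source.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static uint16_t
sketch_bswap16(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t
sketch_bswap32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
	    ((x >> 8) & 0x0000ff00u) | (x >> 24);
}

/* Reverse eight bytes by swapping each half and exchanging the halves. */
static uint64_t
sketch_bswap64(uint64_t x)
{
	return ((uint64_t)sketch_bswap32((uint32_t)x) << 32) |
	    sketch_bswap32((uint32_t)(x >> 32));
}
```

ext2fs metadata is little-endian on disk, so on a big-endian machine each multi-byte field would pass through a swap like these when read or written.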
 1.2 04-Jul-1997  drochner branches: 1.2.2;
Don't cast 64bit (off_t) file sizes to vm_offset_t (32bit on many
architectures), truncate them intelligently instead.
The truncation is centralized in vnode_pager.c.
This prevents wrap-around effects when parts of large (>2^32 byte) files
are mmapped.
Don't allow mmap above the numerical range of vm_offset_t.
This is considered a temporary solution until the vm system handles the
object sizes/offsets more cleanly.
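The wrap-around problem described in revision 1.2 above can be shown with a small C sketch. Saturating at the top of the 32-bit range is only one plausible way to "truncate intelligently"; the log does not show the actual vnode_pager.c logic, and the function names here are hypothetical.

```c
#include <stdint.h>

/* A plain cast keeps only the low 32 bits, so large sizes wrap around. */
static uint32_t
sketch_cast_size(int64_t size)
{
	return (uint32_t)size;
}

/* Assumed alternative: clamp to the representable range instead. */
static uint32_t
sketch_clamp_size(int64_t size)
{
	if (size < 0)
		return 0;
	if (size > (int64_t)UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)size;
}
```

For a file of 2^32 + 4096 bytes the cast yields 4096, while the clamped value stays at the top of the 32-bit range.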
 1.1 11-Jun-1997  bouyer The ext2fs layer, based on the ffs/ufs one. Uses a few functions from
sys/ufs/ufs/
 1.2.2.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.13.14.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.13.8.4 12-Mar-2001  bouyer Sync with HEAD.
 1.13.8.3 11-Feb-2001  bouyer Sync with HEAD.
 1.13.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.13.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.13.4.2 06-Aug-1999  chs UBCify.
 1.13.4.1 11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.15.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.23.2.6 18-Oct-2002  nathanw Catch up to -current.
 1.23.2.5 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.23.2.4 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
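The null-safe curproc definition described in revision 1.23.2.4 above can be tried out with stand-in types. Everything below uses hypothetical sketch_ names and a plain global in place of the kernel's curlwp, and writes the macro with balanced parentheses; it only demonstrates why referencing the macro is always safe while dereferencing its result is not.

```c
#include <stddef.h>

struct sketch_proc {
	int p_pid;
};

struct sketch_lwp {
	struct sketch_proc *l_proc;	/* owning process */
};

/* Stand-in for curlwp: NULL until some thread is running. */
static struct sketch_lwp *sketch_curlwp;

/* Evaluates to NULL instead of faulting when there is no current LWP. */
#define sketch_curproc \
	((sketch_curlwp) ? (sketch_curlwp)->l_proc : NULL)
```

Callers still have to check the result before dereferencing it, as the log message notes.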
 1.23.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.23.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.23.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.24.8.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.24.4.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.24.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.34.2.10 11-Dec-2005  christos Sync with head.
 1.34.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.34.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.34.2.7 15-Feb-2005  skrll Sync with HEAD.
 1.34.2.6 27-Oct-2004  skrll Fix various comments that describe the argument structures
 1.34.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.34.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.34.2.3 25-Aug-2004  skrll Sync with HEAD.
 1.34.2.2 03-Aug-2004  skrll Sync with HEAD
 1.34.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.43.6.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.43.6.1 12-Feb-2005  yamt sync with head.
 1.43.4.1 29-Apr-2005  kent sync with -current
 1.46.2.6 21-Jan-2008  yamt sync with head
 1.46.2.5 07-Dec-2007  yamt sync with head
 1.46.2.4 27-Oct-2007  yamt sync with head.
 1.46.2.3 03-Sep-2007  yamt sync with head.
 1.46.2.2 30-Dec-2006  yamt sync with head.
 1.46.2.1 21-Jun-2006  yamt sync with head.
 1.49.2.1 20-Oct-2005  yamt adapt ufs.
 1.52.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.52.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.52.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.52.10.2 19-Apr-2006  elad sync with head.
 1.52.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.52.8.3 26-Jun-2006  yamt sync with head.
 1.52.8.2 24-May-2006  yamt sync with head.
 1.52.8.1 01-Apr-2006  yamt sync with head.
 1.52.6.3 01-Jun-2006  kardel Sync with head.
 1.52.6.2 22-Apr-2006  simonb Sync with head.
 1.52.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.52.4.1 09-Sep-2006  rpaulo sync with head
 1.54.2.1 19-Jun-2006  chap Sync with head.
 1.55.10.1 12-Jan-2007  bouyer Pull up following revision(s) (requested by chs in ticket #346):
sys/ufs/ext2fs/ext2fs_inode.c: revision 1.56
sys/ufs/ext2fs/ext2fs_alloc.c: revision 1.33
several ext2fs fixes provided by Barry Bouwsma:
- set ip->i_e2fs_dtime to time_second, not time_uptime.
- don't allow ipref to go negative
- fs->e2fs.e2fs_icount is a valid inode number, allow it.
 1.55.8.1 10-Dec-2006  yamt sync with head.
 1.55.6.1 12-Jan-2007  ad Sync with head.
 1.56.2.2 15-Apr-2007  yamt sync with head.
 1.56.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.57.4.1 11-Jul-2007  mjf Sync with head.
 1.57.2.5 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress testing.
 1.57.2.4 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.57.2.3 09-Jun-2007  ad Sync with head.
 1.57.2.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.57.2.1 10-Apr-2007  ad Sync with head.
 1.59.10.1 14-Oct-2007  yamt sync with head.
 1.59.8.3 23-Mar-2008  matt sync with HEAD
 1.59.8.2 09-Jan-2008  matt sync with HEAD
 1.59.8.1 06-Nov-2007  matt sync with HEAD
 1.59.6.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.59.6.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.59.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.60.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.60.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.60.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.61.2.3 30-Dec-2007  ad Fix remaining problems with ext2fs on this branch.
 1.61.2.2 26-Dec-2007  ad Sync with head.
 1.61.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.62.4.2 10-Jan-2008  bouyer Sync with HEAD
 1.62.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.64.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.64.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.64.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.65.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.65.4.4 11-Aug-2010  yamt sync with head.
 1.65.4.3 11-Mar-2010  yamt sync with head
 1.65.4.2 16-Sep-2009  yamt sync with head
 1.65.4.1 04-May-2009  yamt sync with head.
 1.65.2.1 18-May-2008  yamt sync with head.
 1.66.14.1 21-Apr-2010  matt sync to netbsd-5
 1.66.8.2 16-Jan-2011  bouyer Pull up following revision(s) (requested by tsutsui in ticket #1486):
sbin/fsck_ext2fs/setup.c: revision 1.26
sbin/newfs_ext2fs/mke2fs.c: revision 1.10
sbin/newfs_ext2fs/mke2fs.c: revision 1.11
sbin/newfs_ext2fs/mke2fs.c: revision 1.12
sbin/fsck_ext2fs/inode.c: revision 1.24
sys/lib/libsa/ext2fs.c: revision 1.6
sbin/newfs_ext2fs/extern.h: revision 1.3
sbin/fsck_ext2fs/inode.c: revision 1.25
sys/lib/libsa/ext2fs.c: revision 1.7
sbin/fsck_ext2fs/inode.c: revision 1.26
sys/ufs/ext2fs/ext2fs_inode.c: revision 1.68
sbin/fsck_ext2fs/inode.c: revision 1.27
sbin/fsck_ext2fs/inode.c: revision 1.28
sys/ufs/ext2fs/ext2fs_dinode.h: revision 1.18
sys/ufs/ext2fs/ext2fs_dinode.h: revision 1.19
sbin/newfs_ext2fs/newfs_ext2fs.c: revision 1.5
sbin/newfs_ext2fs/newfs_ext2fs.8: revision 1.2
sbin/newfs_ext2fs/newfs_ext2fs.c: revision 1.6
sbin/newfs_ext2fs/newfs_ext2fs.8: revision 1.3
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.142
sbin/newfs_ext2fs/newfs_ext2fs.c: revision 1.7
sbin/newfs_ext2fs/newfs_ext2fs.8: revision 1.4
sbin/newfs_ext2fs/newfs_ext2fs.c: revision 1.8
PR/40936: Frederik Sausmikat: ext2fs: add support for inodes > 128 bytes
Support variable inode sizes.
catch up with variable inode size.
Don't use e2fs_inode_size in superblock on E2FS_REV0 file system.
- accept only EXT2_REV0_DINODE_SIZE inodesize on -O 0
- use inodesize to get offset of inode, not struct ext2fs_dinode array
Replace a magic number with a new EXT2_REV0_DINODE_SIZE macro.
Use EXT2_DINODE_SIZE() to get offset of inode, not struct ext2fs_dinode array.
Fix botched logic in inodesize check.
Use inodesize to get offset of inode in one more place.
- add a sanity check for e2fs_inode_size in readsb()
- use EXT2_DINODE_SIZE() rather than sizeof(struct ext2fs_dinode) or
struct ext2fs_dinode array/pointer to see e2fs_ipb and inode offsets
Sort options.
New sentence, new line.
Sort options in usage.
- unsigned -> unsigned int
- remove unnecessary casts from malloc(3) and free(3)
- fix a bogus indent
Use "size > INT32_MAX" rather than "size >= 0x80000000U" to check 2GB limit.
Add missed byteswap ops against ext2fs_dinode members.
Handle 32 bit uid field on E2FS_REV1.
 1.66.8.1 22-Feb-2010  snj Pull up following revision(s) (requested by bouyer in ticket #1302):
sys/ufs/ext2fs/ext2fs_inode.c: revision 1.71
sys/ufs/ffs/ffs_inode.c: revision 1.104
sys/ufs/lfs/lfs_inode.c: revision 1.121
sys/ufs/ufs/ufs_inode.c: revision 1.79
- ufs_balloc_range(): on error, only PG_RELEASED the pages that were
allocated to extend the file to the new size. Releasing all pages
may release pages that contain previously-written data not yet flushed
to disk. Should fix PR kern/35704
- {ffs,lfs,ext2fs}_truncate(): Even if the inode's size is the same as
the new length, call uvm_vnp_setsize(). *_truncate() may have been
called by *_write() in the error path (e.g. block allocation failure
because of quota or file system full), and at this point v_writesize
has been set to the desired size of the file and not reverted to the
old size. Not adjusting v_writesize to the real size causes
genfs_do_io() to write to disk past the real end of the file.
 1.66.6.2 03-Mar-2009  skrll Sync with HEAD.
 1.66.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.67.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.71.4.3 05-Mar-2011  rmind sync with head
 1.71.4.2 03-Jul-2010  rmind sync with head
 1.71.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.71.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.73.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.74.6.1 18-Feb-2012  mrg merge to -current.
 1.74.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.74.2.3 23-Jan-2013  yamt sync with head
 1.74.2.2 16-Jan-2013  yamt sync with (a bit old) head
 1.74.2.1 17-Apr-2012  yamt sync with head
 1.75.6.4 03-Dec-2017  jdolecek update from HEAD
 1.75.6.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.75.6.2 23-Jun-2013  tls resync from head
 1.75.6.1 25-Feb-2013  tls resync with head
 1.81.10.3 28-Aug-2017  skrll Sync with HEAD
 1.81.10.2 05-Oct-2016  skrll Sync with HEAD
 1.81.10.1 06-Apr-2015  skrll Sync with HEAD
 1.82.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.82.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.86.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.88.20.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.95 08-Sep-2024  rillig fix a/an grammar in obvious cases
 1.94 26-Aug-2023  riastradh branches: 1.94.6;
ext2fs: Nix trailing whitespace.
 1.93 10-Aug-2023  mrg don't assign struct pointers to smaller-than-structure regions of memory.

in all cases here, the later parts of the structure are not actually
accessed, so there are no existing bugs here beyond general UB. for the
ufs ones, this also removes some casts.

found by GCC 12.
 1.92 06-Aug-2022  andvar s/blity/bility/ in various words, mainly in comments.
 1.91 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.90 04-Apr-2020  ad Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.89 14-Mar-2020  ad - Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.
 1.88 23-Aug-2016  christos branches: 1.88.16; 1.88.22;
KNF, no functional change
 1.87 19-Aug-2016  jdolecek fix bug introduced in rev 1.82 of ext2fs_lookup.c, when ext2fs_add_entry()
was introduced splitting code from ext2fs_direnter() - code used
incorrect new entry size, leading to incomplete entry copy or buffer
overflow; fixed by passing the right size from ext2fs_direnter()
 1.86 14-Aug-2016  jdolecek when converting on-disk direntry, only use the on-disk filetype if the feature flag is present
 1.85 14-Aug-2016  jdolecek switch code to use the EXT2_HAS_{COMPAT|ROCOMPAT|INCOMPAT}_FEATURE() macros instead of open coding the checks
 1.84 13-Aug-2016  christos KNF, no functional changes...
 1.83 13-Aug-2016  christos sync with hrishi's git
 1.82 09-Aug-2016  christos merge missing function.
 1.81 06-Aug-2016  jdolecek actually pass the d_type from the on-disk directory entry to the lookup results
 1.80 24-Jun-2016  christos GSoC 2016 (Hrishikesh Goyal): Htree index support from FreeBSD
 1.79 12-Jan-2016  riastradh Use buffer cache, not page cache, to expand directories in ext2fs.

Candidate fix for PR kern/50607, PR port-evbmips/50059.

Formerly VOP_WRITE-->ext2fs_write would automatically dispatch to
this code path for writes to directories, but I broke that in
ext2fs_lookup.c rev. 1.78 when disentangling page-cached and
buffer-cached writes.

This was not a problem in ufs, and I didn't notice it in ext2fs,
because ufs consistently used buffercache(9) directly instead of
using VOP_WRITE sometimes as ext2fs did.
 1.78 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.77 03-Jun-2014  joerg branches: 1.77.4;
Introduce two helper functions to centralise the namecache statistics
in vfs_cache.c. Use consistent locking around the per-cpu data.
 1.76 25-May-2014  hannken Remove ext2fs_checkpath(). It is a relic from the pre-genfs_rename era.
 1.75 08-May-2014  hannken Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41
 1.74 07-Feb-2014  hannken branches: 1.74.2;
Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.73 22-Jan-2013  dholland branches: 1.73.2;
Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.72 05-Nov-2012  dholland Excise struct componentname from the namecache.

This uglifies the interface, because several operations need to be
passed the namei flags and cache_lookup also needs for the time being
to be passed cnp->cn_nameiop. Nonetheless, it's a net benefit.

The glop should be able to go away eventually but requires structural
cleanup elsewhere first.

This change requires a kernel bump.
 1.71 05-Nov-2012  dholland Disentangle the namecache from the internals of namei.

- Move the namecache's hash computation to inside the namecache code,
instead of being spread out all over the place. Remove cn_hash from
struct componentname and delete all uses of it.

- It is no longer necessary (if it ever was) for cache_lookup and
cache_lookup_raw to clear MAKEENTRY from cnp->cn_flags for the cases
that cache_enter already checks for.

- Rearrange the interface of cache_lookup (and cache_lookup_raw) to
make it somewhat simpler, to exclude certain nonexistent error
conditions, and (most importantly) to make it not require write access
to cnp->cn_flags.

This change requires a kernel bump.
 1.70 22-Jul-2012  rmind branches: 1.70.2;
Move some of the tests for MAKEENTRY into cache_enter(9). Make some
variables in vfs_cache.c static, __read_mostly, etc.

No objection on tech-kern@.
 1.69 16-Mar-2012  hannken Fix last commit that broke lookup for dot with op DELETE.

Reviewed by: David Holland <dholland@netbsd.org>
 1.68 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.67 27-Jan-2012  para branches: 1.67.2;
converting readdir in ffs ext2fs from malloc(9) to kmem(9)
while there allocate ufs mount structs from kmem(9) too
preceding kmem-vmem-pool-patch

releng@ acknowledged
 1.66 12-Jul-2011  dholland branches: 1.66.2; 1.66.6;
Pass the ufs_lookup_results pointer around instead of fetching it from
the inode in the guts of ufs. Now, in VOPs where i_crap is used it is
used (directly) only immediately on entry to the VOP call and then
passed around by reference.

Except for rename, which needs explicit sorting out. The code in
ufs_wapbl_rename is unchanged in behavior but I'm increasingly
inclined to think it's wrong.
 1.65 12-Jul-2011  dholland Currently, ufs_lookup produces five auxiliary results that are left in
the vnode when lookup returns and fished out again later.

1. Create struct ufs_lookup_results to hold these.

2. Call the ufs_lookup_results instance in struct inode "i_crap" to be
clear about exactly what's going on, and to distinguish the lookup
results from respectable members of struct inode.

3. Update references to these members in the directory access
subroutines.

4. Include preliminary infrastructure for checking that the i_crap
being used is still valid when it's used. This doesn't actually do
anything yet.

5. Update the way ufs_wapbl_rename manipulates these elements to use
the new data structures. I have not changed the manipulation; it may
or may not be correct but I continue to suspect that it is not.

The word of the day is "stigmergy".
 1.64 11-Jul-2011  hannken Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.63 30-Nov-2010  dholland Abolish the SAVENAME and HASBUF flags. There is now always a buffer,
so the path in a struct componentname is now always valid during VOP
calls.
 1.62 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.61 08-Jan-2010  pooka branches: 1.61.2; 1.61.4;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.60 12-Sep-2009  tsutsui Whitespace nits.
 1.59 12-Sep-2009  tsutsui Migrate from u_intNN_t to uintNN_t.
 1.58 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.57 24-Nov-2008  tsutsui Remove an extra semicolon.
 1.56 23-Nov-2008  mrg add support for 32 bit uid/gid fields in ext2, but only do so
when the revision is > REV0.
 1.55 08-Dec-2007  pooka branches: 1.55.12; 1.55.16; 1.55.22; 1.55.24; 1.55.26;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.54 26-Nov-2007  pooka branches: 1.54.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.53 08-Oct-2007  ad branches: 1.53.4;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.52 24-Sep-2007  rumble Avoid stack allocation of large dirent structures in foo_readdir().
 1.51 21-Jul-2007  ad branches: 1.51.4; 1.51.6; 1.51.8; 1.51.10;
Don't depend on uvm_extern.h pulling in proc.h.
 1.50 04-Mar-2007  christos branches: 1.50.2; 1.50.10;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.49 09-Feb-2007  ad branches: 1.49.2;
Merge newlock2 to head.
 1.48 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.47 09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.46 16-Nov-2006  christos branches: 1.46.2;
__unused removal on arguments; approved by core.
 1.45 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.44 14-May-2006  elad branches: 1.44.8; 1.44.10;
integrate kauth.
 1.43 15-Apr-2006  christos Coverity CID 1169: Add KASSERT before deref.
 1.42 18-Mar-2006  bouyer Remove dead code, fixing coverity ID 745. nameiop can only be CREATE
or DELETE here. This code got cut-n-pasted from ufs_lookup.c, but
is only used in whiteout support. ext2fs doesn't support whiteout.
 1.41 17-Mar-2006  christos don't use MALLOC with a non-constant size; use malloc instead.
 1.40 01-Mar-2006  yamt branches: 1.40.2; 1.40.4; 1.40.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.39 11-Dec-2005  christos branches: 1.39.2; 1.39.4; 1.39.6;
merge ktrace-lwp.
 1.38 02-Nov-2005  yamt branches: 1.38.2;
merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.37 30-Aug-2005  xtraeme branches: 1.37.2;
* Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.36 23-Aug-2005  christos Don't overload MAXNAMLEN, use a separate constant for each filesystem type.
 1.35 19-Aug-2005  christos 64 bit inode changes.
 1.34 28-Jun-2005  kml branches: 1.34.2;
Ensure that we change the size of the vnode at the same time as
we change the size of the inode, and use ext2fs_size uniformly.
This fixes a crash that occurs when I create a directory, then
move it, all on an ext2 filesystem.
 1.33 29-May-2005  christos - sprinkle const
- avoid shadow variables.
 1.32 26-Feb-2005  perry nuke trailing whitespace
 1.31 09-Feb-2005  ws Add support for large files (>2GB).
Like Linux, automagically convert old filesystems to use this,
if they are already at revision 1.
For revision 0, just punt (unlike Linux; makes me a bit too nervous.)

There should be an option to fsck_ext2fs to upgrade revision 0 to revision 1.

Reviewed by Manuel (bouyer@).
 1.30 17-Sep-2004  skrll branches: 1.30.4; 1.30.6;
There's no need to pass a proc value when using UIO_SYSSPACE with
vn_rdwr(9) and uiomove(9).

OK'd by Jason Thorpe
 1.29 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.28 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.27 29-Jun-2003  fvdl branches: 1.27.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.26 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.25 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.24 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.23 26-Nov-2002  yamt eliminate i_ino from in-core inode
and use local variable instead.

ok'ed by Frank van der Linden.
 1.22 25-Nov-2002  thorpej Avoid strict-alias warnings.
 1.21 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.20 26-Jul-2002  wiz Spell '[Rr]ight' correctly. From Jim Bernard.
 1.19 30-May-2002  thorpej #if 0 a test that is always false (and the XXX comment above it
indicates so).
 1.18 08-Nov-2001  lukem branches: 1.18.8; 1.18.10;
add RCSID
 1.17 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.16 03-Aug-2000  thorpej branches: 1.16.2; 1.16.6; 1.16.10;
MALLOC()/FREE() are not to be used for variable sized allocations.
 1.15 30-Mar-2000  augustss Remove register declarations.
 1.14 28-Jan-2000  bouyer Correct (minor) bogons in filetype option support, and add support
for sparse_super option
 1.13 26-Jan-2000  bouyer First cut at ext2fs rev 1 support (as of mke2fs 1.18): supports the filetype
option read/write and the sparse option read-only.
 1.12 05-Sep-1999  jdolecek branches: 1.12.2;
Adapt to cache_lookup() changes.

Tested by: jdolecek
Reviewed by: wrstuden
 1.11 04-Aug-1999  wrstuden Pull in changes which parallel rev 1.22 -> 1.25 of ufs_lookup().
 1.10 02-Aug-1999  wrstuden Add PDIRUNLOCK support.
 1.9 02-Dec-1998  bouyer branches: 1.9.4;
- indentation
- sync LK_* flags with ffs/ufs
 1.8 13-Sep-1998  christos Fix copyright '\t' -> ' '
 1.7 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.6 28-Jul-1998  mjacob fix to accommodate change in vn_rdwr prototype
 1.5 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.4 10-Oct-1997  bouyer Update for 64 bits directory cookies.
 1.3 09-Oct-1997  bouyer Add byte-swapping functions (bswap16, bswap32, bswap64) to libkern.
Only assembly version for i386 bswap16 and bswap32 for now (bswap64 uses
bswap32). Contributions of assembly versions of these are welcome.
Add byte-swapping of ext2fs metadata for big-endian systems.
Tested on i386 and sparc.
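
The remark in rev 1.3 that bswap64 uses bswap32 can be illustrated with a minimal portable sketch. The sketch_* names are hypothetical and this is plain C written for illustration, not the libkern sources or the i386 assembly versions; it only shows the idea of swapping each 32-bit half and then exchanging the halves.

```c
#include <stdint.h>

/* Hypothetical illustration only: portable stand-ins, not the libkern code. */

/* Exchange the two bytes of a 16-bit value. */
static uint16_t
sketch_bswap16(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t
sketch_bswap32(uint32_t x)
{
	return ((x & 0x000000ffU) << 24) |
	    ((x & 0x0000ff00U) << 8) |
	    ((x & 0x00ff0000U) >> 8) |
	    ((x & 0xff000000U) >> 24);
}

/* Build the 64-bit swap from two 32-bit swaps, then exchange the halves. */
static uint64_t
sketch_bswap64(uint64_t x)
{
	uint32_t lo = (uint32_t)x;
	uint32_t hi = (uint32_t)(x >> 32);

	return ((uint64_t)sketch_bswap32(lo) << 32) | sketch_bswap32(hi);
}
```

On a big-endian host, little-endian on-disk metadata such as ext2fs fields would pass through swaps of this kind on read and on write; on a little-endian host no swap is needed.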
 1.2 04-Aug-1997  bouyer Fix bad cut&paste from ufs code: we can't align uio_resid to a directory
block size boundary, because size of the returned dir entry may be bigger than
the one read.
 1.1 11-Jun-1997  bouyer branches: 1.1.4;
The ext2fs layer, based on the ffs/ufs one. Uses a few functions from
sys/ufs/ufs/
 1.1.4.2 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.1.4.1 23-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.9.4.1 01-Feb-2000  he Apply patch (requested by bouyer):
Add support for ext2fs revision 1, with read-only support for
the 'sparse_super' and 'filetype' options. Should fix PR#9088.
 1.12.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.16.10.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.16.6.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.16.6.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.16.6.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.16.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.16.2.5 11-Dec-2002  thorpej Sync with HEAD.
 1.16.2.4 18-Oct-2002  nathanw Catch up to -current.
 1.16.2.3 01-Aug-2002  nathanw Catch up to -current.
 1.16.2.2 20-Jun-2002  nathanw Catch up to -current.
 1.16.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.18.10.1 30-May-2002  tv Pull up revision 1.19 (requested by thorpej in ticket #93):
#if 0 a test that is always false (and the XXX comment above it
indicates so).
 1.18.8.2 29-Aug-2002  gehenna catch up with -current.
 1.18.8.1 20-Jun-2002  gehenna catch up with -current.
 1.27.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.27.2.9 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.27.2.8 15-Feb-2005  skrll Sync with HEAD.
 1.27.2.7 29-Oct-2004  skrll Remove the struct lwp * argument from ext2fs_checkpath that is no longer
(read: was never) required.
 1.27.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.27.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.27.2.4 25-Aug-2004  skrll Sync with HEAD.
 1.27.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.27.2.2 03-Aug-2004  skrll Sync with HEAD
 1.27.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.30.6.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.30.6.1 12-Feb-2005  yamt sync with head.
 1.30.4.1 29-Apr-2005  kent sync with -current
 1.34.2.7 21-Jan-2008  yamt sync with head
 1.34.2.6 07-Dec-2007  yamt sync with head
 1.34.2.5 27-Oct-2007  yamt sync with head.
 1.34.2.4 03-Sep-2007  yamt sync with head.
 1.34.2.3 26-Feb-2007  yamt sync with head.
 1.34.2.2 30-Dec-2006  yamt sync with head.
 1.34.2.1 21-Jun-2006  yamt sync with head.
 1.37.2.1 20-Oct-2005  yamt adapt ufs.
 1.38.2.2 19-Nov-2005  yamt - finish reverting VOP_READ prototype changes.
- remove unused variables.
- fix typos.
some of them are pointed out by Juan RP.
 1.38.2.1 15-Nov-2005  yamt - adapt to the new prototype of VOP_READ.
- adapt ext2fs and union.
 1.39.6.2 01-Jun-2006  kardel Sync with head.
 1.39.6.1 22-Apr-2006  simonb Sync with head.
 1.39.4.1 09-Sep-2006  rpaulo sync with head
 1.39.2.1 15-Jan-2006  yamt convert the rest of ufs.
 1.40.6.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.40.6.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.40.4.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.40.4.2 19-Apr-2006  elad sync with head.
 1.40.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.40.2.2 24-May-2006  yamt sync with head.
 1.40.2.1 01-Apr-2006  yamt sync with head.
 1.44.10.2 10-Dec-2006  yamt sync with head.
 1.44.10.1 22-Oct-2006  yamt sync with head
 1.44.8.3 30-Jan-2007  ad Remove support for SA. Ok core@.
 1.44.8.2 12-Jan-2007  ad Sync with head.
 1.44.8.1 18-Nov-2006  ad Sync with head.
 1.46.2.1 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.49.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.50.10.1 15-Aug-2007  skrll Sync with HEAD.
 1.50.2.3 09-Oct-2007  ad Sync with head.
 1.50.2.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.50.2.1 05-Apr-2007  ad Compile fixes.
 1.51.10.2 21-Jul-2007  ad Don't depend on uvm_extern.h pulling in proc.h.
 1.51.10.1 21-Jul-2007  ad file ext2fs_lookup.c was added on branch matt-mips64 on 2007-07-21 19:06:23 +0000
 1.51.8.2 14-Oct-2007  yamt sync with head.
 1.51.8.1 06-Oct-2007  yamt sync with head.
 1.51.6.2 09-Jan-2008  matt sync with HEAD
 1.51.6.1 06-Nov-2007  matt sync with HEAD
 1.51.4.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.51.4.3 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.51.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.51.4.1 02-Oct-2007  joerg Sync with HEAD.
 1.53.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.53.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.54.2.1 26-Dec-2007  ad Sync with head.
 1.55.26.1 29-Nov-2008  snj Pull up following revision(s) (requested by mrg in ticket #147):
sys/ufs/ext2fs/ext2fs_alloc.c: revision 1.37
sys/ufs/ext2fs/ext2fs_bswap.c: revision 1.14
sys/ufs/ext2fs/ext2fs_dinode.h: revision 1.17
sys/ufs/ext2fs/ext2fs_lookup.c: revision 1.56
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.83
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.140
sys/ufs/ufs/inode.h: revision 1.55
add support for 32 bit uid/gid fields in ext2, but only do so
when the revision is > REV0.
 1.55.24.1 19-Jan-2009  skrll Sync with HEAD.
 1.55.22.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.55.16.4 11-Aug-2010  yamt sync with head.
 1.55.16.3 11-Mar-2010  yamt sync with head
 1.55.16.2 16-Sep-2009  yamt sync with head
 1.55.16.1 04-May-2009  yamt sync with head.
 1.55.12.1 17-Jan-2009  mjf Sync with HEAD.
 1.61.4.2 05-Mar-2011  rmind sync with head
 1.61.4.1 03-Jul-2010  rmind sync with head
 1.61.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.66.6.2 05-Apr-2012  mrg sync to latest -current.
 1.66.6.1 18-Feb-2012  mrg merge to -current.
 1.66.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.66.2.4 23-Jan-2013  yamt sync with head
 1.66.2.3 16-Jan-2013  yamt sync with (a bit old) head
 1.66.2.2 30-Oct-2012  yamt sync with head
 1.66.2.1 17-Apr-2012  yamt sync with head
 1.67.2.1 12-Aug-2012  martin Pull up following revision(s) (requested by manu in ticket #484):
sys/fs/nilfs/nilfs_vnops.c: revision 1.18
sys/ufs/ufs/ufs_lookup.c: revision 1.117
sys/nfs/nfs_vnops.c: revision 1.295
sys/ufs/chfs/chfs_vnops.c: revision 1.8
sys/ufs/ext2fs/ext2fs_lookup.c: revision 1.70
sys/fs/unionfs/unionfs_vnops.c: revision 1.6
sys/kern/vfs_cache.c: revision 1.89
sys/fs/efs/efs_vnops.c: revision 1.26
sys/fs/hfs/hfs_vnops.c: revision 1.26
sys/fs/adosfs/adlookup.c: revision 1.16
sys/fs/puffs/puffs_vnops.c: revision 1.168
sys/fs/tmpfs/tmpfs_vnops.c: revision 1.98
sys/fs/ntfs/ntfs_vnops.c: revision 1.52
sys/fs/cd9660/cd9660_lookup.c: revision 1.20
sys/fs/msdosfs/msdosfs_lookup.c: revision 1.24
sys/fs/smbfs/smbfs_vnops.c: revision 1.80
sys/fs/udf/udf_vnops.c: revision 1.72
sys/fs/filecorefs/filecore_lookup.c: revision 1.14
sys/fs/puffs/puffs_node.c: revision 1.25
Move some of the tests for MAKEENTRY into cache_enter(9). Make some
variables in vfs_cache.c static, __read_mostly, etc.
No objection on tech-kern@.
 1.70.2.4 03-Dec-2017  jdolecek update from HEAD
 1.70.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.70.2.2 25-Feb-2013  tls resync with head
 1.70.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.73.2.1 18-May-2014  rmind sync with head
 1.74.2.1 10-Aug-2014  tls Rebase.
 1.77.4.4 05-Oct-2016  skrll Sync with HEAD
 1.77.4.3 09-Jul-2016  skrll Sync with HEAD
 1.77.4.2 19-Mar-2016  skrll Sync with HEAD
 1.77.4.1 06-Apr-2015  skrll Sync with HEAD
 1.88.22.1 19-Jan-2020  ad Set IMNT_SHRLOOKUP and use it for the in-cache case. Need to check what
more can be done with tmpfs though, it can probably do the whole lookup.
 1.88.16.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.94.6.1 02-Aug-2025  perseant Sync with HEAD
 1.79 19-Oct-2024  jakllsch ufs: base amount of data to sync on MAXPHYS instead of constant

No functional change except on sun2 and sun3, as ilog2(MAXPHYS) is the
same as the previous constant (16) on all other ports. On sun[23] this
changes amount written from 56KiB/8KiB to 2x 32KiB.
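
The arithmetic behind this message can be sketched in userland C. This is only an illustration: `ilog2_u32` and `sync_window_bytes` are hypothetical stand-ins rather than the kernel's own routines, and the MAXPHYS values used in checks (64KiB on most ports, 56KiB on sun2/sun3) are assumptions inferred from the figures quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Floor of log2(x) for x > 0; a userland stand-in for the kernel's ilog2(). */
static unsigned
ilog2_u32(uint32_t x)
{
	unsigned n = 0;

	assert(x != 0);
	while (x >>= 1)
		n++;
	return n;
}

/* Largest power of two that does not exceed the given MAXPHYS value. */
static uint32_t
sync_window_bytes(uint32_t maxphys)
{
	return UINT32_C(1) << ilog2_u32(maxphys);
}
```

Under these assumptions a 64KiB MAXPHYS gives a shift of 16, matching the old constant, while a 56KiB MAXPHYS gives a shift of 15 and therefore 32KiB chunks.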
 1.78 20-Oct-2021  thorpej branches: 1.78.10;
Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.,
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), which uses a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.77 23-Apr-2020  ad PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)

- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
somewhere. Use it to decide whether to do direct-mapped copy, rather than
poking around directly in the vnode in ubc_uiomove(), which is ugly and
doesn't work for tmpfs. It would be nicer to contain all this in UVM but
the filesystem provides the needed locking here (VV_MAPPED) and to
reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS(). Pass in UBC_ISMAPPED where
appropriate.
 1.76 23-Feb-2020  ad branches: 1.76.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.75 13-Aug-2016  christos branches: 1.75.16; 1.75.22;
KNF, no functional changes...
 1.74 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.73 28-Mar-2015  riastradh Let I/O errors override inode update errors in UFS.

Fixes tests/fs/vfs/t_io:read_fault for UFS.
 1.72 28-Mar-2015  maxv Remove the 'cred' argument from breadn(), and update the man page
accordingly.

ok hannken@
 1.71 28-Mar-2015  riastradh Factor out post-read/write inode updates in UFS.
 1.70 28-Mar-2015  riastradh Turn some `#if DIAGNOSTIC' into KASSERT.
 1.69 28-Mar-2015  riastradh Missed another spot, in ext2fs_write.
 1.68 28-Mar-2015  riastradh Missed a spot in ext2fs_read
 1.67 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.66 09-Nov-2014  maxv branches: 1.66.2;
Do not uselessly include <sys/malloc.h>.
 1.65 12-Aug-2014  maxv http://m00nbsd.net/ae123a9bae03f7dde5c6d654412daf5a.html#Report-2

#04-0x02: Remove 'doclusterread' and 'doclusterwrite' (unused).
 1.64 23-Jun-2013  dholland branches: 1.64.8;
Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.63 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.62 21-Nov-2012  jakllsch Write support for the Ext4 Read-only Compatible Feature "huge_file".

Primarily, this feature extends the inode block count field to 48 bits.
Additionally, this feature allows this field to be represented in file
system block size units rather than DEV_BSIZE units.
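
A minimal sketch of the two points above, assuming the 48-bit count is stored as a 32-bit low half plus a 16-bit high half. The function and parameter names are hypothetical and are not taken from the ext2fs sources.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DEV_BSIZE 512u	/* assumed DEV_BSIZE for this illustration */

/*
 * Return an inode's block count in DEV_BSIZE units.
 * lo/hi are the two halves of the 48-bit on-disk count (hypothetical names);
 * fs_units says the count is expressed in file system blocks instead.
 */
static uint64_t
sketch_nblocks_devbsize(uint32_t lo, uint16_t hi, bool fs_units,
    uint32_t fs_bsize)
{
	uint64_t n = ((uint64_t)hi << 32) | lo;

	if (fs_units)
		n *= fs_bsize / SKETCH_DEV_BSIZE;
	return n;
}
```

With a 4096-byte block size, a stored count of 2 in file system units corresponds to 16 DEV_BSIZE blocks.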
 1.61 29-Apr-2012  chs branches: 1.61.2;
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
 1.60 17-Apr-2012  christos it is not an error if the kernel needs to clear the setuid/
setgid bit on write/chown/chgrp
 1.59 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.58 18-Nov-2011  christos branches: 1.58.4; 1.58.6;
Obey MNT_RELATIME, the only addition is that mkdir in ufs sets IN_ACCESS too.
 1.57 12-Jun-2011  rmind branches: 1.57.2;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.56 23-Apr-2010  pooka branches: 1.56.6;
Enforce RLIMIT_FSIZE before VOP_WRITE. This adds support to file
system drivers where it was missing and fixes one buggy
implementation. The arguably weird semantics of the check are
maintained (v_size vs. va_bytes, overwrite).
 1.55 19-Oct-2009  bouyer branches: 1.55.2; 1.55.4;
Remove clauses 3 & 4 from my licence. Many thanks to Soren Jacobsen
for the boring work!
 1.54 12-Sep-2009  tsutsui Migrate from u_intNN_t to uintNN_t.
 1.53 26-Nov-2008  pooka Rototill all remaining file systems to use ubc_uiomove() instead
of the ubc_alloc() - uiomove() - ubc_release() dance.
 1.52 16-May-2008  hannken branches: 1.52.4; 1.52.6;
Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.51 24-Apr-2008  ad branches: 1.51.2; 1.51.4;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.50 02-Jan-2008  ad branches: 1.50.6; 1.50.8;
Merge vmlocking2 to head.
 1.49 08-Dec-2007  pooka branches: 1.49.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.48 08-Oct-2007  ad branches: 1.48.4; 1.48.6;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.47 05-Jun-2007  yamt branches: 1.47.6; 1.47.8; 1.47.10;
improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.46 19-Apr-2007  yamt hold proclist_mutex when calling psignal().
 1.45 21-Feb-2007  thorpej branches: 1.45.4; 1.45.6;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.44 04-Jan-2007  elad branches: 1.44.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.43 14-May-2006  elad branches: 1.43.8;
integrate kauth.
 1.42 01-Mar-2006  yamt branches: 1.42.2; 1.42.4; 1.42.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.41 14-Jan-2006  christos branches: 1.41.2; 1.41.4;
Protect against uio_lwp being NULL from Pavel Cahyna
 1.40 11-Dec-2005  christos branches: 1.40.2;
merge ktrace-lwp.
 1.39 29-Nov-2005  yamt merge yamt-readahead branch.
 1.38 02-Nov-2005  yamt branches: 1.38.2;
merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.37 30-Aug-2005  xtraeme branches: 1.37.2;
* Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.36 09-Feb-2005  ws branches: 1.36.6;
Add support for large files (>2GB).
Like Linux, automagically convert old filesystem to use this,
if they are already at revision 1.
For revision 0, just punt (unlike Linux; makes me a bit too nervous.)

There should be an option to fsck_ext2fs to upgrade revision 0 to revision 1.

Reviewed by Manuel (bouyer@).
 1.35 09-Jan-2005  chs branches: 1.35.2; 1.35.4;
adjust the UBC mapping code to support non-vnode uvm_objects.
this means we can no longer look at the vnode size to determine how many
pages to request in a fault, which is good since for NFS the size can change
out from under us on the server anyway. there's also a new flag UBC_UNMAP
for ubc_release(), so that the file system code can make the decision about
whether to cache mappings for files being used as executables.
 1.34 14-Nov-2004  christos Remove erroneous KASSERT; i_size is one of the fields mentioned in
<ufs/inode.h> as unused by ext2fs.
 1.33 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.32 22-Mar-2004  bouyer Fix disclaimer in my copyright. Pointed out by Thomas Klausner.
 1.31 05-Oct-2003  bouyer Remove references to University of California from my copyright notices.
 1.30 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.29 29-Jun-2003  fvdl branches: 1.29.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.28 28-Jun-2003  darrenr Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.27 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.26 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.25 22-Sep-2002  jdolecek don't need <sys/conf.h> here
 1.24 25-Mar-2002  chs if the size argument to write(2) is 0, do not modify the file in any way,
including updating timestamps. required for standards conformance.
 1.23 17-Mar-2002  chs don't do any flush-behind for async mounts.
this matches the traditional behaviour.
 1.22 30-Nov-2001  chs VOP_PUTPAGES() requires page-aligned offsets, so be sure to provide such.
fixes PR 14759.

(while I'm here, call VOP_PUTPAGES() directly instead of indirecting through
the UVM pager op vector.)
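
The alignment requirement can be sketched as follows. This is an illustration only: the helper names and the 4096-byte page size are assumptions, and the real code would use the platform's own page-rounding macros.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this illustration */

/* Round a file offset down to the start of its page. */
static int64_t
sketch_trunc_page(int64_t off)
{
	return off & ~(int64_t)(SKETCH_PAGE_SIZE - 1);
}

/* Round a file offset up to the next page boundary. */
static int64_t
sketch_round_page(int64_t off)
{
	return (off + SKETCH_PAGE_SIZE - 1) & ~(int64_t)(SKETCH_PAGE_SIZE - 1);
}
```

A flush covering bytes 5000 through 9000 would then be issued for the aligned range 4096 through 12288.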
 1.21 10-Nov-2001  chs track some changes in the ufs code:
update UVM's notion of the file size in *_write() rather than
*_balloc().
 1.20 08-Nov-2001  lukem add RCSID
 1.19 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.18 22-Sep-2001  chs branches: 1.18.2;
check early for reads beyond EOF.
 1.17 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.16 27-Feb-2001  chs branches: 1.16.2; 1.16.6; 1.16.8;
min() -> MIN(), max() -> MAX().
fixes more problems with file offsets > 4GB.
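
Why a function-style min() can misbehave above 4GB, as a hedged sketch: it assumes the function form takes 32-bit unsigned arguments, so 64-bit offsets are truncated on the way in, whereas a macro compares in the operands' own type. All names below are hypothetical.

```c
#include <stdint.h>

/* Stand-in for a min() that takes 32-bit unsigned arguments (assumed). */
static uint32_t
sketch_min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Type-generic macro: compares in the operands' own (64-bit) type. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Transfer size computed through the 32-bit function; high bits are lost. */
static int64_t
xfer_via_function(int64_t bytes_left, int64_t chunk)
{
	return (int64_t)sketch_min_u32((uint32_t)bytes_left, (uint32_t)chunk);
}

/* Transfer size computed through the macro; no truncation occurs. */
static int64_t
xfer_via_macro(int64_t bytes_left, int64_t chunk)
{
	return SKETCH_MIN(bytes_left, chunk);
}
```

With 4GiB + 100 bytes left and an 8192-byte chunk, the function form yields 100 while the macro form yields 8192.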
 1.15 01-Dec-2000  chs fix merge error: ext2fs uses a custom balloc rather than a VOP-style one.
 1.14 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.13 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.12 13-May-2000  perseant Change the sementics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be waited for, or not, accordingly.

Closes PR#8996.
 1.11 30-Mar-2000  augustss Remove register declarations.
 1.10 24-Mar-1999  mrg branches: 1.10.4; 1.10.8;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.
 1.9 05-Mar-1999  mycroft Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.8 29-Sep-1998  bouyer #include opt_uvm.h only if _KENREL and !_LKM
Make ext2fs_init() call ufs_init(). it was doing the init by itself,
testing for extern done != 0. This bug was hidden by the fact that
ext2fs_init() is called before ffs_init().
 1.7 02-Aug-1998  kleink Implement support for IEEE Std 1003.1b-1993 synchronous I/O:
* in the read vnode operator, check for IO_SYNC being set in the ioflag and
synchronously update the file's meta-data if appropriate.
* in the write vnode operator, update the appropriate checks for IO_SYNC being
set in the ioflag to reflect that IO_DSYNC is now inclusive-or'ed into
IO_SYNC, and require all IO_SYNC bits to be set for operations defined by
synchronized I/O file integrity completion but not by synchronized I/O data
integrity completion.
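
The flag test described above can be sketched like this. The bit values and helper names are hypothetical; the only property taken from the message is that the data-sync bit is included in the full-sync mask, so file integrity requires every bit of that mask.

```c
#include <stdbool.h>

/* Hypothetical flag values; only their relationship matters here. */
#define SK_IO_DSYNC	0x01			/* data integrity */
#define SK_IO_SYNC	(0x02 | SK_IO_DSYNC)	/* file integrity, includes DSYNC */

/* Data integrity completion is requested if the DSYNC bit is set. */
static bool
wants_data_integrity(int ioflag)
{
	return (ioflag & SK_IO_DSYNC) != 0;
}

/* File integrity completion requires all bits of the SYNC mask. */
static bool
wants_file_integrity(int ioflag)
{
	return (ioflag & SK_IO_SYNC) == SK_IO_SYNC;
}
```

A plain `ioflag & SK_IO_SYNC` test would wrongly treat a data-only request as a full file integrity request, which is why the comparison against the whole mask is needed.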
 1.6 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.5 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.3 23-Oct-1997  bouyer Uses ext2fs_vinit not ufs_vinit.
In ext2fs, an inode is deleted either when mode == 0 or dtime != 0. If
dtime != 0, reset other fields before using the inode, or we could end
up with the wrong v_op in ext2fs_vinit.
While I'm there, kill an unused variable in ext2fs_readwrite
 1.2 04-Jul-1997  drochner branches: 1.2.6;
Don't cast 64bit (off_t) file sizes to vm_offset_t (32bit on many
architectures), truncate them intelligently instead.
The truncation is done centralized in vnode_pager.c.
This prevents wrap-around effects when parts of large (>2^32 byte) files
are mmapped.
Don't allow mmap above the numerical range of vm_offset_t.
This is considered a temporary solution until the vm system handles the
object sizes/offsets more cleanly.
 1.1 11-Jun-1997  bouyer The ext2fs layer, based on the ffs/ufs one. Uses a few functions from
sys/ufs/ufs/
 1.2.6.1 24-Oct-1997  thorpej Pull up from trunk: kill an unused variable.
 1.10.8.3 12-Mar-2001  bouyer Sync with HEAD.
 1.10.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.10.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.4.2 06-Aug-1999  chs UBCify.
 1.10.4.1 11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.16.8.1 01-Oct-2001  fvdl Catch up with -current.
 1.16.6.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototill work
 1.16.6.3 26-Sep-2002  jdolecek add support for kevents - sprinkle VN_KNOTE() and add genfs_kqfilter()
to vnode ops; basically same thing as in ufs_readwrite.c and ufs_vnops.c
 1.16.6.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.16.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.16.2.7 11-Nov-2002  nathanw Catch up to -current
 1.16.2.6 18-Oct-2002  nathanw Catch up to -current.
 1.16.2.5 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.16.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.16.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.16.2.2 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.16.2.1 21-Sep-2001  nathanw Catch up to -current.
 1.18.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.29.2.10 11-Dec-2005  christos Sync with head.
 1.29.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.29.2.8 15-Feb-2005  skrll Sync with HEAD.
 1.29.2.7 17-Jan-2005  skrll Sync with HEAD.
 1.29.2.6 29-Nov-2004  skrll Sync with HEAD.
 1.29.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.29.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.29.2.3 25-Aug-2004  skrll Sync with HEAD.
 1.29.2.2 03-Aug-2004  skrll Sync with HEAD
 1.29.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.35.4.1 12-Feb-2005  yamt sync with head.
 1.35.2.1 29-Apr-2005  kent sync with -current
 1.36.6.5 21-Jan-2008  yamt sync with head
 1.36.6.4 27-Oct-2007  yamt sync with head.
 1.36.6.3 03-Sep-2007  yamt sync with head.
 1.36.6.2 26-Feb-2007  yamt sync with head.
 1.36.6.1 21-Jun-2006  yamt sync with head.
 1.37.2.1 20-Oct-2005  yamt adapt ufs.
 1.38.2.6 22-Nov-2005  yamt revert a whitespace change.
 1.38.2.5 19-Nov-2005  yamt - as read-ahead context is per-vnode now,
there are fewer reasons to make VOP_READ call uvm_ra_request explicitly.
move it to pager (uvn_get) so that it can handle accesses via mmap as well.
- pass advice to pager via ubc.
- tweak DPRINTF.

XXX can be disturbed by PGO_LOCKED.

XXX it's controversial where it should be done.
(uvm_fault, uvn_get or genfs_getpages.)
 1.38.2.4 19-Nov-2005  yamt - finish reverting VOP_READ prototype changes.
- remove unused variables.
- fix typos.
some of them were pointed out by Juan RP.
 1.38.2.3 18-Nov-2005  yamt - associate read-ahead context to vnode, rather than file.
- revert VOP_READ prototype.
 1.38.2.2 15-Nov-2005  yamt add missing include.
 1.38.2.1 15-Nov-2005  yamt - adapt to the new prototype of VOP_READ.
- adapt ext2fs and union.
 1.40.2.3 18-Feb-2006  yamt fix proc/lwp mismatch.
 1.40.2.2 15-Jan-2006  yamt convert the rest of ufs.
 1.40.2.1 15-Jan-2006  yamt sync with head.
 1.41.4.2 01-Jun-2006  kardel Sync with head.
 1.41.4.1 22-Apr-2006  simonb Sync with head.
 1.41.2.1 09-Sep-2006  rpaulo sync with head
 1.42.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.42.4.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.42.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.42.2.1 24-May-2006  yamt sync with head.
 1.43.8.1 12-Jan-2007  ad Sync with head.
 1.44.2.2 07-May-2007  yamt sync with head.
 1.44.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.45.6.1 11-Jul-2007  mjf Sync with head.
 1.45.4.4 09-Jun-2007  ad Sync with head.
 1.45.4.3 08-Jun-2007  ad Sync with head.
 1.45.4.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.45.4.1 05-Apr-2007  ad Compile fixes.
 1.47.10.1 14-Oct-2007  yamt sync with head.
 1.47.8.2 09-Jan-2008  matt sync with HEAD
 1.47.8.1 06-Nov-2007  matt sync with HEAD
 1.47.6.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.47.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.48.6.2 26-Dec-2007  ad Sync with head.
 1.48.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.48.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.49.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.50.8.1 18-May-2008  yamt sync with head.
 1.50.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.50.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.51.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.51.2.4 11-Aug-2010  yamt sync with head.
 1.51.2.3 11-Mar-2010  yamt sync with head
 1.51.2.2 16-Sep-2009  yamt sync with head
 1.51.2.1 04-May-2009  yamt sync with head.
 1.52.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.52.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.55.4.2 30-May-2010  rmind sync with head
 1.55.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.55.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.56.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.57.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.57.2.3 16-Jan-2013  yamt sync with (a bit old) head
 1.57.2.2 23-May-2012  yamt sync with head.
 1.57.2.1 17-Apr-2012  yamt sync with head
 1.58.6.1 07-May-2012  riz Pull up following revision(s) (requested by chs in ticket #204):
sys/fs/sysvbfs/sysvbfs_vnops.c: revision 1.44
sys/ufs/ffs/ffs_vfsops.c: revision 1.277
sys/fs/v7fs/v7fs_vnops.c: revision 1.11
sys/ufs/chfs/chfs_vnops.c: revision 1.7
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.61
sys/miscfs/genfs/genfs_io.c: revision 1.54
sys/kern/vfs_wapbl.c: revision 1.52
sys/uvm/uvm_pager.h: revision 1.43
sys/ufs/ffs/ffs_vnops.c: revision 1.121
sys/kern/vfs_subr.c: revision 1.434
sys/fs/msdosfs/msdosfs_vnops.c: revision 1.83
sys/fs/ntfs/ntfs_vnops.c: revision 1.51
sys/fs/udf/udf_subr.c: revision 1.119
sys/miscfs/specfs/spec_vnops.c: revision 1.135
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.103
sys/fs/udf/udf_vnops.c: revision 1.71
sys/ufs/ufs/ufs_readwrite.c: revision 1.104
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
mark all wapbl I/O as BPRIO_TIMECRITICAL.
this is the second part of addressing PR 46325.
 1.58.4.3 02-Jun-2012  mrg sync to latest -current.
 1.58.4.2 29-Apr-2012  mrg sync to latest -current.
 1.58.4.1 05-Apr-2012  mrg sync to latest -current.
 1.61.2.4 03-Dec-2017  jdolecek update from HEAD
 1.61.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.61.2.2 23-Jun-2013  tls resync from head
 1.61.2.1 25-Feb-2013  tls resync with head
 1.64.8.2 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.64.8.1 20-Oct-2014  martin Pullup the following revisions, requested by maxv in ticket #148:

sys/compat/svr4/svr4_stat.c 1.70
sys/dev/dm/dm_target_snapshot.c 1.17
sys/dev/if_ndis/if_ndis_pci.c 1.20
sys/fs/smbfs/smbfs_smb.c 1.45
sys/ufs/ext2fs/ext2fs_readwrite.c 1.65

Various fixes: two memory leaks, a typo, a dead compiler condition and
unused macros, respectively in if_ndis and dm, smbfs, svr4 and ext2fs.
 1.66.2.2 05-Oct-2016  skrll Sync with HEAD
 1.66.2.1 06-Apr-2015  skrll Sync with HEAD
 1.75.22.1 29-Feb-2020  ad Sync with head.
 1.75.16.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.76.4.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.78.10.1 02-Aug-2025  perseant Sync with HEAD
 1.13 26-Aug-2023  riastradh ext2fs: Nix trailing whitespace.
 1.12 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), which uses a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.11 15-Aug-2016  jdolecek bump link limit to 65000 for files, and add support for EXT2F_ROCOMPAT_DIR_NLINK to make link count unlimited for directories
 1.10 13-Aug-2016  christos KNF, no functional changes...
 1.9 06-Aug-2016  jdolecek actually pass the d_type from the on-disk directory entry to the lookup results
 1.8 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.7 25-May-2014  hannken branches: 1.7.4;
ext2fs_gro_genealogy: use vcache_get() to lookup DOTDOT.
 1.6 28-Jan-2014  martin branches: 1.6.2;
Quell a (bogus) "may be used uninitialized" warning from gcc 4.8
 1.5 22-Jan-2013  dholland branches: 1.5.2;
Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.4 04-Jun-2012  riastradh branches: 1.4.2;
Kill the IN_RENAME in-core inode flag in ufs and ext2fs.

Now that rename works, we need not wave this sort of voodoo at it.

ok dholland
 1.3 04-Jun-2012  riastradh Fix ext2fs's scary cross-block directory message too.

(See rev. 1.3 of sys/ufs/ufs/ufs_rename.c for the analysis.)
 1.2 10-May-2012  riastradh branches: 1.2.2; 1.2.4;
Swap byte order of ext2fs_direct fields in ext2fs_rename_recalculate_fulr.

Symptom found and fix tested by martin.

ok martin
 1.1 09-May-2012  riastradh Adapt ffs, lfs, and ext2fs to use genfs_rename.

ok dholland, rmind
 1.2.4.2 02-Jun-2012  mrg sync to latest -current.
 1.2.4.1 10-May-2012  mrg file ext2fs_rename.c was added on branch jmcneill-usbmp on 2012-06-02 11:09:40 +0000
 1.2.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.2.4 23-Jan-2013  yamt sync with head
 1.2.2.3 30-Oct-2012  yamt sync with head
 1.2.2.2 23-May-2012  yamt sync with head.
 1.2.2.1 10-May-2012  yamt file ext2fs_rename.c was added on branch yamt-pagecache on 2012-05-23 10:08:18 +0000
 1.4.2.3 03-Dec-2017  jdolecek update from HEAD
 1.4.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.2.1 25-Feb-2013  tls resync with head
 1.5.2.1 18-May-2014  rmind sync with head
 1.6.2.1 10-Aug-2014  tls Rebase.
 1.7.4.2 05-Oct-2016  skrll Sync with HEAD
 1.7.4.1 06-Apr-2015  skrll Sync with HEAD
 1.33 13-Aug-2016  christos KNF, no functional changes...
 1.32 03-Aug-2016  jdolecek get and set expanded timestamp if the inode contains the extra information, add support for create time
 1.31 28-Mar-2015  maxv branches: 1.31.2;
Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.30 23-Jun-2013  dholland branches: 1.30.10;
Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.29 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_sblksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.28 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.27 19-Oct-2009  bouyer branches: 1.27.12; 1.27.22;
Remove clauses 3 & 4 from my licence. Many thanks to Soren Jacobsen
for the boring work!
 1.26 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.25 08-Oct-2007  ad branches: 1.25.18; 1.25.20; 1.25.22; 1.25.24;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.24 23-Jun-2006  yamt branches: 1.24.14; 1.24.28; 1.24.30; 1.24.32;
fix a simonb-timecounters regression.
the precision of getnanotime() is not suitable for file timestamps.
esp. when it's nfs-exported.

- introduce vfs_timestamp().
(the name is from freebsd. currently merely a wrapper of nanotime())
- for ufs-like filesystems, use it rather than getnanotime().

XXX check other filesystems.
 1.23 07-Jun-2006  kardel branches: 1.23.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.22 14-May-2006  elad branches: 1.22.2;
integrate kauth.
 1.21 18-Mar-2006  bouyer Fix a dead error condition, coverity ID 603.
 1.20 27-Dec-2005  chs branches: 1.20.4; 1.20.6; 1.20.8; 1.20.10; 1.20.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.
 1.19 11-Dec-2005  christos merge ktrace-lwp.
 1.18 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.17 27-Sep-2005  yamt branches: 1.17.2;
introduce "ufs_ops" and use it for ITIMES.
 1.16 12-Sep-2005  christos Add a KASSERT like the one ffs has.
 1.15 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.14 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.13 22-Mar-2004  bouyer branches: 1.13.16;
Fix disclaimer in my copyright. Pointed out by Thomas Klausner.
 1.12 30-Dec-2003  pk Replace the traditional buffer memory management -- based on fixed per buffer
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.

This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.

On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
 1.11 05-Oct-2003  bouyer Remove references to University of California from my copyright notices.
 1.10 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.9 25-Jan-2003  tron branches: 1.9.2;
Use PRId64 instead of hard coding "%lld" to fix build problems under
LP64 ports.
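A minimal sketch of the idea behind this fix, assuming a userland build with <inttypes.h>. The helper name sketch_format_blkno is hypothetical and does not come from the tree. On LP64 ports int64_t is typically long rather than long long, so a hard-coded "%lld" draws a format warning, while PRId64 expands to whatever conversion matches int64_t.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not taken from the tree: format a 64-bit block
 * number without hard-coding "%lld".
 */
static int
sketch_format_blkno(char *buf, size_t len, int64_t blkno)
{
	assert(buf != NULL && len > 0);
	/* PRId64 expands to the correct conversion on both ILP32 and LP64. */
	return snprintf(buf, len, "blkno %" PRId64, blkno);
}
```

The same pattern applies to any printf-style call whose argument has a fixed-width integer type.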
 1.8 25-Jan-2003  tron Fix printf() format strings problems caused by "daddr_t" change.
 1.7 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
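The split between in-core and on-disk types described above might look roughly like this. Every name here (sketch_daddr_t, sketch_dinode, SKETCH_NDADDR) is hypothetical and chosen only for illustration; the real structures have many more fields.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t sketch_daddr_t;	/* in-core block address, 64 bits wide */

#define SKETCH_NDADDR 12	/* hypothetical number of direct blocks */

/* The on-disk layout keeps fixed 32-bit fields, so the format is unchanged. */
struct sketch_dinode {
	int32_t di_db[SKETCH_NDADDR];
};

/* Widen an on-disk 32-bit block pointer to the in-core 64-bit type. */
static sketch_daddr_t
sketch_direct_block(const struct sketch_dinode *dp, int i)
{
	assert(i >= 0 && i < SKETCH_NDADDR);
	return (sketch_daddr_t)dp->di_db[i];
}
```

Keeping the widening in one accessor means the on-disk structure never depends on the width of the in-core type.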
 1.6 08-Nov-2001  lukem add RCSID
 1.5 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.4 30-Mar-2000  augustss branches: 1.4.6; 1.4.10; 1.4.14;
Remove register declarations.
 1.3 04-Mar-1998  cgd branches: 1.3.14; 1.3.20;
ext2fs_checkoverlap is (or at least seems) unused, and its prototype is
#ifdef DIAGNOSTIC. Make the function #ifdef DIAGNOSTIC, as well, so we
don't get a warning about the function declaration not being a prototype.
 1.2 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.1 11-Jun-1997  bouyer The ext2fs layer, based on the ffs/ufs one. Uses a few functions from
sys/ufs/ufs/
 1.3.20.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.3.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.4.14.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.4.10.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.4.6.1 14-Nov-2001  nathanw Catch up to -current.
 1.9.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.9.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.9.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.9.2.1 03-Aug-2004  skrll Sync with HEAD
 1.13.16.3 27-Oct-2007  yamt sync with head.
 1.13.16.2 30-Dec-2006  yamt sync with head.
 1.13.16.1 21-Jun-2006  yamt sync with head.
 1.17.2.1 20-Oct-2005  yamt adapt ufs.
 1.20.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.20.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.20.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.20.10.1 19-Apr-2006  elad sync with head.
 1.20.8.3 26-Jun-2006  yamt sync with head.
 1.20.8.2 24-May-2006  yamt sync with head.
 1.20.8.1 01-Apr-2006  yamt sync with head.
 1.20.6.4 01-Jun-2006  kardel Sync with head.
 1.20.6.3 22-Apr-2006  simonb Sync with head.
 1.20.6.2 05-Feb-2006  simonb In the *itimes functions, just call getnanotime() at the start of
the function and use the result if needed, rather than the previous
conditional calls/assignments method. The code is clearer this way,
and benchmarks at about the same speed.
 1.20.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.20.4.1 09-Sep-2006  rpaulo sync with head
 1.22.2.1 19-Jun-2006  chap Sync with head.
 1.23.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.24.32.1 14-Oct-2007  yamt sync with head.
 1.24.30.1 06-Nov-2007  matt sync with HEAD
 1.24.28.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.24.14.1 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.25.24.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.25.22.2 11-Mar-2010  yamt sync with head
 1.25.22.1 04-May-2009  yamt sync with head.
 1.25.20.1 18-May-2008  yamt sync with head.
 1.25.18.1 02-Jun-2008  mjf Sync with HEAD.
 1.27.22.4 03-Dec-2017  jdolecek update from HEAD
 1.27.22.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.27.22.2 23-Jun-2013  tls resync from head
 1.27.22.1 25-Feb-2013  tls resync with head
 1.27.12.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.27.12.1 23-Jan-2013  yamt sync with head
 1.30.10.2 05-Oct-2016  skrll Sync with HEAD
 1.30.10.1 06-Apr-2015  skrll Sync with HEAD
 1.31.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.229 16-Feb-2025  joe remove unnecessary branching

break is executed regardless of which path the if branch takes
 1.228 30-Dec-2024  hannken Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.227 02-Jul-2024  rin ext2fs: Fix copy-paste for PR kern/58388
 1.226 01-Jul-2024  riastradh ext2fs: Fix indexing of group descriptors on disk.

XXX Evidently we need some more automatic tests for this!

PR kern/58388
 1.225 27-Aug-2023  christos branches: 1.225.6;
- fix cgload/cgsave inconsistencies
- add a constant for the rev 0 group descriptor size
 1.224 26-Aug-2023  christos fix kmem_free size for e2fs_gd
 1.223 26-Aug-2023  riastradh ext2fs: Nix trailing whitespace.
 1.222 25-Aug-2023  christos Support INCOMPAT_64BIT on ext4 (Vladimir 'phcoder' Serbinenko)
 1.221 22-May-2022  andvar branches: 1.221.4;
fix various small typos, mainly in comments.
 1.220 19-Mar-2022  hannken Remove now unused VV_LOCKSWORK, all file systems support locking.

Remove unused predicates vn_locked() and vn_anylocked().

Welcome to 9.99.95
 1.219 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.218 04-Apr-2020  ad Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.217 16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.216 27-Feb-2020  ad Tighten up the locking around vp->v_iflag a little more after the recent
split of vmobjlock & v_interlock.
 1.215 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.214 20-Jun-2019  pgoyette branches: 1.214.2; 1.214.4;
Split the ufs code out of the ffs module and into its own module.

Adapt chfs and ext2fs modules accordingly.
 1.213 01-Jan-2019  hannken Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
 1.212 10-Dec-2018  maxv Remove unused mbuf.h includes.
 1.211 28-May-2018  chs branches: 1.211.2;
add a genfs method to allow a file system to limit the range of pages
that are given to a single GOP_WRITE() call. needed by ZFS.
 1.210 30-Jul-2017  riastradh branches: 1.210.2;
kmem_xyz(sizeof(struct foo)) --> kmem_xyz(sizeof(*foo))

No change to amd64 binary.
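A userland sketch of the same idiom, using calloc() as a stand-in for the kernel allocator. The struct and function names are hypothetical. Sizing the allocation from the pointer being assigned keeps it correct if the pointer's type later changes, and yields the same value as naming the struct explicitly, which is consistent with the binary being unchanged.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_foo {	/* hypothetical structure */
	int a;
	long b;
};

static struct sketch_foo *
sketch_foo_alloc(void)
{
	struct sketch_foo *fp;

	/* Same size either way; sizeof(*fp) simply follows fp's type. */
	assert(sizeof(*fp) == sizeof(struct sketch_foo));
	fp = calloc(1, sizeof(*fp));
	return fp;	/* NULL on allocation failure; caller frees */
}
```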
 1.209 28-May-2017  hannken Change ext2fs to use vcache_new like we did for ffs:
- Change ext2fs_valloc to return an inode number.
- Make ext2fs_makeinode private to ext2fs_vnops.c and
pass vattr instead of mode.
 1.208 17-Apr-2017  hannken branches: 1.208.2;
Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.207 17-Apr-2017  hannken Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
 1.206 01-Apr-2017  riastradh KASSERT(mutex_owned(vp->v_interlock)) in vnode iterator selector.
 1.205 17-Feb-2017  hannken Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.204 25-Aug-2016  christos branches: 1.204.2;
put back second strlcpy; pointed out by dholland.
 1.203 23-Aug-2016  christos CID 1371644: use strlcpy, remove dup copy.
 1.202 20-Aug-2016  jdolecek fix code which sets REV1 e2fs_fsmnt, set also mount time and mount count
 1.201 20-Aug-2016  jdolecek adjust ext2fs_loadvnode_content() to do the sanity checking before allocating
memory, and avoid reallocating memory on vnode reload
 1.200 20-Aug-2016  jdolecek add support for GDT_CSUM AKA uninit_bg feature
 1.199 14-Aug-2016  jdolecek switch code to use the EXT2_HAS_{COMPAT|ROCOMPAT|INCOMPAT}_FEATURE() macros instead of open coding the checks
 1.198 13-Aug-2016  christos KNF, no functional changes...
 1.197 05-Aug-2016  jdolecek add devel ifndefs for incompat/rocompat features so that it's possible
to ignore them and mount the filesystem; default is for the mount to fail
 1.196 03-Aug-2016  pgoyette Update previous. Since original format was %llu, replace it with
% PRIu64 (unsigned).
 1.195 03-Aug-2016  pgoyette Use correct printf() format for inode (fixes build for me)
 1.194 03-Aug-2016  jdolecek support arbitrary ext3/ext4 inode size, add all the new ext4 fields ext2fs_dinode, and add support for loading the extra inode data
 1.193 28-Mar-2015  maxv branches: 1.193.2;
Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.192 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.191 17-Mar-2015  hannken Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.190 23-Feb-2015  maxv Hum. Perhaps I missed a bit of the specification. Let's not be that
severe when checking the superblock.

Should fix ATF.
 1.189 22-Feb-2015  maxv Merge _sbcompute() and _sbcheck() into _sbfill().

In ext2fs_sbfill(), check more fields of the superblock, to prevent
several kernel panics when mounting/unmounting a disk.
 1.188 20-Feb-2015  maxv Several fixes:
- rename ext2fs_checksb() -> ext2fs_sbcheck(): more consistent
- in ext2fs_sbcheck(), add a check to ensure e2fs_inode_size!=0,
otherwise division by zero
- add ext2fs_sbcompute(), to compute dynamic values of the superblock.
It is done twice in _reload() and _mountfs(), so put it in a function.
- reorder the code in charge of loading the superblock: now, read the
superblock, swap it directly, and *then* pass it to ext2fs_sbcheck().
It is similar to what ffs now does. It is better since the fields don't
need to be swapped on the fly in ext2fs_sbcheck().
Tested on amd64.
 1.187 19-Feb-2015  maxv e2fs_sbcheck(): add a check to ensure e2fs_bpg!=0. Otherwise the kernel
panics with a division by zero.

While here, remove the #ifdef's.
 1.186 09-Nov-2014  maxv branches: 1.186.2;
Do not uselessly include <sys/malloc.h>.
 1.185 19-Sep-2014  matt curlwp can never be NULL now.
 1.184 22-Aug-2014  hannken Use mount from argument "mp", "vp->v_mount" is not valid here.

PR kern/49142 (panic in ext2fs_loadvnode mounting an ext2fs filesystem)

Needs pullup to -7
 1.183 09-Jul-2014  maxv branches: 1.183.2;
Remove ROOTNAME (unused).
 1.182 24-May-2014  christos Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.
 1.181 08-May-2014  hannken Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41
 1.180 16-Apr-2014  maxv An (un)privileged user can easily make the kernel dereference a NULL
pointer.

The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).

ok christos@
 1.179 23-Mar-2014  hannken branches: 1.179.2;
Change all vfsops to use C99 designated initializers.

No functional changes intended.
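For readers unfamiliar with the style, a small illustration of C99 designated initializers on an operations table. The structure below is a hypothetical stand-in with a handful of members, not the real struct vfsops.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_vfsops {		/* hypothetical operations table */
	const char *vfs_name;
	int (*vfs_mount)(void);
	int (*vfs_unmount)(void);
	int (*vfs_sync)(void);
};

static int sketch_mount(void) { return 0; }
static int sketch_sync(void) { return 0; }

static const struct sketch_vfsops sketch_ops = {
	.vfs_name = "sketchfs",
	.vfs_sync = sketch_sync,	/* order need not match the struct */
	.vfs_mount = sketch_mount,
	/* .vfs_unmount is omitted and is therefore implicitly NULL */
};

static int
sketch_call_sync(const struct sketch_vfsops *ops)
{
	assert(ops->vfs_sync != NULL);
	return ops->vfs_sync();
}
```

Naming each member makes the table robust against members being added or reordered, which a positional initializer is not.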
 1.178 17-Mar-2014  hannken Change ext2fs_sync() to use vfs_vnode_iterator.
 1.177 05-Mar-2014  hannken Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
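The calling pattern implied by the three functions above can be modelled in userland. This is a toy model only: the toy_* names are hypothetical, the "mount" is a plain array, and none of the locking, reference counting or marker handling that the real interface hides is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_vnode { int id; };
struct toy_mount { struct toy_vnode *vnodes; size_t nvnodes; };
struct toy_iterator { struct toy_mount *mp; size_t next; };

static void
toy_iterator_init(struct toy_mount *mp, struct toy_iterator **marker)
{
	struct toy_iterator *it = malloc(sizeof(*it));

	assert(it != NULL);
	it->mp = mp;
	it->next = 0;
	*marker = it;
}

static void
toy_iterator_destroy(struct toy_iterator *marker)
{
	free(marker);
}

/* Returns false with *vpp == NULL when done, true with *vpp set otherwise. */
static bool
toy_iterator_next(struct toy_iterator *marker, struct toy_vnode **vpp)
{
	if (marker->next >= marker->mp->nvnodes) {
		*vpp = NULL;
		return false;
	}
	*vpp = &marker->mp->vnodes[marker->next++];
	return true;
}

/* A caller: init, loop on next, destroy. */
static int
toy_sum_ids(struct toy_mount *mp)
{
	struct toy_iterator *marker;
	struct toy_vnode *vp;
	int sum = 0;

	toy_iterator_init(mp, &marker);
	while (toy_iterator_next(marker, &vp))
		sum += vp->id;
	toy_iterator_destroy(marker);
	return sum;
}
```

The caller never touches the list itself, which is the point of the interface: the list and vnode state details stay behind the iterator.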
 1.176 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.175 23-Nov-2013  christos change the mountlist CIRCLEQ into a TAILQ
 1.174 29-Oct-2013  hannken Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25
 1.173 30-Sep-2013  hannken Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>
 1.172 11-Aug-2013  dholland Kill off uo_unmark_vnode/UFS_UNMARK_VNODE as it's now a leftover.
 1.171 23-Jun-2013  dholland branches: 1.171.2;
fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.170 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.169 08-Apr-2013  skrll Remove some set but unused variables
 1.168 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
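
The caller-side consequence of this contract can be sketched in userland as follows. demo_bread() and demo_brelse() are hypothetical stand-ins, not the kernel functions; the point is only that a failed read hands back no buffer, so the error path must not release one.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_buf { int blkno; };

static int demo_outstanding;	/* buffers handed out and not yet released */

/* On error no buffer is returned: *bpp is NULL and nothing is held. */
int
demo_bread(int blkno, struct demo_buf **bpp)
{
	struct demo_buf *bp;

	*bpp = NULL;
	if (blkno < 0)
		return EIO;	/* simulated I/O error */
	bp = malloc(sizeof(*bp));
	if (bp == NULL)
		return ENOMEM;
	bp->blkno = blkno;
	demo_outstanding++;
	*bpp = bp;
	return 0;
}

void
demo_brelse(struct demo_buf *bp)
{
	demo_outstanding--;
	free(bp);
}

int
demo_read_block(int blkno)
{
	struct demo_buf *bp;
	int error;

	error = demo_bread(blkno, &bp);
	if (error)
		return error;	/* no brelse here: there is no buffer */
	/* ... use bp ... */
	demo_brelse(bp);
	return 0;
}

int
demo_outstanding_buffers(void)
{
	return demo_outstanding;
}
```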
 1.167 21-Nov-2012  jakllsch Write support for the Ext4 Read-only Compatible Feature "huge_file".

Primarily, this feature extends the inode block count field to 48 bits.
Additionally, this feature allows this field to be represented in file
system block size units rather than DEV_BSIZE units.
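
The arithmetic behind the two points above can be sketched as follows. The split into a 32-bit low half and a 16-bit high half, the per-inode flag DEMO_HUGE_FILE_FL and its value, and the function name are assumptions for illustration; the commit message itself only states the 48-bit width and the choice of units.

```c
#include <stdint.h>

#define DEMO_DEV_BSIZE		512u
/* Hypothetical per-inode flag: the count is in file system blocks. */
#define DEMO_HUGE_FILE_FL	0x00040000u

/*
 * Combine the two halves into a 48-bit block count, then scale it by
 * the unit the count is expressed in to get the bytes allocated.
 */
uint64_t
demo_inode_nbytes(uint32_t blocks_lo, uint16_t blocks_hi, uint32_t flags,
    uint32_t fs_bsize)
{
	uint64_t count = ((uint64_t)blocks_hi << 32) | blocks_lo;
	uint64_t unit = (flags & DEMO_HUGE_FILE_FL) ? fs_bsize : DEMO_DEV_BSIZE;

	return count * unit;
}
```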
 1.166 01-Sep-2012  christos branches: 1.166.2;
really print the incompatible bits.
 1.165 01-Sep-2012  chs when failing a mount due to unsupported features,
print which features are involved.
 1.164 30-Apr-2012  rmind - Replace some malloc(9) uses with kmem(9).
- G/C M_IPMOPTS, M_IPMADDR and M_BWMETER.
 1.163 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.162 14-Nov-2011  hannken branches: 1.162.4; 1.162.6; 1.162.10; 1.162.12;
VOP_OPEN() needs a locked vnode. All these copy-and-pasted xxxfs_mount()
implementations need more review.
 1.161 07-Oct-2011  hannken branches: 1.161.2;
As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.
 1.160 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.159 27-Jul-2010  jakllsch branches: 1.159.6;
Make DEBUG_EXT2 work with 64-bit size_t.
 1.158 21-Jul-2010  hannken Make holding v_interlock mandatory for callers of vget().

Announced some time ago on tech-kern.
 1.157 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.156 11-Feb-2010  mlelstv branches: 1.156.2;
There is no code left that uses disk size data, so don't query it.
 1.155 31-Jan-2010  mlelstv branches: 1.155.2;
Fix block shift to work with different device block sizes.
 1.154 31-Jan-2010  mlelstv Replace individual queries for partition information with
new helper function.
 1.153 08-Jan-2010  pooka The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.152 21-Oct-2009  pooka update i_uid and i_gid after chown
 1.151 19-Oct-2009  bouyer Remove clauses 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the boring work!
 1.150 13-Sep-2009  tsutsui Move declaration of ufs_hashlock into <ufs/ufs_extern.h> from each c source.
 1.149 12-Sep-2009  tsutsui Reduce diffs a bit between ext2fs_reload() and ffs_reload().
 1.148 12-Sep-2009  tsutsui Add a missed brelse(9) call after bread(9) in ext2fs_reload().

This may close PR kern/28712 (ext2fs hang on mount after fsck).
 1.147 12-Sep-2009  tsutsui Pull a fix from ffs_vfsops.c rev 1.248:
> Fix bug introduced in revision 1.174(*) where a NULL fspec with an MNT_UPDATE
> command would always return EINVAL. This broke fsck on root, where fsck'ing
> a dirty root would always return an error causing rc to resort in a reboot.
(*) This is "Apply the NFS exports list rototill patch" change
in ext2fs_vfsops.c rev 1.91.
 1.146 12-Sep-2009  tsutsui Pull a fix for mount function from ffs_vfsops.c rev1.186:
> Change ffs_mount, in MNT_UPDATE case, to check dev_t's for equality
> instead of just vnode pointers. Fixes erroneous "does not match mounted
> device" errors from mount(8) in the presence of MFS /dev, init.root, &c.
 1.145 11-Sep-2009  tsutsui Fix botch around argument check in ext2fs_mount(). Taken from ffs_vfsops.c.

Fixes LOCKDEBUG panic which is the same one mentioned in PR kern/41078
on trying to mount_ext2fs against a raw device, while that panic
seems to have another root cause around module_autoload() in
sys/miscfs/specfs/spec_vnops.c:spec_open().
 1.144 29-Jun-2009  dholland Convert 67 namei call sites to use namei_simple, in these functions:

check_console, veriexecclose, veriexec_delete, veriexec_file_add,
emul_find_root, coff_load_shlib (sh3 version), coff_load_shlib,
compat_20_sys_statfs, compat_20_netbsd32_statfs,
ELFNAME2(netbsd32,probe_noteless), darwin_sys_statfs,
ibcs2_sys_statfs, ibcs2_sys_statvfs, linux_sys_uselib,
osf1_sys_statfs, sunos_sys_statfs, sunos32_sys_statfs,
ultrix_sys_statfs, do_sys_mount, fss_create_files (3 of 4),
adosfs_mount, cd9660_mount, coda_ioctl, coda_mount, ext2fs_mount,
ffs_mount, filecore_mount, hfs_mount, lfs_mount, msdosfs_mount,
ntfs_mount, sysvbfs_mount, udf_mount, union_mount, sys_chflags,
sys_lchflags, sys_chmod, sys_lchmod, sys_chown, sys_lchown,
sys___posix_chown, sys___posix_lchown, sys_link, do_sys_pstatvfs,
sys_quotactl, sys_revoke, sys_truncate, do_sys_utimes, sys_extattrctl,
sys_extattr_set_file, sys_extattr_set_link, sys_extattr_get_file,
sys_extattr_get_link, sys_extattr_delete_file,
sys_extattr_delete_link, sys_extattr_list_file, sys_extattr_list_link,
sys_setxattr, sys_lsetxattr, sys_getxattr, sys_lgetxattr,
sys_listxattr, sys_llistxattr, sys_removexattr, sys_lremovexattr

All have been scrutinized (several times, in fact) and compile-tested,
but not all have been explicitly tested in action.

XXX: While I haven't (intentionally) changed the use or nonuse of
XXX: TRYEMULROOT in any of these places, I'm not convinced all the
XXX: uses are correct; an audit might be desirable.
 1.143 25-Apr-2009  elad Add genfs_can_mount() and use it to prevent some more code duplication of
the security checks when mounting a device (VOP_ACCESS() + kauth(9) call).

Proposed with no objections on tech-kern@:

http://mail-index.netbsd.org/tech-kern/2009/04/20/msg004859.html

The vnode is always expected to be locked, so no locking is done outside
the file-system code.
 1.142 01-Mar-2009  christos PR/40936: Frederik Sausmikat: ext2fs: add support for inodes > 128 bytes
 1.141 08-Dec-2008  pooka branches: 1.141.2;
Remove no longer valid comment (which probably didn't even say what
it wanted to say in the first place).
 1.140 23-Nov-2008  mrg add support for 32 bit uid/gid fields in ext2, but only do so for
when the revision is > REV0.
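
A minimal sketch of the rule in this entry: the 32-bit ID is assembled from a low and a high 16-bit half, and the high half is only honoured when the file system revision is above 0. The function and parameter names are hypothetical.

```c
#include <stdint.h>

/*
 * Revision 0 file systems only define the low 16 bits, so the high
 * half is ignored there even if it happens to be non-zero on disk.
 */
uint32_t
demo_ext2_id(uint16_t id_low, uint16_t id_high, uint32_t fs_rev)
{
	if (fs_rev > 0)
		return ((uint32_t)id_high << 16) | id_low;
	return id_low;
}
```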
 1.139 13-Nov-2008  ad These depend on ffs.
 1.138 13-Nov-2008  ad Remove #ifdef LFS from the ufs code.
 1.137 28-Jun-2008  rumble branches: 1.137.2; 1.137.4; 1.137.6;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.136 16-May-2008  hannken branches: 1.136.2;
Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.135 10-May-2008  rumble Convert file systems to dynamically attach with the new module interface.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.

As a consequence, most of the file systems can now be loaded as new style
modules.

Quick sanity check by ad@.
 1.134 06-May-2008  ad branches: 1.134.2;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.133 30-Apr-2008  ad PR kern/38135 vfs_busy/vfs_trybusy confusion

The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
 1.132 29-Apr-2008  ad PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.131 05-Feb-2008  ad branches: 1.131.6; 1.131.8; 1.131.10;
Do genfs_node_init() earlier. PR kern/36162.
 1.130 30-Jan-2008  ad PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
 1.129 28-Jan-2008  dholland Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.128 24-Jan-2008  ad specfs changes for PR kern/37717 (raidclose() is no longer called on
shutdown). There are still problems with device access and a PR will be
filed.

- Kill checkalias(). Allow multiple vnodes to reference a single device.

- Don't play dangerous tricks with block vnodes to ensure that only one
vnode can describe a block device. Instead, prohibit concurrent opens of
block devices. As a bonus remove the unreliable code that prevents
multiple file system mounts on the same device. It's no longer needed.

- Track opens by vnode and by device. Issue cdev_close() when the last open
goes away, instead of abusing vnode::v_usecount to tell if the device is
open.
 1.127 03-Jan-2008  pooka valloc -> vnalloc, vfree -> vnfree
Avoids collision with userland valloc(3).

no functional change
ad ok
 1.126 02-Jan-2008  ad Merge vmlocking2 to head.
 1.125 08-Dec-2007  pooka branches: 1.125.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.124 01-Dec-2007  tsutsui branches: 1.124.2;
Use e2fs_first_dblock in superblock to read/write group descriptor blocks.
 1.123 26-Nov-2007  pooka Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.122 26-Nov-2007  tsutsui Misc cosmetics.
 1.121 26-Nov-2007  tsutsui Account e2fs_reserved_ngdb blocks accordingly in ext2fs_statvfs().
 1.120 17-Oct-2007  ad branches: 1.120.4;
Sync with ffs: fix ufs_ihashlock / ufs_hash_lock deadlock.
From Sverre Froyen.
 1.119 10-Oct-2007  ad Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.118 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.117 31-Jul-2007  pooka branches: 1.117.2; 1.117.4; 1.117.6; 1.117.8;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.116 26-Jul-2007  pooka Use eopnotsupp() instead of vfs_stdsuspendctl() and retire the latter.
 1.115 20-Jul-2007  pooka In sync, skip over vnodes based on if they are clean rather than
if they have pages.
 1.114 17-Jul-2007  pooka branches: 1.114.2;
Make set_statvfs_info() take a parameter for the vfs name instead
of always retrieving it from mp->mnt_op->vfs_name

christos ok
 1.113 12-Jul-2007  dsl Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.112 30-Jun-2007  pooka Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.111 05-Jun-2007  yamt improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.110 12-Mar-2007  ad branches: 1.110.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.109 04-Mar-2007  christos branches: 1.109.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.108 15-Feb-2007  ad branches: 1.108.2;
Replace some uses of lockmgr() / simplelocks.
 1.107 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.106 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.105 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.104 25-Oct-2006  reinoud Revisit mnt_vnodelist TAILQ patch. Remove all suspicious TAILQ_FOREACH()
loops where vnodes can get removed or added during the loops. This could
lead to panics on unmount since nodes are skipped or otherwise
TAILQ_NEXT(0xdeadbeef, ...) was dereferenced.
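
The hazard described here can be shown with a small single-threaded userland sketch using <sys/queue.h>. A plain TAILQ_FOREACH() reads the next pointer from the current element after the loop body has run, which is invalid once that element has been removed and freed; fetching the successor first avoids that. The demo_* names are hypothetical, and the kernel's additional problem (other threads changing the list) is not modelled.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/queue.h>

struct demo_vnode {
	int id;
	TAILQ_ENTRY(demo_vnode) entries;
};
TAILQ_HEAD(demo_vnodelist, demo_vnode);

int
demo_list_add(struct demo_vnodelist *head, int id)
{
	struct demo_vnode *vp = malloc(sizeof(*vp));

	if (vp == NULL)
		return -1;
	vp->id = id;
	TAILQ_INSERT_TAIL(head, vp, entries);
	return 0;
}

/* Remove every node with an even id; safe because next is saved first. */
size_t
demo_remove_even(struct demo_vnodelist *head)
{
	struct demo_vnode *vp, *next;
	size_t removed = 0;

	for (vp = TAILQ_FIRST(head); vp != NULL; vp = next) {
		next = TAILQ_NEXT(vp, entries);	/* read before vp may be freed */
		if (vp->id % 2 == 0) {
			TAILQ_REMOVE(head, vp, entries);
			free(vp);
			removed++;
		}
	}
	return removed;
}

/* Read-only traversal: TAILQ_FOREACH() is fine when nothing is removed. */
size_t
demo_list_len(struct demo_vnodelist *head)
{
	struct demo_vnode *vp;
	size_t n = 0;

	TAILQ_FOREACH(vp, head, entries)
		n++;
	return n;
}
```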
 1.103 20-Oct-2006  reinoud Replace the LIST structure mp->mnt_vnodelist with a TAILQ structure since all
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earliest last.

An extra benefit is the removal of the ugly hack from the Berkeley days on
LFS.

In the process, I've also replaced the various hand-written loops
with the TAILQ_FOREACH() macro.
 1.102 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.101 30-Aug-2006  christos branches: 1.101.2; 1.101.4;
fix incomplete initializer.
 1.100 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.99 13-Jul-2006  martin Fix alignment problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.98 07-Jun-2006  kardel branches: 1.98.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.97 14-May-2006  elad branches: 1.97.2;
integrate kauth.
 1.96 18-Mar-2006  bouyer bread() will always return a valid bp. So replace the (always true) if (bp)
with a KASSERT.
Should fix Coverity ID 2444.
 1.95 21-Feb-2006  thorpej branches: 1.95.2; 1.95.4; 1.95.6;
Use device_class() instead of accessing dv_class directly.
 1.94 11-Dec-2005  christos branches: 1.94.2; 1.94.4; 1.94.6;
merge ktrace-lwp.
 1.93 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.92 27-Sep-2005  yamt branches: 1.92.2;
introduce "ufs_ops" and use it for ITIMES.
 1.91 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.90 12-Sep-2005  christos - access the ffs and ext2fs itimes functions through a pointer, so that
if the filesystem is not compiled in, the kernel still links. Probably
a better solution is to use weak symbols.
- move the filesystem-specific itime macros to the filesystem header files.
 1.89 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.88 23-Aug-2005  christos Don't overload MAXNAMLEN, use a separate constant for each filesystem type.
 1.87 23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.86 28-Jun-2005  yamt branches: 1.86.2;
- constify genfs_ops.
- use member designators.
 1.85 29-May-2005  christos - sprinkle const
- avoid shadow variables.
 1.84 29-Mar-2005  thorpej - Define a VFS_ATTACH() macro that places a reference to a vfsops structure
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
 1.83 26-Feb-2005  perry branches: 1.83.2;
nuke trailing whitespace
 1.82 09-Feb-2005  ws Add support for large files (>2GB).
Like Linux, automagically convert old filesystem to use this,
if they are already at revision 1.
For revision 0, just punt (unlike Linux; makes me a bit too nervous.)

There should be an option to fsck_ext2fs to upgrade revision 0 to revision 1.

Reviewed by Manuel (bouyer@).
 1.81 11-Jan-2005  mycroft branches: 1.81.2; 1.81.4;
Rearrange some code slightly to avoid uninitialized variable warnings.
 1.80 09-Jan-2005  mycroft Whoops -- move the location of the VOP_OPEN()/VOP_CLOSE(), et al, from
foo_mountfs() to foo_mount(), to match the new mountroot API.
Also, for ext2fs and lfs, copy some restructuring from ffs to allow changing
file system parameters without specifying the device name.
(ntfs could use some more work.)
 1.79 09-Jan-2005  mycroft Rework the mountroot interface so that vfs_mountroot() opens the root device
and just passes it on to the file system functions. This avoids opening and
closing the device several times.

Mentioned on tech-kern some time ago, IIRC. I've been running this for a
long time.
 1.78 02-Jan-2005  thorpej Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
 1.77 11-Nov-2004  christos Put the correct fragment size in struct statvfs. From Kevin Lahey.
 1.76 21-Sep-2004  thorpej Add a new VNODE_LOCKDEBUG option, which enables checks in the VOP_*()
calls to ensure that the vnode lock state is as expected when the VOP
call is made. Modify vnode_if.src to set the expected state according
to the documenting lock table for each VOP. Modify vnode_if.sh to emit
the checks.

Notes:
- The checks are only performed if the vnode has the VLOCKSWORK bit
set. Some file systems (e.g. specfs) don't even bother with vnode
locks, so of course the checks will fail.
- We can't actually run with VNODE_LOCKDEBUG because there are so many
vnode locking problems, not the least of which is the "use SHARED for
VOP_READ()" issue, which screws things up for the entire call chain.

Inspired by similar changes in OpenBSD, but implemented differently.
 1.75 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.74 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.73 05-Jul-2004  pk Call inittodr() from main(). Let file system code set the recorded `last
update' time (if any) through the new function setrootfstime().
 1.72 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.71 25-May-2004  atatat Sysctl descriptions under vfs subtree
 1.70 20-May-2004  atatat Explicitly call pool_init() (and pool_destroy()) when being built as
an _LKM.

This adds pools to the list of things that lkms must do manually
because they're set up with link sets. Not that there's anything
wrong with link sets, but that we need to try harder to remember that
lkms are second class citizens. Of a sort.
 1.69 02-May-2004  wiz Fix typo in error message, reported by Piotr Meyer in PR 25418.
 1.68 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.67 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.66 24-Mar-2004  atatat branches: 1.66.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.65 22-Mar-2004  bouyer Fix disclaimer in my copyright. Pointed out by Thomas Klausner.
 1.64 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.63 14-Oct-2003  dbj add mnt_iflag field to struct mount for internal flags
mv MNT_GONE, MNT_UNMOUNT and MNT_WANTRDWR to this field
additionally add mnt_writeopcountupper and mnt_writeopcountlower fields
in preparation for pending write suspension support work
bump kernel version to 1.6ZD
 1.62 05-Oct-2003  bouyer Remove references to University of California from my copyright notices.
 1.61 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.60 29-Jun-2003  fvdl branches: 1.60.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.59 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.58 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.57 16-Apr-2003  christos PR/1796: John Kohl: statfs misbehaves under chrooted environments.

- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While I was there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
 1.56 05-Apr-2003  fvdl Actually get an ext2fs_dinode structure from the pool before using it.
 1.55 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.54 21-Mar-2003  dsl Use 'void *' instead of 'caddr_t' in prototypes of VOP_IOCTL, VOP_FCNTL
and VOP_ADVLOCK, delete casts from callers (and some to copyin/out).
 1.53 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
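The split described in revision 1.53 (a wider in-core block address, unchanged on-disk width) can be sketched roughly as follows. This is only an illustration: every sketch_* name is hypothetical and is not taken from the NetBSD sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-core block address type, widened to 64 bits. */
typedef int64_t sketch_daddr_t;

#define SKETCH_NDADDR 12

/* Hypothetical on-disk structure: block pointers stay 32 bits wide,
 * so the on-disk layout does not change when the in-core type grows. */
struct sketch_ondisk_inode {
	int32_t di_db[SKETCH_NDADDR];
};

/* Widen an on-disk 32-bit block number to the in-core type. */
static sketch_daddr_t
sketch_load_blkno(const struct sketch_ondisk_inode *dip, int i)
{
	assert(i >= 0 && i < SKETCH_NDADDR);
	return (sketch_daddr_t)dip->di_db[i];
}
```

Keeping the fixed-width type in the on-disk structure means existing file systems stay readable without conversion.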
 1.52 21-Sep-2002  christos MNT_GETARGS support
 1.51 06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as constant structures in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in the port-dependent majors.<arch>
by using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- In compile time, device major numbers list is packed into the kernel and
the LKM framework will refer it to assign device major number dynamically.
 1.50 30-Jul-2002  soren Die, qaddr_t, die! - mnt_data in struct mount is already effectively
a void *, so stop pretending otherwise.
 1.49 08-Mar-2002  thorpej branches: 1.49.6;
Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.48 08-Nov-2001  lukem add RCSID
 1.47 06-Nov-2001  simonb Use the sector size from the partition info, not a hard-coded value.
 1.46 06-Nov-2001  simonb Remove a variable that is set but never used.
 1.45 15-Sep-2001  chs branches: 1.45.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.44 15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.43 30-May-2001  mrg branches: 1.43.4; 1.43.6;
use _KERNEL_OPT
 1.42 22-Jan-2001  jdolecek branches: 1.42.2;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.41 10-Dec-2000  chs in *_sync(), don't skip vnodes which have (potentially dirty) pages.
 1.40 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.39 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.
 1.38 22-Jul-2000  jdolecek ext2fs_reload(), ext2fs_mountfs(): do devvp locking same way as ffs
this has not shown any good or bad effect, but might help narrow
some problems people have seen with ext2fs reload (hi Soren!)
 1.37 30-Jun-2000  fvdl Rearrange code around getnewvnode as was already done for ffs, to avoid
locking against oneself because getnewvnode recycles a softdep-using vnode.
 1.36 29-May-2000  mycroft branches: 1.36.2;
Pull in IN_ACCESSED changes and some MNT_LAZY `bug fixes' from FFS.
 1.35 30-Mar-2000  augustss branches: 1.35.2;
Remove register declarations.
 1.34 16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
 1.33 31-Jan-2000  bouyer Check that we can handle the inode size before mounting the fs, and correct
a return value.
 1.32 28-Jan-2000  bouyer Correct (minor) bogons in filetype option support, and add support
for sparse_super option
 1.31 26-Jan-2000  bouyer First cut at ext2fs rev 1 support (as of mke2fs 1.18): supports the filetype
option read/write and the sparse option read-only.
 1.30 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.29 20-Oct-1999  enami Check if the type of device node isn't VBAD before touching v_specinfo. If
the device vnode is revoked, the field is NULL and touching it causes null
pointer dereference.
 1.28 16-Oct-1999  wrstuden branches: 1.28.2; 1.28.4;
In spec_close(), if we're not doing a non-blocking close and VXLOCK is
not set, unlock the vnode before calling the device's close routine and
relock it after it returns. tty close routines will sleep waiting for
buffers to drain, which won't happen often times as the other side needs
to grab the vnode lock first.

Make all unmount routines lock the device vnode before calling VOP_CLOSE().
 1.27 17-Jul-1999  wrstuden branches: 1.27.2;
Adjust mountroot routines to vrele rootvp in case of mount error. Closes
PR 7977 by Neil Carson, <neil@brini.com>.
 1.26 08-Jul-1999  wrstuden Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
 1.25 01-Jun-1999  bouyer memset ump->um_e2fs to 0 after malloc, it is bigger than SBSIZE and thus some
parts were left uninitialised. The symptom was that a read-only mount tried
to rewrite back the superblock.
 1.24 26-Feb-1999  wrstuden branches: 1.24.2; 1.24.4; 1.24.6;
Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.23 10-Feb-1999  bouyer Make sure a buffer obtained from bread() is always brelse()'d in case of
error. Closes PR kern/1448 from Wolfgang Solfrank.
 1.22 02-Dec-1998  bouyer - indentation
- sync LK_* flags with ffs/ufs
 1.21 01-Dec-1998  bouyer In ext2fs_sync(), don't flush the vnode if vput() returned an error. Fixes
PR kern/6495.
 1.20 23-Oct-1998  thorpej For consistency w/ FFS/LFS, define EXT2_DINODE_SIZE, and use it instead
of pointer arithmetic and/or sizeof(struct ext2fs_dinode).
 1.19 29-Sep-1998  bouyer #include opt_uvm.h only if _KERNEL and !_LKM
Make ext2fs_init() call ufs_init(). it was doing the init by itself,
testing for extern done != 0. This bug was hidden by the fact that
ext2fs_init() is called before ffs_init().
 1.18 13-Sep-1998  christos Fix copyright '\t' -> ' '
 1.17 01-Sep-1998  thorpej Use the pool allocator and "nointr" pool page allocator for ext2fs inodes.
 1.16 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
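For readers unfamiliar with the legacy BSD calls, the mapping in revision 1.16 looks roughly like the sketch below. The wrapper names are hypothetical and exist only to show the argument order; they are not code from the commit.

```c
#include <assert.h>
#include <string.h>

/* Old: bzero(buf, len);  New: memset(buf, 0, len); */
static void
sketch_clear(void *buf, size_t len)
{
	assert(buf != NULL);
	memset(buf, 0, len);
}

/* Old: bcopy(src, dst, len);  New: memcpy(dst, src, len);
 * Note that the source and destination arguments swap places. */
static void
sketch_copy(void *dst, const void *src, size_t len)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, len);
}

/* Old: bcmp(a, b, len) == 0;  New: memcmp(a, b, len) == 0. */
static int
sketch_equal(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len) == 0;
}
```

One caveat when converting by hand: bcopy() tolerates overlapping buffers, while memcpy() does not, so memmove() is the safe replacement where the regions may overlap.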
 1.15 05-Jul-1998  jonathan * defopt COMPAT_{09,10,11,12,13} and COMPAT_NOMID.
TODO: revisit interaction between native compat and emul compat usage.
 1.14 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.13 22-Jun-1998  sommerfe defopt for options FIFO
 1.12 05-Jun-1998  kleink Convert fsync vnode operator implementations and usage from the old `waitfor'
argument and MNT_WAIT/MNT_NOWAIT to `flags' and FSYNC_WAIT.
 1.11 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time, other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.10 02-Mar-1998  bouyer Close kern/5077: When DIAGNOSTIC is defined, don't complain about
bad magic numbers at a mount attempt. A message is still printed when the
magic number is OK, but the version number or the block size is bad.
Patch from Soren S. Jorvang, but different from the one in the PR.
 1.9 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.8 18-Feb-1998  drochner add missing vfsops element
 1.7 18-Feb-1998  thorpej Place a pointer to an array of our vnodeopv_desc *'s in our vfsops
structure, for use by vfs_attach().
 1.6 27-Oct-1997  bouyer When allocating an inode with dtime set, also bzero e2di_blocks[].
 1.5 23-Oct-1997  bouyer Uses ext2fs_vinit not ufs_vinit.
In ext2fs, an inode is deleted either when mode == 0 or dtime != 0. If
dtime != 0, reset other fields before using the inode, or we could end
up with the wrong v_op in ext2fs_vinit.
While I'm there, kill an unused variable in ext2fs_readwrite
 1.4 09-Oct-1997  bouyer branches: 1.4.2;
Add byte-swapping functions (bswap16, bswap32, bswap64) to libkern.
Only assembly version for i386 bswap16 and bswap32 for now (bswap64 uses
bswap32). Contribution of assembly versions of these are welcome.
Add byte-swapping of ext2fs metadata for big-endian systems.
Tested on i386 and sparc.
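A portable C version of the byte-swapping helpers mentioned in revision 1.4 might look like the sketch below, with the 64-bit swap built from two 32-bit swaps as the log describes. The sketch_ prefix marks these as illustrative; they are not the libkern implementations.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static uint16_t
sketch_bswap16(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t
sketch_bswap32(uint32_t x)
{
	return ((x & 0x000000ffU) << 24) |
	       ((x & 0x0000ff00U) << 8) |
	       ((x & 0x00ff0000U) >> 8) |
	       ((x & 0xff000000U) >> 24);
}

/* Reverse eight bytes by swapping each 32-bit half and exchanging them. */
static uint64_t
sketch_bswap64(uint64_t x)
{
	uint32_t lo = (uint32_t)x;
	uint32_t hi = (uint32_t)(x >> 32);

	return ((uint64_t)sketch_bswap32(lo) << 32) | sketch_bswap32(hi);
}

/* Usage: swapping twice must give back the original value. */
static void
sketch_bswap_check(void)
{
	assert(sketch_bswap16(sketch_bswap16(0x1234)) == 0x1234);
	assert(sketch_bswap32(sketch_bswap32(0x12345678U)) == 0x12345678U);
}
```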
 1.3 17-Jul-1997  bouyer branches: 1.3.2;
Add a lock around inode hashing.
 1.2 12-Jun-1997  mrg remove swap configuration.
 1.1 11-Jun-1997  bouyer The ext2fs layer, based on the ffs/ufs one. Uses a few functions from
sys/ufs/ufs/
 1.3.2.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.4.2.2 27-Oct-1997  mellon Pull rev 1.6 up from trunk (bouyer)
 1.4.2.1 23-Oct-1997  mellon Pull rev 1.5 up from trunk
 1.24.6.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.24.4.3 06-Aug-1999  chs UBCify.
 1.24.4.2 02-Aug-1999  thorpej Update from trunk.
 1.24.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.24.2.3 01-Feb-2000  he Apply patch (requested by bouyer):
Add support for ext2fs revision 1, with read-only support for
the 'sparse_super' and 'filetype' options. Should fix PR#9088.
 1.24.2.2 18-Oct-1999  cgd pull up rev 1.28 from trunk (requested by wrstuden):
In spec_close(), call the device's close routine with the vnode
unlocked if the call might block. Force a non-blocking close if
VXLOCK is set. This eliminates a potential deadlock situation, and
should eliminate the dirty buffers on reboot issue.
 1.24.2.1 21-Jun-1999  perry pullup 1.24->1.25 (bouyer): zero ump->um_e2fs after malloc
 1.27.2.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.27.2.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.28.4.3 15-Nov-1999  fvdl Sync with -current
 1.28.4.2 03-Nov-1999  fvdl Give ufs_ihashget an extra argument: the flags passed to vget() for
locking. This way we can avoid locking against ourselves when
ufs_ihashget is called during the flushing of metadata. XXX

Also, comment out a VOP_FSYNC call that I think is now unneeded, and
put a diagnostic printf there to check if this still happens.
 1.28.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.28.2.5 11-Feb-2001  bouyer Sync with HEAD.
 1.28.2.4 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.28.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.28.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.28.2.1 20-Oct-1999  thorpej Sync w/ trunk.
 1.35.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.36.2.3 14-Dec-2000  he Pull up revision 1.39 (requested by fvdl):
Improve NFS performance, possibly with as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.36.2.2 24-Jul-2000  jdolecek pullup rev. 1.38 from trunk (approved by thorpej):
ext2fs_reload(), ext2fs_mountfs(): do devvp locking same way as ffs
this has not shown any good or bad effect, but might help narrow
some problems people have seen with ext2fs reload
 1.36.2.1 03-Jul-2000  fvdl pullup from trunk:

Fix a "locking against myself" problem; holding ufs_hashlock
across getnewvnode() could cause a recursive lock if it resulted in
recycling a vnode that was using softdeps.
 1.42.2.10 18-Oct-2002  nathanw Catch up to -current.
 1.42.2.9 17-Sep-2002  nathanw Catch up to -current.
 1.42.2.8 01-Aug-2002  nathanw Catch up to -current.
 1.42.2.7 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.42.2.6 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
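The NULL-safe definition described in revision 1.42.2.6 can be illustrated with the stand-alone sketch below. The structures and the sketch_ names are hypothetical stand-ins for the kernel's struct proc, struct lwp, curlwp and curproc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct proc and struct lwp. */
struct sketch_proc { int p_pid; };
struct sketch_lwp { struct sketch_proc *l_proc; };

/* Stand-in for the "current LWP" pointer; NULL when there is none. */
static struct sketch_lwp *sketch_curlwp = NULL;

/* Yields NULL instead of dereferencing a NULL current-LWP pointer. */
#define sketch_curproc ((sketch_curlwp) ? (sketch_curlwp)->l_proc : NULL)

/* Usage: referencing sketch_curproc is safe with or without a current LWP. */
static void
sketch_curproc_check(void)
{
	struct sketch_proc p = { 42 };
	struct sketch_lwp l = { &p };

	sketch_curlwp = NULL;
	assert(sketch_curproc == NULL);

	sketch_curlwp = &l;
	assert(sketch_curproc == &p);
	assert(sketch_curproc->p_pid == 42);

	sketch_curlwp = NULL;
}
```

As the log notes, only the reference is made safe; code that dereferences the result still has to check for NULL first.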
 1.42.2.5 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.42.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.42.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.42.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.42.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.43.6.3 01-Oct-2001  fvdl Catch up with -current.
 1.43.6.2 26-Sep-2001  fvdl * add a VCLONED vnode flag that indicates a vnode representing a cloned
device.
* rename REVOKEALL to REVOKEALIAS, and add a REVOKECLONE flag, to pass
to VOP_REVOKE
* the revoke system call will revoke all aliases, as before, but not the
clones
* vdevgone is called when detaching a device, so make it use REVOKECLONE
to get rid of all clones as well
* clean up all uses of VOP_OPEN wrt. locking.
* add a few VOPS to spec_vnops that need to do something when it's a
clone vnode (access and getattr)
* add a copy of the vnode vattr structure of the original 'master' vnode
to the specinfo of a cloned vnode. could possibly redirect getattr to
the 'master' vnode, but this has issues with revoke
* add a vdev_reassignvp function that disassociates a vnode from its
original device, and reassociates it with the specified dev_t. to be
used by cloning devices only, in case a new minor is allocated.
* change all direct references in drivers to v_devcookie and v_rdev
to vdev_privdata(vp) and vdev_rdev(vp). for diagnostic purposes
when debugging race conditions that still exist wrt. locking and
revoking vnodes.
* make the locking state of a vnode consistent when passed to
d_open and d_close (unlocked). locked would be better, but has
some deadlock issues
 1.43.6.1 18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.43.4.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.43.4.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.43.4.2 16-Mar-2002  jdolecek Catch up with -current.
 1.43.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.45.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.49.6.2 29-Aug-2002  gehenna catch up with -current.
 1.49.6.1 16-May-2002  gehenna Use devsw APIs for checking validity of major numbers.
 1.60.2.13 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.60.2.12 01-Apr-2005  skrll Sync with HEAD.
 1.60.2.11 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.60.2.10 15-Feb-2005  skrll Sync with HEAD.
 1.60.2.9 17-Jan-2005  skrll Sync with HEAD.
 1.60.2.8 14-Nov-2004  skrll Sync with HEAD.
 1.60.2.7 24-Sep-2004  skrll Sync with HEAD.
 1.60.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.60.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.60.2.4 25-Aug-2004  skrll Sync with HEAD.
 1.60.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.60.2.2 03-Aug-2004  skrll Sync with HEAD
 1.60.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.66.2.1 29-May-2004  tron Pull up revision 1.71 (requested by atatat in ticket #393):
Sysctl descriptions under vfs subtree
 1.81.4.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.81.4.1 12-Feb-2005  yamt sync with head.
 1.81.2.1 29-Apr-2005  kent sync with -current
 1.83.2.1 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in the different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed by Chuck Silvers.
 1.86.2.9 11-Feb-2008  yamt sync with head.
 1.86.2.8 04-Feb-2008  yamt sync with head.
 1.86.2.7 21-Jan-2008  yamt sync with head
 1.86.2.6 07-Dec-2007  yamt sync with head
 1.86.2.5 27-Oct-2007  yamt sync with head.
 1.86.2.4 03-Sep-2007  yamt sync with head.
 1.86.2.3 26-Feb-2007  yamt sync with head.
 1.86.2.2 30-Dec-2006  yamt sync with head.
 1.86.2.1 21-Jun-2006  yamt sync with head.
 1.92.2.1 20-Oct-2005  yamt adapt ufs.
 1.94.6.3 01-Jun-2006  kardel Sync with head.
 1.94.6.2 22-Apr-2006  simonb Sync with head.
 1.94.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.94.4.1 09-Sep-2006  rpaulo sync with head
 1.94.2.1 01-Mar-2006  yamt sync with head.
 1.95.6.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.95.6.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.95.4.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.95.4.2 19-Apr-2006  elad sync with head.
 1.95.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.95.2.5 03-Sep-2006  yamt sync with head.
 1.95.2.4 11-Aug-2006  yamt sync with head
 1.95.2.3 26-Jun-2006  yamt sync with head.
 1.95.2.2 24-May-2006  yamt sync with head.
 1.95.2.1 01-Apr-2006  yamt sync with head.
 1.97.2.1 19-Jun-2006  chap Sync with head.
 1.98.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.101.4.2 10-Dec-2006  yamt sync with head.
 1.101.4.1 22-Oct-2006  yamt sync with head
 1.101.2.3 01-Feb-2007  ad Sync with head.
 1.101.2.2 12-Jan-2007  ad Sync with head.
 1.101.2.1 18-Nov-2006  ad Sync with head.
 1.108.2.2 24-Mar-2007  yamt sync with head.
 1.108.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.109.2.11 25-Oct-2007  ad Fix up mnt_vnodelist handling.
 1.109.2.10 23-Oct-2007  ad Sync with head.
 1.109.2.9 19-Oct-2007  ad Hook ext2fs_vfree into ext2fs_ufsops (for ufs_reclaim).
 1.109.2.8 20-Aug-2007  ad Sync with HEAD.
 1.109.2.7 29-Jul-2007  ad Add vfs_destroy() to free mount structures. The specificdata_ref was being
leaked.
 1.109.2.6 15-Jul-2007  ad Sync with head.
 1.109.2.5 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.109.2.4 09-Jun-2007  ad Sync with head.
 1.109.2.3 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.109.2.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.109.2.1 13-Mar-2007  ad Sync with head.
 1.110.2.1 11-Jul-2007  mjf Sync with head.
 1.114.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.117.8.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.117.8.1 31-Jul-2007  pooka file ext2fs_vfsops.c was added on branch matt-mips64 on 2007-07-31 21:14:21 +0000
 1.117.6.2 18-Oct-2007  yamt sync with head.
 1.117.6.1 14-Oct-2007  yamt sync with head.
 1.117.4.3 23-Mar-2008  matt sync with HEAD
 1.117.4.2 09-Jan-2008  matt sync with HEAD
 1.117.4.1 06-Nov-2007  matt sync with HEAD
 1.117.2.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.117.2.3 03-Dec-2007  joerg Sync with HEAD.
 1.117.2.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.117.2.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.120.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.120.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.120.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.124.2.3 30-Dec-2007  ad Fix remaining problems with ext2fs on this branch.
 1.124.2.2 26-Dec-2007  ad Sync with head.
 1.124.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.125.4.2 08-Jan-2008  bouyer Sync with HEAD
 1.125.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.131.10.6 11-Aug-2010  yamt sync with head.
 1.131.10.5 11-Mar-2010  yamt sync with head
 1.131.10.4 16-Sep-2009  yamt sync with head
 1.131.10.3 18-Jul-2009  yamt sync with head.
 1.131.10.2 04-May-2009  yamt sync with head.
 1.131.10.1 16-May-2008  yamt sync with head.
 1.131.8.1 18-May-2008  yamt sync with head.
 1.131.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.131.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.131.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.134.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.134.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.136.2.1 03-Jul-2008  simonb Sync with head.
 1.137.6.7 25-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.137.6.6 16-Jan-2011  bouyer branches: 1.137.6.6.2;
Pull up following revision(s) (requested by tsutsui in ticket #1486):
sbin/fsck_ext2fs/setup.c: revision 1.26
sbin/newfs_ext2fs/mke2fs.c: revision 1.10
sbin/newfs_ext2fs/mke2fs.c: revision 1.11
sbin/newfs_ext2fs/mke2fs.c: revision 1.12
sbin/fsck_ext2fs/inode.c: revision 1.24
sys/lib/libsa/ext2fs.c: revision 1.6
sbin/newfs_ext2fs/extern.h: revision 1.3
sbin/fsck_ext2fs/inode.c: revision 1.25
sys/lib/libsa/ext2fs.c: revision 1.7
sbin/fsck_ext2fs/inode.c: revision 1.26
sys/ufs/ext2fs/ext2fs_inode.c: revision 1.68
sbin/fsck_ext2fs/inode.c: revision 1.27
sbin/fsck_ext2fs/inode.c: revision 1.28
sys/ufs/ext2fs/ext2fs_dinode.h: revision 1.18
sys/ufs/ext2fs/ext2fs_dinode.h: revision 1.19
sbin/newfs_ext2fs/newfs_ext2fs.c: revision 1.5
sbin/newfs_ext2fs/newfs_ext2fs.8: revision 1.2
sbin/newfs_ext2fs/newfs_ext2fs.c: revision 1.6
sbin/newfs_ext2fs/newfs_ext2fs.8: revision 1.3
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.142
sbin/newfs_ext2fs/newfs_ext2fs.c: revision 1.7
sbin/newfs_ext2fs/newfs_ext2fs.8: revision 1.4
sbin/newfs_ext2fs/newfs_ext2fs.c: revision 1.8
PR/40936: Frederik Sausmikat: ext2fs: add support for inodes > 128 bytes
Support variable inode sizes.
catch up with variable inode size.
Don't use e2fs_inode_size in superblock on E2FS_REV0 file system.
- accept only EXT2_REV0_DINODE_SIZE inodesize on -O 0
- use inodesize to get offset of inode, not struct ext2fs_dinode array
Replace a magic number with a new EXT2_REV0_DINODE_SIZE macro.
Use EXT2_DINODE_SIZE() to get offset of inode, not struct ext2fs_dinode array.
Fix botched logic in inodesize check.
Use inodesize to get offset of inode in one more place.
- add a sanity check for e2fs_inode_size in readsb()
- use EXT2_DINODE_SIZE() rather than sizeof(struct ext2fs_dinode) or
struct ext2fs_dinode array/pointer to see e2fs_ipb and inode offsets
Sort options.
New sentence, new line.
Sort options in usage.
- unsigned -> unsigned int
- remove unnecessary casts from malloc(3) and free(3)
- fix a bogus indent
Use "size > INT32_MAX" rather than "size >= 0x80000000U" to check 2GB limit.
Add missed byteswap ops against ext2fs_dinode members.
Handle 32 bit uid field on E2FS_REV1.
 1.137.6.5 27-Oct-2009  bouyer branches: 1.137.6.5.2;
Pull up following revision(s) (requested by pooka in ticket #1112):
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.91
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.152
sys/ufs/ext2fs/ext2fs_extern.h: revision 1.42
update i_uid and i_gid after chown
 1.137.6.4 16-Oct-2009  snj Pull up following revision(s) (requested by tsutsui in ticket #1060):
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.148
Add a missed brelse(9) call after bread(9) in ext2fs_reload().
This may close PR kern/28712 (ext2fs hang on mount after fsck).
 1.137.6.3 16-Oct-2009  snj Pull up following revision(s) (requested by tsutsui in ticket #1060):
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.147
Pull a fix from ffs_vfsops.c rev 1.248:
Fix bug introduced in revision 1.174(*) where a NULL fspec with an MNT_UPDATE
command would always return EINVAL. This broke fsck on root, where fsck'ing
a dirty root would always return an error causing rc to result in a reboot.
(*) This is "Apply the NFS exports list rototill patch" change
in ext2fs_vfsops.c rev 1.91.
 1.137.6.2 16-Oct-2009  snj Pull up following revision(s) (requested by tsutsui in ticket #1060):
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.146
Pull a fix for mount function from ffs_vfsops.c rev1.186:
Change ffs_mount, in MNT_UPDATE case, to check dev_t's for equality
instead of just vnode pointers. Fixes erroneous "does not match mounted
device" errors from mount(8) in the presence of MFS /dev, init.root, &c.
 1.137.6.1 29-Nov-2008  snj branches: 1.137.6.1.4;
Pull up following revision(s) (requested by mrg in ticket #147):
sys/ufs/ext2fs/ext2fs_alloc.c: revision 1.37
sys/ufs/ext2fs/ext2fs_bswap.c: revision 1.14
sys/ufs/ext2fs/ext2fs_dinode.h: revision 1.17
sys/ufs/ext2fs/ext2fs_lookup.c: revision 1.56
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.83
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.140
sys/ufs/ufs/inode.h: revision 1.55
add support for 32 bit uid/gid fields in ext2, but only do so
when the revision is > REV0.
 1.137.6.6.2.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.137.6.5.2.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.137.6.1.4.1 21-Apr-2010  matt sync to netbsd-5
 1.137.4.3 28-Apr-2009  skrll Sync with HEAD.
 1.137.4.2 03-Mar-2009  skrll Sync with HEAD.
 1.137.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.137.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.141.2.2 23-Jul-2009  jym Sync with HEAD.
 1.141.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.155.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.155.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.156.2.4 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.156.2.3 05-Mar-2011  rmind sync with head
 1.156.2.2 03-Jul-2010  rmind sync with head
 1.156.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.159.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.161.2.6 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.161.2.5 23-Jan-2013  yamt sync with head
 1.161.2.4 16-Jan-2013  yamt sync with (a bit old) head
 1.161.2.3 30-Oct-2012  yamt sync with head
 1.161.2.2 23-May-2012  yamt sync with head.
 1.161.2.1 17-Apr-2012  yamt sync with head
 1.162.12.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
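
The checks described in this pull-up can be sketched roughly as follows. This is an illustration only, not the code from vfs_syscalls.c: the function names, the SKETCH_MOUNT_DATA_MAX bound, and the choice of EINVAL as the error are all assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed upper bound, for illustration only; the real limit is not given in the log. */
#define SKETCH_MOUNT_DATA_MAX 4096

/*
 * Hypothetical syscall-side check: reject a zero-sized or oversized
 * request before any buffer is allocated for the mount arguments.
 */
static int
sketch_check_data_len(size_t data_len)
{
	if (data_len == 0 || data_len > SKETCH_MOUNT_DATA_MAX)
		return EINVAL;
	return 0;
}

/*
 * Hypothetical file-system-side check: the generic layer allows a
 * NULL 'data' pointer, so a file system that needs mount arguments
 * has to reject NULL itself.
 */
static int
sketch_fs_check_data(const void *data)
{
	if (data == NULL)
		return EINVAL;
	return 0;
}
```

The split mirrors the commit text: the length is bounded once in the generic code, while the NULL test stays with each file system because only the file system knows whether it needs the data.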
 1.162.10.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.162.6.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.162.4.2 02-Jun-2012  mrg sync to latest -current.
 1.162.4.1 05-Apr-2012  mrg sync to latest -current.
 1.166.2.4 03-Dec-2017  jdolecek update from HEAD
 1.166.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.166.2.2 23-Jun-2013  tls resync from head
 1.166.2.1 25-Feb-2013  tls resync with head
 1.171.2.2 18-May-2014  rmind sync with head
 1.171.2.1 28-Aug-2013  rmind sync with head
 1.179.2.1 10-Aug-2014  tls Rebase.
 1.183.2.2 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.183.2.1 22-Aug-2014  martin Pull up following revision(s) (requested by hannken in ticket #49):
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.184
Use mount from argument "mp", "vp->v_mount" is not valid here.
PR kern/49142 (panic in ext2fs_loadvnode mounting an ext2fs filesystem)
 1.186.2.3 28-Aug-2017  skrll Sync with HEAD
 1.186.2.2 05-Oct-2016  skrll Sync with HEAD
 1.186.2.1 06-Apr-2015  skrll Sync with HEAD
 1.193.2.4 26-Apr-2017  pgoyette Sync with HEAD
 1.193.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.193.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.193.2.1 20-Jul-2016  pgoyette Adapt machine-independent code to the new {b,c}devsw reference-counting
(using localcount(9)). All callers of {b,c}devsw_lookup() now call
{b,c}devsw_lookup_acquire() which retains a reference on the 'struct
{b,c}devsw'. This reference must be released by the caller once it is
finished with the structure's content (or other data that would disappear
if the 'struct {b,c}devsw' were to disappear).
 1.204.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.208.2.2 17-May-2017  pgoyette Typo - insert missing *
 1.208.2.1 17-May-2017  pgoyette Adapt for bdevsw_lookup_acquire()
 1.210.2.3 18-Jan-2019  pgoyette Synch with HEAD
 1.210.2.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.210.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.211.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.211.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.211.2.1 10-Jun-2019  christos Sync with HEAD
 1.214.4.3 29-Feb-2020  ad Sync with head.
 1.214.4.2 19-Jan-2020  ad Set IMNT_SHRLOOKUP and use it for the in-cache case. Need to check what
more can be done with tmpfs though, it can probably do the whole lookup.
 1.214.4.1 17-Jan-2020  ad Sync with head.
 1.214.2.1 07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1934):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383 (patch)
sys/ufs/ffs/ffs_vfsops.c: revision 1.384 (patch)

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Setting/clearing fs->fs_fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusive, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.221.4.1 07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1037):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_vfsops.c: revision 1.384

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Setting/clearing fs->fs_fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusive, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.225.6.1 02-Aug-2025  perseant Sync with HEAD
 1.139 29-Jan-2024  christos PR/57889: Ricardo Branco: ext2fs does not have user immutable and append
file flags, only system ones. Restrict those to the superuser. Before
the behavior was controlled by EXT2FS_SYSTEM_FLAGS. Make that behavior the
default.
 1.138 26-Aug-2023  riastradh ext2fs: Nix trailing whitespace.
 1.137 27-Mar-2022  christos add a kauth vnode check for creating links
 1.136 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), using a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.135 18-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.
 1.134 18-Jul-2021  dholland Use macros for the canned parts of device and fifo vnode op tables.

Add GENFS_SPECOP_ENTRIES and GENFS_FIFOOP_ENTRIES macros that contain
the portion of the vnode ops table declaration that is
(conservatively) the same in every fs. Use these in every fs that
supports devices and/or fifos with separate ops tables.

Note that ptyfs works differently (it has one type of vnode with
open-coded dispatch to the specfs code, which I haven't changed in
this commit) and rump/librump/rumpvfs/rumpfs.c has an indirect dynamic
dispatch that already does more or less the same thing, which I also
haven't changed.

Also note that this anticipates a few bits in the next changeset here
and there, and adds missing but unreachable calls in some cases (e.g.
most fses weren't defining whiteout on devices and fifos, but it isn't
reachable there), and it changes parsepath on devices and fifos to
genfs_badop from genfs_parsepath (but it's not reachable there
either).

It appears that devices in kernfs were missing kqfilter, so it's
possible that if you try to use kqueue on /kern/rootdev it'll
explode.

And finally note that the ops declaration tables aren't
order-dependent. (Other than vop_default_desc has to come first.)
Otherwise this wouldn't work.
 1.133 29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
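
The default behaviour described here (consume everything up to the first '/') can be sketched as a plain string scan. The helper below is hypothetical and is not the genfs_parsepath implementation; the real vnode op also takes a directory vnode and has a different interface, so only the scanning idea is shown.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the default parsepath behaviour: return
 * how many bytes of 'path' form the next component, i.e. the offset
 * of the first '/' or the full string length if there is none.
 */
static size_t
sketch_parsepath_default(const char *path)
{
	size_t len = 0;

	while (path[len] != '\0' && path[len] != '/')
		len++;
	return len;
}
```

Taking only the string, as the commit notes, keeps the rest of struct componentname out of the picture.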
 1.132 16-May-2020  christos branches: 1.132.6;
Add ACL support for FFS. From FreeBSD.
 1.131 08-Mar-2020  kamil Perform bit operations on unsigned integer

ext2fs_vnops.c:1002:2, signed integer overflow: 510008 * 4294 cannot be represented in type 'int'

Maximum usec * 4294 is in the range of unsigned int.

>>> 1000000*4294
4294000000
>>> 2**32
4294967296

Patch submitted by Nisarg S. Joshi.
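
The arithmetic above can be sketched in C as follows. This is an
illustration under assumptions, not the patched code: the function
name is hypothetical, and a 32-bit unsigned type is assumed for the
result.

```c
#include <stdint.h>

/*
 * Scale microseconds (0..999999) by 4294 using unsigned arithmetic.
 * The largest product, 999999 * 4294 = 4293995706, is below 2^32, so
 * a 32-bit unsigned result never wraps for valid inputs, whereas the
 * same product can exceed INT_MAX when computed in a 32-bit int.
 */
static uint32_t
scale_usec(uint32_t usec)
{
	return usec * UINT32_C(4294);
}
```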
 1.130 18-Sep-2019  christos Add newly created vnodes to the namei cache. The rest of the filesystems
already did that (or they don't support writing). Discussed in tech-kern.
 1.129 01-Jan-2019  hannken Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
 1.128 28-May-2017  hannken branches: 1.128.8; 1.128.10;
Change ext2fs to use vcache_new like we did for ffs:
- Change ext2fs_valloc to return an inode number.
- Make ext2fs_makeinode private to ext2fs_vnops.c and
pass vattr instead of mode.
 1.127 26-May-2017  riastradh Make VOP_RECLAIM do the last unlock of the vnode.

VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.

We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
 1.126 26-Apr-2017  riastradh Change VOP_REMOVE and VOP_RMDIR to preserve lock/ref on dvp.

No change to vp -- the plan is to replace the node by the
componentname in the vop parameters, and let all directory vops do
lookups internally.

Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2017/04/17/msg021825.html
 1.125 15-Aug-2016  jdolecek bump link limit to 65000 for files, and add support for EXT2F_ROCOMPAT_DIR_NLINK to make link count unlimited for directories
 1.124 15-Aug-2016  jdolecek adjust ext2fs_makeinode() so that the direnter is optional, use the function (with the direnter off) in ext2fs_mkdir() instead of the code copy; adjust ext2fs_makeinode() to initialize extra_isize and set creation time, if supported by the filesystem
 1.123 14-Aug-2016  jdolecek switch code to use the EXT2_HAS_{COMPAT|ROCOMPAT|INCOMPAT}_FEATURE() macros instead of open coding the checks
 1.122 13-Aug-2016  christos KNF, no functional changes...
 1.121 12-Aug-2016  jdolecek add support for extended attributes in ext2fs for ext3/ext4; read-only for now
 1.120 05-Aug-2016  jdolecek PR kern/7867 add support for UF_NODUMP flag to ext2fs
 1.119 03-Aug-2016  jdolecek get and set expanded timestamp if the inode contains the extra information, add support for create time
 1.118 03-Aug-2016  jdolecek support arbitrary ext3/ext4 inode size, add all the new ext4 fields to ext2fs_dinode, and add support for loading the extra inode data
 1.117 20-Apr-2015  riastradh branches: 1.117.2;
Make VOP_LINK return directory still locked and referenced.

Ride 7.99.10 bump.
 1.116 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.115 09-Nov-2014  maxv branches: 1.115.2;
Do not uselessly include <sys/malloc.h>.
 1.114 18-Oct-2014  snj src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.113 25-Jul-2014  dholland branches: 1.113.2;
Add VOP_FALLOCATE and VOP_FDISCARD to every vnode ops table I can
find.

The filesystem ones all call genfs_eopnotsupp - right now I am only
implementing the plumbing and we can implement fallocate and/or
fdiscard for files later.

The device ones call spec_fallocate (which is also genfs_eopnotsupp)
and spec_fdiscard, which dispatches to the device-level op.

The fifo ones all call vn_fifo_bypass, which also ends up being
EOPNOTSUPP.
 1.112 25-May-2014  hannken ext2fs_mknod: use vcache_get() to reload the new node.
 1.111 24-Mar-2014  hannken branches: 1.111.2;
- Make VI_XLOCK, VI_CLEAN and VI_LOCKSHARE private to kern/vfs_*.c.
- Make vwait() static.
- Add vdead_check() to check a vnode for being or becoming dead.

Discussed on tech-kern.

Welcome to 6.99.38
 1.110 23-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30
 1.109 21-Jan-2014  hannken Move VOP_UNLOCK() after setting type to VNON like all other UFS file systems.
 1.108 17-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29
 1.107 18-Mar-2013  plunky branches: 1.107.6;
C99 section 6.7.2.3 (Tags) Note 3 states that:

A type specifier of the form

enum identifier

without an enumerator list shall only appear after the type it
specifies is complete.

which means that we cannot pass an "enum vtype" argument to
kauth_access_action() without fully specifying the type first.
Unfortunately there is a complicated include file loop which
makes that difficult, so convert this minimal function into a
macro (and capitalize it).

(ok elad@)
 1.106 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.105 21-Nov-2012  jakllsch Write support for the Ext4 Read-only Compatible Feature "huge_file".

Primarily, this feature extends the inode block count field to 48 bits.
Additionally, this feature allows this field to be represented in file
system block size units rather than DEV_BSIZE units.
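
To make the two effects of "huge_file" concrete, here is a small
sketch that joins a 32-bit low part and a 16-bit high part into a
48-bit count and normalizes it to 512-byte units. All names are
hypothetical, and the 512-byte value for DEV_BSIZE is an assumption.

```c
#include <stdint.h>

#define SKETCH_DEV_BSIZE 512u	/* assumed size of the default unit */

/*
 * Combine the low 32 bits and high 16 bits of an inode block count
 * and return the result in 512-byte units. When `fs_block_units` is
 * nonzero, the stored count is in file system blocks of `fs_bsize`
 * bytes and is scaled up accordingly.
 */
static uint64_t
inode_nblocks_512(uint32_t lo, uint16_t hi, int fs_block_units,
    uint32_t fs_bsize)
{
	uint64_t count = ((uint64_t)hi << 32) | lo;

	if (fs_block_units)
		count *= fs_bsize / SKETCH_DEV_BSIZE;
	return count;
}
```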
 1.104 09-May-2012  riastradh branches: 1.104.2;
Adapt ffs, lfs, and ext2fs to use genfs_rename.

ok dholland, rmind
 1.103 29-Apr-2012  chs change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
 1.102 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.101 18-Nov-2011  christos branches: 1.101.4; 1.101.6;
Obey MNT_RELATIME, the only addition is that mkdir in ufs sets IN_ACCESS too.
 1.100 12-Jul-2011  dholland branches: 1.100.2;
Pass the ufs_lookup_results pointer around instead of fetching it from
the inode in the guts of ufs. Now, in VOPs where i_crap is used it is
used (directly) only immediately on entry to the VOP call and then
passed around by reference.

Except for rename, which needs explicit sorting out. The code in
ufs_wapbl_rename is unchanged in behavior but I'm increasingly
inclined to think it's wrong.
 1.99 26-Apr-2011  hannken Change vflushbuf() to return an error if a synchronous write fails.

Welcome to 5.99.51.
 1.98 24-Apr-2011  rmind sys_link: prevent hard links on directories (cross-mount operations are
already prevented). File systems are no longer responsible to check this.
Clean up and add asserts (note that dvp == vp cannot happen in vop_link).

OK dholland@
 1.97 02-Jan-2011  dholland branches: 1.97.2;
Remove the special refcount behavior (adding an extra reference to the
parent dir) associated with SAVESTART in relookup().

Check all call sites to make sure that SAVESTART wasn't set while
calling relookup(); if it was, adjust the refcount behavior. Remove
related references to SAVESTART.

The only code that was reaching the extra ref was msdosfs_rename,
where the refcount behavior was already fairly broken and/or gross;
repair it.

Add a dummy 4th argument to relookup to make sure code that hasn't
been inspected won't compile. (This will go away next time the
relookup semantics change, which they will.)
 1.96 30-Nov-2010  dholland Abolish the SAVENAME and HASBUF flags. There is now always a buffer,
so the path in a struct componentname is now always valid during VOP
calls.
 1.95 30-Nov-2010  dholland Abolish struct componentname's cn_pnbuf. Use the path buffer in the
pathbuf object passed to namei as work space instead. (For now a pnbuf
pointer appears in struct nameidata, to support certain unclean things
that haven't been fixed yet, but it will be going away in the future.)

This removes the need for the SAVENAME and HASBUF namei flags.
 1.94 28-Jul-2010  hannken ext2fs,ffs: free on disk inodes in the reclaim routine.
Remove now unneeded vnode flag VI_FREEING.

Welcome to 5.99.38.

Ok: Andrew Doran <ad@netbsd.org>
 1.93 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.92 29-Mar-2010  pooka Stop exposing fifofs internals and leave only fifo_vnodeop_p visible.
 1.91 21-Oct-2009  pooka branches: 1.91.2; 1.91.4;
update i_uid and i_gid after chown
 1.90 19-Oct-2009  bouyer Remove clauses 3 & 4 from my licence. Many thanks to Soren Jacobsen
for the boring work!
 1.89 12-Sep-2009  tsutsui Whitespace nits.
 1.88 03-Jul-2009  elad Where possible, extract the file-system's access() routine to two internal
functions: the first checking if the operation is possible (regardless of
permissions), the second checking file-system permissions, ACLs, etc.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005311.html
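
The two-step split can be pictured roughly as below. This is a
deliberately reduced sketch: every name is hypothetical, group
permissions, ACLs and superuser handling are left out, and the
specific checks in each step are assumptions rather than the
committed code.

```c
#include <errno.h>
#include <sys/types.h>

#define SK_READ  04
#define SK_WRITE 02

struct sk_node {
	int    fs_readonly;	/* file system is mounted read-only */
	int    immutable;	/* node is flagged immutable */
	uid_t  uid;		/* owner */
	mode_t mode;		/* only owner and other bits are used */
};

/* Step 1: is the operation possible at all, whoever asks? */
static int
sk_check_possible(const struct sk_node *n, int want)
{
	if ((want & SK_WRITE) && n->fs_readonly)
		return EROFS;
	if ((want & SK_WRITE) && n->immutable)
		return EPERM;
	return 0;
}

/* Step 2: do the permission bits allow this caller? */
static int
sk_check_permitted(const struct sk_node *n, uid_t caller, int want)
{
	mode_t bits = (caller == n->uid) ? (n->mode >> 6) & 07 : n->mode & 07;

	return ((bits & (mode_t)want) == (mode_t)want) ? 0 : EACCES;
}

/* The access routine then simply chains the two steps. */
static int
sk_access(const struct sk_node *n, uid_t caller, int want)
{
	int error = sk_check_possible(n, want);

	if (error != 0)
		return error;
	return sk_check_permitted(n, caller, want);
}
```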
 1.87 23-Jun-2009  elad Move the implementation of vaccess() to genfs_can_access(), in line with
the other routines of the same spirit.

Adjust file-system code to use it.

Keep vaccess() for KPI compatibility and to follow the principle of least
surprise. A "diagnostic" message warning that vaccess() is deprecated will
be printed when it's used (obviously, only in DIAGNOSTIC kernels).

No objections on tech-kern@:

http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005310.html
 1.86 07-May-2009  elad Extract the open-coded authorization logic for chtimes() from various
file-systems and put it in a single function, genfs_can_chtimes().

This also makes UDF follow the same policy as all other file-systems.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/04/27/msg004951.html
 1.85 22-Apr-2009  elad Per discussion on tech-kern@:

- Replace use of label/goto with returns

- Rename, change prototype of, and move functions from vfs_subr.c to
genfs_vnops.c
 1.84 20-Apr-2009  elad Refactor some duplicated file-system code.

Proposed and received no objections on tech-kern@:

http://mail-index.netbsd.org/tech-kern/2009/04/18/msg004843.html
 1.83 23-Nov-2008  mrg branches: 1.83.4;
add support for 32 bit uid/gid fields in ext2, but only do so
when the revision is > REV0.
 1.82 29-Apr-2008  ad branches: 1.82.6; 1.82.8; 1.82.10;
PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.81 25-Jan-2008  ad branches: 1.81.6; 1.81.8; 1.81.10;
Remove VOP_LEASE. Discussed on tech-kern.
 1.80 24-Jan-2008  ad specfs changes for PR kern/37717 (raidclose() is no longer called on
shutdown). There are still problems with device access and a PR will be
filed.

- Kill checkalias(). Allow multiple vnodes to reference a single device.

- Don't play dangerous tricks with block vnodes to ensure that only one
vnode can describe a block device. Instead, prohibit concurrent opens of
block devices. As a bonus remove the unreliable code that prevents
multiple file system mounts on the same device. It's no longer needed.

- Track opens by vnode and by device. Issue cdev_close() when the last open
goes away, instead of abusing vnode::v_usecount to tell if the device is
open.
 1.79 09-Jan-2008  ad Go back to freeing on disk inodes in the inactive routine. It would be
better not to do this, but it rules out potential side effects with softdep.
 1.78 02-Jan-2008  ad Merge vmlocking2 to head.
 1.77 08-Dec-2007  pooka branches: 1.77.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.76 26-Nov-2007  pooka branches: 1.76.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.75 10-Oct-2007  ad branches: 1.75.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.74 04-Mar-2007  christos branches: 1.74.2; 1.74.14; 1.74.16; 1.74.18;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.73 20-Feb-2007  ad Call genfs_node_destroy() where appropriate.
 1.72 04-Jan-2007  elad branches: 1.72.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.71 02-Jan-2007  elad Add KAUTH_SYSTEM_CHSYSFLAGS so we can get rid of the last three
securelevel references (ufs, ext2fs, tmpfs).

Intentionally undocumented.
 1.70 09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.69 03-Oct-2006  christos branches: 1.69.2;
redo previous: It is better to add a KASSERT, since this code is the same
as in ufs.
 1.68 03-Oct-2006  christos Coverity CID 3689: dp cannot be NULL at this point, so don't check for it.
 1.67 23-Jul-2006  ad branches: 1.67.4; 1.67.6;
Use the LWP cached credentials where sane.
 1.66 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.65 14-May-2006  elad branches: 1.65.2;
integrate kauth.
 1.64 11-Dec-2005  christos branches: 1.64.4; 1.64.6; 1.64.8; 1.64.10; 1.64.12;
merge ktrace-lwp.
 1.63 02-Nov-2005  yamt branches: 1.63.2;
merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.62 12-Sep-2005  christos branches: 1.62.2;
Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.61 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.60 28-Jun-2005  kml branches: 1.60.2;
Ensure that we change the size of the vnode at the same time as
we change the size of the inode, and use ext2fs_size uniformly.
This fixes a crash that occurs when I create a directory, then
move it, all on an ext2 filesystem.
 1.59 26-Feb-2005  perry nuke trailing whitespace
 1.58 09-Feb-2005  ws Add support for large files (>2GB).
Like Linux, automagically convert old filesystems to use this,
if they are already at revision 1.
For revision 0, just punt (unlike Linux; makes me a bit too nervous.)

There should be an option to fsck_ext2fs to upgrade revision 0 to revision 1.

Reviewed by Manuel (bouyer@).
 1.57 21-Sep-2004  thorpej branches: 1.57.4; 1.57.6;
Add a new VNODE_LOCKDEBUG option, which enables checks in the VOP_*()
calls to ensure that the vnode lock state is as expected when the VOP
call is made. Modify vnode_if.src to set the expected state according
to the documenting lock table for each VOP. Modify vnode_if.sh to emit
the checks.

Notes:
- The checks are only performed if the vnode has the VLOCKSWORK bit
set. Some file systems (e.g. specfs) don't even bother with vnode
locks, so of course the checks will fail.
- We can't actually run with VNODE_LOCKDEBUG because there are so many
vnode locking problems, not the least of which is the "use SHARED for
VOP_READ()" issue, which screws things up for the entire call chain.

Inspired by similar changes in OpenBSD, but implemented differently.
 1.56 17-Sep-2004  skrll There's no need to pass a proc value when using UIO_SYSSPACE with
vn_rdwr(9) and uiomove(9).

OK'd by Jason Thorpe
 1.55 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.54 14-Aug-2004  mycroft Push atime/mtime updates even further -- into the reclaim path, so they happen
rarely in the normal case. (Note: This happens at reboot/shutdown time because
all file systems are unmounted.)

Also, for IN_MODIFY, use IN_ACCESSED, not IN_MODIFIED; otherwise "ls -l" of
your device node or FIFO would cause the time stamps to get written too
quickly.
 1.53 22-May-2004  kleink POSIX: Permit a process without the appropriate privilege to change a
file's group ID to its effective gid, in addition to the presently
permitted set of supplementary gids.

From Mark Davies in PR standards/25401.
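
The rule described above amounts to a membership test on the target
group. A minimal sketch, with a hypothetical function name, that
covers only the group check (file ownership and privilege checks are
out of scope):

```c
#include <stddef.h>
#include <sys/types.h>

/*
 * Return 1 if an unprivileged process may set a file's group to
 * `newgid`: the target must be its effective gid or one of its
 * supplementary gids.
 */
static int
may_set_group(gid_t newgid, gid_t egid, const gid_t *groups, size_t ngroups)
{
	if (newgid == egid)
		return 1;
	for (size_t i = 0; i < ngroups; i++)
		if (groups[i] == newgid)
			return 1;
	return 0;
}
```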
 1.52 22-Mar-2004  bouyer branches: 1.52.2;
Fix disclaimer in my copyright. Pointed out by Thomas Klausner.
 1.51 05-Oct-2003  bouyer Remove references to University of California from my copyright notices.
 1.50 09-Aug-2003  dsl Stop panic if 'mknod xxx b 0 0' done on a full filesystem.
panics in ffs_full_fsync because v_specmountpoint requires that the NULL
v_specinfo be followed.
Tidy up in the same order in all error paths so compiler can merge the
code sequences.

Fixes PR kern/22419
 1.49 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.48 29-Jun-2003  fvdl branches: 1.48.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.47 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.46 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.45 26-May-2003  fvdl free the ext2fs dinode struct in ext2fs_reclaim. From Ted Unangst.
 1.44 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.43 23-Oct-2002  jdolecek ext2fs_remove(): use 'else' to eliminate need for goto (and improve
readability, even)
 1.42 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework;
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.41 22-Sep-2002  jdolecek don't need <sys/conf.h> here
 1.40 08-Nov-2001  lukem add RCSID
 1.39 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.38 22-Sep-2001  sommerfeld branches: 1.38.2;
Add fifo_putpages() placebo so that the vnode's uobj is unlocked.
 1.37 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.36 24-Aug-2001  wiz branches: 1.36.2;
heirarchy -> hierarchy
 1.35 17-Aug-2001  chs add getpages/putpages entries for spec vnodes.
 1.34 24-Jul-2001  assar change vop_symlink and vop_mknod to return vpp (the created node)
refed, so that the caller can actually use it. update callers and
file systems that implement these vnode operations
 1.33 23-Mar-2001  fvdl branches: 1.33.4;
Same change as in the UFS code: unlock vnode before setting v_op
to spec_vnode_ops. From Bill Studenmund.
 1.32 07-Feb-2001  tsutsui branches: 1.32.2;
Fix nested extern declaration of prtactive.
 1.31 22-Jan-2001  jdolecek make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.30 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.29 03-Aug-2000  thorpej Convert namei pathname buffer allocation to use the pool allocator.
 1.28 22-Jul-2000  jdolecek change the lf_advlock() arguments from

int lf_advlock __P((struct lockf **,
off_t, caddr_t, int, struct flock *, int));
to

int lf_advlock __P((struct vop_advlock_args *, struct lockf **, off_t));

This matches common usage and is also compatible with a similar change
in FreeBSD (though they use u_quad_t as last arg).
 1.27 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.26 13-May-2000  perseant branches: 1.26.4;
Change the semantics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be waited for, or not, accordingly.

Closes PR#8996.
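
One plausible reading of the new semantics, written as a small
decision function. The flag values, the function name, and the
`sync_dirops` parameter (whether a given file system wants its
directory operations done synchronously) are assumptions made for
illustration only.

```c
#define SK_UPDATE_WAIT  0x01	/* caller wants the update to complete */
#define SK_UPDATE_DIROP 0x02	/* hint: call comes from a directory op */

/*
 * Decide whether an update should block. An explicit wait request is
 * always honoured; the directory-operation hint only causes a wait on
 * file systems that perform directory operations synchronously.
 */
static int
update_should_wait(int flags, int sync_dirops)
{
	return (flags & SK_UPDATE_WAIT) != 0 ||
	    ((flags & SK_UPDATE_DIROP) != 0 && sync_dirops);
}
```

With a flag word, callers no longer need to wrap each call in a
file-system-specific test; the update routine makes the decision.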
 1.25 30-Mar-2000  augustss Remove register declarations.
 1.24 22-Mar-2000  thorpej Small cosmetic change.
 1.23 28-Jan-2000  bouyer Correct (minor) bogons in filetype option support, and add support
for sparse_super option
 1.22 26-Jan-2000  bouyer First cut at ext2fs rev 1 support (as of mke2fs 1.18): supports the filetype
option read/write and the sparse option read-only.
 1.21 03-Aug-1999  wrstuden branches: 1.21.2; 1.21.8;
Add support for fcntl(2) to generate VOP_FCNTL calls. Any fcntl
call with F_FSCTL set and F_SETFL calls generate calls to a new
fileop fo_fcntl. Add genfs_fcntl() and soo_fcntl() which return 0
for F_SETFL and EOPNOTSUPP otherwise. Have all leaf filesystems
use genfs_fcntl().

Reviewed by: thorpej
Tested by: wrstuden
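
The generic behaviour described above (0 for F_SETFL, EOPNOTSUPP for
everything else) is small enough to sketch directly. The function
name and the single-argument signature are hypothetical; the real
genfs_fcntl() takes a vnode operation argument structure.

```c
#include <errno.h>
#include <fcntl.h>

/* Accept F_SETFL as a no-op and reject every other command. */
static int
sketch_generic_fcntl(int command)
{
	return (command == F_SETFL) ? 0 : EOPNOTSUPP;
}
```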
 1.20 08-Jul-1999  wrstuden Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
 1.19 24-Mar-1999  mrg branches: 1.19.4;
completely remove Mach VM support. all that is left is all the
header files as UVM still uses (most of) these.
 1.18 05-Mar-1999  mycroft Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.17 26-Feb-1999  mrg pull across patches from warner losh <imp@freebsd.org> (freebsd ufs_vnops.c
versions 1.109&1.110), adjusted for our ext2fs support, and also committed
there. this avoids overflowing the link count.
 1.16 02-Dec-1998  bouyer - indentation
- sync LK_* flags with ffs/ufs
 1.15 29-Sep-1998  bouyer #include opt_uvm.h only if _KERNEL and !_LKM
Make ext2fs_init() call ufs_init(). it was doing the init by itself,
testing for extern done != 0. This bug was hidden by the fact that
ext2fs_init() is called before ffs_init().
 1.14 01-Sep-1998  thorpej Use the pool allocator and "nointr" pool page allocator for ext2fs inodes.
 1.13 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.12 28-Jul-1998  mjacob fix to accommodate change in vn_rdwr prototype
 1.11 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.10 22-Jun-1998  sommerfe defopt for options FIFO
 1.9 20-Jun-1998  mrg simplify UVM #ifdef.
 1.8 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.7 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.6 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.5 17-Oct-1997  christos branches: 1.5.2;
Add missing cast for nlink_t
 1.4 09-Oct-1997  bouyer Add byte-swapping functions (bswap16, bswap32, bswap64) to libkern.
Only assembly versions of bswap16 and bswap32 for i386 for now (bswap64 uses
bswap32). Contributions of assembly versions of these are welcome.
Add byte-swapping of ext2fs metadata for big-endian systems.
Tested on i386 and sparc.
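
For reference, portable C versions of the three swaps might look like
the sketch below, with the 64-bit swap built from two 32-bit swaps as
the message describes. The sk_ prefix marks these as illustrative
names, not the libkern functions themselves.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static uint16_t
sk_bswap16(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t
sk_bswap32(uint32_t x)
{
	return (x << 24) | ((x & 0xff00) << 8) |
	    ((x >> 8) & 0xff00) | (x >> 24);
}

/* Reverse eight bytes by swapping each half and exchanging the halves. */
static uint64_t
sk_bswap64(uint64_t x)
{
	return ((uint64_t)sk_bswap32((uint32_t)x) << 32) |
	    sk_bswap32((uint32_t)(x >> 32));
}
```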
 1.3 01-Jul-1997  bouyer branches: 1.3.2; 1.3.4;
Sync with ufs/ufs:
Avoid panic triggered by rename("foo/", "bar/..") (From Mycroft, via christos)
 1.2 30-Jun-1997  fvdl Return EPERM, not EISDIR for an attempt to remove a directory.
 1.1 11-Jun-1997  bouyer The ext2fs layer, based on the ffs/ufs one. Uses a few functions from
sys/ufs/ufs/
 1.3.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.3.2.2 01-Jul-1997  bouyer Sync with ufs/ufs:
Avoid panic triggered by rename("foo/", "bar/..") (From Mycroft, via christos)
 1.3.2.1 01-Jul-1997  bouyer file ext2fs_vnops.c was added on branch bouyer-scsipi on 1997-07-01 07:34:04 +0000
 1.5.2.1 06-Nov-1998  cgd Show correct number of blocks used for files larger than 2GB.
Fixed in trunk as part of Lite-2 merging. (cgd)
 1.19.4.3 06-Aug-1999  chs UBCify.
 1.19.4.2 02-Aug-1999  thorpej Update from trunk.
 1.19.4.1 11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.21.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.21.2.4 27-Mar-2001  bouyer Sync with HEAD.
 1.21.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.21.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.21.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.26.4.2 06-Apr-2001  he Pull up revision 1.33 (requested by wrstuden):
Explicitly VOP_UNLOCK before setting v_op to spec_vnode_ops_p.
Works around a lock leak and eventual kernel panic.
 1.26.4.1 30-Jul-2000  jdolecek Pullup from trunk (approved by thorpej):
Change lf_advlock() to:
int lf_advlock (struct vop_advlock_args *, struct lockf **, off_t)

This matches its usage. Change inspired by FreeBSD, though we use
off_t instead of u_quad_t as the last argument.

sys/lockf.h rev. 1.9
msdosfs/msdosfs_vnops.c rev. 1.99
kern/vfs_lockf.c rev. 1.17
miscfs/specfs/spec_vnops.c rev. 1.49
nfs/nfs_vnops.c rev. 1.115
ufs/ext2fs/ext2fs_vnops.c rev. 1.28
ufs/ufs/ufs_vnops.c rev. 1.72
 1.32.2.7 11-Nov-2002  nathanw Catch up to -current
 1.32.2.6 18-Oct-2002  nathanw Catch up to -current.
 1.32.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.32.2.4 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.32.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.32.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.32.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.33.4.7 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.33.4.6 26-Sep-2002  jdolecek add support for kevents - sprinkle VN_KNOTE() and add genfs_kqfilter()
to vnode ops; basically same thing as in ufs_readwrite.c and ufs_vnops.c
 1.33.4.5 23-Sep-2002  jdolecek add spec kqfilter vnode op
 1.33.4.4 22-Sep-2002  jdolecek add fifo_kqfilter() to fifo ops, to switch on support for kevents
 1.33.4.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.33.4.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.33.4.1 03-Aug-2001  lukem update to -current
 1.36.2.2 01-Oct-2001  fvdl Catch up with -current.
 1.36.2.1 18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.38.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.48.2.11 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.48.2.10 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.48.2.9 15-Feb-2005  skrll Sync with HEAD.
 1.48.2.8 29-Oct-2004  skrll Remove the struct lwp * argument from ext2fs_checkpath that is no longer
(read: was never) required.
 1.48.2.7 24-Sep-2004  skrll Sync with HEAD.
 1.48.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.48.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.48.2.4 25-Aug-2004  skrll Sync with HEAD.
 1.48.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.48.2.2 03-Aug-2004  skrll Sync with HEAD
 1.48.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.52.2.1 23-May-2004  grant Pull up revision 1.53 (requested by kleink in ticket #379):

POSIX: Permit a process without the appropriate privilege to change a
file's group ID to its effective gid, in addition to the presently
permitted set of supplementary gids.
 1.57.6.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.57.6.1 12-Feb-2005  yamt sync with head.
 1.57.4.1 29-Apr-2005  kent sync with -current
 1.60.2.8 04-Feb-2008  yamt sync with head.
 1.60.2.7 21-Jan-2008  yamt sync with head
 1.60.2.6 07-Dec-2007  yamt sync with head
 1.60.2.5 27-Oct-2007  yamt sync with head.
 1.60.2.4 03-Sep-2007  yamt sync with head.
 1.60.2.3 26-Feb-2007  yamt sync with head.
 1.60.2.2 30-Dec-2006  yamt sync with head.
 1.60.2.1 21-Jun-2006  yamt sync with head.
 1.62.2.1 20-Oct-2005  yamt adapt ufs.
 1.63.2.2 19-Nov-2005  yamt - finish reverting VOP_READ prototype changes.
- remove unused variables.
- fix typos.
some of them were pointed out by Juan RP.
 1.63.2.1 15-Nov-2005  yamt - adapt to the new prototype of VOP_READ.
- adapt ext2fs and union.
 1.64.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.64.10.5 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.64.10.4 11-Mar-2006  elad When calling kauth_cred_ismember_gid(), don't return the error code if
there is one, just treat it as if the check failed.

Pointed out by thorpej@.
 1.64.10.3 11-Mar-2006  elad kauth_cred_groupmember() -> kauth_cred_ismember_gid(), as requested by
thorpej@ to conform to the Darwin KPI.
 1.64.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.64.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.64.8.3 11-Aug-2006  yamt sync with head
 1.64.8.2 26-Jun-2006  yamt sync with head.
 1.64.8.1 24-May-2006  yamt sync with head.
 1.64.6.2 01-Jun-2006  kardel Sync with head.
 1.64.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.64.4.1 09-Sep-2006  rpaulo sync with head
 1.65.2.1 19-Jun-2006  chap Sync with head.
 1.67.6.2 10-Dec-2006  yamt sync with head.
 1.67.6.1 22-Oct-2006  yamt sync with head
 1.67.4.2 12-Jan-2007  ad Sync with head.
 1.67.4.1 18-Nov-2006  ad Sync with head.
 1.69.2.1 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.72.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.72.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.74.18.1 14-Oct-2007  yamt sync with head.
 1.74.16.3 23-Mar-2008  matt sync with HEAD
 1.74.16.2 09-Jan-2008  matt sync with HEAD
 1.74.16.1 06-Nov-2007  matt sync with HEAD
 1.74.14.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.74.14.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.74.14.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.74.2.3 23-Oct-2007  ad Sync with head.
 1.74.2.2 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress test.
 1.74.2.1 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.75.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.75.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.75.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.76.2.3 30-Dec-2007  ad Fix remaining problems with ext2fs on this branch.
 1.76.2.2 26-Dec-2007  ad Sync with head.
 1.76.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.77.4.2 10-Jan-2008  bouyer Sync with HEAD
 1.77.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.81.10.7 11-Aug-2010  yamt sync with head.
 1.81.10.6 11-Mar-2010  yamt sync with head
 1.81.10.5 16-Sep-2009  yamt sync with head
 1.81.10.4 18-Jul-2009  yamt sync with head.
 1.81.10.3 16-May-2009  yamt sync with head
 1.81.10.2 04-May-2009  yamt sync with head.
 1.81.10.1 16-May-2008  yamt sync with head.
 1.81.8.1 18-May-2008  yamt sync with head.
 1.81.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.81.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.82.10.2 27-Oct-2009  bouyer Pull up following revision(s) (requested by pooka in ticket #1112):
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.91
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.152
sys/ufs/ext2fs/ext2fs_extern.h: revision 1.42
update i_uid and i_gid after chown
 1.82.10.1 29-Nov-2008  snj branches: 1.82.10.1.4;
Pull up following revision(s) (requested by mrg in ticket #147):
sys/ufs/ext2fs/ext2fs_alloc.c: revision 1.37
sys/ufs/ext2fs/ext2fs_bswap.c: revision 1.14
sys/ufs/ext2fs/ext2fs_dinode.h: revision 1.17
sys/ufs/ext2fs/ext2fs_lookup.c: revision 1.56
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.83
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.140
sys/ufs/ufs/inode.h: revision 1.55
add support for 32 bit uid/gid fields in ext2, but only do so
when the revision is > REV0.
 1.82.10.1.4.1 21-Apr-2010  matt sync to netbsd-5
 1.82.8.2 28-Apr-2009  skrll Sync with HEAD.
 1.82.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.82.6.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.83.4.2 23-Jul-2009  jym Sync with HEAD.
 1.83.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.91.4.4 31-May-2011  rmind sync with head
 1.91.4.3 05-Mar-2011  rmind sync with head
 1.91.4.2 03-Jul-2010  rmind sync with head
 1.91.4.1 30-May-2010  rmind sync with head
 1.91.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.91.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.97.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.100.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.100.2.4 23-Jan-2013  yamt sync with head
 1.100.2.3 16-Jan-2013  yamt sync with (a bit old) head
 1.100.2.2 23-May-2012  yamt sync with head.
 1.100.2.1 17-Apr-2012  yamt sync with head
 1.101.6.1 07-May-2012  riz Pull up following revision(s) (requested by chs in ticket #204):
sys/fs/sysvbfs/sysvbfs_vnops.c: revision 1.44
sys/ufs/ffs/ffs_vfsops.c: revision 1.277
sys/fs/v7fs/v7fs_vnops.c: revision 1.11
sys/ufs/chfs/chfs_vnops.c: revision 1.7
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.61
sys/miscfs/genfs/genfs_io.c: revision 1.54
sys/kern/vfs_wapbl.c: revision 1.52
sys/uvm/uvm_pager.h: revision 1.43
sys/ufs/ffs/ffs_vnops.c: revision 1.121
sys/kern/vfs_subr.c: revision 1.434
sys/fs/msdosfs/msdosfs_vnops.c: revision 1.83
sys/fs/ntfs/ntfs_vnops.c: revision 1.51
sys/fs/udf/udf_subr.c: revision 1.119
sys/miscfs/specfs/spec_vnops.c: revision 1.135
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.103
sys/fs/udf/udf_vnops.c: revision 1.71
sys/ufs/ufs/ufs_readwrite.c: revision 1.104
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
mark all wapbl I/O as BPRIO_TIMECRITICAL.
this is the second part of addressing PR 46325.
 1.101.4.2 02-Jun-2012  mrg sync to latest -current.
 1.101.4.1 05-Apr-2012  mrg sync to latest -current.
 1.104.2.4 03-Dec-2017  jdolecek update from HEAD
 1.104.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.104.2.2 23-Jun-2013  tls resync from head
 1.104.2.1 25-Feb-2013  tls resync with head
 1.107.6.1 18-May-2014  rmind sync with head
 1.111.2.1 10-Aug-2014  tls Rebase.
 1.113.2.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.115.2.4 28-Aug-2017  skrll Sync with HEAD
 1.115.2.3 05-Oct-2016  skrll Sync with HEAD
 1.115.2.2 06-Jun-2015  skrll Sync with HEAD
 1.115.2.1 06-Apr-2015  skrll Sync with HEAD
 1.117.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.128.10.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.128.10.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.128.10.1 10-Jun-2019  christos Sync with HEAD
 1.128.8.1 18-Jan-2019  pgoyette Synch with HEAD
 1.132.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.5 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.4 23-Aug-2016  christos branches: 1.4.2; 1.4.4; 1.4.18;
CID 1371648: off by one in index checking
KNF.
 1.3 14-Aug-2016  jdolecek add EXT2F_HAS_ROCOMPAT_FEATURE() macro, and change the current EXT2F_HAS_{COMPAT|INCOMPAT}_FEATURE() to take fs as first parameter
 1.2 13-Aug-2016  christos KNF, no functional changes...
 1.1 12-Aug-2016  jdolecek add support for extended attributes in ext2fs for ext3/ext4; read-only for now
 1.4.18.2 03-Dec-2017  jdolecek update from HEAD
 1.4.18.1 23-Aug-2016  jdolecek file ext2fs_xattr.c was added on branch tls-maxphys on 2017-12-03 11:39:21 +0000
 1.4.4.2 05-Oct-2016  skrll Sync with HEAD
 1.4.4.1 23-Aug-2016  skrll file ext2fs_xattr.c was added on branch nick-nhusb on 2016-10-05 20:56:11 +0000
 1.4.2.2 14-Sep-2016  pgoyette Sync with HEAD
 1.4.2.1 23-Aug-2016  pgoyette file ext2fs_xattr.c was added on branch pgoyette-localcount on 2016-09-14 03:04:19 +0000
 1.4 16-Apr-2020  rin Revert previous for now:
http://mail-index.netbsd.org/source-changes/2020/04/16/msg116278.html

The reasoning turned out to be wrong; __KERNEL_RCSID() in header files
does *not* overwrite RCSID in main source files. The real problem is that
it inserts its RCSID into *every* object file. However, it can still be
useful even if heavily duplicated.
 1.3 16-Apr-2020  rin Stop using __KERNEL_RCSID() in header files; it confuses ident(1) by
overwriting RCSID in main source files.

XXX
The first argument of __KERNEL_RCSID() is neglected for ELF. If we wish
to have RCSID of header files in kernel binary, we need something like
__FBSDID() macro in FreeBSD.
 1.2 12-Aug-2016  macallan branches: 1.2.2; 1.2.4; 1.2.18; 1.2.32;
cast pointers to uintptr_t before comparing them, also ()s
now this at least compiles
 1.1 12-Aug-2016  jdolecek add support for extended attributes in ext2fs for ext3/ext4; read-only for now
 1.2.32.1 20-Apr-2020  bouyer Sync with HEAD
 1.2.18.2 03-Dec-2017  jdolecek update from HEAD
 1.2.18.1 12-Aug-2016  jdolecek file ext2fs_xattr.h was added on branch tls-maxphys on 2017-12-03 11:39:21 +0000
 1.2.4.2 05-Oct-2016  skrll Sync with HEAD
 1.2.4.1 12-Aug-2016  skrll file ext2fs_xattr.h was added on branch nick-nhusb on 2016-10-05 20:56:11 +0000
 1.2.2.2 14-Sep-2016  pgoyette Sync with HEAD
 1.2.2.1 12-Aug-2016  pgoyette file ext2fs_xattr.h was added on branch pgoyette-localcount on 2016-09-14 03:04:19 +0000
 1.1 12-Jun-1998  cgd Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.174 27-Jun-2025  andvar s/quadradically/quadratically/ in comments.
 1.173 13-May-2024  msaitoh branches: 1.173.2;
s/contigous/contiguous/ in comment.
 1.172 07-Jan-2023  chs ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:

commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000

This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.

To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.

Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000

One last pass to get all the unsigned comparisons correct.


In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
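
The class of bug fixed in revision 1.172 can be illustrated with a range check on an inode number. This is a sketch under assumptions, not code from ffs: the function names and the 32-bit widths are hypothetical, chosen only to show why a large inode number must be compared as unsigned.

```c
#include <stdint.h>

/*
 * Buggy pattern (hypothetical): the inode number is carried in a signed
 * 32-bit type, so any value at or above 2^31 appears negative and fails
 * the range check.
 */
static int
sketch_ino_in_range_signed(int32_t ino, uint32_t ninodes)
{
	return ino >= 0 && (int64_t)ino < (int64_t)ninodes;
}

/* Fixed pattern: keep the inode number unsigned throughout. */
static int
sketch_ino_in_range_unsigned(uint32_t ino, uint32_t ninodes)
{
	return ino < ninodes;
}
```

With ninodes set to 0xF0000000, the inode number 0x80000001 is valid, but the signed variant rejects it once the value has been converted to int32_t (the conversion is implementation-defined in C and wraps to a negative number on common compilers).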
 1.171 23-Apr-2022  hannken branches: 1.171.4;
Need vnode locked for VOP_FDISCARD().
 1.170 03-Sep-2021  andvar fix typos in comments, mainly s/extention/extension/ and s/sufficent/sufficient/
 1.169 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.168 26-Jul-2020  chs skip the assertions about page-locking when allocating to the extattr bmap,
since extattrs do not use the page cache.
 1.167 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.166 23-Feb-2020  ad branches: 1.166.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.165 18-Feb-2020  riastradh Fix non-DIAGNOSTIC build with UVM_PAGE_TRKOWN.
 1.164 14-Apr-2019  kardel branches: 1.164.4; 1.164.6;
PR/53990, PR/52380, PR/52102: UFS2 cylinder group inode allocation botch

Fix rare allocation botch in ffs_nodealloccg().

Conditions:
a) less than
#_of_initialized_inodes(cg->cg_initediblk)
- inodes_per_filesystem_block
are allocated in the cylinder group
b) cg->cg_irotor points to an uninterrupted run of
allocated inodes in the inode bitmap up to the
end of dynamically initialized inodes
(cg->cg_initediblk)

In this case the next inode after this run was returned
without initializing the respective inode block. As the
block is not initialized these inodes could trigger panics
on inode consistency due to old (uninitialized) disk data.

In very rare cases data loss could occur when
the uninitialized inode block is initialized via the
normal mechanism.
Further conditions to occur after the above:
c) no panic
d) no (forced) fsck
e) and more than cg->cg_initediblk - inodes_per_filesystem_block
allocated inodes.

Fix:
Always ensure allocation happens in the initialized inode range,
extending the initialized inode range as needed.

Add KASSERTMSG() safeguards.

ok hannken@
 1.163 10-Dec-2018  jdolecek put back UFS_WAPBL_JUNLOCK_ASSERT(), the underlying rw_write_held() check
doesn't actually have a race since it checks if the rwlock is held by
current lwp
 1.162 10-Dec-2018  jdolecek make UFS_WAPBL_JLOCK_ASSERT() #ifdef DIAGNOSTIC, same as the underlying
function KASSERT(), so that it actually does something; fix code using
it to actually pass correct params, so that it compiles

remove UFS_WAPBL_JUNLOCK_ASSERT(), as that is inherently racy (it's
okay on those places if the rwlock is held by other lwp); depend
on the RW_ASSERT()/LOCKDEBUG inside rw_enter() to catch the case
with wapbl rwlock held by current lwp
 1.161 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
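
The truncation hazard described in revision 1.161 can be reproduced in a few lines. This is a sketch assuming a platform with a 32-bit unsigned int and 64-bit uint64_t; sketch_uimin and SKETCH_MIN are hypothetical stand-ins for the libkern function and the MIN-style macro.

```c
#include <stdint.h>

/* Defined on unsigned int, like the function the commit renames. */
static unsigned int
sketch_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: keeps the operand types, but evaluates arguments twice. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Both 64-bit arguments are silently converted to unsigned int here. */
static uint64_t
clamp_with_function(uint64_t len, uint64_t limit)
{
	return sketch_uimin(len, limit);
}

/* The comparison is done in 64 bits, so nothing is truncated. */
static uint64_t
clamp_with_macro(uint64_t len, uint64_t limit)
{
	return SKETCH_MIN(len, limit);
}
```

Clamping a length of 2^32 + 1 to a limit of 4096 yields 1 through the function, because the length is truncated to 1 before the comparison, and 4096 through the macro. A name like uimin makes the operand type visible at the call site.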
 1.160 19-Jul-2018  ozaki-r Avoid using magic numbers for arguments of workqueue_create (NFC)
 1.159 07-Dec-2017  chs branches: 1.159.2; 1.159.4;
fix the UVM_PAGE_TRKOWN page-locking assertion at the top of ffs_alloc()
to work right for multi-threaded processes.
 1.158 13-Aug-2017  mlelstv Don't time out the discard work queue here. Either destroying a work queue
with pending work items panics or accessing freed resources from the work
item will crash. The timeout needs to be handled gracefully by the driver
that implements the discard operation.

Fixes parts of PR 50725.
 1.157 12-Jul-2017  hannken When initializing more inodes make sure to write them to disk
before writing the cylinder group with updated cg_initediblk.
 1.156 18-Mar-2017  riastradh branches: 1.156.6;
#if DIAGNOSTIC panic ---> KASSERT
 1.155 01-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().
 1.154 30-Oct-2016  christos branches: 1.154.2;
Tidy up panic messages, no functional change.
 1.153 28-Oct-2016  jdolecek reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
succeed; change wapbl_register_deallocation() to return EAGAIN
rather than panic when code hits the limit

callers changed to either loop calling ffs_truncate() using new
utility ufs_truncate_retry() if their semantics requires it, or
just ignore the failure; remove ufs_wapbl_truncate()

this fixes possible user-triggerable panic during truncate, and
resolves WAPBL performance issue with truncates of large files

PR kern/47146 and kern/49175
 1.152 25-Sep-2016  jdolecek adjust ffs_realloccg() so that the logic about allocating full
contiguous block for future fragment expansion doesn't need to
UFS_WAPBL_REGISTER_DEALLOCATION() or ffs_blkfree(); the free blocks
are now immediately available for use by the expanding file in further i/o

primary driver is safe removal of the deallocation registration and
hence failure point, but this also fixes degenerate case for wapbl,
and similar also for discard - if the file would be actually expanded
before wapbl commit, or before discard queue would be processed,
the filesystem would not yet see the contiguous free blocks, and
would be forced to allocate another fragment elsewhere
 1.151 12-Aug-2015  riastradh branches: 1.151.2;
Need wapbl transaction around ffs_blkfree_cg. Fixes wapbl+discard.
 1.150 08-Aug-2015  mlelstv don't crash when printing error messages when there are no credentials.
don't abuse the printed uid to log the inode number.

The printing/logging of error messages should be simplified.
 1.149 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.148 17-Mar-2015  hannken Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.147 08-Sep-2014  joerg branches: 1.147.2;
Prefer cprng_fast32 over random. A good distribution even in the lower
bits beats any minor performance advantage random(9) might have,
especially given the disk IO involved.
 1.146 25-Jul-2014  dholland branches: 1.146.2;
Switch the FFS code for discarding free blocks to use VOP_FDISCARD.
 1.145 12-Nov-2013  dholland branches: 1.145.2;
clarify warning printout
 1.144 28-Oct-2013  bad Pull in fix from FreeBSD ffs_alloc.c r121785:
Consider only cylinder groups with at least 75% of the average free space
per cylinder group and 75% of the average free inodes per cylinder group
as candidates for the creation of a new directory. Avoids excessive I/O
scanning for a suitable cylinder group on relatively full file systems.

Tested by sborril and me.

Pullup: netbsd-6, netbsd-5


Original commit message:

Tweak the calculation of minbfree in ffs_dirpref() so that only
those cylinder groups that have at least 75% of the average free
space per cylinder group for that file system are considered as
candidates for the creation of a new directory. The previous formula
for minbfree would set it to zero if the file system was more than
75% full, which allowed cylinder groups with no free space at all
to be chosen as candidates for directory creation, which resulted
in an expensive search for free blocks for each file that was
subsequently created in that directory.

Modify the calculation of minifree in the same way.

Decrease maxcontigdirs as the file system fills to decrease the
likelihood that a cluster of directories will overflow the available
space in a cylinder group.

Reviewed by: mckusick
Tested by: kmarx@vicor.com
MFC after: 2 weeks
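
The threshold described above can be sketched as follows. This is a minimal illustration, not the committed code: the helper name dirpref_min_free() is hypothetical, and the clamp to 1 is an assumption added so that a completely full cylinder group can never qualify, which is the failure the message describes.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the lower bound a cylinder group must meet to be
 * a candidate for a new directory. "avg" is the per-group average of
 * free blocks (for minbfree) or free inodes (for minifree).
 *
 * The result is 75% of the average. The clamp to 1 is an assumption of
 * this sketch; it keeps the bound from ever reaching zero.
 */
static int64_t
dirpref_min_free(int64_t avg)
{
	int64_t min = avg - avg / 4;

	if (min < 1)
		min = 1;
	return min;
}
```

With an average of 100 free blocks per group the bound is 75; with an average of 0 it stays at 1, so groups with no free space are skipped.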
 1.143 20-Oct-2013  christos always declare needswap
 1.142 20-Oct-2013  christos always declare needswap
 1.141 19-Oct-2013  martin Eliminate a variable only used in diagnostic kernels
 1.140 30-Sep-2013  hannken Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>
 1.139 12-Sep-2013  martin #ifdef a variable just like their use
 1.138 23-Jun-2013  dholland branches: 1.138.2;
Stick ffs_ in front of the following macros:
fragstoblks()
blkstofrags()
fragnum()
blknum()

to finish the job of distinguishing them from the lfs versions, which
Christos renamed the other day.

I believe this is the last of the overtly ambiguous exported symbols
from ffs... or at least, the last of the ones that conflicted with lfs.
ffs still pollutes the C namespace very broadly (as does ufs) and this
needs quite a bit more cleanup.

XXX: boo on macros with lowercase names. But I'm not tackling that just yet.
 1.137 23-Jun-2013  dholland Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.136 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.135 19-Jun-2013  dholland blkoff() -> ffs_blkoff() stragglers
 1.134 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.133 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.132 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.131 19-Oct-2012  drochner Implement experimental support to pass notifications that a file
was deleted from the filesystem to the disk driver, commonly
known as "discard" or "trim".
fs/driver support is in ffs and ata wd for now.
This is what was posted here:
http://mail-index.netbsd.org/tech-kern/2012/02/28/msg012813.html
with minor cleanup, and the global switch replaced by a mount option.
 1.130 28-Nov-2011  tls branches: 1.130.4; 1.130.8;
Remove arc4random() and arc4randbytes() from the kernel API. Replace
arc4random() hacks in rump with stubs that call the host arc4random() to
get numbers that are hopefully actually random (arc4random() keyed with
stack junk is not). This should fix some of the currently failing anita
tests -- we should no longer generate duplicate "random" MAC addresses in
the test environment.
 1.129 20-Sep-2011  chs branches: 1.129.2;
strengthen the assertions about pages existing during block allocation,
which were incorrectly relaxed last year. add some comments so that
the intent of these is hopefully clearer.

in ufs_balloc_range(), don't free pages or mark them dirty if
allocating their backing store failed. this fixes PR 45369.
 1.128 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.127 06-Mar-2011  bouyer branches: 1.127.2;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.126 06-Mar-2011  rmind {ffs_nodealloccg,ext2fs_nodealloccg,ext2fs_mapsearch}: use XOR and ffs()
to find free bits in the inode and block bitmaps, instead of the loop.

Obtained from FreeBSD (changes by jhb).
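
A minimal sketch of the XOR-and-ffs() technique, assuming a byte-wise bitmap in which a clear bit means "free". The function name first_free_bit() is hypothetical and is not the kernel routine; ffs() here is the POSIX find-first-set function from <strings.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <strings.h>	/* ffs() */

/*
 * Return the index of the first clear (free) bit in map[0..len-1],
 * or -1 if every bit is set. Bit i of byte b has index b * 8 + i.
 */
static int
first_free_bit(const uint8_t *map, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		/* XOR with 0xff turns free (0) bits into 1 bits. */
		int inverted = map[i] ^ 0xff;

		/* ffs() is 1-based, so subtract 1 for the bit index. */
		if (inverted != 0)
			return (int)(i * 8) + ffs(inverted) - 1;
	}
	return -1;
}
```

Compared with testing each bit in a loop, this skips a fully allocated byte with a single comparison.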
 1.125 21-Feb-2010  mlelstv branches: 1.125.2; 1.125.4; 1.125.6;
For the UVM_PAGE_TRKOWN test do not require that the relevant pages
must exist.
 1.124 07-May-2009  elad branches: 1.124.2;
Introduce several actions/requests for authorizing file-system related
operations, specifically quota and block allocation from reserved space.

Modify ufs_quotactl() to accommodate passing "mp" earlier by vfs_busy()ing
it a little bit higher.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/04/26/msg004936.html

Note that the umapfs request mentioned in this thread was NOT added as
there is still on-going discussion regarding the proper implementation.
 1.123 25-Apr-2009  sborrill Fix random 'filesystem full' messages by trapping a couple of 32-bit
overflow areas missed in rev 1.110 and switching cgbase().

Kudos to rump_ffs!
 1.122 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.121 22-Feb-2009  ad PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc

- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.

- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.

- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Presently they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.

- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.

- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.

- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
 1.120 11-Jan-2009  christos branches: 1.120.2;
merge christos-time_t
 1.119 06-Dec-2008  joerg Split ffs_freefile into a frontend for normal cylinder group and for
snapshot use. Adjust ffs_blkfree_common to get the fs instance passed
in, the original commit didn't account blocks in the snapshots
correctly. Assert that ffs_blkfree is used with the primary fs instance
and that ffs_checkfreefile is only used for snapshots. Move the bdwrite
from ffs_blkfree_common into the caller for symmetry. This creates a
redundant write of unmodified data for ffs_blkfree_snap if a double free
of a block happens.

Reviewed and tested by hannken@.
 1.118 01-Dec-2008  joerg Revert last. Conditionalize variables on FFS_EI.
 1.117 01-Dec-2008  cegger build fix: remove unused variables
 1.116 01-Dec-2008  joerg ffs_blkfree is used in two different ways. The normal usage is to free a
block in the cylinder groups of the filesystem. The other user is the
snapshot code, which wants to modify the copied cylinder groups. Use
different frontends to distinguish the cases in preparation for fine
grained locking for cylinder groups.
 1.115 30-Nov-2008  joerg Split ffs_blkalloc into a frontend that does inode based consistency
checks and a backend that just asserts them. Use the backend in
ffs_wapbl_abort_sync_metadata instead of faking an inode.
 1.114 06-Nov-2008  joerg Remove XXXUBC code for ffs_reallocblks, that has been conditionalized in
2002 and #if 0'ed in 2005. It would need a considerable amount of work
to bring back and obscures the more important block allocation.
 1.113 06-Aug-2008  hannken branches: 1.113.2; 1.113.4;
Do not call UFS_WAPBL_*() when ffs_freefile() is acting on a snapshot.

While here replace the test for VBLK with a convenience variable.
 1.112 31-Jul-2008  hannken Resolve a deadlock when ffs_nodealloccg() initializes more inodes on
a UFS2 file system. With the current cylinder group buffer busy it
calls ffs_getblk(). This runs through copy-on-write and may need the
current cylinder group buffer to allocate a new block for the snapshot.

While here write the cylinder group buffer synchronously after
cg_initediblk was changed because fsck_ffs will trust it.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.111 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.110 11-Jul-2008  simonb Fix potential 32-bit overflow problem in the blockpref code.
mlelstv@ points out FreeBSD fixed the same thing a couple of years
ago - here's the commit message they used on rev 1.127:

Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.

Submitted by: Henry Whincup <henry@jot.to>
 1.109 04-Jun-2008  ad branches: 1.109.2; 1.109.4;
When setting DONE on the buffer, assert that there are no waiters in
biowait().
 1.108 03-Jun-2008  hannken ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.107 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.106 21-Jan-2008  pooka branches: 1.106.6; 1.106.8; 1.106.10; 1.106.12; 1.106.14;
Sprinkle comments about um_lock status on function entry and exit.
No functional change.
 1.105 02-Jan-2008  ad Merge vmlocking2 to head.
 1.104 01-Nov-2007  hannken branches: 1.104.2; 1.104.4; 1.104.8;
Avoid doing bawrite to initialize inode block while holding cylinder
group block buffer busy. If filesystem has any active snapshots, bawrite
can come back trying to allocate new snapshot data block from the same
cylinder group and cause deadlock.

From FreeBSD Rev. 1.117
 1.103 18-Oct-2007  hannken ffs_blkfree() and ffs_freefile() take a devvp that may be a regular file
when called from snapshot creation. Be sure to use the right mount.

Ok: Andrew Doran <ad@netbsd.org>
 1.102 10-Oct-2007  ad branches: 1.102.2;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.101 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.100 09-Aug-2007  hannken branches: 1.100.2; 1.100.4;
Move snapshot per-mount data from struct ufsmount to mount specific data.
No functional changes.

Welcome to 4.99.28 (struct ufsmount changed size)
 1.99 16-Jul-2007  pooka branches: 1.99.2; 1.99.6;
When allocating blocks, check minfree before asking kauth about
suser. The latter has unknown cost and rarely needs to be called.
 1.98 04-Mar-2007  christos branches: 1.98.2; 1.98.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.97 04-Jan-2007  elad branches: 1.97.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.96 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.95 15-Oct-2006  yamt ffs_alloc: remove an assertion which is no longer true.
 1.94 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.93 23-Jun-2006  yamt branches: 1.93.4; 1.93.6;
fix a simonb-timecounters regression.
the precision of getnanotime() is not suitable for file timestamps.
esp. when it's nfs-exported.

- introduce vfs_timestamp().
(the name is from freebsd. currently merely a wrapper of nanotime())
- for ufs-like filesystems, use it rather than getnanotime().

XXX check other filesystems.
 1.92 07-Jun-2006  kardel branches: 1.92.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.91 14-May-2006  elad branches: 1.91.2;
integrate kauth.
 1.90 23-Dec-2005  yamt branches: 1.90.4; 1.90.6; 1.90.8; 1.90.10; 1.90.12;
prevent in-core vnode being freed from getting new references.
otherwise, once the corresponding bit in the inode bitmap is cleared,
an unrelated inode with the same inode number can be allocated and
ufs_ihashget() picks a stale in-core vnode for it.

PR/32301 by Matthias Scheler.
 1.89 27-Nov-2005  dsl Force some multiplies to give a 64 bit result to avoid dirsize being zero
and causing a divide by zero trap later.
Fixes a panic noted in netbsd-help.
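
The overflow can be sketched as below. The parameter names avgfilesize and avgfpdir are taken from the tuning parameters mentioned elsewhere in this log; the function names and the use of unsigned types are assumptions made so that the wrap-around is well defined in the example.

```c
#include <stdint.h>

/* 32-bit product: wraps modulo 2^32, so it can come out as 0. */
static uint32_t
dirsize_32(uint32_t avgfilesize, uint32_t avgfpdir)
{
	return (uint32_t)(avgfilesize * avgfpdir);
}

/* Widening one operand first forces a 64-bit multiply. */
static uint64_t
dirsize_64(uint32_t avgfilesize, uint32_t avgfpdir)
{
	return (uint64_t)avgfilesize * avgfpdir;
}
```

With avgfilesize = 65536 and avgfpdir = 65536 the 32-bit product is 0, which would later be used as a divisor; the 64-bit product is 4294967296.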
 1.88 02-Nov-2005  yamt branches: 1.88.2;
merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.87 26-Sep-2005  yamt branches: 1.87.2;
always use nanotime rather than time.
it's bad to mix nanotime and time because it sometimes
makes timestamps go backwards.
 1.86 19-Aug-2005  christos 64 bit inode changes.
 1.85 15-Jul-2005  thorpej Use ANSI function decls.
 1.84 06-Jun-2005  dbj branches: 1.84.2;
remove (long) cast on bpref, which is daddr_t
 1.83 29-May-2005  christos - sprinkle const
- avoid shadow variables.
 1.82 22-May-2005  hannken ffs/ffs_alloc.c:
- Add a missing ACTIVECG_CLR().

ffs/ffs_snapshot.c:
- Use async/delayed writes for snapshot creation and sync/uncache these buffers
on end. Reduces the time the file system must be suspended.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
- Byte swap the list of preallocated blocks on read/write instead of access.
- Always keep this list on ip->i_snapblklist so it may be rolled back when the
newest snapshot gets removed. Fixes a rare snapshot corruption when using
more than one snapshot on a file system.

ufs/ufsmount.h:
- Make TAILQ_LAST() possible on member um_snapshots.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
 1.81 26-Feb-2005  perry branches: 1.81.2;
nuke trailing whitespace
 1.80 15-Dec-2004  mycroft branches: 1.80.2; 1.80.4;
Remove some unnecessary (int32_t) casts that would cause us to screw up the
top bit in block addresses.

Also, change some daddr_t->int32_t casts (mostly as arguments to ufs_rw32(),
where they would get promoted anyway) to u_int32_t.
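
One plausible reading of the top-bit problem is sign extension when a 32-bit on-disk address is widened to a 64-bit daddr_t. The sketch below illustrates that reading only; the function names are hypothetical and the commit itself does not spell out the mechanism.

```c
#include <stdint.h>

/*
 * Widening through int32_t sign-extends. For values above INT32_MAX the
 * conversion to int32_t is implementation-defined; on the usual
 * two's-complement targets it yields a negative number, so the top bit
 * is smeared across bits 32..63 of the result.
 */
static int64_t
widen_signed(uint32_t ondisk)
{
	return (int64_t)(int32_t)ondisk;
}

/* Widening the unsigned value keeps bit 31 as an ordinary value bit. */
static int64_t
widen_unsigned(uint32_t ondisk)
{
	return (int64_t)ondisk;
}
```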
 1.79 11-Oct-2004  dbj print absolute inode number in debug output when freeing free inode occurs.
previously, the number was relative to the cylinder group, which was confusing.
prefix debug message with "ifree:" so this can be differentiated in bug reports.
 1.78 29-Aug-2004  hannken While creating a snapshot inodes must be freed from the
snapshot, not from the file system.
ffs_freefile() needs explicit "fs" and "devvp" arguments.
 1.77 26-May-2004  hannken Don't use VTOI(vp)->i_flags to test for snapshot devices. Will not work
for non-UFS file systems. Test for VBLK vnode instead.
 1.76 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.75 18-Apr-2004  dbj when enabling ffs compatibility in ffs_reload, use
sblockloc that superblock was read from
also note XXX that ffs_reload doesn't handle superblock moving
 1.74 13-Jan-2004  soren branches: 1.74.2;
With large average filesizes, it was possible to overflow dirsize to zero,
causing a division by zero in ffs_dirpref().

From Barry Bouwsma of Tiengen.
 1.73 09-Jan-2004  dbj never upgrade the superblock or set FS_FLAGS_UPDATED in fs_old_flags
add compatibility for filesystems created before FFSv2 integration
these patches are from pr port-macppc/23926 and should also fix
problems discussed in pr kern/21404 and pr kern/21283
 1.72 30-Dec-2003  pk Replace the traditional buffer memory management -- based on fixed per buffer
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.

This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.

On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
 1.71 27-Nov-2003  mycroft Remove part of previous -- there is NO reason for directory allocation to use
arc4random().
 1.70 05-Sep-2003  itojun use arc4random instead of random (mask with INT32_MAX to avoid getting
negative numbers unexpectedly).
 1.69 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.68 29-Jun-2003  fvdl branches: 1.68.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.67 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.66 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.65 15-May-2003  kristerw The C language does not permit statements of the form
(X ? Y : Z) = 0;
even though gcc handles this by a stupid extension.

Transform these to correct C.

Approved by fvdl.
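
Two standard ways to write such an assignment in conforming C are sketched here. The function and parameter names are hypothetical; the commit does not say which form was used.

```c
/*
 * Not valid C, because a conditional expression is not an lvalue:
 *	(use_a ? *a : *b) = 0;
 */

/* Form 1: spell out the branches. */
static void
clear_selected(int use_a, int *a, int *b)
{
	if (use_a)
		*a = 0;
	else
		*b = 0;
}

/* Form 2: select a pointer, then dereference the result. */
static void
clear_selected_ptr(int use_a, int *a, int *b)
{
	*(use_a ? a : b) = 0;
}
```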
 1.64 04-May-2003  gmcgarry Print pid on error. From Greg A. Woods in PR#17393.
 1.63 17-Apr-2003  fvdl contigdirs was changed to an array of u_int8_t, so don't compare values
to 65535.
 1.62 12-Apr-2003  fvdl Use variables for some cg accesses; makes things more readable and more
similar to FreeBSD. No functional change.
 1.61 10-Apr-2003  fvdl Initialize the 'mirror' i_flags field in struct inode to 0.
 1.60 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.59 26-Jan-2003  tsutsui More printf format cleanup to reduce casts.
 1.58 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.57 27-Dec-2002  hannken Clear IN_SPACECOUNTED on (re-)used inodes.
This cures the "unmount pending error:" on softdep umounts.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.56 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.55 14-May-2002  matt branches: 1.55.4;
Comment out code that's no longer used.
 1.54 10-Apr-2002  mycroft Use blkstofrags() and fragstoblks(). Use &(NBBY-1) rather than %NBBY.
Switch off of fs_fragshift rather than fs_frag (generates better jump tables).
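
The mask form relies on NBBY being a power of two. A small sketch with hypothetical helper names follows; NBBY is defined locally here only so the example stands alone.

```c
#ifndef NBBY
#define NBBY 8	/* bits per byte; normally provided by system headers */
#endif

/* Bit position within a byte, using the remainder operator. */
static unsigned
bit_in_byte_mod(unsigned bno)
{
	return bno % NBBY;
}

/* Same result using a mask; valid because NBBY is a power of two. */
static unsigned
bit_in_byte_mask(unsigned bno)
{
	return bno & (NBBY - 1);
}
```

For unsigned operands the two forms always agree. For signed operands they can differ on negative values, which is one reason a compiler may not make the substitution on its own.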
 1.53 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.52 19-Sep-2001  lukem branches: 1.52.2;
- ffs_blkpref() changes:
- don't bother updating fs->fs_cgrotor, since it's actually not used in
the kernel. from Manuel Bouyer in [kern/3389]
- when examining cylinder groups from startcg to startcg-1 (wrapping
at fs->fs_ncg), there's no need to check startcg at the end as well
as the start...
- highlight in the struct fs declaration that fs_cgrotor is UNUSED
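
The wrap-around scan can be sketched as two loops that together visit every cylinder group exactly once. The function name scan_cgs(), the nbfree array and the selection test are assumptions of this sketch, not the kernel's data structures.

```c
/*
 * Visit each of ncg cylinder groups exactly once, starting at startcg
 * and wrapping at ncg. Returns the first group whose free-block count
 * is at least avgbfree, or -1 if none qualifies.
 */
static int
scan_cgs(const int *nbfree, int ncg, int startcg, int avgbfree)
{
	int cg;

	for (cg = startcg; cg < ncg; cg++)
		if (nbfree[cg] >= avgbfree)
			return cg;
	/* Stop before startcg: it was already examined above. */
	for (cg = 0; cg < startcg; cg++)
		if (nbfree[cg] >= avgbfree)
			return cg;
	return -1;
}
```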
 1.51 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.50 06-Sep-2001  lukem branches: 1.50.2;
Incorporate the enhanced ffs_dirpref() by Grigoriy Orlov, as found in
FreeBSD (three commits; the initial work, man page updates, and a fix
to ffs_reload()), with the following differences:
- Be consistent between newfs(8) and tunefs(8) as to the options which
set and control the tuning parameters for this work (avgfilesize & avgfpdir)
- Use u_int16_t instead of u_int8_t to keep track of the number of
contiguous directories (suggested by Chuck Silvers)
- Work within our FFS_EI framework
- Ensure that fs->fs_maxclusters and fs->fs_contigdirs don't point to
the same area of memory

The new algorithm has a marked performance increase, especially when
performing tasks such as untarring pkgsrc.tar.gz, etc.

The original FreeBSD commit messages are attached:

=====
mckusick 2001/04/10 01:39:00 PDT
Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>.
His description of the problem and solution follow. My own tests show
speedups on typical filesystem intensive workloads of 5% to 12% which
is very impressive considering the small amount of code change involved.

------

One day I noticed that some file operations run much faster on
small file systems than on big ones. I've looked at the ffs
algorithms, thought about them, and redesigned the dirpref algorithm.

First I want to describe the results of my tests. These results are old
and I have improved the algorithm after these tests were done. Nevertheless
they show how big the performance speedup may be. I have done two file/directory
intensive tests on two OpenBSD systems with the old and new dirpref algorithms.
The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports".
The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release.
It contains 6596 directories and 13868 files. The test systems are:

1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for
test is at wd1. Size of test file system is 8 Gb, number of cg=991,
size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current
from Dec 2000 with BUFCACHEPERCENT=35

2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system
at wd0, file system for test is at wd1. Size of test file system is 40 Gb,
number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k
OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50

You can get more info about the test systems and methods at:
http://www.ptci.ru/gluk/dirpref/old/dirpref.html

Test Results

         tar -xzf ports.tar.gz              rm -rf ports
mode     old dirpref  new dirpref  speedup  old dirpref  new dirpref  speedup
First system
normal   667          472          1.41     477          331          1.44
async    285          144          1.98     130          14           9.29
sync     768          616          1.25     477          334          1.43
softdep  413          252          1.64     241          38           6.34
Second system
normal   329          81           4.06     263.5        93.5         2.81
async    302          25.7         11.75    112          2.26         49.56
sync     281          57.0         4.93     263          90.5         2.9
softdep  341          40.6         8.4      284          4.76         59.66

The "old dirpref" and "new dirpref" columns give the test time in seconds.
"speedup" is the ratio of the two times, i.e. old dirpref / new dirpref.

------

Algorithm description

The old dirpref algorithm is described in comments:

/*
* Find a cylinder to place a directory.
*
* The policy implemented by this algorithm is to select from
* among those cylinder groups with above the average number of
* free inodes, the one with the smallest number of directories.
*/

A new directory is allocated in a different cylinder group than its
parent directory, resulting in a directory tree that is spread across
all the cylinder groups. This spreading out results in non-optimal
access to the directories and files. On a small filesystem this
is not a problem, but on a big filesystem the performance
degradation becomes very apparent.
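
As a rough illustration of the old policy quoted above, the selection can be
sketched as follows. This is not the kernel code: struct cg_summary and
old_dirpref() are hypothetical names, and the sketch assumes "above the
average" means "at least the average".

```c
#include <stddef.h>

/* Hypothetical per-cylinder-group summary; field names are illustrative. */
struct cg_summary {
	int free_inodes;	/* free inodes in this cylinder group */
	int ndirs;		/* directories already allocated here */
};

/*
 * Among the groups holding at least the average number of free inodes,
 * return the index of the one with the fewest directories.
 * Returns -1 only when there are no cylinder groups at all.
 */
int
old_dirpref(const struct cg_summary *cgs, size_t ncg)
{
	long total_free = 0, avg_free;
	int best = -1;
	size_t i;

	if (ncg == 0)
		return -1;
	for (i = 0; i < ncg; i++)
		total_free += cgs[i].free_inodes;
	avg_free = total_free / (long)ncg;

	for (i = 0; i < ncg; i++) {
		if (cgs[i].free_inodes < avg_free)
			continue;	/* below average: not a candidate */
		if (best < 0 || cgs[i].ndirs < cgs[best].ndirs)
			best = (int)i;
	}
	return best;
}
```

Nothing in this selection looks at where the parent directory lives, which is
why successive directories end up scattered over the whole disk.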

What do I mean by a big file system?

1. A big filesystem is a filesystem which occupies 20-30 or more percent
of total drive space, i.e. first and last cylinder are physically
located relatively far from each other.
2. It has a relatively large number of cylinder groups, for example
more cylinder groups than 50% of the buffers in the buffer cache.

The first results in long access times, while the second results in
many buffers being used by metadata operations. Such operations use
cylinder group blocks and on-disk inode blocks. The cylinder group
block (fs->fs_cblkno) contains struct cg, inode and block bit maps.
It is 2k in size for the default filesystem parameters. If new and
parent directories are located in different cylinder groups then the
system performs more input/output operations and uses more buffers.
On filesystems with many cylinder groups, lots of cache buffers are
used for metadata operations.

My solution for this problem is very simple. I allocate many directories
in one cylinder group. I also take some steps so that the new allocation
method does not cause excessive fragmentation and so that directory inodes
are not located far from their files' inodes and data.
The algorithm is:
/*
* Find a cylinder group to place a directory.
*
* The policy implemented by this algorithm is to allocate a
* directory inode in the same cylinder group as its parent
* directory, but also to reserve space for its files inodes
* and data. Restrict the number of directories which may be
* allocated one after another in the same cylinder group
* without intervening allocation of files.
*
* If we allocate a first level directory then force allocation
* in another cylinder group.
*/

My early versions of dirpref gave me good results for a wide range of
file operations and different filesystem capacities, except for one case:
applications that create their entire directory structure first
and only later fill this structure with files.

My solution for such and similar cases is to limit the number of
directories which may be created one after another in the same cylinder
group without intervening file creations. For this purpose, I allocate
an array of counters at mount time. This array is linked to the superblock
as fs->fs_contigdirs[cg]. Each time a directory is created the counter
increases, and each time a file is created the counter decreases. A 60 GB
filesystem with 8 MB/cg requires 10 KB of memory for the counters array.
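
The counter mechanism described above can be sketched like this. The names
(struct contig_state and the contig_* functions) are hypothetical, and the
sketch assumes one byte per counter, which is not stated in this message.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory state; one counter per cylinder group. */
struct contig_state {
	uint8_t *contigdirs;	/* plays the role of fs->fs_contigdirs */
	uint32_t ncg;		/* number of cylinder groups */
	uint8_t maxcontigdirs;	/* limit on back-to-back directories */
};

/* Bytes needed for the counter array (assumes 1-byte counters). */
uint64_t
contig_array_bytes(uint64_t fs_bytes, uint64_t cg_bytes)
{
	return fs_bytes / cg_bytes;
}

/* Allocate zeroed counters, as would happen at mount time. */
int
contig_init(struct contig_state *st, uint32_t ncg, uint8_t maxcontigdirs)
{
	st->contigdirs = calloc(ncg, sizeof(*st->contigdirs));
	if (st->contigdirs == NULL)
		return -1;
	st->ncg = ncg;
	st->maxcontigdirs = maxcontigdirs;
	return 0;
}

/* A directory was created in group cg: count it (saturating). */
void
contig_dir_created(struct contig_state *st, uint32_t cg)
{
	if (st->contigdirs[cg] < UINT8_MAX)
		st->contigdirs[cg]++;
}

/* A file was created in group cg: give one slot back. */
void
contig_file_created(struct contig_state *st, uint32_t cg)
{
	if (st->contigdirs[cg] > 0)
		st->contigdirs[cg]--;
}

/* May another directory be placed in group cg right now? */
int
contig_dir_allowed(const struct contig_state *st, uint32_t cg)
{
	return st->contigdirs[cg] < st->maxcontigdirs;
}
```

Under the 1-byte assumption, a 60 GB filesystem with 8 MB groups has 7680
groups and so needs 7680 bytes, the same order as the figure quoted above.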

maxcontigdirs is the maximum number of directories which may be created
without an intervening file creation. I found in my tests that the best
performance occurs when I restrict the number of directories in one cylinder
group such that all their files may be located in the same cylinder group.
There may be some deterioration in performance if all the file inodes
are in the same cylinder group as their containing directory, but their
data partially resides in a different cylinder group. The maxcontigdirs
value is calculated to try to prevent this condition. Since there is
no way to know how many files and directories will be allocated later,
I added two optimization parameters in superblock/tunefs. They are:

int32_t fs_avgfilesize; /* expected average file size */
int32_t fs_avgfpdir; /* expected # of files per directory */

These parameters have reasonable defaults but may be tweaked for special
uses of a filesystem. They are only necessary in rare cases, such as
tuning a filesystem used to store a squid cache.
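
One plausible way to turn the two parameters into a maxcontigdirs value is
sketched below. The function name, its arguments and the cap of 255 (which
assumes a 1-byte counter) are assumptions made for illustration; the
calculation actually committed may differ in detail.

```c
#include <stdint.h>

/*
 * Estimate how many directories may be created back to back in one
 * cylinder group so that their expected files still fit in that group.
 *
 * free_bytes_per_cg: free space in a typical cylinder group
 * inodes_per_cg:     inodes available per cylinder group
 * avgfilesize:       expected average file size (fs_avgfilesize)
 * avgfpdir:          expected files per directory (fs_avgfpdir)
 */
uint32_t
estimate_maxcontigdirs(uint64_t free_bytes_per_cg, uint32_t inodes_per_cg,
    uint32_t avgfilesize, uint32_t avgfpdir)
{
	/* Expected space consumed by the files of one directory. */
	uint64_t dirsize = (uint64_t)avgfilesize * avgfpdir;
	uint64_t limit;

	if (dirsize == 0)
		return 1;	/* no usable hint: be conservative */

	/* Data constraint: how many such directories fit in the free space. */
	limit = free_bytes_per_cg / dirsize;
	if (limit > 255)
		limit = 255;	/* assumed 1-byte counter */

	/* Inode constraint: each directory needs avgfpdir file inodes. */
	if (limit > inodes_per_cg / avgfpdir)
		limit = inodes_per_cg / avgfpdir;

	return limit == 0 ? 1 : (uint32_t)limit;
}
```

For example, with 8 MB free per group, 2048 inodes per group, 16 KB average
files and 64 files per directory, each directory is expected to need 1 MB, so
the estimate is 8 directories.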

I have been using this algorithm for about 3 months. I have done
a lot of testing on filesystems with different capacities, average
filesize, average number of files per directory, and so on. I think
this algorithm has no negative impact on filesystem performance. It
works better than the default one in all cases. The new dirpref
will greatly improve untarring/removing/copying of big directories,
decrease load on cvs servers and much more. The new dirpref doesn't
speed up a compilation process, but also doesn't slow it down.

Obtained from: Grigoriy Orlov <gluk@ptci.ru>
=====

=====
iedowse 2001/04/23 17:37:17 PDT
Pre-dirpref versions of fsck may zero out the new superblock fields
fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause
panics if these fields were zeroed while a filesystem was mounted
read-only, and then remounted read-write.

Add code to ffs_reload() which copies the fs_contigdirs pointer
from the previous superblock, and reinitialises fs_avgf* if necessary.
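
A minimal sketch of the fix-up described above, using a cut-down superblock.
struct mini_fs, reload_fixup() and the two default values are assumptions for
illustration and are not taken from this message.

```c
#include <stdint.h>

/* Assumed defaults, for illustration only. */
#define DEFAULT_AVGFILESIZE	16384
#define DEFAULT_AVGFPDIR	64

/* Cut-down superblock holding only the fields discussed here. */
struct mini_fs {
	uint8_t *fs_contigdirs;	/* in-memory counter array */
	int32_t fs_avgfilesize;	/* expected average file size */
	int32_t fs_avgfpdir;	/* expected # of files per directory */
};

/*
 * After re-reading the superblock from disk, keep the in-memory
 * counter array of the old copy and repair zeroed tuning fields.
 */
void
reload_fixup(struct mini_fs *newfs, const struct mini_fs *oldfs)
{
	/* The on-disk value of this pointer is meaningless. */
	newfs->fs_contigdirs = oldfs->fs_contigdirs;

	/* An old fsck may have zeroed these; a zero would divide later. */
	if (newfs->fs_avgfilesize <= 0)
		newfs->fs_avgfilesize = DEFAULT_AVGFILESIZE;
	if (newfs->fs_avgfpdir <= 0)
		newfs->fs_avgfpdir = DEFAULT_AVGFPDIR;
}
```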

Reviewed by: mckusick
=====

=====
nik 2001/04/10 03:36:44 PDT
Add information about the new options to newfs and tunefs which set the
expected average file size and number of files per directory. Could do
with some fleshing out.
=====
 1.49 31-Aug-2001  lukem no need to cast arg to lblktosize() any more
 1.48 30-Aug-2001  lukem be consistent when casting arg to lblktosize() in UVM_PAGE_TRKOWN debug code
 1.47 24-Aug-2001  wiz heirarchy -> hierarchy
 1.46 20-Aug-2001  wiz precede, not preceed.
 1.45 09-Aug-2001  lukem correctly cast arguments to scanc()
 1.44 03-Jun-2001  chs branches: 1.44.4;
fix an error case for quotas.
 1.43 30-May-2001  mrg use _KERNEL_OPT
 1.42 13-Mar-2001  sommerfeld Change ffs_dirpref() to pay attention to the amount of available free
space before deciding which cylinder group should contain a new directory
inode.

Fixes kern/11983; works around some, but not all, of the side effects
of kern/11989.

Tested by me for well over a month on my laptop; preliminary versions of
the fix were tested by Frank van der Linden and Herb Peyerl.
 1.41 05-Feb-2001  chs branches: 1.41.2;
add casts to an assertion in ffs_alloc() so it works with offsets past 4GB.
 1.40 18-Jan-2001  jdolecek constify
 1.39 30-Nov-2000  nathanw Don't set the value of doreallocblks here; it's defined over in vfs_cluster.c
In fact, doreallocblks isn't used here at all. Delete the declaration.
 1.38 30-Nov-2000  jdolecek change vfs.ffs.doreallocblks to 1 by default - this does not have
any bad symptoms any more; the fix for the bug causing problems with this
option was in BSD4.4-Lite2 and was pulled in together with the softdep changes
See also Keith Smith & Margo Seltzer's paper on the topic at
http://www.eecs.harvard.edu/~keith/papers/realloc.ps.gz
 1.37 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.36 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.35 19-May-2000  thorpej branches: 1.35.4;
NULL != 0
 1.34 04-Apr-2000  jdolecek Add a new sysctl variable vfs.ffs.log_changeopt - if this is true,
an optimization strategy change is logged into syslog. Default
is 0 (to not log). This replaces the recent not quite "right"
change to only log the change if kernel is compiled with DEBUG.
 1.33 30-Mar-2000  augustss Remove register declarations.
 1.32 29-Mar-2000  jdolecek Log the optimization changes only if DEBUG. Fixes kern/9697
 1.31 14-Feb-2000  fvdl Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
 1.30 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.29 24-Mar-1999  mrg branches: 1.29.4; 1.29.8; 1.29.10; 1.29.14;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.
 1.28 05-Mar-1999  mycroft Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.27 12-Nov-1998  thorpej defopt FFS_EI
 1.26 18-Aug-1998  thorpej branches: 1.26.2;
Back out part of last change (uninitialized work-around).
 1.25 18-Aug-1998  thorpej Add some braces to make egcs happy (ambiguous else warning). Also,
deal with bogus uninitialized warning (__noreturn__ related)
 1.24 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.23 28-Jul-1998  drochner The fragtbl[], inside[] and around[] variables are needed by "fsck",
so we can't put them inside "#ifdef _KERNEL".
Put declarations inside .c files where needed to preserve namespace.
 1.22 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.21 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.20 19-Mar-1998  ross Fix a 64-bit pointer/int warning.
 1.19 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time, other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
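
The idea of swapping a superblock at read time can be sketched with a
two-field stand-in. struct mini_sb and the functions below are hypothetical,
and detecting a foreign byte order by comparing against the swapped magic
number is an assumption of this sketch rather than something stated above.

```c
#include <stdint.h>

#define FS_MAGIC 0x011954u	/* FFS superblock magic number */

/* Reverse the byte order of a 32-bit value. */
uint32_t
swap32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
	    ((x & 0x00ff0000u) >> 8) | ((x & 0xff000000u) >> 24);
}

/* Two-field stand-in for the on-disk superblock. */
struct mini_sb {
	uint32_t fs_magic;
	uint32_t fs_ncg;
};

/* 0: native byte order, 1: opposite byte order, -1: not recognised. */
int
mini_sb_needswap(const struct mini_sb *sb)
{
	if (sb->fs_magic == FS_MAGIC)
		return 0;
	if (sb->fs_magic == swap32(FS_MAGIC))
		return 1;
	return -1;
}

/* Convert every field; the same routine serves reads and writes. */
void
mini_sb_bswap(const struct mini_sb *in, struct mini_sb *out)
{
	out->fs_magic = swap32(in->fs_magic);
	out->fs_ncg = swap32(in->fs_ncg);
}
```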
 1.18 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.17 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.16 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.15 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.14 10-Mar-1997  mycroft Just increment the generation count. Using the time is bogus and defeats
fsirand(8).
 1.13 12-Oct-1996  christos branches: 1.13.6;
revert previous kprintf changes
 1.12 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.11 11-May-1996  mycroft Change VOP_UPDATE() semantics:
* Make 2nd and 3rd args timespecs, not timevals.
* Consistently pass a Boolean as the 4th arg (except in LFS).
Also, fix ffs_update() and lfs_update() to actually change the nsec fields.
 1.10 17-Mar-1996  christos Fix printf format strings
 1.9 09-Feb-1996  christos ffs prototypes
 1.8 19-Jul-1995  cgd don't just throw away updates to the cylinder group bitmaps, actually
write them to disk! From Keith Smith at Harvard, via Kirk McKusick.
fixes the occasional `blkfree: freeing free block' that has been seen
when cluster reallocation code is enabled.
 1.7 24-Mar-1995  cgd explicitly cast &time to (struct timeval *) when passing it to VOP_UPDATE.
new prototypes and picky compilers make a volatile mess.
 1.6 16-Dec-1994  mycroft Ignore rotational optimization if nrpos == 1, as suggested by Stefan Esser.
 1.5 14-Dec-1994  mycroft Sync with CSRG.
 1.4 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.3 04-Jul-1994  mycroft Do the doasyncfree conditionalization better.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.3 01-Mar-1998  fvdl Import some files that were changed after Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.13.6.1 12-Mar-1997  is Merge in changes from Trunk
 1.26.2.4 30-May-1999  chs uvm_vnp_setpageblknos() is gone, and some misc cleanup.
 1.26.2.3 09-Apr-1999  chs in ffs_reallocg(), don't dereference bpp if it's NULL.
 1.26.2.2 25-Feb-1999  chs replace uvm_vnp_relocate() with uvm_vnp_setpageblknos().
 1.26.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.29.14.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.29.14.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.29.10.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.29.8.4 27-Mar-2001  bouyer Sync with HEAD.
 1.29.8.3 11-Feb-2001  bouyer Sync with HEAD.
 1.29.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.29.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.29.4.2 11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.29.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.35.4.4 25-Nov-2001  he Pull up revision 1.52 (requested by lukem):
Mark fs_cgrotor as unused.
 1.35.4.3 25-Nov-2001  he Pull up revision 1.50 (requested by lukem):
Pull in enhanced ffs_dirpref() algorithm, which provides a
substantial performance improvement through better locality
between parent/child directories and their files, and by easing
the pressure on the buffer cache for metadata operations.
 1.35.4.2 25-Nov-2001  he Pull up revision 1.45 (requested by lukem):
Fix scanc() arguments.
 1.35.4.1 25-Nov-2001  he Pull up revision 1.42 (requested by lukem):
Change ffs_dirpref() to be less pathological.
 1.41.2.12 29-Dec-2002  thorpej Sync with HEAD.
 1.41.2.11 18-Oct-2002  nathanw Catch up to -current.
 1.41.2.10 15-Jul-2002  nathanw Revert to curproc.
 1.41.2.9 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
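
The macro described above can be tried out in isolation. The two structs here
are cut-down stand-ins and the plain global curlwp is an assumption of this
sketch; in the kernel curlwp is provided by machine-dependent code.

```c
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures. */
struct proc {
	int p_pid;
};

struct lwp {
	struct proc *l_proc;	/* process this LWP belongs to */
};

/* Stand-in for the current LWP; NULL when there is none. */
static struct lwp *curlwp;

/* Safe to reference even when there is no current LWP. */
#define curproc ((curlwp) ? (curlwp)->l_proc : NULL)
```

Referencing curproc never faults, but callers still have to check the result
for NULL before dereferencing it.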
 1.41.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.41.2.7 17-Apr-2002  nathanw Catch up to -current.
 1.41.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.41.2.5 21-Sep-2001  nathanw Catch up to -current.
 1.41.2.4 24-Aug-2001  nathanw Catch up with -current.
 1.41.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.41.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.41.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.44.4.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.44.4.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.44.4.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.44.4.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.44.4.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.50.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.52.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.55.4.1 05-Jan-2003  jmc Pull up revisions 1.56-1.57 (requested by hannken in ticket #1049)
Clear IN_SPACECOUNTED on (re-)used inodes.
This cures the "unmount pending error:" on softdep umounts.
 1.68.2.11 11-Dec-2005  christos Sync with head.
 1.68.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.68.2.9 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.68.2.8 18-Dec-2004  skrll Sync with HEAD.
 1.68.2.7 19-Oct-2004  skrll Sync with HEAD
 1.68.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.68.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.68.2.4 03-Sep-2004  skrll Sync with HEAD
 1.68.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.68.2.2 03-Aug-2004  skrll Sync with HEAD
 1.68.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.74.2.1 27-Apr-2004  jdc Pull up revision 1.75 (requested by dbj in ticket #185)

Fix problems related to superblock upgrade issues which may be
experienced by -current users from 2003.
 1.80.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.80.2.1 29-Apr-2005  kent sync with -current
 1.81.2.1 28-May-2005  tron Pull up revision 1.82 (requested by hannken in ticket #334):
ffs/ffs_alloc.c:
- Add a missing ACTIVECG_CLR().
ffs/ffs_snapshot.c:
- Use async/delayed writes for snapshot creation and sync/uncache these buffers
on end. Reduces the time the file system must be suspended.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
- Byte swap the list of preallocated blocks on read/write instead of access.
- Always keep this list on ip->i_snapblklist so it may be rolled back when the
newest snapshot gets removed. Fixes a rare snapshot corruption when using
more than one snapshot on a file system.
ufs/ufsmount.h:
- Make TAILQ_LAST() possible on member um_snapshots.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
 1.84.2.8 04-Feb-2008  yamt sync with head.
 1.84.2.7 21-Jan-2008  yamt sync with head
 1.84.2.6 15-Nov-2007  yamt sync with head.
 1.84.2.5 27-Oct-2007  yamt sync with head.
 1.84.2.4 03-Sep-2007  yamt sync with head.
 1.84.2.3 26-Feb-2007  yamt sync with head.
 1.84.2.2 30-Dec-2006  yamt sync with head.
 1.84.2.1 21-Jun-2006  yamt sync with head.
 1.87.2.2 29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.87.2.1 20-Oct-2005  yamt adapt ufs.
 1.88.2.1 29-Nov-2005  yamt sync with head.
 1.90.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.90.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.90.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.90.8.2 26-Jun-2006  yamt sync with head.
 1.90.8.1 24-May-2006  yamt sync with head.
 1.90.6.2 01-Jun-2006  kardel Sync with head.
 1.90.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.90.4.1 09-Sep-2006  rpaulo sync with head
 1.91.2.1 19-Jun-2006  chap Sync with head.
 1.92.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.93.6.2 10-Dec-2006  yamt sync with head.
 1.93.6.1 22-Oct-2006  yamt sync with head
 1.93.4.2 12-Jan-2007  ad Sync with head.
 1.93.4.1 18-Nov-2006  ad Sync with head.
 1.97.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.98.6.1 19-Mar-2007  reinoud Move the structure `cluster_save' to the dead ufs/ffs code that was using
it solely.

Preserved just in case the code is resurrected one day.
 1.98.2.8 23-Oct-2007  ad Sync with head.
 1.98.2.7 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.98.2.6 20-Aug-2007  ad Sync with HEAD.
 1.98.2.5 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.98.2.4 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.98.2.3 06-May-2007  ad ffs_blkfree: don't leak ump->um_lock.
 1.98.2.2 13-Apr-2007  ad Put a per-mount lock around ffs shared data structures, excluding softdep
and quotas. Strategy lifted from FreeBSD.
 1.98.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.99.6.3 04-Nov-2007  jmcneill Sync with HEAD.
 1.99.6.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.99.6.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.99.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.100.4.1 14-Oct-2007  yamt sync with head.
 1.100.2.3 23-Mar-2008  matt sync with HEAD
 1.100.2.2 09-Jan-2008  matt sync with HEAD
 1.100.2.1 06-Nov-2007  matt sync with HEAD
 1.102.2.2 13-Nov-2007  bouyer Sync with HEAD
 1.102.2.1 25-Oct-2007  bouyer Sync with HEAD.
 1.104.8.2 23-Jan-2008  bouyer Sync with HEAD.
 1.104.8.1 02-Jan-2008  bouyer Sync with HEAD
 1.104.4.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.104.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.106.14.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.106.14.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.106.12.3 11-Mar-2010  yamt sync with head
 1.106.12.2 16-May-2009  yamt sync with head
 1.106.12.1 04-May-2009  yamt sync with head.
 1.106.10.3 17-Jun-2008  yamt sync with head.
 1.106.10.2 04-Jun-2008  yamt sync with head
 1.106.10.1 18-May-2008  yamt sync with head.
 1.106.8.8 04-Jan-2009  christos fix diagnostic printfs.
 1.106.8.7 30-Dec-2008  christos fix dev_t printfs
 1.106.8.6 28-Dec-2008  christos deal with new printfs format inconsistencies.
 1.106.8.5 27-Dec-2008  christos merge with head.
 1.106.8.4 09-Nov-2008  christos merge with head.
 1.106.8.3 01-Nov-2008  christos catch up with changes in head.
 1.106.8.2 01-Nov-2008  christos Sync with head.
 1.106.8.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.106.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.106.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.106.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.106.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.109.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.109.4.1 19-Oct-2008  haad Sync with HEAD.
 1.109.2.4 28-Jul-2008  simonb Add support for creating a WAPBL log in the filesystem. Will
create an in-filesystem log on first "mount -o log" if one doesn't
exist, and will then continue to use same log in the future. See
(soon to be added) wapbl(4) for more info.

Adds a new B_CONTIG low-level allocation flag that uses hints in
"struct ffs_inode_ext" to lay out an ffs file's data contiguously.

Thanks to Greg Oster for helping with the design of this and to
Antti Kantee for code review and suggestions.
 1.109.2.3 18-Jul-2008  simonb Sync with head.
 1.109.2.2 12-Jun-2008  martin License police
 1.109.2.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.113.4.3 29-Oct-2013  sborrill Pull up the following revision(s) (requested by bad in ticket #1888):
sys/ufs/ffs/ffs_alloc.c: revision 1.144 via patch

Pull in fix from FreeBSD ffs_alloc.c r121785:
Consider only cylinder groups with at least 75% of the average free space
per cylinder group and 75% of the average free inodes per cylinder group
as candidates for the creation of a new directory. Avoids excessive I/O
scanning for a suitable cylinder group on relatively full file systems.
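
The 75% rule described above amounts to a simple candidate test. The struct
and function names are hypothetical, and the floor of 1 on each threshold is
an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical free-space summary for one cylinder group. */
struct cg_free {
	uint32_t nbfree;	/* free blocks */
	uint32_t nifree;	/* free inodes */
};

/*
 * A group is a candidate for a new directory only if it has at least
 * 75% of the per-group average of both free blocks and free inodes.
 */
int
cg_is_dir_candidate(const struct cg_free *cg, uint32_t avgbfree,
    uint32_t avgifree)
{
	uint32_t minbfree = avgbfree - avgbfree / 4;
	uint32_t minifree = avgifree - avgifree / 4;

	/* Never accept a completely full group (assumed floor). */
	if (minbfree < 1)
		minbfree = 1;
	if (minifree < 1)
		minifree = 1;

	return cg->nbfree >= minbfree && cg->nifree >= minifree;
}
```

Because the thresholds sit below the average, at least one group passes the
test on a nearly full file system, so the scan does not have to fall through
every group before giving up.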
 1.113.4.2 07-May-2009  snj Pull up following revision(s) (requested by sborrill in ticket #726):
sys/ufs/ffs/ffs_alloc.c: revision 1.123 via patch
Fix random 'filesystem full' messages by trapping a couple of 32-bit
overflow areas missed in rev 1.110 and switching cgbase().
Kudos to rump_ffs!
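
The class of bug mentioned above can be illustrated with the product of
fragments per group and a group number. Both functions are hypothetical and
do not reproduce the real cgbase() macro; the truncating version uses
unsigned arithmetic so that the wrap-around is well defined.

```c
#include <stdint.h>

/* Product computed in 32 bits: silently wraps on large file systems. */
uint32_t
cgbase_trunc32(uint32_t frags_per_group, uint32_t cg)
{
	return frags_per_group * cg;
}

/* Widen one operand first so the multiplication is done in 64 bits. */
int64_t
cgbase64(int32_t frags_per_group, int32_t cg)
{
	return (int64_t)frags_per_group * cg;
}
```

With 100000 fragments per group, group 50000 starts at fragment 5000000000,
which does not fit in 32 bits; the truncated result points at a much lower
address and makes the allocator look at the wrong part of the disk.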
 1.113.4.1 24-Feb-2009  snj branches: 1.113.4.1.2;
Pull up following revision(s) (requested by ad in ticket #490):
sys/kern/vfs_wapbl.c: revision 1.23
sys/miscfs/syncfs/sync_subr.c: revision 1.36
sys/miscfs/syncfs/sync_vnops.c: revision 1.26
sys/ufs/ffs/ffs_alloc.c: revision 1.121
sys/ufs/ffs/ffs_vfsops.c: revision 1.242
sys/ufs/ffs/ffs_vnops.c: revision 1.110
PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Presently they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
 1.113.4.1.2.1 07-May-2009  snj branches: 1.113.4.1.2.1.2;
Pull up following revision(s) (requested by sborrill in ticket #726):
sys/ufs/ffs/ffs_alloc.c: revision 1.123 via patch
Fix random 'filesystem full' messages by trapping a couple of 32-bit
overflow areas missed in rev 1.110 and switching cgbase().
Kudos to rump_ffs!
 1.113.4.1.2.1.2.1 21-Apr-2010  matt sync to netbsd-5
 1.113.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.113.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.113.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.120.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.124.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.125.6.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list; each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of the
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.
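
The layout described above (a header with a hash table whose buckets chain
entries by offset) can be sketched as follows. All names, the field set and
the hash function are hypothetical, and offsets are simplified to indexes
into an array whose slot 0 stands for the header.

```c
#include <stdint.h>
#include <stddef.h>

#define QHASH_SIZE	8	/* hypothetical; really depends on block size */
#define QOFF_NONE	0	/* slot 0 is the header, so 0 ends a chain */

/* Simplified header: free list head plus hash table of in-use chains. */
struct qheader {
	uint64_t free_head;
	uint64_t hash[QHASH_SIZE];
};

/* Simplified quota entry, linked to the next one by offset. */
struct qentry {
	uint32_t id;		/* uid or gid */
	uint64_t next;		/* offset of next entry in this chain */
	uint64_t blocks_used;
};

/* Walk the hash chain for id; return its entry or NULL if absent. */
const struct qentry *
quota_lookup(const struct qheader *hdr, const struct qentry *entries,
    size_t nentries, uint32_t id)
{
	uint64_t off = hdr->hash[id % QHASH_SIZE];
	size_t steps = 0;

	/* The step limit guards against a corrupted, looping chain. */
	while (off != QOFF_NONE && off < nentries && steps++ < nentries) {
		if (entries[off].id == id)
			return &entries[off];
		off = entries[off].next;
	}
	return NULL;
}
```

A missing entry is not an error in this scheme: per the feature list above,
such a user or group falls back to the default limits kept in the header.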

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because the superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.125.4.1 06-Jun-2011  jruoho Sync with HEAD.
 1.125.2.2 21-Apr-2011  rmind sync with head
 1.125.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.127.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.129.2.4 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.129.2.3 23-Jan-2013  yamt sync with head
 1.129.2.2 30-Oct-2012  yamt sync with head
 1.129.2.1 17-Apr-2012  yamt sync with head
 1.130.8.5 03-Dec-2017  jdolecek update from HEAD
 1.130.8.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.130.8.3 23-Jun-2013  tls resync from head
 1.130.8.2 25-Feb-2013  tls resync with head
 1.130.8.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.130.4.1 29-Oct-2013  sborrill Pull up the following revisions(s) (requested by bad in ticket #978):
sys/ufs/ffs/ffs_alloc.c: revision 1.144

Pull in fix from FreeBSD ffs_alloc.c r121785:
Consider only cylinder groups with at least 75% of the average free space
per cylinder group and 75% of the average free inodes per cylinder group
as candidates for the creation of a new directory. Avoids excessive I/O
scanning for a suitable cylinder group on relatively full file systems.
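
A minimal sketch of the candidate filter described above, assuming each cylinder group is summarised as a (free_blocks, free_inodes) tuple. The function name and the data layout are hypothetical; this is not the code in ffs_alloc.c.

```python
def dir_candidates(cgs):
    """Return indexes of cylinder groups eligible for a new directory.

    cgs is a list of (free_blocks, free_inodes) tuples, one per cylinder
    group. This tuple layout is an assumption for illustration.
    """
    if not cgs:
        return []
    ncg = len(cgs)
    avg_bfree = sum(b for b, _ in cgs) / ncg
    avg_ifree = sum(i for _, i in cgs) / ncg
    # Keep only groups with at least 75% of both averages.
    min_bfree = 0.75 * avg_bfree
    min_ifree = 0.75 * avg_ifree
    return [n for n, (b, i) in enumerate(cgs)
            if b >= min_bfree and i >= min_ifree]
```

On a nearly full file system most groups fall below one of the two thresholds, so the set of groups worth scanning stays small.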
 1.138.2.1 18-May-2014  rmind sync with head
 1.145.2.1 10-Aug-2014  tls Rebase.
 1.146.2.2 29-May-2019  martin Pull up following revision(s) (requested by kardel in ticket #1697):

sys/ufs/ffs/ffs_alloc.c: revision 1.164

PR/53990, PR/52380, PR/52102: UFS2 cylinder group inode allocation botch

Fix rare allocation botch in ffs_nodealloccg().

Conditions:
a) less than
#_of_initialized_inodes(cg->cg_initediblk)
- inodes_per_filesystem_block
are allocated in the cylinder group
b) cg->cg_irotor points to an uninterrupted run of
allocated inodes in the inode bitmap up to the
end of dynamically initialized inodes
(cg->cg_initediblk)

In this case the next inode after this run was returned
without initializing the respective inode block. As the
block is not initialized these inodes could trigger panics
on inode consistency due to old (uninitialized) disk data.

In very rare cases data loss could occur when
the uninitialized inode block is initialized via the
normal mechanism.

Further conditions to occur after the above:
c) no panic
d) no (forced) fsck
e) and more than cg->cg_initediblk - inodes_per_filesystem_block
allocated inodes.

Fix:

Always ensure allocation happens in the initialized inode range,
extending the initialized inode range as needed.

Add KASSERTMSG() safeguards.

ok hannken@
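
The fix can be illustrated with a toy model of one cylinder group. INOPB, the class and its fields are hypothetical stand-ins loosely named after the kernel fields mentioned above; this is a sketch of the idea, not ffs_nodealloccg().

```python
INOPB = 4  # hypothetical inodes per filesystem block


class CylinderGroup:
    def __init__(self, ninodes, initediblk):
        self.inosused = [False] * ninodes   # inode bitmap
        self.initediblk = initediblk        # inodes initialized on disk so far
        self.irotor = 0                     # where the next search starts
        self.initialized_blocks = []        # log of blocks we initialized

    def _init_inode_block(self):
        # Stand-in for zeroing one block of on-disk inodes.
        self.initialized_blocks.append(self.initediblk // INOPB)
        self.initediblk += INOPB

    def alloc_inode(self):
        n = len(self.inosused)
        for step in range(n):
            ino = (self.irotor + step) % n
            if not self.inosused[ino]:
                break
        else:
            raise RuntimeError("no free inodes in this cylinder group")
        # The safeguard: never hand out an inode outside the initialized
        # range; extend the range block by block until it covers `ino`.
        while ino >= self.initediblk:
            self._init_inode_block()
        assert ino < self.initediblk
        self.inosused[ino] = True
        self.irotor = ino
        return ino
```

With 8 initialized inodes, inodes 5 to 7 in use and the rotor at 5, the search lands on inode 8, which is exactly the case from condition b); the loop initializes its block before the inode is returned.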
 1.146.2.1 14-Aug-2015  msaitoh branches: 1.146.2.1.2; 1.146.2.1.6;
Pull up following revision(s) (requested by riastradh in ticket #949):
sys/ufs/ffs/ffs_alloc.c: revision 1.151
Need wapbl transaction around ffs_blkfree_cg. Fixes wapbl+discard.
 1.146.2.1.6.1 29-May-2019  martin Pull up following revision(s) (requested by kardel in ticket #1697):

sys/ufs/ffs/ffs_alloc.c: revision 1.164

PR/53990, PR/52380, PR/52102: UFS2 cylinder group inode allocation botch

Fix rare allocation botch in ffs_nodealloccg().

Conditions:
a) less than
#_of_initialized_inodes(cg->cg_initediblk)
- inodes_per_filesystem_block
are allocated in the cylinder group
b) cg->cg_irotor points to an uninterrupted run of
allocated inodes in the inode bitmap up to the
end of dynamically initialized inodes
(cg->cg_initediblk)

In this case the next inode after this run was returned
without initializing the respective inode block. As the
block is not initialized these inodes could trigger panics
on inode consistency due to old (uninitialized) disk data.

In very rare cases data loss could occur when
the uninitialized inode block is initialized via the
normal mechanism.

Further conditions to occur after the above:
c) no panic
d) no (forced) fsck
e) and more than cg->cg_initediblk - inodes_per_filesystem_block
allocated inodes.

Fix:

Always ensure allocation happens in the initialized inode range,
extending the initialized inode range as needed.

Add KASSERTMSG() safeguards.

ok hannken@
 1.146.2.1.2.1 29-May-2019  martin Pull up following revision(s) (requested by kardel in ticket #1697):

sys/ufs/ffs/ffs_alloc.c: revision 1.164

PR/53990, PR/52380, PR/52102: UFS2 cylinder group inode allocation botch

Fix rare allocation botch in ffs_nodealloccg().

Conditions:
a) less than
#_of_initialized_inodes(cg->cg_initediblk)
- inodes_per_filesystem_block
are allocated in the cylinder group
b) cg->cg_irotor points to an uninterrupted run of
allocated inodes in the inode bitmap up to the
end of dynamically initialized inodes
(cg->cg_initediblk)

In this case the next inode after this run was returned
without initializing the respective inode block. As the
block is not initialized these inodes could trigger panics
on inode consistency due to old (uninitialized) disk data.

In very rare cases data loss could occur when
the uninitialized inode block is initialized via the
normal mechanism.

Further conditions to occur after the above:
c) no panic
d) no (forced) fsck
e) and more than cg->cg_initediblk - inodes_per_filesystem_block
allocated inodes.

Fix:

Always ensure allocation happens in the initialized inode range,
extending the initialized inode range as needed.

Add KASSERTMSG() safeguards.

ok hannken@
 1.147.2.5 28-Aug-2017  skrll Sync with HEAD
 1.147.2.4 05-Dec-2016  skrll Sync with HEAD
 1.147.2.3 05-Oct-2016  skrll Sync with HEAD
 1.147.2.2 22-Sep-2015  skrll Sync with HEAD
 1.147.2.1 06-Apr-2015  skrll Sync with HEAD
 1.151.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.151.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.154.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.156.6.2 29-May-2019  martin Pull up following revision(s) (requested by kardel in ticket #1272):

sys/ufs/ffs/ffs_alloc.c: revision 1.164

PR/53990, PR/52380, PR/52102: UFS2 cylinder group inode allocation botch

Fix rare allocation botch in ffs_nodealloccg().

Conditions:
a) less than
#_of_initialized_inodes(cg->cg_initediblk)
- inodes_per_filesystem_block
are allocated in the cylinder group
b) cg->cg_irotor points to an uninterrupted run of
allocated inodes in the inode bitmap up to the
end of dynamically initialized inodes
(cg->cg_initediblk)

In this case the next inode after this run was returned
without initializing the respective inode block. As the
block is not initialized these inodes could trigger panics
on inode consistency due to old (uninitialized) disk data.

In very rare cases data loss could occur when
the uninitialized inode block is initialized via the
normal mechanism.

Further conditions to occur after the above:
c) no panic
d) no (forced) fsck
e) and more than cg->cg_initediblk - inodes_per_filesystem_block
allocated inodes.

Fix:

Always ensure allocation happens in the initialized inode range,
extending the initialized inode range as needed.

Add KASSERTMSG() safeguards.

ok hannken@
 1.156.6.1 24-Jul-2017  snj Pull up following revision(s) (requested by hannken in ticket #129):
sys/ufs/ffs/ffs_alloc.c: revision 1.157
When initializing more inodes make sure to write them to disk
before writing the cylinder group with updated cg_initediblk.
 1.159.4.3 21-Apr-2020  martin Sync with HEAD
 1.159.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.159.4.1 10-Jun-2019  christos Sync with HEAD
 1.159.2.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.159.2.1 28-Jul-2018  pgoyette Sync with HEAD
 1.164.6.1 29-Feb-2020  ad Sync with head.
 1.164.4.1 21-Mar-2020  martin Pull up following revision(s) (requested by riastradh in ticket #790):

sys/ufs/ffs/ffs_alloc.c: revision 1.165

Fix non-DIAGNOSTIC build with UVM_PAGE_TRKOWN.
 1.166.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.171.4.1 13-May-2023  martin Pull up following revision(s) (requested by chs in ticket #160):

usr.sbin/makefs/ffs/ffs_alloc.c: revision 1.31
sbin/tunefs/tunefs.c: revision 1.58
sbin/fsck_ffs/setup.c: revision 1.105
sbin/fsck_ffs/pass5.c: revision 1.56
usr.sbin/makefs/ffs.c: revision 1.74
usr.sbin/makefs/ffs/mkfs.c: revision 1.42
usr.sbin/makefs/Makefile: revision 1.40
sys/ufs/ffs/fs.h: revision 1.71
sbin/fsdb/fsdb.c: revision 1.54
sbin/resize_ffs/resize_ffs.c: revision 1.58
sbin/fsck_ffs/pass4.c: revision 1.29
usr.sbin/makefs/ffs/ffs_extern.h: revision 1.9
sbin/newfs/mkfs.c: revision 1.133
sys/ufs/ffs/ffs_alloc.c: revision 1.172
sbin/fsck_ffs/pass1b.c: revision 1.24
usr.sbin/dumpfs/dumpfs.c: revision 1.68
sys/ufs/ffs/ffs_extern.h: revision 1.88
usr.sbin/quotacheck/quotacheck.c: revision 1.51
sys/ufs/ffs/ffs_subr.c: revision 1.54
sbin/fsck_ffs/main.c: revision 1.91
sbin/fsck_ffs/pass1.c: revision 1.63

ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:
commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000
This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.
To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.
Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000
One last pass to get all the unsigned comparisons correct.

In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
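
The class of bug addressed here can be reproduced outside the kernel. The sketch below uses ctypes to mimic 32-bit C integers; the helper names and the range check are hypothetical and only illustrate how a large inode number turns negative under a signed comparison.

```python
import ctypes


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a signed C int."""
    return ctypes.c_int32(value).value


def as_unsigned32(value):
    """Reinterpret the low 32 bits of value as an unsigned C int."""
    return ctypes.c_uint32(value).value


def in_range_signed(ino, ninodes):
    # Buggy style: both operands treated as signed 32-bit values.
    return 0 <= as_signed32(ino) < as_signed32(ninodes)


def in_range_unsigned(ino, ninodes):
    # Fixed style: both operands compared as unsigned 32-bit values.
    return as_unsigned32(ino) < as_unsigned32(ninodes)
```

Any inode number at or above 2^31 fails the signed check even though it is a perfectly valid unsigned value, which is why the problem only appears on very large file systems.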
 1.173.2.1 02-Aug-2025  perseant Sync with HEAD
 1.15 15-Feb-2015  maxv Revert a change in my previous commit that broke the checksum calculation.
Noted by dholland@
 1.14 14-Feb-2015  maxv ffs_appleufs_validate():
- remove superfluous printfs
- ensure ul_namelen!=0, otherwise the kernel accesses ul_name[-1] and
overwrites the previous field in the structure.
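
A sketch of why the zero-length check matters, using a flat byte buffer to stand in for a C structure. The layout (a 2-byte length, a 2-byte flags field, then the name) is invented for illustration and is not the real struct appleufslabel.

```python
import struct

# Hypothetical layout: 2-byte big-endian name length, 2-byte flags field,
# then a fixed-size name field starting at NAME_OFF.
NAME_OFF = 4
NAME_MAX = 16


def terminate_name(label, check_len):
    """NUL-terminate the name in place; return False if the label is bad."""
    (namelen,) = struct.unpack_from(">H", label, 0)
    if namelen > NAME_MAX:
        return False
    if check_len and namelen == 0:
        return False            # the added validation
    # Models C's name[namelen - 1] = '\0' on a flat buffer: with
    # namelen == 0 this lands on the last byte of the preceding field.
    label[NAME_OFF + namelen - 1] = 0
    return True
```

Without the check, a label with a zero name length silently clobbers the low byte of the field in front of the name; with it, the label is rejected and left untouched.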
 1.13 14-Feb-2015  maxv KNF. No functional change.
 1.12 19-Nov-2011  tls branches: 1.12.8; 1.12.26;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.
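
For orientation, here is a sketch of the simplest of the FIPS 140-2 statistical tests, the monobit test. The function name is hypothetical, the acceptance bounds are quoted from memory of the standard and should be treated as an assumption, and this is not the code in libkern/rngtest.c.

```python
def monobit_ok(sample):
    """Monobit check on a 20,000-bit (2,500-byte) sample."""
    if len(sample) != 2500:
        raise ValueError("expected exactly 2500 bytes (20,000 bits)")
    # Count the one bits across the whole sample.
    ones = sum(bin(byte).count("1") for byte in sample)
    # Assumed FIPS 140-2 bounds: the count must lie strictly between these.
    return 9725 < ones < 10275
```

A stuck source (all zeros or all ones) fails immediately. A regular pattern such as repeated 0xAA bytes passes this particular check, which is why the standard pairs it with further tests.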

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- data structures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.11 22-Jun-2011  mrg branches: 1.11.2;
fix an off by one array overflow found by GCC 4.5.3.
 1.10 24-Apr-2010  dbj switch from 4 clause to 2 clause BSD license.
 1.9 11-Jun-2006  kardel branches: 1.9.60; 1.9.82; 1.9.84;
PR 33697: complete timecounter conversion
 1.8 11-Dec-2005  christos branches: 1.8.4; 1.8.8; 1.8.14;
merge ktrace-lwp.
 1.7 15-Jul-2005  thorpej Use ANSI function decls.
 1.6 26-Feb-2005  perry branches: 1.6.4;
nuke trailing whitespace
 1.5 02-Jan-2004  dbj branches: 1.5.8; 1.5.10;
explicitly pad struct appleufslabel and use __attribute__((__packed__))
since apple put the 64 bit uuid field on a 4 byte boundary
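
The effect of packing can be shown with the struct module. The two-field fragment below (a 32-bit field followed by a 64-bit uuid) is a hypothetical stand-in for the relevant part of the label, and big-endian decoding is an assumption based on the be32toh change in revision 1.2.

```python
import struct

# Hypothetical two-field fragment: a 32-bit field followed by a 64-bit uuid.
NATIVE = struct.calcsize("@IQ")   # native alignment: padding may be inserted
PACKED = struct.calcsize("=IQ")   # no alignment padding, like a packed struct


def parse_fragment(raw):
    """Decode the packed big-endian fragment into (field, uuid)."""
    return struct.unpack(">IQ", raw)
```

On platforms that align 64-bit values to 8 bytes, NATIVE is larger than PACKED because of padding after the first field, so an unpacked structure would read the uuid from the wrong offset.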
 1.4 02-Jan-2004  dbj add uuid field to apple ufs volume label
 1.3 13-Oct-2003  thorpej Whitespace nits.
 1.2 02-Nov-2002  dbj branches: 1.2.6;
use be32toh instead of ntohl, etc.
 1.1 28-Sep-2002  dbj branches: 1.1.2; 1.1.4;
Add support for the Apple UFS variation on ffs
This is the bulk of PR #17345

The general approach is to use a run-time determinable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
 1.1.4.3 11-Nov-2002  nathanw Catch up to -current
 1.1.4.2 18-Oct-2002  nathanw Catch up to -current.
 1.1.4.1 28-Sep-2002  nathanw file ffs_appleufs.c was added on branch nathanw_sa on 2002-10-18 02:45:48 +0000
 1.1.2.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.1.2.1 28-Sep-2002  jdolecek file ffs_appleufs.c was added on branch kqueue on 2002-10-10 18:44:52 +0000
 1.2.6.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.2.6.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.2.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.2.6.1 03-Aug-2004  skrll Sync with HEAD
 1.5.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.5.8.1 29-Apr-2005  kent sync with -current
 1.6.4.1 21-Jun-2006  yamt sync with head.
 1.8.14.1 19-Jun-2006  chap Sync with head.
 1.8.8.1 26-Jun-2006  yamt sync with head.
 1.8.4.1 09-Sep-2006  rpaulo sync with head
 1.9.84.1 30-May-2010  rmind sync with head
 1.9.82.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.9.60.1 11-Aug-2010  yamt sync with head.
 1.11.2.1 17-Apr-2012  yamt sync with head
 1.12.26.1 06-Apr-2015  skrll Sync with HEAD
 1.12.8.1 03-Dec-2017  jdolecek update from HEAD
 1.66 17-Nov-2022  chs Restore backward compatibility of UFS2 with previous NetBSD releases by
disabling support in UFS2 for extended attributes (including ACLs).
Add a new variant of UFS2 called "UFS2ea" that does support extended attributes.
Add new fsck_ffs operations "-c ea" and "-c no-ea" to convert file systems
from UFS2 to UFS2ea and vice-versa (both of which delete all existing extended
attributes in the process).
 1.65 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.64 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.63 28-Oct-2017  pgoyette branches: 1.63.4; 1.63.14;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
 1.62 25-Sep-2016  jdolecek branches: 1.62.8;
fix typo in #ifdef notyet part
 1.61 28-Mar-2015  maxv branches: 1.61.2;
Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.60 20-Oct-2013  htodd branches: 1.60.6;
Defining needswap where needed.
 1.59 23-Jun-2013  dholland branches: 1.59.2;
Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.58 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.57 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.56 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.55 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.54 23-Apr-2011  hannken branches: 1.54.4; 1.54.14;
Try to keep snapshot indirect blocks contiguous.

This speeds up snapshot creation by a factor of ~3 and reduces
the file system suspension time by a factor of ~5.
 1.53 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.52 22-Feb-2009  ad branches: 1.52.4; 1.52.6; 1.52.8;
PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.51 31-Jul-2008  simonb branches: 1.51.2; 1.51.4; 1.51.8;
Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.50 03-Jun-2008  hannken branches: 1.50.2; 1.50.4;
ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.49 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.48 02-Jan-2008  ad branches: 1.48.6; 1.48.8; 1.48.10; 1.48.12;
Merge vmlocking2 to head.
 1.47 08-Dec-2007  ad branches: 1.47.4;
Add some comments.
 1.46 08-Oct-2007  ad branches: 1.46.4; 1.46.6;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.45 29-Jun-2007  pooka branches: 1.45.6; 1.45.8; 1.45.10;
remove redundant KASSERTs
 1.44 29-Jan-2007  hubertf branches: 1.44.6; 1.44.8; 1.44.10;
Remove more duplicate headers.
Patch by Slava Semushin <slava.semushin@gmail.com>

Again, this was tested by comparing obj files from a pristine and a patched
source tree against an i386/ALL kernel, and also for src/sbin/fsck_ffs,
src/sbin/fsdb and src/usr.sbin/makefs. Only changes in assert() line numbers
were detected in 'objdump -d' output.
 1.43 14-May-2006  elad branches: 1.43.8;
integrate kauth.
 1.42 15-Apr-2006  christos Coverity CID 2858: Avoid NULL deref.
 1.41 23-Mar-2006  hannken ffs_balloc*(): Add an assertion for "bpp != NULL" if B_METAONLY is set.

From Coverity CIDs 1170..1173
 1.40 11-Dec-2005  christos branches: 1.40.4; 1.40.6; 1.40.8; 1.40.10; 1.40.12;
merge ktrace-lwp.
 1.39 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.38 15-Jul-2005  thorpej branches: 1.38.2;
Use ANSI function decls.
 1.37 15-Dec-2004  mycroft branches: 1.37.10;
Remove some unnecessary (int32_t) casts that would cause us to screw up the
top bit in block addresses.

Also, change some daddr_t->int32_t casts (mostly as arguments to ufs_rw32(),
where they would get promoted anyway) to u_int32_t.
 1.36 14-Aug-2004  mycroft In the indirect block unwind case, we only need to do the synchronous writes
of the inode in the softdep case. XXX This is really a deficiency in softdep.
 1.35 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.34 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.33 02-Apr-2003  fvdl branches: 1.33.2;
Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.32 15-Mar-2003  kristerw ffs_gop_alloc() is not used any more. Remove it.

OK:ed by Konrad Schroder.
 1.31 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.30 05-Jun-2002  chs get the units right when computing a blkno in the ENOSPC path
for allocations involving indirect blocks.
spotted by Trevin Beattie <trevin@xmission.com>.
 1.29 08-Nov-2001  chs branches: 1.29.8; 1.29.10;
the previous fix (in rev. 1.26) for hangs when the filesystem is full
was wrong, so fix it right this time. undo the previous change and
instead, replace the troublesome VOP_FSYNC()s with code that just flushes
the particular indirect blocks that we allocated. this resolves the
softdeps for those blocks. then we can change the pointer for
the first indirect block we allocated to zero, write that, and finally
invalidate all the indirect blocks we've touched. also, wait until
after we finish all this before freeing any blocks we allocated.
fixes PRs 14413 and 14423.
 1.28 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.27 30-Sep-2001  chs branches: 1.27.2;
in ffs_balloc(), clean up page cache state to avoid hangs when we
get ENOSPC. as a result of this, we now skip some of the normal cleanup
in ufs_balloc_range() in the error case.
 1.26 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
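
The genfs_node requirement in the list above rests on the C guarantee that a pointer to a structure can be converted to a pointer to its first member. The following is only a minimal sketch of that layout rule; every toy_* name is hypothetical and the real struct genfs_node and struct vnode look different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real kernel definitions. */
struct toy_genfs_node {
	int g_dirtygen;		/* illustrative field only */
};

struct toy_vnode {
	void *v_data;		/* filesystem-specific per-vnode data */
};

/* Filesystem-specific data: the generic part must come first. */
struct toyfs_node {
	struct toy_genfs_node n_gnode;	/* must be the first member */
	long n_size;
};

/*
 * Generic code can reach its own data without knowing the concrete
 * filesystem type, because the first member shares the address of the
 * enclosing structure.
 */
struct toy_genfs_node *
toy_vtog(struct toy_vnode *vp)
{
	assert(vp != NULL && vp->v_data != NULL);
	return (struct toy_genfs_node *)vp->v_data;
}
```

If n_gnode were moved below n_size, toy_vtog() would silently return a pointer into the wrong field, which is why the ordering is stated as a hard requirement.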
 1.25 08-Aug-2001  lukem branches: 1.25.2;
get argument name correct in comment describing vop_balloc_args
 1.24 30-May-2001  mrg branches: 1.24.4;
use _KERNEL_OPT
 1.23 27-Nov-2000  chs branches: 1.23.2;
Initial integration of the Unified Buffer Cache project.
 1.22 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.

Implement range fsync for FFS. Note: not yet implemented for the
SOFTDEP case.
 1.21 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.20 29-May-2000  mycroft branches: 1.20.2;
MNT_WAIT -> FSYNC_WAIT
 1.19 28-May-2000  mycroft DTRT when unwinding multiple levels.
 1.18 28-May-2000  mycroft When unwinding a failed allocation, make sure to nuke the unwound block from
the vnode's block list. This fixes `itrunc3' panics (at least in some cases;
further testing is needed) and prevents further lossage later on.
 1.17 25-Feb-2000  fvdl branches: 1.17.2;
Fix a bug introduced in Lite2 with block allocation and full disk
conditions. Reported by Ian Dowse <iedowse@maths.tcd.ie>, based
on patch in FreeBSD reviewed by Kirk McKusick.
 1.16 14-Feb-2000  fvdl Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
 1.15 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.14 24-Mar-1999  mrg branches: 1.14.4; 1.14.8; 1.14.10; 1.14.14;
completely remove Mach VM support. all that is left is all the
header files, as UVM still uses (most of) these.
 1.13 27-Oct-1998  mycroft branches: 1.13.2;
Do not corrupt file flags when file system is full!
 1.12 13-Jun-1998  kleink KNF, mostly of FFS_EI changes.
 1.11 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.10 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.9 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time; other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.8 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.7 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.6 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.5 04-Jul-1997  drochner Don't cast 64bit (off_t) file sizes to vm_offset_t (32bit on many
architectures), truncate them intelligently instead.
The truncation is done centralized in vnode_pager.c.
This prevents wrap-around effects when parts of large (>2^32 byte) files
are mmapped.
Don't allow mmap above the numerical range of vm_offset_t.
This is considered a temporary solution until the vm system handles the
object sizes/offsets more cleanly.
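
The message does not say how the sizes are truncated. One plausible reading, sketched here purely as an assumption, is to clamp the 64-bit size to the largest page-aligned value the 32-bit type can hold and to reject mappings that would extend past it. The toy_* names and the 4096-byte page size are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t toy_vm_offset_t;	/* stand-in for a 32-bit vm_offset_t */

#define TOY_PAGE_SIZE		4096U
/* Largest page-aligned value representable in toy_vm_offset_t. */
#define TOY_VM_OFFSET_MAX	(UINT32_MAX & ~(TOY_PAGE_SIZE - 1))

/*
 * Clamp a 64-bit file size instead of casting it. A plain cast of
 * 0x100001000 would wrap to 0x1000 and describe the wrong range.
 */
toy_vm_offset_t
toy_clamp_size(int64_t size)
{
	assert(size >= 0);
	if ((uint64_t)size > TOY_VM_OFFSET_MAX)
		return TOY_VM_OFFSET_MAX;
	return (toy_vm_offset_t)size;
}

/* Refuse mappings whose end would lie beyond the representable range. */
int
toy_mmap_range_ok(int64_t off, uint64_t len)
{
	if (off < 0)
		return 0;
	return (uint64_t)off <= TOY_VM_OFFSET_MAX &&
	    len <= TOY_VM_OFFSET_MAX - (uint64_t)off;
}
```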
 1.4 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
union {
	struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
	struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din;
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.3 09-Feb-1996  christos ffs prototypes
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.13.2.5 30-May-1999  chs in ffs_balloc(), remove the "alloced" flag I added. with the demise
of the vm_page blkno field this is no longer useful.
also be sure to return the blkno in all cases.
in ffs_balloc_range(), uvm_vnp_setpageblknos() is gone.
 1.13.2.4 29-Apr-1999  chs catch another case in ffs_balloc() where we need to set the aux return info.
adjust the file size in ffs_balloc_range() instead of ffs_write(),
the allocator routines need to have current info.
 1.13.2.3 09-Apr-1999  chs undo combining of two cases that were actually different.
 1.13.2.2 25-Feb-1999  chs add some args to ffs_balloc() to allow it to return the
physical blkno of the requested block and whether or not
the block was allocated by the current call.
move ffs_mballoc() here from ufs_readwrite.c and rename it
to ffs_balloc_range().
 1.13.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.14.14.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.14.14.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.14.10.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.14.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.14.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.14.4.6 06-Aug-1999  chs avoid setting u_size lower in ffs_balloc(), otherwise we'll end up
PG_RELEASEing pages we have busy in ufs_balloc_range().
 1.14.4.5 31-Jul-1999  chs adapt to new VOP_BALLOC() interface.
 1.14.4.4 11-Jul-1999  chs no need to call uvm_vnp_zerorange() in ffs_balloc() anymore,
it's handled differently now.
 1.14.4.3 06-Jul-1999  chs avoid creating pages beyond EOF.
 1.14.4.2 04-Jul-1999  chs convert ffs_balloc() to a VOP interface.
rename ffs_balloc_range() to ufs_balloc_range() in ufs_inode.c.
 1.14.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.17.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.20.2.1 14-Dec-2000  he Pull up revision 1.22 (requested by fvdl):
Improve NFS performance, possibly with as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.23.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.23.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.23.2.5 08-Oct-2001  nathanw Catch up to -current.
 1.23.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.23.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.23.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.23.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.24.4.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.24.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.24.4.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.25.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.27.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.29.10.1 05-Jun-2002  lukem Pull up revision 1.30 (requested by chuq in ticket #171):
get the units right when computing a blkno in the ENOSPC path
for allocations involving indirect blocks.
spotted by Trevin Beattie <trevin@xmission.com>.
 1.29.8.1 20-Jun-2002  gehenna catch up with -current.
 1.33.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.33.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.33.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.33.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.33.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.33.2.1 03-Aug-2004  skrll Sync with HEAD
 1.37.10.5 21-Jan-2008  yamt sync with head
 1.37.10.4 27-Oct-2007  yamt sync with head.
 1.37.10.3 03-Sep-2007  yamt sync with head.
 1.37.10.2 26-Feb-2007  yamt sync with head.
 1.37.10.1 21-Jun-2006  yamt sync with head.
 1.38.2.2 29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.38.2.1 20-Oct-2005  yamt adapt ufs.
 1.40.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.40.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.40.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.40.10.2 19-Apr-2006  elad sync with head.
 1.40.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.40.8.2 24-May-2006  yamt sync with head.
 1.40.8.1 01-Apr-2006  yamt sync with head.
 1.40.6.2 01-Jun-2006  kardel Sync with head.
 1.40.6.1 22-Apr-2006  simonb Sync with head.
 1.40.4.1 09-Sep-2006  rpaulo sync with head
 1.43.8.1 01-Feb-2007  ad Sync with head.
 1.44.10.1 09-Dec-2007  reinoud Pullup to HEAD
 1.44.8.1 11-Jul-2007  mjf Sync with head.
 1.44.6.6 24-Oct-2007  ad Comment out 'fix' for allocation failure with softdep. It would hang
because we can try to flush pages that we hold busy. Instead it now
crashes (matching what happens on HEAD).
 1.44.6.5 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress test.
 1.44.6.4 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.44.6.3 15-Jul-2007  ad Sync with head.
 1.44.6.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.44.6.1 13-Apr-2007  ad Put a per-mount lock around ffs shared data structures, excluding softdep
and quotas. Strategy lifted from FreeBSD.
 1.45.10.1 14-Oct-2007  yamt sync with head.
 1.45.8.2 09-Jan-2008  matt sync with HEAD
 1.45.8.1 06-Nov-2007  matt sync with HEAD
 1.45.6.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.45.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.46.6.2 08-Dec-2007  ad Sync with head.
 1.46.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.46.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.47.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.48.12.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.48.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.48.10.1 04-May-2009  yamt sync with head.
 1.48.8.2 04-Jun-2008  yamt sync with head
 1.48.8.1 18-May-2008  yamt sync with head.
 1.48.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.48.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.48.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.50.4.1 19-Oct-2008  haad Sync with HEAD.
 1.50.2.1 28-Jul-2008  simonb Add support for creating a WAPBL log in the filesystem. Will
create an in-filesystem log on first "mount -o log" if one doesn't
exist, and will then continue to use same log in the future. See
(soon to be added) wapbl(4) for more info.

Adds a new B_CONTIG low-level allocation flag that uses hints in
"struct ffs_inode_ext" to lay out an ffs file's data contiguously.

Thanks to Greg Oster for helping with the design of this and to
Antti Kantee for code review and suggestions.
 1.51.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.51.4.1 18-Jun-2011  bouyer Pull up following revision(s) (requested by hannken in ticket #1627):
sys/kern/vfs_wapbl.c: revisions 1.41-1.42
sbin/dump/snapshot.c: revisions 1.6 (patch)
share/man/man4/fss.4: revisions 1.15 (patch)
sys/dev/fss.c: revisions 1.73 (patch)
sys/dev/fssvar.h: revisions 1.25
usr.sbin/fssconfig/fssconfig.c: revisions 1.7
sys/ufs/ffs/ffs_balloc.c: revisions 1.54
sys/ufs/ffs/ffs_snapshot.c: revisions 1.90, 1.98, 1.100-1.101, 1.103-1.110, 1.111, 1.112-1.115 (patch)

- Try to keep snapshot indirect blocks contiguous. This speeds up snapshot
creation by a factor of ~3 and reduces the file system suspension time by
a factor of ~5.

- Refine the scope of WAPBL transactions and the limit for deallocations in
one transaction so we should no longer get a "wapbl_flush: current
transaction too big to flush" panic when creating or removing snapshots
on larger logging disks.

- fss(4): Allow FSSIOCSET to set the initial flags. Add a new flag
"FSS_UNLINK_ON_CREATE" to unlink the backing store before the snapshot
gets created. With this change dump(8) no longer dumps the zero-sized,
but named snapshot it is working on.
 1.51.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.52.8.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list; each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). quota entries do not cross
filesystem block boundaries.
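
To make the file layout above concrete, here is a small sketch of a header with a hash table of chain heads and entries linked by file offsets. All structure names, field names, field widths and the fixed hash size are hypothetical; the actual on-disk format is not shown in this message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_HASH_SIZE	4	/* assumption: really depends on block size */

/* Hypothetical layouts; names and widths are illustrative only. */
struct toy_q2_entry {
	uint64_t q2e_next;	/* file offset of next entry, 0 ends the list */
	uint32_t q2e_id;	/* uid or gid */
	uint64_t q2e_blocks;	/* current block usage */
};

struct toy_q2_header {
	uint64_t q2h_free;			/* head of the free-entry list */
	uint64_t q2h_hash[TOY_HASH_SIZE];	/* heads of the in-use chains */
};

/*
 * Look up an id by walking its hash chain. Entries are copied out with
 * memcpy() so the sketch does not depend on alignment inside the buffer.
 * Returns 1 and fills *out on success, 0 if the id has no entry.
 */
int
toy_q2_lookup(const unsigned char *file, size_t len, uint32_t id,
    struct toy_q2_entry *out)
{
	struct toy_q2_header h;
	uint64_t off;

	assert(file != NULL && out != NULL && len >= sizeof(h));
	memcpy(&h, file, sizeof(h));
	for (off = h.q2h_hash[id % TOY_HASH_SIZE]; off != 0;
	    off = out->q2e_next) {
		if (off >= len || len - off < sizeof(*out))
			return 0;	/* offset points outside the file */
		memcpy(out, file + off, sizeof(*out));
		if (out->q2e_id == id)
			return 1;
	}
	return 0;
}
```

Because links are file offsets rather than memory pointers, the same walk works on a block read from disk or held in the buffer cache.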

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.52.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.52.4.2 31-May-2011  rmind sync with head
 1.52.4.1 21-Apr-2011  rmind sync with head
 1.54.14.4 03-Dec-2017  jdolecek update from HEAD
 1.54.14.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.54.14.2 23-Jun-2013  tls resync from head
 1.54.14.1 25-Feb-2013  tls resync with head
 1.54.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.54.4.1 23-Jan-2013  yamt sync with head
 1.59.2.1 18-May-2014  rmind sync with head
 1.60.6.2 05-Oct-2016  skrll Sync with HEAD
 1.60.6.1 06-Apr-2015  skrll Sync with HEAD
 1.61.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.62.8.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
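
The format-string rules listed above can be illustrated with plain snprintf(3). This is only a sketch of the casting and the 'j' length modifier; the toy_hist_* helpers are hypothetical and are not the kernhist(9) logging macros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format one pointer argument as the updated format strings require:
 * "%#jx" instead of "%p", with the pointer cast to uintptr_t first and
 * then widened to uintmax_t.
 */
int
toy_hist_fmt_ptr(char *buf, size_t len, const void *p)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%#jx", (uintmax_t)(uintptr_t)p);
}

/* Format one numeric argument with the 'j' length modifier. */
int
toy_hist_fmt_num(char *buf, size_t len, unsigned long long v)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%ju", (uintmax_t)v);
}
```

Casting through uintptr_t first is what avoids the "pointer to integer of a different size" warning on platforms where uintmax_t is wider than a pointer.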
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.63.14.1 20-Apr-2020  bouyer Sync with HEAD
 1.63.4.1 21-Apr-2020  martin Sync with HEAD
 1.40 09-Feb-2017  kre Sprinkle in a pinch of const, not too much, just enough
to add a little strength without affecting the overall balance...
 1.39 20-May-2015  riastradh branches: 1.39.2; 1.39.4;
memcpy di_extb/db/ib separately. Noted by Coverity, CID 974636.
 1.38 20-May-2015  riastradh Don't (harmlessly) overrun di_db array; copy di_ib separately.

Noted by Coverity, CID 974635.

While here, simplify size calculation for memcpy.
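
The kind of fix described here can be sketched with a reduced, hypothetical dinode that keeps only the two adjacent block-pointer arrays. The toy_* names are invented for the example, and sizing each copy with sizeof on the array itself is one possible way to simplify the size calculation, not necessarily the one the commit chose.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_NDADDR	12	/* direct block pointers */
#define TOY_NIADDR	3	/* indirect block pointers */

/* Hypothetical reduced dinode; only the two adjacent arrays matter. */
struct toy_dinode {
	int32_t di_db[TOY_NDADDR];
	int32_t di_ib[TOY_NIADDR];
};

/*
 * Copy the block pointers with one memcpy() per array. A single
 * memcpy(n->di_db, o->di_db, (TOY_NDADDR + TOY_NIADDR) * sizeof(int32_t))
 * happens to work while the arrays are adjacent, but it runs past the
 * end of di_db, which static analysers flag.
 */
void
toy_copy_blkptrs(const struct toy_dinode *o, struct toy_dinode *n)
{
	assert(o != NULL && n != NULL);
	memcpy(n->di_db, o->di_db, sizeof(n->di_db));
	memcpy(n->di_ib, o->di_ib, sizeof(n->di_ib));
}
```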
 1.37 09-Jun-2013  dholland branches: 1.37.8; 1.37.10;
Remove lfs-only inumber field (and its supporting union) from struct
ufs1_dinode.
 1.36 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.35 06-Mar-2011  bouyer branches: 1.35.4; 1.35.14;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.34 19-Oct-2009  bouyer branches: 1.34.4; 1.34.6; 1.34.8;
Remove clauses 3 & 4 from my licence. Many thanks to Soren Jacobsen
for the boring work!
 1.33 18-Jan-2009  lukem fix -Wsign-compare issues
 1.32 11-Dec-2005  christos branches: 1.32.74; 1.32.84;
merge ktrace-lwp.
 1.31 03-Jun-2005  dbj the cluster summary must be swapped even for ufs2
 1.30 02-Jun-2005  is fix copy/paste/don'tupdate bug (fix from PR 22232 by Robert Elz).
 1.29 26-Feb-2005  perry branches: 1.29.2;
nuke trailing whitespace
 1.28 25-May-2004  hannken branches: 1.28.4; 1.28.6;
Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.27 31-Dec-2003  dbj branches: 1.27.4;
remove incorrect XXX comments I introduced a couple of days ago
 1.26 31-Dec-2003  dbj remove unused cs_numclusters field from struct csum_total
this avoids a potential future bug if it is ever used.
before this fix, fsck_ffs would check and fix this field to be zero
 1.25 31-Dec-2003  dbj reorder ffs_sb_swap to reflect actual order in superblock
add comments regarding historical field overlap
no functional change
 1.24 31-Dec-2003  dbj add fs_flags to ffs_sb_swap
 1.23 30-Dec-2003  dbj fix bugs in ffs_cg_swap for FS_42POSTBLFMT
 1.22 27-Oct-2003  lukem Overhaul how `build.sh tools' are used:

* Rename "config.h" to "nbtool_config.h" and
HAVE_CONFIG_H to HAVE_NBTOOL_CONFIG_H.
This makes it more obvious in the source when we're using
tools/compat/config.h versus "standard autoconf" config.h

* Consistently move the inclusion of nbtool_config.h to before
<sys/cdefs.h> so that the former can provide __RCSID() (et al),
and there's no need to protect those macros any more.

These changes should make it easier to "tool-ify" a program by adding:
#if HAVE_NBTOOL_CONFIG_H
#include "nbtool_config.h"
#endif
to the top of the source files (for the general case).
 1.21 05-Oct-2003  bouyer Remove references to University of California from my copyright notices.
 1.20 16-Apr-2003  yamt branches: 1.20.2;
use bswap32 and bswap64 correctly.
(fs_pendingblocks and fs_pendinginodes)
 1.19 11-Apr-2003  enami Make ffs_cg_swap() work even if the same chunk is passed as new and old cg.
This is necessary to prevent newfs from dumping core when it is asked to
create a UFS1 file system of non-native endian.
 1.18 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.17 31-Jan-2002  tv These sources are pulled into makefs(8), so we need config.h and protection
for __KERNEL_RCSID().
 1.16 18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.15 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.14 29-Oct-2001  lukem ffs_sb_swap() fixes:
- calculate the offset and length of the postbl before byteswapping.
problem noted by der Mouse.
- use offsetof() to determine # of fields to calculate in initial
loop, rather than hard-coding in `52 fields'
- improve comments.
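
The offsetof() point can be illustrated with a reduced, hypothetical superblock whose leading members are all 32-bit fields. The demo_* names are invented for the example; the real ffs_sb_swap() operates on struct fs and handles many more cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reduced superblock: leading 32-bit fields, then other data. */
struct demo_sb {
	uint32_t sb_a;
	uint32_t sb_b;
	uint32_t sb_c;
	char	 sb_name[8];	/* first member that is not a 32-bit field */
};

/* Portable 32-bit byte reversal. */
uint32_t
demo_swap32(uint32_t x)
{
	return ((x & 0x000000ffU) << 24) | ((x & 0x0000ff00U) << 8) |
	    ((x & 0x00ff0000U) >> 8) | ((x & 0xff000000U) >> 24);
}

/*
 * Swap every 32-bit field that precedes sb_name. The count is derived
 * with offsetof(), so adding a field before sb_name needs no other edit.
 */
void
demo_sb_swap_leading(struct demo_sb *sb)
{
	size_t i, nfields = offsetof(struct demo_sb, sb_name) / sizeof(uint32_t);
	unsigned char *p = (unsigned char *)sb;
	uint32_t v;

	assert(sb != NULL);
	for (i = 0; i < nfields; i++) {
		memcpy(&v, p + i * sizeof(v), sizeof(v));
		v = demo_swap32(v);
		memcpy(p + i * sizeof(v), &v, sizeof(v));
	}
}
```

A hard-coded count such as "52 fields" goes stale as soon as the structure changes, while the offsetof() form follows the declaration.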
 1.13 06-Sep-2001  lukem branches: 1.13.4;
Incorporate the enhanced ffs_dirpref() by Grigoriy Orlov, as found in
FreeBSD (three commits; the initial work, man page updates, and a fix
to ffs_reload()), with the following differences:
- Be consistent between newfs(8) and tunefs(8) as to the options which
set and control the tuning parameters for this work (avgfilesize & avgfpdir)
- Use u_int16_t instead of u_int8_t to keep track of the number of
contiguous directories (suggested by Chuck Silvers)
- Work within our FFS_EI framework
- Ensure that fs->fs_maxclusters and fs->fs_contigdirs don't point to
the same area of memory

The new algorithm has a marked performance increase, especially when
performing tasks such as untarring pkgsrc.tar.gz, etc.

The original FreeBSD commit messages are attached:

=====
mckusick 2001/04/10 01:39:00 PDT
Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>.
His description of the problem and solution follow. My own tests show
speedups on typical filesystem intensive workloads of 5% to 12% which
is very impressive considering the small amount of code change involved.

------

One day I noticed that some file operations run much faster on
small file systems than on big ones. I've looked at the ffs
algorithms, thought about them, and redesigned the dirpref algorithm.

First I want to describe the results of my tests. These results are old
and I have improved the algorithm after these tests were done. Nevertheless
they show how big the performance speedup may be. I have done two file/directory
intensive tests on two OpenBSD systems with the old and new dirpref algorithms.
The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports".
The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release.
It contains 6596 directories and 13868 files. The test systems are:

1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for
test is at wd1. Size of test file system is 8 Gb, number of cg=991,
size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current
from Dec 2000 with BUFCACHEPERCENT=35

2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system
at wd0, file system for test is at wd1. Size of test file system is 40 Gb,
number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k
OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50

You can get more info about the test systems and methods at:
http://www.ptci.ru/gluk/dirpref/old/dirpref.html

Test Results

            tar -xzf ports.tar.gz                 rm -rf ports
mode     old dirpref  new dirpref  speedup   old dirpref  new dirpref  speedup
First system
normal       667         472         1.41       477          331         1.44
async        285         144         1.98       130           14         9.29
sync         768         616         1.25       477          334         1.43
softdep      413         252         1.64       241           38         6.34
Second system
normal       329          81         4.06       263.5         93.5       2.81
async        302          25.7      11.75       112            2.26     49.56
sync         281          57.0       4.93       263           90.5       2.9
softdep      341          40.6       8.4        284            4.76     59.66

"old dirpref" and "new dirpref" columns give a test time in seconds.
The "speedup" columns give the ratio old dirpref / new dirpref.

------

Algorithm description

The old dirpref algorithm is described in comments:

/*
* Find a cylinder to place a directory.
*
* The policy implemented by this algorithm is to select from
* among those cylinder groups with above the average number of
* free inodes, the one with the smallest number of directories.
*/

A new directory is allocated in a different cylinder group than its
parent directory, resulting in a directory tree that is spread across
all the cylinder groups. This spreading out results in non-optimal
access to the directories and files. When we have a small filesystem
it is not a problem, but when the filesystem is big the performance
degradation becomes very apparent.

What do I mean by a big file system?

1. A big filesystem is a filesystem which occupies 20-30 or more percent
of total drive space, i.e. first and last cylinder are physically
located relatively far from each other.
2. It has a relatively large number of cylinder groups, for example
more cylinder groups than 50% of the buffers in the buffer cache.

The first results in long access times, while the second results in
many buffers being used by metadata operations. Such operations use
cylinder group blocks and on-disk inode blocks. The cylinder group
block (fs->fs_cblkno) contains struct cg, inode and block bit maps.
It is 2k in size for the default filesystem parameters. If new and
parent directories are located in different cylinder groups then the
system performs more input/output operations and uses more buffers.
On filesystems with many cylinder groups, lots of cache buffers are
used for metadata operations.

My solution for this problem is very simple. I allocate many directories
in one cylinder group. I also take steps so that the new allocation
method does not cause excessive fragmentation and so that directory
inodes are not located far from their files' inodes and data.
The algorithm is:
/*
* Find a cylinder group to place a directory.
*
* The policy implemented by this algorithm is to allocate a
* directory inode in the same cylinder group as its parent
* directory, but also to reserve space for its files inodes
* and data. Restrict the number of directories which may be
* allocated one after another in the same cylinder group
* without intervening allocation of files.
*
* If we allocate a first level directory then force allocation
* in another cylinder group.
*/

My early versions of dirpref gave good results for a wide range of
file operations and different filesystem capacities except one case:
those applications that create their entire directory structure first
and only later fill this structure with files.

My solution for such and similar cases is to limit the number of
directories which may be created one after another in the same cylinder
group without intervening file creations. For this purpose, I allocate
an array of counters at mount time. This array is linked to the superblock
fs->fs_contigdirs[cg]. Each time a directory is created the counter
increases and each time a file is created the counter decreases. A 60Gb
filesystem with 8mb/cg requires 10kb of memory for the counters array.
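
The counter scheme described above can be sketched roughly as follows.
This is only an illustration: struct dirpref_state and the dirpref_*
helpers are hypothetical names, not the kernel's, and the 16-bit counter
width follows the NetBSD note at the top of this commit message rather
than the original FreeBSD code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-mount state; not the real struct fs. */
struct dirpref_state {
	uint32_t  ncg;           /* number of cylinder groups */
	uint16_t  maxcontigdirs; /* cap on back-to-back directory creations */
	uint16_t *contigdirs;    /* one counter per cylinder group */
};

/* Allocate the zeroed counter array "at mount time"; 0 on success. */
static int
dirpref_init(struct dirpref_state *st, uint32_t ncg, uint16_t maxcontigdirs)
{
	st->contigdirs = calloc(ncg, sizeof(st->contigdirs[0]));
	if (st->contigdirs == NULL)
		return -1;
	st->ncg = ncg;
	st->maxcontigdirs = maxcontigdirs;
	return 0;
}

/* A directory was created in cg: bump the counter, saturating. */
static void
dirpref_note_mkdir(struct dirpref_state *st, uint32_t cg)
{
	assert(cg < st->ncg);
	if (st->contigdirs[cg] < UINT16_MAX)
		st->contigdirs[cg]++;
}

/* A file was created in cg: decrement, never below zero. */
static void
dirpref_note_create(struct dirpref_state *st, uint32_t cg)
{
	assert(cg < st->ncg);
	if (st->contigdirs[cg] > 0)
		st->contigdirs[cg]--;
}

/* May another directory go into cg without an intervening file? */
static int
dirpref_cg_ok(const struct dirpref_state *st, uint32_t cg)
{
	assert(cg < st->ncg);
	return st->contigdirs[cg] < st->maxcontigdirs;
}
```

A caller would consult dirpref_cg_ok() when choosing a cylinder group
for a new directory and fall back to another group when it returns 0.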

The maxcontigdirs is a maximum number of directories which may be created
without an intervening file creation. I found in my tests that the best
performance occurs when I restrict the number of directories in one cylinder
group such that all its files may be located in the same cylinder group.
There may be some deterioration in performance if all the file inodes
are in the same cylinder group as its containing directory, but their
data partially resides in a different cylinder group. The maxcontigdirs
value is calculated to try to prevent this condition. Since there is
no way to know how many files and directories will be allocated later
I added two optimization parameters in superblock/tunefs. They are:

int32_t fs_avgfilesize; /* expected average file size */
int32_t fs_avgfpdir; /* expected # of files per directory */

These parameters have reasonable defaults but may be tweaked for special
uses of a filesystem. They are only necessary in rare cases like better
tuning a filesystem being used to store a squid cache.
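
To make the role of the two parameters concrete, here is a deliberately
simplified estimate of maxcontigdirs. It assumes only that one
directory's worth of data is avgfilesize * avgfpdir bytes and asks how
many such directories fit in a cylinder group. The function name, the
clamp to the range 1..255, and the use of the whole group size are
assumptions made for this sketch; the actual kernel computation is not
shown in this message and may weigh other factors.

```c
#include <stdint.h>

/*
 * Hypothetical estimate: how many directories, each expected to hold
 * avgfpdir files of avgfilesize bytes, fit in one cylinder group.
 */
static uint32_t
estimate_maxcontigdirs(uint64_t cgsize, int32_t avgfilesize, int32_t avgfpdir)
{
	uint64_t dirsize, n;

	/* Zeroed or bogus fields: fall back to the most conservative cap. */
	if (avgfilesize <= 0 || avgfpdir <= 0)
		return 1;

	dirsize = (uint64_t)avgfilesize * (uint64_t)avgfpdir;
	n = cgsize / dirsize;

	/* Clamp to an arbitrary illustrative range. */
	if (n < 1)
		n = 1;
	if (n > 255)
		n = 255;
	return (uint32_t)n;
}
```

With an 8 MB cylinder group, 16384-byte files and 64 files per
directory (illustrative values), the expected directory size is 1 MB,
so the estimate is 8 directories per group.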

I have been using this algorithm for about 3 months. I have done
a lot of testing on filesystems with different capacities, average
filesize, average number of files per directory, and so on. I think
this algorithm has no negative impact on filesystem performance. It
works better than the default one in all cases. The new dirpref
will greatly improve untarring/removing/copying of big directories,
decrease load on cvs servers and much more. The new dirpref doesn't
speed up a compilation process, but also doesn't slow it down.

Obtained from: Grigoriy Orlov <gluk@ptci.ru>
=====

=====
iedowse 2001/04/23 17:37:17 PDT
Pre-dirpref versions of fsck may zero out the new superblock fields
fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause
panics if these fields were zeroed while a filesystem was mounted
read-only, and then remounted read-write.

Add code to ffs_reload() which copies the fs_contigdirs pointer
from the previous superblock, and reinitialises fs_avgf* if necessary.

Reviewed by: mckusick
=====

=====
nik 2001/04/10 03:36:44 PDT
Add information about the new options to newfs and tunefs which set the
expected average file size and number of files per directory. Could do
with some fleshing out.
=====
 1.12 03-Sep-2001  lukem deprecate fs_fscktime; we never used it.

in an effort to maintain compatibility with freebsd/openbsd/whatever,
i'm attempting to get the superblock format in sync, and freebsd uses
the int32_t at this position for `fs_pendinginodes'.

if we ever decide to implement fscktime functionality, we'll:
a) make sure to liaise with the other projects to reserve the same
spare field
b) actually implement the code this time ...

(this is also preparing us for other changes, like the new dirpref code)
 1.11 17-Aug-2001  lukem remove third argument (`int ns') from ffs_sb_swap(), and let ffs_sb_swap()
determine the endianness of the `struct fs *o' superblock from o->fs_magic
and set needswap as necessary, rather than trusting the caller to get
it right. invariably, almost every caller of ffs_sb_swap() was calling it
with ns set to the wrong value for ns anyway!
ansi KNF ffs_bswap.c declarations whilst here.

this fixes all sorts of problems when trying to use other-endian file systems,
notably the kernel trying to access memory *way* off, possibly corrupting or
panicking, and userland programs SEGVing and/or corrupting things (e.g,
"fsck_ffs -B" to swap a file system endianness).

whilst the previous rev of ffs_bswap.c (1.10, 2000/12/23) made this problem
worse, i suspect that the problem was always there and previous versions
just happened not to trash things at the wrong time.

FFS_EI should now be a lot more stable.
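
The idea of deriving the byte order from the magic number, instead of
trusting a caller-supplied flag, can be sketched like this. The helper
names are hypothetical and this is not the code of ffs_sb_swap() itself;
0x011954 is the FFS superblock magic number.

```c
#include <stdint.h>

#define SKETCH_FS_MAGIC 0x011954u	/* FFS superblock magic */

/* Portable 32-bit byte swap. */
static uint32_t
sketch_bswap32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
	    ((x & 0x00ff0000u) >> 8) | ((x & 0xff000000u) >> 24);
}

/*
 * Decide from the magic field, as read in host byte order, whether the
 * superblock needs swapping: 0 = native, 1 = other-endian,
 * -1 = not a recognised superblock.
 */
static int
sketch_needswap(uint32_t magic_as_read)
{
	if (magic_as_read == SKETCH_FS_MAGIC)
		return 0;
	if (magic_as_read == sketch_bswap32(SKETCH_FS_MAGIC))
		return 1;
	return -1;
}
```

Because the decision comes from the data being swapped, a caller cannot
pass a flag that disagrees with the superblock it hands in.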
 1.10 23-Dec-2000  enami branches: 1.10.2; 1.10.6;
- 16 * 8 != 168
- offset should be endian independent.
 1.9 23-Dec-2000  enami Cosmetic changes
 1.8 15-May-2000  bouyer branches: 1.8.4;
Sync copyright notice.
 1.7 18-Jan-2000  bouyer Handle pre-FS_42POSTBLFMT. I now can mount an Ultrix file system on my
sparc without panic.
 1.6 14-Sep-1999  thorpej branches: 1.6.2; 1.6.8;
Need <string.h> for memcpy(3) prototype if building from userland.
 1.5 09-Aug-1998  perry branches: 1.5.6;
bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.4 13-Jun-1998  kleink KNF, mostly of FFS_EI changes.
 1.3 10-Jun-1998  kleink KNF: only include one of <sys/{param,types}.h>, not both.
 1.2 08-Jun-1998  ragge Wrong include file order; caused compile error on vax.
 1.1 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time, other metadata are byteswapped
when used (as they are accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.5.6.1 18-Jan-2000  he Pull up revision 1.7 (requested by bouyer):
Properly handle pre-FS_42POSTBLFMT file systems (e.g. Ultrix) in
the endian-independent file system code.
 1.6.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.6.2.2 05-Jan-2001  bouyer Sync with HEAD
 1.6.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.4.5 25-Nov-2001  he Pull up revision 1.14 (requested by lukem):
A few ffs_sb_swap() fixes.
 1.8.4.4 25-Nov-2001  he Pull up revision 1.13 (requested by lukem):
Pull in enhanced ffs_dirpref() algorithm, which provides a
substantial performance improvement through better locality
between parent/child directories and their files, and by easing
the pressure on the buffer cache for metadata operations.
 1.8.4.3 25-Nov-2001  he Pull up revision 1.12 (requested by lukem):
Deprecate unused fs_fscktime.
 1.8.4.2 25-Nov-2001  he Pull up revision 1.11 (requested by lukem):
Call ffs_sb_swap() with the correct arguments. Fixes problems
with using other-endian file systems.
 1.8.4.1 25-Nov-2001  he Pull up revisions 1.9-1.10 (requested by lukem):
Offset should be endian independent. Some cosmetic changes.
 1.10.6.4 11-Feb-2002  jdolecek Sync w/ -current.
 1.10.6.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.6.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.10.6.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.10.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.10.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.10.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.10.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.10.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.13.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.20.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.20.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.20.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.20.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.20.2.1 03-Aug-2004  skrll Sync with HEAD
 1.27.4.1 02-Jun-2005  riz Pull up revision 1.30 (requested by is in ticket #1973):
fix copy/paste/don'tupdate bug (fix from PR 22232 by Robert Elz).
 1.28.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.28.4.1 29-Apr-2005  kent sync with -current
 1.29.2.1 02-Jun-2005  tron Pull up revision 1.30 (requested by is in ticket #385):
fix copy/paste/don'tupdate bug (fix from PR 22232 by Robert Elz).
 1.32.84.1 19-Jan-2009  skrll Sync with HEAD.
 1.32.74.2 11-Mar-2010  yamt sync with head
 1.32.74.1 04-May-2009  yamt sync with head.
 1.34.8.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota datas, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")
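
The changed meaning of the limit values can be illustrated with a small
check. This is a sketch only: the function name is hypothetical,
UINT64_MAX stands in for UQUAD_MAX, and this branch does not yet
enforce limits in the kernel.

```c
#include <stdint.h>

#define SK_NOLIMIT UINT64_MAX	/* stands in for UQUAD_MAX */

/*
 * Would granting `request` more units on top of `usage` exceed `limit`?
 * A limit of 0 allows nothing; SK_NOLIMIT allows everything.
 */
static int
sk_would_exceed(uint64_t usage, uint64_t request, uint64_t limit)
{
	if (limit == SK_NOLIMIT)
		return 0;
	if (request > limit)
		return 1;
	/* Same as usage + request > limit, written to avoid overflow. */
	return usage > limit - request;
}
```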

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.
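
As a rough picture of this layout, the sketch below models a header
with a free-list pointer and a hash table of in-use chains, with
entries linked by file offsets. All names, field widths and the fixed
hash size are hypothetical; the real on-disk structures are not given
in this message.

```c
#include <stdint.h>
#include <string.h>

#define SK_HASHSIZE 4	/* hypothetical; really depends on block size */

struct sk_qheader {
	uint64_t free_head;		/* offset of first free entry, 0 = none */
	uint64_t hash[SK_HASHSIZE];	/* offset of first in-use entry per bucket */
};

struct sk_qentry {
	uint64_t next;	/* file offset of the next entry, 0 = end of chain */
	uint32_t id;	/* uid or gid */
	uint32_t pad;
	uint64_t limit;
};

/*
 * Walk the hash chain for `id` in an in-memory image of the quota file.
 * Returns the file offset of the matching entry, or 0 if there is none.
 * (No cycle detection: a corrupt chain could loop in this sketch.)
 */
static uint64_t
sk_lookup(const unsigned char *file, size_t filesize, uint32_t id)
{
	struct sk_qheader h;
	struct sk_qentry e;
	uint64_t off;

	if (filesize < sizeof(h))
		return 0;
	memcpy(&h, file, sizeof(h));
	for (off = h.hash[id % SK_HASHSIZE]; off != 0; off = e.next) {
		if (off > filesize - sizeof(e))
			return 0;	/* offset points outside the file */
		memcpy(&e, file + off, sizeof(e));
		if (e.id == id)
			return off;
	}
	return 0;
}
```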

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) supports are completed (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.34.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.34.4.1 21-Apr-2011  rmind sync with head
 1.35.14.3 03-Dec-2017  jdolecek update from HEAD
 1.35.14.2 23-Jun-2013  tls resync from head
 1.35.14.1 25-Feb-2013  tls resync with head
 1.35.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.35.4.1 23-Jan-2013  yamt sync with head
 1.37.10.2 28-Aug-2017  skrll Sync with HEAD
 1.37.10.1 06-Jun-2015  skrll Sync with HEAD
 1.37.8.1 04-Nov-2015  riz Pull up following revision(s) (requested by riastradh in ticket #896):
sys/ufs/ffs/ffs_bswap.c: revision 1.38
sys/ufs/ffs/ffs_bswap.c: revision 1.39
Don't (harmlessly) overrun di_db array; copy di_ib separately.
Noted by Coverity, CID 974635.
While here, simplify size calculation for memcpy.
memcpy di_extb/db/ib separately. Noted by Coverity, CID 974636.
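
The pattern behind these two fixes, copying each array with its own
sizeof instead of one memcpy that runs from one array into the next,
looks roughly like this. struct sk_dinode is a cut-down hypothetical
stand-in for the UFS2 on-disk inode, with illustrative array lengths.

```c
#include <stdint.h>
#include <string.h>

#define SK_NXADDR 2	/* external attribute blocks (illustrative) */
#define SK_NDADDR 12	/* direct blocks (illustrative) */
#define SK_NIADDR 3	/* indirect blocks (illustrative) */

/* Hypothetical, cut-down inode holding only the block pointer arrays. */
struct sk_dinode {
	int64_t di_extb[SK_NXADDR];
	int64_t di_db[SK_NDADDR];
	int64_t di_ib[SK_NIADDR];
};

/*
 * Copy the block pointers array by array. Each memcpy is bounded by
 * the size of its own destination, so no call overruns an array even
 * though the arrays happen to be adjacent in memory.
 */
static void
sk_copy_blkptrs(const struct sk_dinode *o, struct sk_dinode *n)
{
	memcpy(n->di_extb, o->di_extb, sizeof(n->di_extb));
	memcpy(n->di_db, o->di_db, sizeof(n->di_db));
	memcpy(n->di_ib, o->di_ib, sizeof(n->di_ib));
}
```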
 1.39.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.39.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.10 28-Nov-2022  chs the UFS_EXTATTR option was supposed to affect only UFS1 file systems,
but when the UFS2 extattr code was merged, the UFS_EXTATTR option was
mistakenly changed to affect UFS2 file systems as well. this commit
changes UFS_EXTATTR back to affecting only UFS1 file systems as originally
intended. in UFS2 (or rather UFS2ea in NetBSD), extattrs are a
native feature and are always supported.
 1.9 17-Nov-2022  chs Restore backward compatibility of UFS2 with previous NetBSD releases by
disabling support in UFS2 for extended attributes (including ACLs).
Add a new variant of UFS2 called "UFS2ea" that does support extended attributes.
Add new fsck_ffs operations "-c ea" and "-c no-ea" to convert file systems
from UFS2 to UFS2ea and vice-versa (both of which delete all existing extended
attributes in the process).
 1.8 14-Dec-2021  chs ffs: fix the creation of device nodes on file systems with ACLs enabled.
 1.7 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.6 20-May-2020  christos remove accmode_t typedef (not needed, breaks llvm) from maxv@
 1.5 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.4 02-May-2020  christos Remove the unlock/relock hack by using IO_EXT to indicate that we are already
holding the lock.
 1.3 20-Apr-2020  christos branches: 1.3.2;
- Allow root to set system attributes, samba does this
- Fix locking issue, perhaps we should use our own mutex; does not seem worth
it for this simple case.
 1.2 19-Apr-2020  christos branches: 1.2.2;
- add locking
- wrap wapbl around truncating, ffs_extwrite does it on its own.
 1.1 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.2.2.3 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.2.2.2 20-Apr-2020  bouyer Sync with HEAD
 1.2.2.1 19-Apr-2020  bouyer file ffs_extattr.c was added on branch bouyer-xenpvh on 2020-04-20 11:29:14 +0000
 1.3.2.2 21-Apr-2020  martin Sync with HEAD
 1.3.2.1 20-Apr-2020  martin file ffs_extattr.c was added on branch phil-wifi on 2020-04-21 18:42:45 +0000
 1.88 07-Jan-2023  chs ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:

commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000

This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.

To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.

Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000

One last pass to get all the unsigned comparisons correct.


In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
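
The class of bug being fixed can be shown in a few lines: the same
32-bit inode number maps to a negative cylinder group when it is
treated as signed, and to the expected one when treated as unsigned.
The function names and the inodes-per-group value used below are
hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Buggy variant: reinterprets the inode number as a signed 32-bit value. */
static int64_t
cg_of_signed(uint32_t raw_ino, int32_t ipg)
{
	int32_t ino;

	memcpy(&ino, &raw_ino, sizeof(ino));	/* same bits, signed view */
	return ino / ipg;
}

/* Fixed variant: keeps the inode number unsigned throughout. */
static int64_t
cg_of_unsigned(uint32_t raw_ino, uint32_t ipg)
{
	return raw_ino / ipg;
}
```

For inode number 0x80000000 with 4096 inodes per group, the signed
variant yields a negative group number while the unsigned variant
yields 524288.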
 1.87 28-Nov-2022  chs branches: 1.87.2;
the UFS_EXTATTR option was supposed to affect only UFS1 file systems,
but when the UFS2 extattr code was merged, the UFS_EXTATTR option was
mistakenly changed to affect UFS2 file systems as well. this commit
changes UFS_EXTATTR back to affecting only UFS1 file systems as originally
intended. in UFS2 (or rather UFS2ea in NetBSD), extattrs are a
native feature and are always supported.
 1.86 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.85 22-Aug-2018  msaitoh branches: 1.85.10;
- Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.84 09-Feb-2017  kre branches: 1.84.12; 1.84.14;

Sprinkle in a pinch of const, not too much, just enough
to add a little strength without affecting the overall balance...
 1.83 01-Oct-2016  jdolecek branches: 1.83.2;
allocate wapbl dealloc registration structures via pool, so that there is more
flexibility with limit handling
 1.82 27-Mar-2015  riastradh branches: 1.82.2;
Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.81 17-Mar-2015  hannken Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.80 16-Jun-2013  hannken branches: 1.80.10;
Add an UFS_SNAPGONE() ufs op replacing the calls
to ffs_snapgone() in ufs_lookup.c.

Ok: David Holland <dholland@netbsd.org>

Welcome to 6.99.22
 1.79 19-Oct-2012  drochner Implement experimental support to pass notifications that a file
was deleted from the filesystem to the disk driver, commonly
known as "discard" or "trim".
fs/driver support is in ffs and ata wd for now.
This is what was posted here:
http://mail-index.netbsd.org/tech-kern/2012/02/28/msg012813.html
with minor cleanup, and the global switch replaced by a mount option.
 1.78 17-Jun-2011  manu branches: 1.78.2; 1.78.12;
Add mount -o extattr option to enable extended attributes (currently only
for UFS1).
Remove kernel option for EA backing store autocreation and do it by
default. Add a sysctl so that autocreated attribute size can be modified.
 1.77 27-Apr-2011  hannken branches: 1.77.2;
Cleanup ffs fsync and make devices on wapbl enabled file systems work here:

- Replace the ugly sync loop in ffs_full_fsync() and ffs_vfs_fsync() with
vflushbuf(). This loop is a relic of softdeps and not needed anymore.

- Add ffs_spec_fsync() for device nodes on ffs file systems that calls
spec_fsync() like all other file systems do and then updates the ctime.

Discussed on tech-kern.

Should fix PRs:
PR #41192 wapbl diagnostic panic during cgdconfig
PR #41977 kernel diagnostic assertion "rw_lock_held(&wl->wl_rwlock)" failed
PR #42149 wapbl locking panic if watching DVD
PR #42551 Lockdebug assert in wapbl when running zpool
 1.76 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.75 22-Feb-2009  ad branches: 1.75.4; 1.75.6; 1.75.8;
PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.74 06-Dec-2008  joerg branches: 1.74.4;
Split ffs_freefile into a frontend for normal cylinder group and for
snapshot use. Adjust ffs_blkfree_common to get the fs instance passed
in, the original commit didn't account blocks in the snapshots
correctly. Assert that ffs_blkfree is used with the primary fs instance
and that ffs_checkfreefile is only used for snapshots. Move the bdwrite
from ffs_blkfree_common into the caller for symmetry. This creates a
redundant write of unmodified data for ffs_blkfree_snap if a double free
of a block happens.

Reviewed and tested by hannken@.
 1.73 01-Dec-2008  joerg ffs_blkfree is used in two different ways. The normal usage is to free a
block in the cylinder groups of the filesystem. The other user is the
snapshot code, which wants to modify the copied cylinder groups. Use
different frontends to distinguish the cases in preparation for fine
grained locking for cylinder groups.
 1.72 30-Nov-2008  joerg Split ffs_blkalloc into a frontend that does inode based consistency
checks and a backend that just asserts them. Use the backend in
ffs_wapbl_abort_sync_metadata instead of faking an inode.
 1.71 06-Nov-2008  joerg Remove XXXUBC code for ffs_reallocblks, that has been conditionalized in
2002 and #if 0'ed in 2005. It would need a considerable amount of work
to bring back and obscures the more important block allocation.
 1.70 10-Oct-2008  hannken branches: 1.70.2;
Break a deadlock where one thread has a wapbl transaction, calls VOP_GETPAGES
and wants to busy a page while another thread calls VOP_PUTPAGES on the same
vnode, takes pages busy and wants to start a wapbl transaction.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.69 22-Aug-2008  hannken Add snapshot support for logging ffs file systems.

- Add UFS_WAPBL_BEGIN() / UFS_WAPBL_END() where needed.

- Expunge WAPBL log inodes from snapshots.

- Ffs_copyonwrite() and ffs_snapblkfree() must run inside a WAPBL transaction.

- Add ffs_gop_write() as a wrapper around genfs_gop_write() that makes sure
genfs_gop_write() gets always called inside a WAPBL transaction.

- Add VOP_PUTPAGES() flag PGO_JOURNALLOCKED to tag calls to VOP_PUTPAGES()
inside a WAPBL transaction.

Reviewed by: Simon Burge <simonb@netbsd.org>, Greg Oster <oster@netbsd.org>

PGO_JOURNALLOCKED / ffs_gop_write() part presented on tech-kern@.
 1.68 12-Aug-2008  hannken Deny read/write access to snapshot vnodes. We use fss(4) to read from
snapshots. With this policy in place:

- Separate the snapshot vnode lock from the snapshot common lock.
Snapshots no longer need recursive vnode locks.

- Use a mutex (si_snaplock) to serialize creation, deletion, reading and
writing of snapshots.

- Move ffs_read() for snapshots into ffs_snapshot.c.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>

While here change ffs_copyonwrite() to fail requests from pagedaemon that need
to copy-on-write.
 1.67 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.66 28-Jun-2008  rumble branches: 1.66.2;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.65 03-Jun-2008  hannken branches: 1.65.2;
ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.64 17-Apr-2008  hannken branches: 1.64.2; 1.64.4; 1.64.6;
Replace get/setspecific with a void pointer in struct ufsmount. Use explicit
initialization/finalization of snapshot private data on creation/deletion
of struct ufsmount.
Snapshot mounts no longer may fail silently because kmem_alloc() fails.

Welcome to 4.99.60

Ok: Andrew Doran <ad@netbsd.org>
 1.63 03-Jan-2008  ad branches: 1.63.6;
Use pool_cache.
 1.62 02-Jan-2008  ad Merge vmlocking2 to head.
 1.61 08-Dec-2007  pooka branches: 1.61.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.60 09-Aug-2007  hannken branches: 1.60.2; 1.60.8; 1.60.10;
Move the fstrans-aware lock vnops from ufs to ffs. Other ufs file systems
do not need them.

Ride on 4.99.28
 1.59 09-Aug-2007  hannken Move snapshot per-mount data from struct ufsmount to mount specific data.
No functional changes.

Welcome to 4.99.28 (struct ufsmount changed size)
 1.58 31-Jul-2007  pooka branches: 1.58.2; 1.58.4;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.57 12-Jul-2007  dsl branches: 1.57.2;
Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass though the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.56 07-Jun-2007  yamt remove a duplicated definition of FFS_ITIMES.
 1.55 19-Jan-2007  hannken branches: 1.55.6; 1.55.8;
New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.54 13-Jul-2006  martin branches: 1.54.4;
Fix alignment problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.53 14-May-2006  elad branches: 1.53.4;
integrate kauth.
 1.52 23-Apr-2006  yamt remove unused FFS_NAMES and LFS_NAMES.
 1.51 14-Jan-2006  yamt branches: 1.51.2; 1.51.4; 1.51.6; 1.51.8; 1.51.10;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.50 27-Dec-2005  chs branches: 1.50.2;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.
 1.49 11-Dec-2005  christos merge ktrace-lwp.
 1.48 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.47 12-Sep-2005  christos branches: 1.47.2;
- access the ffs and ext2fs itimes functions through a pointer, so that
if the filesystem is not compiled in the kernel still links. Probably
a better solution is to use weak symbols.
- move the filesystem-specific itime macros to the filesystem header files.
 1.46 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.45 09-Sep-2005  yamt revert the code to expand putpage requests to block boundary.
because:
- it was incomplete in some cases.
- it can confuse pagedaemon.
see PR/15364 for details.
 1.44 28-Aug-2005  thorpej Experimental support for extended attributes on UFS1 file systems, using a
backing file per attribute type indexed by inode number to hold the extended
attributes.

This is working pretty well on my test systems, except for the "autostart"
feature. I need someone with a better handle on the VFS locking protocol
to go over that.

This is a work-in-progress. There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.

Adapted from FreeBSD.
 1.43 15-Jul-2005  thorpej Use ANSI function decls.
 1.42 26-Feb-2005  perry branches: 1.42.2; 1.42.4;
nuke trailing whitespace
 1.41 29-Aug-2004  hannken branches: 1.41.4; 1.41.6;
While creating a snapshot inodes must be freed from the
snapshot, not from the file system.
ffs_freefile() needs explicit "fs" and "devvp" arguments.
 1.40 04-Jun-2004  he Need to forward-declare "struct timespec" because the new ffs_snapshot()
function declaration refers to it. Fixes build problem of sbin/badsect
for the vax target, which still uses gcc 2.95.3.
 1.39 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.38 20-May-2004  atatat Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.

This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.

linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.37 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.36 10-Jan-2004  hannken branches: 1.36.2;
Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue.

Cleanup ffs_sync() which did not synchronously wait when MNT_WAIT
was specified. Clear the work queue when MNT_WAIT is specified.

Result is a clean on-disk file system after ffs_sync(.., MNT_WAIT, ..)

From FreeBSD.
 1.35 02-Jan-2004  dbj add uuid field to apple ufs volume label
 1.34 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.33 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.32 29-Jun-2003  fvdl branches: 1.32.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.31 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.30 29-Jun-2003  enami Add forward declaration of struct lwp instead of struct proc. Sort those
while I'm here.
 1.29 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.28 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.27 15-Mar-2003  kristerw ffs_gop_alloc() is not used any more. Remove it.

OK:ed by Konrad Schroder.
 1.26 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.25 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.24 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.23 28-Sep-2002  dbj Add support for the Apple UFS variation on ffs
This is the bulk of PR #17345

The general approach is to use a run-time determinable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
 1.22 05-May-2002  chs for softdep vnodes, always write together the pages for any block that
might have a dependency, since the accounting doesn't work otherwise.
fixes PRs 15364 16336 16448.
 1.21 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.20 15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.19 17-Aug-2001  lukem branches: 1.19.2;
remove third argument (`int ns') from ffs_sb_swap(), and let ffs_sb_swap()
determine the endianness of the `struct fs *o' superblock from o->fs_magic
and set needswap as necessary, rather than trusting the caller to get
it right. invariably, almost every caller of ffs_sb_swap() was calling it
with ns set to the wrong value for ns anyway!
ansi KNF ffs_bswap.c declarations whilst here.

this fixes all sorts of problems when trying to use other-endian file systems,
notably the kernel trying to access memory *way* off, possibly corrupting or
panicking, and userland programs SEGVing and/or corrupting things (e.g,
"fsck_ffs -B" to swap a file system endianness).

whilst the previous rev of ffs_bswap.c (1.10, 2000/12/23) made this problem
worse, i suspect that the problem was always there and previous versions
just happened not to trash things at the wrong time.

FFS_EI should now be a lot more stable.
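
As a rough illustration of the idea in revision 1.19 (deciding byte order
from the superblock magic instead of trusting a caller-supplied flag), here
is a minimal sketch. It is not the code from ffs_bswap.c. The names
sketch_bswap32() and sketch_needswap() are hypothetical, and only the
classic FFS magic value 0x011954 is checked.

```c
#include <stdint.h>

/* Classic FFS superblock magic; the macro name is made up for this sketch. */
#define SKETCH_FS_MAGIC 0x011954u

/* Reverse the byte order of a 32-bit value. */
static uint32_t
sketch_bswap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	    ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
}

/*
 * Decide from the magic field alone whether a superblock is stored in
 * the other byte order.  Returns 0 if native, 1 if it needs swapping,
 * and -1 if the magic is not recognised in either order.
 */
static int
sketch_needswap(uint32_t fs_magic)
{
	if (fs_magic == SKETCH_FS_MAGIC)
		return 0;
	if (sketch_bswap32(fs_magic) == SKETCH_FS_MAGIC)
		return 1;
	return -1;
}
```

Because the decision depends only on data read from disk, a caller cannot
pass the wrong flag, which is the failure mode the commit message describes.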
 1.18 09-Aug-2001  lukem be consistent and use "u_char" instead of "unsigned char"
 1.17 27-Nov-2000  chs branches: 1.17.2; 1.17.6;
Initial integration of the Unified Buffer Cache project.
 1.16 04-Apr-2000  jdolecek branches: 1.16.4;
Add a new sysctl variable vfs.ffs.log_changeopt - if this is true,
an optimization strategy change is logged into syslog. Default
is 0 (to not log). This replaces the recent not quite "right"
change to only log the change if kernel is compiled with DEBUG.
 1.15 16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading.

For each leaf filesystem, add appropriate vfs_done routine.

Also remember how many times ffs_init() was called and do
the appropriate initialization on first call only. In ffs_done(),
destroy the resources when called by the last user of ffs code.
Change mfs to call ffs_init()/ffs_done() appropriately.
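
The "initialize on first call, destroy on last user" pattern described for
ffs_init()/ffs_done() in revision 1.15 can be sketched as a simple use
counter. This is an assumption-laden illustration, not the kernel code: the
sketch_* names are hypothetical and a boolean stands in for the real
resources.

```c
#include <stdbool.h>

static int sketch_initcount;	/* number of current users (e.g. ffs and mfs) */
static bool sketch_resources;	/* stands in for pools, hash tables, ... */

/* Set up shared resources only when the first user arrives. */
void
sketch_init(void)
{
	if (sketch_initcount++ > 0)
		return;
	sketch_resources = true;
}

/* Tear down shared resources only when the last user leaves. */
void
sketch_done(void)
{
	if (--sketch_initcount > 0)
		return;
	sketch_resources = false;
}

/* Report whether the shared resources currently exist. */
bool
sketch_ready(void)
{
	return sketch_resources;
}
```

A real kernel version would also need locking around the counter. That is
omitted here for brevity.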
 1.14 14-Feb-2000  fvdl Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
 1.13 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.12 26-Feb-1999  wrstuden branches: 1.12.4; 1.12.8; 1.12.10; 1.12.14;
Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.11 01-Sep-1998  thorpej branches: 1.11.2;
Use the pool allocator and the "nointr" pool page allocator for FFS inodes.

XXX MFS also comes in here for inodes, and used a different malloc type,
but the structure is the same, so we just use the FFS inode pool.
 1.10 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.9 22-Jun-1998  sommerfe defopt for options FIFO
 1.8 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time, other metadatas are byteswapped
when used (as they are accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.7 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.6 22-Dec-1996  cgd Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.5 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.4 09-Feb-1996  christos ffs prototypes
 1.3 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.11.2.3 30-May-1999  chs remove "allocedp" arg to ffs_balloc().
 1.11.2.2 25-Feb-1999  chs update ffs_balloc(), add ffs_balloc_range().
 1.11.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.12.14.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.12.10.2 26-Oct-1999  fvdl Merge changes in the trickle-sync and softdep code as done by Kirk McKusick
in FreeBSD since the version that we based the branch on. Merging mostly
done by Ethan Solomita <ethan@geocast.com>.

Also, make sure the syncer thread/process isn't active when we're
unmounting a filesystem. This could wreak havoc. XXX should be done
on a per-mountpoint basis, but especially the softdep code would
end up to be a big pile of vfs_busy() calls.
 1.12.10.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.12.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.12.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.4.4 31-Jul-1999  chs add proto for ffs_balloc1().
 1.12.4.3 04-Jul-1999  chs ffs_balloc() is now a VOP. ffs_balloc_range() is gone.
use the genfs getpages and putpages for ffs.
 1.12.4.2 21-Jun-1999  thorpej Forward decl struct csum.
 1.12.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.16.4.1 25-Nov-2001  he Pull up revision 1.19 (requested by lukem):
Call ffs_sb_swap() with the correct arguments. Fixes problems
with using other-endian file systems.
 1.17.6.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototill work
 1.17.6.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.17.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.17.6.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.17.2.5 11-Dec-2002  thorpej Sync with HEAD.
 1.17.2.4 18-Oct-2002  nathanw Catch up to -current.
 1.17.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.17.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.17.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.19.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.32.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.32.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.32.2.7 21-Sep-2004  skrll Fix the sync with head I botched.
 1.32.2.6 18-Sep-2004  skrll Sync with HEAD.
 1.32.2.5 03-Sep-2004  skrll Sync with HEAD
 1.32.2.4 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.32.2.3 18-Aug-2004  skrll s/proc/lwp/
 1.32.2.2 03-Aug-2004  skrll Sync with HEAD
 1.32.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.36.2.1 23-May-2004  tron Pull up revision 1.38 (requested by atatat in ticket #374):
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.41.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.41.4.1 29-Apr-2005  kent sync with -current
 1.42.4.5 21-Jan-2008  yamt sync with head
 1.42.4.4 03-Sep-2007  yamt sync with head.
 1.42.4.3 26-Feb-2007  yamt sync with head.
 1.42.4.2 30-Dec-2006  yamt sync with head.
 1.42.4.1 21-Jun-2006  yamt sync with head.
 1.42.2.1 21-Oct-2005  tron Pull up following revision(s) (requested by yamt in ticket #845):
sys/ufs/ffs/ffs_extern.h: revision 1.45 via patch
sys/ufs/ffs/ffs_vnops.c: revision 1.75 via patch
revert the code to expand putpage requests to block boundary.
because:
- it was incomplete in some cases.
- it can confuse pagedaemon.
see PR/15364 for details.
 1.47.2.1 20-Oct-2005  yamt adapt ufs.
 1.50.2.1 15-Jan-2006  yamt sync with head.
 1.51.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.51.8.6 11-May-2006  elad sync with head
 1.51.8.5 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.51.8.4 03-May-2006  yamt - wrap some kernel-only things by #ifdef _KERNEL.
- place __END_DECLS correctly.

ok'd by elad@
 1.51.8.3 18-Apr-2006  elad make build.sh tools work. from matt.
 1.51.8.2 08-Mar-2006  elad Include sys/kauth.h here.
 1.51.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.51.6.2 11-Aug-2006  yamt sync with head
 1.51.6.1 24-May-2006  yamt sync with head.
 1.51.4.1 01-Jun-2006  kardel Sync with head.
 1.51.2.1 09-Sep-2006  rpaulo sync with head
 1.53.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.54.4.1 01-Feb-2007  ad Sync with head.
 1.55.8.2 11-Jul-2007  mjf Sync with head.
 1.55.8.1 30-Mar-2007  mjf Provide a test journal. It's just a wrapper to bwrite and doesn't
actually do any journaling, but we need something to give the
transactions to.
 1.55.6.5 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress test.
 1.55.6.4 20-Aug-2007  ad Sync with HEAD.
 1.55.6.3 20-Aug-2007  ad softdep locking improvements. It hangs looping in flush_inodedep_deps(),
more work required.
 1.55.6.2 15-Jul-2007  ad Sync with head.
 1.55.6.1 09-Jun-2007  ad Sync with head.
 1.57.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.58.4.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.58.4.1 31-Jul-2007  pooka file ffs_extern.h was added on branch matt-mips64 on 2007-07-31 21:14:21 +0000
 1.58.2.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.58.2.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.60.10.2 26-Dec-2007  ad Sync with head.
 1.60.10.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.60.8.1 18-Feb-2008  mjf Sync with HEAD.
 1.60.2.1 09-Jan-2008  matt sync with HEAD
 1.61.4.2 08-Jan-2008  bouyer Sync with HEAD
 1.61.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.63.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.63.6.4 28-Sep-2008  mjf Sync with HEAD.
 1.63.6.3 29-Jun-2008  mjf Sync with HEAD.
 1.63.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.63.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.64.6.3 10-Oct-2008  skrll Sync with HEAD.
 1.64.6.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.64.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.64.4.1 04-May-2009  yamt sync with head.
 1.64.2.1 04-Jun-2008  yamt sync with head
 1.65.2.3 28-Jul-2008  simonb Add support for creating a WAPBL log in the filesystem. Will
create an in-filesystem log on first "mount -o log" if one doesn't
exist, and will then continue to use same log in the future. See
(soon to be added) wapbl(4) for more info.

Adds a new B_CONTIG low-level allocation flag that uses hints in
"struct ffs_inode_ext" to lay out an ffs file's data contiguously.

Thanks to Greg Oster for helping with the design of this and to
Antti Kantee for code review and suggestions.
 1.65.2.2 03-Jul-2008  simonb Sync with head.
 1.65.2.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.66.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.66.2.1 19-Oct-2008  haad Sync with HEAD.
 1.70.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.70.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.74.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.75.8.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
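
The quota file layout described above (a header holding a hash table of
in-use entries, with entries chained through "next" offsets) can be
sketched as follows. This is only an illustration under stated assumptions:
the struct and field names are hypothetical, offsets are modelled as array
indices rather than byte offsets, index 0 is reserved to mean "end of
chain", and the hash is a plain modulo. The real on-disk format is not
reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HASH_SIZE 8	/* really depends on the fs block size */

/* One quota entry; "next" chains entries that share a hash bucket. */
struct sketch_qentry {
	uint32_t id;		/* uid or gid */
	uint64_t next;		/* index of next entry, 0 ends the chain */
	uint64_t blocks_used;
};

/* File header: head of the free list plus the hash table. */
struct sketch_qheader {
	uint64_t free_head;
	uint64_t hash[SKETCH_HASH_SIZE];
};

/* Walk the hash chain for "id"; return the entry or NULL if absent. */
static const struct sketch_qentry *
sketch_qlookup(const struct sketch_qheader *h,
    const struct sketch_qentry *tab, uint32_t id)
{
	uint64_t off = h->hash[id % SKETCH_HASH_SIZE];

	while (off != 0) {
		if (tab[off].id == id)
			return &tab[off];
		off = tab[off].next;
	}
	return NULL;
}
```

An id with no entry returns NULL, which is where the default quota limits
kept in the header would apply.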
 1.75.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.75.4.2 31-May-2011  rmind sync with head
 1.75.4.1 21-Apr-2011  rmind sync with head
 1.77.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.78.12.3 03-Dec-2017  jdolecek update from HEAD
 1.78.12.2 23-Jun-2013  tls resync from head
 1.78.12.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.78.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.78.2.1 30-Oct-2012  yamt sync with head
 1.80.10.3 28-Aug-2017  skrll Sync with HEAD
 1.80.10.2 05-Oct-2016  skrll Sync with HEAD
 1.80.10.1 06-Apr-2015  skrll Sync with HEAD
 1.82.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.82.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.83.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.84.14.2 21-Apr-2020  martin Sync with HEAD
 1.84.14.1 10-Jun-2019  christos Sync with HEAD
 1.84.12.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.85.10.1 20-Apr-2020  bouyer Sync with HEAD
 1.87.2.1 13-May-2023  martin Pull up following revision(s) (requested by chs in ticket #160):

usr.sbin/makefs/ffs/ffs_alloc.c: revision 1.31
sbin/tunefs/tunefs.c: revision 1.58
sbin/fsck_ffs/setup.c: revision 1.105
sbin/fsck_ffs/pass5.c: revision 1.56
usr.sbin/makefs/ffs.c: revision 1.74
usr.sbin/makefs/ffs/mkfs.c: revision 1.42
usr.sbin/makefs/Makefile: revision 1.40
sys/ufs/ffs/fs.h: revision 1.71
sbin/fsdb/fsdb.c: revision 1.54
sbin/resize_ffs/resize_ffs.c: revision 1.58
sbin/fsck_ffs/pass4.c: revision 1.29
usr.sbin/makefs/ffs/ffs_extern.h: revision 1.9
sbin/newfs/mkfs.c: revision 1.133
sys/ufs/ffs/ffs_alloc.c: revision 1.172
sbin/fsck_ffs/pass1b.c: revision 1.24
usr.sbin/dumpfs/dumpfs.c: revision 1.68
sys/ufs/ffs/ffs_extern.h: revision 1.88
usr.sbin/quotacheck/quotacheck.c: revision 1.51
sys/ufs/ffs/ffs_subr.c: revision 1.54
sbin/fsck_ffs/main.c: revision 1.91
sbin/fsck_ffs/pass1.c: revision 1.63

ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:
commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000
This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.
To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.
Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000
One last pass to get all the unsigned comparisons correct.

In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
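
Illustration (not part of the original commit): a minimal sketch of the
class of bug described above, where an inode number above 2^31 is held in
a signed 32-bit variable and so compares as negative. The function names
are hypothetical and do not appear in the ffs sources.

```c
#include <stdint.h>

/*
 * Buggy form: the inode number is copied into a signed 32-bit variable.
 * Converting a value above INT32_MAX is implementation-defined in C; on
 * the usual two's-complement targets it becomes negative, so the range
 * check rejects a perfectly valid inode number.
 */
int
demo_ino_ok_signed(uint32_t ino, uint32_t ninodes)
{
	int32_t signed_ino = (int32_t)ino;

	return signed_ino >= 0 && (int64_t)signed_ino < (int64_t)ninodes;
}

/* Fixed form: keep the value unsigned all the way through the comparison. */
int
demo_ino_ok_unsigned(uint32_t ino, uint32_t ninodes)
{
	return ino < ninodes;
}
```

The two forms agree for small inode numbers and only diverge once the
number no longer fits in 31 bits, which is why the problem only shows up
on very large file systems.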
 1.131 31-Jul-2020  chs fix the UFS2 extattr truncate code to play nice with wapbl.
also, rather than pull in the FreeBSD V_NORMAL/V_ALT flags to
vinvalbuf() and the buf b_xflags field and BX_ALTDATA flag,
add a binvalbuf() function to invalidate a specific buffer
and use that to invalidate the two possible extattr bufs
during IO_EXT truncations.
 1.130 26-Jul-2020  chs pull in a bit more FreeBSD code to allow specifying truncation of
the regular bmap (IO_NORMAL) independently of the extattr bmap (IO_EXT).
fixes fs corruption when removing extattrs in UFS2.
 1.129 02-May-2020  christos Remove the unlock/relock hack by using IO_EXT to indicate that we are already
holding the lock.
 1.128 23-Apr-2020  ad PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)

- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
somewhere. Use it to decide whether to do direct-mapped copy, rather than
poking around directly in the vnode in ubc_uiomove(), which is ugly and
doesn't work for tmpfs. It would be nicer to contain all this in UVM but
the filesystem provides the needed locking here (VV_MAPPED) and to
reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS(). Pass in UBC_ISMAPPED where
appropriate.
 1.127 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.126 23-Feb-2020  ad branches: 1.126.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.125 10-Dec-2018  jdolecek branches: 1.125.6;
make UFS_WAPBL_JLOCK_ASSERT() #ifdef DIAGNOSTIC, same as the underlying
function KASSERT(), so that it actually does something; fix code using
it to actually pass correct params, so that it compiles

remove UFS_WAPBL_JUNLOCK_ASSERT(), as that is inherently racy (it's
okay on those places if the rwlock is held by other lwp); depend
on the RW_ASSERT()/LOCKDEBUG inside rw_enter() to catch the case
with wapbl rwlock held by current lwp
 1.124 18-Mar-2017  riastradh branches: 1.124.12; 1.124.14;
#if DIAGNOSTIC panic ---> KASSERT
 1.123 11-Nov-2016  hannken branches: 1.123.2;
Fix a "slight tweak" from Rev. 1.121: bap1/bap2 must be valid
before using BAP_ASSIGN().

Prevents NULL pointer dereference when "lastbn >= 0".
 1.122 10-Nov-2016  jdolecek during truncate with wapbl, register deallocation for upper indirect block
before recursing into lower blocks, to make sure that it will be removed after
all its referenced blocks are removed

fixes 'ffs_blkfree_common: freeing free block' panic triggered by
ufs_truncate_retry() when just the upper indirect block registration failed,
code tried to free the lower blocks again after wapbl flush

problem found by hannken@, thank you
 1.121 10-Nov-2016  jdolecek ffs_indirtrunc(): for !wapbl, restore rev 1.117 behavior of writing the zeroed
(indirect) block before freeing the referenced blocks; it's necessary for
fsck to recover the filesystem, if system goes down during truncate

patch courtesy of hannken@ with only slight tweaks
 1.120 07-Nov-2016  jdolecek fix broken test for partial truncate, introduced in rev 1.118

PR kern/51601 kern/51602
 1.119 07-Nov-2016  jdolecek reduce diff vs 1.117, no functional change
 1.118 28-Oct-2016  jdolecek reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
succeed; change wapbl_register_deallocation() to return EAGAIN
rather than panic when code hits the limit

callers changed to either loop calling ffs_truncate() using new
utility ufs_truncate_retry() if their semantics requires it, or
just ignore the failure; remove ufs_wapbl_truncate()

this fixes possible user-triggerable panic during truncate, and
resolves WAPBL performance issue with truncates of large files

PR kern/47146 and kern/49175
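
Illustration (not part of the original commit): a minimal sketch of the
"partially succeed, then retry" pattern described above. Everything here
is a hypothetical model: struct demo_file, demo_truncate() and
demo_truncate_retry() stand in for the real inode, ffs_truncate() and
ufs_truncate_retry(), whose actual signatures are not shown in this log.

```c
#include <errno.h>

/* Hypothetical file model: 'limit' is the number of deallocations that
 * fit into one journal transaction. */
struct demo_file {
	long nblocks;	/* blocks currently allocated */
	long target;	/* blocks wanted after the truncate */
	long limit;	/* max blocks freed per call */
};

/* Frees up to 'limit' blocks; returns EAGAIN if more work remains. */
int
demo_truncate(struct demo_file *f)
{
	long todo = f->nblocks - f->target;

	if (f->limit <= 0)
		return EINVAL;	/* guard so the retry loop terminates */
	if (todo > f->limit) {
		f->nblocks -= f->limit;
		return EAGAIN;	/* partial success, caller may retry */
	}
	if (todo > 0)
		f->nblocks = f->target;
	return 0;
}

/* Loops until the truncate finishes or fails with a real error. */
int
demo_truncate_retry(struct demo_file *f)
{
	int error;

	do {
		error = demo_truncate(f);
		/* a real caller would end and flush the transaction here */
	} while (error == EAGAIN);
	return error;
}
```

Callers that can tolerate a partial truncate would call the single-step
function once and ignore EAGAIN; callers that need the full result use
the retry wrapper.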
 1.117 28-Mar-2015  maxv branches: 1.117.2;
Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.116 20-Oct-2013  htodd branches: 1.116.6;
Define needswap where needed.
 1.115 23-Jun-2013  dholland branches: 1.115.2;
Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.114 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.113 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.112 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.111 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.110 09-Jul-2012  matt branches: 1.110.2;
Convert a KDASSERT to a KDASSERTMSG
 1.109 27-Jan-2012  para converting readdir in ffs ext2fs from malloc(9) to kmem(9)
while there allocate ufs mount structs from kmem(9) too
preceding kmem-vmem-pool-patch

releng@ acknowledged
 1.108 23-Nov-2011  bouyer branches: 1.108.2;
If ufs_balloc_range() fails, make sure to call ?fs_truncate() to
reset v_writesize to the right value.
If v_writesize is left larger than the allocated blocks, we may have
the same issue as the one described in
http://mail-index.netbsd.org/tech-kern/2010/02/02/msg007156.html
 1.107 16-Jun-2011  hannken branches: 1.107.2;
Rename uvm_vnp_zerorange(struct vnode *, off_t, size_t) to
ubc_zerorange(struct uvm_object *, off_t, size_t, int) changing
the first argument to an uvm_object and adding a flags argument.

Modify tmpfs_reg_resize() to zero the backing store (aobj) instead
of the vnode. Ubc_purge() no longer panics when unmounting tmpfs.

Keep uvm_vnp_zerorange() until the next kernel version bump.
 1.106 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.105 06-Mar-2011  bouyer branches: 1.105.2;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.104 07-Feb-2010  bouyer branches: 1.104.4; 1.104.6; 1.104.8;
- ufs_balloc_range(): on error, only PG_RELEASED the pages that were
allocated to extend the file to the new size. Releasing all pages
may release pages that contain previously-written data not yet flushed
to disk. Should fix PR kern/35704
- {ffs,lfs,ext2fs}_truncate(): Even if the inode's size is the same as
the new length, call uvm_vnp_setsize(). *_truncate() may have been
called by *_write() in the error path (e.g. block allocation failure
because of quota or file system full), and at this point v_writesize
has been set to the desired size of the file and not reverted to the
old size. Not adjusting v_writesize to the real size causes
genfs_do_io() to write to disk past the real end of the file.
 1.103 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.102 15-Jan-2009  pooka branches: 1.102.2;
Revert 1.101, author did not provide a justification.
 1.101 23-Dec-2008  cegger ffs_update: sprinkle KASSERTs
 1.100 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.99 30-Aug-2008  hannken branches: 1.99.2; 1.99.4; 1.99.10;
ffs_truncate() always runs with journal locked. Propagate this information
to VOP_PUTPAGES().

Report from Lars Nordlund on current-users@
 1.98 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.97 03-Jun-2008  hannken branches: 1.97.2; 1.97.4;
ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.96 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.95 27-Mar-2008  ad branches: 1.95.2; 1.95.4; 1.95.6;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.94 09-Jan-2008  ad branches: 1.94.6;
Go back to freeing on disk inodes in the inactive routine. It would be
better not to do this, but it rules out potential side effects with softdep.
 1.93 02-Jan-2008  ad Merge vmlocking2 to head.
 1.92 08-Dec-2007  pooka branches: 1.92.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.91 08-Dec-2007  ad Grab ump->um_lock in another spot.
 1.90 26-Nov-2007  pooka branches: 1.90.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.89 08-Oct-2007  ad branches: 1.89.4;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.88 10-Jul-2007  hannken branches: 1.88.6; 1.88.8; 1.88.10;
Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criteria, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.87 05-Jun-2007  yamt improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.86 04-Mar-2007  christos branches: 1.86.2; 1.86.4; 1.86.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.85 17-Oct-2006  yamt branches: 1.85.4;
ffs_truncate: don't forget to zero the past eof in the case of
blocksize < pagesize. PR/33777 from Simon Burge.
XXX check other filesystems, esp. lfs.
 1.84 14-Oct-2006  yamt don't use g_glock directly.
 1.83 23-Jun-2006  yamt branches: 1.83.4; 1.83.6;
fix a simonb-timecounters regression.
the precision of getnanotime() is not suitable for file timestamps.
esp. when it's nfs-exported.

- introduce vfs_timestamp().
(the name is from freebsd. currently merely a wrapper of nanotime())
- for ufs-like filesystems, use it rather than getnanotime().

XXX check other filesystems.
 1.82 07-Jun-2006  kardel branches: 1.82.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.81 14-May-2006  elad branches: 1.81.2;
integrate kauth.
 1.80 18-Mar-2006  bouyer Fix dead error condition, coverity ID 747.
 1.79 11-Dec-2005  christos branches: 1.79.4; 1.79.6; 1.79.8; 1.79.10; 1.79.12;
merge ktrace-lwp.
 1.78 11-Nov-2005  yamt - ignore truncation for VCHR/VBLK/VFIFO as it used to be
before yamt-vop merge. PR/32049 from Atsushi Onoe.
- reject setattr which attempts to change size of VLNK/VSOCK.
 1.77 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.76 27-Sep-2005  yamt branches: 1.76.2;
introduce "ufs_ops" and use it for ITIMES.
 1.75 12-Sep-2005  christos Add another KASSERT.
 1.74 12-Sep-2005  drochner move the new ffs_itimes() to a better place -- ffs_subr.c is shared with
userland
 1.73 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.72 15-Jul-2005  thorpej Use ANSI function decls.
 1.71 15-Aug-2004  mycroft branches: 1.71.12;
Don't write out the extra zero pages with PGO_SYNCIO. We start an asynchronous
write anyway, and they will not be freed until that write is finished.
 1.70 15-Aug-2004  mycroft Correct the fix for the partial-truncate inefficiency. We still need to zero,
but we only need to sync those pages that are being lopped off, if any.
 1.69 15-Aug-2004  mycroft Minor simplification to some arithmetic.
 1.68 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.67 14-Aug-2004  mycroft Partially fix a performance problem in the partial-truncate case. We were
doing synchronous writes unnecessarily in a couple of places. Now it's 1
write per truncate in my test case rather than 3. :-P
 1.66 14-Aug-2004  mycroft There is no need to do a synchronous write when truncating a short symlink.
 1.65 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.64 20-Jun-2004  hannken Use one daddr_t XXXblks[NDADDR + NIADDR] instead of two.
No functional changes. Reduces kernel stack usage by 120 bytes.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.63 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.62 25-Jan-2004  hannken Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.61 10-Jan-2004  yamt store a i/o priority hint in struct buf for buffer queue discipline.
 1.60 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.59 29-Jun-2003  fvdl branches: 1.59.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.58 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.57 15-May-2003  kristerw The C language does not permit statements of the form
(X ? Y : Z) = 0;
even though gcc handles this by a stupid extension.

Transform these to correct C.

Approved by fvdl.
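
Illustration (not part of the original commit): in ISO C the result of
the conditional operator is not an lvalue, so it cannot be assigned to.
One portable rewrite, sketched below with a hypothetical helper, selects
a pointer and assigns through it; an explicit if/else works as well.

```c
/*
 * Not valid ISO C:
 *	(use_a ? *a : *b) = 0;
 * Portable form: pick the address, then store through it.
 */
void
demo_clear_selected(int use_a, int *a, int *b)
{
	*(use_a ? a : b) = 0;
}
```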
 1.56 10-Apr-2003  fvdl Remove some leftover diagnostic checks.
 1.55 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.54 25-Jan-2003  fvdl The oldblks and newblks arrays are used to store direct copies of
on-disk block pointers, so they should be int32_t. Error found
by Izumi Tsutsui.
 1.53 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.52 26-Sep-2002  simonb Move a brace that is in the wrong position when changes from FreeBSD
were added in rev 1.51. This may fix the "N lost blocks" problem some
people have noticed.
Reviewed by fvdl.
 1.51 18-Dec-2001  fvdl branches: 1.51.10;
Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.50 18-Dec-2001  chs when truncating a file, make sure the last block of the file is actually
allocated, since other parts of the code assume this.
 1.49 30-Nov-2001  chs VOP_PUTPAGES() requires page-aligned offsets, so be sure to provide such.
fixes PR 14759.

(while I'm here, call VOP_PUTPAGES() directly instead of indirecting through
the UVM pager op vector.)
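
Illustration (not part of the original commit): a minimal sketch of
widening a byte range to page boundaries before handing it to an
interface that requires page-aligned offsets. The 4096-byte page size
and the demo_* names are assumptions made for this example; the kernel
has its own macros for this purpose.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE	((uint64_t)4096)	/* assumed page size */

/* Round an offset down to the start of its page. */
uint64_t
demo_trunc_page(uint64_t off)
{
	return off & ~(DEMO_PAGE_SIZE - 1);
}

/* Round an offset up to the next page boundary (no change if aligned). */
uint64_t
demo_round_page(uint64_t off)
{
	return (off + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
}
```

A caller flushing bytes [lo, hi) would pass demo_trunc_page(lo) and
demo_round_page(hi) so the range covers whole pages.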
 1.48 08-Nov-2001  chs in both paths that can cause fragments to be expanded (write and truncate-up),
deal with the fragment expansion separately before the rest of the operation.
this allows us to simplify ufs_balloc_range() by not worrying about implicit
fragment expansion.

call VOP_PUTPAGES() directly for vnodes instead of
going through the UVM pager "put" vector.
 1.47 06-Nov-2001  simonb Remove some bogus checks for unsigned variables < 0.
 1.46 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.45 28-Sep-2001  chs branches: 1.45.2;
handle allocation errors in truncate-up case.
 1.44 20-Sep-2001  chs we can't assert that the inode and vnode sizes are consistent at the start
of ffs_truncate() since there are cases (eg. when ffs_write() gets ENOSPC)
where they should be different. move the assert to the end instead.
 1.43 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.42 30-Aug-2001  chs branches: 1.42.2;
min() -> MIN()
 1.41 30-May-2001  mrg branches: 1.41.4;
use _KERNEL_OPT
 1.40 27-Jan-2001  augustss branches: 1.40.2;
Fix from chuq:
don't update UVM's notion of the file size before the VOP_FSYNC() when
we're partially truncating a file with softdeps enabled. doing so could
free pages without updating the dependency info, which would result in
"panic: softdep_write_inodeblock: direct pointer #1 mismatch 0 != N".
 1.39 01-Jan-2001  matt Convert a MALLOC with a variable size to malloc(). Saves 220 bytes of text
on VAX.
 1.38 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.37 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.

Implement range fsync for FFS. Note: not yet implemented for the
SOFTDEP case.
 1.36 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.35 30-May-2000  mycroft branches: 1.35.2;
In ffs_update():
* Move the clearing of IN_MODIFIED and IN_ACCESSED later, so they are not
cleared if the bread() failed.
* Explicitly set waitfor to 0 in the softdep case, if IN_MODIFIED is not
set (mirroring the bwrite()/bdwrite() decision).
 1.34 29-May-2000  mycroft Add a new inode flag called IN_ACCESSED. This is used in place of IN_MODIFIED
to record that the atime was updated. In ffs_update(), we only do synchronous
writes if something *other* than the atime was changed.
 1.33 28-May-2000  mycroft When unwinding a failed allocation, make sure to nuke the unwound block from
the vnode's block list. This fixes `itrunc3' panics (at least in some cases;
further testing is needed) and prevents further lossage later on.
 1.32 28-May-2000  mycroft Add a new function to remove extra buffers when truncating a file. This is
more generic than the vinvalbuf(V_SAVEMETA) case, avoiding synchronous
operations when truncating to a non-zero length.
 1.31 13-May-2000  perseant branches: 1.31.2;
Change the semantics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be waited for, or not, accordingly.

Closes PR#8996.
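
Illustration (not part of the original commit): a minimal sketch of
replacing a boolean "waitfor" argument with a flags word. The numeric
flag values and the policy function are assumptions for this example;
only the flag names UPDATE_WAIT and UPDATE_DIROP come from the message
above.

```c
#define DEMO_UPDATE_WAIT	0x01	/* caller wants a synchronous update */
#define DEMO_UPDATE_DIROP	0x02	/* hint: call comes from a dirop */

/*
 * Hypothetical per-filesystem policy: always honour an explicit wait,
 * and treat the dirop hint as a wait only if this filesystem performs
 * directory operations synchronously.
 */
int
demo_should_wait(int flags, int dirops_are_synchronous)
{
	if (flags & DEMO_UPDATE_WAIT)
		return 1;
	if ((flags & DEMO_UPDATE_DIROP) && dirops_are_synchronous)
		return 1;
	return 0;
}
```

With a flags word, a filesystem that does not need synchronous dirops
can ignore the hint without every caller having to special-case it.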
 1.30 30-Mar-2000  augustss Remove register declarations.
 1.29 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.28 24-Mar-1999  mrg branches: 1.28.4; 1.28.8; 1.28.10; 1.28.14;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.
 1.27 05-Mar-1999  mycroft Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.26 05-Mar-1999  mycroft Permit the access and modify time pointers passed to VOP_UPDATE to be null,
meaning the current time.
 1.25 12-Nov-1998  thorpej defopt FFS_EI
 1.24 23-Oct-1998  thorpej branches: 1.24.2;
Use DINODE_SIZE rather than pointer arithmetic.
 1.23 04-Oct-1998  christos Missed a conditional for FFS_EI; appears when we compile without -Ox
 1.22 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.21 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.20 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.19 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time; other metadata are byteswapped
when used (as they are accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
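
The "byteswap when used" approach from revision 1.19 can be sketched roughly as follows. Only the "um_flags" field name comes from the commit message; the flag bit SKETCH_NEEDSWAP and both functions are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit for the "um_flags" field; the real name may differ. */
#define SKETCH_NEEDSWAP	0x01

/* Reverse the byte order of a 32-bit value. */
static uint32_t
sketch_bswap32(uint32_t x)
{
	return ((x & 0x000000ffU) << 24) | ((x & 0x0000ff00U) << 8) |
	    ((x & 0x00ff0000U) >> 8) | ((x & 0xff000000U) >> 24);
}

/* Read a 32-bit metadata field, swapping only when the mount requires it. */
static uint32_t
sketch_read32(uint32_t raw, int um_flags)
{
	assert((um_flags & ~SKETCH_NEEDSWAP) == 0);
	return (um_flags & SKETCH_NEEDSWAP) ? sketch_bswap32(raw) : raw;
}
```

Swapping at the point of use leaves the buffer cache contents in on-disk byte order, so buffers can be written back unchanged.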
 1.18 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.17 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.16 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.15 04-Jul-1997  drochner Don't cast 64bit (off_t) file sizes to vm_offset_t (32bit on many
architectures), truncate them intelligently instead.
The truncation is done centralized in vnode_pager.c.
This prevents wrap-around effects when parts of large (>2^32 byte) files
are mmapped.
Don't allow mmap above the numerical range of vm_offset_t.
This is considered a temporary solution until the vm system handles the
object sizes/offsets more cleanly.
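
To show why a plain cast is a problem in revision 1.15, here is a small sketch that clamps a 64-bit size into a 32-bit type instead of letting it wrap. Clamping is only one plausible reading of "truncate them intelligently"; the type and function names are stand-ins invented for this example, not the code in vnode_pager.c.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 32-bit vm_offset_t; the real type is machine-dependent. */
typedef uint32_t sketch_vm_offset_t;
#define SKETCH_VM_OFFSET_MAX	UINT32_MAX

/* Clamp a 64-bit file size into the narrower type instead of wrapping. */
static sketch_vm_offset_t
sketch_clamp_size(int64_t size)
{
	assert(size >= 0);
	if ((uint64_t)size > SKETCH_VM_OFFSET_MAX)
		return SKETCH_VM_OFFSET_MAX;
	return (sketch_vm_offset_t)size;
}
```

For a file of 2^32 + 5 bytes, a direct cast would yield 5, while the clamped value stays at the top of the representable range.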
 1.14 11-Jun-1997  bouyer Add support for ext2fs; this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
union {
	struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
	struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
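
The accessor-macro pattern from revision 1.14 might look roughly like this. The union and its member names follow the commit message; the toy dinode layouts, the two macro names and the helper function are assumptions for illustration only (the real structures are 128 bytes each).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy on-disk inode layouts; field names here are assumed. */
struct dinode        { uint16_t di_mode;   uint64_t di_size; };
struct ext2fs_dinode { uint16_t e2di_mode; uint32_t e2di_size; };

struct inode {
	union {
		struct dinode ffs_din;
		struct ext2fs_dinode e2fs_din;
	} i_din;
};

/* Accessors in the i_ffs_* / i_e2fs_* style; exact names are assumed. */
#define i_ffs_size	i_din.ffs_din.di_size
#define i_e2fs_size	i_din.e2fs_din.e2di_size

/* Hypothetical helper that picks the accessor by file system type. */
static uint64_t
sketch_inode_size(const struct inode *ip, int is_ext2fs)
{
	assert(ip != NULL);
	return is_ext2fs ? ip->i_e2fs_size : ip->i_ffs_size;
}
```

The macros let existing code keep writing ip->i_ffs_size while the storage moves into the union.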
 1.13 27-Jan-1997  tls Correct old inode flag names in comment, and reformat for 80 character screen
 1.12 06-Nov-1996  thorpej Performance enhancement from Kirk McKusick <mckusick@McKusick.COM>:
When freeing an indirect block, there is no need to write it (synchronously,
no less!) before tossing it.
 1.11 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.10 11-May-1996  mycroft Change VOP_UPDATE() semantics:
* Make 2nd and 3rd args timespecs, not timevals.
* Consistently pass a Boolean as the 4th arg (except in LFS).
Also, fix ffs_update() and lfs_update() to actually change the nsec fields.
 1.9 09-Feb-1996  christos ffs prototypes
 1.8 15-Jun-1995  cgd compensate for timeval/timespec/stat structure changes.
 1.7 14-Dec-1994  mycroft Sync with CSRG.
 1.6 28-Oct-1994  mycroft Don't allow truncating past maxfilesize.
 1.5 29-Jun-1994  cgd branches: 1.5.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.4 15-Jun-1994  mycroft Fastlink compat.
 1.3 13-Jun-1994  mycroft Format police.
 1.2 13-Jun-1994  pk Check requested file size; negative values cause havoc.
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.5.2.2 23-Nov-1994  cgd from mycroft, for patch_05
 1.5.2.1 19-Oct-1994  cgd temporary sanity checks, as suggested by charles.
 1.24.2.3 30-May-1999  chs update call to ffs_balloc() for new args.
fix an uninitialized variable in ffs_truncate().
 1.24.2.2 25-Feb-1999  chs add UBC stuff to ffs_truncate().
 1.24.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.28.14.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.28.14.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.28.10.2 26-Oct-1999  fvdl Merge changes in the trickle-sync and softdep code as done by Kirk McKusick
in FreeBSD since the version that we based the branch on. Merging mostly
done by Ethan Solomita <ethan@geocast.com>.

Also, make sure the syncer thread/process isn't active when we're
unmounting a filesystem. This could wreak havoc. XXX should be done
on a per-mountpoint basis, but especially the softdep code would
end up to be a big pile of vfs_busy() calls.
 1.28.10.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.28.8.4 11-Feb-2001  bouyer Sync with HEAD.
 1.28.8.3 05-Jan-2001  bouyer Sync with HEAD
 1.28.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.28.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.28.4.3 31-Jul-1999  chs simplify ffs_truncate().
 1.28.4.2 11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.28.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.31.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.35.2.3 26-Feb-2002  he Apply patch (requested by fvdl):
Fix a panic in the FFS softdep code on an NFS server triggered by
an exerciser program run on an NFS client.
 1.35.2.2 30-Sep-2001  he Apply patch (requested by chuck):
Make one call to uvm_vnp_uncache() conditional. Fixes a panic
when removing a mapping of an unlinked, closed file.
 1.35.2.1 14-Dec-2000  he Pull up revision 1.37 (requested by fvdl):
Improve NFS performance, possibly by as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.40.2.9 18-Oct-2002  nathanw Catch up to -current.
 1.40.2.8 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.40.2.7 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.40.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.40.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.40.2.4 08-Oct-2001  nathanw Catch up to -current.
 1.40.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.40.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.40.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.41.4.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.41.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.41.4.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.42.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.45.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.51.10.1 24-Jun-2003  grant Pull up revision 1.52 (requested by nakayama in ticket #1333):

Move a brace that is in the wrong position when changes from FreeBSD
were added in rev 1.51. This may fix the "N lost blocks" problem some
people have noticed.
Reviewed by fvdl.
 1.59.2.7 11-Dec-2005  christos Sync with head.
 1.59.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.59.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.59.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.59.2.3 25-Aug-2004  skrll Sync with HEAD.
 1.59.2.2 03-Aug-2004  skrll Sync with HEAD
 1.59.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.71.12.6 21-Jan-2008  yamt sync with head
 1.71.12.5 07-Dec-2007  yamt sync with head
 1.71.12.4 27-Oct-2007  yamt sync with head.
 1.71.12.3 03-Sep-2007  yamt sync with head.
 1.71.12.2 30-Dec-2006  yamt sync with head.
 1.71.12.1 21-Jun-2006  yamt sync with head.
 1.76.2.2 29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.76.2.1 20-Oct-2005  yamt adapt ufs.
 1.79.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.79.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.79.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.79.10.2 19-Apr-2006  elad sync with head.
 1.79.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.79.8.3 26-Jun-2006  yamt sync with head.
 1.79.8.2 24-May-2006  yamt sync with head.
 1.79.8.1 01-Apr-2006  yamt sync with head.
 1.79.6.4 01-Jun-2006  kardel Sync with head.
 1.79.6.3 22-Apr-2006  simonb Sync with head.
 1.79.6.2 05-Feb-2006  simonb In the *itimes functions, just call getnanotime() at the start of
the function and use the result if needed, rather than the previous
conditional calls/assignments method. The code is clearer this way,
and benchmarks at about the same speed.
 1.79.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.79.4.1 09-Sep-2006  rpaulo sync with head
 1.81.2.1 19-Jun-2006  chap Sync with head.
 1.82.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.83.6.1 22-Oct-2006  yamt sync with head
 1.83.4.1 18-Nov-2006  ad Sync with head.
 1.85.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.86.6.1 09-Dec-2007  reinoud Pullup to HEAD
 1.86.4.1 11-Jul-2007  mjf Sync with head.
 1.86.2.6 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.86.2.5 15-Jul-2007  ad Sync with head.
 1.86.2.4 09-Jun-2007  ad Sync with head.
 1.86.2.3 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.86.2.2 13-Apr-2007  ad Put a per-mount lock around ffs shared data structures, excluding softdep
and quotas. Strategy lifted from FreeBSD.
 1.86.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.88.10.1 14-Oct-2007  yamt sync with head.
 1.88.8.3 23-Mar-2008  matt sync with HEAD
 1.88.8.2 09-Jan-2008  matt sync with HEAD
 1.88.8.1 06-Nov-2007  matt sync with HEAD
 1.88.6.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.88.6.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.88.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.89.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.89.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.89.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.90.2.4 30-Dec-2007  ad ffs_update: if softdep and the inode has been unlinked, wait for the update
(and so dependencies) to flush. Ensures that the slate is clean when the
inode is reused. Should work around "panic: handle_written_inodeblock:
filefree".
 1.90.2.3 26-Dec-2007  ad Sync with head.
 1.90.2.2 08-Dec-2007  ad Sync with head.
 1.90.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.92.4.2 10-Jan-2008  bouyer Sync with HEAD
 1.92.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.94.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.94.6.4 28-Sep-2008  mjf Sync with HEAD.
 1.94.6.3 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.94.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.94.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.95.6.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.95.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.95.4.2 11-Mar-2010  yamt sync with head
 1.95.4.1 04-May-2009  yamt sync with head.
 1.95.2.2 04-Jun-2008  yamt sync with head
 1.95.2.1 18-May-2008  yamt sync with head.
 1.97.4.1 19-Oct-2008  haad Sync with HEAD.
 1.97.2.2 12-Jun-2008  martin License police
 1.97.2.1 10-Jun-2008  simonb Initial commit of Wasabi Systems' WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.99.10.1 21-Apr-2010  matt sync to netbsd-5
 1.99.4.2 25-Jan-2012  riz Pull up following revision(s) (requested by bouyer in ticket #1702):
sys/ufs/lfs/lfs_inode.c: revision 1.126
sys/ufs/ffs/ffs_inode.c: revision 1.108
If ufs_balloc_range() fails, make sure to call ?fs_truncate() to
reset v_writesize to the right value.
If v_writesize is left larger than the allocated blocks, we may have
the same issue as the one described in
http://mail-index.netbsd.org/tech-kern/2010/02/02/msg007156.html
 1.99.4.1 22-Feb-2010  snj Pull up following revision(s) (requested by bouyer in ticket #1302):
sys/ufs/ext2fs/ext2fs_inode.c: revision 1.71
sys/ufs/ffs/ffs_inode.c: revision 1.104
sys/ufs/lfs/lfs_inode.c: revision 1.121
sys/ufs/ufs/ufs_inode.c: revision 1.79
- ufs_balloc_range(): on error, only PG_RELEASED the pages that were
allocated to extend the file to the new size. Releasing all pages
may release pages that contain previously-written data not yet flushed
to disk. Should fix PR kern/35704
- {ffs,lfs,ext2fs}_truncate(): Even if the inode's size is the same as
the new length, call uvm_vnp_setsize(). *_truncate() may have been
called by *_write() in the error path (e.g. block allocation failure
because of quota or file system full), and at this point v_writesize
has been set to the desired size of the file and not reverted to the
old size. Not adjusting v_writesize to the real size causes
genfs_do_io() to write to disk past the real end of the file.
 1.99.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.99.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.102.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.104.8.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on-disk storage format (all 64bit wide), integrated into the metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list; each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.
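
The layout described in this paragraph (a header holding a free list pointer and a hash table, with entries chained by file offsets) can be sketched as below. All structure names, field names, the hash table size and the "id modulo table size" hash are assumptions for illustration; the real on-disk format is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HASH_SIZE 4	/* assumed; the real size depends on block size */

struct sketch_entry {
	uint32_t id;		/* uid or gid */
	uint64_t next;		/* file offset of next entry, 0 ends the chain */
	uint64_t blocks_used;
};

struct sketch_header {
	uint64_t free_list;			/* offset of first free entry */
	uint64_t hash[SKETCH_HASH_SIZE];	/* offsets of in-use chains */
};

/*
 * Look up the entry for "id" in a quota file image by walking its hash
 * chain. Returns 1 and fills *out on success, 0 if no entry exists.
 * memcpy() is used so the image buffer needs no particular alignment.
 */
static int
sketch_lookup(const unsigned char *file, uint32_t id, struct sketch_entry *out)
{
	struct sketch_header h;
	uint64_t off;

	assert(file != NULL && out != NULL);
	memcpy(&h, file, sizeof(h));
	for (off = h.hash[id % SKETCH_HASH_SIZE]; off != 0; off = out->next) {
		memcpy(out, file + off, sizeof(*out));
		if (out->id == id)
			return 1;
	}
	return 0;
}
```

Because links are file offsets rather than pointers, the same chain can be followed by the kernel and by userland tools reading the file.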

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by the kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because the superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.104.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.104.4.2 21-Apr-2011  rmind sync with head
 1.104.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.105.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.107.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.107.2.3 23-Jan-2013  yamt sync with head
 1.107.2.2 30-Oct-2012  yamt sync with head
 1.107.2.1 17-Apr-2012  yamt sync with head
 1.108.2.1 18-Feb-2012  mrg merge to -current.
 1.110.2.4 03-Dec-2017  jdolecek update from HEAD
 1.110.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.110.2.2 23-Jun-2013  tls resync from head
 1.110.2.1 25-Feb-2013  tls resync with head
 1.115.2.1 18-May-2014  rmind sync with head
 1.116.6.3 28-Aug-2017  skrll Sync with HEAD
 1.116.6.2 05-Dec-2016  skrll Sync with HEAD
 1.116.6.1 06-Apr-2015  skrll Sync with HEAD
 1.117.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.117.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.117.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.123.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.124.14.3 21-Apr-2020  martin Sync with HEAD
 1.124.14.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.124.14.1 10-Jun-2019  christos Sync with HEAD
 1.124.12.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.125.6.1 29-Feb-2020  ad Sync with head.
 1.126.4.2 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.126.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.1 30-Mar-2007  mjf branches: 1.1.2;
file ffs_journal.c was initially added on branch mjf-ufs-trans.
 1.1.2.1 30-Mar-2007  mjf Provide a test journal. It's just a wrapper to bwrite and doesn't
actually do any journaling, but we need something to give the
transactions to.
 1.7 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.6 07-Jul-2016  msaitoh branches: 1.6.18; 1.6.24;
KNF. Remove extra spaces. No functional change.
 1.5 22-Feb-2015  maxv KNF, and simplify a bit.

No functional change
 1.4 12-Jun-2011  rmind branches: 1.4.12; 1.4.30;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.3 07-Jun-2011  bouyer Fix bad cut'n'paste in copyright. Pointed out by dyoung@
 1.2 06-Mar-2011  bouyer branches: 1.2.2; 1.2.4; 1.2.6;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.1 20-Jan-2011  bouyer branches: 1.1.2;
file ffs_quota2.c was initially added on branch bouyer-quota2.
 1.1.2.2 09-Feb-2011  bouyer Support MNT_UPDATE for quota2 (especially r/o -> r/w transitions)
 1.1.2.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on-disk storage format (all 64bit wide), integrated into the metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list; each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by the kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because the superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.2.6.2 06-Jun-2011  jruoho Sync with HEAD.
 1.2.6.1 06-Mar-2011  jruoho file ffs_quota2.c was added on branch jruoho-x86intr on 2011-06-06 09:10:16 +0000
 1.2.4.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.2.2.4 12-Jun-2011  rmind sync with head
 1.2.2.3 23-Apr-2011  rmind Few fixes, missed in last sync with head.
 1.2.2.2 21-Apr-2011  rmind sync with head
 1.2.2.1 06-Mar-2011  rmind file ffs_quota2.c was added on branch rmind-uvmplock on 2011-04-21 01:42:20 +0000
 1.4.30.2 09-Jul-2016  skrll Sync with HEAD
 1.4.30.1 06-Apr-2015  skrll Sync with HEAD
 1.4.12.1 03-Dec-2017  jdolecek update from HEAD
 1.6.24.1 17-Jan-2020  ad Sync with head.
 1.6.18.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.155 11-May-2023  chs ffs: apply the remaining ffs_snapshot.c part of this FreeBSD commit:

commit 364ed814e7285c8216d8a201d3ab3674eb34ce29
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Dec 9 21:24:00 2004 +0000

Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.

Submitted by: Henry Whincup <henry@jot.to>
MFC after: 1 week

all the other changes in that commit were applied previously by others:
- sborrill committed ffs_alloc.c rev 1.123 in 2009
- simonb committed ffs_alloc.c rev 1.110 in 2008
- the ffs_clusteralloc() part is not needed because we no longer have
that function.

fixes PR 57307
 1.154 16-Apr-2022  hannken branches: 1.154.4;
Take the link count from the inode.
 1.153 05-Dec-2021  msaitoh s/shapshot/snapshot/
 1.152 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.151 23-Feb-2020  ad branches: 1.151.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.150 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.149 01-Jun-2017  chs branches: 1.149.10; 1.149.14; 1.149.16;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.148 01-Apr-2017  riastradh KASSERT(mutex_owned(vp->v_interlock)) in vnode iterator selector.
 1.147 18-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT
 1.146 01-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().
 1.145 17-Feb-2017  hannken Bring back vrele_flush() to flush deferred vrele() on a suspended file system.
 1.144 17-Feb-2017  hannken Untangle VFS_SYNC() from VFS_SUSPENDCTL().
 1.143 28-Oct-2016  jdolecek branches: 1.143.2;
reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
succeed; change wapbl_register_deallocation() to return EAGAIN
rather than panic when code hits the limit

callers changed to either loop calling ffs_truncate() using new
utility ufs_truncate_retry() if their semantics require it, or
just ignore the failure; remove ufs_wapbl_truncate()

this fixes possible user-triggerable panic during truncate, and
resolves WAPBL performance issue with truncates of large files

PR kern/47146 and kern/49175
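
The "partially succeed, then retry" pattern from revision 1.143 can be sketched as follows. This is a toy model under stated assumptions: the per-call limit, the structure and both functions are hypothetical stand-ins, and the sketch does not reproduce the real ffs_truncate() or ufs_truncate_retry() signatures or their transaction handling.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_MAX_DEALLOCS 4	/* assumed limit on blocks freed per call */

struct sketch_file {
	long nblocks;
};

/* Stand-in for a truncate that may stop early and report EAGAIN. */
static int
sketch_truncate(struct sketch_file *f, long newblocks)
{
	long todo = f->nblocks - newblocks;

	assert(newblocks >= 0);
	if (todo > SKETCH_MAX_DEALLOCS) {
		/* Free what fits, then ask the caller to come back. */
		f->nblocks -= SKETCH_MAX_DEALLOCS;
		return EAGAIN;
	}
	if (todo > 0)
		f->nblocks = newblocks;
	return 0;
}

/* Repeat the truncate until it completes or fails for another reason. */
static int
sketch_truncate_retry(struct sketch_file *f, long newblocks, int *calls)
{
	int error;

	do {
		error = sketch_truncate(f, newblocks);
		(*calls)++;
	} while (error == EAGAIN);
	return error;
}
```

Callers that can tolerate a partial truncate would call the first function once and ignore EAGAIN; callers that need the full result use the retry wrapper.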
 1.142 21-Oct-2016  jdolecek revert 1.141 - the second ffs_truncate() can't really fail

requested by hannken@
 1.141 20-Oct-2016  jdolecek also allow snapshot_setup()'s call to ffs_truncate() to fail; the code
should simply reuse the file blocks in that case. also make sure the
ffs_truncate() call is run within a transaction if the log is on
 1.140 28-Jun-2015  maxv branches: 1.140.2;
Small fixes.

ok hannken@
 1.139 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.138 28-Mar-2015  maxv Remove the 'cred' argument from breadn(), and update the man page
accordingly.

ok hannken@
 1.137 05-Sep-2014  matt branches: 1.137.2;
Don't nest structure definitions.
 1.136 10-Jul-2014  dholland Use an explicit compare to 0 for an immediate error result, not !.
Using ! is perfectly clear on variables like "error" or "result",
but directly on a function call it tends to look like a mistake.
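
The style point in revision 1.136 is easy to show in a few lines. The functions below are hypothetical and exist only to contrast the two forms.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical operation that returns 0 on success or an errno value. */
static int
sketch_op(int fail)
{
	return fail ? EIO : 0;
}

/* Returns 1 if the operation succeeded, 0 otherwise. */
static int
sketch_run(int fail)
{
	int error;

	/* Directly on a function call: spell out the comparison with 0. */
	if (sketch_op(fail) == 0)
		return 1;

	/* On a variable named "error", the short form reads clearly. */
	error = sketch_op(fail);
	assert(error != 0);
	return !error;
}
```

Written as "if (!sketch_op(fail))", the first test could be misread as "if the operation did not happen", which is the confusion the commit avoids.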
 1.135 30-May-2014  hannken Testing "v_usecount == 1" for exclusive reference will not always
work -- remove and test only readonly.
 1.134 24-May-2014  christos Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.
 1.133 17-Mar-2014  hannken branches: 1.133.2;
Change snapshot_expunge() to use vfs_vnode_iterator.
 1.132 17-Dec-2013  joerg ib_get is not used in the evbarm/OPENRD kernel, so mark it as such.
 1.131 19-Oct-2013  martin Mark unused (in the !FFS_EI case) variables as such.
 1.130 19-Oct-2013  martin Mark a potentially unused (ifndef FFS_EI) variable
 1.129 30-Sep-2013  hannken Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>
 1.128 13-Sep-2013  joerg Kill unused function ib_assign.
 1.127 23-Jun-2013  dholland branches: 1.127.2;
Stick ffs_ in front of the following macros:
fragstoblks()
blkstofrags()
fragnum()
blknum()

to finish the job of distinguishing them from the lfs versions, which
Christos renamed the other day.

I believe this is the last of the overtly ambiguous exported symbols
from ffs... or at least, the last of the ones that conflicted with lfs.
ffs still pollutes the C namespace very broadly (as does ufs) and this
needs quite a bit more cleanup.

XXX: boo on macros with lowercase names. But I'm not tackling that just yet.
 1.126 23-Jun-2013  dholland Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.125 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.124 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.123 16-Jun-2013  hannken Add a UFS_SNAPGONE() ufs op replacing the calls
to ffs_snapgone() in ufs_lookup.c.

Ok: David Holland <dholland@netbsd.org>

Welcome to 6.99.22
 1.122 07-May-2013  hannken When invalidating short buffers on the snapshots clean list use bbusy()
to mark the buffer busy. There exists a small window where a buffer is
done but not released and therefore still busy.
 1.121 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.120 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.119 13-Mar-2012  elad branches: 1.119.2;
Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.118 07-Oct-2011  hannken branches: 1.118.2; 1.118.6;
As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.
 1.117 01-Jul-2011  hannken ffs_copyonwrite(): If the write is to the in-file-system journal
there is no need to lock and check the snapshots.
 1.116 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.115 08-May-2011  hannken branches: 1.115.2;
Revert previous commit. Locking the snapshot vnode while the file system
is suspended extends the suspension until the vnode gets unlocked by
the caller of ffs_snapshot().

Resuming the file system before expunging all snapshots and syncing the
snapshot creates races and deadlocks with journaling file systems at least.
 1.114 29-Apr-2011  hannken Before expunging all snapshots take the snapshot lock and resume the file
system as this is sufficient for the remaining operations.

Reduces the time the file system is suspended and should make this time
independent of the number of snapshots already present.
 1.113 23-Apr-2011  hannken ffs_snapshot(): return an error if the node is an invalid snapshot.
 1.112 18-Apr-2011  hannken Preallocate all cylinder group blocks so we no longer redo ~50% of
the cylinder groups while the file system is suspended.
This was removed in error with Rev 1.16.

From Manuel Bouyer <bouyer@netbsd.org> via tech-kern.
 1.111 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.110 24-Feb-2011  hannken fss(4): Allow FSSIOCSET to set the initial flags. Add a new flag
"FSS_UNLINK_ON_CREATE" to unlink the backing store before
the snapshot gets created.

With this change dump(8) no longer dumps the zero-sized, but named
snapshot it is working on. Same applies to fsck_ffs(8).
 1.109 23-Feb-2011  dyoung Initialize blkno to 0 right before the snapblkaddr() call that GCC does
not understand so that if ffs_copyonwrite() sprouts a new code path that
does not initialize blkno, the compiler has the chance to reveal it.
 1.108 23-Feb-2011  hannken Quiesce CC ('blkno' may be used uninitialized in this function).
 1.107 22-Feb-2011  he Move blocks_in_journal() in under #ifndef FFS_NO_SNAPSHOT, all uses
are under that ifdef anyway; this allows build with FFS_NO_SNAPSHOT defined.
 1.106 21-Feb-2011  hannken Change the snapshot lock:
- No need to take the snapshot lock while the file system is suspended.
- Allow ffs_copyonwrite() one level of recursion with snapshots locked.
- Do the block address lookup with snapshots locked.
- Take the snapshot lock while removing a snapshot from the list.

While hunting deadlocks change the transaction scope for ffs_snapremove().
We could deadlock from UFS_WAPBL_BEGIN() with a buffer held.
 1.105 18-Feb-2011  bouyer Initialize error in snapshot_expunge(); if the list is empty error would
be returned uninitialized. t_snapshot_v2 was failing for me when
librumpffs was compiled with DBG=-g.
No idea why gcc didn't catch this ...
 1.104 18-Feb-2011  hannken Revert rev. 1.101. Dead snapshots would hang around until unmount.

Addresses PR #44568 (WAPBL doesn't play nice with snapshots).
 1.103 16-Feb-2011  hannken Refine the scope of WAPBL transactions so we should no longer get
a "wapbl_flush: current transaction too big to flush" panic when
creating or removing snapshots on larger logging disks.

Addresses PR #44568 (WAPBL doesn't play nice with snapshots).
 1.102 20-Dec-2010  matt branches: 1.102.2; 1.102.4;
Move counting of faults, traps, intrs, soft[intr]s, syscalls, and nswtch
from uvmexp to per-cpu cpu_data and move them to 64bits. Remove unneeded
includes of <uvm/uvm_extern.h> and/or <uvm/uvm.h>.
 1.101 12-Dec-2010  hannken Keep a reference to the snapshot vnode until it gets removed from the
snapshot list.
 1.100 12-Dec-2010  hannken syncsnap: Use bbusy() to take a buffer from v_dirtyblkhd.
 1.99 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.98 02-Jun-2010  hannken Initialize the initial snap block list's count.

From Antti Kantee <pooka@netbsd.org>.
 1.97 15-Oct-2009  hannken branches: 1.97.2; 1.97.4;
No longer abuse TAILQ internal data.
 1.96 13-Oct-2009  hannken Fix a deadlock where fscow_disestablish() blocks because outstanding
copy-on-write operations wait for si_snaplock.
 1.95 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.94 18-Mar-2009  cegger bcopy -> memcpy
 1.93 18-Mar-2009  cegger bzero -> memset
 1.92 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.91 11-Jan-2009  christos branches: 1.91.2;
merge christos-time_t
 1.90 03-Jan-2009  hannken Remove superfluous "vp->v_vnlock = &vp->v_lock".

Observed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.89 19-Dec-2008  hannken Restore a line removed by mistake with the last commit.

Should fix PR 40225 panic: indiracct: missing indir.
 1.88 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.87 07-Dec-2008  hannken ffs_copyonwrite(): Only use si_snapblklist if it is already allocated.

ffs_snapshot_read(): Use IO_ALTSEMANTICS to allow reading a snapshot vnode
beyond file system size. Needed to read the snapblklist
on mount.

Persistent snapshots work again.

Should fix PR kern/37425: fss_snapshot_mount panic during fsck.
 1.86 07-Dec-2008  hannken Revert previous -- ALL reads are from kernel space.

Still open: PR kern/37425: fss_snapshot_mount panic during fsck.
 1.85 07-Dec-2008  hannken ffs_copyonwrite(): Only use si_snapblklist if it is already allocated.
ffs_snapshot_read(): Allow the kernel to read beyond file system size.

Persistent snapshots work again.

Should fix PR kern/37425: fss_snapshot_mount panic during fsck.
 1.84 06-Dec-2008  joerg Split ffs_freefile into a frontend for normal cylinder group and for
snapshot use. Adjust ffs_blkfree_common to get the fs instance passed
in, the original commit didn't account blocks in the snapshots
correctly. Assert that ffs_blkfree is used with the primary fs instance
and that ffs_checkfreefile is only used for snapshots. Move the bdwrite
from ffs_blkfree_common into the caller for symmetry. This creates a
redundant write of unmodified data for ffs_blkfree_snap if a double free
of a block happens.

Reviewed and tested by hannken@.
 1.83 01-Dec-2008  joerg ffs_blkfree is used in two different ways. The normal usage is to free a
block in the cylinder groups of the filesystem. The other user is the
snapshot code, which wants to modify the copied cylinder groups. Use
different frontends to distinguish the cases in preparation for fine
grained locking for cylinder groups.
 1.82 23-Oct-2008  hannken branches: 1.82.2; 1.82.4;
Correct previous.
- Count frags, not blocks to get the file system size.
- Cannot use blksize() here, it depends on vnode size.
- Correctly update xfersize on short reads.
 1.81 23-Oct-2008  hannken When computing the request's hard limit in ffs_snapshot_read()
use the file system size, not the size of the snapshot vnode.
 1.80 08-Sep-2008  hannken Adjust some WAPBL transactions:
- Put transaction inside cgaccount() to simplify caller.
- No vget() / vrele() inside a transaction.
 1.79 02-Sep-2008  hannken Ffs_snapshot() has become a huge monster over time. Break it into
helper functions to enhance readability. Adjust comments to reality
and test the main error paths.

While here, expand and remove the last FreeBSD->NetBSD conversion macros.

No functional change intended.
 1.78 25-Aug-2008  hannken Sync the just created snapshot to disk.

Invalidate short ( < fs_bsize ) buffers. We will always read full
size buffers later.

Should fix PR #39402
 1.77 24-Aug-2008  hannken Add missing vput() for logvp.

Fixes PR #39400
 1.76 24-Aug-2008  hannken Merge the _ufs1 and _ufs2 variants of the expunge and accounting functions.
Remove some unneeded UFS_FSNEEDSWAP().

Saves ~250 lines of redundant code.
 1.75 22-Aug-2008  hannken Add snapshot support for logging ffs file systems.

- Add UFS_WAPBL_BEGIN() / UFS_WAPBL_END() where needed.

- Expunge WAPBL log inodes from snapshots.

- Ffs_copyonwrite() and ffs_snapblkfree() must run inside a WAPBL transaction.

- Add ffs_gop_write() as a wrapper around genfs_gop_write() that makes sure
genfs_gop_write() gets always called inside a WAPBL transaction.

- Add VOP_PUTPAGES() flag PGO_JOURNALLOCKED to tag calls to VOP_PUTPAGES()
inside a WAPBL transaction.

Reviewed by: Simon Burge <simonb@netbsd.org>, Greg Oster <oster@netbsd.org>

PGO_JOURNALLOCKED / ffs_gop_write() part presented on tech-kern@.
 1.74 12-Aug-2008  hannken Deny read/write access to snapshot vnodes. We use fss(4) to read from
snapshots. With this policy in place:

- Separate the snapshot vnode lock from the snapshot common lock.
Snapshots no longer need recursive vnode locks.

- Use a mutex (si_snaplock) to serialize creation, deletion, reading and
writing of snapshots.

- Move ffs_read() for snapshots into ffs_snapshot.c.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>

While here change ffs_copyonwrite() to fail requests from pagedaemon that need
to copy-on-write.
 1.73 31-Jul-2008  hannken Ffs snapshots don't work (yet) with WAPBL:
- no snapshot creation on logging file systems.
- refuse to mount logging file systems with persistent snapshots.

Ok: Simon Burge <simonb@netbsd.org>
 1.72 30-Jul-2008  hannken ffs_snapshot():
Release allocated indir blocks on non-softdep file systems instead
of writing them twice.
It is sufficient to clean dirty data pages to avoid UBC inconsistencies.

ffs_snapblkfree() and wrsnapblk():
If a snapshot's effective link count is zero there is no need
to use synchronous writes.

ffs_copyonwrite():
Defer locking the snapshots until there is a need to copy the block.

wrsnapblk():
Use vn_rdwr() instead of bwrite() to write to the snapshots.
 1.71 15-Jul-2008  hannken expunge_ufs*(): Use the buffer cache to update the inodes on the snapshot like
the rest of snapshot creation does.
 1.70 17-Jun-2008  reinoud branches: 1.70.2;
Mark a buffer `busy' in getnewbuf() when it came from the pool_cache since
it's not on a free list.

Also change buf_init() to not automatically mark buffers `busy' since this
only makes sense for bufcache buffers.

Mark all buf_init'd buffers 'busy' on the places where they ought to be
flagged as such to not confuse the buffer cache.

Fixes PR 38923.
 1.69 03-Jun-2008  hannken branches: 1.69.2;
ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.68 29-May-2008  hannken ffs_copyonwrite(): stop abusing ffs_balloc() to get a block address.
Use ufs_getlbns()/bread() instead.
Saves some reads and removes deep recursion with possible deadlock
when ffs_balloc() runs copy-on-write on the buffer returned.
 1.67 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.66 17-Apr-2008  hannken branches: 1.66.2; 1.66.4; 1.66.6;
Replace get/setspecific with a void pointer in struct ufsmount. Use explicit
initialization/finalization of snapshot private data on creation/deletion
of struct ufsmount.
Snapshot mounts no longer may fail silently because kmem_alloc() fails.

Welcome to 4.99.60

Ok: Andrew Doran <ad@netbsd.org>
 1.65 30-Jan-2008  hannken branches: 1.65.6; 1.65.8;
Make it work after lockmgr -> vlockmgr conversion:

- Initialize si_vnlock in si_mount_init().
- Also initialize vl_recursecnt to zero.
- Destroy it only in si_mount_dtor().
- Simplify the v_lock <-> si_vnlock exchange.
- Don't abuse the overall error variable for LK_NOWAIT errors.
- ffs_snapremove: release the vnode one instead of three times.
 1.64 30-Jan-2008  ad Replace use of LK_SLEEPFAIL.
 1.63 30-Jan-2008  ad PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
 1.62 30-Jan-2008  ad Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
 1.61 28-Jan-2008  hannken - Always destroy si_vnlock after use.
- Take care of vnodes without file system data.
 1.60 24-Jan-2008  hannken si_mount_dtor(): destroy si_vnlock before free.
 1.59 24-Jan-2008  hannken Fix a typo from the vmlocking2 merge: vmark() the right vnode.
 1.58 03-Jan-2008  pooka valloc -> vnalloc, vfree -> vnfree
Avoids collision with userland valloc(3).

no functional change
ad ok
 1.57 02-Jan-2008  ad Merge vmlocking2 to head.
 1.56 08-Dec-2007  pooka branches: 1.56.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.55 02-Dec-2007  hannken branches: 1.55.2;
Fscow_run(): add a flag "bool data_valid" to note still valid data.
Buffers run through copy-on-write are marked B_COWDONE. This condition
is valid until the buffer has run through bwrite() and gets cleared from
biodone().

Welcome to 4.99.39.

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.54 26-Nov-2007  pooka Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.53 10-Oct-2007  ad branches: 1.53.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.52 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.51 07-Oct-2007  hannken Update the file system copy-on-write handler.

- Instead of hooking the handler on the specdev of a mounted file system
hook directly on the `struct mount'.

- Rename from `vn_cow_*' to `fscow_*' and move to `kern/vfs_trans.c'. Use
`mount_*specific' instead of clobbering `struct mount' or `struct specinfo'.

- Replace the hand-made reader/writer lock with a krwlock.

- Keep `vn_cow_*' functions and mark as obsolete.

- Welcome to NetBSD 4.99.32 - `struct specinfo' changed size.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.50 21-Aug-2007  hannken branches: 1.50.2; 1.50.4;
Modify ffs_lock() to take care for changed v_vnlock. Snapshots do not need
transferlockers() anymore.

From FreeBSD ffs_vnops.c Rev. 1.159

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.49 18-Aug-2007  hannken - Use a mutex to protect snapinfo.
- Move the snapshot lock to snapinfo.
- ffs_snapblkfree(),ffs_copyonwrite(): replace lockmgr() with VOP_LOCK().
 1.48 18-Aug-2007  hannken Expunge traces of unlinked snapshot files when making a new snapshot.

From FreeBSD Rev. 1.123
 1.47 09-Aug-2007  hannken Move snapshot per-mount data from struct ufsmount to mount specific data.
No functional changes.

Welcome to 4.99.28 (struct ufsmount changed size)
 1.46 12-Jul-2007  hannken branches: 1.46.2; 1.46.6;
ffs_snapshot_mount: No persistent snapshots on an Apple UFS file system.

From Thor Lancelot Simon <tls@netbsd.org>
 1.45 10-Jul-2007  hannken Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criteria, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.44 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.43 04-Mar-2007  christos branches: 1.43.2; 1.43.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.42 16-Feb-2007  hannken branches: 1.42.2;
Make fstrans(9) the default helper for file system suspension.
Replaces the now obsolete vn_start_write()/vn_finished_write().
 1.41 09-Feb-2007  ad Merge newlock2 to head.
 1.40 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.39 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.38 02-Dec-2006  hannken On snapshot creation be sure the snapshot vnode has valid quota information.

Fixes PR kern/35121
 1.37 16-Nov-2006  christos branches: 1.37.2;
ifdef out an unused function if !FFS_NO_SNAPSHOT
 1.36 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.35 25-Oct-2006  reinoud Revisit mnt_vnodelist TAILQ patch. Remove all suspicious TAILQ_FOREACH()
loops where vnodes can get removed or added during the loops. This could
lead to panics on unmount since nodes are skipped or otherwise
TAILQ_NEXT(0xdeadbeef, ...) was dereferenced.
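The hazard described in revision 1.35 can be illustrated in userland with <sys/queue.h>. This is only a sketch under stated assumptions: struct node, remove_even() and demo_remaining() are hypothetical names, and it covers just the case where the iterating code itself removes entries. The kernel fix may well differ, since vnodes can also come and go while the loop sleeps.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

struct node {
	int id;
	TAILQ_ENTRY(node) entries;
};
TAILQ_HEAD(nodelist, node);

/* Remove and free every node with an even id; return how many went. */
static int
remove_even(struct nodelist *head)
{
	struct node *np, *next;
	int removed = 0;

	for (np = TAILQ_FIRST(head); np != NULL; np = next) {
		/* Fetch the successor before np can be freed. */
		next = TAILQ_NEXT(np, entries);
		if (np->id % 2 == 0) {
			TAILQ_REMOVE(head, np, entries);
			free(np);
			removed++;
		}
	}
	return removed;
}

/* Build ids 0..n-1, drop the even ones, return the number left over. */
static int
demo_remaining(int n)
{
	struct nodelist head;
	struct node *np;
	int i, left = 0;

	TAILQ_INIT(&head);
	for (i = 0; i < n; i++) {
		np = malloc(sizeof(*np));
		assert(np != NULL);
		np->id = i;
		TAILQ_INSERT_TAIL(&head, np, entries);
	}
	(void)remove_even(&head);
	while ((np = TAILQ_FIRST(&head)) != NULL) {
		TAILQ_REMOVE(&head, np, entries);
		free(np);
		left++;
	}
	return left;
}
```

A plain TAILQ_FOREACH() would evaluate TAILQ_NEXT() on the node that was just freed, which is the kind of stale dereference the commit message mentions.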
 1.34 20-Oct-2006  reinoud Replace the LIST structure mp->mnt_vnodelist with a TAILQ structure since all
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earliest last.

An extra benefit is the removal of the ugly hack from the Berkeley days on
LFS.

In the process, I've also replaced the various hand-written loop variations
with the TAILQ_FOREACH() macro.
 1.33 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.32 29-Sep-2006  christos Coverity CID 2949: comment out dead code (from Arnaud Lacombe)
 1.31 23-Jul-2006  ad branches: 1.31.4; 1.31.6;
Use the LWP cached credentials where sane.
 1.30 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.29 14-May-2006  elad branches: 1.29.2;
integrate kauth.
 1.28 18-Apr-2006  christos Coverity CID 746: Remove dead code. lbn >= NDADDR is mutually exclusive to
snapshot_locked == 0.
 1.27 10-Apr-2006  bouyer Revert previous; I mixed bpp and *bpp when reading ffs_balloc_ufs1().
ffs_balloc() will always allocate a new buffer or leave it as NULL,
so coverity is wrong here, we're not using a freed argument.
 1.26 10-Apr-2006  bouyer If we brelse ibp, set ibp to NULL, to avoid reusing it later in balloc()
or in our code at the next iteration.
Coverity ID 2706
 1.25 17-Mar-2006  christos don't use MALLOC with a non-constant size; use malloc instead.
 1.24 04-Jan-2006  yamt branches: 1.24.2; 1.24.4; 1.24.6; 1.24.8; 1.24.10;
- add simple functions to allocate/free a buffer for i/o.
- make bufpool static.
 1.23 11-Dec-2005  christos branches: 1.23.2;
merge ktrace-lwp.
 1.22 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.21 26-Sep-2005  yamt branches: 1.21.2;
revert ffs_snapshot.c 1.20 because it's bogus. pointed by Simon Burge.
 1.20 26-Sep-2005  yamt always use nanotime rather than time.
it's bad to mix nanotime and time because it sometimes
makes timestamps go backwards.
 1.19 19-Aug-2005  christos 64 bit inode changes.
 1.18 15-Jul-2005  thorpej Use ANSI function decls.
 1.17 29-May-2005  christos branches: 1.17.2;
- sprinkle const
- avoid shadow variables.
 1.16 25-May-2005  hannken - Use an empty snap block list to set the initial file size. Snapshot is
now valid from the beginning. No need to copy the last fs block two times.
- No need to allocate the cylinder group blocks twice.
- cgbuf -> sbbuf
 1.15 22-May-2005  hannken ffs/ffs_alloc.c:
- Add a missing ACTIVECG_CLR().

ffs/ffs_snapshot.c:
- Use async/delayed writes for snapshot creation and sync/uncache these buffers
on end. Reduces the time the file system must be suspended.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
- Byte swap the list of preallocated blocks on read/write instead of access.
- Always keep this list on ip->i_snapblklist so it may be rolled back when the
newest snapshot gets removed. Fixes a rare snapshot corruption when using
more than one snapshot on a file system.

ufs/ufsmount.h:
- Make TAILQ_LAST() possible on member um_snapshots.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
 1.14 03-May-2005  hannken Fix last commit. The last block of the file system may have changed
even if the last cylinder group is not modified.
 1.13 24-Apr-2005  hannken Fix an inconsistency where the last block of the snapshot contains old data.

The last block of the file system is written to the snapshot before the
file system is suspended. If the last cylinder group is modified after
the file system is suspended the last block of the snapshot may contain
old data. So update this block again.
 1.12 21-Apr-2005  yamt don't assign to non-lvalue. found by gcc4.
 1.11 26-Feb-2005  perry branches: 1.11.2;
nuke trailing whitespace
 1.10 21-Feb-2005  hannken Make `options FFS_NO_SNAPSHOT' only disable snapshot creation
while not trashing existing snapshots.

Approved by: core@
 1.9 09-Feb-2005  hannken Fss device only checks read access to snapshot vnode. On snapshot creation
check we are either super-user or owner of the snapshot vnode.
 1.8 18-Jan-2005  hannken branches: 1.8.2;
Protect calls to `ffs_*_swap' with `#ifdef FFS_EI'.
 1.7 17-Sep-2004  skrll branches: 1.7.4;
There's no need to pass a proc value when using UIO_SYSSPACE with
vn_rdwr(9) and uiomove(9).

OK'd by Jason Thorpe
 1.6 29-Aug-2004  hannken While creating a snapshot inodes must be freed from the
snapshot, not from the file system.
ffs_freefile() needs explicit "fs" and "devvp" arguments.
 1.5 30-Jun-2004  hannken branches: 1.5.2;
When we expunge an unreferenced file from a snapshot its size may be zero.
 1.4 20-Jun-2004  hannken - Add flag L_COWINPROGRESS to struct lwp to avoid recursion when
doing copy-on-write.

- Change VFS_SNAPSHOT() to return the snapshot vnode locked.

- Make the IO path for copy-on-write and snapshot-read more lightweight.
Avoids deadlocks where vn_rdwr(...READ...) has a shared lock and needs
to copy-on-write.
Avoids deadlocks/panics where to clean pages the copy-on-write needs
to allocate pages for its VOP_PUTPAGES().

L_COWINPROGRESS part approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.3 31-May-2004  hannken Once all block address modifications are done invalidate and
free all pages from the snapshot vnode.
 1.2 26-May-2004  hannken Make it compile without option FFS_EI.
 1.1 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.5.2.11 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.5.2.10 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.5.2.9 15-Feb-2005  skrll Adapt to branch.
 1.5.2.8 15-Feb-2005  skrll Sync with HEAD.
 1.5.2.7 04-Feb-2005  skrll Sync with HEAD.
 1.5.2.6 24-Jan-2005  skrll Sync with HEAD.
 1.5.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.5.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.5.2.3 03-Sep-2004  skrll Sync with HEAD
 1.5.2.2 03-Aug-2004  skrll Sync with HEAD
 1.5.2.1 30-Jun-2004  skrll file ffs_snapshot.c was added on branch ktrace-lwp on 2004-08-03 10:56:49 +0000
 1.7.4.1 29-Apr-2005  kent sync with -current
 1.8.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.8.2.1 12-Feb-2005  yamt sync with head.
 1.11.2.8 06-Dec-2006  tron Pull up following revision(s) (requested by hannken in ticket #1598):
sys/ufs/ffs/ffs_snapshot.c: revision 1.38
On snapshot creation be sure the snapshot vnode has valid quota information.
Fixes PR kern/35121
 1.11.2.7 28-May-2005  tron branches: 1.11.2.7.2; 1.11.2.7.4;
Pull up revision 1.16 (requested by hannken in ticket #334):
- Use an empty snap block list to set the initial file size. Snapshot is
now valid from the beginning. No need to copy the last fs block two times.
- No need to allocate the cylinder group blocks twice.
- cgbuf -> sbbuf
 1.11.2.6 28-May-2005  tron Pull up revision 1.15 (requested by hannken in ticket #334):
ffs/ffs_alloc.c:
- Add a missing ACTIVECG_CLR().
ffs/ffs_snapshot.c:
- Use async/delayed writes for snapshot creation and sync/uncache these buffers
on end. Reduces the time the file system must be suspended.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
- Byte swap the list of preallocated blocks on read/write instead of access.
- Always keep this list on ip->i_snapblklist so it may be rolled back when the
newest snapshot gets removed. Fixes a rare snapshot corruption when using
more than one snapshot on a file system.
ufs/ufsmount.h:
- Make TAILQ_LAST() possible on member um_snapshots.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
 1.11.2.5 03-May-2005  tron Pull up revision 1.14 (requested by hannken in ticket #244):
Fix last commit. The last block of the file system may have changed
even if the last cylinder group is not modified.
 1.11.2.4 03-May-2005  tron Restore file which was deleted by accident because of CVS glitch.
 1.11.2.3 03-May-2005  tron Pull up revision 1.14 (requested by hannken in ticket #244):
Fix last commit. The last block of the file system may have changed
even if the last cylinder group is not modified.
 1.11.2.2 25-Apr-2005  tron Pull up revision 1.13 (requested by hannken in ticket #197):
Fix an inconsistency where the last block of the snapshot contains old data.
The last block of the file system is written to the snapshot before the
file system is suspended. If the last cylinder group is modified after
the file system is suspended the last block of the snapshot may contain
old data. So update this block again.
 1.11.2.1 25-Apr-2005  tron Pull up revision 1.12 (requested by hannken in ticket #197):
don't assign to non-lvalue. found by gcc4.
 1.11.2.7.4.1 06-Dec-2006  tron Pull up following revision(s) (requested by hannken in ticket #1598):
sys/ufs/ffs/ffs_snapshot.c: revision 1.38
On snapshot creation be sure the snapshot vnode has valid quota information.
Fixes PR kern/35121
 1.11.2.7.2.1 06-Dec-2006  tron Pull up following revision(s) (requested by hannken in ticket #1598):
sys/ufs/ffs/ffs_snapshot.c: revision 1.38
On snapshot creation be sure the snapshot vnode has valid quota information.
Fixes PR kern/35121
 1.17.2.8 04-Feb-2008  yamt sync with head.
 1.17.2.7 21-Jan-2008  yamt sync with head
 1.17.2.6 07-Dec-2007  yamt sync with head
 1.17.2.5 27-Oct-2007  yamt sync with head.
 1.17.2.4 03-Sep-2007  yamt sync with head.
 1.17.2.3 26-Feb-2007  yamt sync with head.
 1.17.2.2 30-Dec-2006  yamt sync with head.
 1.17.2.1 21-Jun-2006  yamt sync with head.
 1.21.2.2 29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.21.2.1 20-Oct-2005  yamt adapt ufs.
 1.23.2.1 15-Jan-2006  yamt sync with head.
 1.24.10.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.24.10.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.24.8.5 11-May-2006  elad sync with head
 1.24.8.4 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.24.8.3 19-Apr-2006  elad sync with head.
 1.24.8.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.24.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.24.6.4 11-Aug-2006  yamt sync with head
 1.24.6.3 26-Jun-2006  yamt sync with head.
 1.24.6.2 24-May-2006  yamt sync with head.
 1.24.6.1 01-Apr-2006  yamt sync with head.
 1.24.4.3 01-Jun-2006  kardel Sync with head.
 1.24.4.2 22-Apr-2006  simonb Sync with head.
 1.24.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.24.2.1 09-Sep-2006  rpaulo sync with head
 1.29.2.1 19-Jun-2006  chap Sync with head.
 1.31.6.2 10-Dec-2006  yamt sync with head.
 1.31.6.1 22-Oct-2006  yamt sync with head
 1.31.4.4 01-Feb-2007  ad Sync with head.
 1.31.4.3 12-Jan-2007  ad Sync with head.
 1.31.4.2 29-Dec-2006  ad Checkpoint work in progress.
 1.31.4.1 18-Nov-2006  ad Sync with head.
 1.37.2.1 06-Dec-2006  tron Pull up following revision(s) (requested by hannken in ticket #252):
sys/ufs/ffs/ffs_snapshot.c: revision 1.38
On snapshot creation be sure the snapshot vnode has valid quota information.
Fixes PR kern/35121
 1.42.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.43.4.1 11-Jul-2007  mjf Sync with head.
 1.43.2.14 29-Oct-2007  ad Remove unused label.
 1.43.2.13 28-Oct-2007  ad Fix up mnt_vnodelist handling.
 1.43.2.12 09-Oct-2007  ad Sync with head.
 1.43.2.11 09-Oct-2007  ad Sync with head.
 1.43.2.10 30-Aug-2007  ad bufcache_lock is sufficient to inspect v_dirtyblkhd, vp->v_interlock is only
needed to modify.
 1.43.2.9 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.43.2.8 20-Aug-2007  ad Sync with HEAD.
 1.43.2.7 15-Jul-2007  ad Sync with head.
 1.43.2.6 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.43.2.5 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.43.2.4 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.43.2.3 13-Apr-2007  ad - Make the devsw interface MP safe, and add some comments.
- Allow individual block/character drivers to be marked MP safe.
- Provide wrappers around the device methods that look up the
device, returning ENXIO if it's not found, and acquire the
kernel lock if needed.
 1.43.2.2 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.43.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.46.6.6 09-Dec-2007  jmcneill Sync with HEAD.
 1.46.6.5 03-Dec-2007  joerg Sync with HEAD.
 1.46.6.4 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.46.6.3 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.46.6.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.46.6.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.46.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.46.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.50.4.1 14-Oct-2007  yamt sync with head.
 1.50.2.3 23-Mar-2008  matt sync with HEAD
 1.50.2.2 09-Jan-2008  matt sync with HEAD
 1.50.2.1 06-Nov-2007  matt sync with HEAD
 1.53.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.53.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.53.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.55.2.2 26-Dec-2007  ad Sync with head.
 1.55.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.56.4.2 08-Jan-2008  bouyer Sync with HEAD
 1.56.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.65.8.5 04-Jan-2009  christos merge diffs.
 1.65.8.4 27-Dec-2008  christos merge with head.
 1.65.8.3 01-Nov-2008  christos catch up with changes in head.
 1.65.8.2 01-Nov-2008  christos Sync with head.
 1.65.8.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.65.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.65.6.4 28-Sep-2008  mjf Sync with HEAD.
 1.65.6.3 29-Jun-2008  mjf Sync with HEAD.
 1.65.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.65.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.66.6.3 24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.66.6.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.66.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.66.4.3 11-Aug-2010  yamt sync with head.
 1.66.4.2 11-Mar-2010  yamt sync with head
 1.66.4.1 04-May-2009  yamt sync with head.
 1.66.2.2 04-Jun-2008  yamt sync with head
 1.66.2.1 18-May-2008  yamt sync with head.
 1.69.2.3 31-Jul-2008  simonb Sync with head.
 1.69.2.2 18-Jul-2008  simonb Sync with head.
 1.69.2.1 18-Jun-2008  simonb Sync with head.
 1.70.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.70.2.1 19-Oct-2008  haad Sync with HEAD.
 1.82.4.4 18-Jun-2011  bouyer Pull up following revision(s) (requested by hannken in ticket #1627):
sys/kern/vfs_wapbl.c: revisions 1.41-1.42
sbin/dump/snapshot.c: revisions 1.6 (patch)
share/man/man4/fss.4: revisions 1.15 (patch)
sys/dev/fss.c: revisions 1.73 (patch)
sys/dev/fssvar.h: revisions 1.25
usr.sbin/fssconfig/fssconfig.c: revisions 1.7
sys/ufs/ffs/ffs_balloc.c: revisions 1.54
sys/ufs/ffs/ffs_snapshot.c: revisions 1.90, 1.98, 1.100-1.101, 1.103-1.110, 1.111, 1.112-1.115 (patch)

- Try to keep snapshot indirect blocks contiguous. This speeds up snapshot
creation by a factor of ~3 and reduces the file system suspension time by
a factor of ~5.

- Refine the scope of WAPBL transactions and the limit for deallocations in
one transaction so we should no longer get a "wapbl_flush: current
transaction too big to flush" panic when creating or removing snapshots
on larger logging disks.

- fss(4): Allow FSSIOCSET to set the initial flags. Add a new flag
"FSS_UNLINK_ON_CREATE" to unlink the backing store before the snapshot
gets created. With this change dump(8) no longer dumps the zero-sized,
but named snapshot it is working on.
 1.82.4.3 28-Mar-2010  snj branches: 1.82.4.3.4;
Pull up following revision(s) (requested by hannken in ticket #1345):
sys/ufs/ffs/ffs_snapshot.c: revision 1.97
No longer abuse TAILQ internal data.
 1.82.4.2 28-Mar-2010  snj Pull up following revision(s) (requested by hannken in ticket #1345):
sys/ufs/ffs/ffs_snapshot.c: revision 1.96
Fix a deadlock where fscow_disestablish() blocks because outstanding
copy-on-write operations wait for si_snaplock.
 1.82.4.1 10-Dec-2008  snj branches: 1.82.4.1.4;
Pull up following revision(s) (requested by hannken in ticket #169):
sys/ufs/ffs/ffs_snapshot.c: revision 1.87
ffs_copyonwrite(): Only use si_snapblklist if it is already allocated.
ffs_snapshot_read(): Use IO_ALTSEMANTICS to allow reading a snapshot vnode
beyond file system size. Needed to read the snapblklist
on mount.
Persistent snapshots work again.
Should fix PR kern/37425: fss_snapshot_mount panic during fsck.
 1.82.4.3.4.1 07-Jan-2011  matt Quiet gcc.
 1.82.4.1.4.1 21-Apr-2010  matt sync to netbsd-5
 1.82.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.82.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.82.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.91.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.97.4.5 31-May-2011  rmind sync with head
 1.97.4.4 21-Apr-2011  rmind sync with head
 1.97.4.3 05-Mar-2011  rmind sync with head
 1.97.4.2 03-Jul-2010  rmind sync with head
 1.97.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.97.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.102.4.8 05-Mar-2011  bouyer Sync with HEAD
 1.102.4.7 18-Feb-2011  bouyer Add a new inode flag, SF_SNAPINVAL, to be set on SF_SNAPSHOT inodes when
the snapshot is invalid.
Set SF_SNAPSHOT | SF_SNAPINVAL early when initializing a snapshot inode,
so that quotas are bypassed for allocations on this inode.
Set SF_SNAPSHOT | SF_SNAPINVAL (instead of clearing SF_SNAPSHOT) when
expunge()ing a snapshot inode, so that userland tools working on the
snapshot (e.g. fsck or dump) can properly handle this inode.

The main point at this time is to have fsck_ffs -X properly compute quotas;
as a bonus persistent snapshots files won't show up in a dump(8) from a
snapshot.

This may also help speed up taking snapshots, by bypassing expunge()
for snapshot inodes completely (but this needs more thought).


Briefly discussed with hannken@ in private mail.
 1.102.4.6 18-Feb-2011  bouyer Sync with HEAD
 1.102.4.5 17-Feb-2011  bouyer Remove comment that should not be there
 1.102.4.4 17-Feb-2011  bouyer Sync with HEAD
 1.102.4.3 17-Feb-2011  bouyer Do not adjust quota when a snapshot inode is cleared in a snapshot view.
 1.102.4.2 12-Feb-2011  bouyer Don't count snapshot files in inode quota too.
At umount time, chk?q may be called after quotas have been shut down,
as there is a final vflush pass after quota?_umount(); so skip quota
checks if the quota vnode is not there any more.
 1.102.4.1 12-Feb-2011  bouyer Do not update disk quotas for snapshot inodes, as this may require a
write to the same filesystem, which will trigger a copy on write,
which will trigger another update to the same block.
Set SF_SNAPSHOT just after truncating the snapshot inode, so that this
inode always accounts for 0 blocks in quotas.
 1.102.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.115.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.118.6.1 05-Apr-2012  mrg sync to latest -current.
 1.118.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.118.2.2 23-Jan-2013  yamt sync with head
 1.118.2.1 17-Apr-2012  yamt sync with head
 1.119.2.4 03-Dec-2017  jdolecek update from HEAD
 1.119.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.119.2.2 23-Jun-2013  tls resync from head
 1.119.2.1 25-Feb-2013  tls resync with head
 1.127.2.1 18-May-2014  rmind sync with head
 1.133.2.1 10-Aug-2014  tls Rebase.
 1.137.2.4 28-Aug-2017  skrll Sync with HEAD
 1.137.2.3 05-Dec-2016  skrll Sync with HEAD
 1.137.2.2 22-Sep-2015  skrll Sync with HEAD
 1.137.2.1 06-Apr-2015  skrll Sync with HEAD
 1.140.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.140.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.140.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.143.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.149.16.2 29-Feb-2020  ad Sync with head.
 1.149.16.1 17-Jan-2020  ad Sync with head.
 1.149.14.1 13-May-2023  martin Pull up following revision(s) (requested by chs in ticket #1633):

sys/ufs/ffs/ffs_snapshot.c: revision 1.155

ffs: apply the remaining ffs_snapshot.c part of this FreeBSD commit:
commit 364ed814e7285c8216d8a201d3ab3674eb34ce29
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Dec 9 21:24:00 2004 +0000
Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.
Submitted by: Henry Whincup <henry@jot.to>
MFC after: 1 week

all the other changes in that commit were applied previously by others:
- sborrill committed ffs_alloc.c rev 1.123 in 2009
- simonb committed ffs_alloc.c rev 1.110 in 2008
- the ffs_clusteralloc() part is not needed because we no longer have
that function.

fixes PR 57307
 1.149.10.2 21-Apr-2020  martin Sync with HEAD
 1.149.10.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.151.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.154.4.1 13-May-2023  martin Pull up following revision(s) (requested by chs in ticket #165):

sys/ufs/ffs/ffs_snapshot.c: revision 1.155

ffs: apply the remaining ffs_snapshot.c part of this FreeBSD commit:
commit 364ed814e7285c8216d8a201d3ab3674eb34ce29
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Dec 9 21:24:00 2004 +0000
Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.
Submitted by: Henry Whincup <henry@jot.to>
MFC after: 1 week

all the other changes in that commit were applied previously by others:
- sborrill committed ffs_alloc.c rev 1.123 in 2009
- simonb committed ffs_alloc.c rev 1.110 in 2008
- the ffs_clusteralloc() part is not needed because we no longer have
that function.

fixes PR 57307
 1.2 31-Jan-2005  hannken No longer needed. Ffs snapshots are enabled by default.
 1.1 25-May-2004  hannken branches: 1.1.2; 1.1.6; 1.1.8;
Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.1.8.1 12-Feb-2005  yamt sync with head.
 1.1.6.1 29-Apr-2005  kent sync with -current
 1.1.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.1.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.2 03-Aug-2004  skrll Sync with HEAD
 1.1.2.1 25-May-2004  skrll file ffs_snapshot.stub.c was added on branch ktrace-lwp on 2004-08-03 10:56:49 +0000
 1.2 21-Feb-2005  hannken Make `options FFS_NO_SNAPSHOT' only disable snapshot creation
while not trashing existing snapshots.

Approved by: core@
 1.1 10-Feb-2005  dsl branches: 1.1.2; 1.1.4;
Add a stub file so that snapshot support can be compiled out.
Will allow INSTALL_TINY to fit back in its designated space.
Since the calling code doesn't allow a snapshot mount to fail, this code
will output a warning and delete any snapshots it finds.
This only happens on rw mounts - snapshots don't seem to be created
when mounting ro.
The whole way the snapshots get mounted is a PITA anyway, the superblock
'last mounted' time should be used to validate that the fs hasn't been
mounted elsewhere.
 1.1.4.3 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.1.4.2 15-Feb-2005  skrll Sync with HEAD.
 1.1.4.1 10-Feb-2005  skrll file ffs_snapshot_stub.c was added on branch ktrace-lwp on 2005-02-15 21:34:02 +0000
 1.1.2.3 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.2 12-Feb-2005  yamt sync with head.
 1.1.2.1 10-Feb-2005  yamt file ffs_snapshot_stub.c was added on branch yamt-km on 2005-02-12 18:17:56 +0000
 1.117 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.116 06-Dec-2008  joerg branches: 1.116.4;
Split ffs_freefile into a frontend for normal cylinder group and for
snapshot use. Adjust ffs_blkfree_common to get the fs instance passed
in, the original commit didn't account blocks in the snapshots
correctly. Assert that ffs_blkfree is used with the primary fs instance
and that ffs_checkfreefile is only used for snapshots. Move the bdwrite
from ffs_blkfree_common into the caller for symmetry. This creates a
redundant write of unmodified data for ffs_blkfree_snap if a double free
of a block happens.

Reviewed and tested by hannken@.
 1.115 03-Jun-2008  hannken branches: 1.115.4; 1.115.6;
ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.114 31-May-2008  ad Put a TNF copyright on it.
 1.113 31-May-2008  ad XXX softdep:

If the number of deletes in progress is getting too high, newdirrem()
requests the syncer to flush faster, and in some cases will block to
prevent deletes accumulating faster than the disk can service them.

The syncer will try to lock vnodes that the remover holds locked, leading
to the syncer and remover proceeding in lockstep and making very little
overall forward progress.

Put a hook into ufs_rmdir() and ufs_remove() so that the softdep code
can pace itself without holding vnode locks if the number of deletes is
running out of control.
 1.112 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.111 05-May-2008  ad branches: 1.111.2;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.110 29-Apr-2008  ad PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.109 11-Apr-2008  ad branches: 1.109.2; 1.109.4;
newdirrem: if the number of deletes in progress is getting too high, start
pushing the syncer before considering rate limiting the deletes. We hold
vnodes locked and it's likely that the syncer will try to lock them while
flushing, leading to the syncer and remover proceeding in lockstep and
making very little forward progress. XXX this is not a solution.
 1.108 20-Feb-2008  matt branches: 1.108.6;
Merge all the *different* definitions of bufqueues into one common one.
 1.107 15-Feb-2008  ad Give bbusy() an interlock argument. If we need to wait for the buffer,
the interlock is dropped and reacquired when awoken. This allows for
busying buffers attached to a list that is not locked by bufcache_lock.
 1.106 12-Jan-2008  ad Initialize caches at IPL_SOFTBIO (not IPL_NONE) so that we are allocating
from kmem_map.
 1.105 07-Jan-2008  ad Fix 'panic: softdep_update_inodeblock: update failed'.
 1.104 07-Jan-2008  tnn softdep_freefile: don't acquire ufsmount lock twice.
 1.103 02-Jan-2008  ad Merge vmlocking2 to head.
 1.102 08-Dec-2007  pooka branches: 1.102.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.101 26-Nov-2007  pooka branches: 1.101.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.100 07-Nov-2007  ad Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.99 10-Oct-2007  ad branches: 1.99.2; 1.99.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.98 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.97 01-Sep-2007  pooka branches: 1.97.2;
Make bioops a pointer and point it to the softdeps struct in softdep
init. Decouples "options SOFTDEP" from the main kernel and ffs code.
 1.96 29-Jul-2007  ad branches: 1.96.4; 1.96.6; 1.96.8;
It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.95 10-Jul-2007  hannken branches: 1.95.2;
Restore the special lkt_held handling for softdep_disk_write_complete().
No more panics 'worklist_remove: lock not held' on DEBUG kernels.

Ok Andrew Doran <ad@netbsd.org>
 1.94 09-Jul-2007  ad Fix build with DEBUG.
 1.93 09-Jul-2007  ad We got LWPs years ago..
 1.92 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.91 30-Jun-2007  pooka Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.90 07-May-2007  yamt flush_inodedep_deps: fix access after free. PR/29724.
 1.89 08-Apr-2007  hannken Remove now obsolete vn_start_write() and vn_finished_write() and
corresponding flags.

Revert softdep_trackbufs() to its state before vn_start_write() was added.

Remove from struct mount now unneeded flags IMNT_SUSPEND* and
members mnt_writeopcountupper, mnt_writeopcountlower and mnt_leaf.

Welcome to 4.99.17
 1.88 07-Apr-2007  hannken Remove calls to now obsolete vn_start_write() and vn_finished_write().
 1.87 12-Mar-2007  ad branches: 1.87.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.86 04-Mar-2007  christos branches: 1.86.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.85 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.84 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.83 17-Feb-2007  pavel Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.
 1.82 09-Feb-2007  ad branches: 1.82.2;
Merge newlock2 to head.
 1.81 16-Nov-2006  christos branches: 1.81.2; 1.81.4;
__unused removal on arguments; approved by core.
 1.80 24-Oct-2006  drochner import a fix from FreeBSD (rev.1.185):
After a rmdir()ed directory has been truncated, force an update of
the directory's inode after queuing the dirrem that will decrement
the parent directory's link count. This will force the update of
the parent directory's actual link to actually be scheduled. Without
this change the parent directory's actual link count would not be
updated until ufs_inactive() cleared the inode of the newly removed
directory, which might be deferred indefinitely. ufs_inactive()
will not be called as long as any process holds a reference to the
removed directory, and ufs_inactive() will not clear the inode if
the link count is non-zero, which could be the result of an earlier
system crash.
[plus description about problems with background fsck solved
by this; irrelevant to NetBSD]

For me, the good effect is at least that I'm getting fewer filesystem
inconsistencies after a crash.

Approved by christos quite a while ago.
 1.79 14-Oct-2006  yamt handle_workitem_freefrag/handle_workitem_freeblocks:
don't fake up inode/vnode pair.
 1.78 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.77 03-Oct-2006  christos Coverity CID 3690: Reverse INULL: Add KASSERT.
 1.76 23-Jul-2006  ad branches: 1.76.4; 1.76.6;
Use the LWP cached credentials where sane.
 1.75 12-Jun-2006  hannken softdep_sync_metadata: If vp is a block device it may have new I/O requests
posted for it even if the vnode is locked. This will deadlock with wmesg
"softgetdbuf" if it gets a BMSAFEMAP dependency as here we have "bp == nbp"
and try to get a buffer we already own.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.74 14-May-2006  elad branches: 1.74.2;
integrate kauth.
 1.73 24-Dec-2005  perry branches: 1.73.4; 1.73.6; 1.73.8; 1.73.10; 1.73.12;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.72 11-Dec-2005  christos merge ktrace-lwp.
 1.71 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.70 09-Sep-2005  yamt branches: 1.70.2;
- for pagecache dependency, track which page in the block
has been written or not individually by (ab)using b_resid
in pcbp as a bitmap.
- add a comment to explain why it's needed.

PR/15364. reviewed by Chuck Silvers.
 1.69 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.68 24-Aug-2005  yamt PRId64 -> ld in UVMHIST_LOG format strings.
 1.67 19-Aug-2005  christos 64 bit inode changes.
 1.66 30-May-2005  christos branches: 1.66.2;
rename delay because it is a function on sparc.
 1.65 29-May-2005  christos - sprinkle const
- avoid shadow variables.
 1.64 07-May-2005  hannken flush_inodedep_deps(): If softdep_lookupvp() returns NULL it means the
inode has been reclaimed. Skip the VOP_PUTPAGES() in this case.

Reviewed by: Chuck Silvers <chs@netbsd.org>
 1.63 26-Feb-2005  perry branches: 1.63.2;
nuke trailing whitespace
 1.62 25-Jan-2005  wrstuden Extend fsync_range(2) to support the FDISKSYNC flag, which requests
that the sync be propagated out through the disk drive caches.
 1.61 15-Dec-2004  mycroft branches: 1.61.2; 1.61.4;
Remove some unnecessary (int32_t) casts that would cause us to screw up the
top bit in block addresses.

Also, change some daddr_t->int32_t casts (mostly as arguments to ufs_rw32(),
where they would get promoted anyway) to u_int32_t.
 1.60 29-Aug-2004  hannken While creating a snapshot inodes must be freed from the
snapshot, not from the file system.
ffs_freefile() needs explicit "fs" and "devvp" arguments.
 1.59 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.58 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.57 11-Mar-2004  yamt reserve a MAXBSIZE-sized buffer for inodedeps for pagedaemon.

PR/24443.
 1.56 11-Mar-2004  yamt as we always replace whole buf in the case of indirdep,
simply changing b_data is enough. eliminate M_INDIRDEP.

PR/24443.
 1.55 10-Jan-2004  hannken Allow vfs_write_suspend() to wait if the file system is already
suspending.

Move vfs_write_suspend() and vfs_write_resume() from kern/vfs_vnops.c
to kern/vfs_subr.c.

Change vnode write gating in ufs/ffs/ffs_softdep.c (from FreeBSD).

When vnodes are throttled in softdep_trackbufs() check for
file system suspension every 10 msecs to avoid a deadlock.
 1.54 10-Jan-2004  hannken Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue.

Clean up ffs_sync(), which did not synchronously wait when MNT_WAIT
was specified. Clear the work queue when MNT_WAIT is specified.

Result is a clean on-disk file system after ffs_sync(.., MNT_WAIT, ..)

From FreeBSD.
 1.53 15-Oct-2003  hannken Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then syncs the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.52 14-Oct-2003  dbj add mnt_iflag field to struct mount for internal flags
mv MNT_GONE, MNT_UNMOUNT and MNT_WANTRDWR to this field
additionally add mnt_writeopcountupper and mnt_writeopcountlower fields
in preparation for pending write suspension support work
bump kernel version to 1.6ZD
 1.51 07-Sep-2003  yamt buffer cache mp locks.
 1.50 29-Jun-2003  fvdl branches: 1.50.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.49 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.48 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.47 15-May-2003  kristerw The C language does not permit statements of the form
(X ? Y : Z) = 0;
even though gcc handles this by a stupid extension.

Transform these to correct C.

Approved by fvdl.
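
The kind of rewrite described in revision 1.47 can be sketched as below. This is only an illustration under stated assumptions, not the code that was changed: the helper name clear_selected and its parameters are hypothetical.

```c
/*
 * Hypothetical helper (not from ffs_softdep.c): clear whichever of two
 * counters the flag selects.
 */
static void
clear_selected(int use_a, int *a, int *b)
{
	/*
	 * Not valid ISO C, because a conditional expression is not an lvalue:
	 *	(use_a ? *a : *b) = 0;
	 * Portable form: select the address, then store through it.
	 */
	*(use_a ? a : b) = 0;
}
```

An equivalent rewrite is a plain if/else with one assignment per arm; the pointer form simply stays closest to the original one-line statement.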
 1.46 03-Apr-2003  fvdl FreeBSD revision 1.135:

When removing the last item from a non-empty worklist, the worklist
tail pointer must be updated.
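
The invariant fixed in revision 1.46 can be illustrated with a generic singly linked list that keeps a separate tail pointer. This is a sketch under assumptions, not the softdep worklist code: struct item, struct worklist, wl_append and wl_remove are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical list types, for illustration only. */
struct item {
	struct item *next;
};

struct worklist {
	struct item *head;
	struct item *tail;	/* last item, or NULL when the list is empty */
};

/* Append an item at the tail. */
static void
wl_append(struct worklist *wl, struct item *it)
{
	it->next = NULL;
	if (wl->tail != NULL)
		wl->tail->next = it;
	else
		wl->head = it;
	wl->tail = it;
}

/* Unlink an item; if it was the last one, move the tail back. */
static void
wl_remove(struct worklist *wl, struct item *it)
{
	struct item **pp, *prev = NULL;

	for (pp = &wl->head; *pp != NULL; prev = *pp, pp = &(*pp)->next) {
		if (*pp == it) {
			*pp = it->next;
			if (wl->tail == it)
				wl->tail = prev;	/* the step the fix adds */
			it->next = NULL;
			return;
		}
	}
}
```

Without the tail update, a later append would write through a stale pointer to an item that is no longer on the list.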
 1.45 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.44 05-Feb-2003  pk Make the buffer cache code MP-safe.
 1.43 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.42 26-Jan-2003  tsutsui More printf format cleanup to reduce casts.
 1.41 25-Jan-2003  tron Use PRId64 instead of hard coding "%lld" to fix build problems under
LP64 ports.
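
A minimal sketch of the format-string fix described in revision 1.41, assuming a 64-bit block number held in an int64_t (standing in for the widened daddr_t). The helper name format_blkno is hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a 64-bit block number without hard coding
 * "%lld". PRId64 expands to the correct conversion for int64_t on both
 * ILP32 and LP64 ports.
 */
static int
format_blkno(char *buf, size_t len, int64_t blkno)
{
	return snprintf(buf, len, "blkno %" PRId64, blkno);
}
```

Hard coding "%lld" only matches when int64_t is long long; on ports where it is long, the compiler's format checking flags the mismatch.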
 1.40 25-Jan-2003  tron Fix printf() format strings problems caused by "daddr_t" change.
 1.39 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.38 01-Jan-2003  chs several bugs:
- move calls to softdep_setup_pagecache() (which can sleep to allocate
memory) outside the softdep lock.
- replace the softdep_flush_indir() hack (which tries to find another
vnode to fsync when we are holding lots of buffer-cache buffers locked
for long periods of time) with softdep_trackbufs() (which just kicks
the syncer and sleeps under the same circumstances). the former method
had a lock-ordering problem which would occasionally deadlock.
- relax the assertion in softdep_sync_metadata() which says that we should
never see D_ALLOCDIRECT deps for VREG vnodes. it's ok to see those
attached to indirect blocks.

also, there's no need to splbio() while allocating the buffer headers
to which pagecache dependencies are attached, so remove that.

fixes all the problems in PR 19288.
 1.37 30-Nov-2002  kristerw Softdep is mature enough that it shouldn't define DEBUG and DIAGNOSTIC
unconditionally.
 1.36 24-Nov-2002  scw Quell an uninitialised variable warning.
 1.35 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.34 25-Aug-2002  thorpej Make nbuf, nswbuf, and bufpages unsigned. Make all operations on these
variables unsigned, and update places where their values are printed.
 1.33 05-Jul-2002  scw Cast pointers first to uintptr_t before casting to register_t.
On SH-5, sizeof(register_t) is always 8, even if sizeof(void *) is 4
as is the case when compiling for ILP32.
 1.32 18-Jun-2002  jdolecek clear_inodedeps(): use CIRCLEQ_FOREACH() appropriately
 1.31 18-Mar-2002  wiz branches: 1.31.4; 1.31.6;
Fix a typo, a KNF-nit, and simplify a printf format string.
 1.30 08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.29 22-Feb-2002  enami Record some page cache related information into ubchist.
 1.28 14-Feb-2002  wiz Fix two problems with softdep_typenames (missing entry, wrong boundary check).
Okayed by fvdl.
 1.27 10-Feb-2002  chs bring in the change from FreeBSD's rev. 1.107 of this file:

date: 2002/02/07 00:54:32; author: mckusick; state: Exp; lines: +10 -7
Occasionally deleted files would hang around for hours or days
without being reclaimed. This bug was introduced in revision 1.95
dealing with filenames placed in newly allocated directory blocks,
thus is not present in 4.X systems. The bug is triggered when a
new entry is made in a directory after the data block containing
the original new entry has been written, but before the inode
that references the data block has been written.

Submitted by: Bill Fenner <fenner@research.att.com>

This should fix NetBSD PR 15531.
 1.26 18-Jan-2002  enami - For CIRCLEQ, comparing the loop variable against NULL doesn't make sense.
- Minor KNF while I'm here.

# This doesn't fix real problems though.
 1.25 16-Jan-2002  enami Fix typo which prevents diagnostic test from working.
 1.24 27-Dec-2001  fvdl Pull over one missed fix from FreeBSD wrt. running out of quota. Also
reshuffle some code a bit to make it look more similar (no functional
change).
 1.23 23-Dec-2001  fvdl Fix from FreeBSD that I missed: speed up handling of short-lived
files a bit.
 1.22 23-Dec-2001  chs process the delayed-free queue more often.
 1.21 18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.20 08-Nov-2001  chs call VOP_PUTPAGES() directly for vnodes instead of
going through the UVM pager "put" vector.
 1.19 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.18 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.17 15-Sep-2001  chs branches: 1.17.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.16 15-Sep-2001  chs use pools for allocating most softdep datastructures. since we want to
allocate memory from kernel_map but some of the objects are freed from
interrupt context, we put objects on a queue instead of freeing them
immediately. then in softdep_process_worklist() (which is called at
least once per second from the syncer), we process that queue and
free all the objects. allocating from kernel_map instead of from kmem_map
allows us to have a much larger number of softdeps pending even in
configurations where kmem_map is relatively small.
 1.15 15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.14 30-Aug-2001  chs branches: 1.14.2;
min() -> MIN() (on general principles)
 1.13 10-Jan-2001  chs branches: 1.13.2; 1.13.6;
attach the softdep pagecache pseudo-buffers to the inode
so we can find them quickly in the softdep truncate path.
 1.12 13-Dec-2000  mycroft Patch from Kirk McKusick to fix an ordering problem in softdep_setup_freeblks()
that could cause an inode to be reused prematurely (possibly resulting in the
file containing garbage blocks).
 1.11 13-Dec-2000  chs fix bookkeeping for page cache dependency buffers.
 1.10 11-Dec-2000  chs in flush_inodedep_deps(), drop the big softdep lock while flushing pages.
 1.9 27-Nov-2000  chs allow building without SOFTDEP by adding the pageiodone hook to bio_ops.
 1.8 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.7 08-Nov-2000  ad branches: 1.7.2;
Update for hashinit() change.
 1.6 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.

Implement range fsync for FFS. Note: not yet implemented for the
SOFTDEP case.
 1.5 15-Aug-2000  fvdl Do not call MALLOC with M_WAITOK while holding the "lock". Thanks to
Ethan Solomita for the reminder.

Mark the parent vnode lock as recursive while flushing pagedeps. XXX.
Should fix kern/10564.
 1.4 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.3 27-Jun-2000  pk We shouldn't be defining DEBUG and DIAGNOSTIC on our own; these may have
unwanted side-effects in the header files. For now, do the internal
#defines after including the headers.
 1.2 22-Jun-2000  fvdl branches: 1.2.2;
Moved here from gnu/sys/ufs/ffs
 1.1 19-Oct-1999  fvdl branches: 1.1.2;
file ffs_softdep.c was initially added on branch fvdl-softdep.
 1.1.2.6 03-Nov-1999  fvdl Give ufs_ihashget an extra argument: the flags passed to vget() for
locking. This way we can avoid locking against ourselves when
ufs_ihashget is called during the flushing of metadata. XXX

Also, comment out a VOP_FSYNC call that I think is now unneeded, and
put a diagnostic printf there to check if this still happens.
 1.1.2.5 26-Oct-1999  fvdl Merge changes in the trickle-sync and softdep code as done by Kirk McKusick
in FreeBSD since the version that we based the branch on. Merging mostly
done by Ethan Solomita <ethan@geocast.com>.

Also, make sure the syncer thread/process isn't active when we're
unmounting a filesystem. This could wreak havoc. XXX should be done
on a per-mountpoint basis, but especially the softdep code would
end up to be a big pile of vfs_busy() calls.
 1.1.2.4 21-Oct-1999  fvdl Add workaround hacks to enable the softdep code to call getnewvnode()
when a filesystem is being unmounted. The problem is that the softdep
code stored inode numbers in the worklist structures, and does not
use vnodes. So VFS_VGET must be used to get a vnode during the final
flush stages, and this can call getnewvnode(), resulting in
a vfs_busy() + MNT_UNMOUNT hang.

I've tried to make the softdep code use vnodes, but that's a pain,
since it gets called at points where vnode ops are dangerous (i.e.
interrupt context, and uncertainty whether a vnode is locked, etc).

This is all icky stuff, but it does get things much closer to a
working state..
 1.1.2.3 19-Oct-1999  soren Tell us which fs is being bold.
 1.1.2.2 19-Oct-1999  soren Fix compile with FFS_EI.
 1.1.2.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.2.2.6 21-Apr-2001  he Pull up revision 1.12 (via patch, requested by chs):
Fix an ordering problem in softdep_setup_freeblks() that could
cause premature reuse of an inode, possibly causing the file to
contain garbage.
 1.2.2.5 14-Dec-2000  he Pull up revision 1.6 (requested by fvdl):
Improve NFS performance, possibly with as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.2.2.4 17-Aug-2000  fvdl pull up version 1.5:
Do not call MALLOC with M_WAITOK while holding the "lock". Thanks to
Ethan Solomita for the reminder.

Mark the parent vnode lock as recursive while flushing pagedeps. XXX.
Should fix kern/10564.
 1.2.2.3 29-Jun-2000  thorpej Pull up rev. 1.3:
We shouldn't be defining DEBUG and DIAGNOSTIC on our own; these may have
unwanted side-effects in the header files. For now, do the internal
#defines after including the headers.
 1.2.2.2 23-Jun-2000  fvdl Pull up moved version (from gnu/sys/ufs/ffs) as on the trunk.
 1.2.2.1 22-Jun-2000  fvdl file ffs_softdep.c was added on branch netbsd-1-5 on 2000-06-23 14:32:22 +0000
 1.7.2.7 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.7.2.6 05-Jan-2001  bouyer Sync with HEAD
 1.7.2.5 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.7.2.4 08-Dec-2000  bouyer Sync with HEAD.
 1.7.2.3 22-Nov-2000  bouyer Sync with HEAD.
 1.7.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.2.1 08-Nov-2000  bouyer file ffs_softdep.c was added on branch thorpej_scsipi on 2000-11-20 18:11:45 +0000
 1.13.6.7 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.13.6.6 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.13.6.5 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.13.6.4 16-Mar-2002  jdolecek Catch up with -current.
 1.13.6.3 11-Feb-2002  jdolecek Sync w/ -current.
 1.13.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.13.6.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.13.2.14 03-Jan-2003  thorpej Sync with HEAD.
 1.13.2.13 11-Dec-2002  thorpej Sync with HEAD.
 1.13.2.12 18-Oct-2002  nathanw Catch up to -current.
 1.13.2.11 27-Aug-2002  nathanw Catch up to -current.
 1.13.2.10 01-Aug-2002  nathanw Catch up to -current.
 1.13.2.9 15-Jul-2002  nathanw Whitespace.
 1.13.2.8 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.13.2.7 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.13.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.13.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.13.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.13.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.13.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.13.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.14.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.17.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.31.6.1 05-Jan-2003  tron Pull up revision 1.38 (via patch, requested by chs in ticket #1053):
several bugs:
- move calls to softdep_setup_pagecache() (which can sleep to allocate
memory) outside the softdep lock.
- replace the softdep_flush_indir() hack (which tries to find another
vnode to fsync when we are holding lots of buffer-cache buffers locked
for long periods of time) with softdep_trackbufs() (which just kicks
the syncer and sleeps under the same circumstances). the former method
had a lock-ordering problem which would occasionally deadlock.
- relax the assertion in softdep_sync_metadata() which says that we should
never see D_ALLOCDIRECT deps for VREG vnodes. it's ok to see those
attached to indirect blocks.
also, there's no need to splbio() while allocating the buffer headers
to which pagecache dependencies are attached, so remove that.
fixes all the problems in PR 19288.
 1.31.4.2 29-Aug-2002  gehenna catch up with -current.
 1.31.4.1 15-Jul-2002  gehenna catch up with -current.
 1.50.2.12 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.50.2.11 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.50.2.10 04-Feb-2005  skrll Sync with HEAD.
 1.50.2.9 18-Dec-2004  skrll Sync with HEAD.
 1.50.2.8 30-Oct-2004  skrll Reduce diff to HEAD
 1.50.2.7 27-Oct-2004  skrll Fix various comments that describe the argument structures
 1.50.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.50.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.50.2.4 03-Sep-2004  skrll Sync with HEAD
 1.50.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.50.2.2 03-Aug-2004  skrll Sync with HEAD
 1.50.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.61.4.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.61.4.1 12-Feb-2005  yamt sync with head.
 1.61.2.1 29-Apr-2005  kent sync with -current
 1.63.2.2 21-Oct-2005  tron Pull up following revision(s) (requested by yamt in ticket #845):
sys/ufs/ffs/ffs_softdep.c: revision 1.70 via patch
- for pagecache dependency, track which page in the block
has been written or not individually by (ab)using b_resid
in pcbp as a bitmap.
- add a comment to explain why it's needed.
PR/15364. reviewed by Chuck Silvers.
 1.63.2.1 07-May-2005  tron Pull up revision 1.64 (requested by hannken in ticket #259):
flush_inodedep_deps(): If softdep_lookupvp() returns NULL it means the
inode has been reclaimed. Skip the VOP_PUTPAGES() in this case.
Reviewed by: Chuck Silvers <chs@netbsd.org>
 1.66.2.9 27-Feb-2008  yamt sync with head.
 1.66.2.8 21-Jan-2008  yamt sync with head
 1.66.2.7 07-Dec-2007  yamt sync with head
 1.66.2.6 15-Nov-2007  yamt sync with head.
 1.66.2.5 27-Oct-2007  yamt sync with head.
 1.66.2.4 03-Sep-2007  yamt sync with head.
 1.66.2.3 26-Feb-2007  yamt sync with head.
 1.66.2.2 30-Dec-2006  yamt sync with head.
 1.66.2.1 21-Jun-2006  yamt sync with head.
 1.70.2.2 29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.70.2.1 20-Oct-2005  yamt adapt ufs.
 1.73.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.73.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.73.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.73.8.3 11-Aug-2006  yamt sync with head
 1.73.8.2 26-Jun-2006  yamt sync with head.
 1.73.8.1 24-May-2006  yamt sync with head.
 1.73.6.1 01-Jun-2006  kardel Sync with head.
 1.73.4.1 09-Sep-2006  rpaulo sync with head
 1.74.2.1 19-Jun-2006  chap Sync with head.
 1.76.6.2 10-Dec-2006  yamt sync with head.
 1.76.6.1 22-Oct-2006  yamt sync with head
 1.76.4.2 29-Dec-2006  ad Checkpoint work in progress.
 1.76.4.1 18-Nov-2006  ad Sync with head.
 1.81.4.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.81.2.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by yamt in ticket #706):
sys/ufs/ffs/ffs_softdep.c: revision 1.90
flush_inodedep_deps: fix access after free. PR/29724.
 1.82.2.5 17-May-2007  yamt sync with head.
 1.82.2.4 15-Apr-2007  yamt sync with head.
 1.82.2.3 24-Mar-2007  yamt sync with head.
 1.82.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.82.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.86.2.29 24-Oct-2007  ad softdep_disk_write_complete: fix the test to return early if !softdep.
 1.86.2.28 19-Oct-2007  ad softdep_freefile: mark the inode modified so that it gets flushed in
ufs_reclaim, resolving any dependencies. Fixes "%s: unmount pending
error: blocks %d files %d".
 1.86.2.27 09-Oct-2007  ad Sync with head.
 1.86.2.26 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress test.
 1.86.2.25 10-Sep-2007  ad softdep_disk_write_complete: return early if the buffer describes a read,
meaning we don't have to grab bufcache_lock.
 1.86.2.24 30-Aug-2007  ad What a pain in the neck.. Just set bioops in softdep_initialize() and be
done with it.
 1.86.2.23 30-Aug-2007  ad Make softdep work.
 1.86.2.22 30-Aug-2007  ad Fix a NULL pointer deref and lock leak.
 1.86.2.21 30-Aug-2007  ad bufcache_lock is sufficient to inspect v_dirtyblkhd, vp->v_interlock is only
needed to modify.
 1.86.2.20 28-Aug-2007  yamt make this compile with DEBUG.
 1.86.2.19 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.86.2.18 21-Aug-2007  ad Remove dup call to callout_init().
 1.86.2.17 20-Aug-2007  ad softdep locking improvements. It hangs looping in flush_inodedep_deps(),
more work required.
 1.86.2.16 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.86.2.15 15-Jul-2007  ad Sync with head.
 1.86.2.14 15-Jul-2007  ad Sync with head.
 1.86.2.13 07-Jul-2007  ad Fix some locking issues.
 1.86.2.12 01-Jul-2007  ad - Adapt to callout API change.
- Acquire softdep_lock before calling wakeup().
 1.86.2.11 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.86.2.10 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.86.2.9 08-Jun-2007  ad Sync with head.
 1.86.2.8 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.86.2.7 12-Apr-2007  ad Make it build with DEBUG.
 1.86.2.6 10-Apr-2007  ad Update to handle LWPs.
 1.86.2.5 10-Apr-2007  ad Sync with head.
 1.86.2.4 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.86.2.3 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.86.2.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.86.2.1 13-Mar-2007  ad Sync with head.
 1.87.2.1 11-Jul-2007  mjf Sync with head.
 1.95.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.95.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.96.8.2 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.96.8.1 29-Jul-2007  ad file ffs_softdep.c was added on branch matt-mips64 on 2007-07-29 13:31:14 +0000
 1.96.6.4 23-Mar-2008  matt sync with HEAD
 1.96.6.3 09-Jan-2008  matt sync with HEAD
 1.96.6.2 08-Nov-2007  matt sync with -HEAD
 1.96.6.1 06-Nov-2007  matt sync with HEAD
 1.96.4.5 09-Dec-2007  jmcneill Sync with HEAD.
 1.96.4.4 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.96.4.3 11-Nov-2007  joerg Sync with HEAD.
 1.96.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.96.4.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.97.2.1 14-Oct-2007  yamt sync with head.
 1.99.4.4 18-Feb-2008  mjf Sync with HEAD.
 1.99.4.3 27-Dec-2007  mjf Sync with HEAD.
 1.99.4.2 08-Dec-2007  mjf Sync with HEAD.
 1.99.4.1 19-Nov-2007  mjf Sync with HEAD.
 1.99.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.101.2.3 26-Dec-2007  ad Sync with head.
 1.101.2.2 19-Dec-2007  ad Get lfs mostly working.
 1.101.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.102.4.3 19-Jan-2008  bouyer Sync with HEAD
 1.102.4.2 08-Jan-2008  bouyer Sync with HEAD
 1.102.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.108.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.108.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.108.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.109.4.2 04-May-2009  yamt sync with head.
 1.109.4.1 16-May-2008  yamt sync with head.
 1.109.2.2 04-Jun-2008  yamt sync with head
 1.109.2.1 18-May-2008  yamt sync with head.
 1.111.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.115.6.2 03-Mar-2009  skrll Sync with HEAD.
 1.115.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.115.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.116.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.24 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.23 31-May-2008  ad branches: 1.23.6; 1.23.12;
XXX softdep:

If the number of deletes in progress is getting too high, newdirrem()
requests the syncer to flush faster, and in some cases will block to
prevent deletes accumulating faster than the disk can service them.

The syncer will try to lock vnodes that the remover holds locked, leading
to the syncer and remover proceeding in lockstep and making very little
overall forward progress.

Put a hook into ufs_rmdir() and ufs_remove() so that the softdep code
can pace itself without holding vnode locks if the number of deletes is
running out of control.
 1.22 02-Jan-2008  ad branches: 1.22.6; 1.22.8; 1.22.10; 1.22.12;
Merge vmlocking2 to head.
 1.21 04-Mar-2007  christos branches: 1.21.2; 1.21.16; 1.21.22; 1.21.24; 1.21.28;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.20 16-Nov-2006  christos branches: 1.20.4;
__unused removal on arguments; approved by core.
 1.19 13-Oct-2006  hannken Add __unused to unused function arguments.
 1.18 14-May-2006  elad branches: 1.18.8; 1.18.10;
integrate kauth.
 1.17 11-Dec-2005  christos branches: 1.17.4; 1.17.6; 1.17.8; 1.17.10; 1.17.12;
merge ktrace-lwp.
 1.16 02-Nov-2005  gdt Adjust signature of softdep_freefile (dummy stub which always panics
if called) to match ffs_extern.h so that kernels w/o softdep can compile.
 1.15 15-Jul-2005  thorpej Use ANSI function decls.
 1.14 26-Feb-2005  perry branches: 1.14.4;
nuke trailing whitespace
 1.13 10-Jan-2004  hannken branches: 1.13.8; 1.13.10;
Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue.

Cleanup ffs_sync() which did not synchronously wait when MNT_WAIT
was specified. Clear the work queue when MNT_WAIT is specified.

Result is a clean on-disk file system after ffs_sync(.., MNT_WAIT, ..)

From FreeBSD.
 1.12 29-Jun-2003  fvdl branches: 1.12.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.11 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.10 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.9 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.8 18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.7 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.6 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.5 16-Sep-2001  jdolecek branches: 1.5.2;
add softdep_reinitialize() stub
 1.4 10-Jan-2001  ad branches: 1.4.2; 1.4.6; 1.4.8;
RCS ID
 1.3 14-Feb-2000  fvdl branches: 1.3.6;
Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
 1.2 15-Nov-1999  fvdl branches: 1.2.2;
Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.1 19-Oct-1999  fvdl branches: 1.1.2;
file ffs_softdep.stub.c was initially added on branch fvdl-softdep.
 1.1.2.2 26-Oct-1999  fvdl Merge changes in the trickle-sync and softdep code as done by Kirk McKusick
in FreeBSD since the version that we based the branch on. Merging mostly
done by Ethan Solomita <ethan@geocast.com>.

Also, make sure the syncer thread/process isn't active when we're
unmounting a filesystem. This could wreak havoc. XXX should be done
on a per-mountpoint basis, but especially the softdep code would
end up to be a big pile of vfs_busy() calls.
 1.1.2.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.2.2.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.3.6.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.3.6.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.3.6.1 14-Feb-2000  bouyer file ffs_softdep.stub.c was added on branch thorpej_scsipi on 2000-11-20 18:11:46 +0000
 1.4.8.1 01-Oct-2001  fvdl Catch up with -current.
 1.4.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.4.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.4.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.4.2.1 21-Sep-2001  nathanw Catch up to -current.
 1.5.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.12.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.12.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.12.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.12.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.12.2.2 03-Aug-2004  skrll Sync with HEAD
 1.12.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.13.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.13.8.1 29-Apr-2005  kent sync with -current
 1.14.4.4 21-Jan-2008  yamt sync with head
 1.14.4.3 03-Sep-2007  yamt sync with head.
 1.14.4.2 30-Dec-2006  yamt sync with head.
 1.14.4.1 21-Jun-2006  yamt sync with head.
 1.17.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.17.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.17.8.1 24-May-2006  yamt sync with head.
 1.17.6.1 01-Jun-2006  kardel Sync with head.
 1.17.4.1 09-Sep-2006  rpaulo sync with head
 1.18.10.2 10-Dec-2006  yamt sync with head.
 1.18.10.1 22-Oct-2006  yamt sync with head
 1.18.8.1 18-Nov-2006  ad Sync with head.
 1.20.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.21.28.1 02-Jan-2008  bouyer Sync with HEAD
 1.21.24.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.21.22.1 18-Feb-2008  mjf Sync with HEAD.
 1.21.16.1 09-Jan-2008  matt sync with HEAD
 1.21.2.2 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress test.
 1.21.2.1 20-Aug-2007  yamt fix builds without softdep.
 1.22.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.22.10.1 04-May-2009  yamt sync with head.
 1.22.8.1 04-Jun-2008  yamt sync with head
 1.22.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.23.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.23.6.1 03-Mar-2009  skrll Sync with HEAD.
 1.54 07-Jan-2023  chs ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:

commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000

This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.

To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.

Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000

One last pass to get all the unsigned comparisons correct.


In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
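
The class of bug described above can be illustrated with a minimal sketch. The helper names below are hypothetical and are not taken from the ffs sources; the sketch only assumes a 32-bit inode number that is range-checked against an inode count.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical range check that keeps the inode number in a signed type.
 * An inode number above INT32_MAX has already wrapped to a negative value
 * (implementation-defined conversion, negative on common platforms), so
 * the check wrongly rejects it.
 */
static bool
ino_in_range_signed(int32_t ino, uint32_t ninodes)
{
	return ino >= 0 && (uint32_t)ino < ninodes;
}

/* The same check with the inode number treated as unsigned throughout. */
static bool
ino_in_range_unsigned(uint32_t ino, uint32_t ninodes)
{
	return ino < ninodes;
}
```

With an inode number of 0x80000001 and 0x90000000 inodes, the signed variant rejects the inode while the unsigned variant accepts it.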
 1.53 24-May-2022  andvar branches: 1.53.4;
fix various typos in comments, docs and log messages.
 1.52 21-Apr-2020  christos use %s/__func__ so that the strings can be shared.
 1.51 28-May-2019  kamil branches: 1.51.8;
Avoid unportable shift base -1 in ffs_subr.c

Cast the start variable before the modulo operation to unsigned int.

Detected with kUBSan.
 1.50 04-Jul-2018  kamil Avoid Undefined Behavior in ffs_clusteracct()

Change the type of 'bit' variable from int to unsigned int and use unsigned
values consistently.

sys/ufs/ffs/ffs_subr.c:336:10, shift exponent -1 is negative

Detected with Kernel Undefined Behavior Sanitizer.

Reported by <Harry Pantazis>
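
A minimal sketch of why the unsigned type matters, assuming 8 bits per byte and a 32-bit unsigned int. The function name is hypothetical and this is not the code from ffs_clusteracct(); it only shows the pattern of converting to unsigned before taking the remainder that feeds a shift.

```c
#ifndef NBBY
#define NBBY 8	/* bits per byte; assumed value for this sketch */
#endif

/*
 * Return the mask for bit position `pos` within its byte.
 * With a signed operand, -1 % NBBY evaluates to -1 in C, and shifting by
 * a negative count is undefined behavior. Converting to unsigned int
 * first yields a shift count in the range 0..NBBY-1.
 */
static unsigned int
bit_mask(int pos)
{
	return 1u << ((unsigned int)pos % NBBY);
}
```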
 1.49 07-May-2016  maxv branches: 1.49.16; 1.49.18;
uaf
 1.48 20-Oct-2013  htodd branches: 1.48.6;
Defining needswap where needed.
 1.47 14-Aug-2011  christos branches: 1.47.2; 1.47.12; 1.47.16;
fix sign-compare warnings
 1.46 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.45 03-Jun-2008  hannken branches: 1.45.20; 1.45.26; 1.45.28;
ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.44 29-Jan-2007  hubertf branches: 1.44.40; 1.44.42; 1.44.44; 1.44.46;
Remove more duplicate headers.
Patch by Slava Semushin <slava.semushin@gmail.com>

Again, this was tested by comparing obj files from a pristine and a patched
source tree against an i386/ALL kernel, and also for src/sbin/fsck_ffs,
src/sbin/fsdb and src/usr.sbin/makefs. Only changes in assert() line numbers
were detected in 'objdump -d' output.
 1.43 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.42 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.41 14-Jan-2006  yamt branches: 1.41.18; 1.41.20;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.40 27-Dec-2005  chs branches: 1.40.2;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.
 1.39 11-Dec-2005  christos merge ktrace-lwp.
 1.38 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.37 12-Sep-2005  drochner branches: 1.37.2;
move the new ffs_itimes() to a better place -- ffs_subr.c is shared with
userland
 1.36 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.35 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.34 15-Jul-2005  thorpej Use ANSI function decls.
 1.33 26-Feb-2005  perry branches: 1.33.4;
nuke trailing whitespace
 1.32 30-Dec-2003  pk branches: 1.32.8; 1.32.10;
Replace the traditional buffer memory management -- based on fixed per buffer
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.

This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.

On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
 1.31 02-Dec-2003  dbj clarify comments, especially since ffs_isfreeblock is non-intuitive:
ffs_isblock:
    check if a block is available
    returns true if all the corresponding bits in the free map are 1
    returns false if any corresponding bit in the free map is 0
ffs_isfreeblock:
    check if a block is completely allocated
    returns true if all the corresponding bits in the free map are 0
    returns false if any corresponding bit in the free map is 1
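
The two predicates described above can be sketched with a generic loop over the fragment bits of one block. This is only an illustration of the stated semantics under the assumption that a 1 bit means a free fragment; the function names and the loop form are hypothetical and do not reproduce the ffs_subr.c implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Test bit `i` of the free map `cp` (1 = fragment is free). */
static bool
map_bit(const unsigned char *cp, size_t i)
{
	return (cp[i / 8] >> (i % 8)) & 1u;
}

/* True only if every fragment bit of block `h` is 1 (block available). */
static bool
sketch_isblock(const unsigned char *cp, size_t frag, size_t h)
{
	for (size_t i = 0; i < frag; i++)
		if (!map_bit(cp, h * frag + i))
			return false;
	return true;
}

/* True only if every fragment bit of block `h` is 0 (fully allocated). */
static bool
sketch_isfreeblock(const unsigned char *cp, size_t frag, size_t h)
{
	for (size_t i = 0; i < frag; i++)
		if (map_bit(cp, h * frag + i))
			return false;
	return true;
}
```

A partially allocated block makes both predicates return false, which is why the two are not simple negations of each other.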
 1.30 27-Oct-2003  lukem Overhaul how `build.sh tools' are used:

* Rename "config.h" to "nbtool_config.h" and
HAVE_CONFIG_H to HAVE_NBTOOL_CONFIG_H.
This makes it more obvious in the source when we're using
tools/compat/config.h versus "standard autoconf" config.h

* Consistently move the inclusion of nbtool_config.h to before
<sys/cdefs.h> so that the former can provide __RCSID() (et al),
and there's no need to protect those macros any more.

These changes should make it easier to "tool-ify" a program by adding:
#if HAVE_NBTOOL_CONFIG_H
#include "nbtool_config.h"
#endif
to the top of the source files (for the general case).
 1.29 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.28 02-Apr-2003  fvdl branches: 1.28.2;
Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.27 25-Jan-2003  tron Use PRId64 instead of hard coding "%lld" to fix build problems under
LP64 ports.
 1.26 25-Jan-2003  tron Fix printf() format strings problems caused by "daddr_t" change.
 1.25 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.24 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.23 06-Jul-2002  fredette Fixed a printf argument type.
 1.22 10-Apr-2002  mycroft branches: 1.22.2;
Use blkstofrags() and fragstoblks(). Use &(NBBY-1) rather than %NBBY.
Switch off of fs_fragshift rather than fs_frag (generates better jump tables).
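
As a rough sketch of the arithmetic involved: for an unsigned value and a power-of-two NBBY, masking with NBBY-1 gives the same result as the remainder operator, and a block-to-fragment conversion can be written as a shift by the fragment shift. The helper names below are hypothetical stand-ins, not the macros from the ffs headers.

```c
#ifndef NBBY
#define NBBY 8	/* bits per byte; must be a power of two for the mask form */
#endif

/* Bit index within a byte, using the remainder operator. */
static unsigned int
bit_index_mod(unsigned int h)
{
	return h % NBBY;
}

/* Same result using a mask; valid because NBBY is a power of two. */
static unsigned int
bit_index_mask(unsigned int h)
{
	return h & (NBBY - 1);
}

/* Hypothetical stand-in for a blocks-to-fragments conversion by shift. */
static unsigned int
sketch_blkstofrags(unsigned int blks, unsigned int fragshift)
{
	return blks << fragshift;
}
```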
 1.21 31-Jan-2002  tv These sources are pulled into makefs(8), so we need config.h and protection
for __KERNEL_RCSID().
 1.20 09-Jan-2002  lukem Only pull in <sys/systm.h> #ifdef _KERNEL, since it's a kernel only header.
In the ! _KERNEL case, provide own prototype for panic() instead.
 1.19 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.18 26-Oct-2001  lukem - pull in ufsmount.h after inode.h, because the latter pulls in
quota.h which the former needs, and this makes the usage consistent
with other files anyway
- expand the details in a few panic strings
 1.17 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.16 09-Aug-2001  lukem branches: 1.16.4;
be consistent and use "u_char" instead of "unsigned char"
 1.15 30-Mar-2000  augustss branches: 1.15.6; 1.15.10;
Remove register declarations.
 1.14 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.13 28-Jul-1998  drochner branches: 1.13.14; 1.13.16; 1.13.20;
The fragtbl[], inside[] and around[] variables are needed by "fsck",
so we can't put them inside "#ifdef _KERNEL".
Put declarations inside .c files where needed to preserve namespace.
 1.12 13-Jun-1998  kleink KNF, mostly of FFS_EI changes.
 1.11 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time, other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.10 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.9 12-Oct-1996  christos revert previous kprintf changes
 1.8 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.7 20-Sep-1996  christos Make this compile cleanly from userland (fsck_ffs).
 1.6 17-Mar-1996  christos Fix printf format strings
 1.5 09-Feb-1996  christos ffs prototypes
 1.4 28-Mar-1995  jtc KERNEL -> _KERNEL
 1.3 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.13.20.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.13.20.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.13.16.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.13.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.15.10.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.15.10.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.15.10.3 11-Feb-2002  jdolecek Sync w/ -current.
 1.15.10.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.15.10.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.15.6.7 11-Dec-2002  thorpej Sync with HEAD.
 1.15.6.6 01-Aug-2002  nathanw Catch up to -current.
 1.15.6.5 17-Apr-2002  nathanw Catch up to -current.
 1.15.6.4 28-Feb-2002  nathanw Catch up to -current.
 1.15.6.3 11-Jan-2002  nathanw More catchup.
 1.15.6.2 14-Nov-2001  nathanw Catch up to -current.
 1.15.6.1 24-Aug-2001  nathanw Catch up with -current.
 1.16.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.22.2.1 15-Jul-2002  gehenna catch up with -current.
 1.28.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.28.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.28.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.28.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.28.2.1 03-Aug-2004  skrll Sync with HEAD
 1.32.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.32.8.1 29-Apr-2005  kent sync with -current
 1.33.4.2 26-Feb-2007  yamt sync with head.
 1.33.4.1 21-Jun-2006  yamt sync with head.
 1.37.2.1 20-Oct-2005  yamt adapt ufs.
 1.40.2.1 15-Jan-2006  yamt sync with head.
 1.41.20.2 10-Dec-2006  yamt sync with head.
 1.41.20.1 22-Oct-2006  yamt sync with head
 1.41.18.2 01-Feb-2007  ad Sync with head.
 1.41.18.1 18-Nov-2006  ad Sync with head.
 1.44.46.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.44.44.1 04-May-2009  yamt sync with head.
 1.44.42.1 04-Jun-2008  yamt sync with head
 1.44.40.1 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.45.28.4 09-Feb-2011  bouyer Make it build without FFS_EI
 1.45.28.3 08-Feb-2011  bouyer for !_KERNEL case, always define FFS_EI.
Required for makefs, and maybe resize_ffs (it's not clear if
resize_ffs supports swapped byte order or not - swapped endian tests
are expected to fail but actually succeed :)
 1.45.28.2 08-Feb-2011  bouyer Sync with HEAD
 1.45.28.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more, on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) supports are completed (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.45.26.1 06-Jun-2011  jruoho Sync with HEAD.
 1.45.20.1 21-Apr-2011  rmind sync with head
 1.47.16.1 18-May-2014  rmind sync with head
 1.47.12.2 03-Dec-2017  jdolecek update from HEAD
 1.47.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.47.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.48.6.1 29-May-2016  skrll Sync with HEAD
 1.49.18.2 21-Apr-2020  martin Sync with HEAD
 1.49.18.1 10-Jun-2019  christos Sync with HEAD
 1.49.16.1 28-Jul-2018  pgoyette Sync with HEAD
 1.51.8.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.53.4.1 13-May-2023  martin Pull up following revision(s) (requested by chs in ticket #160):

usr.sbin/makefs/ffs/ffs_alloc.c: revision 1.31
sbin/tunefs/tunefs.c: revision 1.58
sbin/fsck_ffs/setup.c: revision 1.105
sbin/fsck_ffs/pass5.c: revision 1.56
usr.sbin/makefs/ffs.c: revision 1.74
usr.sbin/makefs/ffs/mkfs.c: revision 1.42
usr.sbin/makefs/Makefile: revision 1.40
sys/ufs/ffs/fs.h: revision 1.71
sbin/fsdb/fsdb.c: revision 1.54
sbin/resize_ffs/resize_ffs.c: revision 1.58
sbin/fsck_ffs/pass4.c: revision 1.29
usr.sbin/makefs/ffs/ffs_extern.h: revision 1.9
sbin/newfs/mkfs.c: revision 1.133
sys/ufs/ffs/ffs_alloc.c: revision 1.172
sbin/fsck_ffs/pass1b.c: revision 1.24
usr.sbin/dumpfs/dumpfs.c: revision 1.68
sys/ufs/ffs/ffs_extern.h: revision 1.88
usr.sbin/quotacheck/quotacheck.c: revision 1.51
sys/ufs/ffs/ffs_subr.c: revision 1.54
sbin/fsck_ffs/main.c: revision 1.91
sbin/fsck_ffs/pass1.c: revision 1.63

ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:
commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000
This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.
To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.
Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000
One last pass to get all the unsigned comparisons correct.

In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
 1.9 11-Dec-2005  christos merge ktrace-lwp.
 1.8 26-Feb-2005  perry nuke trailing whitespace
 1.7 27-Oct-2003  lukem branches: 1.7.8; 1.7.10;
Overhaul how `build.sh tools' are used:

* Rename "config.h" to "nbtool_config.h" and
HAVE_CONFIG_H to HAVE_NBTOOL_CONFIG_H.
This makes it more obvious in the source when we're using
tools/compat/config.h versus "standard autoconf" config.h

* Consistently move the inclusion of nbtool_config.h to before
<sys/cdefs.h> so that the former can provide __RCSID() (et al),
and there's no need to protect those macros any more.

These changes should make it easier to "tool-ify" a program by adding:
#if HAVE_NBTOOL_CONFIG_H
#include "nbtool_config.h"
#endif
to the top of the source files (for the general case).
 1.6 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.5 31-Jan-2002  tv branches: 1.5.16;
These sources are pulled into makefs(8), so we need config.h and protection
for __KERNEL_RCSID().
 1.4 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.3 18-Jan-2001  jdolecek branches: 1.3.2; 1.3.6; 1.3.10;
constify
 1.2 29-Jun-1994  cgd branches: 1.2.34;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.2.34.1 11-Feb-2001  bouyer Sync with HEAD.
 1.3.10.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.3.6.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.3.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.3.2.2 28-Feb-2002  nathanw Catch up to -current.
 1.3.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.5.16.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.5.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.5.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.5.16.1 03-Aug-2004  skrll Sync with HEAD
 1.7.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.7.8.1 29-Apr-2005  kent sync with -current
 1.384 30-Dec-2024  hannken Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusive, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.383 30-Dec-2024  hannken Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.382 08-Sep-2023  riastradh branches: 1.382.6;
ffs_sync: Avoid unlocked access to v_numoutput/v_dirtyblkhd.

Found by lockdoc.

PR kern/57606
 1.381 15-Jun-2023  hannken Undo unlock/relock for VOP_IOCTL().

PR kern/57450 (unplugging hung USB disk triggers panic via _vstate_assert)
 1.380 05-Jun-2023  rin Make DEBUG_FFS_MOUNT compile again (with 64-bit ino_t).
 1.379 21-Dec-2022  chs ffs: fail mounts requesting ACLs for non-ea UFS2 file systems

For non-ea UFS2 file systems, fail mounts that request ACLs rather than
letting the mount succeed only to reject all ACL operations later.

Also fix the messages about the on-disk fs flags conflicting with
the mount options for which type of ACLs to use, and about requesting
both types of ACLs.
 1.378 17-Nov-2022  chs branches: 1.378.2;
Restore backward compatibility of UFS2 with previous NetBSD releases by
disabling support in UFS2 for extended attributes (including ACLs).
Add a new variant of UFS2 called "UFS2ea" that does support extended attributes.
Add new fsck_ffs operations "-c ea" and "-c no-ea" to convert file systems
from UFS2 to UFS2ea and vice-versa (both of which delete all existing extended
attributes in the process).
 1.377 10-Nov-2022  hannken Some changes to "fs->fs_fmod" and "fs->fs_clean":
- clear "fs->fs_fmod" after reading the super block.
- assert we don't write a super block when mounted read-only.
- make sure "fs->fs_clean" is one of FS_ISCLEAN or FS_WASCLEAN.
- print "file system not clean" on every mount.

Should fix PR kern/57010: ffs: mounting unclean non-root fs read-only
causes spurious write to superblock
 1.376 16-Apr-2022  hannken Unlock vnode for VOP_IOCTL() and wapbl_flush().
 1.375 19-Mar-2022  hannken Remove now unused VV_LOCKSWORK, all file systems support locking.

Remove unused predicates vn_locked() and vn_anylocked().

Welcome to 9.99.95
 1.374 12-Mar-2022  riastradh ffs: Fix 64-bit inode integer truncation.

Reported-by: syzbot+1ae93e092d532582b809@syzkaller.appspotmail.com
 1.373 18-Sep-2021  christos Change the default for ACLs to be posix1e instead of nfsv4 to match FreeBSD.
Requested by chuq.
 1.372 20-Aug-2020  christos Don't cache id's for vnodes that have ACLs. ok chs@
 1.371 05-Jul-2020  christos simplify the acl setup, and fix reversed mask in the fs_flags code.
 1.370 18-May-2020  hannken Assert ufs_strategy() always gets used while current thread
holds a fstrans lock.
 1.369 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.368 12-May-2020  ad cache_enter_id(): give it a boolean parameter to indicate whether the cached
identity is valid.
 1.367 04-Apr-2020  ad Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.366 16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.365 27-Feb-2020  ad Tighten up the locking around vp->v_iflag a little more after the recent
split of vmobjlock & v_interlock.
 1.364 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.363 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.362 20-Jun-2019  pgoyette branches: 1.362.2; 1.362.4;
Split the ufs code out of the ffs module and into its own module.

Adapt chfs and ext2fs modules accordingly.
 1.361 01-Jan-2019  hannken Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
 1.360 10-Dec-2018  jdolecek make UFS_WAPBL_JLOCK_ASSERT() #ifdef DIAGNOSTIC, same as the underlying
function KASSERT(), so that it actually does something; fix code using
it to actually pass correct params, so that it compiles

remove UFS_WAPBL_JUNLOCK_ASSERT(), as that is inherently racy (it's
okay on those places if the rwlock is held by other lwp); depend
on the RW_ASSERT()/LOCKDEBUG inside rw_enter() to catch the case
with wapbl rwlock held by current lwp
 1.359 10-Dec-2018  maxv Remove unused mbuf.h includes.
 1.358 18-Jul-2018  uwe ffs_superblock_validate - check fs_old_size too.

Now I can mount OpenWindows Version 3 CD from 1991.
 1.357 28-May-2018  chs branches: 1.357.2;
add a genfs method to allow a file system to limit the range of pages
that are given to a single GOP_WRITE() call. needed by ZFS.
 1.356 28-Jan-2018  hannken branches: 1.356.2;
Prevent use-after-free where genfs_node_destroy() would destroy
a lock residing in the just freed inode data.
 1.355 15-Nov-2017  christos PR/52728: Izumi Tsutsui: "mount -u /dev/ /" triggers kernel panic
Simplify the control flow of the mount code and make sure that the
mountfrom argument can be converted to a block device in the update
case.
XXX: pullup-8
 1.354 20-Aug-2017  maya print mode as octal for readability
 1.353 17-Apr-2017  hannken branches: 1.353.2; 1.353.4;
Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.352 17-Apr-2017  hannken Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
 1.351 01-Apr-2017  riastradh KASSERT(mutex_owned(vp->v_interlock)) in vnode iterator selector.
 1.350 10-Mar-2017  jdolecek slightly rearrange the code for IMNT_WANTRDONLY + MNT_UPDATE case for
better readability, no functional change
 1.349 06-Mar-2017  hannken Adapt the test "enable WAPBL on rw mounts only" to the recent change of
the protocol to update a mounted file.

Should fix PR kern/52031 (FFS mount update doesn't play nice with WAPBL)
 1.348 01-Mar-2017  hannken Bring back read-write to read-only mount update for ffs.
 1.347 01-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().
 1.346 22-Feb-2017  hannken Enable fstrans on all file systems.

Welcome to 7.99.61
 1.345 17-Feb-2017  hannken Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.344 17-Feb-2017  hannken Untangle VFS_SYNC() from VFS_SUSPENDCTL().
 1.343 17-Feb-2017  hannken Flush the log to disk when ffs_sync() gets called with MNT_WAIT.
 1.342 27-Dec-2016  hannken branches: 1.342.2;
Fix a bug introduced with Rev. 1.294: use LK_NOWAIT when called with MNT_LAZY.
 1.341 20-Oct-2016  jdolecek add assertion to ensure ffs_cgupdate() is always called from
within a WAPBL transaction (if logging is on)
 1.340 28-Jul-2016  martin From Michael Plass:

The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.

Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.339 19-Jun-2016  christos branches: 1.339.2;
Relax the dup alloc tests to not include the on-disk data for ffsv2, since
nothing checks that the lazy-initialized inodes are correct and if they happen
to get corrupted, there is no way to fix them.
 1.338 23-Dec-2015  christos We need to check if the inode is initialized for ffsv2 when we translate a
filehandle to a vnode. This can come from nfs and it could be out of range.
In that case we read garbage from the disk, end up trying to free bogus data
when we put the vnode back and we crash.
XXX: pullup-7
 1.337 15-Nov-2015  pgoyette If file system ffs is built with WAPBL defined, make sure that the
module depends on the wapbl module.

No impact to users of built-in ffs file system code, as the WAPBL
#define will cause inclusion of the code in the kernel.

A standard build of the modular ffs file system code will #define
WAPBL, so the module will only work on a kernel which was also
built with WAPBL defined (or, once I commit it, with a dynamically-
loaded wapbl module).
 1.336 22-Oct-2015  maxv Fix PR 50070. From hannken@.
 1.335 24-Jul-2015  maxv Unused inits (harmless).

Found by Brainy.
 1.334 23-May-2015  maxv Add a missing goto.

(was here before my changes)

ok christos@
 1.333 19-May-2015  martin Cosmetics: fix netbsd.org spelling
 1.332 18-May-2015  martin Print all sizes as size_t
 1.331 18-May-2015  martin Make the recently added fs_cgsize test less strict, as it prevents existing
installs from booting.
Catch the common case and warn about it, pointing to a web page describing
the issue - but allow mounting. In all other cases, print more details about
the inconsistency and fail the mount.
 1.330 26-Apr-2015  maxv ffs_superblock_validate(): check the size of cylinder groups.
 1.329 22-Apr-2015  maxv Instead of duplicating code, create ffs_is_appleufs(): returns 1 if the
device is an AppleUFS FS, 0 otherwise.

This changes the behavior a bit: if the kernel cannot determine whether the
disk is an AppleUFS one or not, it now considers it as a normal UFS rather
than returning an error and not mounting/reloading it.

No particular comment on tech-kern@
 1.328 04-Apr-2015  maxv ffs_superblock_validate(): ensure fs_ncg!=0 and fs_maxbpg!=0 to prevent
several divisions by zero.
 1.327 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.326 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.325 17-Mar-2015  hannken Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.324 15-Mar-2015  maxv ffs_reload(): fix a bug that prevents Big Endian FSes from being reloaded.
'newfs' should be tagged as FS_SWAPPED, not 'fs'.

Was here before my changes.

While here, also KNF a bit.
 1.323 14-Mar-2015  maxv ffs_superblock_validate(): ensure fs_ipg and fs_fpg are != 0. Otherwise
division by zero in several places.
 1.322 10-Mar-2015  maxv ffs_superblock_validate(): check the number of inodes per block. Otherwise
a malformed value could panic the system.
 1.321 03-Mar-2015  maxv ffs_reload(): release 'bp' earlier
 1.320 03-Mar-2015  maxv ffs_reload(): the current implementation blindly guesses critical fields
of the superblock didn't change. Add checks to ensure they didn't change
for real. This prevents several memory corruptions.
 1.319 23-Feb-2015  maxv Small changes:
- instead of always calling DPRINTF with __func__, put __func__ directly
in the macro
- ffs_mountfs(): rename fsblockloc -> fs_sblockloc, initialize fs_sbsize
to zero
No real functional change
 1.318 22-Feb-2015  maxv ffs_superblock_validate(): sanitize fs_fragshift, fs_bmask and fs_fmask.
 1.317 20-Feb-2015  maxv Style, and fix a DPRINTF

No functional change
 1.316 14-Feb-2015  maxv ffs_superblock_validate(): when checking the number of frag blocks, also
make sure it matches fs->fs_frag. This also prevents an infinite loop if
fs->fs_frag=0.
 1.315 14-Feb-2015  maxv ffs_superblock_validate(): compute fs_bshift and fs_fshift, and ensure
they are consistent with what is indicated in the superblock. This allows
us to safely use some ffs_ macros.
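
A minimal userland sketch of the consistency check described in rev 1.315:
derive the shifts from the block and fragment sizes and compare them with the
values stored in the superblock. The sketch_ names and the flat argument list
are assumptions for illustration only; the kernel works on struct fs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return log2(v) if v is a power of two, otherwise -1. */
int
sketch_log2(uint32_t v)
{
	int shift = 0;

	if (v == 0 || (v & (v - 1)) != 0)
		return -1;
	while ((v >>= 1) != 0)
		shift++;
	return shift;
}

/*
 * Hypothetical helper: true if the stored shifts match the shifts
 * computed from the stored sizes.
 */
bool
sketch_shifts_consistent(uint32_t bsize, uint32_t fsize,
    int32_t bshift, int32_t fshift)
{
	int b = sketch_log2(bsize);
	int f = sketch_log2(fsize);

	if (b < 0 || f < 0)
		return false;	/* sizes are not powers of two */
	return b == bshift && f == fshift;
}
```

Once the stored shifts are known to match the sizes, macros that shift by
fs_bshift or fs_fshift can no longer disagree with the sizes themselves.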
 1.314 14-Feb-2015  maxv In fact, we need to sanitize the superblock *after* swapping it. Therefore,
move the swap code inside the loop.

'fs->fs_sbsize' is swapped twice: the first time in order to get the
correct superblock size, and later when swapping the whole superblock
structure. As a result, we need to check 'fs->fs_sbsize' twice.

This:
- fixes my previous changes for swapped FSes
- allows the kernel to look for other superblock locations if the
current superblock is not validated

And now:
- ffs_superblock_validate() takes only one argument: the fs structure
- 'fs_bsize' is unused, so delete it

Add some comments to explain a bit what we are doing.
 1.313 14-Feb-2015  maxv ffs_superblock_validate(): sanitize the number of frag blocks.
 1.312 14-Feb-2015  maxv Currently, in ffs_reload(), we don't handle the possibility that the
superblock location may have changed. But that implies that we don't
handle the possibility that its size may have changed either.

Therefore: add a check to ensure the size hasn't changed. Otherwise the
mismatch leads to a memory corruption with kmem.
 1.311 14-Feb-2015  maxv Style. No functional change.
 1.310 14-Feb-2015  maxv ffs_reload(): call ffs_superblock_validate() with the new superblock.
 1.309 13-Feb-2015  maxv ffs_superblock_validate(): ensure fs->fs_cssize!=0, otherwise the kernel
panics with kmem_alloc(0).
 1.308 13-Feb-2015  maxv Add some checks in ffs_superblock_validate():
- fs_bsize < MINBSIZE
- !powerof2(fs_bsize)
- !powerof2(fs->fs_fsize)
- fs_bsize < fs->fs_fsize

Based on makefs/ffs.
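
A minimal userland sketch of the four size checks listed in rev 1.308. The
struct, the sketch_ names and the 4096-byte minimum are assumptions for
illustration; the kernel uses struct fs, MINBSIZE and powerof2() from its own
headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-ins for the kernel's MINBSIZE and powerof2(). */
#define SKETCH_MINBSIZE		4096
#define SKETCH_POWEROF2(x)	((((x) - 1) & (x)) == 0)

/* Hypothetical subset of the superblock: only the two size fields. */
struct sketch_fs {
	int32_t fs_bsize;	/* block size in bytes */
	int32_t fs_fsize;	/* fragment size in bytes */
};

/* Return true if the sizes pass the four checks from the log entry. */
bool
sketch_sizes_valid(const struct sketch_fs *fs)
{
	/*
	 * Extra guard, not in the log entry: the power-of-two test
	 * below would accept zero, and negative sizes are nonsense.
	 */
	if (fs->fs_bsize <= 0 || fs->fs_fsize <= 0)
		return false;
	if (fs->fs_bsize < SKETCH_MINBSIZE)
		return false;
	if (!SKETCH_POWEROF2(fs->fs_bsize))
		return false;
	if (!SKETCH_POWEROF2(fs->fs_fsize))
		return false;
	if (fs->fs_bsize < fs->fs_fsize)
		return false;
	return true;
}
```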
 1.307 13-Feb-2015  maxv Add a new function: ffs_superblock_validate(). And add a new check to
ensure fs_size!=0; otherwise the kernel panics with a division by zero.
 1.306 13-Feb-2015  maxv Make this a bit more readable. No functional change.
 1.305 16-Jan-2015  christos PR/39371: Tobias Nygren: Don't fail mounting root if WAPBL log is corrupt.
Patch from Sergio L. Pascual.
XXX: pullup-7
 1.304 14-Dec-2014  christos Restore apple ufs error handling.
 1.303 14-Dec-2014  christos - Add debugging for mount...
- Merge some error returns
- Check more errors
 1.302 14-Nov-2014  manu branches: 1.302.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.
 1.301 30-Oct-2014  maxv Limit the superblock size to SBLOCKSIZE, not MAXBSIZE. Otherwise memcpy
will read beyond the allocated buffer.

Discussed a bit on tech-kern@.
 1.300 24-Oct-2014  njoly One semicolon is enough.
 1.299 24-May-2014  christos branches: 1.299.2;
Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.
 1.298 08-May-2014  hannken Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41
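
A single-threaded userland sketch of the get/load/remove pattern described in
rev 1.298: a lookup that misses calls a load hook exactly once, and later
lookups return the same node. Everything here (the sketch_ names, the integer
key, the fixed-size table) is hypothetical; the real vcache_get() and
vfs_loadvnode() take different arguments, and the exclusivity guarantee
between threads is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHE_SLOTS 8

struct sketch_node {
	int key;	/* stand-in for the key identifying the fs node */
	int loaded;	/* set by the load hook */
	int in_use;	/* slot holds a cached node */
};

struct sketch_node sketch_cache[SKETCH_CACHE_SLOTS];
int sketch_load_calls;	/* how often the load hook ran */

/* Stand-in for the per-file-system load operation. */
void
sketch_loadvnode(struct sketch_node *np, int key)
{
	np->key = key;
	np->loaded = 1;
	sketch_load_calls++;
}

/* Return the cached node for key, loading it on a cache miss. */
struct sketch_node *
sketch_vcache_get(int key)
{
	struct sketch_node *free_slot = NULL;
	size_t i;

	for (i = 0; i < SKETCH_CACHE_SLOTS; i++) {
		if (sketch_cache[i].in_use && sketch_cache[i].key == key)
			return &sketch_cache[i];	/* cache hit */
		if (!sketch_cache[i].in_use && free_slot == NULL)
			free_slot = &sketch_cache[i];
	}
	if (free_slot == NULL)
		return NULL;				/* table full */
	free_slot->in_use = 1;
	sketch_loadvnode(free_slot, key);		/* cache miss */
	return free_slot;
}

/* Drop the node for key from the cache, if present. */
void
sketch_vcache_remove(int key)
{
	size_t i;

	for (i = 0; i < SKETCH_CACHE_SLOTS; i++) {
		if (sketch_cache[i].in_use && sketch_cache[i].key == key) {
			sketch_cache[i].in_use = 0;
			sketch_cache[i].loaded = 0;
		}
	}
}
```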
 1.297 16-Apr-2014  maxv An (un)privileged user can easily make the kernel dereference a NULL
pointer.

The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).

ok christos@
 1.296 01-Apr-2014  christos branches: 1.296.2;
Check for bread errors before we do the size check. Otherwise we de-reference
NULL...
 1.295 23-Mar-2014  hannken Change all vfsops to use C99 designated initializers.

No functional changes intended.
 1.294 17-Mar-2014  hannken Change ffs_sync() to use vfs_vnode_iterator.
 1.293 05-Mar-2014  hannken Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
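
A userland sketch of the init / next / destroy protocol from rev 1.293, showing
the calling pattern only: "next" returns false with *vpp == NULL when the list
is exhausted. The sketch_ types and functions are hypothetical stand-ins that
mirror the shape of the three prototypes above; they do not model vnode
references, list mutexes or intermediate vnode states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_vnode { int id; };
struct sketch_mount { struct sketch_vnode *vnodes; size_t count; };
struct sketch_iterator { struct sketch_mount *mp; size_t pos; };

/* Allocate a marker positioned at the start of the mount's list. */
void
sketch_iterator_init(struct sketch_mount *mp, struct sketch_iterator **marker)
{
	*marker = malloc(sizeof(**marker));
	if (*marker != NULL) {
		(*marker)->mp = mp;
		(*marker)->pos = 0;
	}
}

void
sketch_iterator_destroy(struct sketch_iterator *marker)
{
	free(marker);
}

/* Return true and the next vnode, or false and NULL when done. */
bool
sketch_iterator_next(struct sketch_iterator *marker, struct sketch_vnode **vpp)
{
	if (marker->pos >= marker->mp->count) {
		*vpp = NULL;
		return false;
	}
	*vpp = &marker->mp->vnodes[marker->pos++];
	return true;
}

/* Typical caller: walk every vnode on the mount once. */
size_t
sketch_count_vnodes(struct sketch_mount *mp)
{
	struct sketch_iterator *marker;
	struct sketch_vnode *vp;
	size_t n = 0;

	sketch_iterator_init(mp, &marker);
	if (marker == NULL)
		return 0;	/* allocation failed in this sketch */
	while (sketch_iterator_next(marker, &vp))
		n++;
	sketch_iterator_destroy(marker);
	return n;
}
```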
 1.292 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.291 23-Nov-2013  christos change the mountlist CIRCLEQ into a TAILQ
 1.290 29-Oct-2013  hannken Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25
 1.289 30-Sep-2013  hannken Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>
 1.288 16-Sep-2013  hannken Function ffs_reload() works on a read-only mount, so remove the call
to ffs_snapshot_mount() as it would panic later with "already on list"
when remounting read-write.

Should fix PR kern/48211 (Unclean shutdown with active snapshot causes
panic during reboot)
 1.287 11-Aug-2013  dholland Kill off uo_unmark_vnode/UFS_UNMARK_VNODE as it's now a leftover.
 1.286 23-Jun-2013  dholland branches: 1.286.2;
Stick ffs_ in front of the following macros:
fragstoblks()
blkstofrags()
fragnum()
blknum()

to finish the job of distinguishing them from the lfs versions, which
Christos renamed the other day.

I believe this is the last of the overtly ambiguous exported symbols
from ffs... or at least, the last of the ones that conflicted with lfs.
ffs still pollutes the C namespace very broadly (as does ufs) and this
needs quite a bit more cleanup.

XXX: boo on macros with lowercase names. But I'm not tackling that just yet.
 1.285 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.284 16-Jun-2013  hannken Add an UFS_SNAPGONE() ufs op replacing the calls
to ffs_snapgone() in ufs_lookup.c.

Ok: David Holland <dholland@netbsd.org>

Welcome to 6.99.22
 1.283 09-Jun-2013  dholland Stick UFS_ in front of these symbols:
DIRBLKSIZ
DIRECTSIZ
DIRSIZ
OLDDIRFMT
NEWDIRFMT

Part of PR 47909.
 1.282 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.281 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.280 26-Nov-2012  drochner allow enabling ffs "discard" via update mounts, make the flag visible
to userland
 1.279 19-Oct-2012  drochner Implement experimental support to pass notifications that a file
was deleted from the filesystem to the disk driver, commonly
known as "discard" or "trim".
fs/driver support is in ffs and ata wd for now.
This is what was posted here:
http://mail-index.netbsd.org/tech-kern/2012/02/28/msg012813.html
with minor cleanup, and the global switch replaced by a mount option.
 1.278 10-Sep-2012  manu branches: 1.278.2;
Stop extended attributes at the appropriate place so that unmount
does not fail with EBUSY on filesystem with extended attributes enabled.
 1.277 29-Apr-2012  chs change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
 1.276 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.275 29-Jan-2012  nonaka branches: 1.275.2;
use FS_UFS[12]_MAGIC_SWAPPED instead of bswap32(FS_UFS[12]_MAGIC).
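
A small sketch of why a precomputed swapped constant can replace a run-time
byte swap of the magic number. The SKETCH_ constants are assumed values for the
UFS1/UFS2 magic numbers and their byte-swapped forms and should be checked
against ufs/ffs/fs.h; sketch_bswap32() and sketch_magic_byte_order() are
hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; verify against ufs/ffs/fs.h before relying on them. */
#define SKETCH_UFS1_MAGIC		0x011954u
#define SKETCH_UFS2_MAGIC		0x19540119u
#define SKETCH_UFS1_MAGIC_SWAPPED	0x54190100u
#define SKETCH_UFS2_MAGIC_SWAPPED	0x19015419u

/* Portable 32-bit byte swap. */
uint32_t
sketch_bswap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	    ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
}

/* Return 1 for native byte order, -1 for swapped, 0 for unknown. */
int
sketch_magic_byte_order(uint32_t magic)
{
	switch (magic) {
	case SKETCH_UFS1_MAGIC:
	case SKETCH_UFS2_MAGIC:
		return 1;
	case SKETCH_UFS1_MAGIC_SWAPPED:
	case SKETCH_UFS2_MAGIC_SWAPPED:
		return -1;
	default:
		return 0;
	}
}
```

With plain constants the swapped magic can appear as a case label, which a
function call such as bswap32() cannot.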
 1.274 28-Jan-2012  rmind pool_page_alloc, pool_page_alloc_meta: avoid extra compare, use const.
ffs_mountfs,sys_swapctl: replace memset with kmem_zalloc.
sys_swapctl: move kmem_free outside the lock path.
uvm_init: fix comment, remove pointless numeration of steps.
uvm_map_enter: remove meflagval variable.
Fix some indentation.
 1.273 27-Jan-2012  para converting readdir in ffs ext2fs from malloc(9) to kmem(9)
while there allocate ufs mount structs from kmem(9) too
preceding kmem-vmem-pool-patch

releng@ acknowledged
 1.272 03-Jan-2012  pgoyette Display current mount point, rather than previous one, when printing
the "replaying log to disk" message.

OK dholland@

Fixes PR kern/39609
 1.271 14-Nov-2011  hannken branches: 1.271.4;
VOP_OPEN() needs a locked vnode. All these copy-and-pasted xxxfs_mount()
implementations need more review.
 1.270 13-Nov-2011  christos use getdiskinfo()
 1.269 07-Oct-2011  hannken branches: 1.269.2;
As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.
 1.268 17-Jun-2011  manu Add mount -o extattr option to enable extended attributes (currently only
for UFS1).
Remove kernel option for EA backing store autocreation and do it by
default. Add a sysctl so that autocreated attribute size can be modified.
 1.267 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.266 27-Apr-2011  hannken branches: 1.266.2;
Cleanup ffs fsync and make devices on wapbl enabled file systems work here:

- Replace the ugly sync loop in ffs_full_fsync() and ffs_vfs_fsync() with
vflushbuf(). This loop is a relic of softdeps and not needed anymore.

- Add ffs_spec_fsync() for device nodes on ffs file systems that calls
spec_fsync() like all other file systems do and then updates the ctime.

Discussed on tech-kern.

Should fix PRs:
PR #41192 wapbl diagnostic panic during cgdconfig
PR #41977 kernel diagnostic assertion "rw_lock_held(&wl->wl_rwlock)" failed
PR #42149 wapbl locking panic if watching DVD
PR #42551 Lockdebug assert in wapbl when running zpool
 1.265 27-Mar-2011  mlelstv Don't abort when APPLE_UFS autodetection cannot read the apple ufs label
due to sector size or alignment problems. Autodetection is only a safety
measure, you should mark the filesystem type in the BSD disklabel.
 1.264 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.263 27-Dec-2010  hannken branches: 1.263.2; 1.263.4;
Extend the range of fstrans transactions to a sequence of vnode operations
on a locked vnode. This leaves a suspended file system and therefore a
snapshot with either all or no operations of such a sequence done.
 1.262 09-Aug-2010  pooka add a linefeed to the previous
 1.261 09-Aug-2010  pooka Return error if we try to mount a file system with block size > MAXBSIZE.

Note: there are a billion ways to make the kernel panic by trying
to mount a garbage file system and I don't imagine we'll ever get
close to fixing even half of them. However, for this one failing
gracefully is a bonus since Xen DomU only does 32k MAXBSIZE and
the 64k MAXBSIZE file systems are out there (PR port-xen/43727).

Tested by compiling sys/rump with CPPFLAGS+=-DMAXPHYS=32768 (all
tests in tests/fs still pass). I don't know how we're going to
translate this into an easy regression test, though. Maybe with
a hacked newfs?
 1.260 21-Jul-2010  hannken Make holding v_interlock mandatory for callers of vget().

Announced some time ago on tech-kern.
 1.259 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.258 11-Feb-2010  mlelstv branches: 1.258.2;
There is no code left that uses disk size data, so don't query it.
This also failed when querying the simulated block device from mfs.
Fixes PR kern/42782.
 1.257 05-Feb-2010  mlelstv branches: 1.257.2;
Correct addressing of superblock updates.
 1.256 31-Jan-2010  mlelstv Fix block shift to work with different device block sizes.

Unlike other filesystems this has some side issues because
the shift values are stored in the superblock and because
userland utilities share the same fsbtodb macros.

-> the kernel now ignores the value stored in the superblock.
-> the macro adaption is only done for defined(_KERNEL) code.
 1.255 31-Jan-2010  mlelstv Replace individual queries for partition information with
new helper function.
 1.254 08-Jan-2010  pooka The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.253 04-Nov-2009  hannken Now that softdep has left the tree the only place needing the ffs_lock()
hack is ffs_sync().

- Use the generic lock operations for ffs.
- Change ffs_sync() to omit the vnode lock while suspending.

Reviewed by: Antti Kantee <pooka@netbsd.org>
 1.252 13-Sep-2009  bouyer If the WAPBL journal can't be read (ffs_wapbl_replay_start() fails),
mount the filesystem anyway if MNT_FORCE is present.
This still allows booting a system with a corrupted WAPBL on / into
single-user mode, giving a chance to run fsck to fix it.
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
 1.251 13-Sep-2009  tsutsui Move declaration of ufs_hashlock into <ufs/ufs_extern.h> from each c source.
 1.250 31-Jul-2009  pooka Don't free extattr resources until it is certain that unmount
succeeds. Also, "unmount system call" -> "unmount vfs operation"
in comment just so that our comments aren't 15+ years outdated.
 1.249 23-Jul-2009  pooka Restore error behaviour bulldozed in rev 1.246.

might fix PR kern/41769
 1.248 06-Jul-2009  christos Fix bug introduced in revision 1.174 where a NULL fspec with an MNT_UPDATE
command would always return EINVAL. This broke fsck on root, where fsck'ing
a dirty root would always return an error causing rc to resort to a reboot.
 1.247 29-Jun-2009  dholland Convert 67 namei call sites to use namei_simple, in these functions:

check_console, veriexecclose, veriexec_delete, veriexec_file_add,
emul_find_root, coff_load_shlib (sh3 version), coff_load_shlib,
compat_20_sys_statfs, compat_20_netbsd32_statfs,
ELFNAME2(netbsd32,probe_noteless), darwin_sys_statfs,
ibcs2_sys_statfs, ibcs2_sys_statvfs, linux_sys_uselib,
osf1_sys_statfs, sunos_sys_statfs, sunos32_sys_statfs,
ultrix_sys_statfs, do_sys_mount, fss_create_files (3 of 4),
adosfs_mount, cd9660_mount, coda_ioctl, coda_mount, ext2fs_mount,
ffs_mount, filecore_mount, hfs_mount, lfs_mount, msdosfs_mount,
ntfs_mount, sysvbfs_mount, udf_mount, union_mount, sys_chflags,
sys_lchflags, sys_chmod, sys_lchmod, sys_chown, sys_lchown,
sys___posix_chown, sys___posix_lchown, sys_link, do_sys_pstatvfs,
sys_quotactl, sys_revoke, sys_truncate, do_sys_utimes, sys_extattrctl,
sys_extattr_set_file, sys_extattr_set_link, sys_extattr_get_file,
sys_extattr_get_link, sys_extattr_delete_file,
sys_extattr_delete_link, sys_extattr_list_file, sys_extattr_list_link,
sys_setxattr, sys_lsetxattr, sys_getxattr, sys_lgetxattr,
sys_listxattr, sys_llistxattr, sys_removexattr, sys_lremovexattr

All have been scrutinized (several times, in fact) and compile-tested,
but not all have been explicitly tested in action.

XXX: While I haven't (intentionally) changed the use or nonuse of
XXX: TRYEMULROOT in any of these places, I'm not convinced all the
XXX: uses are correct; an audit might be desirable.
 1.246 25-Apr-2009  elad Add genfs_can_mount() and use it to prevent some more code duplication of
the security checks when mounting a device (VOP_ACCESS() + kauth(9) call).

Proposed with no objections on tech-kern@:

http://mail-index.netbsd.org/tech-kern/2009/04/20/msg004859.html

The vnode is always expected to be locked, so no locking is done outside
the file-system code.
 1.245 29-Mar-2009  ad fsync:

- atime updates were not being synced.

ffs_sync:

- In some cases the sync vnode was acting like the now-dead /usr/sbin/update.
It was examining vnodes that it should have ignored.

- It would find dirty inodes and try to flush them. Often ffs_fsync()
cheerfully ignored the flush request due to the fsync bug. Such inodes
remained dirty and were repeatedly re-examined by the syncer until
vnode reclaim or system shutdown.

- We were marking our place in the per-mount vnode list even though in
most cases there was no flush to perform. While not a bug, this wasted
CPU cycles because a TAILQ_NEXT would have sufficed.
 1.244 21-Mar-2009  ad ffs_sync: ensure that we *do* flush atime updates periodically.
ffs_update() was eating the flag.
 1.243 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.242 22-Feb-2009  ad PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc

- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.

- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.

- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Previously they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.

- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.

- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.

- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
 1.241 13-Nov-2008  ad branches: 1.241.4;
Remove #ifdef LFS from the ufs code.
 1.240 10-Nov-2008  joerg Reduce internals of WAPBL exposed to the rest of the system.
 1.239 30-Oct-2008  joerg branches: 1.239.2;
Fix indentation.
 1.238 10-Oct-2008  hannken branches: 1.238.2;
Break a deadlock where one thread has a wapbl transaction, calls VOP_GETPAGES
and wants to busy a page while another thread calls VOP_PUTPAGES on the same
vnode, takes pages busy and wants to start a wapbl transaction.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.237 23-Sep-2008  pooka Remove some of my debugging code which was not meant to be committed
in the wapbl merge.
 1.236 21-Sep-2008  freza Revert previous, pooka@ points out it's wrong.
 1.235 21-Sep-2008  freza WAPBL: in '%s: replaying log to disk' message use the path we're
trying to mount on instead of the misleading last-mounted-on
path. Reported by jmcneill.
 1.234 22-Aug-2008  hannken Add snapshot support for logging ffs file systems.

- Add UFS_WAPBL_BEGIN() / UFS_WAPBL_END() where needed.

- Expunge WAPBL log inodes from snapshots.

- Ffs_copyonwrite() and ffs_snapblkfree() must run inside a WAPBL transaction.

- Add ffs_gop_write() as a wrapper around genfs_gop_write() that makes sure
genfs_gop_write() always gets called inside a WAPBL transaction.

- Add VOP_PUTPAGES() flag PGO_JOURNALLOCKED to tag calls to VOP_PUTPAGES()
inside a WAPBL transaction.

Reviewed by: Simon Burge <simonb@netbsd.org>, Greg Oster <oster@netbsd.org>

PGO_JOURNALLOCKED / ffs_gop_write() part presented on tech-kern@.
 1.233 15-Aug-2008  hannken ffs_suspendctl: make sure everything is on disk and the on disk log is empty.
 1.232 31-Jul-2008  hannken Ffs snapshots don't work (yet) with WAPBL:
- no snapshot creation on logging file systems.
- refuse to mount logging file systems with persistent snapshots.

Ok: Simon Burge <simonb@netbsd.org>
 1.231 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.230 28-Jun-2008  rumble branches: 1.230.2;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.229 03-Jun-2008  hannken branches: 1.229.2;
ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.228 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.227 10-May-2008  rumble Convert file systems to dynamically attach with the new module interface.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.

As a consequence, most of the file systems can now be loaded as new style
modules.

Quick sanity check by ad@.
 1.226 06-May-2008  ad branches: 1.226.2;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.225 30-Apr-2008  ad PR kern/38135 vfs_busy/vfs_trybusy confusion

The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
 1.224 29-Apr-2008  ad PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.223 17-Apr-2008  hannken branches: 1.223.2; 1.223.4;
Replace get/setspecific with a void pointer in struct ufsmount. Use explicit
initialization/finalization of snapshot private data on creation/deletion
of struct ufsmount.
Snapshot mounts no longer may fail silently because kmem_alloc() fails.

Welcome to 4.99.60

Ok: Andrew Doran <ad@netbsd.org>
 1.222 30-Jan-2008  ad branches: 1.222.6;
PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
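
The reference chain described in 1.222 (each vnode holds a reference on its mount, and the mount holds a reference on its vfsops) can be illustrated with a small userland sketch. This is only an assumption-laden model, not the kernel code: struct fs_ops, struct fs_mount, struct fs_node and all function names are hypothetical, and the plain int counters ignore the atomic operations and locking a real kernel needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's real structures. */
struct fs_ops {
	int refcnt;		/* mounts using this set of operations */
};

struct fs_mount {
	int refcnt;		/* one per attached node, plus one while mounted */
	bool destroyed;		/* set when the last reference goes away */
	struct fs_ops *ops;
};

struct fs_node {
	struct fs_mount *mp;
};

/* Creating a mount takes a reference on the operations vector. */
void
mount_init(struct fs_mount *mp, struct fs_ops *ops)
{
	ops->refcnt++;
	mp->ops = ops;
	mp->refcnt = 1;
	mp->destroyed = false;
}

/* Drop one reference; the last one releases the operations vector too. */
void
mount_rele(struct fs_mount *mp)
{
	assert(mp->refcnt > 0);
	if (--mp->refcnt == 0) {
		mp->ops->refcnt--;
		mp->destroyed = true;
	}
}

/* Each node associated with a mount holds one reference on it. */
void
node_attach(struct fs_node *np, struct fs_mount *mp)
{
	mp->refcnt++;
	np->mp = mp;
}

void
node_detach(struct fs_node *np)
{
	struct fs_mount *mp = np->mp;

	np->mp = NULL;
	mount_rele(mp);
}

/*
 * Model a forced unmount while one node is still attached. Returns 1 if
 * the mount outlived the unmount and was torn down only on the final
 * detach, 0 otherwise.
 */
int
forced_unmount_demo(void)
{
	struct fs_ops ops = { 0 };
	struct fs_mount m;
	struct fs_node n;
	bool survived;

	mount_init(&m, &ops);
	node_attach(&n, &m);
	mount_rele(&m);		/* unmount drops the mount's own reference */
	survived = !m.destroyed && ops.refcnt == 1;
	node_detach(&n);	/* last reference: mount is torn down now */
	return survived && m.destroyed && ops.refcnt == 0;
}
```

In this model a forced unmount cannot free the mount out from under a node that still points at it; the structure lingers until the last holder lets go, which is the safety property the PR title asks for.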
 1.221 28-Jan-2008  dholland Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.220 25-Jan-2008  pooka Destroy extattr lock when destroying extattrs associated with the
mountpoint. Make stopping extattrs always successful to facilitate
always being able to free resources.
 1.219 24-Jan-2008  ad specfs changes for PR kern/37717 (raidclose() is no longer called on
shutdown). There are still problems with device access and a PR will be
filed.

- Kill checkalias(). Allow multiple vnodes to reference a single device.

- Don't play dangerous tricks with block vnodes to ensure that only one
vnode can describe a block device. Instead, prohibit concurrent opens of
block devices. As a bonus remove the unreliable code that prevents
multiple file system mounts on the same device. It's no longer needed.

- Track opens by vnode and by device. Issue cdev_close() when the last open
goes away, instead of abusing vnode::v_usecount to tell if the device is
open.
 1.218 09-Jan-2008  ad Fix hangs on 'biolock' when creating a directory under / with softdep.
 1.217 07-Jan-2008  ad Fix 'panic: softdep_update_inodeblock: update failed'.
 1.216 03-Jan-2008  ad Use pool_cache.
 1.215 03-Jan-2008  pooka valloc -> vnalloc, vfree -> vnfree
Avoids collision with userland valloc(3).

no functional change
ad ok
 1.214 02-Jan-2008  ad Merge vmlocking2 to head.
 1.213 20-Dec-2007  dyoung Call genfs_node_init a little earlier to avoid vput()ing an
uninitialized node later, which leads to a kernel panic. Patch
by Antti Kantee.
 1.212 08-Dec-2007  pooka branches: 1.212.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.211 26-Nov-2007  pooka branches: 1.211.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.210 10-Oct-2007  ad branches: 1.210.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.209 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.208 09-Aug-2007  hannken branches: 1.208.2; 1.208.4;
Move snapshot per-mount data from struct ufsmount to mount specific data.
No functional changes.

Welcome to 4.99.28 (struct ufsmount changed size)
 1.207 31-Jul-2007  pooka branches: 1.207.2; 1.207.4;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.206 20-Jul-2007  pooka In sync, skip over vnodes based on if they are clean rather than
if they have pages.
 1.205 17-Jul-2007  pooka branches: 1.205.2;
Make set_statvfs_info() take a parameter for the vfs name instead
of always retrieving it from mp->mnt_op->vfs_name

christos ok
 1.204 12-Jul-2007  dsl Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.203 10-Jul-2007  hannken Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criteria, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.202 30-Jun-2007  pooka Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.201 29-May-2007  tsutsui Fix inconsistent changes in rev 1.153 and 1.154:
Adjust fs->fs_maxfilesize instead of ump->um_maxfilesize
in ffs_oldfscompat_read() because the latter is overridden
by the former after ffs_oldfscompat_read() returns.

Fixes EFBIG errors on read(2) and "exec /sbin/init: error 8"
problem on mac68k after mountroot() on old 4.3BSD UFS created
by the Mkfs tool for MacOS (reported and confirmed on port-mac68k).
 1.200 28-May-2007  ad Fix lock order inversion between vnode locks and ufs_hashlock. Addresses
kern/36331 (MP deadlock between ufs_ihashget() and VOP_LOOKUP()) for ffs,
other file systems to follow. Reported by perseant@, debugged by Sverre
Froyen, patch posted/tested by Blair Sadewitz.
 1.199 17-May-2007  hannken Fstrans_start() always returns zero, so change its type to void.
 1.198 07-Apr-2007  hannken Remove calls to now obsolete vn_start_write() and vn_finished_write().
 1.197 12-Mar-2007  ad branches: 1.197.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.196 16-Feb-2007  hannken branches: 1.196.2; 1.196.6;
Make fstrans(9) the default helper for file system suspension.
Replaces the now obsolete vn_start_write()/vn_finished_write().
 1.195 15-Feb-2007  ad Replace some uses of lockmgr() / simplelocks.
 1.194 29-Jan-2007  hannken Change fstrans enum types to upper case.
No functional change.

From Antti Kantee <pooka@netbsd.org>
 1.193 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled by the kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.192 07-Jan-2007  isaki Correct indent.
 1.191 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.190 16-Nov-2006  christos branches: 1.190.2; 1.190.4;
__unused removal on arguments; approved by core.
 1.189 25-Oct-2006  reinoud Revisit mnt_vnodelist TAILQ patch. Remove all suspicious TAILQ_FOREACH()
loops where vnodes can get removed or added during the loops. This could
lead to panics on unmount since nodes are skipped or otherwise
TAILQ_NEXT(0xdeadbeef, ...) was dereferenced.
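
The hazard named in 1.189 is that a plain TAILQ_FOREACH() advances by reading the link field of the current element, which is invalid once that element has been removed and freed. A minimal userland sketch of the removal-safe pattern follows, assuming a <sys/queue.h> that provides the TAILQ macros. struct node and the nodelist_* functions are hypothetical stand-ins for illustration, and the sketch covers only the single-threaded case; it does not claim to reproduce what the commit actually changed in the kernel, where other threads may also alter the list.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a vnode on a per-mount list. */
struct node {
	int id;
	TAILQ_ENTRY(node) link;
};

TAILQ_HEAD(nodelist, node);

/* Append a freshly allocated node; returns NULL on allocation failure. */
struct node *
nodelist_add(struct nodelist *head, int id)
{
	struct node *n = malloc(sizeof(*n));

	if (n == NULL)
		return NULL;
	n->id = id;
	TAILQ_INSERT_TAIL(head, n, link);
	return n;
}

/*
 * Remove and free every node with an odd id. The successor is fetched
 * before the current node is freed, so the loop never reads freed memory.
 */
int
nodelist_prune_odd(struct nodelist *head)
{
	struct node *n, *next;
	int removed = 0;

	for (n = TAILQ_FIRST(head); n != NULL; n = next) {
		next = TAILQ_NEXT(n, link);
		if (n->id % 2 != 0) {
			TAILQ_REMOVE(head, n, link);
			free(n);
			removed++;
		}
	}
	return removed;
}

/* Build a list 1..count, prune it, and return how many nodes remain. */
int
nodelist_demo(int count)
{
	struct nodelist head;
	struct node *n;
	int i, left = 0;

	TAILQ_INIT(&head);
	for (i = 1; i <= count; i++)
		if (nodelist_add(&head, i) == NULL)
			break;
	nodelist_prune_odd(&head);
	/* Drain whatever survived, counting as we go. */
	while ((n = TAILQ_FIRST(&head)) != NULL) {
		TAILQ_REMOVE(&head, n, link);
		free(n);
		left++;
	}
	return left;
}
```

Saving the successor first is enough only when the loop body is the sole thing modifying the list. If the body can sleep or drop a lock, the saved successor may itself disappear, so a different technique is needed there.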
 1.188 20-Oct-2006  reinoud Replace the LIST structure mp->mnt_vnodelist with a TAILQ structure since all
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earliest last.

An extra benefit is the removal of the ugly hack from the Berkeley days on
LFS.

In the process, I've also replaced the various hand-written loop variations
with the TAILQ_FOREACH() macro.
 1.187 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.186 21-Sep-2006  jld Change ffs_mount, in MNT_UPDATE case, to check dev_t's for equality
instead of just vnode pointers. Fixes erroneous "does not match mounted
device" errors from mount(8) in the presence of MFS /dev, init.root, &c.

No objections on tech-kern.
 1.185 30-Aug-2006  christos branches: 1.185.2; 1.185.4;
fix missing initializers
 1.184 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.183 13-Jul-2006  martin Fix alignment problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.182 07-Jun-2006  kardel branches: 1.182.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.181 14-May-2006  elad branches: 1.181.2;
integrate kauth.
 1.180 21-Feb-2006  thorpej branches: 1.180.2; 1.180.4; 1.180.6;
Use device_class() instead of accessing dv_class directly.
 1.179 14-Jan-2006  yamt branches: 1.179.2; 1.179.4;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.178 23-Dec-2005  rpaulo branches: 1.178.2;
Convert UFS_EXTATTR to struct lwp.
 1.177 11-Dec-2005  christos merge ktrace-lwp.
 1.176 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.175 27-Sep-2005  yamt branches: 1.175.2;
introduce "ufs_ops" and use it for ITIMES.
 1.174 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.173 22-Sep-2005  rpaulo Fix bogus if-clause introduced in previous revision.
 1.172 22-Sep-2005  rpaulo In ffs_unmount(), detect EOPNOTSUPP errno returned from
ufs_extattr_stop().

From FreeBSD.
 1.171 12-Sep-2005  christos - access the ffs and ext2fs itimes functions through a pointer, so that
if the filesystem is not compiled in the kernel still links. Probably
a better solution is to use weak symbols.
- move the filesystem-specific itime macros to the filesystem header files.
 1.170 28-Aug-2005  thorpej Experimental support for extended attributes on UFS1 file systems, using a
backing file per attribute type indexed by inode number to hold the extended
attributes.

This is working pretty well on my test systems, except for the "autostart"
feature. I need someone with a better handle on the VFS locking protocol
to go over that.

This is a work-in-progress. There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.

Adapted from FreeBSD.
 1.169 23-Aug-2005  christos Don't overload MAXNAMLEN, use a separate constant for each filesystem type.
 1.168 25-Jul-2005  drochner fix crash in mount error handling: don't free storage which was not
malloc'd
 1.167 23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.166 15-Jul-2005  thorpej Use ANSI function decls.
 1.165 28-Jun-2005  yamt branches: 1.165.2;
- constify genfs_ops.
- use member designators.
 1.164 29-May-2005  christos - sprinkle const
- avoid shadow variables.
 1.163 29-Mar-2005  thorpej - Define a VFS_ATTACH() macro that places a reference to a vfsops structure
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
 1.162 04-Mar-2005  christos branches: 1.162.2;
PR/26823: Michael L. Hitch: Endianness flags were not preserved in the compat
superblock read routine.
 1.161 26-Feb-2005  perry nuke trailing whitespace
 1.160 11-Jan-2005  mycroft branches: 1.160.2; 1.160.4;
Rearrange some code slightly to avoid uninitialized variable warnings.
 1.159 09-Jan-2005  mycroft Rework the mountroot interface so that vfs_mountroot() opens the root device
and just passes it on to the file system functions. This avoids opening and
closing the device several times.

Mentioned on tech-kern some time ago, IIRC. I've been running this for a
long time.
 1.158 02-Jan-2005  thorpej Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
 1.157 26-Dec-2004  dbj remove opt_compat_netbsd.h, afaict it is no longer needed.
i think it was previously used to pull in COMPAT_09 for ffs_statfs
 1.156 21-Nov-2004  jdolecek allow changes of the sysctl values
 1.155 21-Sep-2004  thorpej Add a new VNODE_LOCKDEBUG option, which enables checks in the VOP_*()
calls to ensure that the vnode lock state is as expected when the VOP
call is made. Modify vnode_if.src to set the expected state according
to the documenting lock table for each VOP. Modify vnode_if.sh to emit
the checks.

Notes:
- The checks are only performed if the vnode has the VLOCKSWORK bit
set. Some file systems (e.g. specfs) don't even bother with vnode
locks, so of course the checks will fail.
- We can't actually run with VNODE_LOCKDEBUG because there are so many
vnode locking problems, not the least of which is the "use SHARED for
VOP_READ()" issue, which screws things up for the entire call chain.

Inspired by similar changes in OpenBSD, but implemented differently.
 1.154 19-Sep-2004  yamt um_maxfilesize should be set after
ffs_oldfscompat_read adjusted fs_maxfilesize.
 1.153 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.152 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.151 05-Jul-2004  pk Call inittodr() from main(). Let file system code set the recorded `last
update' time (if any) through the new function setrootfstime().
 1.150 27-May-2004  hannken Fixup last commit. fs->fs_active must be initialized.
 1.149 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.148 25-May-2004  atatat Sysctl descriptions under vfs subtree
 1.147 20-May-2004  atatat Explicitly call pool_init() (and pool_destroy()) when being built as
an _LKM.

This adds pools to the list of things that lkms must do manually
because they're set up with link sets. Not that there's anything
wrong with link sets, but that we need to try harder to remember that
lkms are second class citizens. Of a sort.
 1.146 26-Apr-2004  simonb Unwrap a not-too-long line.
 1.145 25-Apr-2004  dbj remove botched superblock upgrade warnings.
there are now alternate non-kernel checks and fixes for this problem.
relevant PRs include:
bin/17910 kern/21283 kern/21404 port-macppc/23925 port-macppc/23926
install/25138
 1.144 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.143 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.142 18-Apr-2004  dbj remove code that attempts to correct superblock location. this
enforces an unnecessary restriction that the superblock be in the
particular expected locations. Also, the compatibility case is
handled in ffs_oldfscompat_read.
 1.141 18-Apr-2004  dbj when enabling ffs compatibility in ffs_reload, use
sblockloc that superblock was read from
also note XXX that ffs_reload doesn't handle superblock moving
 1.140 27-Mar-2004  dsl branches: 1.140.2;
Rework previous so that FS_FLAGS_UPDATED is only looked at for ffsv1
 1.139 24-Mar-2004  atatat Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.138 21-Mar-2004  dsl Rework superblock validation logic to make adding validity tests easier.
Ensure that we don't use the first alternate superblock of a ffsv1
filesystem with 64k blocks (it is in the same place as an ffsv2 sb).
Fixes part of PR kern/24809
 1.137 11-Mar-2004  dbj quiet tls. change botched superblock warning to use -b 16
 1.136 10-Mar-2004  keihan s/netbsd.org/NetBSD.org/g
 1.135 22-Feb-2004  jdolecek make sblock_try[] const
 1.134 12-Jan-2004  dbj change the updating note to say you may need fsck_ffs -b 32 -c 4
 1.133 12-Jan-2004  dbj add checks for a couple of botched superblock upgrade cases
and report a warning with repair references.
 1.132 10-Jan-2004  hannken Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue.

Cleanup ffs_sync() which did not synchronously wait when MNT_WAIT
was specified. Clear the work queue when MNT_WAIT is specified.

Result is a clean on-disk file system after ffs_sync(.., MNT_WAIT, ..)

From FreeBSD.
 1.131 09-Jan-2004  dbj never upgrade the superblock or set FS_FLAGS_UPDATED in fs_old_flags
add compatibility for filesystems created before FFSv2 integration
these patches are from pr port-macppc/23926 and should also fix
problems discussed in pr kern/21404 and pr kern/21283
 1.130 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.129 01-Dec-2003  dbj in ffs_unmount, ignore error returned by VOP_CLOSE(devvp)
this fixes a problem where device close error would cause
unmount to fail but structures to be left partially deallocated
 1.128 08-Nov-2003  dbj fix minor memory leaks in error paths of ffs_mountfs
 1.127 05-Nov-2003  hannken Clean up the usage of vn_start_write(). At least one occurrence clobbered
previous error conditions.
If "(flags & (V_WAIT|V_PCATCH)) == V_WAIT" the return value is always zero.
Ignore the return value in these cases.

From Darrin B. Jewell.
 1.126 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.125 15-Oct-2003  hannken Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.124 14-Oct-2003  dbj add mnt_iflag field to struct mount for internal flags
mv MNT_GONE, MNT_UNMOUNT and MNT_WANTRDWR to this field
additionally add mnt_writeopcountupper and mnt_writeopcountlower fields
in preparation for pending write suspension support work
bump kernel version to 1.6ZD
 1.123 25-Sep-2003  enami In ffs_sbupdate(), swap the sblock after ffs_oldfscompat_write() is
applied rather than the original.
 1.122 17-Sep-2003  enami Fix a recently introduced bug which prevents csum totals being copied
when an old ffs filesystem is first mounted (as a result, df reports disk
full on old ffs filesystem or mfs created by old binary). Problem first
noticed by onoe san.
 1.121 13-Sep-2003  bouyer make sure to not get flags which are for internal use only from the on-disk
superblock.
Proposed in http://mail-index.netbsd.org/tech-kern/2003/09/06/0005.html
 1.120 13-Sep-2003  bouyer Commit changes proposed in
http://mail-index.netbsd.org/tech-kern/2003/09/06/0001.html
http://mail-index.netbsd.org/tech-kern/2003/09/06/0006.html
to avoid compat problems with old ffsv1 by reuse of the old FS_SWAPPED
value for FS_FLAGS_UPDATED, and use of new, larger fields:
- Don't use FS_FLAGS_UPDATED to see if we need to update new fields from
old fields in ffsv1 case.
- when writing back the superblock, copy back the flags to the old location
if only old flags are set (FS_FLAGS_UPDATED won't be set in this case)
in ffsv1 case.
 1.119 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.118 29-Jun-2003  fvdl branches: 1.118.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.117 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.116 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.115 12-Jun-2003  fvdl OS X still seems to use the old nrpos field in the superblock, and gets
unhappy after NetBSD wrote an Apple UFS filesystem. Just set it to 0
in this case.
 1.114 03-May-2003  christos make sure we update fs_fsmnt.
 1.113 16-Apr-2003  christos PR/1796: John Kohl: statfs misbehaves under chrooted environments.

- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
 1.112 12-Apr-2003  fvdl Don't cache buffers used when finding the superblock, it can lead to
seeing bogus data for the first cg with certain block/frag sizes.
From enami tsugutomo.
 1.111 05-Apr-2003  fvdl * Use the old and new time fields in the superblock as well as a few others
to determine if this filesystem was mounted by an older kernel after
having been mounted by a newer one, to avoid some summary mismatches.
* Reinstate support for 4.2 cylinder groups (read-only, as it was before).
 1.110 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.109 31-Mar-2003  fvdl The modified flag must be cleared before the last sbupdate call in
unmount, because ffs_flushfiles or softdep_flushfiles may have
modified the filesystem (despite VFS_SYNC having been called first).
 1.108 21-Mar-2003  dsl Use 'void *' instead of 'caddr_t' in prototypes of VOP_IOCTL, VOP_FCNTL
and VOP_ADVLOCK, delete casts from callers (and some to copyin/out).
 1.107 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.106 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.105 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.104 24-Nov-2002  scw Quell an uninitialised variable warning.
 1.103 28-Sep-2002  dbj Add support for the Apple UFS variation on ffs
This is the bulk of PR #17345

The general approach is to use a run time determinable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
 1.102 21-Sep-2002  christos MNT_GETARGS support
 1.101 06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as a constant structure in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in the port-dependent majors.<arch>
by using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- At compile time, the device major number list is packed into the kernel and
the LKM framework refers to it to assign device major numbers dynamically.
 1.100 30-Jul-2002  soren Die, qaddr_t, die! - mnt_data in struct mount is already effectively
a void *, so stop pretending otherwise.
 1.99 09-Jun-2002  chs allow read-only mounts even if we can't read the last fragment of the fs.
this enables one to recover data from a failing disk (where the read failure
is a hardware problem) while avoiding corrupting the fs further (in the case
where the read failure is due to a misconfiguration).
 1.98 10-Apr-2002  mycroft branches: 1.98.2; 1.98.4;
Use blkstofrags() and fragstoblks(). Use &(NBBY-1) rather than %NBBY.
Switch off of fs_fragshift rather than fs_frag (generates better jump tables).
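The `&(NBBY-1)` substitution relies on NBBY (bits per byte) being a power of two. A minimal sketch of the equivalence follows; the helper names are hypothetical and NBBY is defined locally rather than taken from a system header. For negative signed operands the two forms give different results, so the sketch sticks to unsigned values.

```c
#include <assert.h>

/* Bits per byte; defined locally for this sketch. */
#define NBBY 8

/* Hypothetical helper: bit position within a byte via the remainder operator. */
static unsigned bit_in_byte_mod(unsigned bitno)
{
	return bitno % NBBY;
}

/* Hypothetical helper: same value via a mask; valid only for power-of-two NBBY. */
static unsigned bit_in_byte_mask(unsigned bitno)
{
	assert((NBBY & (NBBY - 1)) == 0);	/* power-of-two precondition */
	return bitno & (NBBY - 1);
}
```

Both helpers return the same value for any unsigned bit number; the mask form avoids a division when the compiler cannot prove the operand is non-negative.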
 1.97 01-Apr-2002  enami Hold an extra reference if updating and args.fspec == NULL.
 1.96 01-Apr-2002  christos Fixes from enami:

- If VOP_ACCESS fails when updating mount, we will vrele() twice.

- The check for update-only flags in mp->mnt_flag when not updating
case is bogus. If we really want to check, we need to see flags in
ufs_args, but I'm not sure if it is really necessary.

- The credential passed to ffs_reload was credential of when looking
up mount point, but now it is credential of when looking up device
node. Anyway, it may be current process's credential.
 1.95 31-Mar-2002  christos PR/16136: Chris Jepeway: Bogus entry in /etc/fstab can panic kernel.
 1.94 17-Mar-2002  chs when mounting a filesystem, read the last block in the filesystem
to verify that the device is at least as big as the superblock claims
the filesystem is supposed to be, and if it's not then fail the mount.
this should help reduce the type of confusion reported in PR 13228.
 1.93 08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.92 28-Feb-2002  pooka Don't add fs->fs_pendingblocks to f_bavail twice. It's already included
in f_bfree, which is added to f_bavail.

Fixes problem with statfs reporting too much free space for filesystems
which have files pending to be freed by softdeps.
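The accounting described above can be sketched as follows. This is a simplified illustration, not the ffs_statfs() code: the struct, its field names, and the reserve calculation are assumptions made for the example. The point is only that the pending count enters once, through the free total.

```c
#include <assert.h>

/* Hypothetical, simplified view of the space counters (all in fragments). */
struct space_sketch {
	long dsize;		/* total data fragments in the file system */
	long free_frags;	/* fragments free right now */
	long pending_frags;	/* fragments that softdeps will free later */
	int  minfree_pct;	/* percentage reserved for the superuser */
};

/* Free space as reported: already includes the pending frees. */
static long sketch_bfree(const struct space_sketch *s)
{
	return s->free_frags + s->pending_frags;
}

/* Space available to ordinary users; derived from sketch_bfree(),
 * so the pending count must not be added a second time here. */
static long sketch_bavail(const struct space_sketch *s)
{
	assert(s->minfree_pct >= 0 && s->minfree_pct <= 100);
	long usable = s->dsize * (100 - s->minfree_pct) / 100;
	return usable - (s->dsize - sketch_bfree(s));
}
```

Adding `pending_frags` again inside `sketch_bavail()` would overstate the available space by exactly that amount, which matches the symptom the commit message reports.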
 1.91 30-Dec-2001  fvdl XXXX temporary measure: in the case of a softdep 'unmount pending error',
do not mark the filesystem clean, as this will mean that one or more
files were likely not completely removed (will show up as unconnected
in fsck). Prevents filesystems from being marked clean while they're
not until this problem has been figured out.
 1.90 19-Dec-2001  fvdl ffs_reload may be called after an old fsck has run, and the pending*
fields may not be zero. Just reset them silently, it's not an error.
 1.89 18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.88 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.87 15-Sep-2001  chs branches: 1.87.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.86 15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.85 06-Sep-2001  lukem branches: 1.85.2;
Incorporate the enhanced ffs_dirpref() by Grigoriy Orlov, as found in
FreeBSD (three commits; the initial work, man page updates, and a fix
to ffs_reload()), with the following differences:
- Be consistent between newfs(8) and tunefs(8) as to the options which
set and control the tuning parameters for this work (avgfilesize & avgfpdir)
- Use u_int16_t instead of u_int8_t to keep track of the number of
contiguous directories (suggested by Chuck Silvers)
- Work within our FFS_EI framework
- Ensure that fs->fs_maxclusters and fs->fs_contigdirs don't point to
the same area of memory

The new algorithm has a marked performance increase, especially when
performing tasks such as untarring pkgsrc.tar.gz, etc.

The original FreeBSD commit messages are attached:

=====
mckusick 2001/04/10 01:39:00 PDT
Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>.
His description of the problem and solution follow. My own tests show
speedups on typical filesystem intensive workloads of 5% to 12% which
is very impressive considering the small amount of code change involved.

------

One day I noticed that some file operations run much faster on
small file systems than on big ones. I've looked at the ffs
algorithms, thought about them, and redesigned the dirpref algorithm.

First I want to describe the results of my tests. These results are old
and I have improved the algorithm after these tests were done. Nevertheless
they show how big the performance speedup may be. I have done two file/directory
intensive tests on two OpenBSD systems with the old and new dirpref algorithms.
The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports".
The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release.
It contains 6596 directories and 13868 files. The test systems are:

1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for
test is at wd1. Size of test file system is 8 Gb, number of cg=991,
size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current
from Dec 2000 with BUFCACHEPERCENT=35

2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system
at wd0, file system for test is at wd1. Size of test file system is 40 Gb,
number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k
OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50

You can get more info about the test systems and methods at:
http://www.ptci.ru/gluk/dirpref/old/dirpref.html

Test Results

          tar -xzf ports.tar.gz              rm -rf ports
mode      old dirpref  new dirpref  speedup  old dirpref  new dirpref  speedup
First system
normal            667          472     1.41          477          331     1.44
async             285          144     1.98          130           14     9.29
sync              768          616     1.25          477          334     1.43
softdep           413          252     1.64          241           38     6.34
Second system
normal            329           81     4.06        263.5         93.5     2.81
async             302         25.7    11.75          112         2.26    49.56
sync              281         57.0     4.93          263         90.5      2.9
softdep           341         40.6      8.4          284         4.76    59.66

"old dirpref" and "new dirpref" columns give a test time in seconds.
"speedup" is the ratio of the two times, i.e. old dirpref / new dirpref.

------

Algorithm description

The old dirpref algorithm is described in comments:

/*
* Find a cylinder to place a directory.
*
* The policy implemented by this algorithm is to select from
* among those cylinder groups with above the average number of
* free inodes, the one with the smallest number of directories.
*/

A new directory is allocated in a different cylinder group than its
parent directory, resulting in a directory tree that is spread across
all the cylinder groups. This spreading out results in non-optimal
access to the directories and files. When we have a small filesystem
it is not a problem but when the filesystem is big then performance
degradation becomes very apparent.

What do I mean by a big file system?

1. A big filesystem is a filesystem which occupies 20-30 or more percent
of total drive space, i.e. first and last cylinder are physically
located relatively far from each other.
2. It has a relatively large number of cylinder groups, for example
more cylinder groups than 50% of the buffers in the buffer cache.

The first results in long access times, while the second results in
many buffers being used by metadata operations. Such operations use
cylinder group blocks and on-disk inode blocks. The cylinder group
block (fs->fs_cblkno) contains struct cg, inode and block bit maps.
It is 2k in size for the default filesystem parameters. If new and
parent directories are located in different cylinder groups then the
system performs more input/output operations and uses more buffers.
On filesystems with many cylinder groups, lots of cache buffers are
used for metadata operations.

My solution for this problem is very simple. I allocate many directories
in one cylinder group. I also do some things, so that the new allocation
method does not cause excessive fragmentation and all directory inodes
will not be located at a location far from its file's inodes and data.
The algorithm is:
/*
* Find a cylinder group to place a directory.
*
* The policy implemented by this algorithm is to allocate a
* directory inode in the same cylinder group as its parent
* directory, but also to reserve space for its files inodes
* and data. Restrict the number of directories which may be
* allocated one after another in the same cylinder group
* without intervening allocation of files.
*
* If we allocate a first level directory then force allocation
* in another cylinder group.
*/

My early versions of dirpref gave me good results for a wide range of
file operations and different filesystem capacities except one case:
those applications that create their entire directory structure first
and only later fill this structure with files.

My solution for such and similar cases is to limit a number of
directories which may be created one after another in the same cylinder
group without intervening file creations. For this purpose, I allocate
an array of counters at mount time. This array is linked to the superblock
fs->fs_contigdirs[cg]. Each time a directory is created the counter
increases and each time a file is created the counter decreases. A 60Gb
filesystem with 8mb/cg requires 10kb of memory for the counters array.
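The per-cylinder-group counters described above can be sketched as below. This is an illustration under stated assumptions, not the dirpref code: the struct and function names are hypothetical, the counters are 16 bits wide as in the NetBSD variant mentioned earlier in this log entry, and clamping at the ends of the range is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-core state: one counter per cylinder group. */
struct contig_sketch {
	uint16_t *contigdirs;	/* counters, allocated at mount time */
	unsigned  ncg;		/* number of cylinder groups */
};

/* Allocate zeroed counters; returns 0 on success, -1 on failure. */
static int contig_init(struct contig_sketch *st, unsigned ncg)
{
	st->contigdirs = calloc(ncg, sizeof(*st->contigdirs));
	st->ncg = ncg;
	return st->contigdirs != NULL ? 0 : -1;
}

/* A directory was created in cg: bump the counter (clamped, by assumption). */
static void note_dir_created(struct contig_sketch *st, unsigned cg)
{
	assert(cg < st->ncg);
	if (st->contigdirs[cg] < UINT16_MAX)
		st->contigdirs[cg]++;
}

/* A file was created in cg: lower the counter, never below zero. */
static void note_file_created(struct contig_sketch *st, unsigned cg)
{
	assert(cg < st->ncg);
	if (st->contigdirs[cg] > 0)
		st->contigdirs[cg]--;
}

/* May another directory be placed in cg without an intervening file? */
static int may_place_dir(const struct contig_sketch *st, unsigned cg,
    unsigned maxcontigdirs)
{
	assert(cg < st->ncg);
	return st->contigdirs[cg] < maxcontigdirs;
}
```

A caller would consult `may_place_dir()` when choosing a cylinder group for a new directory and fall back to another group when it returns 0.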

The maxcontigdirs is a maximum number of directories which may be created
without an intervening file creation. I found in my tests that the best
performance occurs when I restrict the number of directories in one cylinder
group such that all its files may be located in the same cylinder group.
There may be some deterioration in performance if all the file inodes
are in the same cylinder group as its containing directory, but their
data partially resides in a different cylinder group. The maxcontigdirs
value is calculated to try to prevent this condition. Since there is
no way to know how many files and directories will be allocated later
I added two optimization parameters in superblock/tunefs. They are:

int32_t fs_avgfilesize; /* expected average file size */
int32_t fs_avgfpdir; /* expected # of files per directory */

These parameters have reasonable defaults but may be tweaked for special
uses of a filesystem. They are only necessary in rare cases like better
tuning a filesystem being used to store a squid cache.

I have been using this algorithm for about 3 months. I have done
a lot of testing on filesystems with different capacities, average
filesize, average number of files per directory, and so on. I think
this algorithm has no negative impact on filesystem performance. It
works better than the default one in all cases. The new dirpref
will greatly improve untarring/removing/copying of big directories,
decrease load on cvs servers and much more. The new dirpref doesn't
speed up a compilation process, but also doesn't slow it down.

Obtained from: Grigoriy Orlov <gluk@ptci.ru>
=====

=====
iedowse 2001/04/23 17:37:17 PDT
Pre-dirpref versions of fsck may zero out the new superblock fields
fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause
panics if these fields were zeroed while a filesystem was mounted
read-only, and then remounted read-write.

Add code to ffs_reload() which copies the fs_contigdirs pointer
from the previous superblock, and reinitialises fs_avgf* if necessary.

Reviewed by: mckusick
=====

=====
nik 2001/04/10 03:36:44 PDT
Add information about the new options to newfs and tunefs which set the
expected average file size and number of files per directory. Could do
with some fleshing out.
=====
 1.84 02-Sep-2001  lukem Incorporate fix by iedowse @ FreeBSD to allow disks with large numbers of
cylinder groups to work correctly, with minor modifications by me to work
with our FFS_EI code. From the FreeBSD commit message:

The ffs superblock includes a 128-byte region for use by temporary
in-core pointers to summary information. An array in this region
(fs_csp) could overflow on filesystems with a very large number of
cylinder groups (~16000 on i386 with 8k blocks). When this happens,
other fields in the superblock get corrupted, and fsck refuses to
check the filesystem.

Solve this problem by replacing the fs_csp array in 'struct fs'
with a single pointer, and add padding to keep the length of the
128-byte region fixed. Update the kernel and userland utilities
to use just this single pointer.
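One plausible reading of the "~16000" figure is a back-of-envelope capacity calculation, sketched below. The assumptions are: the whole 128-byte region holds pointers, pointers are 4 bytes on i386, each pointer addresses one file system block of summary entries, and each entry (four 32-bit counters) is 16 bytes. The function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: how many cylinder groups a fixed pointer region
 * can describe if every pointer covers one block of summary entries. */
static unsigned long max_summarized_cgs(unsigned region_bytes,
    unsigned ptr_bytes, unsigned block_bytes, unsigned csum_bytes)
{
	assert(ptr_bytes != 0 && csum_bytes != 0);
	unsigned long nptrs = region_bytes / ptr_bytes;		/* pointers that fit */
	unsigned long per_block = block_bytes / csum_bytes;	/* entries per block */
	return nptrs * per_block;
}
```

With the values assumed above, `max_summarized_cgs(128, 4, 8192, 16)` gives 32 * 512 = 16384. If part of the region is used for other pointers, the limit is somewhat lower, which is consistent with the approximate figure in the message.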

With this change, the kernel no longer makes use of the superblock
fields 'fs_csshift' and 'fs_csmask'. Add a comment to newfs/mkfs.c
to indicate that these fields must be calculated for compatibility
with older kernels.

Reviewed by: mckusick
 1.83 17-Aug-2001  lukem remove third argument (`int ns') from ffs_sb_swap(), and let ffs_sb_swap()
determine the endianness of the `struct fs *o' superblock from o->fs_magic
and set needswap as necessary, rather than trusting the caller to get
it right. invariably, almost every caller of ffs_sb_swap() was calling it
with ns set to the wrong value for ns anyway!
ansi KNF ffs_bswap.c declarations whilst here.
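Deciding endianness from the magic number, rather than from a caller-supplied flag, can be sketched as below. This is not the ffs_sb_swap() code; the function names are hypothetical, and only the FFS magic value 0x011954 is taken as given.

```c
#include <assert.h>
#include <stdint.h>

#define FS_MAGIC 0x011954u	/* FFS superblock magic, host byte order */

/* Portable 32-bit byte swap for the sketch. */
static uint32_t swap32_sketch(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
	       ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
}

/* Returns 0 for a native superblock, 1 for an other-endian one,
 * and -1 when the magic matches neither form. */
static int sb_needswap(uint32_t magic)
{
	if (magic == FS_MAGIC)
		return 0;
	if (magic == swap32_sketch(FS_MAGIC))
		return 1;
	return -1;
}

/* Magic in host order; the caller must already have validated it. */
static uint32_t sb_magic_host(uint32_t magic)
{
	int needswap = sb_needswap(magic);

	assert(needswap >= 0);
	return needswap ? swap32_sketch(magic) : magic;
}
```

Because the decision is taken from the data itself, a caller cannot pass the wrong swap flag, which is the failure mode the message describes.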

this fixes all sorts of problems when trying to use other-endian file systems,
notably the kernel trying to access memory *way* off, possibly corrupting or
panicking, and userland programs SEGVing and/or corrupting things (e.g,
"fsck_ffs -B" to swap a file system endianness).

whilst the previous rev of ffs_bswap.c (1.10, 2000/12/23) made this problem
worse, i suspect that the problem was always there and previous versions
just happened not to trash things at the wrong time.

FFS_EI should now be a lot more stable.
 1.82 26-Jul-2001  lukem if printing the value of fs_clean, say 'fs_clean' instead of 'fs_flags' ...
 1.81 30-May-2001  mrg branches: 1.81.4;
use _KERNEL_OPT
 1.80 07-Feb-2001  chs branches: 1.80.2;
remove debug code that was left in by accident.
 1.79 22-Jan-2001  jdolecek make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.78 10-Jan-2001  mycroft On a RW->RO transition, explicitly clear fs_fmod after the cgupdate/sbupdate,
to prevent spurious writebacks and whinging about the (correct!) clean flag.
(Why this isn't done in ffs_sbupdate(), I dunno...)
 1.77 10-Jan-2001  chs attach the softdep pagecache pseudo-buffers to the inode
so we can find them quickly in the softdep truncate path.
 1.76 09-Jan-2001  mycroft ffs_reload(): Copy fs_ronly into the new superblock, too, as it may have been
modified on disk (e.g. by fsck(8)). This flag should really be elsewhere.
 1.75 04-Dec-2000  chs in ffs_sync(), don't skip vnodes which have (potentially dirty) pages.
 1.74 03-Dec-2000  fvdl In addition to setting the softdep flag in the superblock when
mounting with softdeps, also explicitly clear it when we don't,
so that a leftover setting after a crash will be cleared.
 1.73 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.72 13-Oct-2000  simonb There is no need to explicitly include <uvm/uvm_extern.h> for
<sys/sysctl.h> anymore.
 1.71 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.

Implement range fsync for FFS. Note: not yet implemented for the
SOFTDEP case.
 1.70 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.69 27-Jun-2000  fvdl Due to popular demand, change vinsheadfree to ungetnewvnode to make
the name clearer. No functional change.
 1.68 27-Jun-2000  fvdl In ffs_vget, do not hold ufs_haslock across the call to getnewvnode.
We may sleep in it, or even recurse, with softdeps. Instead, grab
the lock later, but check if no one else has beaten us to the VFS_VGET
operation, and if so, roll back getnewvnode using vinsheadfree, and
just return.
 1.67 16-Jun-2000  perseant branches: 1.67.2;
make it compile (fix typo)
 1.66 16-Jun-2000  matt ignore the softdep flags when mounting and there's no softdep in the kernel.
 1.65 15-Jun-2000  fvdl Allow MNT_SOFTDEP to be passed in via the mount(2) system call, do not
require it to be set via tunefs(8). Silently ignore it when doing
an update mount of a writeable filesystem, the FFS/softdep code isn't ready
for this yet.
 1.64 29-May-2000  mycroft Use LIST_{FIRST,NEXT,EMPTY}().
 1.63 29-May-2000  mycroft Add a new inode flag called IN_ACCESSED. This is used in place of IN_MODIFIED
to record that the atime was updated. In ffs_update(), we only do synchronous
writes if something *other* than the atime was changed.
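The decision described above can be sketched as a small predicate. The flag values and the function name are made up for the example, and the real ffs_update() may take other conditions into account; the sketch only shows that an atime-only change never forces a synchronous write.

```c
#include <assert.h>

/* Flag values are invented for this sketch, not the kernel's. */
#define SK_IN_ACCESSED 0x1u	/* only the access time changed */
#define SK_IN_MODIFIED 0x2u	/* something other than atime changed */

/* Should this inode update be written synchronously?
 * waitfor is nonzero when the caller asked to wait for the write. */
static int update_is_synchronous(unsigned flags, int waitfor)
{
	assert((flags & ~(SK_IN_ACCESSED | SK_IN_MODIFIED)) == 0);
	return waitfor && (flags & SK_IN_MODIFIED) != 0;
}
```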
 1.62 04-Apr-2000  jdolecek branches: 1.62.2;
Add a new sysctl variable vfs.ffs.log_changeopt - if this is true,
an optimization strategy change is logged to syslog. Default
is 0 (to not log). This replaces the recent not quite "right"
change to only log the change if kernel is compiled with DEBUG.
 1.61 30-Mar-2000  augustss Remove register declarations.
 1.60 30-Mar-2000  simonb Delete redundant decls of rootvp - it's in <sys/systm.h>.
Delete redundant decl of ffs_sbupdate() - it's in <ufs/ffs/ffs_extern.h>.
 1.59 16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading.

For each leaf filesystem, add appropriate vfs_done routine.

Also remember how many times ffs_init() was called and do
the appropriate initialization on first call only. In ffs_done(),
destroy the resources when called by the last user of ffs code.
Change mfs to call ffs_init()/ffs_done() appropriately.
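The "initialize on first call, destroy on last" scheme can be sketched with a simple use count. The names below are hypothetical and the resource is reduced to a flag; locking, which a real kernel implementation would have to consider, is left out.

```c
#include <assert.h>

static int sketch_init_count;		/* how many users have called init */
static int sketch_resources_ready;	/* stands in for pools, hash tables, ... */

/* Set up shared resources on the first call only. */
static void sketch_ffs_init(void)
{
	if (sketch_init_count++ > 0)
		return;
	sketch_resources_ready = 1;
}

/* Tear down shared resources when the last user is done. */
static void sketch_ffs_done(void)
{
	assert(sketch_init_count > 0);	/* done without matching init is a bug */
	if (--sketch_init_count > 0)
		return;
	sketch_resources_ready = 0;
}
```

With two users (for example ffs and mfs), the resources stay available after the first call to `sketch_ffs_done()` and are released only by the second.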
 1.58 16-Mar-2000  fvdl Initialize the fs variable struct a little earlier to avoid referencing
a bad pointer in a printf. Problem reported by Krister Walfridsson.
 1.57 14-Feb-2000  fvdl Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
 1.56 10-Dec-1999  drochner Call ffs_oldfscompat() before all the consistency checks, to avoid the
use of uninitialized data in the checks if the filesystem is an old one.
 1.55 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.54 20-Oct-1999  enami Check if the type of device node isn't VBAD before touching v_specinfo. If
the device vnode is revoked, the field is NULL and touching it causes null
pointer dereference.
 1.53 16-Oct-1999  wrstuden branches: 1.53.2; 1.53.4;
In spec_close(), if we're not doing a non-blocking close and VXLOCK is
not set, unlock the vnode before calling the device's close routine and
relock it after it returns. tty close routines will sleep waiting for
buffers to drain, which won't happen often times as the other side needs
to grab the vnode lock first.

Make all unmount routines lock the device vnode before calling VOP_CLOSE().
 1.52 03-Aug-1999  drochner branches: 1.52.2;
clean up inclusion of "opt_ffs.h" and use of "FFS_EI" a bit
 1.51 17-Jul-1999  wrstuden Adjust mountroot routines to vrele rootvp in case of mount error. Closes
PR 7977 by Neil Carson, <neil@brini.com>.
 1.50 08-Jul-1999  wrstuden Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
 1.49 05-Mar-1999  bouyer branches: 1.49.2; 1.49.4;
Don't check fs_bsize before the superblock has been swapped if needed.
Check value of sbsize before allocating memory with this value.
 1.48 26-Feb-1999  wrstuden Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.47 10-Feb-1999  bouyer Make sure a buffer obtained from bread() is always brelse()'d in case of
error. Closes PR kern/1448 from Wolfgang Solfrank.
 1.46 04-Dec-1998  bouyer Sanity check a few values in the superblock, to avoid mallocing huge
memory area if we try to mount a corrupted filesystem. Fixes kern/3933.
 1.45 12-Nov-1998  thorpej defopt FFS_EI
 1.44 23-Oct-1998  thorpej branches: 1.44.2;
Use DINODE_SIZE rather than pointer arithmetic.
 1.43 01-Sep-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for FFS inodes.

XXX MFS also comes in here for inodes, and used a different malloc type,
but the structure is the same, so we just use the FFS inode pool.
 1.42 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.41 05-Jul-1998  jonathan * defopt COMPAT_{09,10,11,12,13} and COMPAT_NOMID.
TODO: revisit interaction between native compat and emul compat usage.
 1.40 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.39 22-Jun-1998  sommerfe defopt for options FIFO
 1.38 13-Jun-1998  kleink KNF, mostly of FFS_EI changes.
 1.37 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.36 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.35 05-Jun-1998  kleink Convert fsync vnode operator implementations and usage from the old `waitfor'
argument and MNT_WAIT/MNT_NOWAIT to `flags' and FSYNC_WAIT.
 1.34 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time; other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.33 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.32 18-Feb-1998  thorpej Place a pointer to an array of our vnodeopv_desc *'s in our vfsops
structure, for use by vfs_attach().
 1.31 16-Oct-1997  mjacob In calculating the f_bavail field, don't take 32 bit quantities and
multiply them by 90 (to be divided by 100) and expect them to be sane
for very large values (I was getting a negative 'avail' count).
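
The overflow described in this entry can be illustrated with a small sketch. The helper name bavail_wide() and its parameters are hypothetical; the point is only that widening to 64 bits before the multiply keeps the intermediate product in range.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Blocks available to unprivileged users: the usable share of the
 * data blocks (100 - minfree percent) minus the blocks already in use.
 * With 32-bit arithmetic, dsize * 90 overflows once dsize exceeds
 * roughly 23.8 million blocks, so widen to 64 bits first.
 */
static int64_t
bavail_wide(int32_t dsize, int32_t bfree, int32_t minfree_pct)
{
	assert(minfree_pct >= 0 && minfree_pct <= 100);

	int64_t usable = (int64_t)dsize * (100 - minfree_pct) / 100;
	int64_t used = (int64_t)dsize - bfree;

	return usable - used;
}
```

For example, with 100,000,000 data blocks the product with 90 is 9,000,000,000, which does not fit in a signed 32-bit integer but is handled correctly here.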
 1.30 22-Jul-1997  fvdl Fix messed up RCS Id.
 1.29 07-Jul-1997  fvdl Get locking around inode hashing right.
 1.28 07-Jul-1997  fvdl Oops, I messed up the lock. Reverting it until I have time to fix it,
to avoid people getting trouble after the supscan hits.
 1.27 06-Jul-1997  fvdl Put lock around inode hashing, because getnewvnode or MALLOC might block,
creating race conditions.
 1.26 12-Jun-1997  mrg remove swap configuration.
 1.25 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
union {
	struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
	struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.24 10-Mar-1997  mycroft Just increment the generation count. Using the time is bogus and defeats
fsirand(8).
 1.23 31-Jan-1997  thorpej branches: 1.23.4;
- Add ffs_mountroot to ffs_vfsops.
- Only attempt to mount a root FFS on a DV_DISK class device.
 1.22 22-Dec-1996  cgd branches: 1.22.2;
Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.21 12-Oct-1996  christos revert previous kprintf changes
 1.20 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.19 09-Feb-1996  christos ffs prototypes
 1.18 19-Dec-1995  cgd Fix from Lite-2: when reloading the file system, save fs_maxcluster and
the old summary structure pointers, and recalculate cluster per cyl. grp.
information.
 1.17 11-Nov-1995  mycroft ffs -> ufs
 1.16 18-Jun-1995  cgd branches: 1.16.2;
don't assume the f_fsnamelen is nul-truncated or longer than MFSNAMELEN
 1.15 12-Apr-1995  mycroft Make use of the `fs_clean' field. If it was set when the file system was
mounted or upgraded to r-w, then clear it and set it again later when the
file system is unmounted or downgraded.
 1.14 09-Mar-1995  mycroft copy*str() should use size_t.
 1.13 08-Mar-1995  cgd size for copyinstr should be u_long
 1.12 18-Jan-1995  mycroft Clean up the code to frob mnt_stat a bit.
 1.11 18-Jan-1995  mycroft Turn mountlist into a CIRCLEQ, and handle setting and checking of MNT_ROOTFS
differently.
 1.10 15-Dec-1994  mycroft Call foo_statfs() from a common place when mounting.
 1.9 14-Dec-1994  mycroft Sync with CSRG.
 1.8 28-Oct-1994  mycroft This is not my day.
 1.7 28-Oct-1994  mycroft Fix typo.
 1.6 28-Oct-1994  mycroft For now, limit the maxfilesize to 2^31*bsize-1 in core. This is temporary.
 1.5 28-Oct-1994  mycroft Fix a couple of types in the compatibility code.
 1.4 29-Jun-1994  cgd branches: 1.4.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.3 28-Jun-1994  mycroft Reload mnt_maxsymlinklen, for `fsck -c2'.
 1.2 22-Jun-1994  mycroft Add a couple of missing casts.
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.4.2.1 23-Nov-1994  cgd from mycroft, for patch_05
 1.16.2.2 26-Dec-1995  mycroft Pull in ffs_reload() fix from trunk.
 1.16.2.1 01-Nov-1995  jtc complete ufs -> ffs change (From John Kohl; PR #1403)
 1.22.2.1 14-Jan-1997  thorpej Snapshot of work-in-progress, committed to private branch.

These changes implement machine-independent root device and file system
selection. Notable features:

- All ports behave in a consistent manner regarding root
device selection.
- No more "options GENERIC"; all kernels have the ability
to boot with RB_ASKNAME to select root device and file system
type.
- Root file system type can be wildcarded; a machine-independent
function will try all possible file systems for the selected
root device until one succeeds.
- If the root file system fails to mount, the operator will
be given the chance to select a new root device and file
system type, rather than having the machine simply panic.
- nfs_mountroot() no longer panics if any part of the NFS
mount process fails; it now returns an error, giving the
operator a chance to recover.
- New, more consistent, config(8) grammar. The constructs:

config netbsd swap generic
config netbsd root on nfs

have been replaced with:

config netbsd root on ? type ?
config netbsd root on ? type nfs

Additionally, the operator may select or wildcard root file
system type in the kernel configuration file:

config netbsd root on cd0a type cd9660

config(8) now requires that a "root" specification be
made. "root" may be wired down or wildcarded. "swap" and
"dump" specifications are optional, and follow previous
semantics.

- config(8) has a new "file-system" keyword, used to configure
file systems into the kernel. Eventually, this will be used
to generate the default vfssw[].

- "options NFSCLIENT" is obsolete, and is replaced by
"file-system NFS". "options NFSSERVER" still exists, since
NFS server support is independent of the NFS file system
client.

- sys/arch/<foo>/<foo>/swapgeneric.c is no longer used, and
will be removed; all information is now generated by config(8).

As of this commit, all ports except arm32 have been updated to use
the new setroot(). Only SPARC, i386, and Alpha ports have been
tested at this time. Port masters should test these changes on their
ports, and report any problems back to me.

More changes are on their way, including RB_ASKNAME support in
nfs_mountroot() (to prompt for server address and path) and, potentially,
the ability to select rarp/bootparam or bootp in nfs_mountroot().
 1.23.4.1 12-Mar-1997  is Merge in changes from Trunk
 1.44.2.1 30-May-1999  chs there's a new rule that all vnodes must call uvm_vnp_setsize()
before anyone can possibly access them, so do this in ffs_vget().
 1.49.4.3 02-Aug-1999  thorpej Update from trunk.
 1.49.4.2 04-Jul-1999  chs initialize new struct mount fields in ffs_mountfs().
 1.49.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.49.2.2 20-Dec-1999  he Pull up revision 1.56 (via patch, requested by drochner):
Fix the use of an uninitialized variable. This could be triggered
if the file system to be mounted is a pre-BSD4.4 one (which can
result in the old file system being rejected).
 1.49.2.1 18-Oct-1999  cgd pull up rev 1.53 from trunk (requested by wrstuden):
In spec_close(), call the device's close routine with the vnode
unlocked if the call might block. Force a non-blocking close if
VXLOCK is set. This eliminates a potential deadlock situation, and
should eliminate the dirty buffers on reboot issue.
 1.52.2.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.52.2.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.53.4.3 15-Nov-1999  fvdl Sync with -current
 1.53.4.2 03-Nov-1999  fvdl Give ufs_ihashget an extra argument: the flags passed to vget() for
locking. This way we can avoid locking against ourselves when
ufs_ihashget is called during the flushing of metadata. XXX

Also, comment out a VOP_FSYNC call that I think is now unneeded, and
put a diagnostic printf there to check if this still happens.
 1.53.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.53.2.5 11-Feb-2001  bouyer Sync with HEAD.
 1.53.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.53.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.53.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.53.2.1 20-Oct-1999  thorpej Sync /w trunk.
 1.62.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.67.2.8 06-Oct-2003  itojun make sure to not get flags which are for internal use only from the on-disk
superblock.
Proposed in http://mail-index.netbsd.org/tech-kern/2003/09/06/0005.html
[ticket #80, bouyer]
 1.67.2.7 25-Nov-2001  he Pull up revision 1.85 (requested by lukem):
Pull in enhanced ffs_dirpref() algorithm, which provides a
substantial performance improvement through better locality
between parent/child directories and their files, and by easing
the pressure on the buffer cache for metadata operations.
 1.67.2.6 25-Nov-2001  he Pull up revision 1.84 (requested by lukem):
Change fs_csp[] from being a fixed size to being an array sized
as required. This allows file systems with more than about 15500
cylinder groups (on 32-bit systems) to be used.
 1.67.2.5 25-Nov-2001  he Pull up revision 1.83 (requested by lukem):
Call ffs_sb_swap() with the correct arguments. Fixes problems
with using other-endian file systems.
 1.67.2.4 25-Nov-2001  he Pull up revision 1.82 (requested by lukem):
Correctly refer to fs_clean in error message.
 1.67.2.3 25-Nov-2001  he Pull up revisions 1.76,1.78 (requested by lukem):
In ffs_reload(), copy fs_ronly to the new superblock too.
Clear fs_fmod on rw->ro transition.
 1.67.2.2 14-Dec-2000  he Pull up revision 1.71 (requested by fvdl):
Improve NFS performance, possibly with as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.67.2.1 03-Jul-2000  fvdl pullup from trunk:

Fix a "locking against myself" problem; holding ufs_hashlock
across getnewvnode() could cause a recursive lock if it resulted in
recycling a vnode that was using softdeps.
 1.80.2.15 11-Dec-2002  thorpej Sync with HEAD.
 1.80.2.14 18-Oct-2002  nathanw Catch up to -current.
 1.80.2.13 17-Sep-2002  nathanw Catch up to -current.
 1.80.2.12 01-Aug-2002  nathanw Catch up to -current.
 1.80.2.11 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.80.2.10 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.80.2.9 20-Jun-2002  nathanw Catch up to -current.
 1.80.2.8 17-Apr-2002  nathanw Catch up to -current.
 1.80.2.7 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.80.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.80.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.80.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.80.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.80.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.80.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.81.4.8 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.81.4.7 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.81.4.6 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.81.4.5 16-Mar-2002  jdolecek Catch up with -current.
 1.81.4.4 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.81.4.3 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.81.4.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.81.4.1 03-Aug-2001  lukem update to -current
 1.85.2.3 01-Oct-2001  fvdl Catch up with -current.
 1.85.2.2 26-Sep-2001  fvdl * add a VCLONED vnode flag that indicates a vnode representing a cloned
device.
* rename REVOKEALL to REVOKEALIAS, and add a REVOKECLONE flag, to pass
to VOP_REVOKE
* the revoke system call will revoke all aliases, as before, but not the
clones
* vdevgone is called when detaching a device, so make it use REVOKECLONE
to get rid of all clones as well
* clean up all uses of VOP_OPEN wrt. locking.
* add a few VOPS to spec_vnops that need to do something when it's a
clone vnode (access and getattr)
* add a copy of the vnode vattr structure of the original 'master' vnode
to the specinfo of a cloned vnode. could possibly redirect getattr to
the 'master' vnode, but this has issues with revoke
* add a vdev_reassignvp function that disassociates a vnode from its
original device, and reassociates it with the specified dev_t. to be
used by cloning devices only, in case a new minor is allocated.
* change all direct references in drivers to v_devcookie and v_rdev
to vdev_privdata(vp) and vdev_rdev(vp). for diagnostic purposes
when debugging race conditions that still exist wrt. locking and
revoking vnodes.
* make the locking state of a vnode consistent when passed to
d_open and d_close (unlocked). locked would be better, but has
some deadlock issues
 1.85.2.1 18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.87.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.98.4.2 24-Sep-2003  tron Pull up revision 1.121 via patch (requested by bouyer in ticket #1464):
make sure to not get flags which are for internal use only from the on-disk
superblock.
Proposed in http://mail-index.netbsd.org/tech-kern/2003/09/06/0005.html
 1.98.4.1 10-Jun-2002  tv Pull up revision 1.99 (requested by chs in ticket #227):
allow read-only mounts even if we can't read the last fragment of the fs.
this enables one to recover data from a failing disk (where the read failure
is a hardware problem) while avoiding corrupting the fs further (in the case
where the read failure is due to a misconfiguration).
 1.98.2.3 29-Aug-2002  gehenna catch up with -current.
 1.98.2.2 20-Jun-2002  gehenna catch up with -current.
 1.98.2.1 16-May-2002  gehenna Use devsw APIs for checking validity of major numbers.
 1.118.2.14 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.118.2.13 01-Apr-2005  skrll Sync with HEAD.
 1.118.2.12 08-Mar-2005  skrll Sync with HEAD.
 1.118.2.11 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.118.2.10 17-Jan-2005  skrll Sync with HEAD.
 1.118.2.9 29-Nov-2004  skrll Sync with HEAD.
 1.118.2.8 27-Oct-2004  skrll Remove the struct lwp * arguments from qsync and ufs_checkpath that are
no longer (read: were never) required.
 1.118.2.7 24-Sep-2004  skrll Sync with HEAD.
 1.118.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.118.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.118.2.4 25-Aug-2004  skrll Sync with HEAD.
 1.118.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.118.2.2 03-Aug-2004  skrll Sync with HEAD
 1.118.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.140.2.3 29-May-2004  tron Pull up revision 1.148 (requested by atatat in ticket #393):
Sysctl descriptions under vfs subtree
 1.140.2.2 28-Apr-2004  jmc Pullup rev 1.145 (requested by dbj in ticket #197)

Remove botched superblock upgrade warnings.
There are now alternate non-kernel checks and fixes for this problem.
PR#17910 PR#21283 PR#21404 PR#23925 PR#23926
PR#25138
 1.140.2.1 27-Apr-2004  jdc Pull up revisions 1.141-1.142 (requested by dbj in ticket #185)

Fix problems related to superblock upgrade issues which may be
experienced by -current users from 2003.
 1.160.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.160.2.1 29-Apr-2005  kent sync with -current
 1.162.2.3 30-May-2007  bouyer Pull up following revision(s) (requested by tsutsui in ticket #1798):
sys/ufs/ffs/ffs_vfsops.c: revision 1.201
Fix inconsistent changes in rev 1.153 and 1.154:
Adjust fs->fs_maxfilesize instead of ump->um_maxfilesize
in ffs_oldfscompat_read() because the latter is overridden
by the former after ffs_oldfscompat_read() returned.
Fixes EFBIG errors on read(2) and "exec /sbin/init: error 8"
problem on mac68k after mountroot() on old 4.3BSD UFS created
by the Mkfs tool for MacOS (reported and confirmed on port-mac68k).
 1.162.2.2 10-Mar-2006  tron Pull up following revision(s) (requested by drochner in ticket #1189):
sys/ufs/ffs/ffs_vfsops.c: revision 1.168
fix crash in mount error handling: don't free storage which was not
malloc'd
 1.162.2.1 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in the different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed out by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed out by Chuck Silvers.
 1.165.2.8 04-Feb-2008  yamt sync with head.
 1.165.2.7 21-Jan-2008  yamt sync with head
 1.165.2.6 07-Dec-2007  yamt sync with head
 1.165.2.5 27-Oct-2007  yamt sync with head.
 1.165.2.4 03-Sep-2007  yamt sync with head.
 1.165.2.3 26-Feb-2007  yamt sync with head.
 1.165.2.2 30-Dec-2006  yamt sync with head.
 1.165.2.1 21-Jun-2006  yamt sync with head.
 1.175.2.2 29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.175.2.1 20-Oct-2005  yamt adapt ufs.
 1.178.2.2 01-Mar-2006  yamt sync with head.
 1.178.2.1 15-Jan-2006  yamt sync with head.
 1.179.4.3 01-Jun-2006  kardel Sync with head.
 1.179.4.2 22-Apr-2006  simonb Sync with head.
 1.179.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.179.2.1 09-Sep-2006  rpaulo sync with head
 1.180.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.180.4.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.180.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.180.2.4 03-Sep-2006  yamt sync with head.
 1.180.2.3 11-Aug-2006  yamt sync with head
 1.180.2.2 26-Jun-2006  yamt sync with head.
 1.180.2.1 24-May-2006  yamt sync with head.
 1.181.2.1 19-Jun-2006  chap Sync with head.
 1.182.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.185.4.2 10-Dec-2006  yamt sync with head.
 1.185.4.1 22-Oct-2006  yamt sync with head
 1.185.2.3 01-Feb-2007  ad Sync with head.
 1.185.2.2 12-Jan-2007  ad Sync with head.
 1.185.2.1 18-Nov-2006  ad Sync with head.
 1.190.4.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.190.2.1 04-Jun-2007  riz Pull up following revision(s) (requested by tsutsui in ticket #686):
sys/ufs/ffs/ffs_vfsops.c: revision 1.201
Fix inconsistent changes in rev 1.153 and 1.154:
Adjust fs->fs_maxfilesize instead of ump->um_maxfilesize
in ffs_oldfscompat_read() because the latter is overridden
by the former after ffs_oldfscompat_read() returned.
Fixes EFBIG errors on read(2) and "exec /sbin/init: error 8"
problem on mac68k after mountroot() on old 4.3BSD UFS created
by the Mkfs tool for MacOS (reported and confirmed on port-mac68k).
 1.196.6.22 11-Nov-2007  hannken Add fstrans_mount() to explicitly allocate fstrans_info.
Replace remaining malloc() with kmem_alloc() in vfs_trans.c.

Ok: Andrew Doran <ad@netbsd.org>
 1.196.6.21 25-Oct-2007  ad Fix up mnt_vnodelist handling.
 1.196.6.20 23-Oct-2007  ad Sync with head.
 1.196.6.19 08-Oct-2007  ad Call fstrans_unmount().
 1.196.6.18 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress test.
 1.196.6.17 30-Aug-2007  ad - Mark ffs MPSAFE. There are still a few minor problems and I'm not yet
sure about the snapshot code, but by and large it's there.
- Grab ump->um_lock in a few more places.
 1.196.6.16 28-Aug-2007  ad Revert accidental change (mp->mnt_iflag |= IMNT_MPSAFE).
 1.196.6.15 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.196.6.14 20-Aug-2007  ad Sync with HEAD.
 1.196.6.13 20-Aug-2007  ad softdep locking improvements. It hangs looping in flush_inodedep_deps(),
more work required.
 1.196.6.12 29-Jul-2007  ad Add vfs_destroy() to free mount structures. The specificdata_ref was being
leaked.
 1.196.6.11 15-Jul-2007  ad Sync with head.
 1.196.6.10 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.196.6.9 09-Jun-2007  ad Sync with head.
 1.196.6.8 08-Jun-2007  ad Sync with head.
 1.196.6.7 27-May-2007  ad ffs_sync: vp->v_data can be NULL if the vnode is being recycled.
 1.196.6.6 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.196.6.5 14-Apr-2007  ad ffs_sync: don't try to examine the inode without locking if the vnode is
being freed.
 1.196.6.4 13-Apr-2007  ad Put a per-mount lock around ffs shared data structures, excluding softdep
and quotas. Strategy lifted from FreeBSD.
 1.196.6.3 10-Apr-2007  ad Sync with head.
 1.196.6.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.196.6.1 13-Mar-2007  ad Sync with head.
 1.196.2.3 17-May-2007  yamt sync with head.
 1.196.2.2 15-Apr-2007  yamt sync with head.
 1.196.2.1 24-Mar-2007  yamt sync with head.
 1.197.2.1 11-Jul-2007  mjf Sync with head.
 1.205.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.207.4.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.207.4.1 31-Jul-2007  pooka file ffs_vfsops.c was added on branch matt-mips64 on 2007-07-31 21:14:21 +0000
 1.207.2.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.207.2.3 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.207.2.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.207.2.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.208.4.1 14-Oct-2007  yamt sync with head.
 1.208.2.3 23-Mar-2008  matt sync with HEAD
 1.208.2.2 09-Jan-2008  matt sync with HEAD
 1.208.2.1 06-Nov-2007  matt sync with HEAD
 1.210.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.210.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.210.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.211.2.2 26-Dec-2007  ad Sync with head.
 1.211.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.212.4.3 10-Jan-2008  bouyer Sync with HEAD
 1.212.4.2 08-Jan-2008  bouyer Sync with HEAD
 1.212.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.222.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.222.6.4 28-Sep-2008  mjf Sync with HEAD.
 1.222.6.3 29-Jun-2008  mjf Sync with HEAD.
 1.222.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.222.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.223.4.7 11-Aug-2010  yamt sync with head.
 1.223.4.6 11-Mar-2010  yamt sync with head
 1.223.4.5 16-Sep-2009  yamt sync with head
 1.223.4.4 19-Aug-2009  yamt sync with head.
 1.223.4.3 18-Jul-2009  yamt sync with head.
 1.223.4.2 04-May-2009  yamt sync with head.
 1.223.4.1 16-May-2008  yamt sync with head.
 1.223.2.2 04-Jun-2008  yamt sync with head
 1.223.2.1 18-May-2008  yamt sync with head.
 1.226.2.3 10-Oct-2008  skrll Sync with HEAD.
 1.226.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.226.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.229.2.7 28-Jul-2008  simonb Add support for creating a WAPBL log in the filesystem. Will
create an in-filesystem log on first "mount -o log" if one doesn't
exist, and will then continue to use same log in the future. See
(soon to be added) wapbl(4) for more info.

Adds a new B_CONTIG low-level allocation flag that uses hints in
"struct ffs_inode_ext" to lay out an ffs file's data contiguously.

Thanks to Greg Oster for helping with the design of this and to
Antti Kantee for code review and suggestions.
 1.229.2.6 03-Jul-2008  simonb Sync with head.
 1.229.2.5 30-Jun-2008  simonb During mount, mark the filesystem as clean once we've replayed the
journal.

With much help from Greg Oster.
 1.229.2.4 12-Jun-2008  martin License police
 1.229.2.3 11-Jun-2008  simonb Fix some whitespace and long line niggles.
 1.229.2.2 11-Jun-2008  simonb Comment out the behaviour change that requires "mount -f ..." to mount
a dirty filesystem.
 1.229.2.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.230.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.230.2.1 19-Oct-2008  haad Sync with HEAD.
 1.238.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.238.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.238.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.239.2.5 25-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.239.2.4 03-Oct-2009  snj branches: 1.239.2.4.2; 1.239.2.4.6;
Pull up following revision(s) (requested by bouyer in ticket #1036):
sbin/fsck_ffs/extern.h: revision 1.25 via patch
sbin/fsck_ffs/setup.c: revision 1.88 via patch
sbin/fsck_ffs/wapbl.c: revision 1.4 via patch
sbin/tunefs/tunefs.c: revision 1.41 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.252 via patch
sys/ufs/ffs/ffs_wapbl.c: revision 1.13 via patch
Allow tunefs to clear any type of WAPBL log, not only in-filesystem
ones. Discussed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
--
Do some basic checks of the WAPBL journal, to abort the boot before the
kernel refuses to mount a filesystem read-write (booting a system
multiuser with critical filesystems read-only is bad):
Add a check_wapbl() which will check some WAPBL values in the superblock,
and try to read the journal via wapbl_replay_start() if there is one.
pfatal() if one of these fails (abort boot if in preen mode,
ask "CONTINUE" otherwise). In non-preen mode the bogus journal will
be cleared.
check_wapbl() is always called if the superblock supports WAPBL.
Even if FS_DOWAPBL is not there, there could be flags asking the
kernel to clear or create a log with bogus values which would cause the
kernel to refuse to mount the filesystem.
Discussed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
--
If the WAPBL journal can't be read (ffs_wapbl_replay_start() fails),
mount the filesystem anyway if MNT_FORCE is present.
This still allows booting a system with a corrupted WAPBL on / in
single-user mode, giving a chance to run fsck to fix it.
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
 1.239.2.3 04-Apr-2009  snj branches: 1.239.2.3.4;
Pull up following revision(s) (requested by ad in ticket #655):
sys/ufs/ffs/ffs_vfsops.c: revision 1.245 via patch
fsync:
- atime updates were not being synced.
ffs_sync:
- In some cases the sync vnode was acting like the now-dead /usr/sbin/update.
It was examining vnodes that it should have ignored.
- It would find dirty inodes and try to flush them. Often ffs_fsync()
cheerfully ignored the flush request due to the fsync bug. Such inodes
remained dirty and were repeatedly re-examined by the syncer until
vnode reclaim or system shutdown.
- We were marking our place in the per-mount vnode list even though in
most cases there was no flush to perform. While not a bug, this wasted
CPU cycles because a TAILQ_NEXT would have sufficed.
 1.239.2.2 27-Mar-2009  msaitoh Pull up following revision(s) (requested by ad in ticket #600):
sys/ufs/ffs/ffs_vfsops.c: revision 1.244
ffs_sync: ensure that we *do* flush atime updates periodically.
ffs_update() was eating the flag.
 1.239.2.1 24-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #490):
sys/kern/vfs_wapbl.c: revision 1.23
sys/miscfs/syncfs/sync_subr.c: revision 1.36
sys/miscfs/syncfs/sync_vnops.c: revision 1.26
sys/ufs/ffs/ffs_alloc.c: revision 1.121
sys/ufs/ffs/ffs_vfsops.c: revision 1.242
sys/ufs/ffs/ffs_vnops.c: revision 1.110
PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Presently they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
 1.239.2.4.6.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.239.2.4.2.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.239.2.3.4.1 21-Apr-2010  matt sync to netbsd-5
 1.241.4.2 23-Jul-2009  jym Sync with HEAD.
 1.241.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.257.2.14 19-Nov-2010  uebayasi - Check FFS fragment size to be page-aligned too.
- Hook the new cdev_mmap() method.
 1.257.2.13 25-Oct-2010  uebayasi Fragment size doesn't need to be page-aligned.

Return EINVAL if read-only mount option is not set, other failures
reported as ENXIO.
 1.257.2.12 21-Oct-2010  uebayasi Handle XIP mount error properly.
 1.257.2.11 21-Oct-2010  uebayasi After consideration, put back "xip" mount option.

The internal behavior is totally different between with and without
the option; automatic detection and/or fall-through are not user
friendly. mount(8) returning the "xip" flag is also informative.
 1.257.2.10 07-Oct-2010  uebayasi Check filesystem's bsize/fsize are aligned to PAGE_SIZE, or fail with
ENXIO.
 1.257.2.9 26-Sep-2010  uebayasi ffs_vget: Mark XIP only for VREG vnodes.
 1.257.2.8 17-Aug-2010  uebayasi Sync with HEAD.
 1.257.2.7 27-Jul-2010  uebayasi s/DIOCGPHYSADDR/DIOCGPHYSSEG/ now that it returns struct vm_physseg *,
not paddr_t.
 1.257.2.6 28-May-2010  uebayasi Remove the "xip" option from mount_ffs(8) for simplicity.
 1.257.2.5 30-Apr-2010  uebayasi Sync with HEAD.
 1.257.2.4 28-Apr-2010  uebayasi When mounting a block device as XIP, pass registered struct vm_physseg
* as a cookie from the block device to the caller (== mount code).
struct vm_physseg * will be passed to XIP vnode pager
(genfs_do_getpages_xip()), then converted back to paddr_t.

(My future plan is to pass struct vm_physseg * back to the fault handler,
and to pmap_enter() as is.)
 1.257.2.3 23-Mar-2010  uebayasi Put run-time XIP-specific per-mount data in struct specdev, not struct mount.
 1.257.2.2 23-Feb-2010  uebayasi Check XIP mount condition more nicely.
 1.257.2.1 11-Feb-2010  uebayasi XIP hook for ffs.
 1.258.2.6 31-May-2011  rmind sync with head
 1.258.2.5 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.258.2.4 21-Apr-2011  rmind sync with head
 1.258.2.3 05-Mar-2011  rmind sync with head
 1.258.2.2 03-Jul-2010  rmind sync with head
 1.258.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.263.4.3 09-Feb-2011  bouyer Support MNT_UPDATE for quota2 (especially r/o -> r/w transitions)
 1.263.4.2 08-Feb-2011  bouyer Minimal hacking to make 'options QUOTA' compile again.
 1.263.4.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown,
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.263.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.266.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.269.2.6 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.269.2.5 23-Jan-2013  yamt sync with head
 1.269.2.4 16-Jan-2013  yamt sync with (a bit old) head
 1.269.2.3 30-Oct-2012  yamt sync with head
 1.269.2.2 23-May-2012  yamt sync with head.
 1.269.2.1 17-Apr-2012  yamt sync with head
 1.271.4.3 02-Jun-2012  mrg sync to latest -current.
 1.271.4.2 05-Apr-2012  mrg sync to latest -current.
 1.271.4.1 18-Feb-2012  mrg merge to -current.
 1.275.2.5 27-Aug-2016  bouyer Pull up following revision(s) (requested by martin in ticket #1395):
sys/ufs/ffs/ffs_vfsops.c: revision 1.340
usr.sbin/quot/quot.c: revision 1.34
sbin/fsdb/fsdb.c: revision 1.49
From Michael Plass:
The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.
Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.275.2.4 04-Dec-2014  snj Pull up following revision(s) (requested by manu in ticket #1196):
sys/kern/vfs_mount.c: revision 1.31
sys/ufs/ffs/ffs_vfsops.c: revision 1.302
sys/ufs/ufs/ufs_extattr.c: revision 1.44
Fix use-after-free on failed unmount with extended attribute enabled
When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.
The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart
As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.
 1.275.2.3 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.275.2.2 13-Sep-2012  riz branches: 1.275.2.2.2; 1.275.2.2.4;
Pull up following revision(s) (requested by manu in ticket #553):
sys/ufs/ffs/ffs_vfsops.c: revision 1.278
Stop extended attributes at the appropriate place so that unmount
does not fail with EBUSY on a filesystem with extended attributes enabled.
 1.275.2.1 07-May-2012  riz branches: 1.275.2.1.2;
Pull up following revision(s) (requested by chs in ticket #204):
sys/fs/sysvbfs/sysvbfs_vnops.c: revision 1.44
sys/ufs/ffs/ffs_vfsops.c: revision 1.277
sys/fs/v7fs/v7fs_vnops.c: revision 1.11
sys/ufs/chfs/chfs_vnops.c: revision 1.7
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.61
sys/miscfs/genfs/genfs_io.c: revision 1.54
sys/kern/vfs_wapbl.c: revision 1.52
sys/uvm/uvm_pager.h: revision 1.43
sys/ufs/ffs/ffs_vnops.c: revision 1.121
sys/kern/vfs_subr.c: revision 1.434
sys/fs/msdosfs/msdosfs_vnops.c: revision 1.83
sys/fs/ntfs/ntfs_vnops.c: revision 1.51
sys/fs/udf/udf_subr.c: revision 1.119
sys/miscfs/specfs/spec_vnops.c: revision 1.135
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.103
sys/fs/udf/udf_vnops.c: revision 1.71
sys/ufs/ufs/ufs_readwrite.c: revision 1.104
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
mark all wapbl I/O as BPRIO_TIMECRITICAL.
this is the second part of addressing PR 46325.
 1.275.2.2.4.2 27-Aug-2016  bouyer Pull up following revision(s) (requested by martin in ticket #1395):
sys/ufs/ffs/ffs_vfsops.c: revision 1.340
usr.sbin/quot/quot.c: revision 1.34
sbin/fsdb/fsdb.c: revision 1.49
From Michael Plass:
The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.
Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.275.2.2.4.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.275.2.2.2.2 27-Aug-2016  bouyer Pull up following revision(s) (requested by martin in ticket #1395):
sys/ufs/ffs/ffs_vfsops.c: revision 1.340
usr.sbin/quot/quot.c: revision 1.34
sbin/fsdb/fsdb.c: revision 1.49
From Michael Plass:
The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.
Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.275.2.2.2.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.275.2.1.2.1 01-Nov-2012  matt sync with netbsd-6-0-RELEASE.
 1.278.2.7 03-Dec-2017  jdolecek update from HEAD
 1.278.2.6 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.278.2.5 23-Jun-2013  tls resync from head
 1.278.2.4 25-Feb-2013  tls resync with head
 1.278.2.3 10-Feb-2013  tls Add an accessor -- ufs_maxphys() -- to check the maximum transfer size
for a given UFS mountpoint, and move the code from mount that finds
the underlying disk and resets the mountpoint max transfer size into a
utility function, ufs_update_maxphys().

Add a global serial number that counts disk property changes to which
filesystems are meant to accommodate themselves. Make ufs_maxphys()
check it. This is a sort of flag-polling interface that avoids callbacks
into the filesystem code, but it will require freezing filesystems and
draining in-flight transactions before a mandatory decrease in size
(like attaching a disk with a smaller maximum transfer size as a spare
in a RAIDframe set) can be signalled. An "advisory" decrease, like
finding out set geometry from a RAID controller long after boot and
deciding a smaller transfer size would be optimal, needs no such step.
Still, the "advisory" case is the common one so this is progress.

Make a bit of an example of RAIDframe by making it bump this new
serial number when disks are added to the subsystem. I will attack
one of the hardware RAID drivers (probably arcmsr) next.
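The flag-polling idea in this commit can be illustrated with a small userland sketch. All names here (disk_prop_serial, mount_sketch, mount_maxphys() and so on) are hypothetical and do not come from the branch; the sketch only shows a cached value being refreshed when a global serial number has moved.

```c
#include <stdint.h>

/* Hypothetical global counter, bumped whenever a disk property changes. */
static uint32_t disk_prop_serial;

/* Stand-in for asking the underlying disk for its current limit. */
static uint32_t current_disk_maxphys = 65536;

struct mount_sketch {
	uint32_t seen_serial;	/* serial observed at the last refresh */
	uint32_t maxphys;	/* cached maximum transfer size */
};

/* Called by a (hypothetical) disk driver when a property changes. */
static void
disk_props_changed(uint32_t new_maxphys)
{
	current_disk_maxphys = new_maxphys;
	disk_prop_serial++;
}

/* Accessor: refresh the cached value only if the serial has moved. */
static uint32_t
mount_maxphys(struct mount_sketch *mp)
{
	if (mp->seen_serial != disk_prop_serial) {
		mp->maxphys = current_disk_maxphys;
		mp->seen_serial = disk_prop_serial;
	}
	return mp->maxphys;
}
```

The file system polls on each access, so the driver never has to call back into file system code.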
 1.278.2.2 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.278.2.1 12-Sep-2012  tls Initial snapshot of work to eliminate 64K MAXPHYS. Basically works for
physio (I/O to raw devices); needs more doing to get it going with the
filesystems, but it shouldn't damage data.

All work's been done on amd64 so far. Not hard to add support to other
ports. If others want to pitch in, one very helpful thing would be to
sort out when and how IDE disks can do 128K or larger transfers, and
adjust the various PCI IDE (or at least ahcisata) drivers and wd.c
accordingly -- it would make testing much easier. Another very helpful
thing would be to implement a smart minphys() for RAIDframe along the
lines detailed in the MAXPHYS-NOTES file.
 1.286.2.2 18-May-2014  rmind sync with head
 1.286.2.1 28-Aug-2013  rmind sync with head
 1.296.2.1 10-Aug-2014  tls Rebase.
 1.299.2.4 27-Aug-2016  bouyer Pull up following revision(s) (requested by martin in ticket #1210):
sys/ufs/ffs/ffs_vfsops.c: revision 1.340
usr.sbin/quot/quot.c: revision 1.34
sbin/fsdb/fsdb.c: revision 1.49
From Michael Plass:
The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.
Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
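A minimal sketch of the guard described above, assuming a simplified superblock. The struct, the enum and the FMT_*_SK values are placeholders for illustration, not the kernel's definitions.

```c
#include <stdint.h>

enum ufs_kind_sketch { KIND_UFS1, KIND_UFS2 };

/* Placeholder format codes for this sketch only. */
#define FMT_42_SK (-1)
#define FMT_44_SK 2

struct sb_sketch {
	enum ufs_kind_sketch kind;
	int32_t old_inodefmt;	/* stand-in for fs_old_inodefmt */
};

/* Consult the old-inode-format field only on a UFS1 file system. */
static int
uses_old_inodes(const struct sb_sketch *sb)
{
	return sb->kind == KIND_UFS1 && sb->old_inodefmt < FMT_44_SK;
}
```

On UFS2 the field is ignored, whatever value another operating system left in it.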
 1.299.2.3 28-Jan-2015  martin branches: 1.299.2.3.2;
Pull up following revision(s) (requested by christos in ticket #425):
sys/ufs/ufs/ufs_inode.c: revision 1.91-1.92
sys/ufs/ufs/ufs_vnops.c: revision 1.223-1.224
sys/ufs/ufs/ufs_extern.h: revision 1.76-1.77
sys/ufs/ffs/ffs_vfsops.c: revision 1.303-1.305
Add debugging for mount...
Merge some error returns
Check more errors
Restore apple ufs error handling.
Move and unify indirect block truncate algorithm into a separate function.
PR/39371: Tobias Nygren: Don't fail mounting root if WAPBL log is corrupt.
Patch from Sergio L. Pascual.
 1.299.2.2 29-Dec-2014  martin Pull up following revision(s) (requested by maxv in ticket #352):
sys/ufs/ffs/ffs_vfsops.c: revision 1.301
Limit the superblock size to SBLOCKSIZE, not MAXBSIZE. Otherwise memcpy
will read beyond the allocated buffer.
Discussed a bit on tech-kern@.
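The over-read can be illustrated with a bounds check in front of the copy. This sketch rejects an out-of-range size; whether the real code rejects or clamps is not stated in this log, and SBLOCKSIZE_SKETCH is an assumed buffer size.

```c
#include <stdint.h>
#include <string.h>

#define SBLOCKSIZE_SKETCH 8192	/* assumed size of the source buffer */

/* A claimed size is acceptable only if it fits the allocated buffer. */
static int
sb_size_ok(int32_t claimed_size)
{
	return claimed_size > 0 && (size_t)claimed_size <= SBLOCKSIZE_SKETCH;
}

/* Copy the superblock only after the on-disk size field has been checked. */
static int
copy_superblock(void *dst, const void *src, int32_t claimed_size)
{
	if (!sb_size_ok(claimed_size))
		return -1;	/* would read beyond the source buffer */
	memcpy(dst, src, (size_t)claimed_size);
	return 0;
}
```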
 1.299.2.1 18-Nov-2014  snj Pull up following revision(s) (requested by manu in ticket #246):
sys/kern/vfs_mount.c: revision 1.31
sys/ufs/ffs/ffs_vfsops.c: revision 1.302
sys/ufs/ufs/ufs_extattr.c: revision 1.44
Fix use-after-free on failed unmount with extended attribute enabled
When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.
The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart
As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.
 1.299.2.3.2.1 27-Aug-2016  bouyer Pull up following revision(s) (requested by martin in ticket #1210):
sys/ufs/ffs/ffs_vfsops.c: revision 1.340
usr.sbin/quot/quot.c: revision 1.34
sbin/fsdb/fsdb.c: revision 1.49
From Michael Plass:
The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.
Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.302.2.9 28-Aug-2017  skrll Sync with HEAD
 1.302.2.8 05-Feb-2017  skrll Sync with HEAD
 1.302.2.7 05-Dec-2016  skrll Sync with HEAD
 1.302.2.6 05-Oct-2016  skrll Sync with HEAD
 1.302.2.5 09-Jul-2016  skrll Sync with HEAD
 1.302.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.302.2.3 22-Sep-2015  skrll Sync with HEAD
 1.302.2.2 06-Jun-2015  skrll Sync with HEAD
 1.302.2.1 06-Apr-2015  skrll Sync with HEAD
 1.339.2.7 26-Apr-2017  pgoyette Sync with HEAD
 1.339.2.6 20-Mar-2017  pgoyette Sync with HEAD
 1.339.2.5 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.339.2.4 04-Nov-2016  pgoyette Sync with HEAD
 1.339.2.3 06-Aug-2016  pgoyette Sync with HEAD
 1.339.2.2 21-Jul-2016  pgoyette Actually save the bdev value when it is retrieved, so we can use it
later in a call to bdevsw_release().
 1.339.2.1 20-Jul-2016  pgoyette Adapt machine-independent code to the new {b,c}devsw reference-counting
(using localcount(9)). All callers of {b,c}devsw_lookup() now call
{b,c}devsw_lookup_acquire() which retains a reference on the 'struct
{b,c}devsw'. This reference must be released by the caller once it is
finished with the structure's content (or other data that would disappear
if the 'struct {b,c}devsw' were to disappear).
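The acquire/release pairing described here can be sketched in userland with a plain counter standing in for localcount(9). The table and the *_sketch names are hypothetical; only the calling pattern is the point.

```c
#include <stddef.h>

struct devsw_sketch {
	const char *name;
	unsigned refs;		/* plain counter standing in for localcount(9) */
};

static struct devsw_sketch devsw_table_sketch[] = {
	{ "wd", 0 },
	{ "sd", 0 },
};

/* Look up an entry and take a reference; NULL if the index is out of range. */
static struct devsw_sketch *
devsw_lookup_acquire_sketch(size_t major)
{
	size_t n = sizeof(devsw_table_sketch) / sizeof(devsw_table_sketch[0]);

	if (major >= n)
		return NULL;
	devsw_table_sketch[major].refs++;
	return &devsw_table_sketch[major];
}

/* Drop the reference once the caller is done with the structure. */
static void
devsw_release_sketch(struct devsw_sketch *d)
{
	if (d != NULL && d->refs > 0)
		d->refs--;
}
```

Every successful acquire must be matched by exactly one release, otherwise the entry can never be detached.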
 1.342.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.353.4.3 28-Nov-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1921):

sys/ufs/ffs/ffs_vfsops.c: revision 1.382

ffs_sync: Avoid unlocked access to v_numoutput/v_dirtyblkhd.

Found by lockdoc.

PR kern/57606
 1.353.4.2 11-Apr-2018  martin Pull up following revision(s) (requested by christos in ticket #738):

sys/ufs/ffs/ffs_vfsops.c: revision 1.355

PR/52728: Izumi Tsutsui: "mount -u /dev/ /" triggers kernel panic

Simplify the control flow of the mount code and make sure that the
mountfrom argument can be converted to a block device in the update
case.
 1.353.4.1 04-Feb-2018  martin Pull up following revision(s) (requested by christos in ticket #523):
sys/ufs/ffs/ffs_vfsops.c: revision 1.356
sys/ufs/ufs/ufs_inode.c: revision 1.103
Make sure inode blocks and size are zero when VOP_INACTIVE()
finalises a now unlinked inode.
Counterpart of the check in ffs_newvnode().
Prevent use-after-free where genfs_node_destroy() would destroy
a lock residing in the just freed inode data.
 1.353.2.1 27-Apr-2017  pgoyette Restore all work from the former pgoyette-localcount branch (which is
now abandoned due to a cvs merge botch).

The branch now builds, and installs via anita. There are still some
problems (cgd is non-functional and all atf tests time-out) but they
will get resolved soon.
 1.356.2.4 18-Jan-2019  pgoyette Synch with HEAD
 1.356.2.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.356.2.2 28-Jul-2018  pgoyette Sync with HEAD
 1.356.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.357.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.357.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.357.2.1 10-Jun-2019  christos Sync with HEAD
 1.362.4.5 29-Feb-2020  ad Sync with head.
 1.362.4.4 24-Jan-2020  ad - Put all the namecache stuff back into vnode_impl_t.
- Tidy vfs_cache.c up, finish the comments.
- Finalise how ID information is entered to the cache.
- Handle very small/old systems.
 1.362.4.3 19-Jan-2020  ad Set IMNT_SHRLOOKUP and use it for the in-cache case. Need to check what
more can be done with tmpfs though, it can probably do the whole lookup.
 1.362.4.2 17-Jan-2020  ad vfs_lookup:

- Do the easy component name lookups directly in the namecache without
taking vnode locks nor vnode references (between the start and the leaf /
parent), which seems to largely solve the lock contention problem with
namei(). It needs support from the file system, which has to tell the
name cache about directory permissions (only ffs and tmpfs tried so far),
and I'm not sure how or if it can work with layered file systems yet.
Work in progress.

vfs_cache:

- Make the rbtree operations more efficient: inline the lookup, and key on a
64-bit hash value (32 bits plus 16 bits length) rather than names.

- Take namecache stuff out of vnode_impl, and take the rwlocks, and put them
all together in an nchnode struct which is mapped 1:1 with vnodes. Saves
memory and nicer cache profile.

- Add a routine to help vfs_lookup do its easy component name lookups.

- Report some more stats.

- Tidy up the file a bit.
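Keying on a fixed-width integer instead of the name itself can be sketched as below. The layout (length in the upper half, 32-bit hash in the lower half) and the use of FNV-1a as the hash are assumptions made for illustration; the branch's actual hash function and bit layout are not given in this log.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a, used here only as a stand-in hash function. */
static uint32_t
fnv1a32(const char *s, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)s[i];
		h *= 16777619u;
	}
	return h;
}

/* Pack a 16-bit name length and a 32-bit hash into one 64-bit tree key. */
static uint64_t
name_key(const char *name, uint16_t len)
{
	return ((uint64_t)len << 32) | fnv1a32(name, len);
}
```

Comparing two such keys is a single integer comparison. Names whose keys collide would still need a full string compare, which this sketch does not show.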
 1.362.4.1 17-Jan-2020  ad Sync with head.
 1.362.2.2 07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1934):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383 (patch)
sys/ufs/ffs/ffs_vfsops.c: revision 1.384 (patch)

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fs_fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusive, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
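The locking rule for the modified flag can be sketched with a pthread mutex standing in for um_lock. The struct and function names are hypothetical; the sketch only shows the test and the clear happening inside one critical section.

```c
#include <pthread.h>
#include <stdbool.h>

struct fs_sketch {
	pthread_mutex_t lock;	/* stand-in for um_lock */
	int fmod;		/* stand-in for fs_fmod */
};

/* Test and clear the modified flag as one step; true if a write is needed. */
static bool
fs_test_and_clear_fmod(struct fs_sketch *fs)
{
	bool was_modified;

	pthread_mutex_lock(&fs->lock);
	was_modified = (fs->fmod != 0);
	fs->fmod = 0;
	pthread_mutex_unlock(&fs->lock);
	return was_modified;
}
```

Without the lock, a concurrent setter could mark the file system modified between the test and the clear, and that update would be lost.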
 1.362.2.1 28-Nov-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1770):

sys/ufs/ffs/ffs_vfsops.c: revision 1.382

ffs_sync: Avoid unlocked access to v_numoutput/v_dirtyblkhd.

Found by lockdoc.

PR kern/57606
 1.378.2.4 07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1037):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_vfsops.c: revision 1.384

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fs_fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusive, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.378.2.3 18-Oct-2023  martin Pull up following revision(s) (requested by riastradh in ticket #424):

sys/ufs/ffs/ffs_vfsops.c: revision 1.382

ffs_sync: Avoid unlocked access to v_numoutput/v_dirtyblkhd.

Found by lockdoc.
PR kern/57606
 1.378.2.2 21-Jun-2023  martin Pull up following revision(s) (requested by hannken in ticket #197):

sys/ufs/ffs/ffs_vfsops.c: revision 1.381
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.412

Undo unlock/relock for VOP_IOCTL().
PR kern/57450 (unplugging hung USB disk triggers panic via _vstate_assert)
 1.378.2.1 21-Dec-2022  martin Pull up following revision(s) (requested by chs in ticket #17):

sys/ufs/ffs/ffs_vfsops.c: revision 1.379

ffs: fail mounts requesting ACLs for non-ea UFS2 file systems

For non-ea UFS2 file system, fail mounts that request ACLs rather than
letting the mount succeed only to reject all ACL operations later.

Also fix the messages about the on-disk fs flags conflicting with
the mount options for which type of ACLs to use, and about requesting
both types of ACLs.
 1.382.6.1 02-Aug-2025  perseant Sync with HEAD
 1.138 14-Dec-2021  chs ffs: support extattrs (and thus ACLs) on fifos.
 1.137 18-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.
 1.136 18-Jul-2021  dholland Use macros for the canned parts of device and fifo vnode op tables.

Add GENFS_SPECOP_ENTRIES and GENFS_FIFOOP_ENTRIES macros that contain
the portion of the vnode ops table declaration that is
(conservatively) the same in every fs. Use these in every fs that
supports devices and/or fifos with separate ops tables.

Note that ptyfs works differently (it has one type of vnode with
open-coded dispatch to the specfs code, which I haven't changed in
this commit) and rump/librump/rumpvfs/rumpfs.c has an indirect dynamic
dispatch that already does more or less the same thing, which I also
haven't changed.

Also note that this anticipates a few bits in the next changeset here
and there, and adds missing but unreachable calls in some cases (e.g.
most fses weren't defining whiteout on devices and fifos, but it isn't
reachable there), and it changes parsepath on devices and fifos to
genfs_badop from genfs_parsepath (but it's not reachable there
either).

It appears that devices in kernfs were missing kqfilter, so it's
possible that if you try to use kqueue on /kern/rootdev that it'll
explode.

And finally note that the ops declaration tables aren't
order-dependent. (Other than vop_default_desc has to come first.)
Otherwise this wouldn't work.
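The macro technique can be illustrated with a simplified, name-keyed table. Everything below is a hypothetical stand-in (the real tables are keyed by vnode op descriptors, not strings); it shows a macro supplying the shared entries and why an order-independent lookup makes that safe.

```c
#include <stddef.h>
#include <string.h>

typedef int (*op_fn)(void);

struct op_entry {
	const char *name;
	op_fn fn;
};

static int spec_open_sketch(void)  { return 1; }
static int spec_close_sketch(void) { return 2; }
static int myfs_fsync_sketch(void) { return 3; }

/* Entries assumed to be identical in every file system's device table. */
#define SPECOP_ENTRIES_SKETCH \
	{ "open",  spec_open_sketch }, \
	{ "close", spec_close_sketch }

static const struct op_entry myfs_specop_entries[] = {
	{ "fsync", myfs_fsync_sketch },	/* fs-specific entry */
	SPECOP_ENTRIES_SKETCH,		/* canned entries from the macro */
	{ NULL, NULL }
};

/* Lookup by name, so the position of an entry in the table is irrelevant. */
static op_fn
find_op(const struct op_entry *tbl, const char *name)
{
	for (; tbl->name != NULL; tbl++)
		if (strcmp(tbl->name, name) == 0)
			return tbl->fn;
	return NULL;
}
```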
 1.135 14-Jul-2021  christos Hook up ffsext_strategy to fifos. Pointed out by dholland@
 1.134 29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
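The default behaviour described above, consuming the path up to the first '/', can be sketched as a plain string function. parsepath_sketch() is a hypothetical name and takes only the string, mirroring the note that the rest of struct componentname is not passed.

```c
#include <stddef.h>

/*
 * Return how many bytes of `path` form the next component,
 * i.e. the distance to the first '/' or to the end of the string.
 */
static size_t
parsepath_sketch(const char *path)
{
	size_t n = 0;

	while (path[n] != '\0' && path[n] != '/')
		n++;
	return n;
}
```

A file system with different naming rules could return a different length here, which is the flexibility the new op is meant to allow.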
 1.133 05-Sep-2020  riastradh branches: 1.133.6;
Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.132 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.131 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.130 23-Feb-2020  ad branches: 1.130.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.129 26-May-2017  riastradh branches: 1.129.10; 1.129.16;
Make VOP_RECLAIM do the last unlock of the vnode.

VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.

We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
 1.128 02-Mar-2017  christos ifdef reduction
 1.127 01-Mar-2017  hannken Make compile again without "options WAPBL".

From John D. Baker via current-users@, slightly modified by me.
 1.126 01-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().
 1.125 25-Jul-2014  dholland branches: 1.125.4; 1.125.8; 1.125.12;
Add VOP_FALLOCATE and VOP_FDISCARD to every vnode ops table I can
find.

The filesystem ones all call genfs_eopnotsupp - right now I am only
implementing the plumbing and we can implement fallocate and/or
fdiscard for files later.

The device ones call spec_fallocate (which is also genfs_eopnotsupp)
and spec_fdiscard, which dispatches to the device-level op.

The fifo ones all call vn_fifo_bypass, which also ends up being
EOPNOTSUPP.
 1.124 24-Mar-2014  hannken branches: 1.124.2;
- Make VI_XLOCK, VI_CLEAN and VI_LOCKSHARE private to kern/vfs_*.c.
- Make vwait() static.
- Add vdead_check() to check a vnode for being or becoming dead.

Discussed on tech-kern.

Welcome to 6.99.38
 1.123 23-Jun-2013  dholland branches: 1.123.2;
Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.122 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.121 29-Apr-2012  chs branches: 1.121.2;
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
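The flag translation can be sketched as below. The numeric values are made up for the example (the real FSYNC_* and PGO_* constants live in kernel headers), and the FSYNC_WAIT to PGO_SYNCIO mapping is included as an assumption alongside the lazy case.

```c
/* Hypothetical flag values for this sketch only. */
#define FSYNC_WAIT_SK	0x01
#define FSYNC_LAZY_SK	0x04
#define PGO_SYNCIO_SK	0x10
#define PGO_LAZY_SK	0x20

/* Translate fsync-level flags into page-out flags. */
static int
fsync_to_pgo_sketch(int fsync_flags)
{
	int pgo = 0;

	if (fsync_flags & FSYNC_WAIT_SK)
		pgo |= PGO_SYNCIO_SK;	/* caller wants to wait for the I/O */
	if (fsync_flags & FSYNC_LAZY_SK)
		pgo |= PGO_LAZY_SK;	/* lets the I/O layer lower priority */
	return pgo;
}
```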
 1.120 27-Jun-2011  manu branches: 1.120.2; 1.120.6; 1.120.8;
Implement extended attribute listing for UFS1.

Modify lsextattr(8) so that it does not expect each attribute name to be
prefixed by its length. This enables extattr_list_(file|link|fd) to
return a buffer matching its documentation. This also makes the interface
similar to what Linux and FUSE do, which is nice for interoperability.

Note that since we had no EA implementation supporting listing, we do
not break anything.
 1.119 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.118 27-Apr-2011  hannken branches: 1.118.2;
Cleanup ffs fsync and make devices on wapbl enabled file systems work here:

- Replace the ugly sync loop in ffs_full_fsync() and ffs_vfs_fsync() with
vflushbuf(). This loop is a relic of softdeps and not needed anymore.

- Add ffs_spec_fsync() for device nodes on ffs file systems that calls
spec_fsync() like all other file systems do and then updates the ctime.

Discussed on tech-kern.

Should fix PRs:
PR #41192 wapbl diagnostic panic during cgdconfig
PR #41977 kernel diagnostic assertion "rw_lock_held(&wl->wl_rwlock)" failed
PR #42149 wapbl locking panic if watching DVD
PR #42551 Lockdebug assert in wapbl when running zpool
 1.117 15-Apr-2011  hannken ffs_fsync: no need for wapbl_vptomp() here -- vnode is always VREG.
 1.116 12-Aug-2010  hannken branches: 1.116.2;
ffs_reclaim: don't free an already free inode. This may happen when
ffs_fhtovp() gets a free inode and releases it.
 1.115 28-Jul-2010  hannken ext2fs,ffs: free on disk inodes in the reclaim routine.
Remove now unneeded vnode flag VI_FREEING.

Welcome to 5.99.38.

Ok: Andrew Doran <ad@netbsd.org>
 1.114 29-Mar-2010  pooka Stop exposing fifofs internals and leave only fifo_vnodeop_p visible.
 1.113 04-Nov-2009  hannken branches: 1.113.2; 1.113.4;
Now that softdep has left the tree the only place needing the ffs_lock()
hack is ffs_sync().

- Use the generic lock operations for ffs.
- Change ffs_sync() to omit the vnode lock while suspending.

Reviewed by: Antti Kantee <pooka@netbsd.org>
 1.112 29-Mar-2009  ad fsync:

- atime updates were not being synced.

ffs_sync:

- In some cases the sync vnode was acting like now dead /usr/sbin/update.
It was examining vnodes that it should have ignored.

- It would find dirty inodes and try to flush them. Often ffs_fsync()
cheerfully ignored the flush request due to the fsync bug. Such inodes
remained dirty and were repeatedly re-examined by the syncer until
vnode reclaim or system shutdown.

- We were marking our place in the per-mount vnode list even though in
most cases there was no flush to perform. While not a bug, this wasted
CPU cycles because a TAILQ_NEXT would have sufficed.
 1.111 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.110 22-Feb-2009  ad PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc

- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.

- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.

- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Presently they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.

- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.

- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.

- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
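The flush policy for logging file systems (a burst at a minimum interval of 10 seconds, sooner if the log fills) can be sketched as a small decision function. The struct, the field names and the use of seconds as the time unit are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLUSH_INTERVAL_SEC_SK 10

struct log_sk {
	uint64_t last_flush;	/* time of the last flush, in seconds */
	bool log_full;		/* set when the log has run out of space */
};

/* Decide whether the syncer should flush metadata and the log now. */
static bool
should_flush_sketch(struct log_sk *l, uint64_t now)
{
	if (l->log_full || now - l->last_flush >= FLUSH_INTERVAL_SEC_SK) {
		l->last_flush = now;
		l->log_full = false;
		return true;
	}
	return false;	/* too soon; let updates accumulate */
}
```

This covers only syncer-driven flushes; as the commit notes, an explicit fsync() is unaffected.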
 1.109 01-Feb-2009  ad branches: 1.109.2;
PR kern/40469 5.0_BETA/amd64 INSTALL kernel panics when installing on log-enabled filesystems
PR kern/40470 WAPBL corrupts ext2fs

Don't touch inodes at all unless VOP_FSYNC(). Might fix the ext2fs problem,
I am not sure.
 1.108 28-Dec-2008  christos Don't try to ffs_update VT_NON vnodes
 1.107 22-Dec-2008  ad Add a comment.
 1.106 22-Dec-2008  ad PR kern/40246 current panics when removing swap devices

Someone was smoking crack when they decided to unconditionally OR FSYNC_VFS
into the flags for block devices.
 1.105 21-Dec-2008  ad PR kern/40210 5.0 BETA WAPBL related crash
 1.104 10-Oct-2008  hannken branches: 1.104.2; 1.104.4;
Break a deadlock where one thread has a wapbl transaction, calls VOP_GETPAGES
and wants to busy a page while another thread calls VOP_PUTPAGES on the same
vnode, takes pages busy and wants to start a wapbl transaction.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.103 22-Aug-2008  hannken Add snapshot support for logging ffs file systems.

- Add UFS_WAPBL_BEGIN() / UFS_WAPBL_END() where needed.

- Expunge WAPBL log inodes from snapshots.

- ffs_copyonwrite() and ffs_snapblkfree() must run inside a WAPBL transaction.

- Add ffs_gop_write() as a wrapper around genfs_gop_write() that makes sure
genfs_gop_write() gets always called inside a WAPBL transaction.

- Add VOP_PUTPAGES() flag PGO_JOURNALLOCKED to tag calls to VOP_PUTPAGES()
inside a WAPBL transaction.

Reviewed by: Simon Burge <simonb@netbsd.org>, Greg Oster <oster@netbsd.org>

PGO_JOURNALLOCKED / ffs_gop_write() part presented on tech-kern@.
 1.102 12-Aug-2008  hannken Deny read/write access to snapshot vnodes. We use fss(4) to read from
snapshots. With this policy in place:

- Separate the snapshot vnode lock from the snapshot common lock.
Snapshots no longer need recursive vnode locks.

- Use a mutex (si_snaplock) to serialize creation, deletion, reading and
writing of snapshots.

- Move ffs_read() for snapshots into ffs_snapshot.c.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>

While here change ffs_copyonwrite() to fail requests from pagedaemon that need
to copy-on-write.
 1.101 31-Jul-2008  oster Make MSDOS filesystems work again after WAPBL merge. Fixes a quite
repeatable panic in fstrans_getstate() found while searching for a
different USB bug. Also makes the code somewhat more readable.

Patch from Juergen Hannken-Illjes with a small rearrangement from me.

Approved by: hannken
 1.100 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.99 29-Apr-2008  ad branches: 1.99.2; 1.99.4; 1.99.6;
PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.98 30-Jan-2008  ad branches: 1.98.6; 1.98.8; 1.98.10;
Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
 1.97 25-Jan-2008  ad Remove VOP_LEASE. Discussed on tech-kern.
 1.96 09-Jan-2008  ad Go back to freeing on disk inodes in the inactive routine. It would be
better not to do this, but it rules out potential side effects with softdep.
 1.95 03-Jan-2008  ad Use pool_cache.
 1.94 02-Jan-2008  ad Merge vmlocking2 to head.
 1.93 26-Nov-2007  pooka branches: 1.93.2; 1.93.6;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.92 10-Oct-2007  ad branches: 1.92.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.91 21-Aug-2007  hannken branches: 1.91.2; 1.91.4;
Modify ffs_lock() to take care of the changed v_vnlock. Snapshots do not need
transferlockers() anymore.

From FreeBSD ffs_vnops.c Rev. 1.159

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.90 09-Aug-2007  hannken Move the fstrans-aware lock vnops from ufs to ffs. Other ufs file systems
do not need them.

Ride on 4.99.28
 1.89 20-Jul-2007  pooka branches: 1.89.4; 1.89.6;
In sync, skip over vnodes based on if they are clean rather than
if they have pages.
 1.88 05-Jun-2007  yamt branches: 1.88.2;
improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.87 17-May-2007  hannken fstrans_start() always returns zero, so change its type to void.
 1.86 20-Feb-2007  ad branches: 1.86.4; 1.86.6;
Call genfs_node_destroy() where appropriate.
 1.85 29-Jan-2007  hannken branches: 1.85.2;
Change fstrans enum types to upper case.
No functional change.

From Antti Kantee <pooka@netbsd.org>
 1.84 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.83 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.82 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.81 23-Jul-2006  ad branches: 1.81.4; 1.81.6;
Use the LWP cached credentials where sane.
 1.80 14-May-2006  elad integrate kauth.
 1.79 09-Apr-2006  yamt ffs_gop_size: revert a problematic part of 1.78.
problems reported by Kouichirou Hiratsuka and Jukka Salmi on current-users@.
 1.78 30-Mar-2006  yamt some cleanups after the introduction of GOP_SIZE_MEM flag.
- remove GOP_SIZE_READ/GOP_SIZE_WRITE flags.
they have not been used since the change.
- ufs_balloc_range: remove code which has been no-op since the change.
thanks Konrad Schroder for explaining the original intention of the code.
- ffs_gop_size: don't extend past eof, in the case of GOP_SIZE_MEM.
otherwise genfs_getpages ends up allocating pages past eof unnecessarily.
 1.77 11-Dec-2005  christos branches: 1.77.4; 1.77.6; 1.77.8; 1.77.10; 1.77.12;
merge ktrace-lwp.
 1.76 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.75 09-Sep-2005  yamt branches: 1.75.2;
revert the code to expand putpage requests to block boundary.
because:
- it was incomplete in some cases.
- it can confuse pagedaemon.
see PR/15364 for details.
 1.74 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.73 28-Aug-2005  thorpej Experimental support for extended attributes on UFS1 file systems, using a
backing file per attribute type indexed by inode number to hold the extended
attributes.

This is working pretty well on my test systems, except for the "autostart"
feature. I need someone with a better handle on the VFS locking protocol
to go over that.

This is a work-in-progress. There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.

Adapted from FreeBSD.
 1.72 26-Jul-2005  yamt revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed by Chuck Silvers.
 1.71 21-Jul-2005  yamt ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.
 1.70 15-Jul-2005  thorpej Use ANSI function decls.
 1.69 26-Feb-2005  perry branches: 1.69.2; 1.69.4;
nuke trailing whitespace
 1.68 27-Jan-2005  wrstuden Fix pasto in previous. We only perform the DIOCCACHESYNC call if
FSYNC_CACHE is set, not if FSYNC_WAIT is set.
 1.67 25-Jan-2005  wrstuden Extend fsync_range(2) to support the FDISKSYNC flag, which requests
that the sync be propagated out through the disk drive caches.
 1.66 15-Nov-2003  thorpej branches: 1.66.8; 1.66.10;
Kernel portion of the fsync_range(2) system call. Written by Bill
Studenmund, and contributed by Wasabi Systems, Inc.
 1.65 08-Nov-2003  jdolecek fix uninitialized variable use in previous change (!)
 1.64 08-Nov-2003  dbj always do a full fsync if vp->v_type != VREG
in partial fsync, only use PGO_SYNCIO if FSYNC_WAIT is specified
 1.63 08-Nov-2003  dbj protect use of buf's b_flags with b_interlock
 1.62 08-Nov-2003  dbj protect a few uses of buf's b_flags with b_interlock
 1.61 25-Oct-2003  kleink Remove the present incarnation of FSYNC_DATAONLY use from ffs_fsync() and
ffs_full_fsync(); while it is supposed to hint that the update of _file_
metadata (as in timestamps et al.) may be omitted it doesn't mean the
same for _filesystem_ metadata.
 1.60 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.59 29-Jun-2003  fvdl branches: 1.59.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.58 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.57 16-Apr-2003  fvdl ffs_reclaim may be called while the dinode pointer in the inode structure
is still NULL (in the case of an error in ffs_vget). Check for this
condition before doing a pool_put.
 1.56 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.55 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.54 05-Feb-2003  pk Make the buffer cache code MP-safe.
 1.53 29-Jan-2003  simonb Remove variable that is only assigned to but not referenced.
 1.52 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.51 01-Nov-2002  kristerw Removed unused variables doclusterread and doclusterwrite.
 1.50 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.49 05-May-2002  chs for softdep vnodes, always write together the pages for any block that
might have a dependency, since the accounting doesn't work otherwise.
fixes PRs 15364 16336 16448.
 1.48 31-Dec-2001  thorpej Do not compare an integer to NULL.
 1.47 27-Dec-2001  fvdl The softdep code sometimes uses vfs_vget .. vput. For removals, these
would result in a vop_inactive call for the vnode each time, resulting
in vinvalbuf->fsync. The original softdep code avoided the fsync
in vinvalbuf by not calling it if there were no dirty blocks. This
was changed in NetBSD. Also, flush_inodedeps was changed to mark
the inode as modified so that it would do an inode update and flush the
last one. This combination basically caused a sync write for each removed
file in an rm -rf (showing up delayed from the syncer a lot of the time).

If called from vinvalbuf (FSYNC_RECLAIM), and there were no dirty blocks
or pages to begin with, still do everything as normal, so that possible dirty
blocks in transit to disk are properly waited for, etc, but don't pass
UPDATE_WAIT to VOP_UPDATE, since there is no need for it in that case.
 1.46 08-Nov-2001  chs call VOP_PUTPAGES() directly for vnodes instead of
going through the UVM pager "put" vector.
 1.45 06-Nov-2001  simonb Remove some variables that are set but never used.
 1.44 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.43 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.42 26-Sep-2001  chs branches: 1.42.2;
undo the part of the previous revision about skipping
the put if there are no pages, that seems to cause some problem.
fix another problem with missing an splx(), spotted by enami.
 1.41 26-Sep-2001  chs be sure to call the pager put with page-aligned offsets.
spotted by Nathan Williams.

while I'm here, move an splbio() so that we don't return without
splx()ing it if there's an error, and don't bother calling the
pager put if the vnode has no pages.
 1.40 22-Sep-2001  sommerfeld Add fifo_putpages() placebo so that the vnode's uobj is unlocked.
 1.39 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.38 17-Aug-2001  chs branches: 1.38.2;
add getpages/putpages entries for spec vnodes.
 1.37 22-Jan-2001  jdolecek branches: 1.37.2; 1.37.6;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.36 10-Dec-2000  chs call pgo_flush with (start,end) rather than (start,length).
 1.35 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.34 24-Oct-2000  fvdl Stay at splbio across the VBWAIT loop, as is done elsewhere in the
kernel. Avoids a possible race condition. Pointed out by
enami@netbsd.org, problem reported by deberg@netbsd.org.
 1.33 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.

Implement range fsync for FFS. Note: not yet implemented for the
SOFTDEP case.
 1.32 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.31 29-May-2000  mycroft branches: 1.31.2;
According to Frank, buffers with dependencies *are* left on v_dirtyblks, so
remove the FSYNC_RECLAIM check and force them to be flushed.
 1.30 29-May-2000  mycroft Never call softdep_sync_metadata() in the FSYNC_RECLAIM case. Any pending
blocks are detached from the vnode at this point. When the dependencies are
broken to enable writing the blocks, the vnode will be regenerated. (The only
reason we sync buffers in this case is that they have to be detached from the
vnode.)
 1.29 29-May-2000  mycroft In ffs_fsync(), remove the FSYNC_RECLAIM special case, so that it properly
waits for pending buffers, and doesn't throw away time stamp updates.
 1.28 27-May-2000  thorpej branches: 1.28.2;
sleep() -> tsleep()
 1.27 13-May-2000  perseant Change the semantics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be waited for, or not, accordingly.

Closes PR#8996.
 1.26 30-Mar-2000  augustss Remove register declarations.
 1.25 29-Mar-2000  simonb Don't need to include <sys/conf.h> here.
 1.24 17-Mar-2000  fvdl If we're reclaiming, and there are no dirty blocks, just return.
 1.23 15-Mar-2000  fvdl Revert this back to 2 revisions ago, these checks are done higher up now.
 1.22 14-Mar-2000  fvdl Don't immediately return in ffs_fsync if there appears to be no data
to flush if it's a vnode on a softdep filesystem. softdep_sync_metadata
may still need to do some work.
 1.21 11-Mar-2000  perseant Move vinvalbuf's check for dirty blocks into ffs_fsync, to ensure that
mode and ownership bits are flushed to disk before the vnode is
reclaimed.

The check, introduced in the softdep merge, assumes that if no blocks
are dirty, no file data *or metadata* needs to be flushed to disk. This
is true of ffs, but is not true of lfs, and may not be true of other
filesystems.

Tested by myself and Bill Squier <groo@cs.stevens-tech.edu>.
 1.20 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.19 03-Aug-1999  wrstuden branches: 1.19.2; 1.19.4; 1.19.8;
Add support for fcntl(2) to generate VOP_FCNTL calls. Any fcntl
call with F_FSCTL set and F_SETFL calls generate calls to a new
fileop fo_fcntl. Add genfs_fcntl() and soo_fcntl() which return 0
for F_SETFL and EOPNOTSUPP otherwise. Have all leaf filesystems
use genfs_fcntl().

Reviewed by: thorpej
Tested by: wrstuden
 1.18 24-Mar-1999  mrg branches: 1.18.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.
 1.17 04-Dec-1998  bouyer No need to #include malloc.h here.
 1.16 01-Sep-1998  thorpej branches: 1.16.2;
Use the pool allocator and the "nointr" pool page allocator for FFS inodes.

XXX MFS also comes in here for inodes, and used a different malloc type,
but the structure is the same, so we just use the FFS inode pool.
 1.15 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.14 22-Jun-1998  sommerfe defopt for options FIFO
 1.13 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.12 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.11 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.10 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.9 07-Sep-1996  mycroft Implement poll(2).
 1.8 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.7 11-May-1996  mycroft Change VOP_UPDATE() semantics:
* Make 2nd and 3rd args timespecs, not timevals.
* Consistently pass a Boolean as the 4th arg (except in LFS).
Also, fix ffs_update() and lfs_update() to actually change the nsec fields.
 1.6 09-Feb-1996  christos ffs prototypes
 1.5 14-Dec-1994  mycroft Sync with CSRG.
 1.4 13-Dec-1994  mycroft Turn lease_check() into a vnode op, per CSRG.
 1.3 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.2 22-Jun-1994  mycroft Deallocate the vnode data using the correct type for MFS nodes.
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.16.2.6 02-Jun-1999  chs use the new flags PG_RDONLY and UFP_NORDONLY to ensure that
any page which becomes dirty will have backing store allocated.
 1.16.2.5 30-May-1999  chs redo ffs_getpages() and ffs_putpages() again since vm_page's
blkno field is gone.
 1.16.2.4 29-Apr-1999  chs disable buffer-cache clustering.
 1.16.2.3 25-Feb-1999  chs major overhaul of getpages and putpages functions.
 1.16.2.2 16-Nov-1998  chs fix style nits.
 1.16.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.18.4.2 04-Jul-1999  chs support VOP_BALLOC(). ffs_getpages() and ffs_putpages() are gone
in favor of the genfs versions.
 1.18.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.19.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.19.4.2 26-Oct-1999  fvdl Merge changes in the trickle-sync and softdep code as done by Kirk McKusick
in FreeBSD since the version that we based the branch on. Merging mostly
done by Ethan Solomita <ethan@geocast.com>.

Also, make sure the syncer thread/process isn't active when we're
unmounting a filesystem. This could wreak havoc. XXX should be done
on a per-mountpoint basis, but especially the softdep code would
end up to be a big pile of vfs_busy() calls.
 1.19.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.19.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.19.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.19.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.19.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.28.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.31.2.2 26-Feb-2002  he Pull up revision 1.47 (via patch, requested by fvdl):
Correct a mistake made in the original merge-in of the softdep
code, and fix a problem which caused ffs_fsync to do unneeded
sync writes.
 1.31.2.1 14-Dec-2000  he Pull up revisions 1.33-1.34 (requested by fvdl):
Improve NFS performance, possibly with as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.37.6.7 25-Sep-2002  jdolecek switch over to genfs_kqfilter(), g/c the ufs_kqfilter() code
 1.37.6.6 23-Sep-2002  jdolecek add spec kqfilter vnode op
 1.37.6.5 22-Sep-2002  jdolecek add fifo_kqfilter() to ffs_fifoop_entries[], to switch on
support for kevents on fifos on FFS
 1.37.6.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.37.6.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.37.6.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.37.6.1 10-Jul-2001  lukem add ufs_kqfilter method for vop_kqfilter
 1.37.2.9 11-Nov-2002  nathanw Catch up to -current
 1.37.2.8 16-Jul-2002  nathanw pagedaemon_proc really should be a proc, not a LWP.
 1.37.2.7 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.37.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.37.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.37.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.37.2.3 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.37.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.37.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.38.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.42.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.59.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.59.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.59.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.59.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.59.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.59.2.2 03-Aug-2004  skrll Sync with HEAD
 1.59.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.66.10.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.66.10.1 12-Feb-2005  yamt sync with head.
 1.66.8.1 29-Apr-2005  kent sync with -current
 1.69.4.8 04-Feb-2008  yamt sync with head.
 1.69.4.7 21-Jan-2008  yamt sync with head
 1.69.4.6 07-Dec-2007  yamt sync with head
 1.69.4.5 27-Oct-2007  yamt sync with head.
 1.69.4.4 03-Sep-2007  yamt sync with head.
 1.69.4.3 26-Feb-2007  yamt sync with head.
 1.69.4.2 30-Dec-2006  yamt sync with head.
 1.69.4.1 21-Jun-2006  yamt sync with head.
 1.69.2.2 21-Oct-2005  tron Pull up following revision(s) (requested by yamt in ticket #845):
sys/ufs/ffs/ffs_extern.h: revision 1.45 via patch
sys/ufs/ffs/ffs_vnops.c: revision 1.75 via patch
revert the code to expand putpage requests to block boundary.
because:
- it was incomplete in some cases.
- it can confuse pagedaemon.
see PR/15364 for details.
 1.69.2.1 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in the different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed by Chuck Silvers.
 1.75.2.2 29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.75.2.1 20-Oct-2005  yamt adapt ufs.
 1.77.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.77.12.1 31-Mar-2006  tron Merge 2006-03-31 NetBSD-current into the "peter-altq" branch.
 1.77.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.77.10.2 19-Apr-2006  elad sync with head.
 1.77.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.77.8.4 11-Aug-2006  yamt sync with head
 1.77.8.3 24-May-2006  yamt sync with head.
 1.77.8.2 11-Apr-2006  yamt sync with head
 1.77.8.1 01-Apr-2006  yamt sync with head.
 1.77.6.2 01-Jun-2006  kardel Sync with head.
 1.77.6.1 22-Apr-2006  simonb Sync with head.
 1.77.4.1 09-Sep-2006  rpaulo sync with head
 1.81.6.2 10-Dec-2006  yamt sync with head.
 1.81.6.1 22-Oct-2006  yamt sync with head
 1.81.4.2 01-Feb-2007  ad Sync with head.
 1.81.4.1 18-Nov-2006  ad Sync with head.
 1.85.2.2 17-May-2007  yamt sync with head.
 1.85.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.86.6.1 11-Jul-2007  mjf Sync with head.
 1.86.4.13 09-Oct-2007  ad Sync with head.
 1.86.4.12 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress test.
 1.86.4.11 30-Aug-2007  ad bufcache_lock is sufficient to inspect v_dirtyblkhd, vp->v_interlock is only
needed to modify.
 1.86.4.10 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.86.4.9 20-Aug-2007  ad Sync with HEAD.
 1.86.4.8 01-Jul-2007  ad Minor locking fixes.
 1.86.4.7 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.86.4.6 09-Jun-2007  ad Sync with head.
 1.86.4.5 08-Jun-2007  ad Sync with head.
 1.86.4.4 27-May-2007  ad ffs_sync: vp->v_data can be NULL if the vnode is being recycled.
 1.86.4.3 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.86.4.2 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.86.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.88.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.88.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.89.6.2 20-Jul-2007  pooka In sync, skip over vnodes based on if they are clean rather than
if they have pages.
 1.89.6.1 20-Jul-2007  pooka file ffs_vnops.c was added on branch matt-mips64 on 2007-07-20 16:46:46 +0000
 1.89.4.4 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.89.4.3 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.89.4.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.89.4.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.91.4.1 14-Oct-2007  yamt sync with head.
 1.91.2.3 23-Mar-2008  matt sync with HEAD
 1.91.2.2 09-Jan-2008  matt sync with HEAD
 1.91.2.1 06-Nov-2007  matt sync with HEAD
 1.92.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.92.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.93.6.3 10-Jan-2008  bouyer Sync with HEAD
 1.93.6.2 08-Jan-2008  bouyer Sync with HEAD
 1.93.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.93.2.4 30-Dec-2007  ad Fix remaining problems with ext2fs on this branch.
 1.93.2.3 10-Dec-2007  ad - Don't drain the vnode lock in vclean(); reference counting and XLOCK
should be enough.
- LK_SETRECURSE is gone.
 1.93.2.2 09-Dec-2007  ad LK_SETRECURSE is unused.
 1.93.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.98.10.5 09-Oct-2010  yamt sync with head
 1.98.10.4 11-Aug-2010  yamt sync with head.
 1.98.10.3 11-Mar-2010  yamt sync with head
 1.98.10.2 04-May-2009  yamt sync with head.
 1.98.10.1 16-May-2008  yamt sync with head.
 1.98.8.1 18-May-2008  yamt sync with head.
 1.98.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.98.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.98.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.99.6.1 19-Oct-2008  haad Sync with HEAD.
 1.99.4.3 18-Jul-2008  simonb In ffs_fsync() pass FSYNC_VFS to ffs_full_fsync() for a VBLK vnode so
that the correct "struct mount" is referenced.

Fixes WAPBL for the "mount update" case, so remove the "anti-kern/38057"
hack that was previously there to guard against this.

Based on a suggestion from yamt@. yamt suggests this could be cleaner
than the current VFS_FSYNC method too. Another day...
 1.99.4.2 12-Jun-2008  martin License police
 1.99.4.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.99.2.2 10-Oct-2008  skrll Sync with HEAD.
 1.99.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.104.4.8 17-Jul-2011  riz Pull up following revision(s) (requested by manu in ticket #1645):
lib/libc/sys/Makefile.inc 1.207 via patch
lib/libc/sys/extattr_get_file.2 patch
lib/libpuffs/dispatcher.c 1.34,1.36 via patch
lib/libpuffs/puffs.c 1.107 via patch
lib/libpuffs/puffs.h 1.115,1.118 via patch
sys/fs/puffs/puffs_msgif.h 1.71,1.76 via patch
sys/fs/puffs/puffs_vfsops.c 1.88 via patch
sys/fs/puffs/puffs_vnops.c 1.145,1.154 via patch
sys/kern/vfs_xattr.c 1.24-1.27 via patch
sys/kern/vnode_if.c 1.87 via patch
sys/sys/Makefile 1.133 via patch
sys/sys/extattr.h 1.6 via patch
sys/sys/vnode_if.h 1.81 via patch
sys/ufs/ffs/ffs_vnops.c patch
sys/ufs/ufs/ufs_extattr.c 1.31,1.34 via patch

* support extended attributes
* bump major due to structure growth
* add some spare space
* remove ABI silliness
Support extended attributes.
Fix multiple non compliances in our Linux-like extattr API, and make it
public so that it can be used.
Improve listxattr(2) a bit. It attempted to list both system and user
extended attributes, and it failed if the calling user did not have privilege
for reading system EA. Now we just list user EA and skip system EA if
reading them is not allowed.
Fix bug introduced in previous commit: Do not vrele() a vnode we did not
obtain.
Improve UFS1 extended attributes usability
- autocreate attribute backing file for new attributes
- autoload attributes when issuing extattrctl start
- when autoloading attributes, do not display garbage warning when looking
up entries that got ENOENT
Add a flag to VOP_LISTEXTATTR(9) so that the vnode interface can tell the
filesystem in which format extended attributes shall be listed.
There are currently two formats:
- NUL-terminated strings, used for listxattr(2), this is the default.
- one byte length-prefixed, non NUL-terminated strings, used for
extattr_list_file(2), which is obtained by setting the
EXTATTR_LIST_PREFIXLEN flag to VOP_LISTEXTATTR(9)
This approach avoids the need for converting the list back and forth, except
in libperfuse, since FUSE uses NUL-terminated strings, and the kernel may
have requested EXTATTR_LIST_PREFIXLEN.
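
The two list formats named in the 1.104.4.8 message differ only in how each
attribute name is delimited. The following is a minimal user-space sketch of
converting one into the other, of the kind a consumer such as libperfuse would
need in the opposite direction. The function name nul_list_to_prefixlen() and
its return convention are hypothetical and are not taken from the NetBSD
sources.

```c
#include <stddef.h>
#include <string.h>

/*
 * Convert a buffer of NUL-terminated names ("user\0ab\0") into one-byte
 * length-prefixed, non NUL-terminated entries ("\4user\2ab").
 * Returns the number of bytes written to out, or -1 if a name is longer
 * than 255 bytes, the last name is unterminated, or out is too small.
 * Hypothetical helper for illustration only.
 */
static long
nul_list_to_prefixlen(const char *in, size_t inlen,
    unsigned char *out, size_t outlen)
{
	size_t ipos = 0, opos = 0;

	while (ipos < inlen) {
		const char *end = memchr(in + ipos, '\0', inlen - ipos);
		if (end == NULL)
			return -1;	/* last name is not terminated */
		size_t len = (size_t)(end - (in + ipos));
		if (len > 255 || outlen - opos < len + 1)
			return -1;	/* prefix overflow or no room */
		out[opos++] = (unsigned char)len;	/* one-byte length */
		memcpy(out + opos, in + ipos, len);	/* name, no NUL */
		opos += len;
		ipos += len + 1;	/* skip the name and its NUL */
	}
	return (long)opos;
}
```

Each entry occupies len + 1 bytes in either format, so the converted list is
the same size as the input.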
 1.104.4.7 04-Apr-2009  snj Pull up following revision(s) (requested by add in ticket #655):
sys/ufs/ffs/ffs_vfsops.c: revision 1.245 via patch
sys/ufs/ffs/ffs_vnops.c: revision 1.112 via patch
fsync:
- atime updates were not being synced.
ffs_sync:
- In some cases the sync vnode was acting like the now dead /usr/sbin/update.
It was examining vnodes that it should have ignored.
- It would find dirty inodes and try to flush them. Often ffs_fsync()
cheerfully ignored the flush request due to the fsync bug. Such inodes
remained dirty and were repeatedly re-examined by the syncer until
vnode reclaim or system shutdown.
- We were marking our place in the per-mount vnode list even though in
most cases there was no flush to perform. While not a bug, this wasted
CPU cycles because a TAILQ_NEXT would have sufficed.
 1.104.4.6 24-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #490):
sys/kern/vfs_wapbl.c: revision 1.23
sys/miscfs/syncfs/sync_subr.c: revision 1.36
sys/miscfs/syncfs/sync_vnops.c: revision 1.26
sys/ufs/ffs/ffs_alloc.c: revision 1.121
sys/ufs/ffs/ffs_vfsops.c: revision 1.242
sys/ufs/ffs/ffs_vnops.c: revision 1.110
PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Previously they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
 1.104.4.5 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #395):
sys/ufs/ffs/ffs_vnops.c: revision 1.109
PR kern/40469 5.0_BETA/amd64 INSTALL kernel panics when installing on
log-enabled filesystems
PR kern/40470 WAPBL corrupts ext2fs
Don't touch inodes at all unless VOP_FSYNC(). Might fix the ext2fs problem,
I am not sure.
 1.104.4.4 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #395):
sys/ufs/ffs/ffs_vnops.c: revision 1.108
Don't try to ffs_update VT_NON vnodes
 1.104.4.3 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #395):
sys/ufs/ffs/ffs_vnops.c: revision 1.107
Add a comment.
 1.104.4.2 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #395):
sys/ufs/ffs/ffs_vnops.c: revision 1.106
PR kern/40246 current panics when removing swap devices
Someone was smoking crack when they decided to unconditionally OR FSYNC_VFS
into the flags for block devices.
 1.104.4.1 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #395):
sys/ufs/ffs/ffs_vnops.c: revision 1.105
PR kern/40210 5.0 BETA WAPBL related crash
 1.104.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.104.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.104.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.109.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.113.4.5 31-May-2011  rmind sync with head
 1.113.4.4 21-Apr-2011  rmind sync with head
 1.113.4.3 05-Mar-2011  rmind sync with head
 1.113.4.2 30-May-2010  rmind sync with head
 1.113.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.113.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.113.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.116.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.118.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.120.8.1 07-May-2012  riz Pull up following revision(s) (requested by chs in ticket #204):
sys/fs/sysvbfs/sysvbfs_vnops.c: revision 1.44
sys/ufs/ffs/ffs_vfsops.c: revision 1.277
sys/fs/v7fs/v7fs_vnops.c: revision 1.11
sys/ufs/chfs/chfs_vnops.c: revision 1.7
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.61
sys/miscfs/genfs/genfs_io.c: revision 1.54
sys/kern/vfs_wapbl.c: revision 1.52
sys/uvm/uvm_pager.h: revision 1.43
sys/ufs/ffs/ffs_vnops.c: revision 1.121
sys/kern/vfs_subr.c: revision 1.434
sys/fs/msdosfs/msdosfs_vnops.c: revision 1.83
sys/fs/ntfs/ntfs_vnops.c: revision 1.51
sys/fs/udf/udf_subr.c: revision 1.119
sys/miscfs/specfs/spec_vnops.c: revision 1.135
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.103
sys/fs/udf/udf_vnops.c: revision 1.71
sys/ufs/ufs/ufs_readwrite.c: revision 1.104
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
mark all wapbl I/O as BPRIO_TIMECRITICAL.
this is the second part of addressing PR 46325.
 1.120.6.1 02-Jun-2012  mrg sync to latest -current.
 1.120.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.120.2.2 23-Jan-2013  yamt sync with head
 1.120.2.1 23-May-2012  yamt sync with head.
 1.121.2.3 03-Dec-2017  jdolecek update from HEAD
 1.121.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.121.2.1 25-Feb-2013  tls resync with head
 1.123.2.1 18-May-2014  rmind sync with head
 1.124.2.1 10-Aug-2014  tls Rebase.
 1.125.12.1 21-Apr-2017  bouyer Sync with HEAD
 1.125.8.1 20-Mar-2017  pgoyette Sync with HEAD
 1.125.4.1 28-Aug-2017  skrll Sync with HEAD
 1.129.16.1 29-Feb-2020  ad Sync with head.
 1.129.10.2 21-Apr-2020  martin Sync with HEAD
 1.129.10.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.130.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.133.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.50 30-Dec-2024  hannken Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fs_fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusive, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.49 13-May-2024  msaitoh branches: 1.49.2;
s/contigous/contiguous/ in comment.
 1.48 22-May-2022  andvar branches: 1.48.4;
fix various small typos, mainly in comments.
 1.47 13-May-2022  reinoud Fix typo dallocate -> deallocate
 1.46 11-Apr-2020  jdolecek remove noncompilable WAPBL_DEBUG_INODES

PR kern/49554 by Thomas Klausner
 1.45 17-Jan-2020  ad branches: 1.45.4;
VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.44 01-Jan-2019  hannken branches: 1.44.4; 1.44.6;
Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
 1.43 10-Dec-2018  jdolecek make UFS_WAPBL_JLOCK_ASSERT() #ifdef DIAGNOSTIC, same as the underlying
function KASSERT(), so that it actually does something; fix code using
it to actually pass correct params, so that it compiles

remove UFS_WAPBL_JUNLOCK_ASSERT(), as that is inherently racy (it's
okay on those places if the rwlock is held by other lwp); depend
on the RW_ASSERT()/LOCKDEBUG inside rw_enter() to catch the case
with wapbl rwlock held by current lwp
 1.42 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
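
The silent truncation that the 1.42 message describes can be sketched in
standalone C. The helper names below (uimin_sketch, min_u64, truncating_min)
are hypothetical stand-ins, not the libkern definitions, and the sketch
assumes a 32-bit unsigned int.

```c
#include <stdint.h>

/* Hypothetical stand-in for a min helper defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* A 64-bit variant that compares without narrowing its arguments. */
static uint64_t
min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * What happens when 64-bit values reach the unsigned int helper: each
 * argument is reduced modulo 2^32 first (assuming 32-bit unsigned int),
 * so 2^32 + 5 is compared as 5. The casts only make the implicit
 * conversion visible.
 */
static uint64_t
truncating_min(uint64_t a, uint64_t b)
{
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}
```

With a = 2^32 + 5 and b = 10, truncating_min() yields 5 while min_u64()
yields 10, which is why a generic-looking name was considered misleading.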
 1.41 28-May-2017  hannken branches: 1.41.8; 1.41.10;
No need to call vgone() on the just-created in-filesystem log vnode,
vput() is sufficient.
 1.40 22-Mar-2017  jdolecek move the ffs_sync() after wapbl_log_position() call, since that can still
create delayed writes with MNT_ASYNC when log is created
 1.39 16-Mar-2017  jdolecek need to turn off async during ffs_sync(), otherwise its bwrite() calls are
themselves turned to bdwrite(), creating dirty delayed writes

fixes panic for 'mount -o log,async ...' reported by Masanobu SAITOH
on current-users; fix help by hannken@, thank you
 1.38 10-Mar-2017  jdolecek sync any delayed writes when updating filesystem to log

Addresses PR kern/52056 by Martin Husemann, fix helped by Juergen Hannken, thanks
 1.37 10-Nov-2016  jdolecek branches: 1.37.2;
disable discard when log is enabled to preserve log consistency promise

PR kern/50725
 1.36 10-Nov-2016  jdolecek during truncate with wapbl, register deallocation for upper indirect block
before recursing into lower blocks, to make sure that it will be removed after
all its referenced blocks are removed

fixes 'ffs_blkfree_common: freeing free block' panic triggered by
ufs_truncate_retry() when just the upper indirect block registration failed,
code tried to free the lower blocks again after wapbl flush

problem found by hannken@, thank you
 1.35 02-Oct-2016  christos use __func__ and print the filesystem we are printing the message for.
 1.34 01-Oct-2016  jdolecek allocate wapbl dealloc registration structures via pool, so that there is more
flexibility with limit handling
 1.33 01-Oct-2016  jdolecek wapbl_remove_log(): add missing break; harmless, fallthrough just printed
extra debug message
 1.32 24-Sep-2016  jdolecek fix swapped KASSERT()
 1.31 24-Sep-2016  jdolecek i/o optimization for wapbl flush - only sync superblock and cgs when
they were actually changed
 1.30 28-Mar-2015  maxv branches: 1.30.2;
Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.29 17-Mar-2015  hannken Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.28 11-Jul-2014  christos branches: 1.28.4;
move the flag setting higher to avoid KASSERT (dholland)
 1.27 10-Jul-2014  christos CID 975226: handle error from UFS_WAPBL_BEGIN
 1.26 10-Jul-2014  dholland Fix unchecked UFS_WAPBL_BEGIN. Coverity 975226.
Unfortunately it looks like all we can do on error here is printf.
 1.25 25-Oct-2013  martin branches: 1.25.2;
Turn a few __unused into __diagused
 1.24 20-Oct-2013  htodd Defining needswap where needed.
 1.23 19-Oct-2013  martin Mark a potentially unused variable
 1.22 23-Jun-2013  dholland branches: 1.22.2;
Stick ffs_ in front of the following macros:
fragstoblks()
blkstofrags()
fragnum()
blknum()

to finish the job of distinguishing them from the lfs versions, which
Christos renamed the other day.

I believe this is the last of the overtly ambiguous exported symbols
from ffs... or at least, the last of the ones that conflicted with lfs.
ffs still pollutes the C namespace very broadly (as does ufs) and this
needs quite a bit more cleanup.

XXX: boo on macros with lowercase names. But I'm not tackling that just yet.
 1.21 23-Jun-2013  dholland Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.20 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.19 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.18 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.17 24-Dec-2010  mlelstv branches: 1.17.8; 1.17.18;
For update mounts the root vnode is already in use and we must not
free it. Since the mount persists even when the update fails,
this is not a problem either.
 1.16 23-Dec-2010  mlelstv mount(2) doesn't remove vnodes from the freelist in the error path,
so that they get reused with an invalid pointer to a mount structure.

As a workaround, free the vnodes used to create the in-filesystem journal
immediately.
 1.15 27-Feb-2010  mlelstv branches: 1.15.2;
Store physical block numbers in superblock that point to the journal.
Calculate position of both commit headers correctly for disks with
large sectors.
Correct calculation of circular buffer size.
 1.14 23-Feb-2010  mlelstv Replace individual queries for partition information with
new helper function.
Use this information to query physical sector sizes for WAPBL
instead of hardcoded defaults.
No longer limits physical sector sizes to 512 bytes.
 1.13 13-Sep-2009  bouyer branches: 1.13.2;
Allow tunefs to clear any type of WAPBL log, not only in-filesystem
ones. Discussed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
 1.12 22-Feb-2009  ad branches: 1.12.2;
PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.11 31-Jan-2009  yamt branches: 1.11.2;
0 -> NULL
 1.10 31-Jan-2009  yamt wapbl_log_position: 1 -> MNT_WAIT
 1.9 30-Nov-2008  joerg Split ffs_blkalloc into a frontend that does inode based consistency
checks and a backend that just asserts them. Use the backend in
ffs_wapbl_abort_sync_metadata instead of faking an inode.
 1.8 11-Nov-2008  joerg wapbl_replay_free needs the replay to have been stopped, so make sure
that the changes happen in the right order. Reported by veego@
 1.7 10-Nov-2008  joerg Reduce internals of WAPBL exposed to the rest of the system.
 1.6 08-Sep-2008  joerg branches: 1.6.2; 1.6.4; 1.6.6; 1.6.8; 1.6.12;
Move successful removal of unreferenced inodes under WAPBL_DEBUG to not
spam the console.

OK simon@
 1.5 05-Aug-2008  pooka zu, not zd, to print size_t
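
The conversion fix in 1.5 follows from size_t being an unsigned type, for
which C99 provides the %zu conversion (%zd is for the corresponding signed
type). A minimal sketch, with format_size() as a hypothetical helper name:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a size_t using %zu. Returns what snprintf returns: the number
 * of characters the full output needs, excluding the terminating NUL.
 * Hypothetical helper for illustration only.
 */
static int
format_size(char *buf, size_t buflen, size_t value)
{
	return snprintf(buf, buflen, "%zu", value);
}
```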
 1.4 04-Aug-2008  simonb Only allow WAPBL to operate with UFS2 style superblocks.

Problem reported by Takeshi Nakayama.
 1.3 02-Aug-2008  simonb When checking if there's enough space at the end of a partition,
compare bytes vs bytes, not sectors vs bytes.

Problem discovered and fix tested by Michael Hitch.
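
The unit mismatch fixed in 1.3 can be sketched as follows. This is not the
ffs_wapbl.c code; the function and parameter names are hypothetical, and the
sketch ignores overflow of the multiplication for very large partitions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when needed_bytes fits in the space between the end of the
 * file system and the end of the partition. The free space is known in
 * sectors, so it is converted to bytes before the comparison.
 * All names here are hypothetical.
 */
static bool
log_fits_at_end(uint64_t part_sectors, uint64_t fs_sectors,
    uint32_t sector_size, uint64_t needed_bytes)
{
	if (fs_sectors > part_sectors)
		return false;	/* file system claims more than exists */

	/* bytes vs bytes: scale the sector count by the sector size */
	uint64_t free_bytes = (part_sectors - fs_sectors) * sector_size;

	return free_bytes >= needed_bytes;
}
```

Comparing the raw sector count against needed_bytes would under-report the
available space by a factor of sector_size.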
 1.2 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.1 10-Jun-2008  simonb branches: 1.1.2; 1.1.4;
file ffs_wapbl.c was initially added on branch simonb-wapbl.
 1.1.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.1.4.1 19-Oct-2008  haad Sync with HEAD.
 1.1.2.4 28-Jul-2008  simonb Add support for creating a WAPBL log in the filesystem. Will
create an in-filesystem log on first "mount -o log" if one doesn't
exist, and will then continue to use same log in the future. See
(soon to be added) wapbl(4) for more info.

Adds a new B_CONTIG low-level allocation flag that uses hints in
"struct ffs_inode_ext" to lay out an ffs file's data contiguously.

Thanks to Greg Oster for helping with the design of this and to
Antti Kantee for code review and suggestions.
 1.1.2.3 03-Jul-2008  simonb Store the location of the journal in the superblock. Currently
nothing really uses this, other than replay checking that what is
in the superblock matches what it expects.
 1.1.2.2 12-Jun-2008  martin License police
 1.1.2.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.6.12.1 21-Apr-2010  matt sync to netbsd-5
 1.6.8.1 03-Oct-2009  snj Pull up following revision(s) (requested by bouyer in ticket #1036):
sbin/fsck_ffs/extern.h: revision 1.25 via patch
sbin/fsck_ffs/setup.c: revision 1.88 via patch
sbin/fsck_ffs/wapbl.c: revision 1.4 via patch
sbin/tunefs/tunefs.c: revision 1.41 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.252 via patch
sys/ufs/ffs/ffs_wapbl.c: revision 1.13 via patch
Allow tunefs to clear any type of WAPBL log, not only in-filesystem
ones. Discussed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
--
Do some basic checks of the WAPBL journal, to abort the boot before the
kernel refuses to mount a filesystem read-write (booting a system
multiuser with critical filesystems read-only is bad):
Add a check_wapbl() which will check some WAPBL values in the superblock,
and try to read the journal via wapbl_replay_start() if there is one.
pfatal() if one of these fails (abort boot if in preen mode,
ask "CONTINUE" otherwise). In non-preen mode the bogus journal will
be cleared.
check_wapbl() is always called if the superblock supports WAPBL.
Even if FS_DOWAPBL is not there, there could be flags asking the
kernel to clear or create a log with bogus values which would cause the
kernel to refuse to mount the filesystem.
Discussed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
--
If the WAPBL journal can't be read (ffs_wapbl_replay_start() fails),
mount the filesystem anyway if MNT_FORCE is present.
This still allows booting a system single-user with a corrupted
WAPBL on /, and so getting a chance to run fsck to fix it.
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
 1.6.6.2 03-Mar-2009  skrll Sync with HEAD.
 1.6.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.6.4.3 17-Jan-2009  mjf Sync with HEAD.
 1.6.4.2 28-Sep-2008  mjf Sync with HEAD.
 1.6.4.1 08-Sep-2008  mjf file ffs_wapbl.c was added on branch mjf-devfs2 on 2008-09-28 10:41:06 +0000
 1.6.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.6.2.1 08-Sep-2008  wrstuden file ffs_wapbl.c was added on branch wrstuden-revivesa on 2008-09-18 04:37:05 +0000
 1.11.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.12.2.4 11-Mar-2010  yamt sync with head
 1.12.2.3 16-Sep-2009  yamt sync with head
 1.12.2.2 04-May-2009  yamt sync with head.
 1.12.2.1 22-Feb-2009  yamt file ffs_wapbl.c was added on branch yamt-nfs-mp on 2009-05-04 08:14:38 +0000
 1.13.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.15.2.1 05-Mar-2011  rmind sync with head
 1.17.18.3 03-Dec-2017  jdolecek update from HEAD
 1.17.18.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.17.18.1 25-Feb-2013  tls resync with head
 1.17.8.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.17.8.1 23-Jan-2013  yamt sync with head
 1.22.2.1 18-May-2014  rmind sync with head
 1.25.2.1 10-Aug-2014  tls Rebase.
 1.28.4.4 28-Aug-2017  skrll Sync with HEAD
 1.28.4.3 05-Dec-2016  skrll Sync with HEAD
 1.28.4.2 05-Oct-2016  skrll Sync with HEAD
 1.28.4.1 06-Apr-2015  skrll Sync with HEAD
 1.30.2.4 26-Apr-2017  pgoyette Sync with HEAD
 1.30.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.30.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.30.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.37.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.41.10.3 21-Apr-2020  martin Sync with HEAD
 1.41.10.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.41.10.1 10-Jun-2019  christos Sync with HEAD
 1.41.8.3 18-Jan-2019  pgoyette Synch with HEAD
 1.41.8.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.41.8.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.44.6.1 17-Jan-2020  ad Sync with head.
 1.44.4.1 07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1934):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383 (patch)
sys/ufs/ffs/ffs_vfsops.c: revision 1.384 (patch)

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fs_fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusive, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.45.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.48.4.1 07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1037):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_vfsops.c: revision 1.384

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fs_fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusive, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.49.2.1 02-Aug-2025  perseant Sync with HEAD
 1.73 13-Dec-2024  riastradh sys/ufs/ffs/fs.h: Fix confusing comment about struct fs.

This is the on-disk format, not a purely in-memory data structure
like struct ufsmount. While the on-disk format happens to be copied
into memory, it is misleading to say `in memory' here.
 1.72 13-May-2024  msaitoh branches: 1.72.2;
s/of of/of/ in comment.
 1.71 07-Jan-2023  chs ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:

commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000

This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.

To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.

Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000

One last pass to get all the unsigned comparisons correct.


In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
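
The class of bug addressed by 1.71 can be sketched with a range check on an
inode number. The function names are hypothetical and the code is not from
fs.h. The cast from uint32_t to int32_t is implementation-defined in C, and
the sketch assumes the usual two's-complement wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Buggy form: viewing the values as signed makes any count or inode
 * number at or above 2^31 negative, so a valid small inode number can
 * compare as out of range. Hypothetical illustration only.
 */
static bool
ino_in_range_signed(uint32_t ino, uint32_t ninodes)
{
	return (int32_t)ino < (int32_t)ninodes;
}

/* Fixed form: keep both operands unsigned. */
static bool
ino_in_range_unsigned(uint32_t ino, uint32_t ninodes)
{
	return ino < ninodes;
}
```

With ninodes = 0x90000000, inode 5 is rejected by the signed comparison and
accepted by the unsigned one.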
 1.70 17-Nov-2022  chs branches: 1.70.2;
Restore backward compatibility of UFS2 with previous NetBSD releases by
disabling support in UFS2 for extended attributes (including ACLs).
Add a new variant of UFS2 called "UFS2ea" that does support extended attributes.
Add new fsck_ffs operations "-c ea" and "-c no-ea" to convert file systems
from UFS2 to UFS2ea and vice-versa (both of which delete all existing extended
attributes in the process).
 1.69 18-Sep-2021  christos Change the default for ACLs to be posix1e instead of nfsv4 to match FreeBSD.
Requested by chuq.
 1.68 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.67 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.66 14-Feb-2015  maxv branches: 1.66.18; 1.66.28;
Two typos:
- "preferrably" -> "preferably"
- "overriden" -> "overridden"
No functional change.
 1.65 03-Sep-2013  dholland branches: 1.65.6;
Add the FS_SUJ flag for journaled softupdates from FreeBSD.

This conflicts with our flag for FS_INDEXDIRS. Apparently FreeBSD
changed that arbitrarily on their end when implementing journaled
softupdates, so follow their lead.

Unfortunately, the new value they use for FS_INDEXDIRS conflicts with
our flag FS_DOQUOTA2 for 64-bit quotas. Since the only thing in our
tree that knows about FS_INDEXDIRS is dumpfs (for printing it), leave
FS_INDEXDIRS commented out.

Also add FS_NFS4ACLS from FreeBSD, commented out because it conflicts
with our FS_DOWAPBL, and FS_TRIM.

(We could honor FS_TRIM as we have code for doing that; however I'm
not sure why FreeBSD chose to make it an on-disk flag instead of e.g.
a mount option and it seems problematic to me. In any case, not in
this commit.)

Also see a post I just made in tech-kern about the flag conflicts.
 1.64 23-Jun-2013  dholland branches: 1.64.2;
Stick ffs_ in front of the following macros:
fragstoblks()
blkstofrags()
fragnum()
blknum()

to finish the job of distinguishing them from the lfs versions, which
Christos renamed the other day.

I believe this is the last of the overtly ambiguous exported symbols
from ffs... or at least, the last of the ones that conflicted with lfs.
ffs still pollutes the C namespace very broadly (as does ufs) and this
needs quite a bit more cleanup.

XXX: boo on macros with lowercase names. But I'm not tackling that just yet.
 1.63 23-Jun-2013  dholland Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.62 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.61 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.60 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.59 23-Apr-2012  drochner branches: 1.59.2;
everywhere else it is assumed that the filesystem block size fits into
a 32-bit "int" -- do the cast to quell a compiler warning in a more
sensible way
 1.58 20-Apr-2012  christos one more cast
 1.57 19-Apr-2012  christos Fix signed/unsigned issues.
 1.56 06-Mar-2011  bouyer branches: 1.56.4; 1.56.8;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.55 31-Jan-2010  mlelstv branches: 1.55.4; 1.55.6; 1.55.8;
Fix block shift to work with different device block sizes.

Unlike other filesystems this has some side issues because
the shift values are stored in the superblock and because
userland utilities share the same fsbtodb macros.

-> the kernel now ignores the value stored in the superblock.
-> the macro adaption is only done for defined(_KERNEL) code.
 1.54 28-Jun-2009  ad +/*
+ * NOTE: COORDINATE ON-DISK FORMAT CHANGES WITH THE FREEBSD PROJECT.
+ */
 1.53 12-May-2009  ad Reserve a bit for FS_GJOURNAL (from FreeBSD).
 1.52 23-Feb-2009  dholland typo in comment
 1.51 31-Jul-2008  simonb branches: 1.51.2; 1.51.8;
Be consistent with #define<tab>.
 1.50 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.49 25-Dec-2007  perry branches: 1.49.6; 1.49.10; 1.49.12; 1.49.14; 1.49.16;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.48 23-Nov-2007  dholland branches: 1.48.2; 1.48.6;
Change the fs_clean member of the ffs superblock to be unsigned
(uint8_t instead of int8_t) - this prevents an ugly sign-extension
printing bug as well as formally undefined behavior when you mount an
unclean fs enough times.

From (my own) PR kern/28134; I've been carrying this patch for three
years, long enough to forget about it, and it's had no ill effects in
that time.

reviewed: pooka
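
The sign-extension printing problem mentioned in 1.48 can be illustrated with a minimal sketch. The helper names below are hypothetical and are not part of fs.h; the sketch assumes a 32-bit unsigned int and a field value with the top bit set.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not from fs.h): each widens a one-byte
 * flag field the way a printf-style "%x" conversion would see it.
 */

/* Old style: a signed byte. A value with the top bit set is negative,
 * so widening it fills the upper bits with ones. */
static unsigned int
widen_signed(int8_t fs_clean)
{
	return (unsigned int)fs_clean;
}

/* New style: an unsigned byte widens with zero fill. */
static unsigned int
widen_unsigned(uint8_t fs_clean)
{
	return (unsigned int)fs_clean;
}
```

With the signed declaration, the bit pattern 0x80 is the value -128 and widens to 0xffffff80; with the unsigned declaration it stays 0x80.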
 1.47 24-Sep-2007  pooka branches: 1.47.4;
Fix comment inaccurate from prehistoric times: default MINFREE is 5, not 10
 1.46 11-Dec-2005  christos branches: 1.46.30; 1.46.44; 1.46.46; 1.46.48;
merge ktrace-lwp.
 1.45 26-Feb-2005  perry branches: 1.45.4;
nuke trailing whitespace
 1.44 25-May-2004  hannken branches: 1.44.4; 1.44.6;
Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.43 21-Mar-2004  dsl Rework superblock validation logic to make adding validity tests easier.
Ensure that we don't use the first alternate superblock of a ffsv1
filesystem with 64k blocks (it is in the same place as an ffsv2 sb).
Fixes part of PR kern/24809
 1.42 20-Mar-2004  dsl Change comments - one I wrote earlier wasn't right.
Add a couple of notes about areas of the superblock being reassigned
when ffsv2 was imported.
 1.41 20-Mar-2004  dsl Add a large comment about the balls-up caused by the ffsv2 superblock
not being at 8k - causes all sorts of problems, in particular with
ffsv1 filesystems with 64k blocks, and disks that are reformatted from
ffsv1 to ffsv2 (and v.v.). see also PR kern/24809
 1.40 03-Jan-2004  dbj reintroduce compatibility defines for
fs_headswitch, fs_trkseek, fs_csmask, fs_csshift
fs_postbl, fs_rotbl, cg_blktot, cg_blks, cbtocylno, cbtorpos
 1.39 02-Jan-2004  dbj explicitly pad struct appleufslabel and use __attribute__((__packed__))
since apple put the 64 bit uuid field on a 4 byte boundary
 1.38 02-Jan-2004  dbj add uuid field to apple ufs volume label
 1.37 31-Dec-2003  dbj remove unused cs_numclusters field from struct csum_total
this avoids a potential future bug if it is ever used.
before this fix, fsck_ffs would check and fix this field to be zero
 1.36 31-Dec-2003  dbj update explanatory comment about NOCSPTRS to reflect that fs_active
is now within that region.
no functional change
 1.35 29-Sep-2003  dbj Declare fs_old_flags and fs_flags as unsigned.
This fixes a bug introduced in revision 1.120 of ffs_vfsops dated 2003/09/13
which results in fs_flags having a value of 0x7fffff00 when a superblock
is updated to use the new layout.
Discussed in http://mail-index.netbsd.org/tech-kern/2003/09/28/0003.html
 1.34 21-Aug-2003  dsl Split CGSIZE definition so it can be used with 64bit fpg values.
Split cg_start so magic can be done in libsa when it is known that the
filesystem isn't UFS2.
 1.33 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.32 05-Apr-2003  fvdl branches: 1.32.2;
* Use the old and new time fields in the superblock as well as a few others
to determine if this filesystem was mounted by an older kernel after
having been mounted by a newer one, to avoid some summary mismatches.
* Reinstate support for 4.2 cylinder groups (read-only, as it was before).
 1.31 05-Apr-2003  he Remember to prefix the manually-swapped FS magic numbers with 0x.
 1.30 03-Apr-2003  fvdl Avoid truncation of values in some macros that shift 64 bit values.
From FreeBSD.
 1.29 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.28 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.27 04-Nov-2002  wiz s/sqiud/squid/ in comment, reported by skrueger at europe com.
 1.26 28-Sep-2002  dbj Add support for the Apple UFS variation on ffs
This is the bulk of PR #17345

The general approach is to use a run-time determinable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
 1.25 10-Apr-2002  mycroft Add a special case for nrpos=1 to cbtorpos(). This massively reduces CPU usage
by newfs(8) -- and fsck_ffs(8) on a relatively empty file system. There is
still one divide left in the inner loops, to calculate cylno values.
 1.24 10-Apr-2002  mycroft Use fsbtodb() rather than multiplying by NSPF().
 1.23 07-Jan-2002  lukem revert part of rev 1.14 - #include <ufs/ufs/dinode.h> - because that
makes it MUCH more difficult to reference this file stand-alone.
 1.22 18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.21 19-Sep-2001  lukem - ffs_blkpref() changes:
- don't bother updating fs->fs_cgrotor, since it's actually not used in
the kernel. from Manuel Bouyer in [kern/3389]
- when examining cylinder groups from startcg to startcg-1 (wrapping
at fs->fs_ncg), there's no need to check startcg at the end as well
as the start...
- highlight in the struct fs declaration that fs_cgrotor is UNUSED
 1.20 06-Sep-2001  lukem branches: 1.20.2;
Incorporate the enhanced ffs_dirpref() by Grigoriy Orlov, as found in
FreeBSD (three commits; the initial work, man page updates, and a fix
to ffs_reload()), with the following differences:
- Be consistent between newfs(8) and tunefs(8) as to the options which
set and control the tuning parameters for this work (avgfilesize & avgfpdir)
- Use u_int16_t instead of u_int8_t to keep track of the number of
contiguous directories (suggested by Chuck Silvers)
- Work within our FFS_EI framework
- Ensure that fs->fs_maxclusters and fs->fs_contigdirs don't point to
the same area of memory

The new algorithm has a marked performance increase, especially when
performing tasks such as untarring pkgsrc.tar.gz, etc.

The original FreeBSD commit messages are attached:

=====
mckusick 2001/04/10 01:39:00 PDT
Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>.
His description of the problem and solution follow. My own tests show
speedups on typical filesystem intensive workloads of 5% to 12% which
is very impressive considering the small amount of code change involved.

------

One day I noticed that some file operations run much faster on
small file systems than on big ones. I've looked at the ffs
algorithms, thought about them, and redesigned the dirpref algorithm.

First I want to describe the results of my tests. These results are old
and I have improved the algorithm after these tests were done. Nevertheless
they show how big the performance speedup may be. I have done two file/directory
intensive tests on two OpenBSD systems with the old and new dirpref algorithms.
The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports".
The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release.
It contains 6596 directories and 13868 files. The test systems are:

1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for
test is at wd1. Size of test file system is 8 Gb, number of cg=991,
size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current
from Dec 2000 with BUFCACHEPERCENT=35

2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system
at wd0, file system for test is at wd1. Size of test file system is 40 Gb,
number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k
OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50

You can get more info about the test systems and methods at:
http://www.ptci.ru/gluk/dirpref/old/dirpref.html

Test Results

              tar -xzf ports.tar.gz                  rm -rf ports
mode      old dirpref  new dirpref  speedup   old dirpref  new dirpref  speedup
First system
normal        667          472       1.41        477          331        1.44
async         285          144       1.98        130           14        9.29
sync          768          616       1.25        477          334        1.43
softdep       413          252       1.64        241           38        6.34
Second system
normal        329           81       4.06        263.5         93.5      2.81
async         302           25.7    11.75        112            2.26    49.56
sync          281           57.0     4.93        263           90.5      2.9
softdep       341           40.6     8.4         284            4.76    59.66

"old dirpref" and "new dirpref" columns give a test time in seconds.
speedup is the ratio of the two times, i.e. old dirpref / new dirpref.

------

Algorithm description

The old dirpref algorithm is described in comments:

/*
* Find a cylinder to place a directory.
*
* The policy implemented by this algorithm is to select from
* among those cylinder groups with above the average number of
* free inodes, the one with the smallest number of directories.
*/

A new directory is allocated in a different cylinder group than its
parent directory, resulting in a directory tree that is spread across
all the cylinder groups. This spreading out results in non-optimal
access to the directories and files. When we have a small filesystem
it is not a problem, but when the filesystem is big the performance
degradation becomes very apparent.

What do I mean by a big file system?

1. A big filesystem is a filesystem which occupies 20-30 or more percent
of total drive space, i.e. first and last cylinder are physically
located relatively far from each other.
2. It has a relatively large number of cylinder groups, for example
more cylinder groups than 50% of the buffers in the buffer cache.

The first results in long access times, while the second results in
many buffers being used by metadata operations. Such operations use
cylinder group blocks and on-disk inode blocks. The cylinder group
block (fs->fs_cblkno) contains struct cg, inode and block bit maps.
It is 2k in size for the default filesystem parameters. If new and
parent directories are located in different cylinder groups then the
system performs more input/output operations and uses more buffers.
On filesystems with many cylinder groups, lots of cache buffers are
used for metadata operations.

My solution for this problem is very simple. I allocate many directories
in one cylinder group. I also do some things, so that the new allocation
method does not cause excessive fragmentation and all directory inodes
will not be located at a location far from its file's inodes and data.
The algorithm is:
/*
* Find a cylinder group to place a directory.
*
* The policy implemented by this algorithm is to allocate a
* directory inode in the same cylinder group as its parent
* directory, but also to reserve space for its files inodes
* and data. Restrict the number of directories which may be
* allocated one after another in the same cylinder group
* without intervening allocation of files.
*
* If we allocate a first level directory then force allocation
* in another cylinder group.
*/

My early versions of dirpref gave me good results for a wide range of
file operations and different filesystem capacities except one case:
those applications that create their entire directory structure first
and only later fill this structure with files.

My solution for such and similar cases is to limit the number of
directories which may be created one after another in the same cylinder
group without intervening file creations. For this purpose, I allocate
an array of counters at mount time. This array is linked to the superblock
fs->fs_contigdirs[cg]. Each time a directory is created the counter
increases and each time a file is created the counter decreases. A 60Gb
filesystem with 8mb/cg requires 10kb of memory for the counters array.
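
The counter scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the committed code: the function names and the limit parameter are hypothetical, and the counters are modelled as one byte per cylinder group that saturates at both ends.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of the per-cylinder-group counters.
 * "contigdirs" stands in for the array hung off the superblock
 * (fs->fs_contigdirs in the text); everything else is illustrative.
 */

/* A directory was created in cylinder group cg: count it up,
 * saturating so the one-byte counter cannot wrap. */
static void
note_dir_created(uint8_t *contigdirs, int cg)
{
	if (contigdirs[cg] < UINT8_MAX)
		contigdirs[cg]++;
}

/* A file was created in cylinder group cg: count it down,
 * stopping at zero. */
static void
note_file_created(uint8_t *contigdirs, int cg)
{
	if (contigdirs[cg] > 0)
		contigdirs[cg]--;
}

/* A group may take another directory only while its run of
 * back-to-back directory creations is below the limit. */
static int
cg_accepts_dir(const uint8_t *contigdirs, int cg, uint8_t limit)
{
	return contigdirs[cg] < limit;
}
```

Saturating instead of wrapping keeps a long run of directory creations from resetting the counter to zero and making a crowded group look empty.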

The maxcontigdirs is a maximum number of directories which may be created
without an intervening file creation. I found in my tests that the best
performance occurs when I restrict the number of directories in one cylinder
group such that all its files may be located in the same cylinder group.
There may be some deterioration in performance if all the file inodes
are in the same cylinder group as its containing directory, but their
data partially resides in a different cylinder group. The maxcontigdirs
value is calculated to try to prevent this condition. Since there is
no way to know how many files and directories will be allocated later
I added two optimization parameters in superblock/tunefs. They are:

int32_t fs_avgfilesize; /* expected average file size */
int32_t fs_avgfpdir; /* expected # of files per directory */

These parameters have reasonable defaults but may be tweaked for special
uses of a filesystem. They are only necessary in rare cases like better
tuning a filesystem being used to store a squid cache.

I have been using this algorithm for about 3 months. I have done
a lot of testing on filesystems with different capacities, average
filesize, average number of files per directory, and so on. I think
this algorithm has no negative impact on filesystem performance. It
works better than the default one in all cases. The new dirpref
will greatly improve untarring/removing/copying of big directories,
decrease load on cvs servers and much more. The new dirpref doesn't
speedup a compilation process, but also doesn't slow it down.

Obtained from: Grigoriy Orlov <gluk@ptci.ru>
=====

=====
iedowse 2001/04/23 17:37:17 PDT
Pre-dirpref versions of fsck may zero out the new superblock fields
fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause
panics if these fields were zeroed while a filesystem was mounted
read-only, and then remounted read-write.

Add code to ffs_reload() which copies the fs_contigdirs pointer
from the previous superblock, and reinitialises fs_avgf* if necessary.

Reviewed by: mckusick
=====

=====
nik 2001/04/10 03:36:44 PDT
Add information about the new options to newfs and tunefs which set the
expected average file size and number of files per directory. Could do
with some fleshing out.
=====
 1.19 03-Sep-2001  lukem deprecate fs_fscktime; we never used it.

in an effort to maintain compatibility with freebsd/openbsd/whatever,
i'm attempting to get the superblock format in sync, and freebsd uses
the int32_t at this position for `fs_pendinginodes'.

if we ever decide to implement fscktime functionality, we'll:
a) make sure to liaise with the other projects to reserve the same
spare field
b) actually implement the code this time ...

(this is also preparing us for other changes, like the new dirpref code)
 1.18 02-Sep-2001  lukem Incorporate fix by iedowse @ FreeBSD to allow disks with large numbers of
cylinder groups to work correctly, with minor modifications by me to work
with our FFS_EI code. From the FreeBSD commit message:

The ffs superblock includes a 128-byte region for use by temporary
in-core pointers to summary information. An array in this region
(fs_csp) could overflow on filesystems with a very large number of
cylinder groups (~16000 on i386 with 8k blocks). When this happens,
other fields in the superblock get corrupted, and fsck refuses to
check the filesystem.

Solve this problem by replacing the fs_csp array in 'struct fs'
with a single pointer, and add padding to keep the length of the
128-byte region fixed. Update the kernel and userland utilities
to use just this single pointer.

With this change, the kernel no longer makes use of the superblock
fields 'fs_csshift' and 'fs_csmask'. Add a comment to newfs/mkfs.c
to indicate that these fields must be calculated for compatibility
with older kernels.

Reviewed by: mckusick
 1.17 31-Aug-2001  lukem More fixes from FreeBSD (with changes):
- Cast blk argument to lblktosize() to (off_t), to prevent 32 bit overflow.
whilst almost every use in ffs used this for small blknos, there are
potential issues, and it's safer this way. (as discussed with chuq)
- Use 64bit (off_t) math to calculate if we have hit our freespace() limit.
Necessary for coherent results on filesystems bigger than 0.5Tb.
- Use lblktosize() in blksize() and dblksize(), to make it obvious what's
happening
- Remove sblksize() - nothing uses it
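
The 32-bit overflow that the (off_t) cast in 1.17 guards against can be shown with a small sketch. The function names are hypothetical and int64_t stands in for off_t; the real lblktosize() is a macro over struct fs and is not reproduced here.

```c
#include <stdint.h>

/*
 * Hypothetical illustration of block-number-to-byte-offset conversion.
 * bshift is the log2 of the block size (13 for 8k blocks).
 */

/* Shift performed in 32 bits: the high bits are lost before the
 * result is widened. */
static int64_t
blk_to_size_32(uint32_t blk, int bshift)
{
	uint32_t truncated = blk << bshift;

	return (int64_t)truncated;
}

/* Widen first, then shift: the full 64-bit result is kept. */
static int64_t
blk_to_size_64(uint32_t blk, int bshift)
{
	return (int64_t)blk << bshift;
}
```

For block number 2^20 with 8k blocks, the byte offset is 2^33, which the 32-bit version truncates to 0.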
 1.16 30-Aug-2001  lukem some improvements from freebsd/openbsd
- replace the unused fs_headswitch and fs_trkseek with fs_id[2], bringing
our struct fs closer to that in freebsd & openbsd (& solaris FWIW)
- dumpfs: improve warning message when cpc == 0
 1.15 30-Aug-2001  lukem - minor whitespace and comments cleanup
- replace "filesystem" with "file system"
- fix spelo (from freebsd)
 1.14 27-Jul-2001  lukem - multiple include protection
- pull in <ufs/ufs/dinode.h> for ufs_daddr_t
- mark a few fields as being "UNUSED" (because they are)
 1.13 23-Feb-2001  eeh branches: 1.13.2; 1.13.6;
Use int32_t for on-disk time_t values.
 1.12 15-Nov-1999  fvdl branches: 1.12.4;
Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.11 28-Jul-1998  drochner branches: 1.11.14; 1.11.16; 1.11.20;
The fragtbl[], inside[] and around[] variables are needed by "fsck",
so we can't put them inside "#ifdef _KERNEL".
Put declarations inside .c files where needed to preserve namespace.
 1.10 28-Jul-1998  mycroft Omit some externs if not _KERNEL.
 1.9 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time; other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
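
A minimal sketch of the byte-order handling described in 1.9, assuming a per-mount "needs swap" flag. The helper names are hypothetical; the actual routines live in ffs_bswap.c and are not reproduced here.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value (portable helper,
 * written out here so the sketch has no platform dependencies). */
static uint32_t
sketch_swap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |
	       ((v & 0x0000ff00u) << 8)  |
	       ((v & 0x00ff0000u) >> 8)  |
	       ((v & 0xff000000u) >> 24);
}

/* Read a 32-bit on-disk field, swapping only when the file system
 * was written in the other byte order. "needswap" is a hypothetical
 * stand-in for a flag kept per mount. */
static uint32_t
sketch_fs_read32(uint32_t on_disk, int needswap)
{
	return needswap ? sketch_swap32(on_disk) : on_disk;
}
```

Swapping at the point of use, instead of rewriting whole buffers, leaves data in the buffer cache in its on-disk order.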
 1.8 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.7 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.6 12-Apr-1995  mycroft Make use of the `fs_clean' field. If it was set when the file system was
mounted or upgraded to r-w, then clear it and set it again later when the
file system is unmounted or downgraded.
 1.5 14-Dec-1994  mycroft Sync with CSRG.
 1.4 13-Dec-1994  mycroft Sync with CSRG.
 1.3 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.11.20.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.11.20.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.11.16.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.11.14.2 12-Mar-2001  bouyer Sync with HEAD.
 1.11.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.4.8 25-Nov-2001  he Pull up revision 1.21 (requested by lukem):
Mark fs_cgrotor as unused.
 1.12.4.7 25-Nov-2001  he Pull up revision 1.20 (requested by lukem):
Pull in enhanced ffs_dirpref() algorithm, which provides a
substantial performance improvement through better locality
between parent/child directories and their files, and by easing
the pressure on the buffer cache for metadata operations.
 1.12.4.6 25-Nov-2001  he Pull up revision 1.19 (requested by lukem):
Deprecate unused fs_fscktime.
 1.12.4.5 25-Nov-2001  he Pull up revision 1.18 (requested by lukem):
Change fs_csp[] from being a fixed size to being an array sized
as required. This allows file systems with more than about 15500
cylinder groups (on 32-bit systems) to be used.
 1.12.4.4 25-Nov-2001  he Pull up revision 1.17 (requested by lukem):
Prevent 32-bit overflows by converting to 64-bit quantities in
appropriate places.
 1.12.4.3 25-Nov-2001  he Pull up revision 1.16 (requested by lukem):
Replace unused fs_headswitch/trkseek with fs_id.
 1.12.4.2 25-Nov-2001  he Pull up revisions 1.14-1.15 (requested by lukem):
Mark a few fields as unused. Multiple include protection.
Also typo corrections.
 1.12.4.1 25-Nov-2001  he Pull up revision 1.13 (requested by lukem):
Use int32_t for on-disk time_t representation.
Convert %q_ to %ll_ in print formats.
 1.13.6.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.13.6.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.13.6.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.13.6.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.13.6.1 03-Aug-2001  lukem update to -current
 1.13.2.8 11-Nov-2002  nathanw Catch up to -current
 1.13.2.7 18-Oct-2002  nathanw Catch up to -current.
 1.13.2.6 17-Apr-2002  nathanw Catch up to -current.
 1.13.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.13.2.4 11-Jan-2002  nathanw More catchup.
 1.13.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.13.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.13.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.20.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.32.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.32.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.32.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.32.2.1 03-Aug-2004  skrll Sync with HEAD
 1.44.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.44.4.1 29-Apr-2005  kent sync with -current
 1.45.4.3 21-Jan-2008  yamt sync with head
 1.45.4.2 07-Dec-2007  yamt sync with head
 1.45.4.1 27-Oct-2007  yamt sync with head.
 1.46.48.1 06-Oct-2007  yamt sync with head.
 1.46.46.2 09-Jan-2008  matt sync with HEAD
 1.46.46.1 06-Nov-2007  matt sync with HEAD
 1.46.44.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.46.44.1 02-Oct-2007  joerg Sync with HEAD.
 1.46.30.1 09-Oct-2007  ad Sync with head.
 1.47.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.47.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.48.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.48.2.1 26-Dec-2007  ad Sync with head.
 1.49.16.1 19-Oct-2008  haad Sync with HEAD.
 1.49.14.2 03-Jul-2008  simonb Store the location of the journal in the superblock. Currently
nothing really uses this, other than replay checking that what is
in the superblock matches what it expects.
 1.49.14.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.49.12.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.49.10.4 11-Mar-2010  yamt sync with head
 1.49.10.3 18-Jul-2009  yamt sync with head.
 1.49.10.2 16-May-2009  yamt sync with head
 1.49.10.1 04-May-2009  yamt sync with head.
 1.49.6.1 28-Sep-2008  mjf Sync with HEAD.
 1.51.8.2 23-Jul-2009  jym Sync with HEAD.
 1.51.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.51.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.55.8.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem; a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more, on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota datas, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.
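
A rough sketch of the layout described above, to show how a lookup walks
a hash chain by file offset. All struct and field names, the hash function
and the fixed table size are assumptions for illustration; this is not the
real on-disk format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_HASHSIZE 4	/* assumed; the real size depends on block size */

struct sk_header {	/* hypothetical header at offset 0 */
	uint64_t free_head;		/* offset of first free entry, 0 = none */
	uint64_t hash[SK_HASHSIZE];	/* first in-use entry per bucket, 0 = none */
};

struct sk_entry {	/* hypothetical quota entry */
	uint64_t next;	/* offset of the next entry in the list, 0 = end */
	uint32_t id;	/* uid or gid */
	uint32_t pad;
	uint64_t limit;
	uint64_t usage;
};

/* Look up `id`; returns the entry's offset within `file`, or 0 if absent. */
static uint64_t
sk_lookup(const unsigned char *file, uint32_t id)
{
	struct sk_header h;
	struct sk_entry e;
	uint64_t off;

	assert(file != NULL);
	memcpy(&h, file, sizeof(h));
	for (off = h.hash[id % SK_HASHSIZE]; off != 0; off = e.next) {
		memcpy(&e, file + off, sizeof(e));
		if (e.id == id)
			return off;
	}
	return 0;
}
```

Offset 0 can serve as the end-of-list marker here because the header
always occupies the start of the file.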

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.55.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.55.4.1 21-Apr-2011  rmind sync with head
 1.56.8.1 29-Apr-2012  mrg sync to latest -current.
 1.56.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.56.4.2 23-Jan-2013  yamt sync with head
 1.56.4.1 23-May-2012  yamt sync with head.
 1.59.2.4 03-Dec-2017  jdolecek update from HEAD
 1.59.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.59.2.2 23-Jun-2013  tls resync from head
 1.59.2.1 25-Feb-2013  tls resync with head
 1.64.2.1 18-May-2014  rmind sync with head
 1.65.6.1 06-Apr-2015  skrll Sync with HEAD
 1.66.28.1 20-Apr-2020  bouyer Sync with HEAD
 1.66.18.1 21-Apr-2020  martin Sync with HEAD
 1.70.2.1 13-May-2023  martin Pull up following revision(s) (requested by chs in ticket #160):

usr.sbin/makefs/ffs/ffs_alloc.c: revision 1.31
sbin/tunefs/tunefs.c: revision 1.58
sbin/fsck_ffs/setup.c: revision 1.105
sbin/fsck_ffs/pass5.c: revision 1.56
usr.sbin/makefs/ffs.c: revision 1.74
usr.sbin/makefs/ffs/mkfs.c: revision 1.42
usr.sbin/makefs/Makefile: revision 1.40
sys/ufs/ffs/fs.h: revision 1.71
sbin/fsdb/fsdb.c: revision 1.54
sbin/resize_ffs/resize_ffs.c: revision 1.58
sbin/fsck_ffs/pass4.c: revision 1.29
usr.sbin/makefs/ffs/ffs_extern.h: revision 1.9
sbin/newfs/mkfs.c: revision 1.133
sys/ufs/ffs/ffs_alloc.c: revision 1.172
sbin/fsck_ffs/pass1b.c: revision 1.24
usr.sbin/dumpfs/dumpfs.c: revision 1.68
sys/ufs/ffs/ffs_extern.h: revision 1.88
usr.sbin/quotacheck/quotacheck.c: revision 1.51
sys/ufs/ffs/ffs_subr.c: revision 1.54
sbin/fsck_ffs/main.c: revision 1.91
sbin/fsck_ffs/pass1.c: revision 1.63

ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:
commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000
This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.
To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.
Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000
One last pass to get all the unsigned comparisons correct.

In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
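
The class of bug being fixed can be shown with a minimal sketch. The
function names are hypothetical and the check is a simplified stand-in,
not code from the commits listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Buggy form: a 32-bit inode number carried in a signed type. */
static bool
sk_ino_valid_signed(int32_t ino, uint32_t maxino)
{
	/* Inode numbers at or above 2^31 look negative and are rejected. */
	return ino >= 0 && (int64_t)ino < (int64_t)maxino;
}

/* Fixed form: the same check with the inode number kept unsigned. */
static bool
sk_ino_valid_unsigned(uint32_t ino, uint32_t maxino)
{
	return ino < maxino;
}
```

Both functions agree for small inode numbers and only diverge once the
top bit is set, which is why the problem stayed hidden until file systems
grew large.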
 1.72.2.1 02-Aug-2025  perseant Sync with HEAD
 1.12 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.11 04-Mar-2007  christos branches: 1.11.40; 1.11.50; 1.11.56;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.10 11-Dec-2005  christos branches: 1.10.26;
merge ktrace-lwp.
 1.9 26-Feb-2005  perry branches: 1.9.4;
nuke trailing whitespace
 1.8 15-Oct-2003  hannken branches: 1.8.8; 1.8.10;
Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then syncs the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.7 02-Apr-2003  fvdl branches: 1.7.2;
Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.6 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
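
A minimal sketch of the idea, assuming hypothetical names: the in-core
address type widens to 64 bits while the on-disk field stays 32 bits, so
conversions happen at the boundary.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t sk_daddr_t;	/* in-core block address, 64 bits */

struct sk_ondisk_inode {	/* hypothetical; on-disk layout unchanged */
	int32_t db[12];		/* block pointers stay 32 bits on disk */
};

/* Widen an on-disk pointer for in-core use. */
static sk_daddr_t
sk_load_db(const struct sk_ondisk_inode *ip, int i)
{
	assert(i >= 0 && i < 12);
	return (sk_daddr_t)ip->db[i];
}

/* Narrow an in-core address; returns -1 if it does not fit on disk. */
static int
sk_store_db(struct sk_ondisk_inode *ip, int i, sk_daddr_t v)
{
	assert(i >= 0 && i < 12);
	if (v < INT32_MIN || v > INT32_MAX)
		return -1;
	ip->db[i] = (int32_t)v;
	return 0;
}
```

The range check in sk_store_db is an addition for the sketch; it makes the
narrowing step explicit rather than silently truncating.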
 1.5 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.4 18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.3 22-Jun-2000  fvdl branches: 1.3.2; 1.3.4; 1.3.8;
Copyright changed.
 1.2 15-Nov-1999  fvdl branches: 1.2.2; 1.2.6;
Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.1 19-Oct-1999  fvdl branches: 1.1.2;
file softdep.h was initially added on branch fvdl-softdep.
 1.1.2.2 26-Oct-1999  fvdl Merge changes in the trickle-sync and softdep code as done by Kirk McKusick
in FreeBSD since the version that we based the branch on. Merging mostly
done by Ethan Solomita <ethan@geocast.com>.

Also, make sure the syncer thread/process isn't active when we're
unmounting a filesystem. This could wreak havoc. XXX should be done
on a per-mountpoint basis, but especially the softdep code would
end up to be a big pile of vfs_busy() calls.
 1.1.2.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.2.6.1 23-Jun-2000  fvdl Update for changed copyright notice.
 1.2.2.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.3.8.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.3.4.2 11-Dec-2002  thorpej Sync with HEAD.
 1.3.4.1 08-Jan-2002  nathanw Catch up to -current.
 1.3.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.3.2.1 22-Jun-2000  bouyer file softdep.h was added on branch thorpej_scsipi on 2000-11-20 18:11:47 +0000
 1.7.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.7.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.7.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.7.2.1 03-Aug-2004  skrll Sync with HEAD
 1.8.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.8.8.1 29-Apr-2005  kent sync with -current
 1.9.4.1 03-Sep-2007  yamt sync with head.
 1.10.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.11.56.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.11.50.1 03-Mar-2009  skrll Sync with HEAD.
 1.11.40.1 04-May-2009  yamt sync with head.
 1.7 07-Jan-2025  andvar s/remaing/remaining/ s/containg/containing/, mainly in comments.
 1.6 09-Feb-2024  andvar branches: 1.6.2;
fix spelling mistakes, mainly in comments and log messages.
 1.5 11-Dec-2005  christos merge ktrace-lwp.
 1.4 24-Feb-2004  wiz parameter with two es. From Peter Postma.
 1.3 05-Jul-2001  toshii branches: 1.3.22;
Fix typo. s/extention/extension/
 1.2 10-Apr-1999  perseant branches: 1.2.14;
Change the reference to "newlfs" in the CHANGES file to the correct "newfs_lfs"
 1.1 15-Mar-1999  perseant branches: 1.1.4;
New CHANGES file that describes briefly all nontrivial changes made to
the LFS since the 4.4lite2 code was merged into NetBSD.

TODO updated to remove everything marked DONE in 4.4, and add in a list
of more current things to do.

Get rid of comments about the cleaner syscall code and missing fragment
support from README.
 1.1.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.2.14.1 24-Aug-2001  nathanw Catch up with -current.
 1.3.22.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.22.2 18-Sep-2004  skrll Sync with HEAD.
 1.3.22.1 03-Aug-2004  skrll Sync with HEAD
 1.6.2.1 02-Aug-2025  perseant Sync with HEAD
 1.3 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.2 08-Jun-2013  dholland branches: 1.2.10;
Split the definitions suitable for userland out of ulfs_inode.h into
lfs_inode.h. Since fsck_lfs, newfs_lfs, and lfs_cleanerd want to reuse
the inode structure for their own internal use, and some of them share
parts of the kernel code as well, the best way forward is to provide a
relatively sanitized header that doesn't bring in stray material.

Shuffle a few other definitions around so that lfs_inode.h depends
only on lfs.h.

Install lfs_inode.h into /usr/include.
 1.1 12-Jun-1998  cgd branches: 1.1.188; 1.1.198;
Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.1.198.2 03-Dec-2017  jdolecek update from HEAD
 1.1.198.1 23-Jun-2013  tls resync from head
 1.1.188.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.10.1 22-Sep-2015  skrll Sync with HEAD
 1.3 15-Mar-1999  perseant New CHANGES file that describes briefly all nontrivial changes made to
the LFS since the 4.4lite2 code was merged into NetBSD.

TODO updated to remove everything marked DONE in 4.4, and add in a list
of more current things to do.

Get rid of comments about the cleaner syscall code and missing fragment
support from README.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.2 06-Jun-2013  dholland branches: 1.2.2; 1.2.10;
Update the line-count standings.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.2.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.10.1 06-Jun-2013  yamt file README.wc was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.2.2.2 23-Jun-2013  tls resync from head
 1.2.2.1 06-Jun-2013  tls file README.wc was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.10 11-Dec-2005  christos merge ktrace-lwp.
 1.9 01-Apr-2005  perseant branches: 1.9.2;
Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.8 26-Feb-2005  perseant branches: 1.8.2;
Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
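
The bfree reporting rule from the list above amounts to a clamp. A minimal
sketch, with a hypothetical helper name and plain 64-bit counters assumed:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp the free-block count before reporting it to userland.
 * bfree is signed, so internally it may dip below zero.
 */
static int64_t
sk_report_bfree(int64_t bfree, int64_t dsize)
{
	assert(dsize >= 0);
	if (bfree < 0)
		return 0;
	if (bfree > dsize)
		return dsize;
	return bfree;
}
```

Without the lower clamp, a slightly negative count read back as an
unsigned quantity would appear as a huge amount of free space.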
 1.7 23-Feb-2003  perseant branches: 1.7.2; 1.7.8; 1.7.10; 1.7.12;
Fix a buffer overflow bug in the LFS_UBC case that manifested itself
either as a mysterious UVM error or as "panic: dirty bufs". Verify
maximum size in lfs_malloc.

Teach lfs_updatemeta and lfs_shellsort about oversized cluster blocks from
lfs_gop_write.

When unwiring pages in lfs_gop_write, deactivate them, under the theory
that the pagedaemon wanted to free them last we knew.
 1.6 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.5 13-Jul-2001  perseant Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.4 17-Nov-2000  perseant branches: 1.4.2; 1.4.4; 1.4.6;
Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
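
The counting rule described above can be sketched as follows. The buffer
structure is a pared-down, hypothetical stand-in that keeps only the
b_bufsize and b_bcount names from the text; the function name is also
invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_buf {
	bool locked;	/* held until LFS flushes it */
	bool malloced;	/* data came from malloc, not the buffer cache */
	long b_bufsize;	/* allocated size */
	long b_bcount;	/* valid byte count, may be smaller */
};

/* Recount buffers that are really unavailable to getnewbuf(). */
static void
sk_countlocked(const struct sk_buf *bufs, size_t n, long *count, long *bytes)
{
	size_t i;

	assert(count != NULL && bytes != NULL);
	*count = 0;
	*bytes = 0;
	for (i = 0; i < n; i++) {
		if (!bufs[i].locked || bufs[i].malloced)
			continue;	/* never count malloced buffers */
		(*count)++;
		*bytes += bufs[i].b_bufsize;	/* not b_bcount */
	}
}
```

A DEBUG build could compare the result against the running counters and
complain on a mismatch.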
 1.3 15-Mar-1999  perseant branches: 1.3.8;
New CHANGES file that describes briefly all nontrivial changes made to
the LFS since the 4.4lite2 code was merged into NetBSD.

TODO updated to remove everything marked DONE in 4.4, and add in a list
of more current things to do.

Get rid of comments about the cleaner syscall code and missing fragment
support from README.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.3.8.1 22-Nov-2000  bouyer Sync with HEAD.
 1.4.6.1 03-Aug-2001  lukem update to -current
 1.4.4.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.4.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.7.12.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.7.10.1 29-Apr-2005  kent sync with -current
 1.7.8.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.7.2.2 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.7.2.1 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.8.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.9.2.1 21-Jun-2006  yamt sync with head.
 1.212 04-Nov-2025  perseant Remove su_flags array, replacing it with a new flag SEGUSE_READY.
Segments progress from having su_nbytes==0 to SEGUSE_EMPTY to SEGUSE_READY
to clean, progressing to the next step after a checkpoint.
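
The progression can be pictured as a small state machine advanced once per
checkpoint. The enum, the function name and the rule that live bytes reset
a segment to the start are assumptions for illustration; only the order of
the steps comes from the message above.

```c
#include <assert.h>
#include <stdint.h>

enum sk_segstate {
	SK_NONE,	/* no flag set yet */
	SK_EMPTY,	/* stands in for SEGUSE_EMPTY */
	SK_READY,	/* stands in for SEGUSE_READY */
	SK_CLEAN
};

/* Advance one segment by one checkpoint. */
static enum sk_segstate
sk_checkpoint(enum sk_segstate s, uint32_t nbytes)
{
	if (nbytes != 0)
		return SK_NONE;	/* assumed: live data restarts the count */
	switch (s) {
	case SK_NONE:
		return SK_EMPTY;
	case SK_EMPTY:
		return SK_READY;
	case SK_READY:
	case SK_CLEAN:
		return SK_CLEAN;
	}
	assert(0 && "unknown segment state");
	return SK_NONE;
}
```

Under these assumptions a segment with no live bytes becomes clean after
three checkpoints.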
 1.211 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
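
As an illustration of the kind of numbers LFCNFILESTATS reports, here is a
user-space sketch that counts gaps in a list of block addresses. The
struct, the function name and the definition of gap distance (absolute
difference from the expected next address, one address unit per block) are
assumptions, not the fcntl's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sk_filestats {
	size_t   nblocks;	/* number of direct blocks */
	size_t   ngaps;		/* discontinuities between neighbours */
	uint64_t gapdist;	/* total distance covered by the gaps */
};

/* daddr[i] is the disk address of the file's i-th direct block. */
static struct sk_filestats
sk_measure_gaps(const int64_t *daddr, size_t n)
{
	struct sk_filestats st = { n, 0, 0 };
	size_t i;

	assert(daddr != NULL || n == 0);
	for (i = 1; i < n; i++) {
		int64_t expect = daddr[i - 1] + 1;

		if (daddr[i] == expect)
			continue;
		st.ngaps++;
		st.gapdist += (uint64_t)(daddr[i] > expect ?
		    daddr[i] - expect : expect - daddr[i]);
	}
	return st;
}
```

A fully coalesced file has ngaps == 0, which is the state
LFCNREWRITEFILE aims for and LFCNSCRAMBLE deliberately destroys.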
 1.210 17-Sep-2025  perseant Add routines to check freelist consistency if compiled with DEBUG and
conditional on a kernel variable manipulated via sysctl.
Add checks before and after each routine that modifies the free list.
#if 0 a section of lfs_vfree() that was intended to keep the free list ordered
but instead corrupted it.
 1.209 15-Sep-2025  perseant If setting the head (or tail) of the inode free list to LFS_UNUSED_INUM, also
set the tail (resp. head) to LFS_UNUSED_INUM, as the list is now empty.

Add a check to ensure that lfs_valloc_fixed will always terminate, even
if the free list should contain a loop. Extend the ifile at the end if it
is empty, to match the assumption of lfs_valloc() that the free list is
never empty.

Needed for roll-forward.
 1.208 28-Mar-2020  christos Comment out some of the CTASSERTS for lint until I fix lint.
 1.207 21-Mar-2020  riastradh Avoid misaligned access to lfs64 on-disk records in memory.

lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from

struct foo64 {
...
} __aligned(4) __packed;

union foo {
struct foo64 f64;
...
};

to

struct foo64 {
...
};

union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;

if we really want to take advantage of 64-bit memory accesses.

However, the __aligned(4) __packed must remain on the union
because:

2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.
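
A small, compilable illustration of the attribute combination, assuming a
GCC- or Clang-compatible compiler and spelling the attributes out instead
of using the __aligned and __packed macros. The record layout is
hypothetical and much simpler than the real lfs64 directory entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 64-bit inode number inside an entry that is only 32-bit aligned. */
struct sk_dirent64 {
	uint64_t ino;
	uint16_t reclen;
	uint8_t  type;
	uint8_t  namlen;
} __attribute__((aligned(4), packed));

/* A 32-bit field in front places the entry at a 4-byte offset. */
struct sk_block {
	uint32_t hdr;
	struct sk_dirent64 de;
};

static uint64_t
sk_get_ino(const struct sk_block *b)
{
	assert(b != NULL);
	/* The compiler knows de.ino may be only 4-byte aligned here. */
	return b->de.ino;
}
```

Because the reduced alignment is part of the type, the compiler can pick
access sequences that are safe on strict-alignment machines.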
 1.206 21-Mar-2020  riastradh CTASSERT lfs on-disk structure sizes.
 1.205 23-Feb-2020  riastradh Teach LFS_ORPHAN_NEXTFREE about lfs64.
 1.204 10-Jan-2019  martin branches: 1.204.4; 1.204.6;
Update comment (overlooked in r1.179).
From José Luis Rodríguez García in PR kern/53849.
 1.203 26-Jul-2017  maya branches: 1.203.2; 1.203.4;
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar

XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
 1.202 05-Jun-2017  maya Move definition of IN_ALLMOD near the flag it's a mask for.

Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
 1.201 01-Apr-2017  maya branches: 1.201.6;
switch lfs_dirops to condvar (from mtsleep)
 1.200 01-Apr-2017  maya switch lfs_sleepers to condvar (from mtsleep)
 1.199 20-Jun-2016  dholland branches: 1.199.2; 1.199.4;
Massedit u_int{8,16,32,64}_t to uint{8,16,32,64}_t. This effectively
merges ufs/dinode.h 1.25.
 1.198 19-Jun-2016  dholland we are actually synced with ufs/dinode.h 1.24 and ufs/dir.h 1.25.
 1.197 26-Nov-2015  dholland Update now-lying comment.
 1.196 15-Oct-2015  dholland For now bitflip the lfs64 magic number.

This will be unflipped when the format is finalized - right now I
still have pending changes to the superblock in mind (to reduce the
number of redundant fields) so anything created now is not future-
proof. However, the code's also nearing being ready for testing; so
I'm doing this before turning it on as a precaution.
 1.195 15-Oct-2015  dholland Move stuff from struct ulfsmount to struct lfs.
 1.194 03-Oct-2015  dholland Add lfs_checkword type for reading checksum data out of structures.
This is always uint32_t, but having a name for it both makes things
clearer and avoids confusion about whether it should be 32 or 64 bit.

Note: deployed in only one place (that was erroneously tagged
ondisk32) so far.
 1.193 03-Oct-2015  dholland Add an IINFO struct, which is like the FINFO struct but for the inode
blocks portion of the segment summary.

A segment summary block begins with a header (SEGSUM); the rest of the
block contains FINFO structures describing file blocks growing upward
from the bottom (after the header), and IINFO structures describing
inode blocks grown downward from the end of the block. (When they meet
the segment is full regardless of how many blocks might be left.)

IINFO contains just a block number, and until now this information was
handled by just using uint32_t*; switching to a structure will make
the code a lot easier to read, and also make it easier to have 32-bit
and 64-bit versions without making a mess.

This commit just adds the structures and accessors; they'll be
deployed into the code in subsequent commits.
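
The two-ended layout described above can be modelled with a pair of
cursors. This is a sketch under stated assumptions: struct ex_sumblock, its
fields and the ex_* functions are hypothetical and track byte offsets only;
the real SEGSUM, FINFO and IINFO layouts are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one summary block. */
struct ex_sumblock {
    size_t size;    /* total bytes in the block */
    size_t lo;      /* next free byte above the header and FINFOs */
    size_t hi;      /* lowest byte used by IINFOs (grows downward) */
};

static void
ex_sum_init(struct ex_sumblock *sb, size_t size, size_t hdrsize)
{
    assert(hdrsize <= size);
    sb->size = size;
    sb->lo = hdrsize;   /* FINFOs start right after the header */
    sb->hi = size;      /* IINFOs start at the end of the block */
}

/* Reserve n bytes for a FINFO; returns its offset, or -1 if full. */
static long
ex_add_finfo(struct ex_sumblock *sb, size_t n)
{
    if (sb->hi - sb->lo < n)
        return -1;
    sb->lo += n;
    return (long)(sb->lo - n);
}

/* Reserve n bytes for an IINFO; returns its offset, or -1 if full. */
static long
ex_add_iinfo(struct ex_sumblock *sb, size_t n)
{
    if (sb->hi - sb->lo < n)
        return -1;
    sb->hi -= n;
    return (long)sb->hi;
}
```

Once lo and hi meet, both functions fail, which corresponds to the segment
being treated as full regardless of how many blocks remain.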
 1.192 21-Sep-2015  dholland Oops, I forgot to make the atime in the 64-bit IFILE 64 bits.
Correct that. Incompatible change, but no LFS64 volumes can have been
created yet.
 1.191 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
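
Storing the inode number as two 32-bit values might look like the sketch
below. The structure, its field names and the low/high split are
assumptions for illustration; the log does not say how the real
LFS_DIRHEADER accessors combine the halves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit directory entry header. */
struct ex_dirheader64 {
    uint32_t dh_ino_lo;     /* low 32 bits of the inode number */
    uint32_t dh_ino_hi;     /* high 32 bits of the inode number */
    uint16_t dh_reclen;
    uint8_t  dh_type;
    uint8_t  dh_namlen;
};

/* No member is wider than 32 bits, so 32-bit alignment suffices. */
static_assert(_Alignof(struct ex_dirheader64) == 4,
    "header needs only 32-bit alignment");

static uint64_t
ex_dir_getino(const struct ex_dirheader64 *dh)
{
    return ((uint64_t)dh->dh_ino_hi << 32) | dh->dh_ino_lo;
}

static void
ex_dir_setino(struct ex_dirheader64 *dh, uint64_t ino)
{
    dh->dh_ino_lo = (uint32_t)ino;          /* keep the low half */
    dh->dh_ino_hi = (uint32_t)(ino >> 32);  /* keep the high half */
}
```

Callers that go through such accessors never touch a 64-bit field
directly, so the entry can sit at any 32-bit boundary in a directory block.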
 1.190 20-Sep-2015  dholland Clean up struct lfs_dirtemplate.
 1.189 15-Sep-2015  dholland Remove struct lfs_direct (no longer used) and update the big block
comment about directories.
 1.188 15-Sep-2015  dholland Add an accessor function for directory names.
 1.187 15-Sep-2015  dholland Move the header part of struct lfs_direct to its own structure.
(lfs_dirheader)

Take the opportunity to improve the directory generation code in
make_lfs.c. (Everything else was unaffected by virtue of using
accessor functions.)
 1.186 15-Sep-2015  dholland Add and use accessor functions for more of the directory entry fields.
 1.185 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.184 01-Sep-2015  dholland Comments on directories.

This includes a description of the struct direct byteswap horrors that
ought to be propagated to ufs/ufs.
 1.183 01-Sep-2015  dholland The ifile's inode number is constant. (it is always 1)

Therefore, storing the value in the superblock and reading it out
again is silly and offers the opportunity for it to become corrupted.
So, don't do that (most of the code already didn't) and use the
existing constant instead. Initialize new 32-bit superblocks with
the value for the sake of old userland programs, but don't keep the
value in the 64-bit superblock at all.

(approved by Margo Seltzer)
 1.182 01-Sep-2015  dholland Make the inode fields in the 64-bit superblock 64 bits wide.
Reasoning as before.

Note that I am not going through and checking for 64->32 truncations
in inode numbers; I'm sure there are quite a few, but that's a project
for later.
 1.181 01-Sep-2015  dholland Add byteswapping to the dinode accessors.

This prevents regressions in the ulfs code when switching to the new
accessors. Note that while adding byteswapping to the other accessors
is straightforward, I haven't done it yet; and that also is not enough
to make LFS_EI work, because there are places lying around that bypass
the accessors for one reason and another and all of them need to be
updated. That is going to have to wait for a later day as LFS_EI is
not on the critical path right now.
 1.180 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.179 12-Aug-2015  dholland Make the inode number in the 64-bit dinode 64 bits wide, like the
other lfs64 on-disk inode numbers; I've been doing that since this is
a new format and we may as well take the opportunity. This does assume
that more than 4 billion files on a single volume becomes desirable;
but for an average file size of 10K all that takes is a 40 TB volume,
and it's not that hard to make one of those these days if you want to
badly enough.
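
The 40 TB figure can be verified with a line of arithmetic. The sketch
below assumes "4 billion" means 2^32 files and "10K" means 10 KiB, which
gives exactly 40 TiB; ex_volume_bytes() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* 2^32 files at 10 KiB each is exactly 40 TiB. */
static_assert((UINT64_C(1) << 32) * (10 * 1024) == (UINT64_C(40) << 40),
    "2^32 files * 10 KiB == 40 TiB");

/* Bytes needed to hold nfiles files of avg_bytes each (no overhead). */
static uint64_t
ex_volume_bytes(uint64_t nfiles, uint64_t avg_bytes)
{
    return nfiles * avg_bytes;
}
```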
 1.178 12-Aug-2015  dholland Provide 32-bit and 64-bit versions of FINFO.

This also entailed sorting out part of struct segment, as that
contains a pointer into the current FINFO data.
 1.177 12-Aug-2015  dholland Make 32-bit and 64-bit versions of SEGSUM.
Also fix some of the FINFO handling as it's closely entangled.
 1.176 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.175 12-Aug-2015  dholland Make 32-bit and 64-bit versions of CLEANERINFO.

XXX: while this is written to disk, it seems like much of it would
XXX: be better set up as a commpage shared with the cleaner.
 1.174 12-Aug-2015  dholland Widen several of the fields of BLOCK_INFO to 64 bits.

Keep the old BLOCK_INFO as BLOCK_INFO_70, and version the fcntls that
use it.

Note that BLOCK_INFO_70 has 64-bit padding issues so that it's
different on 32-bit and 64-bit machines. This has been fixed. However,
BLOCK_INFO also contains a pointer, so compat32 stuff for 32-on-64 is
still needed and doesn't currently exist.
 1.173 12-Aug-2015  dholland Fix assorted 64->32 truncations related to BLOCK_INFO.

Also make note of a cleaner limitation: it seems that when it goes to
coalesce discontiguous files, it mallocs an array with one BLOCK_INFO
for every block in the file. Therefore, with 64-bit LFS, on a 32-bit
platform it will be possible to have files large enough to overflow
the cleaner's address space. Currently these will be skipped and cause
warnings via syslog.

At some point someone should rewrite the logic to coalesce files to
use chunks of some reasonable size, as discontinuity between such
chunks is immaterial and mallocing this much space is silly and
fragile. Also, the kernel only accepts up to 65536 blocks at a time
for bmapv and markv, so processing more than this at once probably
isn't useful and may not even work currently. I don't want to change
this around just now as it's not entirely trivial.
 1.172 02-Aug-2015  dholland Pass the fs object to LFS_MAX_DADDR so it can check lfs_is64.

Remove some hackish intentional 64->32 truncations next to the checks
using LFS_MAX_DADDR, and tackle the problem they handled in bmap
instead.

The problem: the magic block pointer value UNWRITTEN is -2, and if it's
not handled specifically, uint32 -> uint64 promotion turns it into
4294967294, which then causes consternation and monkeyhouse downstream.

What's here is still kind of a hack, but it's a step forward.
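
The promotion problem can be reproduced in isolation. A minimal sketch: the
EX_UNWRITTEN value of -2 comes from the message above, while the function
names are hypothetical and this is not the fix applied in bmap.

```c
#include <assert.h>
#include <stdint.h>

#define EX_UNWRITTEN (-2)   /* magic block pointer value */

/* Stored in 32 bits, -2 has the bit pattern of 4294967294. */
static_assert((uint32_t)EX_UNWRITTEN == 4294967294u,
    "32-bit pattern of -2");

/* Naive widening: the 32-bit value is zero-extended. */
static int64_t
ex_widen_naive(uint32_t ondisk)
{
    return (int64_t)ondisk;
}

/*
 * Widening that reinterprets the 32-bit value as signed first, so the
 * magic value stays -2. The uint32_t -> int32_t conversion of large
 * values is implementation-defined before C23; two's complement is
 * assumed here.
 */
static int64_t
ex_widen_signed(uint32_t ondisk)
{
    return (int64_t)(int32_t)ondisk;
}
```

Passing (uint32_t)EX_UNWRITTEN through the naive version yields a large
positive block address, while the signed version keeps the magic value
recognizable to code that compares against -2.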
 1.171 02-Aug-2015  dholland Add a (draft) 64-bit superblock. Make things build again.

Add pieces of support for using both superblock types where
convenient, and specifically to the superblock accessors, but don't
actually enable it anywhere.

First substantive step on PR 50000.
 1.170 02-Aug-2015  dholland lfs_cleanint[] in the in-memory superblock needs to have 64-bit entries.
 1.169 02-Aug-2015  dholland Second batch of 64 -> 32 truncations in lfs, along with more minor
tidyups and corrections in passing.
 1.168 02-Aug-2015  dholland Fix assorted 64 -> 32 truncations in lfs. Also, some minor tidyups and
corrections in passing.
 1.167 28-Jul-2015  dholland Move struct salfs back inside libsa now that lfs_accessors.h is separate.
 1.166 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.165 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.164 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.163 24-Jul-2015  dholland ulfs2_dinode, having never actually been used with lfs, doesn't have a
di_inumber field. Fix that. First preliminary step on PR 50000.
 1.162 31-May-2015  hannken Use VFS_PROTOS() for lfs.
Rename conflicting struct lfs field "lfs_start" to "lfs_s0addr".

No functional change.
 1.161 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.160 28-Jul-2013  dholland branches: 1.160.6;
Bring in a copy of ffs_quota2_mount() for reference.
Add stuff to struct lfs that it needs to initialize.
Clear these fields in mount as there's no on-disk support for quota2;
but this increases the chances of being able to add it (or something
like it) in the future.
 1.159 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.158 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.157 28-Jun-2013  matt branches: 1.157.2;
Remove duplicate define of LFS_MAXNAMLEN
 1.156 23-Jun-2013  dholland typo in comment
 1.155 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.154 18-Jun-2013  christos Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.153 18-Jun-2013  dholland Tuck away a bunch of symbols that don't need to be public.
 1.152 09-Jun-2013  dholland Move struct lfs_inode_ext to lfs_inode.h; it doesn't need to be public.
 1.151 08-Jun-2013  dholland Remove stale union and accessor macros.
 1.150 08-Jun-2013  dholland ulfs_dir.h has been emptied; remove it.
 1.149 08-Jun-2013  dholland Move a comment to lfs.h that belongs better there.
 1.148 08-Jun-2013  dholland As nearly all the content of ulfs_dir.h and ulfs_dinode.h has migrated
to lfs.h, propagate the copyright notices too.
 1.147 08-Jun-2013  dholland Move more symbols to lfs.h:
LFS_DIRBLKSIZ
LFS_DIRECTSIZ
LFS_DIRSIZ
LFS_OLDDIRFMT
LFS_NEWDIRFMT
LFS_IFTODT
LFS_DTTOIF
ULFS{,1,2}_MAXSYMLINKLEN
 1.146 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.145 08-Jun-2013  dholland Move stuff to lfs.h that's needed by userland:
LFS_DT_*
ULFS_ROOTINO
ULFS_WINO
struct lfs_direct
struct lfs_dirtemplate
struct lfs_odirtemplate
struct ulfs_args

Also fix FFS_MAXNAMLEN -> LFS_MAXNAMLEN in several places.
 1.144 08-Jun-2013  dholland Now move LFS_IFMT and friends from ulfs_dinode.h to lfs.h.
 1.143 08-Jun-2013  dholland Move the dinode (on-disk inode) structures to lfs.h, since they are
and will be obviously required by userland tools that need to read
the on-disk structures.

Also, DINODE{1,2}_SIZE -> LFS_DINODE{1,2}_SIZE.
 1.142 08-Jun-2013  dholland Split the definitions suitable for userland out of ulfs_inode.h into
lfs_inode.h. Since fsck_lfs, newfs_lfs, and lfs_cleanerd want to reuse
the inode structure for their own internal use, and some of them share
parts of the kernel code as well, the best way forward is to provide a
relatively sanitized header that doesn't bring in stray material.

Shuffle a few other definitions around so that lfs_inode.h depends
only on lfs.h.

Install lfs_inode.h into /usr/include.
 1.141 06-Jun-2013  dholland Fix some exposed symbols:
LOSTFOUNDINO -> LFS_LOSTFOUNDINO
struct ufid -> struct ulfs_ufid
 1.140 06-Jun-2013  dholland Cleanups to reduce symbol and header exposure:
- move struct ufid from ulfs_inode.h to lfs.h
- lfs.h needs sys/mount.h and sys/pool.h
- ulfs_quota2_subr.c needs lfs_inode.h
- remove ulfs_inode.h from lfs.h in favor of ulfs_dinode.h
- move ULFS_NDADDR, ULFS_NIADDR, ULFS_NXADDR from ulfs_dinode.h to lfs.h
- remove ulfs_dinode.h from lfs.h
- add lfs.h to ulfs_dinode.h
 1.139 06-Jun-2013  dholland Remove stray references to ext2fs, chfs, ffs, and mfs.
 1.138 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.137 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.136 16-Feb-2012  perseant branches: 1.136.2;
Pass t_renamerace and t_rmdirrace tests.

Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.

Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.

Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace creates a lot of inode blocks on a tiny empty disk).
 1.135 02-Jan-2012  perseant branches: 1.135.2;

* Remove PGO_RECLAIM during lfs_putpages()'s call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.134 11-Jul-2011  hannken branches: 1.134.2; 1.134.6;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.133 16-Feb-2010  mlelstv Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.132 05-Nov-2009  pooka branches: 1.132.2;
... actually, define compat only for the kernel. Userlandia should
see only one version of the interfaces.
 1.131 05-Nov-2009  pooka Include compat/sys/time_types.h instead of compat/sys/time.h.
Fixes lint drama with interface name collisions.
 1.130 05-Nov-2009  pooka Include compat code by default.
 1.129 29-Oct-2009  christos PR/42246: NAKAJIMA Yoshihiro: provide COMPAT_50 for LFS
 1.128 19-Jul-2009  dholland typo in comment
 1.127 16-May-2008  hannken branches: 1.127.12;
Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.126 28-Apr-2008  martin branches: 1.126.2;
Remove clause 3 and 4 from TNF licenses
 1.125 15-Feb-2008  ad branches: 1.125.6; 1.125.8; 1.125.10;
The buffer LOCKED flag need not be under the protection of bufcache_lock,
BUSY is enough.
 1.124 03-Jan-2008  ad Use pool_cache.
 1.123 02-Jan-2008  ad Merge vmlocking2 to head.
 1.122 10-Oct-2007  ad branches: 1.122.4; 1.122.6; 1.122.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.121 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.120 16-May-2007  perseant branches: 1.120.6; 1.120.8; 1.120.10;
Change references to SEGM_W_DIROPS to SEGM_CKP, and replace the logic that
formerly used SEGM_W_DIROPS in lfs_segwrite() appropriately. This prevents
a problem in which processes could get stuck in "buffers" sleep forever.
 1.119 17-Apr-2007  perseant Install a new sysctl, vfs.lfs.ignore_lazy_sync, which causes LFS to ignore
the "smooth" syncer, as if vfs.sync.*delay = 0, but only for LFS. The
default is "on", i.e., ignore lazy sync.

Reduce the amount of polling/busy-waiting done by lfs_putpages(). To
accomplish this, copied genfs_putpages() and modified it to indicate which
page it was that caused it to return with EDEADLK. fsync()/fdatasync()
should no longer ever fail with EAGAIN, and should not consume huge
quantities of cpu.

Also, try to make dirops less likely to be written as the result of a
VOP_PUTPAGES(), while ensuring that they are written regularly.
 1.118 15-Feb-2007  ad branches: 1.118.2; 1.118.6; 1.118.8;
Replace some uses of lockmgr() / simplelocks.
 1.117 28-Sep-2006  perseant Use lockstatus instead of a homebrewed locking system to control
LFCNWRAPSTOP and LFCNWRAPGO.

Be less verbose about the various looping checks: use log() rather than
printf(), and only log anything if we are really looping ("count = 2" is
not an error condition).

Allow dirops sleeping on available space to be interruptible.
 1.116 15-Sep-2006  perseant branches: 1.116.2;
Don't remark a locked inode with IN_MODIFIED after writing it to disk,
if we ourselves hold the lock. This prevents e.g. mknod from hanging
indefinitely.

Also, always use the return value from VOP_ISLOCKED to determine whether
we hold the lock or someone else does, rather than looking into the lock
structure ourselves.
 1.115 15-Sep-2006  yamt merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.114 01-Sep-2006  perseant branches: 1.114.2;
Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.113 06-Aug-2006  martin Fix size confusion with lfs_fhandle - and as it now turns out to be the same
as the lfs compat_30_fhandle, g/c the latter.
Add an alias for the LFCNIFILEFH fcntl, so that binaries compiled in the
meantime (with too large lfs_fhandle) continue to work.

This makes lfs_cleanerd work again after the kernel checks filehandle size
more strictly (problem reported by Kurt Schreiner on current-users).
 1.112 31-Jul-2006  martin Make filehandles opaque to userland
 1.111 20-Jul-2006  perseant Note partial segments that are written by the cleaner, to help out the
roll-forward agent.
 1.110 13-Jul-2006  martin Version the lfs_cleanerd internal fcntl() for filehandles too,
so old cleaners should work with newer kernels.
 1.109 13-Jul-2006  martin Fix alignment problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.108 24-Jun-2006  perseant Change LFCNWRAP{STOP,GO} to make them more suitable for snapshotting; in
particular, the caller can now choose whether to wait for the condition
to be met, and if the caller of LFCNWRAPSTOP dies or otherwise closes
the descriptor, the filesystem is started again. Updated the ckckp
regression test to use the new semantics.

dump_lfs(8) now uses the fcntls to implement LFS-style snapshotting through
the -X flag, addressing PR#33457 albeit not using fss(4). Fixed a couple
other problems with dump_lfs that manifested themselves during testing.
 1.107 14-May-2006  elad branches: 1.107.4;
integrate kauth.
 1.106 12-May-2006  perseant Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.

Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.105 04-May-2006  perseant Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.

Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.104 30-Apr-2006  perseant Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are also included.
 1.103 17-Apr-2006  perseant Introduce two fcntl calls that freeze the filesystem right at the point
where segment 0 is being considered for writing. This allows for automated
checkpoint validity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.

Include a regression test that does such scanning.

When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small). Discovered by using the aforementioned
regression test.
 1.102 13-Apr-2006  perseant Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.

Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.101 10-Apr-2006  perseant Optimize the free list search a little more; in particular use words
instead of bytes for the index, and never search below fs->lfs_freehd.

Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).

Free the bitmap when we unmount the filesystem.
 1.100 08-Apr-2006  perseant Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.99 08-Apr-2006  perseant Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.98 07-Apr-2006  perseant Make the segment lock aware of LWPs. Fixes a (somewhat confusing)
"lockmgr: pid 3997, not exclusive lockholder 3997, unlocking" panic I
encountered while running blogbench on an LFS.
 1.97 24-Mar-2006  perseant Improvements to LFS's paging mechanism, to wit:

* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.

* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.96 17-Mar-2006  tls From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.95 11-Dec-2005  christos branches: 1.95.4; 1.95.6; 1.95.8; 1.95.10; 1.95.12;
merge ktrace-lwp.
 1.94 13-Sep-2005  christos split out lfs_itimes(). It is used in fsck_lfs.
 1.93 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.92 23-Aug-2005  christos Don't overload MAXNAMLEN, use a separate constant for each filesystem type.
 1.91 22-Aug-2005  yamt whitespace.
 1.90 22-Aug-2005  christos change ino_t to u_int32_t for syscall compatibility.
 1.89 31-Jul-2005  christos Move extern kernel variable declarations into a _KERNEL-protected section
so that they don't pollute userland's namespace.
 1.88 29-May-2005  christos branches: 1.88.2;
- sprinkle const
- avoid shadow variables.
 1.87 20-May-2005  perseant Keep track of the number of segments reclaimed, since the cleaner doesn't
do this anymore (it hasn't for quite some time). Add a couple of conditional
debugging messages to indicate why segments are not cleaned, in the event
that lfs_segclean is used.

Make the LFCNSEGWAITALL fcntl work again.
 1.86 23-Apr-2005  perseant Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing. Tested in both
directions, and everything appears to work happily, but ymmv.
 1.85 19-Apr-2005  perseant Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.84 16-Apr-2005  perseant Make userland compile again.
 1.83 16-Apr-2005  perseant Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
 1.82 16-Apr-2005  perseant Use lfs_malloc() to manage the blkiov arrays that the cleaner functions use,
since the cleaner is likely to operate in a low-memory condition.
 1.81 14-Apr-2005  perseant Tabify leading whitespace
 1.80 14-Apr-2005  perseant Consolidate the hash table we use to maintain the integrity of lfs_avail
into a single, system-wide table, rather than having a separate hash table
per inode. Significantly reduces the "system" cpu usage of your average
file write.
 1.79 14-Apr-2005  perseant Keep track of the highest block held by an LFS inode, so that we can
be assured that the last byte of a file is always allocated. Previously
a file extension could cause the filesystem to be flushed, writing an
inconsistent inode to disk. Although this condition would be corrected
the next time blocks were written to disk, an intervening crash would leave
the filesystem in an inconsistent state, leaving fsck_lfs to complain
of an inode "partially truncated".
 1.78 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.77 08-Mar-2005  perseant branches: 1.77.2;
Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.76 26-Feb-2005  perry nuke trailing whitespace
 1.75 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
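
The bfree items in 1.75 (a signed counter that may dip below zero, and a report that is never negative and never above lfs_dsize) amount to a clamp at the reporting boundary. The following is a minimal sketch of that idea only; the function name and the 64-bit types are assumptions, not the kernel's actual declarations.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp a signed free-block count to the range [0, dsize] before it
 * is reported to userland.  Without the lower bound, a negative count
 * converted to an unsigned field would show up as a huge free space.
 */
static int64_t
clamp_bfree(int64_t bfree, int64_t dsize)
{
	assert(dsize >= 0);
	if (bfree < 0)
		return 0;
	if (bfree > dsize)
		return dsize;
	return bfree;
}
```

The internal counter stays signed so that the cleaner can still see a deficit; only the value handed to the caller is clamped.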
 1.74 14-Aug-2004  mycroft branches: 1.74.4; 1.74.6;
Push atime/mtime updates even further -- into the reclaim path, so they happen
rarely in the normal case. (Note: This happens at reboot/shutdown time because
all file systems are unmounted.)

Also, for IN_MODIFY, use IN_ACCESSED, not IN_MODIFIED; otherwise "ls -l" of
your device node or FIFO would cause the time stamps to get written too
quickly.
 1.73 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.72 09-Mar-2004  yamt branches: 1.72.4;
use correct segment size. this fixes memory corruption when using lfsv1.
 1.71 28-Jan-2004  yamt use bufmem instead of bufpages to make lfs a little less broken.
 1.70 07-Sep-2003  yamt - raise spl to bio in lfs_countlocked() rather than having callers do so.
- buffer cache MP locks.
- assert B_CALL buffers are not on the free queue.
 1.69 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.68 30-Jul-2003  yamt using normal bufcache buffer for cluster buffer head.
 1.67 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.66 02-Jul-2003  yamt use queue.h macros.
 1.65 02-Jul-2003  yamt - add new functions, lfs_writer_enter/leave, and use them instead of
duplicated code fragments.
- add an assertion.
 1.64 23-Apr-2003  perseant branches: 1.64.2;
Make LFS work better (though still not "well") as an NFS-exported
filesystem (and other things that needed to be fixed before the tests
would complete), to wit:

* Include the fs ident in the filehandle; improve stale filehandle checks.

* Change definition of blksize() to use the on-dinode size instead of
the inode's i_size, so that fsck_lfs will work properly again.

* Use b_interlock in lfs_vtruncbuf.

* Postpone dirop reclamation until after the seglock has been released,
so that lfs_truncate is not called with the segment lock held.

* Don't loop in lfs_fsync(), just write everything and wait.

* Be more careful about the interlock/uobjlock in lfs_putpages: when we
lose this lock, we have to resynchronize dirtiness of pages in each
block.

* Be sure to always write indirect blocks and update metadata in
lfs_putpages; fixes a bug that caused blocks to be accounted to the
wrong segment.
 1.63 09-Apr-2003  thorpej Use PAGE_SIZE rather than NBPG.
 1.62 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.61 28-Mar-2003  perseant Add a sleeper count, to prevent the cleaner from panicking the kernel
when the filesystem is unmounted, relocking the Ifile when its lock is
draining. (We can't use vfs_busy() since the process is sleeping for a
good long time.) Clean up / organize lfs.h, while I'm here.

In lfs_update_single, assert that disk addresses are either negative, or
are still positive when converted to int32_t, to prevent recurrence of a
negative/positive block problem.
 1.60 21-Mar-2003  perseant KNF (space after keywords).
 1.59 21-Mar-2003  perseant Use VONWORKLST as a heuristic for vnode emptiness, rather than exhaustively
checking the memq.

Take greater care not to dirty the Ifile vnode when unmounting the filesystem.
This should fix a "(vp->v_flag & VONWORKLST) == 0" assertion panic in vgonel
that could occur when unmounting.

Do not allow the Ifile to be mapped for writing.
 1.58 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.57 11-Mar-2003  perseant - Get rid of unused #ifdefs LFS_NO_PAGEMOVE and LFS_MALLOC_SUMMARY (both
always true) and accompanying dead code.

- When constructing write clusters in lfs_writeseg, if the block we are
about to add is itself a cluster from GOP_WRITE, don't put a cluster
in a cluster, just write the GOP_WRITE cluster on its own. This seems
to represent a slight performance gain on my test machine.

- Charge someone's rusage for writes on LFSes. It's difficult to tell
who the "right" process to charge is; just charge whoever triggered
the write.
 1.56 08-Mar-2003  perseant Take away "#ifdef LFS_UBC".
 1.55 08-Mar-2003  perseant Add an lfs_strategy() that checks to make sure we're not trying to read
where the cleaner is trying to write, instead of tying up the "live"
buffers (or pages).

Fix a bug in the LFS_UBC case where oversized buffers would not be
checksummed correctly, causing uncleanable segments.

Make sure that wakeup(fs->lfs_iocount) is done if fs->lfs_iocount is 1
as well as 0, since we wait in some places for it to drop to 1.

Activate all pages that make it into lfs_gop_write without the segment
lock held, since they must have been dirtied very recently, even if
PG_DELWRI is not set.
 1.54 02-Mar-2003  perseant Account SEGUSE_ACTIVE correctly so that the automatic segment cleaning
actually happens.

Add a new fcntl call that will write the minimum necessary to checkpoint
(i.e., for on-disk directory structure to be consistent, not including
updates to file data) so that the cleaner can clean segments more quickly
without sacrificing three-way commit for cleaning.
 1.53 27-Feb-2003  perseant Do roundup and offset arithmetic in 64 bits, to allow >=2G files.
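
To illustrate why 1.53 matters: rounding a file offset up to a block boundary adds up to (block size - 1) to the offset, so an offset just under 2 GiB produces a result that no longer fits in a signed 32-bit integer. The helper below is a hypothetical sketch of the 64-bit form, assuming a power-of-two block size; it is not the macro the commit changed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round off up to the next multiple of bsize, using 64-bit
 * arithmetic throughout.  bsize must be a nonzero power of two.
 */
static uint64_t
roundup64(uint64_t off, uint64_t bsize)
{
	assert(bsize != 0 && (bsize & (bsize - 1)) == 0);
	return (off + bsize - 1) & ~(bsize - 1);
}
```

For example, rounding offset 0x7FFFF001 up to an 8192-byte boundary yields 0x80000000, which is one past INT32_MAX.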
 1.52 25-Feb-2003  perseant Make fs-specific fcntl macros take three arguments (approved wrstuden).
Let LFS use fcntl for cleaner functions.
 1.51 24-Feb-2003  perseant Add lfs_ioctl vnode op, with ioctls to take over cleaner system call
functionality (not including segment clean, since that is now done
automatically as checkpoints happen).
 1.50 23-Feb-2003  perseant Fix a buffer overflow bug in the LFS_UBC case that manifested itself
either as a mysterious UVM error or as "panic: dirty bufs". Verify
maximum size in lfs_malloc.

Teach lfs_updatemeta and lfs_shellsort about oversized cluster blocks from
lfs_gop_write.

When unwiring pages in lfs_gop_write, deactivate them, under the theory
that the pagedaemon wanted to free them last we knew.
 1.49 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.48 19-Feb-2003  yamt workaround for "another flush is..." infinite loop in writerd.
if we're writerd, sleep in lfs_flush until another writer goes away
instead of busy-looping in writerd.
 1.47 18-Feb-2003  soren Make libsa compile again.
 1.46 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.45 29-Jan-2003  yamt don't use daddr_t for segment summary since it's an on-disk structure.
 1.44 27-Jan-2003  yamt make these compilable with lfs debug options.
(follow daddr_t change)

XXX maybe segment number should be 64bit.
 1.43 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.42 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.41 06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.40 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.39 14-May-2002  perseant branches: 1.39.2; 1.39.4;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.38 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.37 13-Jul-2001  perseant Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length; it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.36 20-Dec-2000  cgd branches: 1.36.2; 1.36.4; 1.36.6;
replace \<space(s)><newline> (wrong!) with \<newline>
 1.35 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.34 13-Nov-2000  perseant Remove debugging code that accidentally went in with yesterday's commit.
 1.33 12-Nov-2000  perseant Do not needlessly dirty segment table blocks during lfs_segwrite,
preventing needless disk activity when the filesystem is idle. (PR #10979.)
 1.32 13-Sep-2000  perseant Cast back to int32_t in LFS_EST_BFREE and LFS_EST_RSVD macros, for
consistency with their arguments.

Change the debugging printf in lfs_reserve to match, and enclose it in
#ifdef DEBUG.

Tested on alpha, arm32, sparc.
 1.31 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments through N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this is unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.30 09-Sep-2000  perseant Change dlfs_dmeta and dlfs_avail to signed quantities, to prevent
underflow errors, visible in userland as impossibly high values
returned from df(1).
 1.29 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero at fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.28 04-Jul-2000  perseant Fix errors observed while trying to fill the filesystem with yesterday's
fixes:

- Write copies of bfree and avail in the CLEANERINFO block, so the
cleaner doesn't have to guess which superblock has the current
information (if indeed any do).

- Tighten up accounting of lfs_avail (more needs to be done).

- When cleansing indirect blocks of UNWRITTEN, make sure not to mark
them clean, since they'll need to be rewritten later.
 1.27 03-Jul-2000  perseant Allow the number of free segments reserved for the cleaner to be
parametrized in the filesystem, defaulting to MIN_FREE_SEGS = 2 but set
to something more reasonable at newfs_lfs time.

Note the number of blocks that have been scheduled for writing but which
are not yet on disk in an inode extension, i_lfs_effnblks. Move
i_ffs_effnlink out of the ffs extension and onto the main inode, since
it's used all over the shared code and the lfs extension would clobber
it.

At inode write time, indirect blocks and inode-held blocks of inodes
that have i_lfs_effnblks != i_ffs_blocks are cleansed of UNWRITTEN disk
addresses, so that these never make it to disk.
 1.26 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.25 06-Jun-2000  perseant branches: 1.25.2;
Protect inode free list with seglock, instead of separate lock, so that
the head of the inode free list (on the superblock) always matches the
rest of the free list (in the ifile).

Protect lfs_fragextend with seglock, to prevent the segment byte count
fudging from making its way to disk.

Don't try to inactivate dirop vnodes that are still in the middle of
their dirop (may address PR#10285).
 1.24 31-May-2000  perseant update for IN_ACCESSED changes
 1.23 27-May-2000  perseant branches: 1.23.2;
Prevent dirops from getting around lfs_check and wedging the buffer cache.
All the dirop vnops now mark the inodes with a new flag, IN_ADIROP, which
is removed as soon as the dirop is done (as opposed to VDIROP which stays
until the file is written). To address one issue raised in PR#9357.
 1.22 13-May-2000  perseant Change the semantics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be waited for, or not, accordingly.

Closes PR#8996.
 1.21 05-May-2000  perseant Change the way LFS does block accounting, from trying to infer from the
buffer cache flags, to marking the inode and/or indirect blocks with a
special disk address UNWRITTEN==-2 when a block is accounted for. (This
address is never written to disk, but only used in-core. This is essentially
the same method of block accounting as on the UBC branch, where the buffer
headers don't exist.) Make sure that truncation is handled properly,
especially in the case of holey files.

Fixes PR#9994.
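
The UNWRITTEN marker from 1.21 can be sketched as follows. Only the value UNWRITTEN == -2 and the rule that it never reaches disk come from the log; the function names, the use of 0 to mean "hole", and the choice to write 0 in place of UNWRITTEN are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNWRITTEN	((int32_t)-2)	/* in-core only: accounted, not yet on disk */

/*
 * Account for a block the first time it is allocated.  Returns 1 if
 * the slot was a hole (0) and is now marked UNWRITTEN, 0 if the block
 * was already accounted for or already has a disk address.
 */
static int
account_block(int32_t *slot)
{
	assert(slot != NULL);
	if (*slot == 0) {
		*slot = UNWRITTEN;
		return 1;
	}
	return 0;
}

/*
 * Build the on-disk copy of a block pointer array, replacing every
 * UNWRITTEN entry so that the marker never makes it to disk.
 */
static void
cleanse_unwritten(const int32_t *incore, int32_t *ondisk, size_t n)
{
	size_t i;

	assert(incore != NULL && ondisk != NULL);
	for (i = 0; i < n; i++)
		ondisk[i] = (incore[i] == UNWRITTEN) ? 0 : incore[i];
}
```

Because the marker lives in the pointer slot itself, a second call for the same block is a no-op, so the block is charged against free space only once and no buffer header is needed to remember that.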
 1.20 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.19 15-Dec-1999  perseant In lfs_bwrite, don't mark buffers dirty if lfs is mounted read-only.
(Previously buffers could be marked dirty by the cleaner, and possibly by
other means.)

Also check for softdep mount in vfs_shutdown before trying to bawrite
buffers, since other filesystems don't need it and lfs doesn't bawrite.
(This fragment reviewed by fvdl.)

Partially addresses PR#8964.
 1.18 08-Dec-1999  simonb Use an explicitly sized type (u_int32_t) for inode numbers in the super
block instead of ino_t. Reviewed by Konrad Schroder.
 1.17 06-Nov-1999  perseant branches: 1.17.2;
Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.16 15-Jun-1999  perseant branches: 1.16.2; 1.16.4; 1.16.6;
Minor changes to the segment live bytes calculation. In particular, fixed
a bug in fragment extension that could run the count negative. Also, don't
overcount for inodes, and don't count segment summaries. Thus, for empty
segments the live bytes count should now be exactly zero.
 1.15 01-Jun-1999  perseant Fixed lfs_update (and related functions) so that calls from lfs_fsync
will DTRT with vnodes marked VDIROP. In particular, the message
"flushing VDIROP" will no longer appear, and the filesystem will remain
stable in the event of a crash.

This was particularly a problem with NFS-exported LFSes, since fsync
was called on every file close.
 1.14 25-Mar-1999  perseant branches: 1.14.2; 1.14.4; 1.14.6;
clean up unused/required #ifdefs
 1.13 17-Mar-1999  perseant Move dlfs_pad to the end of struct dlfs (after the pad), for upward
compatibility.
 1.12 17-Mar-1999  perseant Fix pad on lfs.h so it is really 512 bytes, as advertised
 1.11 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd says they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.10 11-Sep-1998  pk PR#6032: define fixed sized on-disk superblock structure.
 1.9 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.8 05-Dec-1996  is Make the struct lfs 512 bytes long on 32bit machines whose compiler doesn't
align 32bit integers. Use explicit sized typing at some other places.

XXX This still won't fix lfs for 64bit machines, as we have some
assumptions about sizeof(pointer)=sizeof(u_int32_t) in here, and (if I
looked right) a misaligned u_int64_t. The right fix (to cite cgd) will
be to separate on-disk-representation from in-core, but I don't have
the time (at the moment) to do this.
 1.7 09-Feb-1996  christos lfs prototypes
 1.6 21-Dec-1994  mycroft Add RCS ids where missing.
 1.5 14-Dec-1994  mycroft Sync with CSRG.
 1.4 17-Nov-1994  mycroft Round struct lfs to 512 bytes.
 1.3 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.14.6.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.14.4.2 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.14.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.14.2.4 20-Jan-2000  he Pull up revision 1.20 (requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.14.2.3 17-Dec-1999  he Pull up revision 1.17 (requested by perseant):
Address locking protocol error for inode hash, and make the
maximum number of active dirops a global quantity.
 1.14.2.2 17-Dec-1999  he Pull up revision 1.15 (requested by perseant):
Avoid flushing vnodes involved in a dirop, making lfs' promise
of "no fsck needed, even in the event of a crash" closer to
reality.
 1.14.2.1 25-Jun-1999  perry pullup 1.15->1.16 (perseant)
 1.16.6.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.16.4.1 15-Nov-1999  fvdl Sync with -current
 1.16.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.16.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.16.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.17.2.2 06-Nov-1999  perseant Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.17.2.1 06-Nov-1999  perseant file lfs.h was added on branch comdex-fall-1999 on 1999-11-06 20:33:06 +0000
 1.23.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.25.2.2 03-Feb-2001  he Pull up revisions 1.33-1.35 (requested by perseant):
o Don't write anything if the filesystem is idle (PR#10979).
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
 1.25.2.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.36.6.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.36.6.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.36.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.36.6.1 03-Aug-2001  lukem update to -current
 1.36.4.4 13-Jul-2001  perseant Be more careful about when we update ctime/mtime. In particular, if we
are only writing indirect blocks, that doesn't count for mtime; and when
we first create a vnode, that certainly *does not* count for ctime
(a bug that's been there from the beginning).

This does not change the fact that mtime might still be set after write(2)
is "completed", but it does make the atime-in-the-ifile code have some
effect (noticeably less degradation of read time after an intervening
large write).
 1.36.4.3 10-Jul-2001  perseant Turn the free list into a tailq, with both head and tail kept on the ifile.

Update access times on the inode even if it does not get marked IN_ACCESS.
 1.36.4.2 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.36.4.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.36.2.5 11-Dec-2002  thorpej Sync with HEAD.
 1.36.2.4 01-Aug-2002  nathanw Catch up to -current.
 1.36.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.36.2.2 08-Jan-2002  nathanw Catch up to -current.
 1.36.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.39.4.1 20-Jun-2002  lukem Pull up revision 1.40 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.39.2.2 15-Jul-2002  gehenna catch up with -current.
 1.39.2.1 20-Jun-2002  gehenna catch up with -current.
 1.64.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.64.2.6 08-Mar-2005  skrll Sync with HEAD.
 1.64.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.64.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.64.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.64.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.64.2.1 03-Aug-2004  skrll Sync with HEAD
 1.72.4.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
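
The rule in the kernel list above about never reporting bfree below zero or above lfs_dsize amounts to a clamp on a signed quantity. A minimal sketch, assuming 64-bit signed block counts and a hypothetical helper name (the real lfs_statfs(9) code is not shown in this log):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bound a signed free-block count to [0, dsize]
 * before it is reported to userland. */
static int64_t
sketch_clamp_bfree(int64_t bfree, int64_t dsize)
{
	assert(dsize >= 0);
	if (bfree < 0)
		return 0;	/* transiently negative: report "nothing free" */
	if (bfree > dsize)
		return dsize;	/* never report more than the data area holds */
	return bfree;
}
```

Clamping only at the reporting boundary keeps the internal signed value intact for the cleaner, which the same list says must be aware that the count can drop below zero.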

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.74.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.74.4.1 29-Apr-2005  kent sync with -current
 1.77.2.18 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.77.2.17 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.93
sys/ufs/lfs/lfs.h: revision 1.106
sys/ufs/lfs/lfs_vfsops.c: revision 1.209
sys/ufs/lfs/lfs_vnops.c: revision 1.175
sys/ufs/lfs/lfs_segment.c: revision 1.178
Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.
Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.77.2.16 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.92
sys/ufs/lfs/lfs.h: revision 1.105
sys/ufs/lfs/lfs_vfsops.c: revision 1.207
sys/ufs/lfs/lfs_subr.c: revision 1.59
sys/ufs/lfs/lfs_vnops.c: revision 1.173
sys/ufs/lfs/lfs_bio.c: revision 1.92
Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.
Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
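
As a worked example of the default quoted in this entry, the sketch below computes (lfs_minfreeseg / 2 + 1) with integer division. The function name is hypothetical, and the "sanity checking" the message mentions is not specified in the log, so it is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: previous implicit reservation for the cleaner,
 * as stated in the log message. Integer division truncates. */
static int32_t
sketch_default_resvseg(int32_t minfreeseg)
{
	assert(minfreeseg >= 0);
	return minfreeseg / 2 + 1;
}
```

With minfreeseg = 10 this yields 6 reserved segments; with minfreeseg = 7 it yields 4.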
 1.77.2.15 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.104
sys/ufs/lfs/lfs_vfsops.c: revision 1.206
sys/ufs/lfs/lfs_vnops.c: revision 1.170
sys/ufs/lfs/lfs_extern.h: revision 1.80
sys/ufs/lfs/lfs_segment.c: revision 1.176
sys/ufs/lfs/lfs_inode.c: revision 1.103 via patch
sys/ufs/lfs/lfs_alloc.c: revision 1.90
Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().
A couple of locking fixes are also included.
 1.77.2.14 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.103
sys/ufs/lfs/lfs_segment.c: revision 1.174
sys/ufs/lfs/lfs_vnops.c: revision 1.168
Introduce two fcntl calls that freeze the filesystem right at the point
where segment 0 is being considered for writing. This allows for automated
checkpoint validity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.
Include a regression test that does such scanning.
When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small). Discovered by using the aforementioned
regression test.
 1.77.2.13 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.102
sys/ufs/lfs/lfs_segment.c: revision 1.173
sys/ufs/lfs/lfs_vnops.c: revision 1.167 via patch
sys/ufs/lfs/lfs_bio.c: revision 1.91
Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.
Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.77.2.12 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.101
sys/ufs/lfs/lfs_vfsops.c: revision 1.202
sys/ufs/lfs/lfs_alloc.c: revision 1.88
Optimize the free list search a little more; in particular use words
instead of bytes for the index, and never search below fs->lfs_freehd.
Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).
Free the bitmap when we unmount the filesystem.
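
The word-at-a-time search described in this entry might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from lfs_alloc.c: the names are hypothetical, a set bit is assumed to mean "free", and words are assumed to be 32 bits wide. The index type is unsigned and "not found" is an explicit sentinel, so no negative value is ever needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BITS_PER_WORD 32u
#define SKETCH_NOT_FOUND     UINT32_MAX	/* explicit sentinel, not -1 */

/* Return the lowest index >= start whose bit is set in map (set is
 * assumed to mean "free"), or SKETCH_NOT_FOUND. Bits below start are
 * never examined, mirroring "never search below the free list head". */
static uint32_t
sketch_find_free(const uint32_t *map, uint32_t nbits, uint32_t start)
{
	uint32_t nwords = nbits / SKETCH_BITS_PER_WORD +
	    (nbits % SKETCH_BITS_PER_WORD != 0);
	uint32_t first = start / SKETCH_BITS_PER_WORD;
	uint32_t w, b;

	assert(map != NULL);
	for (w = first; w < nwords; w++) {
		uint32_t word = map[w];

		/* In the first word, mask off the bits below start. */
		if (w == first)
			word &= ~(uint32_t)0 << (start % SKETCH_BITS_PER_WORD);
		if (word == 0)
			continue;	/* skips 32 entries in one comparison */
		for (b = 0; b < SKETCH_BITS_PER_WORD; b++) {
			if (word & ((uint32_t)1 << b)) {
				uint32_t idx = w * SKETCH_BITS_PER_WORD + b;
				return idx < nbits ? idx : SKETCH_NOT_FOUND;
			}
		}
	}
	return SKETCH_NOT_FOUND;
}
```

Testing a whole word against zero is what makes the word-indexed scan cheaper than a byte-indexed one when long runs of entries are in use.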
 1.77.2.11 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.200
sys/ufs/lfs/lfs_vnops.c: revision 1.164
sys/ufs/lfs/lfs_inode.c: revision 1.101
sys/ufs/lfs/lfs_extern.h: revision 1.78
sys/ufs/lfs/lfs.h: revision 1.100
Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.77.2.10 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.87
sys/ufs/lfs/lfs.h: revision 1.99
sys/ufs/lfs/lfs_vfsops.c: revision 1.199
sys/ufs/lfs/lfs_extern.h: revision 1.77 via patch
Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.77.2.9 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_subr.c: revision 1.58
sys/ufs/lfs/lfs.h: revision 1.98
Make the segment lock aware of LWPs. Fixes a (somewhat confusing)
"lockmgr: pid 3997, not exclusive lockholder 3997, unlocking" panic I
encountered while running blogbench on an LFS.
 1.77.2.8 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.158
sys/ufs/lfs/lfs_subr.c: revision 1.57
sys/ufs/lfs/lfs_segment.c: revision 1.171
sys/ufs/lfs/lfs.h: revision 1.97
sys/ufs/lfs/lfs_vfsops.c: revision 1.195
sys/ufs/lfs/lfs_extern.h: revision 1.76
Improvements to LFS's paging mechanism, to wit:
* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.
* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.77.2.7 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_segment.c: revision 1.170
sys/ufs/lfs/lfs.h: revision 1.96
sys/ufs/lfs/lfs_vfsops.c: revision 1.194
sys/ufs/lfs/lfs_syscalls.c: revision 1.109
From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.77.2.6 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.91
whitespace.
 1.77.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.90
change ino_t to u_int32_t for syscall compatibility.
 1.77.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.89
Move extern kernel variable declarations into a _KERNEL-protected section
so that they don't pollute userland's namespace.
 1.77.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.77.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.180
sys/ufs/lfs/lfs_syscalls.c: revision 1.106
sys/ufs/lfs/lfs.h: revision 1.87
Keep track of the number of segments reclaimed, since the cleaner doesn't
do this anymore (it hasn't for quite some time). Add a couple of conditional
debugging messages to indicate why segments are not cleaned, in the event
that lfs_segclean is used.
Make the LFCNSEGWAITALL fcntl work again.
 1.77.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.88.2.6 27-Feb-2008  yamt sync with head.
 1.88.2.5 21-Jan-2008  yamt sync with head
 1.88.2.4 27-Oct-2007  yamt sync with head.
 1.88.2.3 26-Feb-2007  yamt sync with head.
 1.88.2.2 30-Dec-2006  yamt sync with head.
 1.88.2.1 21-Jun-2006  yamt sync with head.
 1.95.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.95.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.95.10.3 11-May-2006  elad sync with head
 1.95.10.2 19-Apr-2006  elad sync with head.
 1.95.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.95.8.7 03-Sep-2006  yamt sync with head.
 1.95.8.6 11-Aug-2006  yamt sync with head
 1.95.8.5 26-Jun-2006  yamt sync with head.
 1.95.8.4 24-May-2006  yamt sync with head.
 1.95.8.3 11-Apr-2006  yamt sync with head
 1.95.8.2 01-Apr-2006  yamt sync with head.
 1.95.8.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.95.6.2 01-Jun-2006  kardel Sync with head.
 1.95.6.1 22-Apr-2006  simonb Sync with head.
 1.95.4.1 09-Sep-2006  rpaulo sync with head
 1.107.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.114.2.1 18-Nov-2006  ad Sync with head.
 1.116.2.1 22-Oct-2006  yamt sync with head
 1.118.8.1 11-Jul-2007  mjf Sync with head.
 1.118.6.4 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.118.6.3 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.118.6.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.118.6.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.118.2.2 17-May-2007  yamt sync with head.
 1.118.2.1 07-May-2007  yamt sync with head.
 1.120.10.1 14-Oct-2007  yamt sync with head.
 1.120.8.3 23-Mar-2008  matt sync with HEAD
 1.120.8.2 09-Jan-2008  matt sync with HEAD
 1.120.8.1 06-Nov-2007  matt sync with HEAD
 1.120.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.122.10.2 08-Jan-2008  bouyer Sync with HEAD
 1.122.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.122.6.2 19-Dec-2007  ad Use a global lfs_lock.
 1.122.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.122.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.125.10.4 11-Mar-2010  yamt sync with head
 1.125.10.3 19-Aug-2009  yamt sync with head.
 1.125.10.2 04-May-2009  yamt sync with head.
 1.125.10.1 16-May-2008  yamt sync with head.
 1.125.8.1 18-May-2008  yamt sync with head.
 1.125.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.126.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.127.12.1 23-Jul-2009  jym Sync with HEAD.
 1.132.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.134.6.1 18-Feb-2012  mrg merge to -current.
 1.134.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.134.2.3 23-Jan-2013  yamt sync with head
 1.134.2.2 17-Apr-2012  yamt sync with head
 1.134.2.1 30-Nov-2011  yamt make lfs another pager specific flag so that it won't be affected by
an nfs hack in genfs.
 1.135.2.1 17-Mar-2012  bouyer Pull up following revision(s) (requested by perseant in ticket #116):
sys/ufs/lfs/lfs_alloc.c: revision 1.112
tests/fs/vfs/t_rmdirrace.c: revision 1.9
tests/fs/vfs/t_renamerace.c: revision 1.25
sys/ufs/lfs/lfs_vnops.c: revision 1.240
sys/ufs/lfs/lfs_segment.c: revision 1.224
sys/ufs/lfs/lfs_bio.c: revision 1.122
sys/ufs/lfs/lfs_vfsops.c: revision 1.294
sbin/newfs_lfs/make_lfs.c: revision 1.19
sys/ufs/lfs/lfs.h: revision 1.136
Pass t_renamerace and t_rmdirrace tests.
Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.
Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.
Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace creates a lot of inode blocks on a tiny empty disk).
 1.136.2.4 03-Dec-2017  jdolecek update from HEAD
 1.136.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.136.2.2 23-Jun-2013  tls resync from head
 1.136.2.1 25-Feb-2013  tls resync with head
 1.157.2.1 28-Aug-2013  rmind sync with head
 1.160.6.6 28-Aug-2017  skrll Sync with HEAD
 1.160.6.5 09-Jul-2016  skrll Sync with HEAD
 1.160.6.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.160.6.3 22-Sep-2015  skrll Sync with HEAD
 1.160.6.2 06-Jun-2015  skrll Sync with HEAD
 1.160.6.1 06-Apr-2015  skrll Sync with HEAD
 1.199.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.199.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.201.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.203.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.203.4.1 10-Jun-2019  christos Sync with HEAD
 1.203.2.1 18-Jan-2019  pgoyette Synch with HEAD
 1.204.6.1 29-Feb-2020  ad Sync with head.
 1.204.4.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
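
A minimal user-space sketch of the "try enter" idea in (b), assuming a pthread mutex in place of lfs_lock. The struct and function names (sketch_fs, sketch_writer_tryenter, sketch_writer_leave) are hypothetical and are not the kernel's definitions; the point is only that the dirops test and the writer increment happen under the same lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct lfs. */
struct sketch_fs {
	pthread_mutex_t lock;	/* plays the role of lfs_lock */
	int dirops;		/* number of active directory operations */
	int writer;		/* writer count, as in fs->lfs_writer */
};

/*
 * Become a writer only if no dirops are active. The check and the
 * increment are done under one lock acquisition, so the state cannot
 * change between them.
 */
static bool
sketch_writer_tryenter(struct sketch_fs *fs)
{
	bool ok;

	pthread_mutex_lock(&fs->lock);
	ok = (fs->dirops == 0);
	if (ok)
		fs->writer++;
	pthread_mutex_unlock(&fs->lock);
	return ok;
}

/* Drop the writer count taken by a successful tryenter. */
static void
sketch_writer_leave(struct sketch_fs *fs)
{
	pthread_mutex_lock(&fs->lock);
	fs->writer--;
	pthread_mutex_unlock(&fs->lock);
}
```

A caller that gets false back simply skips the opportunistic work instead of sleeping while it holds the seglock.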

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from

struct foo64 {
...
} __aligned(4) __packed;

union foo {
struct foo64 f64;
...
};

to

struct foo64 {
...
};

union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;

if we really want to take advantage of 64-bit memory accesses.

However, the __aligned(4) __packed must remain on the union
because:

2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.54 04-Nov-2025  perseant Remove su_flags array, replacing it with a new flag SEGUSE_READY.
Segments progress from having su_nbytes==0 to SEGUSE_EMPTY to SEGUSE_READY
to clean, progressing to the next step after a checkpoint.
 1.53 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.52 15-Sep-2025  perseant If setting the head (or tail) of the inode free list to LFS_UNUSED_INUM, also
set the tail (resp. head) to LFS_UNUSED_INUM, as the list is now empty.

Add a check to ensure that lfs_valloc_fixed will always terminate, even
if the free list should contain a loop. Extend the ifile at the end if it
is empty, to match the assumption of lfs_valloc() that the free list is
never empty.

Needed for roll-forward.
 1.51 24-Apr-2022  rillig lfs: fix lint warning about empty declaration
 1.50 07-Sep-2020  riastradh Suppress -Waddress-of-packed-member just for lfs_accessors.h.

We can remove -Wno-error=address-of-packed-member from various
makefiles now.
 1.49 21-Mar-2020  riastradh Avoid misaligned access to lfs64 on-disk records in memory.

lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from

struct foo64 {
...
} __aligned(4) __packed;

union foo {
struct foo64 f64;
...
};

to

struct foo64 {
...
};

union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;

if we really want to take advantage of 64-bit memory accesses.

However, the __aligned(4) __packed must remain on the union
because:

2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.
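
A small sketch of the effect described in this entry, assuming GCC or clang attribute syntax. The type and function names (sketch_rec32, sketch_rec64, sketch_rec, sketch_rec_ino) are hypothetical; the kernel's __aligned(4) __packed macros are spelled out here as __attribute__((aligned(4), packed)).

```c
#include <stdint.h>

/* 32-bit flavour: naturally 4-byte aligned. */
struct sketch_rec32 {
	uint32_t ino;
};

/* 64-bit flavour: holds a 64-bit field but only promises 4-byte alignment. */
struct sketch_rec64 {
	uint64_t ino;
} __attribute__((aligned(4), packed));

/*
 * The attributes are repeated on the union so that the compiler does not
 * assume 8-byte alignment for accesses made through it.
 */
union sketch_rec {
	struct sketch_rec64 r64;
	struct sketch_rec32 r32;
} __attribute__((aligned(4), packed));

/* Read the inode number from whichever flavour is in use. */
static uint64_t
sketch_rec_ino(const union sketch_rec *r, int is64)
{
	return is64 ? r->r64.ino : r->r32.ino;
}
```

With these attributes the union's required alignment is 4 rather than 8, so a record that sits at a 4-byte-aligned offset in a directory block can be accessed through it.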
 1.48 10-Jun-2017  maya branches: 1.48.4; 1.48.8; 1.48.12;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.47 12-Jan-2017  christos branches: 1.47.8;
fix sign confusion
 1.46 20-Jun-2016  dholland branches: 1.46.2;
Massedit u_int{8,16,32,64}_t to uint{8,16,32,64}_t. This effectively
merges ufs/dinode.h 1.25.
 1.45 19-Jun-2016  dholland we are actually synced with ufs/dinode.h 1.24 and ufs/dir.h 1.25.
 1.44 19-Feb-2016  riastradh Explicitly cast between char and unsigned char here.
 1.43 19-Feb-2016  riastradh Various housekeeping.

- Include <ufs/lfs/lfs.h> for union lfs_dinode &c.
- Include <string.h> or <sys/systm.h> for memcpy.
- Avoid signedness mismatch in lfs dino accessor for `rdev'.
- Avoid shadowing global `index'.
 1.42 10-Jan-2016  christos there is no reason to use __unused here.
 1.41 10-Jan-2016  dholland Fix two functions that were accidentally "static __unused" instead of
"static __unused inline". Oops; but probably not actually harmful.
 1.40 19-Oct-2015  dholland fix stupid typo in the 64-bit branch of the d_namlen accessor
 1.39 19-Oct-2015  dholland improve some panic messages
 1.38 15-Oct-2015  dholland Remove stray #define of lfs_magic
(the last of the fake superblock structure field macros)
 1.37 10-Oct-2015  dholland Add byteswapping to the inode block-pointer accessors.
 1.36 03-Oct-2015  dholland Drop an explicit sign-extension in fsck that shouldn't be needed any
more.
 1.35 03-Oct-2015  dholland Use IINFO in lfs_writeinode().
(both the kernel and the userland copies)
 1.34 03-Oct-2015  dholland Add an IINFO struct, which is like the FINFO struct but for the inode
blocks portion of the segment summary.

A segment summary block begins with a header (SEGSUM); the rest of the
block contains FINFO structures describing file blocks growing upward
from the bottom (after the header), and IINFO structures describing
inode blocks growing downward from the end of the block. (When they meet,
the segment is full regardless of how many blocks might be left.)

IINFO contains just a block number, and until now this information was
handled by just using uint32_t*; switching to a structure will make
the code a lot easier to read, and also make it easier to have 32-bit
and 64-bit versions without making a mess.

This commit just adds the structures and accessors; they'll be
deployed into the code in subsequent commits.
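
The two-ended layout described in this entry can be sketched as a pair of byte offsets that move toward each other. This is an illustration only: sketch_summary and its helpers are hypothetical names, and the real SEGSUM, FINFO and IINFO definitions are not reproduced.

```c
#include <stddef.h>

/* Hypothetical bookkeeping for one summary block. */
struct sketch_summary {
	size_t finfo_end;	/* first free byte after header and FINFOs */
	size_t iinfo_start;	/* lowest byte used by IINFOs at the tail */
};

/* Start with only the header in place and no IINFOs at the end. */
static void
sketch_init(struct sketch_summary *s, size_t blocksize, size_t hdrsize)
{
	s->finfo_end = hdrsize;
	s->iinfo_start = blocksize;
}

/* Bytes still free between the two growing regions. */
static size_t
sketch_free(const struct sketch_summary *s)
{
	return s->iinfo_start - s->finfo_end;
}

/* Grow the FINFO region upward; returns 0 if the regions would meet. */
static int
sketch_add_finfo(struct sketch_summary *s, size_t nbytes)
{
	if (nbytes > sketch_free(s))
		return 0;
	s->finfo_end += nbytes;
	return 1;
}

/* Grow the IINFO region downward; returns 0 if the regions would meet. */
static int
sketch_add_iinfo(struct sketch_summary *s, size_t nbytes)
{
	if (nbytes > sketch_free(s))
		return 0;
	s->iinfo_start -= nbytes;
	return 1;
}
```

Once either helper returns 0 the summary has no room left, which corresponds to the "segment is full" condition in the text.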
 1.33 21-Sep-2015  dholland branches: 1.33.2;
Fix some assorted 32-bit assumptions not yet otherwise handled.

Also apply patch to fix the overt problem in PR 50246: newfs was
calculating ifpb wrong for volumes with non-default block sizes.
 1.32 21-Sep-2015  dholland Oops, I forgot to make the atime in the 64-bit IFILE 64 bits.
Correct that. Incompatible change, but no LFS64 volumes can have been
created yet.
 1.31 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
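
A sketch of storing a 64-bit inode number as two 32-bit halves so that the entry needs only 32-bit alignment. The struct name, field names, and the low-half-first order are assumptions for illustration and are not taken from the LFS headers.

```c
#include <stdint.h>

/* Hypothetical 64-bit directory entry header; every field is <= 32 bits. */
struct sketch_dirheader64 {
	uint32_t ino_lo;	/* low 32 bits of the inode number (assumed order) */
	uint32_t ino_hi;	/* high 32 bits of the inode number */
	uint16_t reclen;
	uint8_t type;
	uint8_t namlen;
};

/* Reassemble the 64-bit inode number from its halves. */
static uint64_t
sketch_dir_get_ino(const struct sketch_dirheader64 *d)
{
	return ((uint64_t)d->ino_hi << 32) | d->ino_lo;
}

/* Split a 64-bit inode number into its halves. */
static void
sketch_dir_set_ino(struct sketch_dirheader64 *d, uint64_t ino)
{
	d->ino_lo = (uint32_t)ino;
	d->ino_hi = (uint32_t)(ino >> 32);
}
```

Because no member is wider than 32 bits, the struct's alignment requirement stays at 4 bytes.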
 1.30 21-Sep-2015  dholland Oops; LFS_DIRECTSIZ() is going to need the fs as an argument.

Also, it turns out that dirhash needs a compile-time-constant version
of LFS_DIRECTSIZ(LFS_MAXNAMLEN+1), independent of 64-vs-32, so create
LFS_MAXDIRENTRYSIZE for this. Sigh.
 1.29 20-Sep-2015  dholland Clean up struct lfs_dirtemplate.
 1.28 20-Sep-2015  dholland Fix glaringly stupid overflow/sizing bug in -r1.25. The part I don't
get is how it passed testing...
 1.27 15-Sep-2015  dholland Pass around struct lfs_dirheader instead of struct lfs_direct.
 1.26 15-Sep-2015  dholland Add an accessor function for directory names.
 1.25 15-Sep-2015  dholland Add a function lfs_copydirname() to copy directory names in place; use
it in place of (variously) memcpy and strlcpy. (The latter isn't even
correct; was probably changed blindly from strncpy at some point.)

The new function zeroes the padding in the directory entry instead of
leaving trash behind.
 1.24 15-Sep-2015  dholland Move the header part of struct lfs_direct to its own structure.
(lfs_dirheader)

Take the opportunity to improve the directory generation code in
make_lfs.c. (Everything else was unaffected by virtue of using
accessor functions.)
 1.23 15-Sep-2015  dholland Add and use accessor functions for more of the directory entry fields.
 1.22 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.21 01-Sep-2015  dholland Fix up indirect block handling in truncate to be 32/64 clean.
 1.20 01-Sep-2015  dholland Tidy the MAXSYMLINKLEN macros.
 1.19 01-Sep-2015  dholland The ifile's inode number is constant. (it is always 1)

Therefore, storing the value in the superblock and reading it out
again is silly and offers the opportunity for it to become corrupted.
So, don't do that (most of the code already didn't) and use the
existing constant instead. Initialize new 32-bit superblocks with
the value for the sake of old userland programs, but don't keep the
value in the 64-bit superblock at all.

(approved by Margo Seltzer)
 1.18 01-Sep-2015  dholland Make the inode fields in the 64-bit superblock 64 bits wide.
Reasoning as before.

Note that I am not going through and checking for 64->32 truncations
in inode numbers; I'm sure there are quite a few, but that's a project
for later.
 1.17 01-Sep-2015  dholland Add byteswapping to the dinode accessors.

This prevents regressions in the ulfs code when switching to the new
accessors. Note that while adding byteswapping to the other accessors
is straightforward, I haven't done it yet; and that also is not enough
to make LFS_EI work, because there are places lying around that bypass
the accessors for one reason and another and all of them need to be
updated. That is going to have to wait for a later day as LFS_EI is
not on the critical path right now.
 1.16 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.15 29-Aug-2015  mlelstv Fix IFILE pointer calculation when scanning freelist.
 1.14 19-Aug-2015  dholland Part two of dinodes; use the same union everywhere.
(previously the ufs-derived code had things set up slightly different)

Remove a bunch of associated mess.
 1.13 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.12 12-Aug-2015  dholland Provide 32-bit and 64-bit versions of FINFO.

This also entailed sorting out part of struct segment, as that
contains a pointer into the current FINFO data.
 1.11 12-Aug-2015  dholland Make 32-bit and 64-bit versions of SEGSUM.
Also fix some of the FINFO handling as it's closely entangled.
 1.10 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.9 12-Aug-2015  dholland Make 32-bit and 64-bit versions of CLEANERINFO.

XXX: while this is written to disk, it seems like much of it would
XXX: be better set up as a commpage shared with the cleaner.
 1.8 02-Aug-2015  dholland Pass the fs object to LFS_MAX_DADDR so it can check lfs_is64.

Remove some hackish intentional 64->32 truncations next to the checks
using LFS_MAX_DADDR, and tackle the problem they handled in bmap
instead.

The problem: the magic block pointer value UNWRITTEN has magic value
-2, and if it's not handled specifically, uint32 -> uint64 promotion
turns it into 4294967294, which then causes consternation and
monkeyhouse downstream.

What's here is still kind of a hack, but it's a step forward.
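
The promotion problem described in this entry can be reproduced in a few lines. This is a sketch: SKETCH_UNWRITTEN and the two helper names are hypothetical, and only the value -2 and the result 4294967294 come from the text above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the magic UNWRITTEN block pointer value. */
#define SKETCH_UNWRITTEN ((int64_t)-2)

/* Naive widening: a 32-bit on-disk -2 (0xfffffffe) is zero-extended. */
static int64_t
sketch_widen_naive(uint32_t disk_ptr)
{
	return (int64_t)disk_ptr;
}

/* Widening that handles the magic value specifically before promoting. */
static int64_t
sketch_widen_checked(uint32_t disk_ptr)
{
	if (disk_ptr == (uint32_t)SKETCH_UNWRITTEN)
		return SKETCH_UNWRITTEN;
	return (int64_t)disk_ptr;
}
```

The naive version turns the 32-bit -2 into 4294967294, which no longer compares equal to the magic value; the checked version preserves it.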
 1.7 02-Aug-2015  dholland Add a (draft) 64-bit superblock. Make things build again.

Add pieces of support for using both superblock types where
convenient, and specifically to the superblock accessors, but don't
actually enable it anywhere.

First substantive step on PR 50000.
 1.6 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.5 02-Aug-2015  dholland Second batch of 64 -> 32 truncations in lfs, along with more minor
tidyups and corrections in passing.
 1.4 02-Aug-2015  dholland Fix assorted 64 -> 32 truncations in lfs. Also, some minor tidyups and
corrections in passing.
 1.3 02-Aug-2015  dholland Allow superblock accessors that widen 32-bit disk fields to 64-bit
memory values.
 1.2 28-Jul-2015  dholland Use lfs_accessors.h in conjunction with the cleaner's struct clfs.
Remove previous hacks.
 1.1 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.33.2.7 28-Aug-2017  skrll Sync with HEAD
 1.33.2.6 05-Feb-2017  skrll Sync with HEAD
 1.33.2.5 09-Jul-2016  skrll Sync with HEAD
 1.33.2.4 19-Mar-2016  skrll Sync with HEAD
 1.33.2.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.33.2.2 22-Sep-2015  skrll Sync with HEAD
 1.33.2.1 21-Sep-2015  skrll file lfs_accessors.h was added on branch nick-nhusb on 2015-09-22 12:06:17 +0000
 1.46.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.47.8.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.48.12.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
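
The point of (b) can be illustrated in user space, assuming a pthread mutex in place of lfs_lock. The struct fs_sketch type and the writer_tryenter and writer_leave functions below are hypothetical stand-ins, not the kernel code. The check for pending dirops and the writer increment happen under the same lock hold, so the state cannot change between them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct lfs. */
struct fs_sketch {
	pthread_mutex_t lock;	/* plays the role of lfs_lock */
	unsigned dirops;	/* directory operations in progress */
	unsigned writer;	/* writers that have excluded dirops */
};

/*
 * Become a writer only if no dirops are pending. The test and the
 * increment are done under one lock hold, so they are atomic with
 * respect to anyone else who takes the lock.
 */
bool
writer_tryenter(struct fs_sketch *fs)
{
	bool ok;

	pthread_mutex_lock(&fs->lock);
	ok = (fs->dirops == 0);
	if (ok)
		fs->writer++;
	pthread_mutex_unlock(&fs->lock);
	return ok;
}

/* Undo a successful writer_tryenter. */
void
writer_leave(struct fs_sketch *fs)
{
	pthread_mutex_lock(&fs->lock);
	assert(fs->writer > 0);
	fs->writer--;
	pthread_mutex_unlock(&fs->lock);
}
```

A caller that gets false back simply skips the opportunistic work instead of sleeping while it holds the seglock.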

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.

lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.
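
A small sketch of the layout rule, assuming GCC or Clang attribute syntax in place of the NetBSD __aligned and __packed macros. The dirent64_sketch, dirent32_sketch, dirent_sketch and block_sketch types are hypothetical, not the real lfs structures. The static assertions play the role that CTASSERT plays in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit record that is only 32-bit aligned, as in a directory block. */
struct dirent64_sketch {
	uint64_t ino;
	uint32_t reclen;
} __attribute__((packed, aligned(4)));

struct dirent32_sketch {
	uint32_t ino;
	uint32_t reclen;
};

/* The union carries the attributes too, so it promises only 4-byte alignment. */
union dirent_sketch {
	struct dirent64_sketch d64;
	struct dirent32_sketch d32;
} __attribute__((packed, aligned(4)));

/* A record placed right after a 4-byte header lands at offset 4. */
struct block_sketch {
	uint32_t header;
	struct dirent64_sketch first;
};

static_assert(sizeof(struct dirent64_sketch) == 12, "no tail padding");
static_assert(_Alignof(struct dirent64_sketch) == 4, "32-bit aligned");
static_assert(_Alignof(union dirent_sketch) == 4, "union is 32-bit aligned");
static_assert(offsetof(struct block_sketch, first) == 4, "not padded to 8");

/* The compiler must emit an access that is safe at 4-byte alignment. */
uint64_t
dirent64_ino(const struct dirent64_sketch *d)
{
	return d->ino;
}
```

Without the attributes on the union, its alignment would follow its most strictly aligned member, which is the assumption the commit message warns about.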

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.48.8.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.48.4.2 03-Dec-2017  jdolecek update from HEAD
 1.48.4.1 10-Jun-2017  jdolecek file lfs_accessors.h was added on branch tls-maxphys on 2017-12-03 11:39:22 +0000
 1.147 03-Nov-2025  perseant Be more careful about only setting IN_CLEANING in lfs_setclean() and clearing
it in lfs_clrclean(). Prevents a crash from re-removing an entry from the
lfs_cleanhd TAILQ.
 1.146 01-Nov-2025  perseant Create a new LFS inode flag, IN_DEAD, to indicate that a file's last
reference, other than those that come with VU_DIROP or IN_CLEANING and
the one the caller holds, has been dropped. Check and apply this flag
in lfs_orphan(), and call lfs_orphan() on close if the link count is
zero. Change the signature of lfs_orphan to facilitate this.

Make test t_vfsops:lfs_tfhremove expect success.

Closes PR kern/43745.
 1.145 21-Sep-2025  christos lfs_freelist_prev is unused
 1.144 17-Sep-2025  perseant Add routines to check freelist consistency if compiled with DEBUG and
conditional on a kernel variable manipulated via sysctl.
Add checks before and after each routine that modifies the free list.
#if 0 a section of lfs_vfree() that was intended to keep the free list ordered
but instead corrupted it.
 1.143 15-Sep-2025  perseant Initialize nextfree, to placate gcc.
 1.142 15-Sep-2025  perseant If setting the head (or tail) of the inode free list to LFS_UNUSED_INUM, also
set the tail (resp. head) to LFS_UNUSED_INUM, as the list is now empty.

Add a check to ensure that lfs_valloc_fixed will always terminate, even
if the free list should contain a loop. Extend the ifile at the end if it
is empty, to match the assumption of lfs_valloc() that the free list is
never empty.

Needed for roll-forward.
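
A rough sketch of the two invariants described above, assuming the free list is modelled as an in-memory array of "next" links rather than ifile entries. UNUSED_INUM stands in for LFS_UNUSED_INUM and freelist_remove is a hypothetical helper, not lfs_valloc_fixed itself. The walk is bounded by the table size so it terminates even if the list contains a loop, and head and tail are reset together when the list becomes empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNUSED_INUM 0u	/* stand-in for LFS_UNUSED_INUM */

/*
 * Unlink inode number `want` from the free list stored in next[0..n-1].
 * Returns 0 on success, -1 if `want` is not on the list or the list is
 * corrupt (out-of-range link or a loop).
 */
int
freelist_remove(uint32_t *head, uint32_t *tail, uint32_t *next, size_t n,
    uint32_t want)
{
	uint32_t prev = UNUSED_INUM, cur;
	size_t steps = 0;

	assert(head != NULL && tail != NULL && next != NULL);
	if (want == UNUSED_INUM || want >= n)
		return -1;

	cur = *head;
	while (cur != UNUSED_INUM && cur != want) {
		/* More than n steps means we are going around a loop. */
		if (cur >= n || ++steps > n)
			return -1;
		prev = cur;
		cur = next[cur];
	}
	if (cur == UNUSED_INUM)
		return -1;

	if (prev == UNUSED_INUM)
		*head = next[cur];
	else
		next[prev] = next[cur];
	if (*tail == cur)
		*tail = prev;
	/* An empty list has both ends unset. */
	if (*head == UNUSED_INUM)
		*tail = UNUSED_INUM;
	next[cur] = UNUSED_INUM;
	return 0;
}
```

The sketch only reports failure on a corrupt list; extending the ifile when the list runs empty is left out.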
 1.141 23-Feb-2020  riastradh Dust off the orphan detection code and try to make it work.
 1.140 23-Feb-2020  riastradh Teach LFS_ORPHAN_NEXTFREE about lfs64.
 1.139 22-Feb-2020  kamil Avoid undefined behavior in *_BITMAP_FREE() macros

left shift of 1 by 31 places cannot be represented in type 'int'
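
For illustration, a bitmap helper that avoids this class of undefined behavior by shifting an unsigned constant. The bit_mask, bitmap_set, bitmap_clear and bitmap_test names are hypothetical and are not the *_BITMAP_FREE() macros themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* UINT32_C(1) is unsigned, so shifting it into bit 31 is well defined. */
static inline uint32_t
bit_mask(unsigned i)
{
	return UINT32_C(1) << (i % 32);
}

void
bitmap_set(uint32_t *map, unsigned i)
{
	assert(map != NULL);
	map[i / 32] |= bit_mask(i);
}

void
bitmap_clear(uint32_t *map, unsigned i)
{
	assert(map != NULL);
	map[i / 32] &= ~bit_mask(i);
}

bool
bitmap_test(const uint32_t *map, unsigned i)
{
	assert(map != NULL);
	return (map[i / 32] & bit_mask(i)) != 0;
}
```

With a plain `1 << 31` the constant has type int, and the result cannot be represented in a 32-bit int, which is what the sanitizer message above reports.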
 1.138 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.137 19-Aug-2017  maya branches: 1.137.4; 1.137.8; 1.137.10;
Consistently use {,UN}MARK_VNODE macros rather than function calls.
 1.136 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.135 13-Mar-2017  maya branches: 1.135.6;
Fill in some XXXs with the exact action described in them. Match
lfs_valloc behaviour.
 1.134 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT

Replace some #if DEBUG by this too. DEBUG is only for expensive
assertions; these are not.
 1.133 07-Aug-2016  dholland branches: 1.133.2;
Remove unused <sys/tree.h>
 1.132 07-Aug-2016  dholland Comments
 1.131 10-Oct-2015  dholland branches: 1.131.2;
Fix minor bitrot in #if 0 or otherwise disabled code.
 1.130 13-Sep-2015  dholland Fix wrong code in lfs_valloc_fixed(). It was overwriting the inode
number it was supposed to be allocating with the head of the inode
freelist, then applying the wrong test to that result. Net result:
unless the freelist was empty (in which case it would always fail),
it would in general drop a bunch of entries from the freelist.

This code seems to have been broken when the first version of lfsv2
was imported onto the perseant-lfsv2 branch in -r1.47.2.1, and
remained broken since, in spite of having been moved to lfs_rfw.c and
back and rearranged quite a bit in the meantime.

Sigh.

Found by Coverity in a rather confusing way as CID 1316545.
 1.129 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.128 29-Aug-2015  mlelstv Fix IFILE pointer calculation when scanning freelist.
 1.127 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.126 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.125 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.124 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.123 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.122 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.121 16-Jul-2015  dholland Don't cast the return value of malloc.
 1.120 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.119 28-Jul-2013  dholland branches: 1.119.6;
Add more of the bits for supporting quotas.
 1.118 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.117 18-Jun-2013  christos branches: 1.117.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.116 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.115 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.114 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.113 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.112 16-Feb-2012  perseant branches: 1.112.2;
Pass t_renamerace and t_rmdirrace tests.

Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.

Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.

Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace creates a lot of inode blocks on a tiny empty disk).
 1.111 12-Jun-2011  rmind branches: 1.111.2; 1.111.6; 1.111.8;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.110 24-Jun-2010  hannken branches: 1.110.6;
Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.109 08-Jan-2010  pooka branches: 1.109.2; 1.109.4;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.108 13-Sep-2009  tsutsui Move declaration of ufs_hashlock into <ufs/ufs_extern.h> from each c source.
 1.107 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.106 30-Jan-2008  ad branches: 1.106.6; 1.106.8; 1.106.10;
Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
 1.105 02-Jan-2008  ad Merge vmlocking2 to head.
 1.104 12-Dec-2007  he Fix a use of lfs_truncate() inside an #ifdef notyet (so no resulting change);
lfs_truncate() has lost its lwp argument.
 1.103 10-Oct-2007  ad branches: 1.103.4; 1.103.6; 1.103.8; 1.103.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.102 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.101 10-Jul-2007  hannken branches: 1.101.6; 1.101.8; 1.101.10;
Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criteria, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.100 15-Feb-2007  ad branches: 1.100.6; 1.100.8;
Replace some uses of lockmgr() / simplelocks.
 1.99 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.98 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.97 01-Sep-2006  perseant branches: 1.97.2; 1.97.4;
Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.96 20-Jul-2006  perseant Separate the (non-working) LFS kernel roll-forward code into its own file,
lfs_rfw.c.
 1.95 06-Jul-2006  perseant Protect lfs_order_freelist() with the segment lock.
 1.94 14-May-2006  elad branches: 1.94.4;
integrate kauth.
 1.93 12-May-2006  perseant Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.

Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.92 04-May-2006  perseant Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.

Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.91 30-Apr-2006  perseant Add an explicit list initialization that was missing from my last commit.
 1.90 30-Apr-2006  perseant Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are also included.
 1.89 22-Apr-2006  perseant Fix a fencepost error in the bitmap handling in extend_ifile(), and another
in lfs_freelist_prev().
 1.88 10-Apr-2006  perseant Optimize the free list search a little more; in particular use words
instead of bytes for the index, and never search below fs->lfs_freehd.

Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).

Free the bitmap when we unmount the filesystem.
 1.87 08-Apr-2006  perseant Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.86 11-Dec-2005  christos branches: 1.86.4; 1.86.6; 1.86.8; 1.86.10; 1.86.12;
merge ktrace-lwp.
 1.85 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.84 19-Aug-2005  christos branches: 1.84.2;
64 bit inode changes.
 1.83 29-May-2005  christos branches: 1.83.2;
- sprinkle const
- avoid shadow variables.
 1.82 19-Apr-2005  perseant Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.81 16-Apr-2005  perseant Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
 1.80 14-Apr-2005  perseant Consolidate the hash table we use to maintain the integrity of lfs_avail
into a single, system-wide table, rather than having a separate hash table
per inode. Significantly reduces the "system" cpu usage of your average
file write.
 1.79 14-Apr-2005  perseant Keep track of the highest block held by an LFS inode, so that we can
be assured that the last byte of a file is always allocated. Previously
a file extension could cause the filesystem to be flushed, writing an
inconsistent inode to disk. Although this condition would be corrected
the next time blocks were written to disk, an intervening crash would leave
the filesystem in an inconsistent state, leaving fsck_lfs to complain
of an inode "partially truncated".
 1.78 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.77 23-Mar-2005  perseant Make LFS dirops get their vnode first, before incrementing the dirop count,
to prevent a deadlock trying to call VOP_PUTPAGES() on a VDIROP vnode.
This can happen when a stacked filesystem is mounted on top of an LFS: an
LFS dirop needs to get a vnode, which is available from the upper layer.
The corresponding lower layer vnode, however, is VDIROP, so the upper layer
can't be cleaned out since its VOP_PUTPAGES() is passed through to the lower
layer, which waits for dirops to drain before it can proceed. Deadlock.

Tweak ufs_makeinode() and ufs_mkdir() to pass the a_vpp argument through
to VOP_VALLOC().

Partially addresses PR # 26043, though it probably does not completely fix
the problem described there.
 1.76 08-Mar-2005  perseant branches: 1.76.2;
Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.75 26-Feb-2005  perry nuke trailing whitespace
 1.74 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
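
The bfree clamping mentioned in the list above can be sketched as follows, assuming a signed free-block counter and an unsigned data size. The struct space_sketch type and reported_bfree function are hypothetical and do not reproduce lfs_statvfs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the per-filesystem accounting. */
struct space_sketch {
	int64_t bfree;		/* may transiently drop below zero */
	uint64_t dsize;		/* total data blocks */
};

/* Value to report to userland: always within [0, dsize]. */
uint64_t
reported_bfree(const struct space_sketch *sp)
{
	assert(sp != NULL);
	if (sp->bfree < 0)
		return 0;
	if ((uint64_t)sp->bfree > sp->dsize)
		return sp->dsize;
	return (uint64_t)sp->bfree;
}
```

Clamping before the conversion to an unsigned type is what keeps a slightly negative count from showing up as an enormous amount of free space.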
 1.73 14-Aug-2004  mycroft branches: 1.73.4; 1.73.6;
Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.72 23-Sep-2003  yamt branches: 1.72.4;
cleanup IN_ADIROP/VDIROP handling a little.
 1.71 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.70 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.69 29-Jun-2003  fvdl branches: 1.69.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.68 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.67 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.66 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.65 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.64 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.63 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.62 27-Jan-2003  yamt make these compilable with lfs debug options.
(follow daddr_t change)

XXX maybe segment number should be 64bit.
 1.61 25-Jan-2003  tron Use PRId64 instead of hard coding "%lld" to fix build problems under
LP64 ports.
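
As an illustration of the portability issue, a 64-bit value can be formatted with the <inttypes.h> macro so the same source builds on both ILP32 and LP64. The format_daddr function is a hypothetical helper and int64_t stands in for daddr_t.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit disk address. PRId64 expands to the right length
 * modifier for int64_t on the target, whereas a hard-coded "%lld"
 * mismatches where int64_t is long rather than long long.
 */
int
format_daddr(char *buf, size_t len, int64_t daddr)
{
	assert(buf != NULL);
	return snprintf(buf, len, "daddr %" PRId64, daddr);
}
```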
 1.60 25-Jan-2003  tron Fix printf() format strings problems caused by "daddr_t" change.
 1.59 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.58 08-Jan-2003  yamt use lfs_unmark_vnode instead of duplicated code fragments.
 1.57 24-Nov-2002  yamt make sure i_lfs_fragsize is initialized.
fix panic "lfs_writefile: more than one fragment!"
PR 18974.
 1.56 14-May-2002  perseant Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.55 04-Feb-2002  perseant Correct free list tail pointer, when adding blocks of new inodes to v2
filesystems. Should fix PR #14408.
 1.54 18-Dec-2001  chs use the new compatibility routines to allow mmap() to work
(in the same non-coherent fashion that it worked pre-UBC)
until someone has time to do it the right way.
 1.53 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.52 08-Nov-2001  lukem add RCSID
 1.51 14-Oct-2001  chs branches: 1.51.2;
initialize the vnode's copy of the size in lfs_ialloc().
 1.50 28-Sep-2001  chs don't depend on other headers to include sys/proc.h for us.
 1.49 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.48 13-Jul-2001  perseant branches: 1.48.2;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length; it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.47 30-May-2001  mrg branches: 1.47.2; 1.47.4;
use _KERNEL_OPT
 1.46 03-Dec-2000  perseant branches: 1.46.2;
Get rid of some old unnecessary code that cleared B_NEEDCOMMIT from buffers in
lfs_writeseg (possibly after they had been freed).

If MALLOCLOG is defined, make lfs_newbuf and lfs_freebuf pass along the
caller's file and line to _malloc and _free.
 1.45 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.44 27-Nov-2000  perseant If LFS_DO_ROLLFORWARD is defined, roll forward from the older checkpoint
on mount, through the newer checkpoint and on through any newer
partial-segments that may have been written but not checkpointed because
of an intervening crash.

LFS_DO_ROLLFORWARD is not defined by default.
 1.43 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().
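
The reservation scheme in the first kernel item above can be sketched roughly as follows. This is a simplified userland model under stated assumptions: the struct, its field names, and both functions are illustrative, and where the kernel unlocks the inode and sleeps until the cleaner frees space, this sketch simply reports failure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified accounting; names are illustrative only. */
struct space_acct {
	int64_t avail;   /* blocks free for new writes */
	int64_t ravail;  /* blocks reserved by writes in progress */
};

/*
 * Try to reserve up to max_blocks for a write.  Returns false where the
 * kernel would instead unlock the inode and sleep until the cleaner has
 * made enough space available.
 */
static bool
acct_reserve(struct space_acct *a, int64_t max_blocks)
{
	if (a->avail - a->ravail < max_blocks)
		return false;
	a->ravail += max_blocks;
	return true;
}

/*
 * Once the write has allocated `used` of its `reserved` blocks, drop the
 * reservation and charge only what was actually allocated.
 */
static void
acct_commit(struct space_acct *a, int64_t reserved, int64_t used)
{
	a->ravail -= reserved;
	a->avail -= used;
}
```

Reserving the worst case up front is what lets a large write proceed without starving the cleaner of the blocks it needs to make progress.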

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments through N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this is unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.42 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero at fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.41 03-Jul-2000  perseant i_lfs_effnblks fixes. Put debugging printfs under #ifdef DEBUG_LFS.
 1.40 30-Jun-2000  fvdl Rearrange code around getnewvnode as was already done for ffs, to avoid
locking against oneself because getnewvnode recycles a softdep-using vnode.
 1.39 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.38 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.37 22-Jun-2000  perseant fix my own typo, grr....
 1.36 22-Jun-2000  perseant Read i_ffs_gen from the version number in the Ifile during lfs_valloc,
instead of keeping it always == 1. (The ifile version number is
increased on vfree.) May address PR #7213, but I haven't been able to
test thoroughly enough to say for sure.
 1.35 22-Jun-2000  perseant Update lfs_vunref for the fact that now a vnode can be locked with no
references (locked for VOP_INACTIVE at the end of vrele) and it's okay.
Check the return value of lfs_vref where appropriate.
Fixes PR #s 10285 and 10352.
 1.34 06-Jun-2000  perseant branches: 1.34.2;
Protect inode free list with seglock, instead of separate lock, so that
the head of the inode free list (on the superblock) always matches the
rest of the free list (in the ifile).

Protect lfs_fragextend with seglock, to prevent the segment byte count
fudging from making its way to disk.

Don't try to inactivate dirop vnodes that are still in the middle of
their dirop (may address PR#10285).
 1.33 31-May-2000  perseant update for IN_ACCESSED changes
 1.32 27-May-2000  perseant branches: 1.32.2;
Prevent dirops from getting around lfs_check and wedging the buffer cache.
All the dirop vnops now mark the inodes with a new flag, IN_ADIROP, which
is removed as soon as the dirop is done (as opposed to VDIROP which stays
until the file is written). To address one issue raised in PR#9357.
 1.31 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.30 15-Dec-1999  perseant Fix error returns on lfs vnops so that locks and reference counts are
preserved. Handle dirop accounting in lfs_vfree for this case as well.
May address PR#8823.
 1.29 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.28 12-Nov-1999  perseant Back out my patch of the 8th (to address unreferenced inode problem).
Apparently this needs more thought.
 1.27 09-Nov-1999  perseant If ifile blocks were written before dirops were complete, and then the
system crashed, inodes could be allocated that were not referenced. (Though
not a serious problem, it evidences itself in phase 4 of fsck_lfs.) Fix
this by marking if_daddr with UNASSIGNED before the inodes are actually
written; at mount time the ifile is checked for UNASSIGNED entries and
any that are found are linked back into the free list. (The latter
functionality should move into the roll-forward agent when it materializes.)
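
The mount-time pass described here can be sketched as below. This is an illustrative model only: the entry layout, the marker values, and the function name are assumptions, not the real Ifile structures or constants.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_UNASSIGNED ((int32_t)-3)  /* illustrative value, not the real constant */
#define EX_UNUSED     ((int32_t)0)   /* "no disk address", illustrative */

/* Hypothetical, simplified ifile entry. */
struct ex_ifile_ent {
	int32_t  daddr;     /* disk address of the inode, or a marker */
	uint32_t nextfree;  /* next inode number on the free list */
};

/*
 * Scan the table and push every entry still marked UNASSIGNED back onto
 * the free list.  Returns the new free-list head.
 */
static uint32_t
ex_relink_unassigned(struct ex_ifile_ent *tab, size_t n, uint32_t freehead)
{
	for (size_t ino = 1; ino < n; ino++) {
		if (tab[ino].daddr != EX_UNASSIGNED)
			continue;
		tab[ino].daddr = EX_UNUSED;
		tab[ino].nextfree = freehead;
		freehead = (uint32_t)ino;
	}
	return freehead;
}
```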
 1.26 06-Nov-1999  perseant branches: 1.26.2;
Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.25 03-Sep-1999  perseant branches: 1.25.2; 1.25.4; 1.25.6;
Make changes that will allow an LFS filesystem to be used as the root
filesystem. In particular,

- Fix mknod deadlock, described in PR 8172.
- Enable lfs_mountroot.
- Make lfs_writevnodes treat filesystems mounted on lfs device nodes properly,
by flushing that device rather than trying to add blocks to the device inode.

This, in combination with lfs boot blocks, will allow operation of an all-lfs
system.
 1.24 08-Jul-1999  wrstuden Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
 1.23 17-Jun-1999  tls squash some compiler warnings on debug printfs by casting to int
 1.22 15-Jun-1999  perseant Minor changes to the segment live bytes calculation. In particular, fixed
a bug in fragment extension that could run the count negative. Also, don't
overcount for inodes, and don't count segment summaries. Thus, for empty
segments the live bytes count should now be exactly zero.
 1.21 16-Apr-1999  perseant Other half of the ufs_hashlock locking fix (oops)
 1.20 16-Apr-1999  perseant Fix locking panic on ufs_hashlock
 1.19 11-Apr-1999  perseant Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.
 1.18 24-Mar-1999  mrg branches: 1.18.2;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.
 1.17 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.16 23-Oct-1998  thorpej Use DINODE_SIZE rather than sizeof(struct dinode).
 1.15 01-Sep-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for LFS inodes.
 1.14 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.13 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.12 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.11 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.10 07-Feb-1998  chs add UVM stuff.
 1.9 04-Jul-1997  drochner Don't cast 64bit (off_t) file sizes to vm_offset_t (32bit on many
architectures), truncate them intelligently instead.
The truncation is done centralized in vnode_pager.c.
This prevents wrap-over effects when parts of large (>2^32 byte) files
are mmapped.
Don't allow mmap above the numerical range of vm_offset_t.
This is considered a temporary solution until the vm system handles the
object sizes/offsets more cleanly.
 1.8 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
    union {
        struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
        struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
    } i_din;
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.7 10-Mar-1997  mycroft Just increment the generation count. Using the time is bogus and defeats
fsirand(8).
 1.6 12-Oct-1996  christos branches: 1.6.6;
revert previous kprintf changes
 1.5 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.4 25-Mar-1996  pk Appease gcc: unused variables if !QUOTA
 1.3 09-Feb-1996  christos lfs prototypes
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.6.6.1 12-Mar-1997  is Merge in changes from Trunk
 1.18.2.8 20-Jan-2000  he Pull up revision 1.31 (via patch, requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.18.2.7 15-Jan-2000  he Pull up revision 1.30 (requested by perseant):
Fix error returns on lfs vnops so that locks and reference counts
are preserved. Handle dirop accounting in lfs_vfree for this
case as well. Addresses PR#8823.
 1.18.2.6 15-Jan-2000  he Pull up revision 1.25 (requested by perseant):
Address problems related to using an LFS filesystem as the root
filesystem, including mknod hangs. Fixes PR#8172 and PR#9072.
 1.18.2.5 17-Dec-1999  he Pull up revision 1.26 (requested by perseant):
Address locking protocol error for inode hash, and make the
maximum number of active dirops a global quantity.
 1.18.2.4 03-Sep-1999  he Pull up revision 1.23:
Fix a printf format bug that gives compiler warnings/errors on
64-bit platforms, fixing PR#8241. (perseant)
 1.18.2.3 25-Jun-1999  perry pullup 1.21->1.22 (perseant)
 1.18.2.2 16-Apr-1999  perseant branches: 1.18.2.2.2; 1.18.2.2.4;
Pull up src/sys/ufs/lfs: lfs_alloc.c 1.19->1.21.

This fixes another locking problem, this time a lock on ufs_hashlock in
lfs_vfree. The lock could be held by a process calling getnewvnode, and
then attempted again by lfs_vfree. This works around that, not attempting
to get the lock if curproc already holds it.
 1.18.2.1 13-Apr-1999  perseant Pull-up of changes made to the trunk on Sunday [1.18->1.19], to wit:

Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.

Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).

Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed, then
immediately reloaded, their dinodes were located in an inode block which
was not on disk at the advertized location, nor in the cache (although it
would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.

Better checking for held inode locks in lfs_fastvget, for a number of
error conditions. Also change the default setting of lfs_clean_vnhead to
0, which seems to make the locking problems go away (although this is
difficult to test as I can't reliably reproduce them).

Make sure that the wakeup occurs for vnodes that lfs_update might be
sleeping on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which
have dirty buffers), by marking them with the appropriate flag if
dirty buffers were added while the write was in progress.

Fix block counting during file truncation, if not truncating to zero.

Disallow threshold-initiated cache flush when dirops are active. Also,
make SET_ENDOP use lfs_check instead of inlining most of it.

Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).

Check the superblock version field, and refuse to mount the filesystem if
the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and
boundaries, etc., which would not affect existing fields in the
superblock, but which would drastically affect the filesystem, to be
smoothly integrated at a later date.
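
The version check in the last paragraph amounts to something like the following sketch. The constant, the function name, and the choice of EINVAL as the error are assumptions for illustration:

```c
#include <errno.h>
#include <stdint.h>

/* Highest on-disk layout version this sketch understands (illustrative). */
#define EX_LFS_VERSION 1u

/*
 * Hypothetical mount-time check: refuse any superblock whose version is
 * newer than this code knows about, so that later layout changes cannot be
 * misread by an older kernel.  Returns 0 on success or an errno value.
 */
static int
ex_check_version(uint32_t sb_version)
{
	if (sb_version > EX_LFS_VERSION)
		return EINVAL;
	return 0;
}
```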
 1.18.2.2.4.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.18.2.2.2.4 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
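
A minimal sketch of why the comparison was widened. The typedef and names are stand-ins; only the values -1 and -2 come from the message above:

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t ex_daddr_t;            /* stand-in for daddr_t */
#define EX_UNWRITTEN ((ex_daddr_t)-2)  /* "unwritten": accounted for, no fixed location */

/*
 * True for any magic (negative) disk address.  A test of the form
 * (addr == (ex_daddr_t)-1) would miss EX_UNWRITTEN; (addr < 0) catches
 * both markers.
 */
static bool
ex_daddr_is_marker(ex_daddr_t addr)
{
	return addr < 0;
}
```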
 1.18.2.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.18.2.2.2.2 11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.18.2.2.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.25.6.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.25.4.2 15-Nov-1999  fvdl Sync with -current
 1.25.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.25.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.25.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.26.2.2 06-Nov-1999  perseant Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.26.2.1 06-Nov-1999  perseant file lfs_alloc.c was added on branch comdex-fall-1999 on 1999-11-06 20:33:06 +0000
 1.32.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.34.2.4 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.34.2.3 03-Jul-2000  fvdl pullup the fixes from the trunk to not hold ufs_hashlock across
getnewvnode()
 1.34.2.2 28-Jun-2000  perseant pull up i_ffs_gen patch from trunk
 1.34.2.1 22-Jun-2000  perseant Pull up lfs_vunref fix from the trunk.
 1.46.2.11 08-Jan-2003  thorpej Sync with HEAD.
 1.46.2.10 11-Dec-2002  thorpej Sync with HEAD.
 1.46.2.9 20-Jun-2002  nathanw Catch up to -current.
 1.46.2.8 28-Feb-2002  nathanw Catch up to -current.
 1.46.2.7 08-Jan-2002  nathanw Catch up to -current.
 1.46.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.46.2.5 22-Oct-2001  nathanw Catch up to -current.
 1.46.2.4 08-Oct-2001  nathanw Catch up to -current.
 1.46.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.46.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.46.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.47.4.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.47.4.3 11-Feb-2002  jdolecek Sync w/ -current.
 1.47.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.47.4.1 03-Aug-2001  lukem update to -current
 1.47.2.6 13-Jul-2001  perseant Be more careful about when we update ctime/mtime. In particular, if we
are only writing indirect blocks, that doesn't count for mtime; and when
we first create a vnode, that certainly *does not* count for ctime
(a bug that's been there from the beginning).

This does not change the fact that mtime might still be set after write(2)
is "completed", but it does make the atime-in-the-ifile code have some
effect (noticeably less degradation of read time after an intervening
large write).
 1.47.2.5 10-Jul-2001  perseant Turn the free list into a tailq, with both head and tail kept on the ifile.

Update access times on the inode even if it does not get marked IN_ACCESS.
 1.47.2.4 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
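
A minimal sketch of the addressing change described above, not the actual
kernel macros. The struct and function names are hypothetical; the only
facts taken from the log are that addresses are now counted in fragments
("fsb") and that v1 filesystems keep 1 fsb == 1 sector regardless of
fragment size. It assumes the fragment size is a multiple of the sector size.

```c
#include <stdint.h>

/* Hypothetical model of the geometry involved; not the real struct lfs. */
struct geom_model {
	uint32_t version;	/* 1 for v1 filesystems, 2 for LFSv2 */
	uint32_t frag_size;	/* fragment size in bytes */
	uint32_t sector_size;	/* disk sector size in bytes */
};

/* Number of disk sectors covered by one fsb address unit. */
uint32_t
sectors_per_fsb(const struct geom_model *g)
{
	/* v1 compatibility: one fsb is one sector, whatever the fragment size. */
	if (g->version == 1)
		return 1;
	return g->frag_size / g->sector_size;
}

/* Convert an fsb address into a disk sector number. */
uint64_t
fsb_to_sector(const struct geom_model *g, uint64_t fsb)
{
	return fsb * sectors_per_fsb(g);
}
```

With 4096-byte fragments on 512-byte sectors, fsb 10 maps to sector 80 on a
v2 filesystem and to sector 10 on a v1 filesystem.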
 1.47.2.3 29-Jun-2001  perseant Update the Ifile copy of the free list head in lfs_vfree, so inode numbers
actually get reused.
 1.47.2.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.47.2.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.48.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.51.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.69.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.69.2.9 01-Apr-2005  skrll Sync with HEAD.
 1.69.2.8 08-Mar-2005  skrll Sync with HEAD.
 1.69.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.69.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.69.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.69.2.4 25-Aug-2004  skrll Sync with HEAD.
 1.69.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.69.2.2 03-Aug-2004  skrll Sync with HEAD
 1.69.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.72.4.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
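
The "never report bfree < 0 (or higher than lfs_dsize)" item in the kernel
list above amounts to a clamp. The following is an illustrative sketch only;
reported_bfree() is a hypothetical helper, not the code in lfs_statfs(9).
Without the clamp, a negative count reinterpreted as unsigned looks like an
enormous amount of free space.

```c
#include <stdint.h>

/*
 * Clamp a signed free-block count into [0, dsize] before reporting it.
 * Hypothetical helper; the real kernel code may be structured differently.
 */
uint64_t
reported_bfree(int64_t bfree, uint64_t dsize)
{
	if (bfree < 0)
		return 0;		/* never report a negative count */
	if ((uint64_t)bfree > dsize)
		return dsize;		/* never report more than the data size */
	return (uint64_t)bfree;
}
```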
 1.73.6.2 26-Mar-2005  yamt sync with head.
 1.73.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.73.4.1 29-Apr-2005  kent sync with -current
 1.76.2.12 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.76.2.11 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.93
sys/ufs/lfs/lfs.h: revision 1.106
sys/ufs/lfs/lfs_vfsops.c: revision 1.209
sys/ufs/lfs/lfs_vnops.c: revision 1.175
sys/ufs/lfs/lfs_segment.c: revision 1.178
Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.
Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.76.2.10 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.92
sys/ufs/lfs/lfs.h: revision 1.105
sys/ufs/lfs/lfs_vfsops.c: revision 1.207
sys/ufs/lfs/lfs_subr.c: revision 1.59
sys/ufs/lfs/lfs_vnops.c: revision 1.173
sys/ufs/lfs/lfs_bio.c: revision 1.92
Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.
Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.76.2.9 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.91
Add an explicit list initialization that was missing from my last commit.
 1.76.2.8 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.104
sys/ufs/lfs/lfs_vfsops.c: revision 1.206
sys/ufs/lfs/lfs_vnops.c: revision 1.170
sys/ufs/lfs/lfs_extern.h: revision 1.80
sys/ufs/lfs/lfs_segment.c: revision 1.176
sys/ufs/lfs/lfs_inode.c: revision 1.103 via patch
sys/ufs/lfs/lfs_alloc.c: revision 1.90
Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().
A couple of locking fixes are also included.
 1.76.2.7 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.89
Fix a fencepost error in the bitmap handling in extend_ifile(), and another
in lfs_freelist_prev().
 1.76.2.6 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.101
sys/ufs/lfs/lfs_vfsops.c: revision 1.202
sys/ufs/lfs/lfs_alloc.c: revision 1.88
Optimize the free list search a little more; in particular use words
instead of bytes for the index, and never search below fs->lfs_freehd.
Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).
Free the bitmap when we unmount the filesystem.
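
A rough sketch of a word-indexed bitmap search of the kind described above.
This is not the lfs_alloc.c code: the function name, the "set bit means
free" convention, and the NOT_FOUND value are assumptions. The start
argument plays the role of fs->lfs_freehd (nothing below it is examined),
and all arithmetic is unsigned.

```c
#include <stdint.h>

#define BITS_PER_WORD	32u
#define NOT_FOUND	UINT32_MAX

/*
 * Return the index of the first set bit at or above `start`, scanning one
 * 32-bit word at a time. `bitmap` must hold at least
 * (nbits + 31) / 32 words.
 */
uint32_t
find_first_free(const uint32_t *bitmap, uint32_t nbits, uint32_t start)
{
	uint32_t first = start / BITS_PER_WORD;
	uint32_t w, b;

	for (w = first; w * BITS_PER_WORD < nbits; w++) {
		uint32_t word = bitmap[w];

		/* In the first word, ignore bits below the starting point. */
		if (w == first)
			word &= ~0u << (start % BITS_PER_WORD);
		if (word == 0)
			continue;	/* whole word skipped in one test */
		for (b = 0; b < BITS_PER_WORD; b++) {
			if (word & (1u << b)) {
				uint32_t bit = w * BITS_PER_WORD + b;
				return bit < nbits ? bit : NOT_FOUND;
			}
		}
	}
	return NOT_FOUND;
}
```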
 1.76.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.87
sys/ufs/lfs/lfs.h: revision 1.99
sys/ufs/lfs/lfs_vfsops.c: revision 1.199
sys/ufs/lfs/lfs_extern.h: revision 1.77 via patch
Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.76.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.153
sys/ufs/lfs/lfs_debug.c: revision 1.32
sys/ufs/lfs/lfs_alloc.c: revision 1.84
sys/ufs/lfs/lfs_vfsops.c: revision 1.185
sys/ufs/lfs/lfs_segment.c: revision 1.165
64 bit inode changes.
 1.76.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.76.2.2 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.76.2.1 30-Mar-2005  tron Pull up revision 1.77 (requested by perseant in ticket #74):
Make LFS dirops get their vnode first, before incrementing the dirop
count, to prevent a deadlock trying to call VOP_PUTPAGES() on a VDIROP
vnode. This can happen when a stacked filesystem is mounted on top of an
LFS: an LFS dirop needs to get a vnode, which is available from the upper
layer. The corresponding lower layer vnode, however, is VDIROP, so the
upper layer can't be cleaned out since its VOP_PUTPAGES() is passed
through to the lower layer, which waits for dirops to drain before it can
proceed. Deadlock.
Tweak ufs_makeinode() and ufs_mkdir() to pass the a_vpp argument through
to VOP_VALLOC().
Partially addresses PR #26043, though it probably does not completely fix
the problem described there.
 1.83.2.7 04-Feb-2008  yamt sync with head.
 1.83.2.6 21-Jan-2008  yamt sync with head
 1.83.2.5 27-Oct-2007  yamt sync with head.
 1.83.2.4 03-Sep-2007  yamt sync with head.
 1.83.2.3 26-Feb-2007  yamt sync with head.
 1.83.2.2 30-Dec-2006  yamt sync with head.
 1.83.2.1 21-Jun-2006  yamt sync with head.
 1.84.2.2 29-Oct-2005  yamt use lfs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.84.2.1 20-Oct-2005  yamt adapt ufs.
 1.86.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.86.10.4 11-May-2006  elad sync with head
 1.86.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.86.10.2 19-Apr-2006  elad sync with head.
 1.86.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.86.8.4 03-Sep-2006  yamt sync with head.
 1.86.8.3 11-Aug-2006  yamt sync with head
 1.86.8.2 24-May-2006  yamt sync with head.
 1.86.8.1 11-Apr-2006  yamt sync with head
 1.86.6.2 01-Jun-2006  kardel Sync with head.
 1.86.6.1 22-Apr-2006  simonb Sync with head.
 1.86.4.1 09-Sep-2006  rpaulo sync with head
 1.94.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.97.4.2 10-Dec-2006  yamt sync with head.
 1.97.4.1 22-Oct-2006  yamt sync with head
 1.97.2.1 18-Nov-2006  ad Sync with head.
 1.100.8.1 11-Jul-2007  mjf Sync with head.
 1.100.6.4 15-Jul-2007  ad Sync with head.
 1.100.6.3 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.100.6.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.100.6.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.101.10.1 14-Oct-2007  yamt sync with head.
 1.101.8.3 23-Mar-2008  matt sync with HEAD
 1.101.8.2 09-Jan-2008  matt sync with HEAD
 1.101.8.1 06-Nov-2007  matt sync with HEAD
 1.101.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.103.10.2 02-Jan-2008  bouyer Sync with HEAD
 1.103.10.1 13-Dec-2007  bouyer Sync with HEAD
 1.103.8.1 13-Dec-2007  yamt sync with head.
 1.103.6.4 26-Dec-2007  ad Sync with head.
 1.103.6.3 19-Dec-2007  ad Use a global lfs_lock.
 1.103.6.2 19-Dec-2007  ad Get lfs mostly working.
 1.103.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.103.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.106.10.4 11-Aug-2010  yamt sync with head.
 1.106.10.3 11-Mar-2010  yamt sync with head
 1.106.10.2 16-Sep-2009  yamt sync with head
 1.106.10.1 16-May-2008  yamt sync with head.
 1.106.8.1 18-May-2008  yamt sync with head.
 1.106.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.109.4.2 03-Jul-2010  rmind sync with head
 1.109.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.109.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.110.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.111.8.1 17-Mar-2012  bouyer Pull up following revision(s) (requested by perseant in ticket #116):
sys/ufs/lfs/lfs_alloc.c: revision 1.112
tests/fs/vfs/t_rmdirrace.c: revision 1.9
tests/fs/vfs/t_renamerace.c: revision 1.25
sys/ufs/lfs/lfs_vnops.c: revision 1.240
sys/ufs/lfs/lfs_segment.c: revision 1.224
sys/ufs/lfs/lfs_bio.c: revision 1.122
sys/ufs/lfs/lfs_vfsops.c: revision 1.294
sbin/newfs_lfs/make_lfs.c: revision 1.19
sys/ufs/lfs/lfs.h: revision 1.136
Pass t_renamerace and t_rmdirrace tests.
Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.
Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.
Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace creates a lot of inode blocks on a tiny empty disk).
 1.111.6.1 18-Feb-2012  mrg merge to -current.
 1.111.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.111.2.2 23-Jan-2013  yamt sync with head
 1.111.2.1 17-Apr-2012  yamt sync with head
 1.112.2.4 03-Dec-2017  jdolecek update from HEAD
 1.112.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.112.2.2 23-Jun-2013  tls resync from head
 1.112.2.1 25-Feb-2013  tls resync with head
 1.117.2.1 28-Aug-2013  rmind sync with head
 1.119.6.5 28-Aug-2017  skrll Sync with HEAD
 1.119.6.4 05-Oct-2016  skrll Sync with HEAD
 1.119.6.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.119.6.2 22-Sep-2015  skrll Sync with HEAD
 1.119.6.1 06-Jun-2015  skrll Sync with HEAD
 1.131.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.133.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.135.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some questions about the code in an XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.137.10.2 29-Feb-2020  ad Sync with head.
 1.137.10.1 17-Jan-2020  ad Sync with head.
 1.137.8.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
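
To illustrate resolution (b), here is a userland sketch of a "try enter"
that checks for pending dirops and takes the writer count under one lock
hold. It is a model under stated assumptions, not the kernel's
lfs_writer_tryenter: struct fs_model and both function names are
hypothetical, a pthread mutex stands in for lfs_lock, and waking of
sleepers on leave is omitted.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant per-filesystem fields. */
struct fs_model {
	pthread_mutex_t lock;	/* plays the role of lfs_lock */
	unsigned dirops;	/* directory operations in progress */
	unsigned writer;	/* writers currently excluding new dirops */
};

/*
 * Become a writer only if no dirops are pending. The test and the
 * increment happen in a single critical section, so the state cannot
 * change between them.
 */
bool
writer_tryenter(struct fs_model *fs)
{
	bool entered = false;

	pthread_mutex_lock(&fs->lock);
	if (fs->dirops == 0) {
		fs->writer++;
		entered = true;
	}
	pthread_mutex_unlock(&fs->lock);
	return entered;
}

/* Drop writer status taken by a successful writer_tryenter(). */
void
writer_leave(struct fs_model *fs)
{
	pthread_mutex_lock(&fs->lock);
	fs->writer--;
	pthread_mutex_unlock(&fs->lock);
}
```

A caller that gets false back simply skips the opportunistic work instead
of sleeping while holding the seglock.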

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.
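
A small compilable sketch of the packed, 4-byte-aligned layout discussed
above. The struct and its field names are illustrative only, not the real
lfs64 directory entry, and the GCC/clang attribute syntax is written out
directly where NetBSD would use its __aligned(4) __packed macros. The
compile-time checks are in the spirit of the CTASSERT change mentioned
earlier in this message.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative record with a 64-bit field that is only 32-bit aligned. */
struct dirent64_model {
	uint64_t ino;		/* 64-bit inode number */
	uint16_t reclen;	/* record length */
	uint8_t  type;		/* file type */
	uint8_t  namlen;	/* name length */
} __attribute__((packed, aligned(4)));

/* Compile-time checks on the modelled on-disk layout. */
_Static_assert(sizeof(struct dirent64_model) == 12, "unexpected size");
_Static_assert(_Alignof(struct dirent64_model) == 4, "unexpected alignment");

/*
 * Read the inode number. Because the type is declared with 4-byte
 * alignment, the compiler must not assume `d` is 8-byte aligned.
 */
uint64_t
read_ino(const struct dirent64_model *d)
{
	return d->ino;
}
```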

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.137.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.97 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
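
As an illustration of the kind of statistics LFCNFILESTATS is described as
reporting, the sketch below counts discontinuities in a list of direct
block addresses. The struct, the function, and the exact definitions of
"gap" and "gap distance" are assumptions for this example; the real fcntl
may compute them differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result record; not the kernel's structure. */
struct file_stats_model {
	size_t   nblocks;	/* number of direct blocks examined */
	size_t   ngaps;		/* discontinuities between neighbours */
	uint64_t gap_total;	/* sum of absolute gap distances */
};

/*
 * Assumed definition: block i is contiguous with block i-1 when its
 * address equals the previous address plus `blk_units` (the block size in
 * address units). Anything else is a gap, and its distance is how far the
 * block sits from where a contiguous block would have been.
 */
struct file_stats_model
count_gaps(const int64_t *daddr, size_t n, int64_t blk_units)
{
	struct file_stats_model st = { n, 0, 0 };

	for (size_t i = 1; i < n; i++) {
		int64_t expected = daddr[i - 1] + blk_units;

		if (daddr[i] != expected) {
			int64_t dist = daddr[i] - expected;

			st.ngaps++;
			st.gap_total += (uint64_t)(dist < 0 ? -dist : dist);
		}
	}
	return st;
}
```

Under this definition a file scattered across segments shows many gaps,
and rewriting it contiguously would be expected to bring the count back
to zero.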
 1.96 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.95 23-Feb-2020  riastradh Serialize access to the splay tree with lfs_lock.
 1.94 10-Jun-2017  maya branches: 1.94.6; 1.94.10; 1.94.12;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.93 08-Jun-2017  chs move some buffer cache internals declarations from buf.h to vfs_bio.c.
this is needed to avoid name conflicts with ZFS and also
makes it clearer that other code shouldn't be messing with these.
remove the LFS debug code that poked around in bufqueues and
remove the BQ_EMPTY bufqueue since nothing uses it anymore.
provide a function to let LFS and wapbl read the value of nbuf for now.
 1.92 06-Apr-2017  maya branches: 1.92.6;
Provide an LFS_ENTER_LOG (__nothing) in the !DEBUG case,
so I can drop lots of #ifdef DEBUG around this macro. NFCI
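
A minimal sketch of the pattern described in this entry, not the actual LFS macro: DEMO_DEBUG, DEMO_ENTER_LOG, and demo_write_block are hypothetical names, and the real LFS_ENTER_LOG takes different arguments. The point is only that a no-op definition in the non-debug build lets call sites drop their own #ifdef.

```c
#ifdef DEMO_DEBUG
#include <stdio.h>
/* Debug build: record the event (here, just print it). */
#define DEMO_ENTER_LOG(msg)	do { fprintf(stderr, "%s\n", (msg)); } while (0)
#else
/* Non-debug build: expands to an expression with no effect. */
#define DEMO_ENTER_LOG(msg)	((void)0)
#endif

/* A call site needs no #ifdef of its own. Returns 0 on success. */
static int
demo_write_block(int blkno)
{
	(void)blkno;	/* unused in this sketch */
	DEMO_ENTER_LOG("write block");
	return 0;
}
```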
 1.91 07-Aug-2016  dholland branches: 1.91.2;
Fix stupid thinko.
 1.90 07-Aug-2016  dholland comments
 1.89 07-Aug-2016  dholland use static properly
 1.88 10-Oct-2015  dholland branches: 1.88.2;
Use accessors for some more indirect block manipulations.
 1.87 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.86 02-Aug-2015  dholland Pass the fs object to LFS_MAX_DADDR so it can check lfs_is64.

Remove some hackish intentional 64->32 truncations next to the checks
using LFS_MAX_DADDR, and tackle the problem they handled in bmap
instead.

The problem: the magic block pointer value UNWRITTEN has magic value
-2, and if it's not handled specifically, uint32 -> uint64 promotion
turns it into 4294967294, which then causes consternation and
monkeyhouse downstream.

What's here is still kind of a hack, but it's a step forward.
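
The promotion problem described above can be illustrated with a short sketch. The names demo_widen_naive, demo_widen_signed, and DEMO_UNWRITTEN are hypothetical and are not taken from the LFS sources; the sketch only shows why a 32-bit slot holding the -2 marker has to be sign-extended rather than zero-extended when it is widened.

```c
#include <stdint.h>

/* Hypothetical stand-in for the in-core UNWRITTEN marker. */
#define DEMO_UNWRITTEN	((int64_t)-2)

/* Zero extension: the bit pattern 0xfffffffe becomes 4294967294. */
static int64_t
demo_widen_naive(uint32_t slot)
{
	return (int64_t)(uint64_t)slot;
}

/*
 * Going through int32_t first sign-extends, so the marker stays -2.
 * (Converting an out-of-range value to int32_t is implementation-defined
 * before C23; two's-complement platforms give the result assumed here.)
 */
static int64_t
demo_widen_signed(uint32_t slot)
{
	return (int64_t)(int32_t)slot;
}
```

With a slot value of 0xfffffffe, the naive version yields 4294967294, which no longer compares equal to the marker, while the signed version yields -2.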
 1.85 02-Aug-2015  dholland Fix assorted 64 -> 32 truncations in lfs. Also, some minor tidyups and
corrections in passing.
 1.84 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.83 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.82 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.81 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.80 28-Jul-2013  dholland branches: 1.80.6;
Add more of the bits for supporting quotas.
 1.79 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.78 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.77 18-Jun-2013  christos branches: 1.77.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.76 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.75 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.74 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.73 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.72 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.71 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.70 11-Jul-2011  hannken branches: 1.70.2; 1.70.12;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.69 16-Feb-2010  mlelstv Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.68 18-Mar-2009  cegger branches: 1.68.2;
bzero -> memset
 1.67 16-May-2008  hannken branches: 1.67.6; 1.67.12;
Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.66 28-Apr-2008  martin branches: 1.66.2;
Remove clause 3 and 4 from TNF licenses
 1.65 15-Feb-2008  ad branches: 1.65.6; 1.65.8; 1.65.10;
The buffer LOCKED flag need not be under the protection of bufcache_lock,
BUSY is enough.
 1.64 02-Jan-2008  ad Merge vmlocking2 to head.
 1.63 08-Oct-2007  ad branches: 1.63.4; 1.63.6; 1.63.10;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.62 15-Feb-2007  ad branches: 1.62.6; 1.62.18; 1.62.20; 1.62.22;
Replace some uses of lockmgr() / simplelocks.
 1.61 14-May-2006  elad integrate kauth.
 1.60 07-Apr-2006  perseant Several minor bug fixes:

* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.59 24-Dec-2005  perry branches: 1.59.4; 1.59.6; 1.59.8; 1.59.10; 1.59.12;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.58 11-Dec-2005  christos merge ktrace-lwp.
 1.57 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.56 19-Apr-2005  perseant branches: 1.56.2; 1.56.4;
Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.55 16-Apr-2005  perseant Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
 1.54 14-Apr-2005  perseant Tabify leading whitespace
 1.53 14-Apr-2005  perseant Consolidate the hash table we use to maintain the integrity of lfs_avail
into a single, system-wide table, rather than having a separate hash table
per inode. Significantly reduces the "system" cpu usage of your average
file write.
 1.52 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.51 02-Mar-2005  perseant branches: 1.51.2;
Put the ISSPACE() check where it belongs. This allows rewriting a file
on a full filesystem while still returning ENOSPC on an attempt to allocate
new blocks.
 1.50 26-Feb-2005  perry nuke trailing whitespace
 1.49 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
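
The lfs_statvfs item in the list above (never report bfree below zero or above lfs_dsize) amounts to a clamp. The following is a sketch under assumptions, not the committed code: demo_report_bfree is a hypothetical helper, and the signed/unsigned types are chosen only to show how a small negative count would otherwise be reported as an enormous unsigned amount of free space.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp a signed free-block count into [0, dsize] before it is
 * stored in an unsigned field reported to userland.
 */
static uint64_t
demo_report_bfree(int64_t bfree, int64_t dsize)
{
	assert(dsize >= 0);	/* assumption: filesystem size is non-negative */
	if (bfree < 0)
		return 0;
	if (bfree > dsize)
		return (uint64_t)dsize;
	return (uint64_t)bfree;
}
```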
 1.48 25-Jan-2004  hannken branches: 1.48.6; 1.48.8; 1.48.10;
Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.47 30-Dec-2003  pk Replace the traditional buffer memory management -- based on fixed per buffer
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.

This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.

On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
 1.46 29-Oct-2003  mycroft Adjust to remove bogus initializer.
 1.45 25-Oct-2003  christos Fix uninitialized variable warnings.
 1.44 04-Sep-2003  yamt don't call LFS_DEBUG_COUNTLOCKED after bread().
lfs_countlocked doesn't count buffers that aren't on the freelist.
 1.43 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.42 18-May-2003  yamt branches: 1.42.2;
make is_sequential a callback in order to achieve better lfs write clustering.

since lfs always rewrites blocks into the new segment,
the current on-disk location of a block doesn't affect write clustering.

ok'ed by Konrad Schroder.
 1.41 29-Apr-2003  yamt add an assertion.
 1.40 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (e.g. for ACLs); this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.39 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.38 28-Feb-2003  perseant Fix a clrbuf() on an uninitialized pointer.
 1.37 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.36 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
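
The last item above (segments that pass two checkpoints both dirty and empty can be summarily cleaned) can be sketched as follows. The structure, flag names, and demo_checkpoint function are hypothetical and simplified; this is an illustration of the rule, not the kernel's segment usage code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SEG_DIRTY	0x1u	/* segment has been written to */
#define DEMO_SEG_EMPTY	0x2u	/* was dirty and empty at the previous checkpoint */

struct demo_seguse {
	uint32_t flags;
	uint32_t nbytes;	/* live bytes in the segment */
};

/* Run once per checkpoint; returns how many segments were cleaned. */
static int
demo_checkpoint(struct demo_seguse *segs, int nsegs)
{
	int cleaned = 0;

	assert(nsegs >= 0);
	for (int i = 0; i < nsegs; i++) {
		struct demo_seguse *s = &segs[i];

		if (!(s->flags & DEMO_SEG_DIRTY) || s->nbytes != 0) {
			/* Clean already, or holds live data: forget history. */
			s->flags &= ~DEMO_SEG_EMPTY;
			continue;
		}
		if (s->flags & DEMO_SEG_EMPTY) {
			/* Second checkpoint in a row: mark it clean. */
			s->flags &= ~(DEMO_SEG_DIRTY | DEMO_SEG_EMPTY);
			cleaned++;
		} else {
			/* First sighting: remember it for next time. */
			s->flags |= DEMO_SEG_EMPTY;
		}
	}
	return cleaned;
}
```

A segment that is dirty and empty is only remembered at the first checkpoint and reclaimed at the second, so anything that gains live data in between is left alone.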
 1.35 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.34 11-Dec-2002  yamt take care of B_CLRBUF in lfs_balloc.
otherwise you'll see uninitialized blocks.
 1.33 06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.32 14-May-2002  perseant branches: 1.32.2;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.31 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.30 08-Nov-2001  lukem add RCSID
 1.29 13-Jul-2001  perseant branches: 1.29.4;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.28 30-May-2001  mrg branches: 1.28.2; 1.28.4;
use _KERNEL_OPT
 1.27 21-Nov-2000  perseant branches: 1.27.2;
More locked_queue_* and lfs_avail accounting fixes from Jesse Off
<joff@gci-net.com>. Remove a specious btodb() in lfs_fragextend, and
count blocks shrunk or removed by VOP_TRUNCATE in lfs_avail.
 1.26 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.25 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments through N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
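
The lfs_ravail reservation scheme described under "Kernel" above can be sketched in a few lines. This is a single-threaded illustration under assumptions: struct demo_fs, demo_reserve, and demo_commit are hypothetical names, and where the real lfs_reserve would unlock the inode and sleep until the cleaner frees space, the sketch simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_fs {
	int64_t avail;	/* blocks still free for new writes */
	int64_t ravail;	/* blocks promised to writes in progress */
};

/*
 * Try to reserve n blocks. A false return is where the caller would
 * sleep until the cleaner has made more space available.
 */
static bool
demo_reserve(struct demo_fs *fs, int64_t n)
{
	if (n < 0 || fs->avail - fs->ravail < n)
		return false;
	fs->ravail += n;
	return true;
}

/*
 * The write has allocated `used` of the `reserved` blocks: drop the
 * reservation and charge only what was actually consumed.
 */
static void
demo_commit(struct demo_fs *fs, int64_t reserved, int64_t used)
{
	assert(used >= 0 && used <= reserved);
	fs->ravail -= reserved;
	fs->avail -= used;
}
```

Reserving the maximum up front and settling afterwards means a writer never discovers halfway through that the space it needs has gone to someone else.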
 1.24 04-Jul-2000  perseant Fix errors observed while trying to fill the filesystem with yesterday's
fixes:

- Write copies of bfree and avail in the CLEANERINFO block, so the
cleaner doesn't have to guess which superblock has the current
information (if indeed any do).

- Tighten up accounting of lfs_avail (more needs to be done).

- When cleansing indirect blocks of UNWRITTEN, make sure not to mark
them clean, since they'll need to be rewritten later.
 1.23 03-Jul-2000  perseant Fix i_ffs_blocks in fragment extension case where fragment has not yet
been written to disk.
 1.22 03-Jul-2000  perseant i_lfs_effnblks fixes. Put debugging printfs under #ifdef DEBUG_LFS.
 1.21 03-Jul-2000  perseant Allow the number of free segments reserved for the cleaner to be
parametrized in the filesystem, defaulting to MIN_FREE_SEGS = 2 but set
to something more reasonable at newfs_lfs time.

Note the number of blocks that have been scheduled for writing but which
are not yet on disk in an inode extension, i_lfs_effnblks. Move
i_ffs_effnlink out of the ffs extension and onto the main inode, since
it's used all over the shared code and the lfs extension would clobber
it.

At inode write time, indirect blocks and inode-held blocks of inodes
that have i_lfs_effnblks != i_ffs_blocks are cleansed of UNWRITTEN disk
addresses, so that these never make it to disk.
 1.20 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.19 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.18 06-Jun-2000  perseant branches: 1.18.2;
Protect inode free list with seglock, instead of separate lock, so that
the head of the inode free list (on the superblock) always matches the
rest of the free list (in the ifile).

Protect lfs_fragextend with seglock, to prevent the segment byte count
fudging from making its way to disk.

Don't try to inactivate dirop vnodes that are still in the middle of
their dirop (may address PR#10285).
 1.17 30-May-2000  perseant Don't try to "correct" accounting for fragments being extended but which
have never been written to disk.
 1.16 05-May-2000  perseant branches: 1.16.2;
Change the way LFS does block accounting, from trying to infer from the
buffer cache flags, to marking the inode and/or indirect blocks with a
special disk address UNWRITTEN==-2 when a block is accounted for. (This
address is never written to disk, but only used in-core. This is essentially
the same method of block accounting as on the UBC branch, where the buffer
headers don't exist.) Make sure that truncation is handled properly,
especially in the case of holey files.

Fixes PR#9994.
 1.15 23-Apr-2000  perseant Fix problems outlined in PR#9926:
- lfs_truncate extends the file if called with length > i_ffs_size;
- lfs_truncate errors out if called with length < 0;
- lfs_balloc block accounting corrected for the case of blocks read
into the cache before they exist on disk;
- mp->mnt_stat.f_iosize is initialized in lfs_mountfs.
 1.14 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.13 15-Jun-1999  perseant branches: 1.13.2; 1.13.4; 1.13.8;
Minor changes to the segment live bytes calculation. In particular, fixed
a bug in fragment extension that could run the count negative. Also, don't
overcount for inodes, and don't count segment summaries. Thus, for empty
segments the live bytes count should now be exactly zero.
 1.12 24-Mar-1999  mrg branches: 1.12.2; 1.12.4; 1.12.6;
completely remove Mach VM support. all that is left is all the
header files, as UVM still uses (most of) these.
 1.11 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd says they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.10 09-Nov-1998  mycroft GC the B_CACHE bit.
 1.9 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.8 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.7 03-Mar-1998  drochner Don't cast the quad_t file size to u_long, this can cause overflows.
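The overflow mentioned in revision 1.7 is the usual narrowing problem. A minimal sketch, assuming a platform where u_long is 32 bits wide (modelled here with uint32_t); the comparison against a limit and both function names are hypothetical, since the log does not show the affected expression.

```c
#include <stdint.h>

/* Stand-in for a 32-bit u_long, as found on ILP32 platforms. */
typedef uint32_t toy_u_long;

/* Buggy pattern: narrowing a 64-bit size discards the high bits. */
static int
toy_fits_narrow(int64_t file_size, int64_t limit)
{
	return (toy_u_long)file_size <= (toy_u_long)limit;
}

/* Fixed pattern: compare at full 64-bit width. */
static int
toy_fits_wide(int64_t file_size, int64_t limit)
{
	return file_size <= limit;
}
```

With a size just over 4 GB, the narrowed comparison sees a tiny value and wrongly reports that the file fits.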
 1.6 03-Mar-1998  fvdl Make this compile again with UVM
 1.5 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.4 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
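The layout described in revision 1.4 can be sketched as follows. The union and the i_ffs_*/i_e2fs_* accessor idea come from the log; the reduced dinode structs, the i_type field used to pick the macro, and every TOY_ name are assumptions made for this example and do not reproduce the real inode.h.

```c
#include <stdint.h>

enum toy_fstype { TOY_FFS, TOY_EXT2FS };

/* Heavily reduced stand-ins for the two on-disk dinode layouts. */
struct toy_ffs_dinode  { uint16_t di_mode;   int32_t  di_mtime; };
struct toy_e2fs_dinode { uint16_t e2di_mode; uint32_t e2di_mtime; };

struct toy_inode {
	enum toy_fstype i_type;		/* hypothetical discriminator */
	union {
		struct toy_ffs_dinode  ffs_din;
		struct toy_e2fs_dinode e2fs_din;
	} i_din;
};

/* Accessor macros hide the union path, as the commit describes. */
#define i_ffs_mtime	i_din.ffs_din.di_mtime
#define i_e2fs_mtime	i_din.e2fs_din.e2di_mtime

#define TOY_FFS_ITIMES(ip, t)	 ((ip)->i_ffs_mtime = (int32_t)(t))
#define TOY_EXT2FS_ITIMES(ip, t) ((ip)->i_e2fs_mtime = (uint32_t)(t))

/* Generic entry point: dispatch on the inode's filesystem type. */
#define TOY_ITIMES(ip, t) do {			\
	if ((ip)->i_type == TOY_EXT2FS)		\
		TOY_EXT2FS_ITIMES(ip, t);	\
	else					\
		TOY_FFS_ITIMES(ip, t);		\
} while (0)
```

Code shared between filesystems would call the generic macro, while filesystem-specific code can call its own variant directly and skip the type test.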
 1.3 09-Feb-1996  christos lfs prototypes
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.12.6.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.12.4.3 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.12.4.2 04-Jul-1999  chs a couple steps towards supporting UBC.
 1.12.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.12.2.1 25-Jun-1999  perry pullup 1.12->1.13 (perseant)
 1.13.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.13.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.13.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.13.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.16.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.18.2.2 03-Feb-2001  he Pull up revisions 1.26-1.27 (requested by perseant):
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
 1.18.2.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.27.2.8 11-Dec-2002  thorpej Sync with HEAD.
 1.27.2.7 01-Aug-2002  nathanw Catch up to -current.
 1.27.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.27.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.27.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.27.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.27.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.27.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.28.4.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.28.4.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.28.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.28.4.1 03-Aug-2001  lukem update to -current
 1.28.2.2 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
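The addressing change in 1.28.2.2 amounts to a unit conversion. The sketch below assumes a sector size that evenly divides the fragment size; the struct, its fields, and the toy_ function names are hypothetical. Only the 8k LFS_LABELPAD value, the 512 to 32k fragment range, and the v1 rule (1 fsb == 1 sector) are taken from the log.

```c
#include <stdint.h>

#define TOY_LABELPAD 8192	/* LFS_LABELPAD (8k), per the commit message */

struct toy_lfs {
	int      version;	/* 1 or 2 */
	uint32_t fsize;		/* fragment size in bytes, 512..32768 */
	uint32_t secsize;	/* sector size in bytes, e.g. 512 */
};

/* Number of disk sectors covered by one address unit ("fsb"). */
static uint32_t
toy_sectors_per_fsb(const struct toy_lfs *fs)
{
	if (fs->version == 1)
		return 1;	/* v1 compatibility: 1 fsb == 1 sector */
	return fs->fsize / fs->secsize;
}

/* Convert an fsb address to a sector address. */
static int64_t
toy_fsbtodb(const struct toy_lfs *fs, int64_t fsb)
{
	return fsb * (int64_t)toy_sectors_per_fsb(fs);
}

/* Is the address unit larger than the label pad, so that an extra
 * locator superblock at 8k would be needed? */
static int
toy_needs_proto_superblock(const struct toy_lfs *fs)
{
	return fs->version != 1 && fs->fsize > TOY_LABELPAD;
}
```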
 1.28.2.1 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.29.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.32.2.1 15-Jul-2002  gehenna catch up with -current.
 1.42.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.42.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.42.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.42.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.42.2.1 03-Aug-2004  skrll Sync with HEAD
 1.48.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.48.8.1 29-Apr-2005  kent sync with -current
 1.48.6.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
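The kernel items above about lfs_bfree (a signed quantity that may dip below zero, clamped before it is reported, with ENOSPC once it reaches zero) can be sketched as below. The function names and the exact comparison used for ENOSPC are assumptions; the log states only the intended behaviour.

```c
#include <errno.h>
#include <stdint.h>

/* Clamp the free-block count to [0, dsize] before reporting it. */
static int64_t
toy_report_bfree(int64_t bfree, int64_t dsize)
{
	if (bfree < 0)
		return 0;
	if (bfree > dsize)
		return dsize;
	return bfree;
}

/* Refuse new allocations once no free blocks remain. */
static int
toy_balloc_check(int64_t bfree)
{
	return bfree <= 0 ? ENOSPC : 0;
}
```

Without the clamp, a slightly negative count reinterpreted as an unsigned block count would look like an enormous amount of free space, which is the df(1) symptom the commit mentions.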
 1.51.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_balloc.c: revision 1.60
sys/ufs/lfs/lfs_syscalls.c: revision 1.111
sys/ufs/lfs/lfs_segment.c: revision 1.172
sys/ufs/lfs/lfs_vnops.c: revision 1.163
Several minor bug fixes:
* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.51.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.56.4.1 20-Oct-2005  yamt adapt ufs.
 1.56.2.5 27-Feb-2008  yamt sync with head.
 1.56.2.4 21-Jan-2008  yamt sync with head
 1.56.2.3 27-Oct-2007  yamt sync with head.
 1.56.2.2 26-Feb-2007  yamt sync with head.
 1.56.2.1 21-Jun-2006  yamt sync with head.
 1.59.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.59.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.59.10.2 19-Apr-2006  elad sync with head.
 1.59.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.59.8.2 24-May-2006  yamt sync with head.
 1.59.8.1 11-Apr-2006  yamt sync with head
 1.59.6.2 01-Jun-2006  kardel Sync with head.
 1.59.6.1 22-Apr-2006  simonb Sync with head.
 1.59.4.1 09-Sep-2006  rpaulo sync with head
 1.62.22.1 14-Oct-2007  yamt sync with head.
 1.62.20.3 23-Mar-2008  matt sync with HEAD
 1.62.20.2 09-Jan-2008  matt sync with HEAD
 1.62.20.1 06-Nov-2007  matt sync with HEAD
 1.62.18.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.62.6.3 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.62.6.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.62.6.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.63.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.63.6.2 19-Dec-2007  ad Use a global lfs_lock.
 1.63.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.63.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.65.10.3 11-Mar-2010  yamt sync with head
 1.65.10.2 04-May-2009  yamt sync with head.
 1.65.10.1 16-May-2008  yamt sync with head.
 1.65.8.1 18-May-2008  yamt sync with head.
 1.65.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.66.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.67.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.67.6.1 28-Apr-2009  skrll Sync with HEAD.
 1.68.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.70.12.4 03-Dec-2017  jdolecek update from HEAD
 1.70.12.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.70.12.2 23-Jun-2013  tls resync from head
 1.70.12.1 25-Feb-2013  tls resync with head
 1.70.2.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.70.2.1 23-Jan-2013  yamt sync with head
 1.77.2.1 28-Aug-2013  rmind sync with head
 1.80.6.5 28-Aug-2017  skrll Sync with HEAD
 1.80.6.4 05-Oct-2016  skrll Sync with HEAD
 1.80.6.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.80.6.2 22-Sep-2015  skrll Sync with HEAD
 1.80.6.1 06-Apr-2015  skrll Sync with HEAD
 1.88.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.91.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.92.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some questions about the code in an XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.94.12.1 29-Feb-2020  ad Sync with head.
 1.94.10.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.
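The marker-node technique named above can be sketched with the TAILQ macros from <sys/queue.h>. This is an illustration under assumptions: the node layout, the is_marker flag, and both function names are hypothetical, and the "lock dropped" window is simulated by a callback rather than a real lock.

```c
#include <stddef.h>
#include <sys/queue.h>

struct toy_node {
	TAILQ_ENTRY(toy_node) link;
	int is_marker;
	int value;
};
TAILQ_HEAD(toy_list, toy_node);

/*
 * Visit every real node. While fn runs (the window in which a real
 * implementation would have dropped its lock) other nodes, including
 * the one after np, may be removed. Returns the number of nodes visited.
 */
static int
toy_iterate(struct toy_list *head,
    void (*fn)(struct toy_list *, struct toy_node *))
{
	struct toy_node marker = { .is_marker = 1 };
	struct toy_node *np;
	int visited = 0;

	np = TAILQ_FIRST(head);
	while (np != NULL) {
		if (np->is_marker) {	/* another iterator's marker */
			np = TAILQ_NEXT(np, link);
			continue;
		}
		/* Park our marker right after the current node. */
		TAILQ_INSERT_AFTER(head, np, &marker, link);
		fn(head, np);
		visited++;
		/* Resume from the marker, which nobody else removes. */
		np = TAILQ_NEXT(&marker, link);
		TAILQ_REMOVE(head, &marker, link);
	}
	return visited;
}

/* Example callback: removes the real node following np's marker,
 * simulating a concurrent removal of the would-be "next" node. */
static void
toy_remove_successor(struct toy_list *head, struct toy_node *np)
{
	struct toy_node *marker = TAILQ_NEXT(np, link);
	struct toy_node *victim = marker ? TAILQ_NEXT(marker, link) : NULL;

	if (victim != NULL && !victim->is_marker)
		TAILQ_REMOVE(head, victim, link);
}
```

A loop that cached the next pointer before calling fn would follow a stale pointer here; the marker stays linked in the list, so its successor is always current.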

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
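Resolution (b) above can be sketched as a try-enter that performs the test and the acquisition inside one critical section. The pthread mutex stands in for lfs_lock, and the struct and toy_ function names are hypothetical; the real lfs_writer_tryenter may differ in detail.

```c
#include <pthread.h>

struct toy_lfs {
	pthread_mutex_t lfs_lock;
	int lfs_dirops;		/* active directory operations */
	int lfs_writer;		/* writers that exclude new dirops */
};

/*
 * Become a writer only if no dirops are pending. The check and the
 * increment happen under the same lock, so the state cannot change
 * in between. Returns 1 on success, 0 if dirops are active.
 */
static int
toy_writer_tryenter(struct toy_lfs *fs)
{
	int entered = 0;

	pthread_mutex_lock(&fs->lfs_lock);
	if (fs->lfs_dirops == 0) {
		fs->lfs_writer++;
		entered = 1;
	}
	pthread_mutex_unlock(&fs->lfs_lock);
	return entered;
}

/* Drop writer status again. */
static void
toy_writer_leave(struct toy_lfs *fs)
{
	pthread_mutex_lock(&fs->lfs_lock);
	fs->lfs_writer--;
	pthread_mutex_unlock(&fs->lfs_lock);
}
```

A caller that already holds the segment lock can use the try variant and simply skip the opportunistic work on failure, instead of sleeping in the wrong lock order.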

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.
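The packed, 4-byte-aligned layout and the CTASSERT-style size checks mentioned in this pull-up can be sketched with the GCC/Clang attribute spelling (NetBSD's __aligned() and __packed macros are assumed to wrap these attributes). The record layout and field names below are hypothetical, not the real lfs64 directory entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A record holding a 64-bit inode number but only 32-bit aligned. */
struct toy_dirent64 {
	uint64_t d_ino;
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
} __attribute__((packed, aligned(4)));

/* Compile-time checks in the spirit of CTASSERT. */
_Static_assert(_Alignof(struct toy_dirent64) == 4, "32-bit aligned only");
_Static_assert(sizeof(struct toy_dirent64) == 12, "on-disk size is fixed");

/* Read d_ino from a record at a 4-byte-aligned offset in a block.
 * memcpy avoids any assumption about the buffer's own alignment. */
static uint64_t
toy_read_ino(const unsigned char *buf, size_t off)
{
	struct toy_dirent64 d;

	memcpy(&d, buf + off, sizeof(d));
	return d.d_ino;
}
```

Declaring the reduced alignment on the type tells the compiler not to emit 64-bit loads that would fault on strict-alignment machines.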

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.94.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.152 03-Nov-2025  perseant Be more careful about only setting IN_CLEANING in lfs_setclean() and clearing
it in lfs_clrclean(). Prevents a crash from re-removing an entry from the
lfs_cleanhd TAILQ.
 1.151 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
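The statistics that LFCNFILESTATS is described as reporting in revision 1.151 (number of direct blocks, number of gaps, total gap distance) could be computed along these lines. The exact definitions used by the kernel are not given in the log, so this sketch assumes that a gap is any block that does not start exactly where the previous one ends, and that gap distance is the absolute difference; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_filestats {
	size_t  nblocks;	/* number of direct blocks examined */
	size_t  ngaps;		/* discontinuities between neighbours */
	int64_t gapdist;	/* sum of the absolute gap sizes */
};

/* daddr[] holds each block's disk address; blk_len is the length of
 * one block in the same units. */
static struct toy_filestats
toy_filestats(const int64_t *daddr, size_t n, int64_t blk_len)
{
	struct toy_filestats st = { n, 0, 0 };
	size_t i;

	for (i = 1; i < n; i++) {
		int64_t expected = daddr[i - 1] + blk_len;
		int64_t delta = daddr[i] - expected;

		if (delta != 0) {
			st.ngaps++;
			st.gapdist += delta < 0 ? -delta : delta;
		}
	}
	return st;
}
```

A coalescing agent could rank files by ngaps or gapdist and hand the worst ones to a rewrite call.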
 1.150 15-Sep-2025  perseant If we don't have enough space, flush with checkpoint: the Ifile might be
clogging up the buffer cache.

Rewrite the logic in lfs_flush() so that the requested filesystem is always
flushed, regardless of whether only_onefs is set.

Use LFS_WAIT_BYTES and LFS_WAIT_BUFS as the thresholds when determining
whether to wait for resources, rather than their _MAX_ counterparts.
 1.149 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.148 11-Jun-2020  ad uvm_availmem(): give it a boolean argument to specify whether a recent
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
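The trade-off behind the boolean argument in revision 1.148 can be sketched as a cached total versus a fresh summation. The per-CPU counter array is an assumption made for illustration (the log says only that fetching totals affects other CPUs), and the struct and function names are hypothetical.

```c
#include <stdbool.h>

#define TOY_NCPU 4

struct toy_vm {
	long percpu_free[TOY_NCPU];	/* hypothetical per-CPU counters */
	long cached_total;		/* most recently computed total */
};

/* Return free memory: the cached figure if that is good enough,
 * otherwise a freshly summed (and then cached) total. */
static long
toy_availmem(struct toy_vm *vm, bool cached)
{
	long total = 0;
	int i;

	if (cached)
		return vm->cached_total;	/* cheap, possibly stale */
	for (i = 0; i < TOY_NCPU; i++)
		total += vm->percpu_free[i];	/* touches every CPU's data */
	vm->cached_total = total;
	return total;
}
```

Frequent callers that only need a rough figure would pass true; callers making a hard allocation decision would pass false.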
 1.147 14-Mar-2020  ad OR into bp->b_cflags; don't overwrite.
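The difference fixed in revision 1.147 is between assigning a flag word and OR-ing a bit into it. A minimal sketch; the flag names and values are hypothetical, as the log does not say which bits were involved.

```c
#include <stdint.h>

#define TOY_BC_BUSY	0x01u	/* hypothetical flag values */
#define TOY_BC_INVAL	0x02u

struct toy_buf {
	uint32_t b_cflags;
};

/* Buggy pattern: assignment wipes every other flag, e.g. BUSY. */
static void
toy_set_inval_overwrite(struct toy_buf *bp)
{
	bp->b_cflags = TOY_BC_INVAL;
}

/* Fixed pattern: OR the bit in and leave the rest untouched. */
static void
toy_set_inval_or(struct toy_buf *bp)
{
	bp->b_cflags |= TOY_BC_INVAL;
}
```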
 1.146 23-Feb-2020  riastradh Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because

(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.
 1.145 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.144 31-Dec-2019  ad branches: 1.144.2;
Rename uvm_free() -> uvm_availmem().
 1.143 21-Dec-2019  ad uvmexp.free -> uvm_free()
 1.142 09-Jun-2018  zafer branches: 1.142.2; 1.142.6;
Add missing b_cflags and b_oflags.
Ok dholland@
Addresses PR kern/42342 by Yoshihiro Nakajima
 1.141 10-Jun-2017  maya branches: 1.141.4;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.140 08-Jun-2017  chs move some buffer cache internals declarations from buf.h to vfs_bio.c.
this is needed to avoid name conflicts with ZFS and also
makes it clearer that other code shouldn't be messing with these.
remove the LFS debug code that poked around in bufqueues and
remove the BQ_EMPTY bufqueue since nothing uses it anymore.
provide a function to let LFS and wapbl read the value of nbuf for now.
 1.139 17-Apr-2017  hannken branches: 1.139.4;
Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.138 13-Apr-2017  hannken Switch lfs_flush() and lfs_writerd() to mountlist iterator.
 1.137 01-Apr-2017  maya Switch lfs_writer_daemon to use condvar instead of mtsleep.
track thread existence with struct lwp instead of pid + lid,
it's more useful from ddb.
 1.136 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT

Replace some #if DEBUG by this too. DEBUG is only for expensive
assertions; these are not.
 1.135 03-Oct-2015  hannken branches: 1.135.2; 1.135.4;
Remove dubious vhold()/holdrele() from lfs_reserve().
The vnodes are always referenced on entry.

If we changed ulfs_remove() and ulfs_rmdir() to return the locked dvp,
the vnodes would always be locked on entry.

Remove an outdated comment from lfs_reserveavail(), unlocking/relocking
the vnode was removed in rev 1.49.
 1.134 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.133 02-Aug-2015  dholland Fix assorted 64 -> 32 truncations in lfs. Also, some minor tidyups and
corrections in passing.
 1.132 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.131 25-Jul-2015  martin Use accessors in DEBUG and DIAGNOSTIC code as well
 1.130 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.129 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)
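A minimal sketch of the accessor idea, assuming a superblock that exists in a 32-bit and a 64-bit form selected at run time. All names here (sketch_lfs, is64, sketch_getavail, sketch_setavail) are hypothetical and are not the accessors added by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two hypothetical on-disk layouts that differ only in field width. */
struct sketch_sb32 { uint32_t avail; };
struct sketch_sb64 { uint64_t avail; };

/* In-memory structure that records which layout is in use. */
struct sketch_lfs {
	int is64;
	union {
		struct sketch_sb32 u32;
		struct sketch_sb64 u64;
	} sb;
};

/* Callers read the field through one function, whatever the layout. */
uint64_t
sketch_getavail(const struct sketch_lfs *fs)
{
	assert(fs != NULL);
	return fs->is64 ? fs->sb.u64.avail : fs->sb.u32.avail;
}

/* Writes go through one function too; the 32-bit form truncates. */
void
sketch_setavail(struct sketch_lfs *fs, uint64_t val)
{
	assert(fs != NULL);
	if (fs->is64)
		fs->sb.u64.avail = val;
	else
		fs->sb.u32.avail = (uint32_t)val;
}
```

Because no caller touches a structure member directly, the choice between the two layouts stays confined to the accessors.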

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.128 27-Nov-2013  christos branches: 1.128.6;
Change the queue.3 *_END(&head) macros to NULL. Since we don't have CIRCLEQ
anymore, all the macros expand to NULL anyway, so this improves readability.
Requested by rmind@
 1.127 23-Nov-2013  christos change the mountlist CIRCLEQ into a TAILQ
 1.126 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.125 18-Jun-2013  christos branches: 1.125.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.124 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.123 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.122 16-Feb-2012  perseant branches: 1.122.2;
Pass t_renamerace and t_rmdirrace tests.

Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.

Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.

Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace creates a lot of inode blocks on a tiny empty disk).
 1.121 02-Jan-2012  perseant branches: 1.121.2;

* Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.120 11-Jul-2011  hannken branches: 1.120.2; 1.120.6;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.119 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.118 24-Jun-2010  hannken branches: 1.118.6;
Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.117 16-Feb-2010  mlelstv branches: 1.117.2;
Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.116 08-Jan-2010  pooka branches: 1.116.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.115 07-Dec-2009  eeh Fix some more hangs and deadlocks.
 1.114 06-May-2008  ad branches: 1.114.18;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.113 30-Apr-2008  ad PR kern/38135 vfs_busy/vfs_trybusy confusion

The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
 1.112 29-Apr-2008  ad kern/38135 vfs_busy/vfs_trybusy confusion

The symptom was that sometimes file systems would occasionally not appear
in output from 'df' or 'mount' if the system was busy. Resolution:

- Make mount locks work somewhat like vm_map locks.
- vfs_trybusy() now only fails if the mount is gone, or if someone is
unmounting the file system. Simple contention on mnt_lock doesn't
cause it to fail.
- vfs_busy() will wait even if the file system is being unmounted.
 1.111 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.110 20-Feb-2008  matt branches: 1.110.6; 1.110.8; 1.110.10;
Merge all the *different* definitions of bufqueues into one common one.
 1.109 15-Feb-2008  ad The buffer LOCKED flag need not be under the protection of bufcache_lock,
BUSY is enough.
 1.108 30-Jan-2008  ad PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
 1.107 02-Jan-2008  ad Merge vmlocking2 to head.
 1.106 11-Oct-2007  ad branches: 1.106.4; 1.106.6; 1.106.10;
Remove LOCK_ASSERT(!simple_lock_held(&foo));
 1.105 10-Oct-2007  ad Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.104 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.103 29-Jul-2007  ad branches: 1.103.4; 1.103.6; 1.103.8; 1.103.10;
It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.102 17-Jul-2007  christos branches: 1.102.2;
eliminate MFSNAMELEN
 1.101 16-May-2007  perseant Change references to SEGM_W_DIROPS to SEGM_CKP, and replace the logic that
formerly used SEGM_W_DIROPS in lfs_segwrite() appropriately. This prevents
a problem in which processes could get stuck in "buffers" sleep forever.
 1.100 18-Apr-2007  perseant Add/change a couple of comments about locking restrictions.
 1.99 17-Apr-2007  perseant Install a new sysctl, vfs.lfs.ignore_lazy_sync, which causes LFS to ignore
the "smooth" syncer, as if vfs.sync.*delay = 0, but only for LFS. The
default is "on", i.e., ignore lazy sync.

Reduce the amount of polling/busy-waiting done by lfs_putpages(). To
accomplish this, copied genfs_putpages() and modified it to indicate which
page it was that caused it to return with EDEADLK. fsync()/fdatasync()
should no longer ever fail with EAGAIN, and should not consume huge
quantities of cpu.

Also, try to make dirops less likely to be written as the result of a
VOP_PUTPAGES(), while ensuring that they are written regularly.
 1.98 16-Nov-2006  christos branches: 1.98.2; 1.98.4; 1.98.8; 1.98.10; 1.98.16;
__unused removal on arguments; approved by core.
 1.97 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.96 04-Oct-2006  christos fix empty if
 1.95 15-Sep-2006  yamt branches: 1.95.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.94 29-Jun-2006  perseant branches: 1.94.4;
Don't wake up the cleaner if the filesystem is unwrappable, and fix the
compatibility fcntls.

Also includes one-line fixes for an MP locking bug and a zero-length FINFO
problem that manifested during testing.
 1.93 14-May-2006  elad branches: 1.93.4;
integrate kauth.
 1.92 04-May-2006  perseant Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.

Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.91 13-Apr-2006  perseant Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.

Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.90 05-Mar-2006  christos branches: 1.90.2; 1.90.4;
cleanup more SET/CLR/ISSET lossage
 1.89 06-Jan-2006  yamt branches: 1.89.2; 1.89.4; 1.89.6;
initialize necessary members of struct buf. PR/32462 from Reinoud Zandijk.
 1.88 04-Jan-2006  yamt - add simple functions to allocate/free a buffer for i/o.
- make bufpool static.
 1.87 11-Dec-2005  christos branches: 1.87.2;
merge ktrace-lwp.
 1.86 29-May-2005  christos branches: 1.86.2;
- sprinkle const
- avoid shadow variables.
 1.85 23-Apr-2005  perseant Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing. Tested in both
directions, and everything appears to work happily, but ymmv.
 1.84 19-Apr-2005  perseant Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.83 06-Apr-2005  perseant Fix some locking issues that appeared with the simple_lock work.
Address a "pager_map" deadlock in lfs_putpages().
 1.82 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.81 09-Mar-2005  perseant branches: 1.81.2;
Be more careful about handling of flags to lfs_flush, to ensure that
the lfs_writing mutex is respected.
 1.80 08-Mar-2005  perseant Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.79 26-Feb-2005  perry nuke trailing whitespace
 1.78 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
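The clamping of the reported free-block count described in the list above can be sketched as follows. This is only an illustration under assumed types: the function name and the use of int64_t are hypothetical, and the real lfs_statvfs code may differ. The motivation is that a negative count reinterpreted as an unsigned quantity looks like an enormous amount of free space.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp a signed free-block estimate into the range [0, dsize]
 * before it is reported to userland.
 */
int64_t
sketch_report_bfree(int64_t bfree, int64_t dsize)
{
	assert(dsize >= 0);
	if (bfree < 0)
		return 0;		/* never report a negative count */
	if (bfree > dsize)
		return dsize;		/* never report more than the disk holds */
	return bfree;
}
```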
 1.77 28-Jan-2004  yamt branches: 1.77.6; 1.77.8; 1.77.10;
use bufmem instead of bufpages to make lfs a little less broken.
 1.76 04-Dec-2003  yamt use b_private rather than b_saveaddr.
XXX LFS_USE_B_INVAL
 1.75 03-Oct-2003  yamt assertions.
 1.74 23-Sep-2003  yamt remove unnecessary externs of lfs_do_flush.
 1.73 07-Sep-2003  yamt - raise spl to bio in lfs_countlocked() rather than having callers do so.
- buffer cache MP locks.
- assert B_CALL buffers are not on the free queue.
 1.72 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.71 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.70 02-Jul-2003  yamt a comment.
 1.69 02-Jul-2003  yamt use queue.h macros.
 1.68 02-Jul-2003  yamt use VFSTOUFS macro.
 1.67 02-Jul-2003  yamt - add new functions, lfs_writer_enter/leave, and use them instead of
duplicated code fragments.
- add an assertion.
 1.66 27-Apr-2003  perseant branches: 1.66.2;
Don't change update time on block write; lets e.g. "tar xp" work properly.
 1.65 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.64 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.63 02-Mar-2003  perseant Account SEGUSE_ACTIVE correctly so that the automatic segment cleaning
actually happens.

Add a new fcntl call that will write the minimum necessary to checkpoint
(i.e., for on-disk directory structure to be consistent, not including
updates to file data) so that the cleaner can clean segments more quickly
without sacrificing three-way commit for cleaning.
 1.62 25-Feb-2003  thorpej Add a new BUF_INIT() macro which initializes b_dep and b_interlock, and
use it. This fixes a few places where either b_dep or b_interlock were
not properly initialized.
 1.61 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.60 19-Feb-2003  yamt workaround for "another flush is..." infinite loop in writerd.
if we're writerd, sleep in lfs_flush until another writer goes away
instead of busy-looping in writerd.
 1.59 19-Feb-2003  yamt init b_interlock.
 1.58 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
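The two-checkpoint rule in the last item above can be sketched as follows. The flag names, the struct layout, and the function are hypothetical; the sketch only shows the bookkeeping the text describes, where a segment seen dirty and empty at two consecutive checkpoints is reclaimed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_SEG_DIRTY	0x01u	/* segment has been written to */
#define SK_SEG_EMPTY	0x02u	/* was dirty and empty at the previous checkpoint */

/* Hypothetical per-segment usage record. */
struct sketch_seguse {
	uint32_t nbytes;	/* live bytes remaining in the segment */
	uint32_t flags;
};

/* Run the rule at one checkpoint; returns how many segments were cleaned. */
size_t
sketch_checkpoint(struct sketch_seguse *segs, size_t nsegs)
{
	size_t i, cleaned = 0;

	assert(segs != NULL || nsegs == 0);
	for (i = 0; i < nsegs; i++) {
		struct sketch_seguse *sup = &segs[i];

		if ((sup->flags & SK_SEG_DIRTY) && sup->nbytes == 0) {
			if (sup->flags & SK_SEG_EMPTY) {
				/* Second checkpoint in a row: reclaim it. */
				sup->flags &= ~(SK_SEG_DIRTY | SK_SEG_EMPTY);
				cleaned++;
			} else {
				/* First sighting: remember it for next time. */
				sup->flags |= SK_SEG_EMPTY;
			}
		} else {
			/* Not dirty and empty: reset the marker. */
			sup->flags &= ~SK_SEG_EMPTY;
		}
	}
	return cleaned;
}
```

Calling the function twice on a segment that stays dirty with no live bytes marks it on the first call and cleans it on the second.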
 1.57 05-Feb-2003  pk Make the buffer cache code MP-safe.
 1.56 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
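A minimal sketch of what keeping a 32-bit on-disk field alongside a 64-bit in-memory address implies for conversions. The type and function names are hypothetical. The point shown is that widening must go through the signed 32-bit type so that negative values (for example a sentinel) keep their meaning, and that narrowing needs a range check.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t sketch_daddr_t;	/* hypothetical 64-bit in-memory address */

/* Widen an on-disk value; sign extension keeps negative values negative. */
sketch_daddr_t
sketch_from_disk(int32_t ondisk)
{
	return (sketch_daddr_t)ondisk;
}

/* Mistaken widening through an unsigned type: -2 becomes 4294967294. */
sketch_daddr_t
sketch_from_disk_mangled(int32_t ondisk)
{
	return (sketch_daddr_t)(uint32_t)ondisk;
}

/* Narrow for writing to disk; the value must fit the on-disk field. */
int32_t
sketch_to_disk(sketch_daddr_t daddr)
{
	assert(daddr >= INT32_MIN && daddr <= INT32_MAX);
	return (int32_t)daddr;
}
```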
 1.55 30-Dec-2002  yamt comment and assertions
 1.54 30-Dec-2002  yamt move check of lfs_unlockvp from lfs_reserveavail to lfs_reserve
because lfs_reservebuf needs same check as well.
 1.53 29-Dec-2002  yamt fix vref/vunref mismatch.
 1.52 28-Dec-2002  yamt - in lfs_reserve, vref vnodes that we're locking so that cleaner doesn't
try to reclaim them.
(workaround for deadlock noted in the comment in lfs_reserveavail)
- in lfs_rename, mark vnodes which are being moved as well as directory vnodes.
 1.51 26-Dec-2002  yamt - in lfs_reserve, reserve locked buffer count as well.
- don't wait for locking buf in lfs_bwrite_ext to avoid deadlocks.
- skip lfs_reserve when we're doing dirop.
reserve more (for lfs_truncate) in set_dirop instead.

this mostly solves PR 18972. (and hopefully PR 19196)
 1.50 22-Dec-2002  yamt add a XXX comment. (description of possible deadlock)
 1.49 17-Dec-2002  yamt #if 0 out vnode unlock/lock in lfs_reserve for now and add a comment about it.
deadlock is better than corruption (or panic), IMO.
 1.48 14-Dec-2002  yamt - in lfs_bwrite_ext, if we're cleaner,
mark inode IN_CLEANING rather than IN_MODIFIED.
otherwise cleaned (indirect) blocks belonging to the inode aren't written
until next sync.
- add assertions.
 1.47 27-Nov-2002  yamt more XXX comment.
 1.46 24-Nov-2002  yamt add a XXX comment to lfs_reserve.
* it isn't safe to unlock vp here
* because we're passing data using inode from namei.
* (eg. i_offset)
 1.45 24-Nov-2002  yamt lfs_reserve shouldn't block for lfs_unlockvp.
otherwise cleaner deadlocks.
PR 19134.
 1.44 20-Jun-2002  perseant Fix miscalculation in lfs_fits found by Trevin Beattie <trevin@xmission.com>.
Change some of the variable names from "nb", "db" to "fsb" to reflect their
calling conventions.
 1.43 14-May-2002  perseant branches: 1.43.2;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.42 12-May-2002  matt Eliminate commons.
 1.41 11-Feb-2002  perseant Include the space taken by inodes in the count made by lfs_check();
make VOP_SETATTR call lfs_check. This prevents large numbers of inode
changes (say, at the end of tar(1)) from filling the buffer cache.
 1.40 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.39 08-Nov-2001  lukem add RCSID
 1.38 06-Nov-2001  simonb Remove some variables that are set but never used.
 1.37 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.36 13-Jul-2001  perseant branches: 1.36.4;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.35 03-Dec-2000  perseant branches: 1.35.2; 1.35.4; 1.35.6;
Fix typo in 'malloc' for non-MALLOCLOG case
 1.34 03-Dec-2000  perseant Get rid of some old unnecessary code that cleared B_NEEDCOMMIT from buffers in
lfs_writeseg (possibly after they had been freed).

If MALLOCLOG is defined, make lfs_newbuf and lfs_freebuf pass along the
caller's file and line to _malloc and _free.
 1.33 27-Nov-2000  perseant If LFS_DO_ROLLFORWARD is defined, roll forward from the older checkpoint
on mount, through the newer checkpoint and on through any newer
partial-segments that may have been written but not checkpointed because
of an intervening crash.

LFS_DO_ROLLFORWARD is not defined by default.
 1.32 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
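
The counting rule described above can be sketched as follows. This is a
minimal illustration, not the kernel's lfs_countlocked: struct sketch_buf and
its is_malloced/is_locked fields are hypothetical stand-ins for the real
buffer header and its flags. Only the choice of b_bufsize over b_bcount and
the exclusion of malloced buffers come from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct buf; illustration only. */
struct sketch_buf {
	long b_bcount;    /* bytes of valid data in the buffer */
	long b_bufsize;   /* bytes of buffer memory actually held */
	int  is_malloced; /* nonzero if allocated outside the buffer cache */
	int  is_locked;   /* nonzero if held waiting for LFS to flush it */
};

struct sketch_counts {
	long count;       /* locked buffer headers */
	long bytes;       /* locked buffer bytes */
};

/*
 * Count only buffers that are really unavailable to getnewbuf():
 * locked, not malloced, and measured by b_bufsize rather than b_bcount.
 */
static struct sketch_counts
sketch_countlocked(const struct sketch_buf *bufs, size_t n)
{
	struct sketch_counts c = { 0, 0 };

	assert(bufs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!bufs[i].is_locked || bufs[i].is_malloced)
			continue;
		c.count++;
		c.bytes += bufs[i].b_bufsize;
	}
	return c;
}
```

A DEBUG-style check would compare the result of such a recount against the
running counters and complain on any mismatch.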
 1.31 12-Nov-2000  perseant Do not needlessly dirty segment table blocks during lfs_segwrite,
preventing needless disk activity when the filesystem is idle. (PR #10979.)
 1.30 13-Sep-2000  perseant Cast back to int32_t in LFS_EST_BFREE and LFS_EST_RSVD macros, for
consistency with their arguments.

Change the debugging printf in lfs_reserve to match, and enclose it in
#ifdef DEBUG.

Tested on alpha, arm32, sparc.
 1.29 12-Sep-2000  perseant Make this file compile on the alpha as well (use %ld and cast to long,
instead of %qd with no cast).
 1.28 10-Sep-2000  augustss Make this file compile again.
 1.27 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references to DEV_BSIZE in favor of
btodb()/dbtob().
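
The reservation accounting in the first kernel item above might look roughly
like this. It is a simplified, single-threaded sketch: struct sketch_fs and
both function names are hypothetical, and where the real lfs_reserve would
unlock the inode and sleep until the cleaner frees space, this version simply
reports failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical in-core state; field names are illustrative only. */
struct sketch_fs {
	long avail;  /* blocks currently available for writing */
	long ravail; /* blocks reserved by writes still in progress */
};

/*
 * Try to reserve nblocks ahead of a write. Returns false in the case
 * where the real code would sleep and wait for the cleaner.
 */
static bool
sketch_reserve(struct sketch_fs *fs, long nblocks)
{
	assert(nblocks >= 0);
	if (fs->avail - fs->ravail < nblocks)
		return false;
	fs->ravail += nblocks;
	return true;
}

/*
 * After the write has allocated `used` of its `reserved` blocks,
 * drop the reservation and charge only what was really allocated.
 */
static void
sketch_release(struct sketch_fs *fs, long reserved, long used)
{
	assert(used <= reserved);
	fs->ravail -= reserved;
	fs->avail -= used;
}
```

Reserving the maximum up front and charging the actual amount afterwards is
what lets a large write proceed without starving the cleaner of space.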

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments through N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this is unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.26 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero at fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.25 03-Jul-2000  perseant Allow the number of free segments reserved for the cleaner to be
parametrized in the filesystem, defaulting to MIN_FREE_SEGS = 2 but set
to something more reasonable at newfs_lfs time.

Note the number of blocks that have been scheduled for writing but which
are not yet on disk in an inode extension, i_lfs_effnblks. Move
i_ffs_effnlink out of the ffs extension and onto the main inode, since
it's used all over the shared code and the lfs extension would clobber
it.

At inode write time, indirect blocks and inode-held blocks of inodes
that have i_lfs_effnblks != i_ffs_blocks are cleansed of UNWRITTEN disk
addresses, so that these never make it to disk.
 1.24 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.23 06-Jun-2000  perseant branches: 1.23.2;
Protect inode free list with seglock, instead of separate lock, so that
the head of the inode free list (on the superblock) always matches the
rest of the free list (in the ifile).

Protect lfs_fragextend with seglock, to prevent the segment byte count
fudging from making its way to disk.

Don't try to inactivate dirop vnodes that are still in the middle of
their dirop (may address PR#10285).
 1.22 31-May-2000  fredb Make this build. (Balance parenthesis.
 1.21 31-May-2000  perseant update for IN_ACCESSED changes
 1.20 27-May-2000  perseant branches: 1.20.2;
Prevent dirops from getting around lfs_check and wedging the buffer cache.
All the dirop vnops now mark the inodes with a new flag, IN_ADIROP, which
is removed as soon as the dirop is done (as opposed to VDIROP which stays
until the file is written). To address one issue raised in PR#9357.
 1.19 19-May-2000  thorpej NULL != 0
 1.18 05-May-2000  perseant Change the way LFS does block accounting, from trying to infer from the
buffer cache flags, to marking the inode and/or indirect blocks with a
special disk address UNWRITTEN==-2 when a block is accounted for. (This
address is never written to disk, but only used in-core. This is essentially
the same method of block accounting as on the UBC branch, where the buffer
headers don't exist.) Make sure that truncation is handled properly,
especially in the case of holey files.

Fixes PR#9994.
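
The marking scheme described above can be sketched like this. The value -2
comes from the message; the function names are hypothetical, and treating 0
as "no block" (a hole) when cleansing is an assumption made for this
illustration rather than something the message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-core-only marker; the value -2 is taken from the log message. */
#define SKETCH_UNWRITTEN ((int32_t)-2)

/*
 * Mark a block pointer as accounted for but not yet placed on disk.
 * Returns 1 if the block was newly accounted, 0 if it already had an
 * address or marker. (Assumes 0 means "no block", i.e. a hole.)
 */
static int
sketch_account_block(int32_t *addr)
{
	assert(addr != NULL);
	if (*addr != 0)
		return 0;
	*addr = SKETCH_UNWRITTEN;
	return 1;
}

/*
 * Before an array of block pointers is written out, strip any
 * remaining markers so they never reach the disk.
 */
static size_t
sketch_cleanse(int32_t *addrs, size_t n)
{
	size_t cleansed = 0;

	for (size_t i = 0; i < n; i++) {
		if (addrs[i] == SKETCH_UNWRITTEN) {
			addrs[i] = 0;
			cleansed++;
		}
	}
	return cleansed;
}
```

Because the marker lives in the block pointer itself, the accounting no
longer depends on buffer headers existing for the block.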
 1.17 30-Mar-2000  augustss Remove register declarations.
 1.16 15-Dec-1999  perseant In lfs_bwrite, don't mark buffers dirty if lfs is mounted read-only.
(Previously buffers could be marked dirty by the cleaner, and possibly by
other means.)

Also check for softdep mount in vfs_shutdown before trying to bawrite
buffers, since other filesystems don't need it and lfs doesn't bawrite.
(This fragment reviewed by fvdl.)

Partially addresses PR#8964.
 1.15 04-Dec-1999  ragge CL* discarding.
 1.14 23-Nov-1999  fvdl Be more careful to block bio interrupts for some data structures. There
were at least a few missed cases where vp->v_{clean,dirty}blkhd were
unprotected since the softdep/trickle sync merge.
 1.13 06-Nov-1999  perseant branches: 1.13.2;
Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.12 21-Oct-1999  perseant Under degenerate access patterns (e.g. `bonnie' benchmark) lfs_check could
fail, because the particular block being requested was always in the cache
(although other routines that cannot afford to call lfs_check have in the
meantime stuffed the cache full of dirty blocks). Partially addresses PR 8383.
 1.11 01-Jun-1999  perseant branches: 1.11.2; 1.11.4; 1.11.6;
Fixed lfs_update (and related functions) so that calls from lfs_fsync
will DTRT with vnodes marked VDIROP. In particular, the message
"flushing VDIROP" will no longer appear, and the filesystem will remain
stable in the event of a crash.

This was particularly a problem with NFS-exported LFSes, since fsync
was called on every file close.
 1.10 12-Apr-1999  perseant Disallow threshold-initiated cache flush when dirops are active. Also, make
SET_ENDOP use lfs_check instead of inlining most of it.
 1.9 25-Mar-1999  perseant branches: 1.9.2;
Fixes to make dirops and lfs_vflush play together well. In particular,
if we are short on vnodes, lfs_vflush from another process can grab a
vnode that lfs_markv has already processed but not yet written; but
lfs_markv holds the seglock. When lfs_vflush gets around to writing it,
the context for copyin is gone. So, now lfs_markv calls copyin itself,
rather than having lfs_writeseg do it.
 1.8 25-Mar-1999  perseant clean up unused/required #ifdefs
 1.7 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd says they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.6 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.5 09-Feb-1996  christos lfs prototypes
 1.4 18-Jun-1995  cgd don't assume the f_fsnamelen is nul-truncated or longer than MFSNAMELEN
 1.3 18-Jan-1995  mycroft Turn mountlist into a CIRCLEQ, and handle setting and checking of MNT_ROOTFS
differently.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.9.2.4 17-Dec-1999  he Pull up revision 1.13 (requested by perseant):
Address locking protocol error for inode hash, and make the
maximum number of active dirops a global quantity.
 1.9.2.3 17-Dec-1999  he Pull up revision 1.11 (requested by perseant):
Avoid flushing vnodes involved in a dirop, making lfs' promise
of "no fsck needed, even in the event of a crash" closer to
reality.
 1.9.2.2 26-Oct-1999  he Pull up revision 1.12 (requested by perseant):
Fix LFS buffer starvation under degenerate access patterns.
 1.9.2.1 13-Apr-1999  perseant branches: 1.9.2.1.2;
Pull-up of changes made to the trunk on Sunday [1.9->1.10], to wit:

Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.

Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).

Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed, then
immediately reloaded, their dinodes were located in an inode block which
was not on disk at the advertized location, nor in the cache (although it
would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.

Better checking for held inode locks in lfs_fastvget, for a number of
error conditions. Also change the default setting of lfs_clean_vnhead to
0, which seems to make the locking problems go away (although this is
difficult to test as I can't reliably reproduce them).

Make sure that the wakeup occurs for vnodes that lfs_update might be
sleeping on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which
have dirty buffers), by marking them with the appropriate flag if
dirtybuffers were added while the write was in progress.

Fix block counting during file truncation, if not truncating to zero.

Disallow threshold-initiated cache flush when dirops are active. Also,
make SET_ENDOP use lfs_check instead of inlining most of it.

Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).

Check the superblock version field, and refuse to mount the filesystem if
the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and
boundaries, etc., which would not affect existing fields in the
superblock, but which would drastically affect the filesystem, to be
smoothly integrated at a later date.
 1.9.2.1.2.2 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.9.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.11.6.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.11.6.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.11.4.1 15-Nov-1999  fvdl Sync with -current
 1.11.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.11.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.11.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.13.2.2 06-Nov-1999  perseant Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.13.2.1 06-Nov-1999  perseant file lfs_bio.c was added on branch comdex-fall-1999 on 1999-11-06 20:33:06 +0000
 1.20.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.23.2.2 03-Feb-2001  he Pull up revisions 1.31-1.32 (requested by perseant):
o Don't write anything if the filesystem is idle (PR#10979).
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
 1.23.2.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.35.6.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.35.6.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.35.6.3 16-Mar-2002  jdolecek Catch up with -current.
 1.35.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.35.6.1 03-Aug-2001  lukem update to -current
 1.35.4.4 13-Jul-2001  perseant Be more careful about when we update ctime/mtime. In particular, if we
are only writing indirect blocks, that doesn't count for mtime; and when
we first create a vnode, that certainly *does not* count for ctime
(a bug that's been there from the beginning).

This does not change the fact that mtime might still be set after write(2)
is "completed", but it does make the atime-in-the-ifile code have some
effect (noticeably less degradation of read time after an intervening
large write).
 1.35.4.3 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.35.4.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.35.4.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.35.2.13 03-Jan-2003  thorpej Sync with HEAD.
 1.35.2.12 29-Dec-2002  thorpej Sync with HEAD.
 1.35.2.11 19-Dec-2002  thorpej Sync with HEAD.
 1.35.2.10 11-Dec-2002  thorpej Sync with HEAD.
 1.35.2.9 01-Aug-2002  nathanw Catch up to -current.
 1.35.2.8 15-Jul-2002  nathanw Whitespace.
 1.35.2.7 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
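
A user-space sketch of the macro described above, written with balanced
parentheses. The sketch_ names and the two tiny structs are hypothetical
stand-ins for the kernel's curlwp, struct lwp, and struct proc; only the
shape of the conditional expression comes from the message.

```c
#include <stddef.h>

/* Hypothetical minimal stand-ins for the kernel structures. */
struct sketch_proc { int p_pid; };
struct sketch_lwp  { struct sketch_proc *l_proc; };

/* NULL when no LWP is running, as can happen early in boot. */
static struct sketch_lwp *sketch_curlwp;

/* Referencing this is always safe; dereferencing the result is not. */
#define sketch_curproc \
	((sketch_curlwp) ? (sketch_curlwp)->l_proc : NULL)

/* A caller must still check for NULL before dereferencing. */
static int
sketch_current_pid(void)
{
	struct sketch_proc *p = sketch_curproc;

	return p != NULL ? p->p_pid : -1;
}
```

The macro only guards the load of l_proc; callers that dereference the
result unconditionally are no safer than before.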
 1.35.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.35.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.35.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.35.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.35.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.35.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.36.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.43.2.1 15-Jul-2002  gehenna catch up with -current.
 1.66.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.66.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.66.2.5 08-Mar-2005  skrll Sync with HEAD.
 1.66.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.66.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.66.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.66.2.1 03-Aug-2004  skrll Sync with HEAD
 1.77.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.77.8.1 29-Apr-2005  kent sync with -current
 1.77.6.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.81.2.5 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.81.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.92
sys/ufs/lfs/lfs.h: revision 1.105
sys/ufs/lfs/lfs_vfsops.c: revision 1.207
sys/ufs/lfs/lfs_subr.c: revision 1.59
sys/ufs/lfs/lfs_vnops.c: revision 1.173
sys/ufs/lfs/lfs_bio.c: revision 1.92
Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.
Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.81.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.102
sys/ufs/lfs/lfs_segment.c: revision 1.173
sys/ufs/lfs/lfs_vnops.c: revision 1.167 via patch
sys/ufs/lfs/lfs_bio.c: revision 1.91
Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.
Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.81.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.81.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.86.2.7 27-Feb-2008  yamt sync with head.
 1.86.2.6 04-Feb-2008  yamt sync with head.
 1.86.2.5 21-Jan-2008  yamt sync with head
 1.86.2.4 27-Oct-2007  yamt sync with head.
 1.86.2.3 03-Sep-2007  yamt sync with head.
 1.86.2.2 30-Dec-2006  yamt sync with head.
 1.86.2.1 21-Jun-2006  yamt sync with head.
 1.87.2.1 15-Jan-2006  yamt sync with head.
 1.89.6.4 11-Aug-2006  yamt sync with head
 1.89.6.3 24-May-2006  yamt sync with head.
 1.89.6.2 13-Mar-2006  yamt sync with head.
 1.89.6.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.89.4.2 01-Jun-2006  kardel Sync with head.
 1.89.4.1 22-Apr-2006  simonb Sync with head.
 1.89.2.1 09-Sep-2006  rpaulo sync with head
 1.90.4.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.90.2.3 11-May-2006  elad sync with head
 1.90.2.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.90.2.1 19-Apr-2006  elad sync with head.
 1.93.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.94.4.1 18-Nov-2006  ad Sync with head.
 1.95.2.2 10-Dec-2006  yamt sync with head.
 1.95.2.1 22-Oct-2006  yamt sync with head
 1.98.16.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.98.10.1 11-Jul-2007  mjf Sync with head.
 1.98.8.7 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.98.8.6 20-Aug-2007  ad Sync with HEAD.
 1.98.8.5 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.98.8.4 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.98.8.3 08-Jun-2007  ad Sync with head.
 1.98.8.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.98.8.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.98.4.2 17-May-2007  yamt sync with head.
 1.98.4.1 07-May-2007  yamt sync with head.
 1.98.2.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.102.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.103.10.2 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.103.10.1 29-Jul-2007  ad file lfs_bio.c was added on branch matt-mips64 on 2007-07-29 13:31:15 +0000
 1.103.8.1 14-Oct-2007  yamt sync with head.
 1.103.6.3 23-Mar-2008  matt sync with HEAD
 1.103.6.2 09-Jan-2008  matt sync with HEAD
 1.103.6.1 06-Nov-2007  matt sync with HEAD
 1.103.4.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.106.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.106.6.5 19-Dec-2007  ad Use a global lfs_lock.
 1.106.6.4 19-Dec-2007  ad Fix some more problems w/lfs on this branch.
 1.106.6.3 19-Dec-2007  ad Get lfs mostly working.
 1.106.6.2 08-Dec-2007  ad Minor locking fixes.
 1.106.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.106.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.110.10.3 11-Aug-2010  yamt sync with head.
 1.110.10.2 11-Mar-2010  yamt sync with head
 1.110.10.1 16-May-2008  yamt sync with head.
 1.110.8.1 18-May-2008  yamt sync with head.
 1.110.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.114.18.1 19-Dec-2013  matt Adapt to new uvm_estimatepageable arguments
 1.116.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.116.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.117.2.2 03-Jul-2010  rmind sync with head
 1.117.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.118.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.120.6.1 18-Feb-2012  mrg merge to -current.
 1.120.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.120.2.1 17-Apr-2012  yamt sync with head
 1.121.2.1 17-Mar-2012  bouyer Pull up following revision(s) (requested by perseant in ticket #116):
sys/ufs/lfs/lfs_alloc.c: revision 1.112
tests/fs/vfs/t_rmdirrace.c: revision 1.9
tests/fs/vfs/t_renamerace.c: revision 1.25
sys/ufs/lfs/lfs_vnops.c: revision 1.240
sys/ufs/lfs/lfs_segment.c: revision 1.224
sys/ufs/lfs/lfs_bio.c: revision 1.122
sys/ufs/lfs/lfs_vfsops.c: revision 1.294
sbin/newfs_lfs/make_lfs.c: revision 1.19
sys/ufs/lfs/lfs.h: revision 1.136
Pass t_renamerace and t_rmdirrace tests.
Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.
Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.
Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace creates a lot of inode blocks on a tiny empty disk).
 1.122.2.3 03-Dec-2017  jdolecek update from HEAD
 1.122.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.122.2.1 23-Jun-2013  tls resync from head
 1.125.2.2 18-May-2014  rmind sync with head
 1.125.2.1 28-Aug-2013  rmind sync with head
 1.128.6.3 28-Aug-2017  skrll Sync with HEAD
 1.128.6.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.128.6.1 22-Sep-2015  skrll Sync with HEAD
 1.135.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.135.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.135.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.139.4.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some questions about the code in an XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.141.4.1 25-Jun-2018  pgoyette Sync with HEAD
 1.142.6.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.
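
A minimal sketch of the marker-node idea, using the TAILQ macros from <sys/queue.h>. The struct and function names are hypothetical and the lock is only indicated by comments; this is not the lfs_dchainhd code itself. The point is that the walk resumes from the marker, so a saved "next" pointer is never dereferenced after the lock has been dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

struct node {
	TAILQ_ENTRY(node) link;
	bool is_marker;		/* true for an iteration placeholder */
	int value;
};
TAILQ_HEAD(nodelist, node);

/* Example callback: unlink the real node that follows the marker, if any. */
static void
remove_successor(struct nodelist *head, struct node *np)
{
	struct node *mk = TAILQ_NEXT(np, link);		/* the marker */
	struct node *victim = (mk != NULL) ? TAILQ_NEXT(mk, link) : NULL;

	if (victim != NULL)
		TAILQ_REMOVE(head, victim, link);
}

/* Visit every non-marker node; returns how many were visited. */
static int
walk_with_marker(struct nodelist *head,
    void (*unlocked_work)(struct nodelist *, struct node *))
{
	struct node marker = { .is_marker = true };
	struct node *np;
	int visited = 0;

	assert(head != NULL);
	np = TAILQ_FIRST(head);
	while (np != NULL) {
		if (np->is_marker) {	/* another walker's marker: skip */
			np = TAILQ_NEXT(np, link);
			continue;
		}
		TAILQ_INSERT_AFTER(head, np, &marker, link);
		/* the list lock would be dropped here */
		if (unlocked_work != NULL)
			unlocked_work(head, np);
		visited++;
		/* the list lock would be retaken here */
		np = TAILQ_NEXT(&marker, link);
		TAILQ_REMOVE(head, &marker, link);
	}
	return visited;
}
```

In the example callback the successor disappears while the "lock" is dropped, and the walk still continues safely from the marker.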

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
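
Resolution (b) can be sketched in userland as follows. This assumes a pthread mutex in place of lfs_lock and a hypothetical struct fs_state; it is not the kernel's lfs_writer_tryenter, only an illustration of doing the dirops test and the writer increment under the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-filesystem state named in the log. */
struct fs_state {
	pthread_mutex_t lock;	/* plays the role of lfs_lock */
	int dirops;		/* active directory operations */
	int writer;		/* writers that exclude new dirops */
};

/*
 * Become a writer only if no dirops are active. The check and the
 * increment happen under one lock hold, so the state cannot change
 * between them.
 */
static bool
writer_tryenter(struct fs_state *fs)
{
	bool entered = false;

	pthread_mutex_lock(&fs->lock);
	if (fs->dirops == 0) {
		fs->writer++;
		entered = true;
	}
	pthread_mutex_unlock(&fs->lock);
	return entered;
}

static void
writer_leave(struct fs_state *fs)
{
	pthread_mutex_lock(&fs->lock);
	assert(fs->writer > 0);
	fs->writer--;
	pthread_mutex_unlock(&fs->lock);
}
```

A caller that gets false simply skips the opportunistic work instead of sleeping while it holds the seglock.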

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.
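
The effect of the attributes can be checked with a small sketch. It assumes a GCC- or Clang-compatible compiler and that __aligned(4) __packed correspond to the attributes spelled out below; struct rec64 and rec64_ino_at are hypothetical names, not LFS structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk record with a 64-bit field at a 32-bit boundary. */
struct rec64 {
	uint32_t reclen;
	uint64_t ino;
} __attribute__((__aligned__(4), __packed__));

/* Compile-time layout checks, in the spirit of the CTASSERTs above. */
_Static_assert(sizeof(struct rec64) == 12, "rec64 on-disk size");
_Static_assert(_Alignof(struct rec64) == 4, "rec64 alignment");
_Static_assert(offsetof(struct rec64, ino) == 4, "ino offset");

/*
 * Read the inode number of a record at byte offset off in a block.
 * Copying into a local avoids any assumption about how blk is aligned.
 */
static uint64_t
rec64_ino_at(const unsigned char *blk, size_t off)
{
	struct rec64 r;

	assert(blk != NULL);
	memcpy(&r, blk + off, sizeof(r));
	return r.ino;
}
```

Without the attributes, many 64-bit ABIs would pad this struct to 16 bytes and assume 8-byte alignment, which is the mismatch the commit message describes.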

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.142.2.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.144.2.1 29-Feb-2020  ad Sync with head.
 1.30 02-Aug-2015  dholland Add a (draft) 64-bit superblock. Make things build again.

Add pieces of support for using both superblock types where
convenient, and specifically to the superblock accessors, but don't
actually enable it anywhere.

First substantive step on PR 50000.
 1.29 06-Jun-2013  dholland branches: 1.29.10;
Cleanups and hacks to make lfs userland stuff build:
- lfs_cksum.c doesn't actually need ulfs_inode.h any more.
- neither does lfs_itimes.c.
- add hacks to fsck_lfs to make it compile.
- add hacks to newfs_lfs to make it compile.
- fix warning in ulfs_quota.c when quotas are fully disabled
(as I guess is happening with the rumpity version)

XXX: This commit adds -I${NETBSDSRCDIR}/sys to the Makefiles for
XXX: fsck_lfs, newfs_lfs, and lfs_cleanerd. This needs to be cleaned
XXX: up ASAP; but I consider this less problematic in the short term
XXX: than spewing ulfs_*.h into /usr/include.
 1.28 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.27 28-Apr-2008  martin branches: 1.27.34; 1.27.44;
Remove clause 3 and 4 from TNF licenses
 1.26 11-Dec-2005  christos branches: 1.26.70; 1.26.72; 1.26.74;
merge ktrace-lwp.
 1.25 26-Feb-2005  perry nuke trailing whitespace
 1.24 09-Mar-2004  yamt branches: 1.24.8; 1.24.10;
calculate data checksum inline.
 1.23 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.22 20-Feb-2003  perseant branches: 1.22.2;
Tabify, and fix some comment alignment problems.
 1.21 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
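
The two-checkpoint rule in the last item can be sketched like this. The flag names, struct seg, and checkpoint_reclaim are hypothetical; the sketch only illustrates "dirty and empty at two consecutive checkpoints means reclaimable" and is not the kernel's segment usage code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEG_DIRTY	0x1u	/* segment has been written to */
#define SEG_WASEMPTY	0x2u	/* had no live bytes at the previous checkpoint */

struct seg {
	uint32_t flags;
	uint32_t live_bytes;
};

/* Run at each checkpoint; returns the number of segments marked clean. */
static size_t
checkpoint_reclaim(struct seg *segs, size_t nsegs)
{
	size_t i, cleaned = 0;

	assert(segs != NULL || nsegs == 0);
	for (i = 0; i < nsegs; i++) {
		struct seg *sp = &segs[i];

		if ((sp->flags & SEG_DIRTY) == 0 || sp->live_bytes != 0) {
			sp->flags &= ~SEG_WASEMPTY;	/* not a candidate */
			continue;
		}
		if (sp->flags & SEG_WASEMPTY) {
			sp->flags = 0;		/* second checkpoint: clean it */
			cleaned++;
		} else {
			sp->flags |= SEG_WASEMPTY;	/* remember for next time */
		}
	}
	return cleaned;
}
```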
 1.20 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.19 15-Nov-2001  lukem branches: 1.19.8; 1.19.10;
don't need <sys/types.h> when including <sys/param.h>
 1.18 08-Nov-2001  lukem add RCSID
 1.17 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.16 13-Jul-2001  perseant branches: 1.16.4;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length; it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.15 04-Feb-2001  christos branches: 1.15.2; 1.15.4; 1.15.6;
don't include lfs_extern.h; ufs/inode.h does too.
 1.14 25-Nov-2000  perseant Use u_int32_t instead of u_long to compute LFS checksums, since the
checksum is stored in a u_int32_t.
 1.13 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().
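
The reservation scheme in the first item above can be sketched as a pair of counters. struct space, space_try_reserve, and space_commit are hypothetical names, and sleeping is left to the caller; the real lfs_reserve also deals with inode locking, which is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct space {
	int64_t avail;	/* blocks free for new writes */
	int64_t ravail;	/* blocks reserved but not yet allocated */
};

/*
 * Reserve the maximum number of blocks a write may need. On failure the
 * caller would unlock its inode and sleep until the cleaner frees space.
 */
static bool
space_try_reserve(struct space *sp, int64_t nblks)
{
	if (sp->avail - sp->ravail < nblks)
		return false;
	sp->ravail += nblks;
	return true;
}

/* After allocation: drop the reservation and charge the blocks used. */
static void
space_commit(struct space *sp, int64_t reserved, int64_t used)
{
	assert(used <= reserved && reserved <= sp->ravail);
	sp->ravail -= reserved;
	sp->avail -= used;
}
```

Because a reservation is refused while earlier ones are outstanding, concurrent writers cannot jointly overcommit the space the cleaner needs.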

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this is unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.12 30-Mar-2000  augustss branches: 1.12.4;
Remove register declarations.
 1.11 25-Mar-1999  perseant branches: 1.11.8;
Change lfs_sb_cksum to use offsetof() instead of an inlined version.

Fix lfs_vref/lfs_vunref to ignore VXLOCKed vnodes that are also being
flushed.

Improve the debugging messages somewhat.
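
The offsetof() change can be illustrated with a toy superblock. struct sb, sum32, and sb_cksum are hypothetical, and the additive sum is a placeholder rather than the LFS checksum algorithm; the sketch only shows a checksum that covers every byte before the checksum field, with the length taken from offsetof() instead of hand-written pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sb {
	uint32_t magic;
	uint32_t version;
	uint32_t cksum;		/* covers every byte before this field */
	uint32_t pad;		/* not covered */
};

/* Placeholder checksum: 32-bit sum of the 32-bit words in the buffer. */
static uint32_t
sum32(const void *p, size_t len)
{
	const unsigned char *cp = p;
	uint32_t sum = 0, w;

	for (; len >= sizeof(w); len -= sizeof(w), cp += sizeof(w)) {
		memcpy(&w, cp, sizeof(w));
		sum += w;
	}
	return sum;
}

static uint32_t
sb_cksum(const struct sb *s)
{
	assert(s != NULL);
	return sum32(s, offsetof(struct sb, cksum));
}
```

Keeping the accumulator a fixed 32-bit type also matches the width of the stored field, which is the concern raised in revision 1.14.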
 1.10 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.9 11-Sep-1998  pk PR#6032: define fixed sized on-disk superblock structure.
 1.8 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.7 15-Sep-1997  lukem prototype lfs_cksum ifndef KERNEL
 1.6 16-Feb-1996  christos branches: 1.6.12;
Protect include in lfs_cksum.c so that it can be used by userland programs.
 1.5 09-Feb-1996  christos lfs prototypes
 1.4 14-Dec-1994  mycroft Sync with CSRG.
 1.3 20-Sep-1994  cgd c syntax
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.6.12.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.11.8.3 11-Feb-2001  bouyer Sync with HEAD.
 1.11.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.11.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.4.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.15.6.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.15.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.15.6.1 03-Aug-2001  lukem update to -current
 1.15.4.1 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.15.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.15.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.15.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.15.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.16.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.19.10.1 20-Jun-2002  lukem Pull up revision 1.20 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.19.8.1 20-Jun-2002  gehenna catch up with -current.
 1.22.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.22.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.22.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.22.2.1 03-Aug-2004  skrll Sync with HEAD
 1.24.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.24.8.1 29-Apr-2005  kent sync with -current
 1.26.74.1 16-May-2008  yamt sync with head.
 1.26.72.1 18-May-2008  yamt sync with head.
 1.26.70.1 02-Jun-2008  mjf Sync with HEAD.
 1.27.44.2 03-Dec-2017  jdolecek update from HEAD
 1.27.44.1 23-Jun-2013  tls resync from head
 1.27.34.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.29.10.1 22-Sep-2015  skrll Sync with HEAD
 1.55 23-Feb-2020  riastradh Just use VOP_BWRITE for lfs_bwrite_log.

Hope this doesn't cause trouble with vfs_suspend.
 1.54 01-Sep-2015  dholland branches: 1.54.18; 1.54.22; 1.54.24;
The ifile's inode number is constant. (it is always 1)

Therefore, storing the value in the superblock and reading it out
again is silly and offers the opportunity for it to become corrupted.
So, don't do that (most of the code already didn't) and use the
existing constant instead. Initialize new 32-bit superblocks with
the value for the sake of old userland programs, but don't keep the
value in the 64-bit superblock at all.

(approved by Margo Seltzer)
 1.53 01-Sep-2015  dholland Make the inode fields in the 64-bit superblock 64 bits wide.
Reasoning as before.

Note that I am not going through and checking for 64->32 truncations
in inode numbers; I'm sure there are quite a few, but that's a project
for later.
 1.52 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.51 12-Aug-2015  dholland Provide 32-bit and 64-bit versions of FINFO.

This also entailed sorting out part of struct segment, as that
contains a pointer into the current FINFO data.
 1.50 02-Aug-2015  dholland Add a (draft) 64-bit superblock. Make things build again.

Add pieces of support for using both superblock types where
convenient, and specifically to the superblock accessors, but don't
actually enable it anywhere.

First substantive step on PR 50000.
 1.49 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.48 02-Aug-2015  dholland Second batch of 64 -> 32 truncations in lfs, along with more minor
tidyups and corrections in passing.
 1.47 02-Aug-2015  dholland Fix assorted 64 -> 32 truncations in lfs. Also, some minor tidyups and
corrections in passing.
 1.46 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
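A minimal sketch of the pattern described above, in one file rather than a separate header. The structure `my_lfs_sketch` and the accessor names `sketch_getversion` and `sketch_setbsize` are hypothetical stand-ins, not the real contents of lfs_accessors.h; only the STRUCT_LFS name comes from the log message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "struct lfs-alike" that a userland tool might define. */
struct my_lfs_sketch {
	uint32_t version;
	uint32_t bsize;
};

/* The consumer names its structure before the accessors are emitted. */
#define STRUCT_LFS struct my_lfs_sketch

/* ---- what an accessor header could look like (illustrative) ---- */
#ifndef STRUCT_LFS
#define STRUCT_LFS struct lfs	/* default when the consumer sets nothing */
#endif

/* Accessors are written against STRUCT_LFS, not a fixed type. */
static inline uint32_t
sketch_getversion(const STRUCT_LFS *fs)
{
	return fs->version;
}

static inline void
sketch_setbsize(STRUCT_LFS *fs, uint32_t bsize)
{
	fs->bsize = bsize;
}
```

Because the accessors only require that the named structure has the fields they touch, a tool can reuse them with its own layout as long as the field names agree.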
 1.45 25-Jul-2015  hannken Another lfs superblock accessor (inside #ifdef 0).
 1.44 25-Jul-2015  martin Use accessors in DEBUG and DIAGNOSTIC code as well
 1.43 18-Jun-2013  christos branches: 1.43.10;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.42 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.41 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.40 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.39 17-Jul-2011  joerg branches: 1.39.2; 1.39.12;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.38 19-Jul-2009  dholland minor knf
 1.37 28-Apr-2008  martin branches: 1.37.14;
Remove clause 3 and 4 from TNF licenses
 1.36 02-Jan-2008  ad branches: 1.36.6; 1.36.8; 1.36.10;
Merge vmlocking2 to head.
 1.35 12-Dec-2007  lukem Move __KERNEL_RCSID() so that it's always available if this file is
compiled, even if DEBUG isn't defined.
(This matches the behaviour of various other source files that
provide functions only if DEBUG is enabled.)
 1.34 22-Jul-2007  christos branches: 1.34.6; 1.34.12; 1.34.14; 1.34.16; 1.34.18; 1.34.22;
make this compile again
 1.33 11-Dec-2005  christos branches: 1.33.30; 1.33.40;
merge ktrace-lwp.
 1.32 19-Aug-2005  christos 64 bit inode changes.
 1.31 29-May-2005  christos branches: 1.31.2;
- sprinkle const
- avoid shadow variables.
 1.30 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.29 26-Mar-2005  christos make this compile again :-(
 1.28 26-Mar-2005  christos Use vlog(9). Open-coding vlog here breaks lkm's because including
<sys/kprintf.h> includes opt_multiprocessor.h. One could argue
that the lock stuff should just move to subr_prf.c since nothing
else uses it.
 1.27 08-Mar-2005  simonb branches: 1.27.2;
Tab Police.
 1.26 08-Mar-2005  perseant Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.25 26-Feb-2005  perry nuke trailing whitespace
 1.24 30-Oct-2003  simonb branches: 1.24.8; 1.24.10;
Remove some assigned-to but otherwise unused variables.
 1.23 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.22 02-Apr-2003  fvdl branches: 1.22.2;
Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.21 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.20 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
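The last item above (segments that pass two checkpoints both dirty and empty can be cleaned) can be sketched roughly as follows. The structure, field names, and function are assumptions made for illustration; the kernel's actual segment usage flags and bookkeeping are not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-segment state; not the on-disk segment usage entry. */
struct seg_sketch {
	unsigned live_bytes;		/* bytes still referenced */
	bool dirty;			/* segment has been written to */
	bool empty_at_last_ckp;		/* was dirty and empty last checkpoint */
};

/*
 * Run at each checkpoint.  A segment that is dirty and empty now, and
 * was already dirty and empty at the previous checkpoint, is marked
 * clean.  Returns the number of segments cleaned this time.
 */
static size_t
checkpoint_sketch(struct seg_sketch *segs, size_t nsegs)
{
	size_t cleaned = 0;

	for (size_t i = 0; i < nsegs; i++) {
		struct seg_sketch *s = &segs[i];
		bool empty_now = s->dirty && s->live_bytes == 0;

		if (empty_now && s->empty_at_last_ckp) {
			s->dirty = false;
			s->empty_at_last_ckp = false;
			cleaned++;
		} else {
			/* Remember the state for the next checkpoint. */
			s->empty_at_last_ckp = empty_now;
		}
	}
	return cleaned;
}
```

Waiting for a second checkpoint is the point of the design: a segment that only just became empty is left alone until the next checkpoint confirms it.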
 1.19 29-Jan-2003  yamt don't use daddr_t for segment summary since it's an on-disk structure.
 1.18 25-Jan-2003  kleink Fix further printf format warnings for DEBUG, in the wake of daddr_t
having changed.
 1.17 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.16 14-May-2002  perseant Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.15 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.14 08-Nov-2001  lukem add RCSID
 1.13 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.12 13-Jul-2001  perseant branches: 1.12.4;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.
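To illustrate the serial number and filesystem identifier item above, here is a hedged sketch of a roll-forward acceptance check. The log only says that both values are used to avoid mistaking old data for new; the exact comparison (identifier must match, serial must be strictly newer) and all names below are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the two fields a segment summary would carry. */
struct summary_sketch {
	uint32_t ident;		/* filesystem identifier */
	uint64_t serial;	/* increasing serial number */
};

/* Hypothetical roll-forward state. */
struct rollfwd_sketch {
	uint32_t fs_ident;	/* identifier of the filesystem being recovered */
	uint64_t last_serial;	/* newest serial accepted so far */
};

/*
 * Accept a summary only if it belongs to this filesystem and is newer
 * than everything accepted so far (assumed policy, see lead-in).
 */
static bool
accept_summary_sketch(struct rollfwd_sketch *rf, const struct summary_sketch *ss)
{
	if (ss->ident != rf->fs_ident)
		return false;	/* left over from another filesystem */
	if (ss->serial <= rf->last_serial)
		return false;	/* old data from a reused segment */
	rf->last_serial = ss->serial;
	return true;
}
```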

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length; it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.11 09-Sep-2000  perseant branches: 1.11.2; 1.11.4; 1.11.6;
Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references to DEV_BSIZE in favor of
btodb()/dbtob().
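The reservation scheme described in the first kernel item can be sketched as simple accounting. This is a single-threaded illustration: where the kernel would unlock the inode and sleep until the cleaner frees space, the sketch just returns false. The structure and function names are hypothetical; only the idea of an lfs_ravail-style reserved total comes from the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical space accounting, counted in disk blocks. */
struct space_sketch {
	int64_t avail;	/* blocks available for new writes */
	int64_t ravail;	/* blocks already promised to pending writes */
};

/*
 * Try to reserve nblks for a write.  A false return marks the point
 * where the real code would unlock the inode and sleep.
 */
static bool
reserve_sketch(struct space_sketch *sp, int64_t nblks)
{
	if (sp->avail - sp->ravail < nblks)
		return false;
	sp->ravail += nblks;
	return true;
}

/*
 * After the write has allocated "used" blocks, give back the whole
 * reservation and charge only what was actually consumed.
 */
static void
release_sketch(struct space_sketch *sp, int64_t reserved, int64_t used)
{
	sp->ravail -= reserved;
	sp->avail -= used;
}
```

Reserving the maximum up front and settling afterwards keeps concurrent writers from jointly overcommitting the space the cleaner needs.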

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments through N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this is unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.10 23-Apr-2000  perseant branches: 1.10.4;
Fix problems outlined in PR#9926:
- lfs_truncate extends the file if called with length > i_ffs_size;
- lfs_truncate errors out if called with length < 0;
- lfs_balloc block accounting corrected for the case of blocks read
into the cache before they exist on disk;
- mp->mnt_stat.f_iosize is initialized in lfs_mountfs.
 1.9 10-Mar-1999  perseant branches: 1.9.8; 1.9.14;
New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd says they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.8 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.7 15-Nov-1996  cgd cast int64_t-sized types to "long long" before printing them with %qd.
gcc thinks that the 'q' modifier describes a "long long", and so -Wformat
whines when printing with 'q' on the alpha, since int64_t-sized types are
done with variations on "long" rather than "long long".
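A small sketch of the cast described above. It uses the standard C99 `%lld` conversion in place of the BSD `%qd` spelling from the log; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * int64_t may be "long" on LP64 platforms such as alpha, so cast to
 * long long explicitly to match what the format conversion expects.
 */
static int
format_size_sketch(char *buf, size_t len, int64_t size)
{
	return snprintf(buf, len, "%lld", (long long)size);
}
```

The cast costs nothing at run time on platforms where the two types have the same width, and it keeps -Wformat quiet everywhere.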
 1.6 12-Oct-1996  christos revert previous kprintf changes
 1.5 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.4 17-Mar-1996  christos Fix printf format strings
 1.3 12-Feb-1996  christos di_size is a quad and needs %qu not %lu
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.9.14.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.9.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.4.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.11.6.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.11.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.11.6.1 03-Aug-2001  lukem update to -current
 1.11.4.3 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.11.4.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.11.4.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.11.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.11.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.11.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.11.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.12.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.22.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.22.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.22.2.5 08-Mar-2005  skrll Sync with HEAD.
 1.22.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.22.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.22.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.22.2.1 03-Aug-2004  skrll Sync with HEAD
 1.24.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.24.8.1 29-Apr-2005  kent sync with -current
 1.27.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.153
sys/ufs/lfs/lfs_debug.c: revision 1.32
sys/ufs/lfs/lfs_alloc.c: revision 1.84
sys/ufs/lfs/lfs_vfsops.c: revision 1.185
sys/ufs/lfs/lfs_segment.c: revision 1.165
64 bit inode changes.
 1.27.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.27.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.31.2.3 21-Jan-2008  yamt sync with head
 1.31.2.2 03-Sep-2007  yamt sync with head.
 1.31.2.1 21-Jun-2006  yamt sync with head.
 1.33.40.1 15-Aug-2007  skrll Sync with HEAD.
 1.33.30.4 28-Aug-2007  yamt make this compile with DEBUG.
 1.33.30.3 20-Aug-2007  ad Sync with HEAD.
 1.33.30.2 15-Jul-2007  ad Sync with head.
 1.33.30.1 05-Apr-2007  ad Compile fixes.
 1.34.22.2 22-Jul-2007  christos make this compile again
 1.34.22.1 22-Jul-2007  christos file lfs_debug.c was added on branch matt-mips64 on 2007-07-22 03:41:00 +0000
 1.34.18.2 02-Jan-2008  bouyer Sync with HEAD
 1.34.18.1 13-Dec-2007  bouyer Sync with HEAD
 1.34.16.1 13-Dec-2007  yamt sync with head.
 1.34.14.2 26-Dec-2007  ad Sync with head.
 1.34.14.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.34.12.1 18-Feb-2008  mjf Sync with HEAD.
 1.34.6.1 09-Jan-2008  matt sync with HEAD
 1.36.10.2 19-Aug-2009  yamt sync with head.
 1.36.10.1 16-May-2008  yamt sync with head.
 1.36.8.1 18-May-2008  yamt sync with head.
 1.36.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.37.14.1 23-Jul-2009  jym Sync with HEAD.
 1.39.12.3 03-Dec-2017  jdolecek update from HEAD
 1.39.12.2 23-Jun-2013  tls resync from head
 1.39.12.1 25-Feb-2013  tls resync with head
 1.39.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.39.2.1 23-Jan-2013  yamt sync with head
 1.43.10.1 22-Sep-2015  skrll Sync with HEAD
 1.54.24.1 29-Feb-2020  ad Sync with head.
 1.54.22.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.
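
The marker-node iteration mentioned above can be sketched in user space roughly as follows. This is only an illustration built on the TAILQ macros from <sys/queue.h>, not the actual lfs code: struct node, chain_walk and drop_self_and_next are hypothetical names, and the callback stands in for whatever may happen to the list while the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element; the real dchain links inodes. */
struct node {
	TAILQ_ENTRY(node) link;
	bool is_marker;		/* true only for iteration markers */
	int value;
};
TAILQ_HEAD(chain, node);

typedef void (*visit_fn)(struct chain *, struct node *);

/*
 * Visit every real node once.  A marker is parked right after the
 * current node, so the walk resumes from the marker even if the
 * callback removes the current node or the node that followed it.
 */
static int
chain_walk(struct chain *head, visit_fn visit)
{
	struct node marker = { .is_marker = true };
	struct node *n = TAILQ_FIRST(head);
	int visited = 0;

	while (n != NULL) {
		if (n->is_marker) {	/* skip markers of other walkers */
			n = TAILQ_NEXT(n, link);
			continue;
		}
		TAILQ_INSERT_AFTER(head, n, &marker, link);
		visit(head, n);		/* "lock dropped": list may change */
		visited++;
		n = TAILQ_NEXT(&marker, link);
		TAILQ_REMOVE(head, &marker, link);
	}
	return visited;
}

/* Example callback: removes the current node and the one after the marker. */
static void
drop_self_and_next(struct chain *head, struct node *n)
{
	struct node *m = TAILQ_NEXT(n, link);	/* the walker's marker */
	struct node *victim = (m != NULL) ? TAILQ_NEXT(m, link) : NULL;

	TAILQ_REMOVE(head, n, link);
	if (victim != NULL)
		TAILQ_REMOVE(head, victim, link);
}
```

Holding a plain "next" pointer instead of the marker would leave the walk pointing at a removed node in this scenario, which is the hazard the commit describes.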

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
   aligned on disk to begin with. We can go through these later and
   upgrade them from
       struct foo64 {
               ...
       } __aligned(4) __packed;
       union foo {
               struct foo64 f64;
               ...
       };
   to
       struct foo64 {
               ...
       };
       union foo {
               struct foo64 f64 __aligned(8);
               ...
       } __aligned(4) __packed;
   if we really want to take advantage of 64-bit memory accesses.
   However, the __aligned(4) __packed must remain on the union
   because:
2. We access even the lfs32 data structures via a union that has
   lfs64 members, and it turns out that compilers will assume access
   through a union with 64-bit aligned members implies the whole
   union has 64-bit alignment, even if we're only accessing a 32-bit
   aligned member.
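
As a rough user-space illustration of the packed, 32-bit-aligned layout discussed above, the sketch below declares a record with a 64-bit field at 4-byte alignment and reads it from a byte buffer at an offset that is not 8-byte aligned. The GCC/Clang attribute spelling is assumed here in place of the kernel's __aligned/__packed macros, and struct rec64, rec64_ino_at and rec64_store_at are hypothetical names, not part of lfs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical on-disk record: a 64-bit field followed by a 32-bit
 * one, stored at 32-bit alignment to save space.
 */
struct rec64 {
	uint64_t ino;
	uint32_t reclen;
} __attribute__((packed, aligned(4)));

/* Compile-time checks in the spirit of "CTASSERT on-disk structure sizes". */
_Static_assert(sizeof(struct rec64) == 12, "on-disk size must stay 12 bytes");
_Static_assert(_Alignof(struct rec64) == 4, "record is only 32-bit aligned");

/* Copy the record out of the block, then read the 64-bit field. */
static uint64_t
rec64_ino_at(const unsigned char *block, size_t off)
{
	struct rec64 r;

	memcpy(&r, block + off, sizeof(r));
	return r.ino;
}

/* Build the record locally and copy it into the block at any offset. */
static void
rec64_store_at(unsigned char *block, size_t off, uint64_t ino, uint32_t reclen)
{
	struct rec64 r = { .ino = ino, .reclen = reclen };

	memcpy(block + off, &r, sizeof(r));
}
```

Because the struct type itself carries the reduced alignment, the compiler does not assume the 64-bit field sits on an 8-byte boundary.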

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.54.18.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.125 04-Nov-2025  perseant Remove su_flags array, replacing it with a new flag SEGUSE_READY.
Segments progress from having su_nbytes==0 to SEGUSE_EMPTY to SEGUSE_READY
to clean, progressing to the next step after a checkpoint.
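
One way to read the progression in revision 1.125 is as a small state machine that advances by one step per checkpoint. The sketch below assumes that reading; the enum and function names are hypothetical and do not correspond to the kernel's actual flag encoding.

```c
#include <assert.h>

/* Hypothetical states mirroring the order given in the log message. */
enum seg_state {
	SEG_NBYTES_ZERO,	/* su_nbytes == 0, not yet marked empty */
	SEG_EMPTY,		/* stands in for SEGUSE_EMPTY */
	SEG_READY,		/* stands in for SEGUSE_READY */
	SEG_CLEAN		/* segment may be reused */
};

/* Advance a segment by exactly one step when a checkpoint completes. */
static enum seg_state
seg_advance_on_checkpoint(enum seg_state s)
{
	switch (s) {
	case SEG_NBYTES_ZERO:
		return SEG_EMPTY;
	case SEG_EMPTY:
		return SEG_READY;
	case SEG_READY:
		return SEG_CLEAN;
	case SEG_CLEAN:
	default:
		return SEG_CLEAN;	/* already clean: nothing to do */
	}
}
```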
 1.124 01-Nov-2025  perseant Create a new LFS inode flag, IN_DEAD, to indicate that a file's last
reference, other than those that come with VU_DIROP or IN_CLEANING and
the one the caller holds, has been dropped. Check and apply this flag
in lfs_orphan(), and call lfs_orphan() on close if the link count is
zero. Change the signature of lfs_orphan to facilitate.

Make test t_vfsops:lfs_tfhremove expect success.

Closes PR kern/43745.
 1.123 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
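
The figures that LFCNFILESTATS is described as reporting (block count, number of gaps, total gap distance) could be computed along these lines. This is a sketch under stated assumptions, not the kernel code: it treats each block as one address unit, counts a gap whenever a block does not immediately follow its predecessor on disk, and uses hypothetical names (struct file_stats, gap_stats).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result record for one file. */
struct file_stats {
	size_t nblocks;		/* number of direct blocks examined */
	size_t ngaps;		/* discontinuities between neighbours */
	uint64_t gap_total;	/* sum of the distances across all gaps */
};

/* daddr[i] is the assumed disk address of logical block i. */
static struct file_stats
gap_stats(const int64_t *daddr, size_t n)
{
	struct file_stats st = { .nblocks = n, .ngaps = 0, .gap_total = 0 };

	for (size_t i = 1; i < n; i++) {
		int64_t expect = daddr[i - 1] + 1;	/* contiguous successor */

		if (daddr[i] != expect) {
			st.ngaps++;
			st.gap_total += (daddr[i] > expect)
			    ? (uint64_t)(daddr[i] - expect)
			    : (uint64_t)(expect - daddr[i]);
		}
	}
	return st;
}
```

A fully coalesced file would report zero gaps under this definition.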
 1.122 17-Sep-2025  perseant Use a workqueue to handle the superblock callback.
 1.121 17-Sep-2025  perseant Add routines to check freelist consistency if compiled with DEBUG and
conditional on a kernel variable manipulated via sysctl.
Add checks before and after each routine that modifies the free list.
#if 0 a section of lfs_vfree() that was intended to keep the free list ordered
but instead corrupted it.
 1.120 04-Sep-2025  perseant Copy the flags from a full partial segment to its continuation, if
a continuation is necessary, so that partial-segment collections marked
with SS_DIROP|SS_CONT are properly completed with a partial-segment marked
SS_DIROP (without SS_CONT). Necessary for roll-forward.
 1.119 02-Sep-2025  perseant Use a workqueue to handle cluster iodone, rather than doing it in interrupt context.
 1.118 23-Feb-2020  riastradh Dust off the orphan detection code and try to make it work.
 1.117 23-Feb-2020  riastradh lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.
 1.116 23-Feb-2020  riastradh Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):

(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.

(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:

(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.

(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
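
The idea behind fix (b), testing for pending dirops and taking the writer hold inside one critical section, can be sketched in user space as below. This is a minimal illustration with a pthread mutex standing in for lfs_lock; struct fs_state and the function names are hypothetical and are not the kernel's lfs_writer_tryenter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-filesystem counters. */
struct fs_state {
	pthread_mutex_t lock;	/* plays the role of lfs_lock */
	int dirops;		/* directory operations in progress */
	int writer;		/* writer holds that exclude dirops */
};

static void
fs_state_init(struct fs_state *fs)
{
	pthread_mutex_init(&fs->lock, NULL);
	fs->dirops = 0;
	fs->writer = 0;
}

/*
 * Take a writer hold only if no dirops are active.  The check and the
 * increment share one critical section, so the state cannot change
 * between them.
 */
static bool
writer_tryenter(struct fs_state *fs)
{
	bool entered = false;

	pthread_mutex_lock(&fs->lock);
	if (fs->dirops == 0) {
		fs->writer++;
		entered = true;
	}
	pthread_mutex_unlock(&fs->lock);
	return entered;
}

static void
writer_leave(struct fs_state *fs)
{
	pthread_mutex_lock(&fs->lock);
	assert(fs->writer > 0);
	fs->writer--;
	pthread_mutex_unlock(&fs->lock);
}
```

A caller that gets false back simply skips the opportunistic work instead of waiting while holding the segment lock.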
 1.115 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.114 22-Aug-2018  msaitoh branches: 1.114.4; 1.114.6;
- Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.113 26-Jul-2017  maya branches: 1.113.2; 1.113.4;
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar

XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
 1.112 08-Jun-2017  chs move some buffer cache internals declarations from buf.h to vfs_bio.c.
this is needed to avoid name conflicts with ZFS and also
makes it clearer that other code shouldn't be messing with these.
remove the LFS debug code that poked around in bufqueues and
remove the BQ_EMPTY bufqueue since nothing uses it anymore.
provide a function to let LFS and wapbl read the value of nbuf for now.
 1.111 20-Jun-2016  dholland branches: 1.111.10;
u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.110 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.109 12-Aug-2015  dholland Move the security checks for lfs_bmapv/lfs_markv into those functions.
(instead of the system call entry points)

Avoids duplication.

While touching these, pass the lwp around instead of the proc -- the
latter was there for no other reason than because once upon a time
struct proc was the first argument of all syscalls.

(For that matter, why not just use curlwp instead of passing it around
all over the place? The cost of passing it to every syscall probably
exceeds the cost of loading it from curcpu, even on machines where
it's not just kept in a register all the time.)
 1.108 12-Aug-2015  dholland Fix assorted 64->32 truncations related to BLOCK_INFO.

Also make note of a cleaner limitation: it seems that when it goes to
coalesce discontiguous files, it mallocs an array with one BLOCK_INFO
for every block in the file. Therefore, with 64-bit LFS, on a 32-bit
platform it will be possible to have files large enough to overflow
the cleaner's address space. Currently these will be skipped and cause
warnings via syslog.

At some point someone should rewrite the logic to coalesce files to
use chunks of some reasonable size, as discontinuity between such
chunks is immaterial and mallocing this much space is silly and
fragile. Also, the kernel only accepts up to 65536 blocks at a time
for bmapv and markv, so processing more than this at once probably
isn't useful and may not even work currently. I don't want to change
this around just now as it's not entirely trivial.
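
The chunked processing suggested above might look roughly like this. It is a sketch only: CHUNK_MAX takes the 65536-block figure quoted in the message, while for_each_chunk, chunk_fn and sum_counts are hypothetical names and the callback stands in for a bmapv/markv-style request.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_MAX 65536	/* per-call block limit quoted in the log */

typedef void (*chunk_fn)(size_t first, size_t count, void *arg);

/*
 * Hand the range [0, nblocks) to fn in pieces of at most CHUNK_MAX
 * blocks, so no single allocation or request covers the whole file.
 * Returns the number of calls made.
 */
static size_t
for_each_chunk(size_t nblocks, chunk_fn fn, void *arg)
{
	size_t calls = 0;

	for (size_t first = 0; first < nblocks; first += CHUNK_MAX) {
		size_t count = nblocks - first;

		if (count > CHUNK_MAX)
			count = CHUNK_MAX;
		fn(first, count, arg);
		calls++;
	}
	return calls;
}

/* Example callback: sums the block counts it is handed. */
static void
sum_counts(size_t first, size_t count, void *arg)
{
	(void)first;
	*(size_t *)arg += count;
}
```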
 1.107 02-Aug-2015  dholland Add a (draft) 64-bit superblock. Make things build again.

Add pieces of support for using both superblock types where
convenient, and specifically to the superblock accessors, but don't
actually enable it anywhere.

First substantive step on PR 50000.
 1.106 02-Aug-2015  dholland Second batch of 64 -> 32 truncations in lfs, along with more minor
tidyups and corrections in passing.
 1.105 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.104 31-May-2015  hannken Make lfs_fastvget() private to lfs_syscalls.c, change it to take the
BLOCK_INFO and vnode lock type instead of the inode disk address and
return the vnode locked.

Change lfs_markv() and lfs_bmapv() to work on locked vnodes.
 1.103 31-May-2015  hannken Use VFS_PROTOS() for lfs.
Rename conflicting struct lfs field "lfs_start" to "lfs_s0addr".

No functional change.
 1.102 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.101 18-Mar-2014  riastradh branches: 1.101.6;
Merge riastradh-drm2 to HEAD.
 1.100 20-Jul-2013  dholland Collect the pieces of lfs rename into lfs_rename.c, and sprinkle static.
 1.99 06-Jun-2013  dholland branches: 1.99.2; 1.99.4;
Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.98 23-Feb-2012  joerg branches: 1.98.2;
Make sure that __BEGIN_DECLS and __END_DECLS are paired.
 1.97 02-Jan-2012  perseant * Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.96 28-Jun-2008  rumble branches: 1.96.30; 1.96.34;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.95 28-Apr-2008  martin branches: 1.95.2; 1.95.4;
Remove clause 3 and 4 from TNF licenses
 1.94 02-Jan-2008  ad branches: 1.94.6; 1.94.8; 1.94.10;
Merge vmlocking2 to head.
 1.93 08-Dec-2007  pooka branches: 1.93.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.92 26-Nov-2007  pooka branches: 1.92.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.91 31-Jul-2007  pooka branches: 1.91.2; 1.91.4; 1.91.10; 1.91.12;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.90 12-Jul-2007  dsl branches: 1.90.2;
Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.89 17-Apr-2007  perseant Install a new sysctl, vfs.lfs.ignore_lazy_sync, which causes LFS to ignore
the "smooth" syncer, as if vfs.sync.*delay = 0, but only for LFS. The
default is "on", i.e., ignore lazy sync.

Reduce the amount of polling/busy-waiting done by lfs_putpages(). To
accomplish this, copied genfs_putpages() and modified it to indicate which
page it was that caused it to return with EDEADLK. fsync()/fdatasync()
should no longer ever fail with EAGAIN, and should not consume huge
quantities of cpu.

Also, try to make dirops less likely to be written as the result of a
VOP_PUTPAGES(), while ensuring that they are written regularly.
 1.88 04-Mar-2007  christos branches: 1.88.2; 1.88.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.87 01-Sep-2006  perseant branches: 1.87.6; 1.87.8; 1.87.12;
Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.86 20-Jul-2006  perseant Separate the (non-working) LFS kernel roll-forward code into its own file,
lfs_rfw.c.
 1.85 13-Jul-2006  martin Fix alignment problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.84 29-Jun-2006  perseant Don't wake up the cleaner if the filesystem is unwrappable, and fix the
compatibility fcntls.

Also includes one-line fixes for an MP locking bug and a zero-length FINFO
problem that manifested during testing.
 1.83 18-May-2006  perseant branches: 1.83.4;
Break out the finfo array manipulation code into two new functions,
lfs_acquire_finfo() and lfs_release_finfo(). Add a debugging check
for zero-length finfo arrays in the segment summary to avoid future
regressions.
 1.82 14-May-2006  elad integrate kauth.
 1.81 01-May-2006  perseant Don't ever partially write dirops, even if we need the cleaner to run.
This increases the chances of the "no clean segments" panic slightly,
but allows us to run the ckckp regression test successfully to completion.
 1.80 30-Apr-2006  perseant Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are also included.
 1.79 23-Apr-2006  yamt remove unused FFS_NAMES and LFS_NAMES.
 1.78 08-Apr-2006  perseant Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.77 08-Apr-2006  perseant Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.76 24-Mar-2006  perseant Improvements to LFS's paging mechanism, to wit:

* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.

* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.75 14-Jan-2006  yamt branches: 1.75.2; 1.75.4; 1.75.6; 1.75.8; 1.75.10;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.74 06-Jan-2006  yamt remove an obsolete prototype.
 1.73 11-Dec-2005  christos branches: 1.73.2;
merge ktrace-lwp.
 1.72 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.71 13-Sep-2005  christos branches: 1.71.2;
split out lfs_itimes(). It is used in fsck_lfs.
 1.70 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.69 28-Jun-2005  yamt branches: 1.69.2;
- constify genfs_ops.
- use member designators.
 1.68 29-May-2005  christos - sprinkle const
- avoid shadow variables.
 1.67 23-Apr-2005  perseant Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing. Tested in both
directions, and everything appears to work happily, but ymmv.
 1.66 16-Apr-2005  perseant Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
 1.65 14-Apr-2005  perseant Consolidate the hash table we use to maintain the integrity of lfs_avail
into a single, system-wide table, rather than having a separate hash table
per inode. Significantly reduces the "system" cpu usage of your average
file write.
 1.64 08-Mar-2005  perseant branches: 1.64.2;
Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.63 26-Feb-2005  perry nuke trailing whitespace
 1.62 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
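
The clamping of the free-block count described in the list above can be illustrated as follows. This is a sketch, not the lfs_statvfs code: clamp_bfree is a hypothetical helper, and it only shows why a signed count has to be bounded before it is reported as an unsigned quantity (a negative value converted to an unsigned type becomes a very large number).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bound a signed free-block count to the range [0, dsize] before it
 * is handed to userland as an unsigned value.
 */
static uint64_t
clamp_bfree(int64_t bfree, uint64_t dsize)
{
	if (bfree < 0)
		return 0;		/* never report "negative" space */
	if ((uint64_t)bfree > dsize)
		return dsize;		/* never report more than the fs holds */
	return (uint64_t)bfree;
}
```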
 1.61 20-May-2004  atatat branches: 1.61.4; 1.61.6;
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.

This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.

linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.60 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.59 09-Mar-2004  yamt branches: 1.59.2;
calculate data checksum inline.
 1.58 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.57 07-Nov-2003  yamt - tweak lfs_update_single()'s prototype so that it can be used by
roll-forward code.
- reduce code duplication using the above in update_meta()
this also fixes fragment accounting.
 1.56 07-Nov-2003  yamt fix spec vnode aliasing.
 1.55 29-Sep-2003  yamt remove redundant prototypes.
 1.54 23-Sep-2003  yamt cleanup IN_ADIROP/VDIROP handling a little.
 1.53 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.52 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.51 02-Jul-2003  yamt - add new functions, lfs_writer_enter/leave, and use them instead of
duplicated code fragments.
- add an assertion.
 1.50 29-Jun-2003  fvdl branches: 1.50.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.49 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.48 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.47 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.46 20-Mar-2003  yamt fix "more than one fragment" panics;
direct and indirect block pointers are not valid in the case of shortlinks.
while i'm here, move duplicated code in lfs_vget/fastvget into a new
function, lfs_vinit.
 1.45 08-Mar-2003  perseant Add an lfs_strategy() that checks to make sure we're not trying to read
where the cleaner is trying to write, instead of tying up the "live"
buffers (or pages).

Fix a bug in the LFS_UBC case where oversized buffers would not be
checksummed correctly, causing uncleanable segments.

Make sure that wakeup(fs->lfs_iocount) is done if fs->lfs_iocount is 1
as well as 0, since we wait in some places for it to drop to 1.

Activate all pages that make it into lfs_gop_write without the segment
lock held, since they must have been dirtied very recently, even if
PG_DELWRI is not set.
 1.44 25-Feb-2003  perseant Make fs-specific fcntl macros take three arguments (approved wrstuden).
Let LFS use fcntl for cleaner functions.
 1.43 24-Feb-2003  perseant Add lfs_ioctl vnode op, with ioctls to take over cleaner system call
functionality (not including segment clean, since that is now done
automatically as checkpoints happen).
 1.42 23-Feb-2003  perseant Fix a buffer overflow bug in the LFS_UBC case that manifested itself
either as a mysterious UVM error or as "panic: dirty bufs". Verify
maximum size in lfs_malloc.

Teach lfs_updatemeta and lfs_shellsort about oversized cluster blocks from
lfs_gop_write.

When unwiring pages in lfs_gop_write, deactivate them, under the theory
that the pagedaemon wanted to free them last we knew.
 1.41 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.40 18-Feb-2003  perseant Make it compile again, grr....
 1.39 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
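
The last item above (a segment that is both dirty and empty at two
consecutive checkpoints can be cleaned outright) can be sketched roughly as
follows. This is only an illustration under assumptions: `struct seg`,
`SEG_DIRTY`, `SEG_EMPTY` and `checkpoint_scan()` are hypothetical names, not
the actual LFS structures or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEG_DIRTY 0x1	/* hypothetical: segment has been written to */
#define SEG_EMPTY 0x2	/* hypothetical: dirty and empty at the previous checkpoint */

struct seg {
	uint32_t flags;
	uint32_t live_bytes;	/* bytes still referenced by live data */
};

/*
 * Run once per checkpoint.  A dirty segment with no live bytes is only
 * marked SEG_EMPTY the first time it is seen; if it is still dirty and
 * empty at the next checkpoint, it is cleaned (all flags cleared).
 * Returns the number of segments cleaned by this pass.
 */
static size_t
checkpoint_scan(struct seg *segs, size_t nsegs)
{
	size_t i, cleaned = 0;

	assert(segs != NULL || nsegs == 0);
	for (i = 0; i < nsegs; i++) {
		struct seg *sp = &segs[i];

		if ((sp->flags & SEG_DIRTY) && sp->live_bytes == 0) {
			if (sp->flags & SEG_EMPTY) {
				sp->flags = 0;	/* second checkpoint: clean it */
				cleaned++;
			} else {
				sp->flags |= SEG_EMPTY;
			}
		} else {
			/* holds live data, or is already clean */
			sp->flags &= ~(uint32_t)SEG_EMPTY;
		}
	}
	return cleaned;
}
```

Waiting for a second checkpoint means a segment is never reclaimed on the
strength of a single, possibly incomplete, observation.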
 1.38 01-Feb-2003  tron Only use MALLOC_DECLARE() in kernel namespace.
 1.37 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.36 30-Jan-2003  yamt there's no need to treat VOP_WHITEOUT as dirop
because it modifies only one inode.
 1.35 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
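
A minimal sketch of the idea in this commit: the in-core block address type
becomes 64 bits wide while on-disk fields stay at 32 bits. The names
`daddr64_t`, `struct ondisk_ptr`, `ptr_load()` and `ptr_store()` are
hypothetical and chosen for illustration; they are not taken from the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t daddr64_t;	/* hypothetical stand-in for the widened in-core type */

/* Hypothetical on-disk record: the field keeps its 32-bit width. */
struct ondisk_ptr {
	int32_t od_blkno;
};

/* Load: sign-extend, so negative magic values such as -1 survive. */
static daddr64_t
ptr_load(const struct ondisk_ptr *p)
{
	assert(p != NULL);
	return (daddr64_t)p->od_blkno;
}

/* Store: returns 0 on success, -1 if the address does not fit on disk. */
static int
ptr_store(struct ondisk_ptr *p, daddr64_t blkno)
{
	assert(p != NULL);
	if (blkno < INT32_MIN || blkno > INT32_MAX)
		return -1;
	p->od_blkno = (int32_t)blkno;
	return 0;
}
```

Keeping the conversion in one pair of helpers makes the point where a 64-bit
address could be silently truncated explicit.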
 1.34 28-Dec-2002  yamt - in lfs_reserve, vref vnodes that we're locking so that cleaner doesn't
try to reclaim them.
(workaround for deadlock noted in the comment in lfs_reserveavail)
- in lfs_rename, mark vnodes which are being moved as well as directory vnodes.
 1.33 17-Dec-2002  yamt no need for cleaner to hold vnode locks.
cleaner and normal vnode operations are synchronized enough by
seglock/fraglock and buf's B_BUSY-ness.
 1.32 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.31 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.30 14-May-2002  perseant branches: 1.30.2; 1.30.4;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.29 12-May-2002  matt Eliminate commons.
 1.28 11-Feb-2002  perseant Include the space taken by inodes in the count made by lfs_check();
make VOP_SETATTR call lfs_check. This prevents large numbers of inode
changes (say, at the end of tar(1)) from filling the buffer cache.
 1.27 18-Dec-2001  chs use the new compatibility routines to allow mmap() to work
(in the same non-coherent fashion that it worked pre-UBC)
until someone has time to do it the right way.
 1.26 15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.25 13-Jul-2001  perseant branches: 1.25.2;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.24 03-Dec-2000  perseant branches: 1.24.2; 1.24.4; 1.24.6;
Get rid of some old unnecessary code that cleared B_NEEDCOMMIT from buffers in
lfs_writeseg (possibly after they had been freed).

If MALLOCLOG is defined, make lfs_newbuf and lfs_freebuf pass along the
caller's file and line to _malloc and _free.
 1.23 25-Nov-2000  perseant Use u_int32_t instead of u_long to compute LFS checksums, since the
checksum is stored in a u_int32_t.
 1.22 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.21 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().
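
The reservation scheme in the first kernel item above can be sketched as
plain accounting. This is an assumption-laden illustration, not the kernel
code: `struct space`, `space_reserve()` and `space_commit()` are hypothetical
names, and where the real lfs_reserve would unlock the inode and sleep until
the cleaner frees space, this sketch simply returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct space {
	int64_t avail;	/* blocks free for new writes */
	int64_t ravail;	/* blocks promised to writes still in progress */
};

/*
 * Reserve the maximum number of blocks a write might need.
 * Returns 0 on success, -1 if the caller would have to wait.
 */
static int
space_reserve(struct space *sp, int64_t nblks)
{
	assert(sp != NULL && nblks >= 0);
	if (sp->avail - sp->ravail < nblks)
		return -1;
	sp->ravail += nblks;
	return 0;
}

/*
 * After allocation: drop the reservation and charge only the blocks
 * that were actually used.
 */
static void
space_commit(struct space *sp, int64_t reserved, int64_t used)
{
	assert(sp != NULL);
	assert(used >= 0 && used <= reserved && reserved <= sp->ravail);
	sp->ravail -= reserved;
	sp->avail -= used;
}
```

Because a writer is admitted only when its worst case fits, a write cannot
run out of space halfway through, and the space it did not use becomes
available again as soon as it commits.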

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments through N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.20 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero at fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.19 30-Jun-2000  fvdl Rearrange code around getnewvnode as was already done for ffs, to avoid
locking against oneself because getnewvnode recycles a softdep-using vnode.
 1.18 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.17 16-Mar-2000  jdolecek branches: 1.17.4;
Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
 1.16 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.15 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.14 01-Jun-1999  perseant branches: 1.14.2; 1.14.4; 1.14.8;
Fixed lfs_update (and related functions) so that calls from lfs_fsync
will DTRT with vnodes marked VDIROP. In particular, the message
"flushing VDIROP" will no longer appear, and the filesystem will remain
stable in the event of a crash.

This was particularly a problem with NFS-exported LFSes, since fsync
was called on every file close.
 1.13 10-Mar-1999  perseant branches: 1.13.2; 1.13.4;
New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd says they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.12 26-Feb-1999  wrstuden Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.11 11-Sep-1998  pk PR#6032: define fixed sized on-disk superblock structure.
 1.10 01-Sep-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for LFS inodes.
 1.9 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.8 22-Jun-1998  sommerfe defopt for options FIFO
 1.7 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.6 22-Dec-1996  cgd Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.5 12-Feb-1996  christos Add fwd declaration for struct ucred
 1.4 09-Feb-1996  christos lfs prototypes
 1.3 14-Dec-1994  mycroft Sync with CSRG.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.13.4.3 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
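
The last item above separates "accounted for" from "assigned a disk
location". A rough sketch of how a `< 0` test covers both magic values while
the accounting still distinguishes them is given below. `ADDR_NONE`,
`ADDR_UNWRITTEN` and the helper functions are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_NONE	((int64_t)-1)	/* the older magic value */
#define ADDR_UNWRITTEN	((int64_t)-2)	/* accounted for, no disk location yet */

/* Any negative address is a magic value, never a real disk location. */
static int
addr_is_magic(int64_t a)
{
	return a < 0;
}

/* Blocks charged to the file: real addresses plus "unwritten" ones. */
static size_t
count_accounted(const int64_t *ptrs, size_t n)
{
	size_t i, count = 0;

	assert(ptrs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (ptrs[i] >= 0 || ptrs[i] == ADDR_UNWRITTEN)
			count++;
	return count;
}

/* Blocks that already have a fixed disk location. */
static size_t
count_on_disk(const int64_t *ptrs, size_t n)
{
	size_t i, count = 0;

	assert(ptrs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (!addr_is_magic(ptrs[i]))
			count++;
	return count;
}
```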
 1.13.4.2 11-Jul-1999  chs add placeholders for getpages/putpages.
 1.13.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.13.2.2 20-Jan-2000  he Pull up revision 1.16 (requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.13.2.1 17-Dec-1999  he Pull up revision 1.14 (requested by perseant):
Avoid flushing vnodes involved in a dirop, making lfs' promise
of "no fsck needed, even in the event of a crash" closer to
reality.
 1.14.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.14.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.14.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.14.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.14.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.17.4.3 03-Feb-2001  he Pull up revision 1.22 (requested by perseant):
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
 1.17.4.2 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.17.4.1 03-Jul-2000  fvdl pullup the fixes from the trunk to not hold ufs_hashlock across
getnewvnode()
 1.24.6.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.24.6.3 16-Mar-2002  jdolecek Catch up with -current.
 1.24.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.24.6.1 03-Aug-2001  lukem update to -current
 1.24.4.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.24.4.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.24.2.8 29-Dec-2002  thorpej Sync with HEAD.
 1.24.2.7 19-Dec-2002  thorpej Sync with HEAD.
 1.24.2.6 11-Dec-2002  thorpej Sync with HEAD.
 1.24.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.24.2.4 28-Feb-2002  nathanw Catch up to -current.
 1.24.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.24.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.24.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.25.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.30.4.1 20-Jun-2002  lukem Pull up revision 1.31 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.30.2.1 20-Jun-2002  gehenna catch up with -current.
 1.50.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.50.2.8 08-Mar-2005  skrll Sync with HEAD.
 1.50.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.50.2.6 30-Oct-2004  skrll Oops, forgot this as part of the

"Reduced diff to HEAD by restoring the struct proc * argument to lfs_bmapv"

change
 1.50.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.50.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.50.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.50.2.2 03-Aug-2004  skrll Sync with HEAD
 1.50.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.59.2.1 23-May-2004  tron branches: 1.59.2.1.2;
Pull up revision 1.61 (requested by atatat in ticket #374):
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.59.2.1.2.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
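
The bfree item in the Kernel list above (never report bfree < 0 or
higher than lfs_dsize) amounts to clamping a signed count before it is
handed to userland. A minimal sketch, assuming a helper of our own; the
name clamp_bfree and its parameters are hypothetical and are not taken
from lfs_statfs(9):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp a signed free-block count into the range
 * [0, dsize] before it is reported.  A negative count that reached
 * userland as an unsigned quantity would show up as a huge free-space
 * figure.
 */
static int64_t
clamp_bfree(int64_t bfree, int64_t dsize)
{
	assert(dsize >= 0);
	if (bfree < 0)
		return 0;
	if (bfree > dsize)
		return dsize;
	return bfree;
}
```

The internal counter can stay signed, as the preceding item requires for
the cleaner; only the reported value is clamped.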

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.61.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.61.4.1 29-Apr-2005  kent sync with -current
 1.64.2.9 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.64.2.8 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.171
sys/ufs/lfs/lfs_extern.h: revision 1.81
sys/ufs/lfs/lfs_segment.c: revision 1.177
Don't ever partially write dirops, even if we need the cleaner to run.
This increases the chances of the "no clean segments" panic slightly,
but allows us to run the ckckp regression test successfully to completion.
 1.64.2.7 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.104
sys/ufs/lfs/lfs_vfsops.c: revision 1.206
sys/ufs/lfs/lfs_vnops.c: revision 1.170
sys/ufs/lfs/lfs_extern.h: revision 1.80
sys/ufs/lfs/lfs_segment.c: revision 1.176
sys/ufs/lfs/lfs_inode.c: revision 1.103 via patch
sys/ufs/lfs/lfs_alloc.c: revision 1.90
Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().
A couple of locking fixes are also included.
 1.64.2.6 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.200
sys/ufs/lfs/lfs_vnops.c: revision 1.164
sys/ufs/lfs/lfs_inode.c: revision 1.101
sys/ufs/lfs/lfs_extern.h: revision 1.78
sys/ufs/lfs/lfs.h: revision 1.100
Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.64.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.87
sys/ufs/lfs/lfs.h: revision 1.99
sys/ufs/lfs/lfs_vfsops.c: revision 1.199
sys/ufs/lfs/lfs_extern.h: revision 1.77 via patch
Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.64.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.158
sys/ufs/lfs/lfs_subr.c: revision 1.57
sys/ufs/lfs/lfs_segment.c: revision 1.171
sys/ufs/lfs/lfs.h: revision 1.97
sys/ufs/lfs/lfs_vfsops.c: revision 1.195
sys/ufs/lfs/lfs_extern.h: revision 1.76
Improvements to LFS's paging mechanism, to wit:
* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.
* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.64.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.64.2.2 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in a different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed by Chuck Silvers.
 1.64.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.69.2.5 21-Jan-2008  yamt sync with head
 1.69.2.4 07-Dec-2007  yamt sync with head
 1.69.2.3 03-Sep-2007  yamt sync with head.
 1.69.2.2 30-Dec-2006  yamt sync with head.
 1.69.2.1 21-Jun-2006  yamt sync with head.
 1.71.2.1 20-Oct-2005  yamt adapt ufs.
 1.73.2.1 15-Jan-2006  yamt sync with head.
 1.75.10.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.75.10.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.75.8.5 11-May-2006  elad sync with head
 1.75.8.4 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.75.8.3 03-May-2006  yamt wrap some decls with #ifdef _KERNEL. ok'ed by elad@.
 1.75.8.2 19-Apr-2006  elad sync with head.
 1.75.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.75.6.5 03-Sep-2006  yamt sync with head.
 1.75.6.4 11-Aug-2006  yamt sync with head
 1.75.6.3 24-May-2006  yamt sync with head.
 1.75.6.2 11-Apr-2006  yamt sync with head
 1.75.6.1 01-Apr-2006  yamt sync with head.
 1.75.4.2 01-Jun-2006  kardel Sync with head.
 1.75.4.1 22-Apr-2006  simonb Sync with head.
 1.75.2.1 09-Sep-2006  rpaulo sync with head
 1.83.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.87.12.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.87.8.2 07-May-2007  yamt sync with head.
 1.87.8.1 12-Mar-2007  rmind Sync with HEAD.
 1.87.6.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.88.4.1 11-Jul-2007  mjf Sync with head.
 1.88.2.4 20-Aug-2007  ad Sync with HEAD.
 1.88.2.3 15-Jul-2007  ad Sync with head.
 1.88.2.2 08-Jun-2007  ad Sync with head.
 1.88.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.90.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.91.12.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.91.12.1 31-Jul-2007  pooka file lfs_extern.h was added on branch matt-mips64 on 2007-07-31 21:14:21 +0000
 1.91.10.3 18-Feb-2008  mjf Sync with HEAD.
 1.91.10.2 27-Dec-2007  mjf Sync with HEAD.
 1.91.10.1 08-Dec-2007  mjf Sync with HEAD.
 1.91.4.1 09-Jan-2008  matt sync with HEAD
 1.91.2.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.91.2.1 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.92.2.4 26-Dec-2007  ad Sync with head.
 1.92.2.3 19-Dec-2007  ad Use a global lfs_lock.
 1.92.2.2 19-Dec-2007  ad Get lfs mostly working.
 1.92.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.93.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.94.10.2 04-May-2009  yamt sync with head.
 1.94.10.1 16-May-2008  yamt sync with head.
 1.94.8.1 18-May-2008  yamt sync with head.
 1.94.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.94.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.95.4.1 03-Jul-2008  simonb Sync with head.
 1.95.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.96.34.2 24-Feb-2012  mrg sync to -current.
 1.96.34.1 18-Feb-2012  mrg merge to -current.
 1.96.30.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.96.30.1 17-Apr-2012  yamt sync with head
 1.98.2.3 03-Dec-2017  jdolecek update from HEAD
 1.98.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.98.2.1 23-Jun-2013  tls resync from head
 1.99.4.1 23-Jul-2013  riastradh sync with HEAD
 1.99.2.1 28-Aug-2013  rmind sync with head
 1.101.6.5 28-Aug-2017  skrll Sync with HEAD
 1.101.6.4 09-Jul-2016  skrll Sync with HEAD
 1.101.6.3 22-Sep-2015  skrll Sync with HEAD
 1.101.6.2 06-Jun-2015  skrll Sync with HEAD
 1.101.6.1 06-Apr-2015  skrll Sync with HEAD
 1.111.10.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some questions about the code in an XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.113.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.113.4.1 10-Jun-2019  christos Sync with HEAD
 1.113.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.114.6.1 29-Feb-2020  ad Sync with head.
 1.114.4.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.
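
The marker-node idea can be sketched in userland with the TAILQ macros
from <sys/queue.h>. This is an illustration under our own assumptions,
not the lfs_dchainhd code: struct node, visit_all and
unlink_self_and_next are hypothetical names, and the point where a real
implementation would drop and retake the lock is only marked by a
comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

struct node {
	TAILQ_ENTRY(node) link;
	bool is_marker;		/* markers carry no payload and are skipped */
	int value;
};
TAILQ_HEAD(nodelist, node);

/*
 * Visit every non-marker node.  A marker is inserted after the current
 * node before the visit, so iteration resumes from the marker instead
 * of from a saved "next" pointer that may have been removed meanwhile.
 */
static int
visit_all(struct nodelist *head,
    void (*visit)(struct nodelist *, struct node *))
{
	struct node marker = { .is_marker = true };
	struct node *n;
	int visited = 0;

	n = TAILQ_FIRST(head);
	while (n != NULL) {
		if (n->is_marker) {	/* some other iterator's marker */
			n = TAILQ_NEXT(n, link);
			continue;
		}
		TAILQ_INSERT_AFTER(head, n, &marker, link);
		/* A real implementation would drop the lock here. */
		visit(head, n);		/* may unlink n or its neighbours */
		visited++;
		/* ...and retake it here. */
		n = TAILQ_NEXT(&marker, link);
		TAILQ_REMOVE(head, &marker, link);
	}
	return visited;
}

/*
 * Example callback: unlink the visited node and the node following the
 * marker, which is exactly the node a naive loop would have saved.
 */
static void
unlink_self_and_next(struct nodelist *head, struct node *n)
{
	struct node *m = TAILQ_NEXT(n, link);
	struct node *victim;

	assert(m != NULL && m->is_marker);
	victim = TAILQ_NEXT(m, link);
	TAILQ_REMOVE(head, n, link);
	if (victim != NULL)
		TAILQ_REMOVE(head, victim, link);
}
```

Because the marker stays linked for the whole visit, removing any real
node leaves a valid position to resume from.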

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
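
Resolution (b) above can be pictured with a userland sketch. It assumes
a pthread mutex in the role of lfs_lock; struct fs_state,
writer_tryenter and writer_leave are hypothetical stand-ins and do not
reproduce the kernel's lfs_writer_tryenter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct lfs. */
struct fs_state {
	pthread_mutex_t lock;	/* plays the role of lfs_lock */
	int dirops;		/* active directory operations */
	int writer;		/* nonzero while a writer excludes dirops */
};

/*
 * Take the writer role only if no dirops are active.  The test and the
 * increment happen under one hold of the lock, so the dirop count
 * cannot change between them.  Never sleeps waiting for dirops.
 */
static bool
writer_tryenter(struct fs_state *fs)
{
	bool ok;

	pthread_mutex_lock(&fs->lock);
	ok = (fs->dirops == 0);
	if (ok)
		fs->writer++;
	pthread_mutex_unlock(&fs->lock);
	return ok;
}

static void
writer_leave(struct fs_state *fs)
{
	pthread_mutex_lock(&fs->lock);
	assert(fs->writer > 0);
	fs->writer--;
	pthread_mutex_unlock(&fs->lock);
}
```

A caller that already holds the seglock can use the try variant safely
because it fails immediately instead of waiting on dirops that may
themselves be waiting for the seglock.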

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.
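
The effect of combining the two attributes can be checked with a small
example. This sketch assumes a GCC- or clang-compatible compiler and
spells the attributes out directly rather than using the __aligned() and
__packed macros; struct rec64 and rec64_ino are hypothetical and are not
LFS structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: a 64-bit field that is only 32-bit aligned. */
struct rec64 {
	uint32_t tag;
	uint64_t ino;
} __attribute__((__aligned__(4), __packed__));

_Static_assert(sizeof(struct rec64) == 12, "no padding after tag");
_Static_assert(_Alignof(struct rec64) == 4, "record is 32-bit aligned");
_Static_assert(offsetof(struct rec64, ino) == 4, "ino at 4-byte offset");

/* Read the inode number from a raw buffer at a 4-byte-aligned offset. */
static uint64_t
rec64_ino(const unsigned char *buf, size_t off)
{
	struct rec64 r;

	memcpy(&r, buf + off, sizeof(r));	/* no alignment assumption */
	return r.ino;
}
```

Without the attributes the compiler would typically pad the structure to
16 bytes and assume 8-byte alignment for ino, which is the kind of
assumption the commit message warns about.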

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.160 23-Apr-2020  ad PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)

- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
somewhere. Use it to decide whether to do direct-mapped copy, rather than
poking around directly in the vnode in ubc_uiomove(), which is ugly and
doesn't work for tmpfs. It would be nicer to contain all this in UVM but
the filesystem provides the needed locking here (VV_MAPPED) and to
reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS(). Pass in UBC_ISMAPPED where
appropriate.
 1.159 23-Feb-2020  ad branches: 1.159.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.158 23-Feb-2020  riastradh In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do

lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter

which is the reverse of the lfs_writer -> lfs_seglock ordering.
 1.157 10-Jun-2017  maya branches: 1.157.6; 1.157.10; 1.157.12;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.156 05-Jun-2017  maya Correct confusion between i_flag and i_flags
These will have to be renamed.

Spotted by Riastradh, thanks!
 1.155 01-Apr-2017  maya branches: 1.155.6;
Simplify locking
 1.154 31-Mar-2017  maya stopgap fix- move lfs_lock to include calls to lfs_dino_{set,get}block

blocks new users that need seglock (need to take lfs_lock) that
setblock before the assert (truncate to 0 but 31 blks/31 effblks)

not proper, but lets me run firefox on lfs
 1.153 21-Mar-2017  maya Update mtime even if oip->i_size == length

PR kern/51762, LFS version.
 1.152 19-Mar-2017  riastradh Fix inadvertently reversed sense of comparisons.
 1.151 18-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT
 1.150 16-Mar-2017  maya actually cast to unsigned long long and use %llu. certainly not use hex (oops)
suggested by dh
 1.149 15-Mar-2017  maya print inode number in an assert I keep hitting and the adjacent one.
use PRIx64 for printing inode number elsewhere.
 1.148 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERTMSG
 1.147 01-Sep-2015  dholland branches: 1.147.2; 1.147.4;
Fix up indirect block handling in truncate to be 32/64 clean.
 1.146 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.145 19-Aug-2015  dholland Part two of dinodes; use the same union everywhere.
(previously the ufs-derived code had things set up slightly different)

Remove a bunch of associated mess.
 1.144 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.143 02-Aug-2015  dholland whoops, fix 32-bit build
 1.142 02-Aug-2015  dholland Make i_eff_nblks in the in-memory inode 64 bits wide.
 1.141 02-Aug-2015  dholland Fix assorted 64 -> 32 truncations in lfs. Also, some minor tidyups and
corrections in passing.
 1.140 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.139 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.138 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
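
One way accessor functions can hide a 32/64 bit switch is sketched
below. The structure layout, the is64 flag and every sk_ name are
assumptions made for illustration; the log message does not say how
the real accessors choose between forms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit and 64-bit on-disk superblock forms. */
struct sk_dlfs32 { uint32_t dlfs_size; };
struct sk_dlfs64 { uint64_t dlfs_size; };

/* Hypothetical in-memory structure holding whichever form is in use. */
struct sk_lfs {
	bool is64;
	union {
		struct sk_dlfs32 u_32;
		struct sk_dlfs64 u_64;
	} dlfs_u;
};

/* Callers always see a 64-bit value; the accessor picks the form. */
static inline uint64_t
sk_sb_getsize(const struct sk_lfs *fs)
{
	return fs->is64 ? fs->dlfs_u.u_64.dlfs_size
	    : fs->dlfs_u.u_32.dlfs_size;
}

static inline void
sk_sb_setsize(struct sk_lfs *fs, uint64_t val)
{
	if (fs->is64)
		fs->dlfs_u.u_64.dlfs_size = val;
	else
		fs->dlfs_u.u_32.dlfs_size = (uint32_t)val; /* truncates */
}
```

Code that goes through such accessors never names a structure member
directly, which is also what makes fake structure member macros
unnecessary.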
 1.137 16-Jul-2015  dholland Don't cast the return value of malloc.
 1.136 17-Oct-2013  christos branches: 1.136.6;
- remove unused variables
- add debug ifdefs for debugging variables
- __USE() where appropriate.
 1.135 28-Jul-2013  dholland Add more of the bits for supporting quotas.
 1.134 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.133 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.132 18-Jun-2013  christos branches: 1.132.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.131 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.130 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.129 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.128 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.127 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.126 23-Nov-2011  bouyer branches: 1.126.8;
If ufs_balloc_range() fails, make sure to call ?fs_truncate() to
reset v_writesize to the right value.
If v_writesize is left larger than the allocated blocks, we may have
the same issue as the one described in
http://mail-index.netbsd.org/tech-kern/2010/02/02/msg007156.html
 1.125 11-Jul-2011  hannken branches: 1.125.2;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.124 16-Jun-2011  hannken Rename uvm_vnp_zerorange(struct vnode *, off_t, size_t) to
ubc_zerorange(struct uvm_object *, off_t, size_t, int) changing
the first argument to an uvm_object and adding a flags argument.

Modify tmpfs_reg_resize() to zero the backing store (aobj) instead
of the vnode. ubc_purge() no longer panics when unmounting tmpfs.

Keep uvm_vnp_zerorange() until the next kernel version bump.
 1.123 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.122 16-Feb-2010  mlelstv branches: 1.122.2; 1.122.8;
Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.121 07-Feb-2010  bouyer branches: 1.121.2;
- ufs_balloc_range(): on error, only PG_RELEASED the pages that were
allocated to extend the file to the new size. Releasing all pages
may release pages that contain previously-written data not yet flushed
to disk. Should fix PR kern/35704
- {ffs,lfs,ext2fs}_truncate(): Even if the inode's size is the same as
the new length, call uvm_vnp_setsize(). *_truncate() may have been
called by *_write() in the error path (e.g. block allocation failure
because of quota or file system full), and at this point v_writesize
has been set to the desired size of the file and not reverted to the
old size. Not adjusting v_writesize to the real size causes
genfs_do_io() to write to disk past the real end of the file.
 1.120 28-Apr-2008  martin branches: 1.120.10; 1.120.18;
Remove clause 3 and 4 from TNF licenses
 1.119 27-Mar-2008  ad branches: 1.119.2; 1.119.4;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.118 15-Feb-2008  ad branches: 1.118.6;
Give bbusy() an interlock argument. If we need to wait for the buffer,
the interlock is dropped and reacquired when awoken. This allows for
busying buffers attached to a list that is not locked by bufcache_lock.
 1.117 15-Feb-2008  ad The buffer LOCKED flag need not be under the protection of bufcache_lock,
BUSY is enough.
 1.116 02-Jan-2008  ad Merge vmlocking2 to head.
 1.115 08-Dec-2007  pooka branches: 1.115.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.114 10-Oct-2007  ad branches: 1.114.4; 1.114.6;
Fix DEBUG builds.
 1.113 10-Oct-2007  ad Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.112 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.111 10-Jul-2007  hannken branches: 1.111.6; 1.111.8; 1.111.10;
Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criteria, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.110 05-Jun-2007  yamt improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.109 16-May-2007  perseant Change references to SEGM_W_DIROPS to SEGM_CKP, and replace the logic that
formerly used SEGM_W_DIROPS in lfs_segwrite() appropriately. This prevents
a problem in which processes could get stuck in "buffers" sleep forever.
 1.108 18-Apr-2007  perseant Remember to write dirops when the vnode we are trying to flush is a dirop.
 1.107 04-Mar-2007  christos branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.106 14-Oct-2006  yamt branches: 1.106.4;
don't use g_glock directly.
 1.105 14-May-2006  elad branches: 1.105.8; 1.105.10;
integrate kauth.
 1.104 14-May-2006  christos Correct a bogus expression gcc4 found.
 1.103 30-Apr-2006  perseant Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are also included.
 1.102 19-Apr-2006  perseant Avoid a possible sign overflow condition in lfs_truncate, which would result
in a buffer overflow (underflow). Coverity CID 1521.
 1.101 08-Apr-2006  perseant Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.100 11-Dec-2005  christos branches: 1.100.4; 1.100.6; 1.100.8; 1.100.10; 1.100.12;
merge ktrace-lwp.
 1.99 11-Nov-2005  yamt - ignore truncation for VCHR/VBLK/VFIFO as it used to be
before yamt-vop merge. PR/32049 from Atsushi Onoe.
- reject setattr which attempts to change size of VLNK/VSOCK.
 1.98 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.97 12-Sep-2005  christos branches: 1.97.2;
Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.96 29-May-2005  christos branches: 1.96.2;
- sprinkle const
- avoid shadow variables.
 1.95 23-Apr-2005  perseant Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing. Tested in both
directions, and everything appears to work happily, but ymmv.
 1.94 16-Apr-2005  perseant Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
 1.93 16-Apr-2005  perseant Use lfs_malloc() to manage the blkiov arrays that the cleaner functions use,
since the cleaner is likely to operate in a low-memory condition.
 1.92 14-Apr-2005  perseant Keep track of the highest block held by an LFS inode, so that we can
be assured that the last byte of a file is always allocated. Previously
a file extension could cause the filesystem to be flushed, writing an
inconsistent inode to disk. Although this condition would be corrected
the next time blocks were written to disk, an intervening crash would leave
the filesystem in an inconsistent state, leaving fsck_lfs to complain
of an inode "partially truncated".
 1.91 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.90 08-Mar-2005  perseant branches: 1.90.2;
Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.89 26-Feb-2005  perry nuke trailing whitespace
 1.88 15-Aug-2004  mycroft branches: 1.88.4; 1.88.6;
Don't write out the extra zero pages with PGO_SYNCIO. We start an asynchronous
write anyway, and they will not be freed until that write is finished.
 1.87 15-Aug-2004  mycroft Copy the current partial-truncate logic from FFS. In the process, fix a
potential overrun when truncating a fragment.
 1.86 15-Aug-2004  mycroft Minor simplification to some arithmetic.
 1.85 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.84 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
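
The distinction drawn above between IN_MODIFY and IN_UPDATE|IN_CHANGE
can be sketched as two predicates over the inode flags. The flag
values and the sk_ function names below are hypothetical, and the
real code may consult the flags differently; the sketch only mirrors
the behaviour the log message describes.

```c
#include <stdbool.h>

/* Hypothetical flag values; the real definitions may differ. */
#define SK_IN_CHANGE	0x1	/* inode change time needs updating */
#define SK_IN_UPDATE	0x2	/* modification time needs updating */
#define SK_IN_MODIFY	0x4	/* like the two above, but not synced lazily */

/* Any of the three flags means the in-core timestamps are out of date. */
static bool
sk_times_stale(unsigned flags)
{
	return (flags & (SK_IN_CHANGE | SK_IN_UPDATE | SK_IN_MODIFY)) != 0;
}

/* Only IN_CHANGE/IN_UPDATE make the periodic sync write the inode. */
static bool
sk_lazy_sync_writes(unsigned flags)
{
	return (flags & (SK_IN_CHANGE | SK_IN_UPDATE)) != 0;
}
```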
 1.83 30-Mar-2004  oster If we bail out due to an error, we need to 'unreserve' the space that
we'd reserved earlier.

Approved by: yamt
 1.82 25-Jan-2004  hannken branches: 1.82.2;
Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.81 30-Dec-2003  pk Replace the traditional buffer memory management -- based on fixed per buffer
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.

This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.

On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
 1.80 07-Nov-2003  yamt more assertion about file truncation to zero.
 1.79 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.78 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.77 29-Jun-2003  fvdl branches: 1.77.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.76 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.75 27-Apr-2003  yamt fix b_interlock lock/unlock mismatches.
 1.74 23-Apr-2003  perseant Make LFS work better (though still not "well") as an NFS-exported
filesystem (and other things that needed to be fixed before the tests
would complete), to wit:

* Include the fs ident in the filehandle; improve stale filehandle checks.

* Change definition of blksize() to use the on-dinode size instead of
the inode's i_size, so that fsck_lfs will work properly again.

* Use b_interlock in lfs_vtruncbuf.

* Postpone dirop reclamation until after the seglock has been released,
so that lfs_truncate is not called with the segment lock held.

* Don't loop in lfs_fsync(), just write everything and wait.

* Be more careful about the interlock/uobjlock in lfs_putpages: when we
lose this lock, we have to resynchronize dirtiness of pages in each
block.

* Be sure to always write indirect blocks and update metadata in
lfs_putpages; fixes a bug that caused blocks to be accounted to the
wrong segment.
 1.73 10-Apr-2003  simonb '#if 0' out a variable that is currently only used in other '#if 0'd out
code.
 1.72 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.71 20-Mar-2003  perseant Hold the segment lock during truncation to prevent indirect blocks from
being written by lfs_updatemeta while lfs_truncate is also writing them,
a bug pointed out by YAMAMOTO Takashi <yamt@netbsd.org>.
 1.70 08-Mar-2003  perseant Take away "#ifdef LFS_UBC".
 1.69 04-Mar-2003  perseant Don't force all truncations to be synchronous
 1.68 01-Mar-2003  perseant Be careful to always zero pages on truncation/fragment extension,
in the case where the filesystem block size is larger than PAGE_SIZE.
 1.67 28-Feb-2003  perseant Make lfs_truncate handle file extension correctly, in the LFS_UBC case.
 1.66 28-Feb-2003  perseant Quell a hasty panic in lfs_truncate: on-inode disk addresses can be
different between the beginning and end of the call.
 1.65 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.64 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
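
The "reserve enough resources to get by" item above amounts to trying
a non-sleeping allocation first and falling back to memory set aside
in advance. The userland sketch below shows that shape only; the
pool layout, slot sizes and all sk_ names are hypothetical, and the
allocator is passed in as a function pointer so a failure can be
simulated.

```c
#include <stddef.h>
#include <stdlib.h>

#define SK_RESERVE_SLOTS 4
#define SK_RESERVE_SIZE  512

/* Hypothetical reserve: a few fixed-size buffers set aside up front. */
struct sk_reserve {
	unsigned char slots[SK_RESERVE_SLOTS][SK_RESERVE_SIZE];
	int in_use[SK_RESERVE_SLOTS];
};

/* Stand-in for a "no wait" allocator that may return NULL. */
typedef void *(*sk_nowait_fn)(size_t);

/* Simulates an allocator under memory pressure. */
static void *
sk_always_fail(size_t size)
{
	(void)size;
	return NULL;
}

/* Try the non-sleeping allocator first; dip into the reserve on failure. */
static void *
sk_alloc(struct sk_reserve *rp, size_t size, sk_nowait_fn try_alloc)
{
	void *p = try_alloc(size);

	if (p != NULL)
		return p;
	if (size > SK_RESERVE_SIZE)
		return NULL;
	for (int i = 0; i < SK_RESERVE_SLOTS; i++) {
		if (!rp->in_use[i]) {
			rp->in_use[i] = 1;
			return rp->slots[i];
		}
	}
	return NULL;	/* reserve exhausted; a caller would have to wait */
}

/* Return a buffer to the reserve if it came from there, else free it. */
static void
sk_free(struct sk_reserve *rp, void *p)
{
	for (int i = 0; i < SK_RESERVE_SLOTS; i++) {
		if (p == (void *)rp->slots[i]) {
			rp->in_use[i] = 0;
			return;
		}
	}
	free(p);
}
```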
 1.63 25-Jan-2003  fvdl The oldblks and newblks arrays are used to store direct copies of
on-disk block pointers, so they should be int32_t. Error found
by Izumi Tsutsui.
 1.62 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
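
Keeping the on-disk format at 32 bits while the in-memory disk address
type grows to 64 bits implies a conversion at the boundary. The
following is a sketch under assumed names (sk_daddr_t,
sk_store_ondisk, sk_load_ondisk); the range check is an illustration
and is not described in the log message.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit in-memory disk address type. */
typedef int64_t sk_daddr_t;

/* Store into a 32-bit on-disk slot, refusing values that do not fit. */
static int
sk_store_ondisk(int32_t *slot, sk_daddr_t addr)
{
	if (addr > INT32_MAX || addr < INT32_MIN)
		return -1;	/* would be truncated */
	*slot = (int32_t)addr;
	return 0;
}

/* Widen a 32-bit on-disk value to the in-memory type (sign-extends). */
static sk_daddr_t
sk_load_ondisk(int32_t slot)
{
	return (sk_daddr_t)slot;
}
```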
 1.61 28-Dec-2002  yamt - in lfs_reserve, vref vnodes that we're locking so that cleaner doesn't
try to reclaim them.
(workaround for deadlock noted in the comment in lfs_reserveavail)
- in lfs_rename, mark vnodes which are being moved as well as directory vnodes.
 1.60 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.59 06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.58 02-Jul-2002  yamt fix printf format for DEBUG_LFS.
 1.57 14-May-2002  perseant branches: 1.57.2;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.56 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.55 08-Nov-2001  lukem add RCSID
 1.54 06-Nov-2001  simonb Remove some variables that are set but never used.
 1.53 15-Sep-2001  chs branches: 1.53.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.52 13-Jul-2001  perseant branches: 1.52.2;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length; it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
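
The serial number and filesystem identifier check mentioned above for
roll-forward can be sketched as a loop that stops at the first
partial segment that does not belong. The structure, field names and
the strictly increasing serial rule are assumptions for illustration;
the log message states only that both values are used to reject old
data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one partial segment found after a checkpoint. */
struct sk_pseg {
	uint64_t serial;	/* assumed to increase by one per write */
	uint32_t ident;		/* filesystem identifier */
};

/*
 * Count how many consecutive partial segments can be rolled forward:
 * each must carry this filesystem's identifier and the next expected
 * serial number, otherwise it is treated as stale and the scan stops.
 */
static size_t
sk_roll_forward(const struct sk_pseg *segs, size_t n,
    uint32_t fs_ident, uint64_t next_serial)
{
	size_t accepted = 0;

	for (size_t i = 0; i < n; i++) {
		if (segs[i].ident != fs_ident ||
		    segs[i].serial != next_serial)
			break;
		next_serial++;
		accepted++;
	}
	return accepted;
}
```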
 1.51 30-May-2001  mrg branches: 1.51.2; 1.51.4;
use _KERNEL_OPT
 1.50 03-Dec-2000  perseant branches: 1.50.2;
Get rid of some old unnecessary code that cleared B_NEEDCOMMIT from buffers in
lfs_writeseg (possibly after they had been freed).

If MALLOCLOG is defined, make lfs_newbuf and lfs_freebuf pass along the
caller's file and line to _malloc and _free.
 1.49 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.48 27-Nov-2000  perseant If LFS_DO_ROLLFORWARD is defined, roll forward from the older checkpoint
on mount, through the newer checkpoint and on through any newer
partial-segments that may have been written but not checkpointed because
of an intervening crash.

LFS_DO_ROLLFORWARD is not defined by default.
 1.47 21-Nov-2000  perseant More locked_queue_* and lfs_avail accounting fixes from Jesse Off
<joff@gci-net.com>. Remove a specious btodb() in lfs_fragextend, and
count blocks shrunk or removed by VOP_TRUNCATE in lfs_avail.
 1.46 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
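
The counting rule stated above (skip malloced buffers, use b_bufsize
rather than b_bcount) can be sketched as a recount over a buffer
list. The sk_ structure and flag values are hypothetical; only the
b_bcount and b_bufsize field names come from the log message.

```c
#include <stddef.h>

/* Hypothetical flag values for the sketch. */
#define SK_B_LOCKED	0x1	/* held on the locked queue */
#define SK_B_MALLOCED	0x2	/* data came from malloc, not the cache */

/* Hypothetical buffer header with just the fields the rule needs. */
struct sk_buf {
	unsigned flags;
	long b_bcount;		/* valid bytes in the buffer */
	long b_bufsize;		/* bytes actually held by the buffer */
};

/* Recount locked buffer headers and bytes the way the entry describes. */
static void
sk_countlocked(const struct sk_buf *bufs, size_t n, long *count, long *bytes)
{
	*count = 0;
	*bytes = 0;
	for (size_t i = 0; i < n; i++) {
		if (!(bufs[i].flags & SK_B_LOCKED))
			continue;
		if (bufs[i].flags & SK_B_MALLOCED)
			continue;	/* never count malloced buffers */
		(*count)++;
		*bytes += bufs[i].b_bufsize;	/* not b_bcount */
	}
}
```

A DEBUG build could compare the result of such a recount against the
running counters and complain on a mismatch.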
 1.45 14-Oct-2000  perseant In lfs_truncate, don't overcount the real blocks removed from the inode,
when deallocating a fragment that has not made it to disk yet.

Also, during dirops, give the directory vnode an extra reference in
SET_DIROP, to ensure its continued existence during SET_ENDOP, preventing
a possible NULL-dereference there.

These two changes should close PR #11064.
 1.44 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments through N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this is unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
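The reservation scheme described for lfs_ravail in 1.44 can be sketched roughly as below. This is an illustration under stated assumptions, not the kernel's lfs_reserve: struct sketch_fs, sketch_try_reserve, and sketch_commit are hypothetical, and the sleep-until-the-cleaner-frees-space step is reduced to a failure return.

```c
/* Hypothetical, simplified per-filesystem counters. */
struct sketch_fs {
	long avail;   /* blocks available for new writes */
	long ravail;  /* blocks reserved by writes still in progress */
};

/*
 * Try to reserve nblocks. Returns 1 on success. Returns 0 when the
 * request cannot be met immediately; per the log message the real code
 * would unlock the inode and sleep until the cleaner has made space.
 */
static int
sketch_try_reserve(struct sketch_fs *fs, long nblocks)
{
	if (nblocks < 0 || fs->avail - fs->ravail < nblocks)
		return 0;
	fs->ravail += nblocks;
	return 1;
}

/* After allocation: drop the reservation and charge the blocks actually used. */
static void
sketch_commit(struct sketch_fs *fs, long reserved, long used)
{
	fs->ravail -= reserved;
	fs->avail -= used;
}
```

Reserving the maximum a write could need, then releasing the unused part, keeps enough space back for the cleaner to keep working.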
 1.43 09-Sep-2000  perseant Make sure to unmark B_DELWRI on blocks freed due to truncation to a non-zero
file length. Should fix PR #s 10551 and 10831.
 1.42 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero at fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.41 04-Jul-2000  perseant Fix errors observed while trying to fill the filesystem with yesterday's
fixes:

- Write copies of bfree and avail in the CLEANERINFO block, so the
cleaner doesn't have to guess which superblock has the current
information (if indeed any do).

- Tighten up accounting of lfs_avail (more needs to be done).

- When cleansing indirect blocks of UNWRITTEN, make sure not to mark
them clean, since they'll need to be rewritten later.
 1.40 03-Jul-2000  perseant Allow the number of free segments reserved for the cleaner to be
parametrized in the filesystem, defaulting to MIN_FREE_SEGS = 2 but set
to something more reasonable at newfs_lfs time.

Note the number of blocks that have been scheduled for writing but which
are not yet on disk in an inode extension, i_lfs_effnblks. Move
i_ffs_effnlink out of the ffs extension and onto the main inode, since
it's used all over the shared code and the lfs extension would clobber
it.

At inode write time, indirect blocks and inode-held blocks of inodes
that have i_lfs_effnblks != i_ffs_blocks are cleansed of UNWRITTEN disk
addresses, so that these never make it to disk.
 1.39 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.38 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.37 31-May-2000  perseant branches: 1.37.2;
update for IN_ACCESSED changes
 1.36 13-May-2000  perseant branches: 1.36.2;
Change the semantics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be waited for, or not, accordingly.

Closes PR#8996.
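The boolean-to-flags change in 1.36 can be illustrated with a small sketch. The flag values, the fs_sync_dirops parameter, and the policy in sketch_should_wait are assumptions made for illustration; the log message names only UPDATE_WAIT and UPDATE_DIROP and leaves the decision to each file system's update routine.

```c
/* Assumed values; only the flag names appear in the log message. */
#define SKETCH_UPDATE_WAIT  0x01  /* caller asks for a synchronous update */
#define SKETCH_UPDATE_DIROP 0x02  /* hint: the call comes from a dirop routine */

/*
 * One possible per-filesystem policy: a dirop-originated update is
 * waited for only if this file system does directory operations
 * synchronously; any other update honors UPDATE_WAIT.
 */
static int
sketch_should_wait(int flags, int fs_sync_dirops)
{
	if (flags & SKETCH_UPDATE_DIROP)
		return fs_sync_dirops != 0;
	return (flags & SKETCH_UPDATE_WAIT) != 0;
}
```

With a hint flag, callers no longer need to wrap each call in a file-system-specific test such as "if(!DOINGSOFTDEP(vp))".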
 1.35 05-May-2000  perseant Change the way LFS does block accounting, from trying to infer from the
buffer cache flags, to marking the inode and/or indirect blocks with a
special disk address UNWRITTEN==-2 when a block is accounted for. (This
address is never written to disk, but only used in-core. This is essentially
the same method of block accounting as on the UBC branch, where the buffer
headers don't exist.) Make sure that truncation is handled properly,
especially in the case of holey files.

Fixes PR#9994.
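The in-core UNWRITTEN marker from 1.35 can be sketched as below. This is an illustration only: the function names are hypothetical, the 32-bit address type is an assumption, and replacing UNWRITTEN with 0 before a block goes to disk is an assumed choice (the log says only that the address is never written to disk).

```c
#include <stddef.h>
#include <stdint.h>

/* In-core-only marker: block is accounted for but has no disk address yet. */
#define SKETCH_UNWRITTEN ((int32_t)-2)

/* Count block pointers that are accounted for; 0 is treated as a hole. */
static size_t
sketch_count_accounted(const int32_t *addrs, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (addrs[i] > 0 || addrs[i] == SKETCH_UNWRITTEN)
			count++;
	return count;
}

/* Replace UNWRITTEN with 0 in a copy bound for disk; return how many were replaced. */
static size_t
sketch_cleanse(int32_t *addrs, size_t n)
{
	size_t replaced = 0;

	for (size_t i = 0; i < n; i++) {
		if (addrs[i] == SKETCH_UNWRITTEN) {
			addrs[i] = 0;
			replaced++;
		}
	}
	return replaced;
}
```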
 1.34 24-Apr-2000  perseant get rid of unused variable aflags
 1.33 23-Apr-2000  perseant Fix problems outlined in PR#9926:
- lfs_truncate extends the file if called with length > i_ffs_size;
- lfs_truncate errors out if called with length < 0;
- lfs_balloc block accounting corrected for the case of blocks read
into the cache before they exist on disk;
- mp->mnt_stat.f_iosize is initialized in lfs_mountfs.
 1.32 30-Mar-2000  augustss Remove register declarations.
 1.31 12-Mar-2000  bouyer lfs_truncate: handle symlinks with length > maxsymlink_len as regular files.
For symlinks > 60 chars we were bzero'ing part of (struct inode) past the
actual inode struct, corrupting memory following the current (struct inode)
resulting in a 'panic: pool_get(lfsinopl): free list modified' later.
This could also be the cause of random panics. With this fix LFS seems to be
useable for me now.
 1.30 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.29 16-Jan-2000  perseant Fix a problem in my changes of Dec 14th, that prevents removed vnodes
from being inactivated under some conditions. Removed vnodes are now
inactivated when the VDIROP flag is cleared, and to prevent block
accounting problems this clearing has been postponed until
lfs_segunlock.
 1.28 23-Nov-1999  fvdl Be more careful to block bio interrupts for some data structures. There
were at least a few missed cases where vp->v_{clean,dirty}blkhd were
unprotected since the softdep/trickle sync merge.
 1.27 03-Sep-1999  perseant branches: 1.27.2; 1.27.8;
Make changes that will allow an LFS filesystem to be used as the root
filesystem. In particular,

- Fix mknod deadlock, described in PR 8172.
- Enable lfs_mountroot.
- Make lfs_writevnodes treat filesystems mounted on lfs device nodes properly,
by flushing that device rather than trying to add blocks to the device inode.

This, in combination with lfs boot blocks, will allow operation of an all-lfs
system.
 1.26 15-Jun-1999  perseant Minor changes to the segment live bytes calculation. In particular, fixed
a bug in fragment extension that could run the count negative. Also, don't
overcount for inodes, and don't count segment summaries. Thus, for empty
segments the live bytes count should now be exactly zero.
 1.25 01-Jun-1999  perseant Fixed lfs_update (and related functions) so that calls from lfs_fsync
will DTRT with vnodes marked VDIROP. In particular, the message
"flushing VDIROP" will no longer appear, and the filesystem will remain
stable in the event of a crash.

This was particularly a problem with NFS-exported LFSes, since fsync
was called on every file close.
 1.24 12-Apr-1999  perseant Fix block counting during file truncation, if not truncating to zero.
 1.23 12-Apr-1999  perseant Make sure that the wakeup occurs for vnodes that lfs_update might be sleeping
on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which have dirty
buffers), by marking them with the appropriate flag if dirtybuffers were added
while the write was in progress.
 1.22 01-Apr-1999  perseant branches: 1.22.2;
Fix buffer handling problems in lfs_vinvalbuf
 1.21 29-Mar-1999  perseant lfs_truncate calls vinvalbuf to invalidate all currently-held buffers, which
in turn forces a flush of the vnode, whether or not it is involved in a dirop.
(This can happen during a remove or rmdir, when the directory is shrunk.)
Because of the nature of dirops, however, flushing a vnode involved in a dirop
is disallowed (and was marked with a panic). This patch has lfs_truncate
call a specialized vinvalbuf that only invalidates buffers following the new
end-of-file, and thus does not require a flush. Also the panic is demoted,
in case I missed any other path to lfs_vflush.
 1.20 25-Mar-1999  perseant clean up unused/required #ifdefs
 1.19 24-Mar-1999  mrg completely remove Mach VM support. all that is left is all the
header files as UVM still uses (most of) these.
 1.18 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.17 05-Mar-1999  mycroft Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.16 05-Mar-1999  mycroft Permit the access and modify time pointers passed to VOP_UPDATE to be null,
meaning the current time.
 1.15 10-Feb-1999  bouyer Make sure a buffer obtained from bread() is always brelse()'d in case of
error. Closes PR kern/1448 from Wolfgang Solfrank.
 1.14 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.13 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.12 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.11 07-Feb-1998  chs add UVM stuff.
 1.10 04-Jul-1997  drochner Don't cast 64bit (off_t) file sizes to vm_offset_t (32bit on many
architectures), truncate them intelligently instead.
The truncation is done centralized in vnode_pager.c.
This prevents wrap-over effects when parts of large (>2^32 byte) files
are mmapped.
Don't allow mmap above the numerical range of vm_offset_t.
This is considered a temporary solution until the vm system handles the
object sizes/offsets more cleanly.
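The wrap-over problem that 1.10 addresses can be shown with a small sketch. Clamping to the largest representable value is one reading of "truncate them intelligently" and is an assumption here; sketch_clamp_size is a hypothetical helper, with uint32_t standing in for a 32-bit vm_offset_t and int64_t for off_t.

```c
#include <stdint.h>

/*
 * Reduce a 64-bit file size to a 32-bit offset type without wrapping:
 * out-of-range values saturate instead of losing their high bits.
 */
static uint32_t
sketch_clamp_size(int64_t size)
{
	if (size < 0)
		return 0;
	if (size > (int64_t)UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)size;
}
```

By contrast, a plain cast of a size just over 2^32 keeps only the low 32 bits, so a file of 2^32 + 4096 bytes would appear to be 4096 bytes long.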
 1.9 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.8 12-Oct-1996  christos revert previous kprintf changes
 1.7 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.6 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.5 11-May-1996  mycroft Change VOP_UPDATE() semantics:
* Make 2nd and 3rd args timespecs, not timevals.
* Consistently pass a Boolean as the 4th arg (except in LFS).
Also, fix ffs_update() and lfs_update() to actually change the nsec fields.
 1.4 09-Feb-1996  christos lfs prototypes
 1.3 15-Jun-1995  cgd compensate for timeval/timespec/stat structure changes.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.22.2.5 20-Jan-2000  he Pull up revisions 1.29-1.30 (requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.22.2.4 15-Jan-2000  he Pull up revision 1.27 (requested by perseant):
Address problems related to using an LFS filesystem as the root
filesystem, including mknod hangs. Fixes PR#8172 and PR#9072.
 1.22.2.3 17-Dec-1999  he Pull up revision 1.25 (requested by perseant):
Avoid flushing vnodes involved in a dirop, making lfs' promise
of "no fsck needed, even in the event of a crash" closer to
reality.
 1.22.2.2 25-Jun-1999  perry pullup 1.25->1.26 (perseant)
 1.22.2.1 13-Apr-1999  perseant branches: 1.22.2.1.2; 1.22.2.1.4;
Pull-up of changes made to the trunk on Sunday [1.22->1.24], to wit:

Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.

Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).

Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed, then
immediately reloaded, their dinodes were located in an inode block which
was not on disk at the advertized location, nor in the cache (although it
would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.

Better checking for held inode locks in lfs_fastvget, for a number of
error conditions. Also change the default setting of lfs_clean_vnhead to
0, which seems to make the locking problems go away (although this is
difficult to test as I can't reliably reproduce them).

Make sure that the wakeup occurs for vnodes that lfs_update might be
sleeping on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which
have dirty buffers), by marking them with the appropriate flag if
dirtybuffers were added while the write was in progress.

Fix block counting during file truncation, if not truncating to zero.

Disallow threshold-initiated cache flush when dirops are active. Also,
make SET_ENDOP use lfs_check instead of inlining most of it.

Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).

Check the superblock version field, and refuse to mount the filesystem if
the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and
boundaries, etc., which would not affect existing fields in the
superblock, but which would drastically affect the filesystem, to be
smoothly integrated at a later date.
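The superblock version check described at the end of 1.22.2.1 amounts to a guard like the following sketch. SKETCH_LFS_MAX_VERSION, sketch_check_version, and the choice of EINVAL as the error are assumptions for illustration; the log says only that the mount is refused when the version is higher than the kernel knows about.

```c
#include <errno.h>
#include <stdint.h>

/* Highest on-disk format version this (hypothetical) code understands. */
#define SKETCH_LFS_MAX_VERSION 1u

/* Return 0 if the file system may be mounted, or an errno value if not. */
static int
sketch_check_version(uint32_t sb_version)
{
	if (sb_version > SKETCH_LFS_MAX_VERSION)
		return EINVAL;  /* newer format than we know: refuse to mount */
	return 0;
}
```

Refusing unknown versions up front lets later format changes be introduced without an older kernel misreading the new layout.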
 1.22.2.1.4.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.22.2.1.2.3 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.22.2.1.2.2 11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.22.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.27.8.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.27.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.27.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.27.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.27.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.36.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.37.2.3 23-Mar-2001  he Pull up revisions 1.46-1.47 (via patch, requested by perseant):
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
This one got left out when the rest was pulled up. Sorry.
 1.37.2.2 01-Nov-2000  tv Pullup 1.45 [perseant, toshii]:
In lfs_truncate, don't overcount the real blocks removed from the inode,
when deallocating a fragment that has not made it to disk yet.

Also, during dirops, give the directory vnode an extra reference in
SET_DIROP, to ensure its continued existence during SET_ENDOP, preventing
a possible NULL-dereference there.

These two changes should close PR #11064.
 1.37.2.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.50.2.9 29-Dec-2002  thorpej Sync with HEAD.
 1.50.2.8 18-Oct-2002  nathanw Catch up to -current.
 1.50.2.7 01-Aug-2002  nathanw Catch up to -current.
 1.50.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.50.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.50.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.50.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.50.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.50.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.51.4.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.51.4.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.51.4.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.51.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.51.4.1 03-Aug-2001  lukem update to -current
 1.51.2.3 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
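The fragment-based addressing unit introduced in 1.51.2.3 can be sketched as a conversion from fsb to disk sectors. struct sketch_lfs, its field names, and sketch_fsbtodb are hypothetical; the only behavior taken from the log is that addresses are in fragment units and that v1 file systems keep 1 fsb == 1 sector regardless of fragment size.

```c
#include <stdint.h>

/* Hypothetical, simplified in-core superblock fields. */
struct sketch_lfs {
	uint32_t version;      /* 1 = v1 layout, otherwise fragment-addressed */
	unsigned fsbtodb_shift; /* log2(fragment size / sector size) */
};

/* Convert an address in fsb units to a disk sector number. */
static int64_t
sketch_fsbtodb(const struct sketch_lfs *fs, int64_t fsb)
{
	/* v1 compatibility: one fsb is one sector whatever the fragment size. */
	unsigned shift = (fs->version == 1) ? 0 : fs->fsbtodb_shift;

	return fsb * ((int64_t)1 << shift);
}
```

For example, with 4096-byte fragments on 512-byte sectors the shift is 3, so fsb 10 maps to sector 80.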
 1.51.2.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.51.2.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.52.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.53.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.57.2.1 15-Jul-2002  gehenna catch up with -current.
 1.77.2.9 11-Dec-2005  christos Sync with head.
 1.77.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.77.2.7 08-Mar-2005  skrll Sync with HEAD.
 1.77.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.77.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.77.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.77.2.3 25-Aug-2004  skrll Sync with HEAD.
 1.77.2.2 03-Aug-2004  skrll Sync with HEAD
 1.77.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.82.2.1 09-Apr-2004  jmc Pullup rev 1.83 (requested by oster in ticket #112)

If we bail out due to an error, we need to 'unreserve' the space that
we'd reserved earlier.
 1.88.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.88.4.1 29-Apr-2005  kent sync with -current
 1.90.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.104
sys/ufs/lfs/lfs_vfsops.c: revision 1.206
sys/ufs/lfs/lfs_vnops.c: revision 1.170
sys/ufs/lfs/lfs_extern.h: revision 1.80
sys/ufs/lfs/lfs_segment.c: revision 1.176
sys/ufs/lfs/lfs_inode.c: revision 1.103 via patch
sys/ufs/lfs/lfs_alloc.c: revision 1.90
Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().
A couple of locking fixes are also included.
 1.90.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_inode.c: revision 1.102
Avoid a possible sign overflow condition in lfs_truncate, which would result
in a buffer overflow (underflow). Coverity CID 1521.
 1.90.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.200
sys/ufs/lfs/lfs_vnops.c: revision 1.164
sys/ufs/lfs/lfs_inode.c: revision 1.101
sys/ufs/lfs/lfs_extern.h: revision 1.78
sys/ufs/lfs/lfs.h: revision 1.100
Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.90.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.90.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.96.2.6 27-Feb-2008  yamt sync with head.
 1.96.2.5 21-Jan-2008  yamt sync with head
 1.96.2.4 27-Oct-2007  yamt sync with head.
 1.96.2.3 03-Sep-2007  yamt sync with head.
 1.96.2.2 30-Dec-2006  yamt sync with head.
 1.96.2.1 21-Jun-2006  yamt sync with head.
 1.97.2.2 29-Oct-2005  yamt use lfs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.97.2.1 20-Oct-2005  yamt adapt ufs.
 1.100.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.100.10.4 11-May-2006  elad sync with head
 1.100.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.100.10.2 19-Apr-2006  elad sync with head.
 1.100.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.100.8.2 24-May-2006  yamt sync with head.
 1.100.8.1 11-Apr-2006  yamt sync with head
 1.100.6.2 01-Jun-2006  kardel Sync with head.
 1.100.6.1 22-Apr-2006  simonb Sync with head.
 1.100.4.1 09-Sep-2006  rpaulo sync with head
 1.105.10.1 22-Oct-2006  yamt sync with head
 1.105.8.1 18-Nov-2006  ad Sync with head.
 1.106.4.3 17-May-2007  yamt sync with head.
 1.106.4.2 07-May-2007  yamt sync with head.
 1.106.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.107.4.1 11-Jul-2007  mjf Sync with head.
 1.107.2.7 28-Aug-2007  yamt make this compilable with DEBUG.
 1.107.2.6 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.107.2.5 15-Jul-2007  ad Sync with head.
 1.107.2.4 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.107.2.3 09-Jun-2007  ad Sync with head.
 1.107.2.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.107.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.111.10.1 14-Oct-2007  yamt sync with head.
 1.111.8.3 23-Mar-2008  matt sync with HEAD
 1.111.8.2 09-Jan-2008  matt sync with HEAD
 1.111.8.1 06-Nov-2007  matt sync with HEAD
 1.111.6.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.111.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.114.6.5 26-Dec-2007  ad Sync with head.
 1.114.6.4 19-Dec-2007  ad Use a global lfs_lock.
 1.114.6.3 19-Dec-2007  ad Fix some more problems w/lfs on this branch.
 1.114.6.2 19-Dec-2007  ad Get lfs mostly working.
 1.114.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.114.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.115.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.118.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.118.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.119.4.2 11-Mar-2010  yamt sync with head
 1.119.4.1 16-May-2008  yamt sync with head.
 1.119.2.1 18-May-2008  yamt sync with head.
 1.120.18.1 21-Apr-2010  matt sync to netbsd-5
 1.120.10.2 25-Jan-2012  riz Pull up following revision(s) (requested by bouyer in ticket #1702):
sys/ufs/lfs/lfs_inode.c: revision 1.126
sys/ufs/ffs/ffs_inode.c: revision 1.108
If ufs_balloc_range() fails, make sure to call ?fs_truncate() to
reset v_writesize to the right value.
If v_writesize is left larger than the allocated blocks, we may have
the same issue as the one described in
http://mail-index.netbsd.org/tech-kern/2010/02/02/msg007156.html
 1.120.10.1 22-Feb-2010  snj Pull up following revision(s) (requested by bouyer in ticket #1302):
sys/ufs/ext2fs/ext2fs_inode.c: revision 1.71
sys/ufs/ffs/ffs_inode.c: revision 1.104
sys/ufs/lfs/lfs_inode.c: revision 1.121
sys/ufs/ufs/ufs_inode.c: revision 1.79
- ufs_balloc_range(): on error, only PG_RELEASED the pages that were
allocated to extend the file to the new size. Releasing all pages
may release pages that contain previously-written data not yet flushed
to disk. Should fix PR kern/35704
- {ffs,lfs,ext2fs}_truncate(): Even if the inode's size is the same as
the new length, call uvm_vnp_setsize(). *_truncate() may have been
called by *_write() in the error path (e.g. block allocation failure
because of quota or file system full), and at this point v_writesize
has been set to the desired size of the file and not reverted to the
old size. Not adjusting v_writesize to the real size causes
genfs_do_io() to write to disk past the real end of the file.
 1.121.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.122.8.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.122.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.125.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.125.2.2 23-Jan-2013  yamt sync with head
 1.125.2.1 17-Apr-2012  yamt sync with head
 1.126.8.4 03-Dec-2017  jdolecek update from HEAD
 1.126.8.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.126.8.2 23-Jun-2013  tls resync from head
 1.126.8.1 25-Feb-2013  tls resync with head
 1.132.2.2 18-May-2014  rmind sync with head
 1.132.2.1 28-Aug-2013  rmind sync with head
 1.136.6.2 28-Aug-2017  skrll Sync with HEAD
 1.136.6.1 22-Sep-2015  skrll Sync with HEAD
 1.147.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.147.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.147.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.155.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some questions about the code in an XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.157.12.1 29-Feb-2020  ad Sync with head.
 1.157.10.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
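
The "try-enter" idea in (b) can be sketched in userland C as follows. This is only an illustration of the pattern, not the kernel code: struct toy_fs, toy_writer_tryenter and toy_writer_leave are hypothetical names, and a pthread mutex stands in for lfs_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-filesystem state; not the real struct lfs. */
struct toy_fs {
	pthread_mutex_t lock;	/* plays the role of lfs_lock */
	int dirops;		/* directory operations in flight */
	int writer;		/* writers that currently exclude dirops */
};

static void
toy_fs_init(struct toy_fs *fs)
{
	pthread_mutex_init(&fs->lock, NULL);
	fs->dirops = 0;
	fs->writer = 0;
}

/*
 * Test and claim inside one critical section, so no new dirop can
 * start between the "dirops == 0" check and the writer increment.
 * Returns false instead of sleeping when dirops are pending.
 */
static bool
toy_writer_tryenter(struct toy_fs *fs)
{
	bool ok;

	pthread_mutex_lock(&fs->lock);
	ok = (fs->dirops == 0);
	if (ok)
		fs->writer++;
	pthread_mutex_unlock(&fs->lock);
	return ok;
}

static void
toy_writer_leave(struct toy_fs *fs)
{
	pthread_mutex_lock(&fs->lock);
	assert(fs->writer > 0);
	fs->writer--;
	pthread_mutex_unlock(&fs->lock);
}
```

A caller that already holds the segment lock would use the try variant and skip the opportunistic work on failure, rather than block in the reverse lock order.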

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.
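
As a rough illustration of the attribute combination discussed above, the following sketch declares a record with a 64-bit field that is only guaranteed 4-byte alignment. The struct toy_dirent64 and its fields are hypothetical, not the real lfs64 layout, and the attribute spelling assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record; the real lfs64 directory entry differs. */
struct toy_dirent64 {
	uint64_t ino;		/* may sit on a 4-byte boundary */
	uint16_t reclen;
	uint8_t  type;
	uint8_t  namlen;
} __attribute__((__packed__, __aligned__(4)));

/* Compile-time checks in the spirit of CTASSERT on on-disk sizes. */
_Static_assert(_Alignof(struct toy_dirent64) == 4,
    "record must only require 4-byte alignment");
_Static_assert(sizeof(struct toy_dirent64) == 12,
    "record must not be padded out to 16 bytes");

/*
 * Because the type itself is declared 4-byte aligned, the compiler
 * may not assume 8-byte alignment when loading the 64-bit field.
 */
static uint64_t
toy_get_ino(const struct toy_dirent64 *d)
{
	return d->ino;
}
```

Without the attributes, a typical 64-bit ABI would give the struct 8-byte alignment and a size of 16, which matches neither the packed on-disk form nor the alignment actually available in a directory block.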

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.157.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.159.4.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.28 01-Nov-2025  perseant Create a new LFS inode flag, IN_DEAD, to indicate that a file's last
reference, other than those that come with VU_DIROP or IN_CLEANING and
the one the caller holds, has been dropped. Check and apply this flag
in lfs_orphan(), and call lfs_orphan() on close if the link count is
zero. Change the signature of lfs_orphan to facilitate.

Make test t_vfsops:lfs_tfhremove expect success.

Closes PR kern/43745.
 1.27 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.26 23-Mar-2022  andvar fix few typos for word "previous(ly)" in comments.
 1.25 23-Feb-2020  riastradh Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.
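
A minimal userland sketch of marker-node iteration, assuming the TAILQ macros from <sys/queue.h>. The names struct node, walk_with_marker and drop_following are hypothetical and this is not the lfs_dchainhd code itself; the callback stands in for work done while the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

struct node {
	TAILQ_ENTRY(node) link;
	bool is_marker;		/* true only for the iterator's placeholder */
	int value;
};
TAILQ_HEAD(node_list, node);

/*
 * Visit every real node.  The callback may unlink any node except the
 * marker, including the one that used to follow the current node; the
 * walk resumes from whatever follows the marker afterwards.
 */
static int
walk_with_marker(struct node_list *list,
    void (*visit)(struct node_list *, struct node *))
{
	struct node marker = { .is_marker = true };
	struct node *n = TAILQ_FIRST(list);
	int visited = 0;

	while (n != NULL) {
		if (n->is_marker) {	/* skip other walkers' markers */
			n = TAILQ_NEXT(n, link);
			continue;
		}
		TAILQ_INSERT_AFTER(list, n, &marker, link);
		visit(list, n);		/* a kernel would drop the lock here */
		visited++;
		n = TAILQ_NEXT(&marker, link);
		TAILQ_REMOVE(list, &marker, link);
	}
	return visited;
}

/* Example callback: unlink the real node after n, as a racing thread might. */
static void
drop_following(struct node_list *list, struct node *n)
{
	struct node *victim = TAILQ_NEXT(n, link);

	while (victim != NULL && victim->is_marker)
		victim = TAILQ_NEXT(victim, link);
	if (victim != NULL)
		TAILQ_REMOVE(list, victim, link);
}
```

Saving a plain "next" pointer before the callback would leave the walk holding an unlinked node; the marker avoids that because it stays linked for the whole visit.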
 1.24 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.23 10-Jun-2017  maya branches: 1.23.6; 1.23.10; 1.23.12;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.22 08-Jun-2017  chs move some buffer cache internals declarations from buf.h to vfs_bio.c.
this is needed to avoid name conflicts with ZFS and also
makes it clearer that other code shouldn't be messing with these.
remove the LFS debug code that poked around in bufqueues and
remove the BQ_EMPTY bufqueue since nothing uses it anymore.
provide a function to let LFS and wapbl read the value of nbuf for now.
 1.21 05-Jun-2017  maya Add an XXX about the missing flags so it's not buried in a commit
message.

now the XXX count for LFS is 260
 1.20 05-Jun-2017  maya Move definition of IN_ALLMOD near the flag it's a mask for.

Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
 1.19 06-Apr-2017  maya branches: 1.19.6;
don't guard lfs_sbactive or lfs_log with splbio, lfs_lock is plenty.
 1.18 06-Apr-2017  maya Provide a LFS_ENTER_LOG (__nothing) in the !DEBUG case.
so I can drop lots of #ifdef DEBUG around this macro. NFCI
 1.17 06-Apr-2017  maya Drop single use macro LFS_BCLEAN_LOG with an inlined implementation.

The LFS_ENTER_LOG macro currently grabs lfs_lock, so I'd like to have just one
name for it.
 1.16 20-Jun-2016  dholland branches: 1.16.2; 1.16.4;
u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.15 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.14 20-Jun-2016  dholland ufs/inode.h -r1.67 is effectively merged into here too.
 1.13 20-Jun-2016  dholland Merge ufs/inode.h 1.66: remove i_hash from struct inode. This is the
hash table entry link from the old per-fs vnode cache and we don't
need it any more.
 1.12 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.11 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.10 19-Aug-2015  dholland Part two of dinodes; use the same union everywhere.
(previously the ufs-derived code had things set up slightly different)

Remove a bunch of associated mess.
 1.9 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.8 02-Aug-2015  dholland Make i_eff_nblks in the in-memory inode 64 bits wide.
 1.7 26-May-2014  ryoon branches: 1.7.4;
Close comments
 1.6 26-May-2014  dholland remove ffs-only IN_SPACECOUNTED
 1.5 18-Jun-2013  dholland branches: 1.5.2; 1.5.8; 1.5.10;
Tuck away a bunch of symbols that don't need to be public.
 1.4 09-Jun-2013  dholland Move struct lfs_inode_ext to lfs_inode.h; it doesn't need to be public.
 1.3 08-Jun-2013  dholland G/C another unneeded union
 1.2 08-Jun-2013  dholland Remove stale union and accessor macros.
 1.1 08-Jun-2013  dholland Split the definitions suitable for userland out of ulfs_inode.h into
lfs_inode.h. Since fsck_lfs, newfs_lfs, and lfs_cleanerd want to reuse
the inode structure for their own internal use, and some of them share
parts of the kernel code as well, the best way forward is to provide a
relatively sanitized header that doesn't bring in stray material.

Shuffle a few other definitions around so that lfs_inode.h depends
only on lfs.h.

Install lfs_inode.h into /usr/include.
 1.5.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.10.1 18-Jun-2013  yamt file lfs_inode.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.5.8.1 10-Aug-2014  tls Rebase.
 1.5.2.4 03-Dec-2017  jdolecek update from HEAD
 1.5.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.2.2 23-Jun-2013  tls resync from head
 1.5.2.1 18-Jun-2013  tls file lfs_inode.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.7.4.3 28-Aug-2017  skrll Sync with HEAD
 1.7.4.2 09-Jul-2016  skrll Sync with HEAD
 1.7.4.1 22-Sep-2015  skrll Sync with HEAD
 1.16.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.16.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.19.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some questions about the code in an XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.23.12.1 29-Feb-2020  ad Sync with head.
 1.23.10.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
      struct foo64 {
              ...
      } __aligned(4) __packed;
      union foo {
              struct foo64 f64;
              ...
      };
   to
      struct foo64 {
              ...
      };
      union foo {
              struct foo64 f64 __aligned(8);
              ...
      } __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.23.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.20 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.19 01-Sep-2015  dholland branches: 1.19.10;
Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.18 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.17 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.16 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.15 08-Jun-2013  dholland branches: 1.15.10;
Tidy up the LFS userland build hacks.
Don't use -I${NETBSDSRCDIR}/sys; don't include files other than the
exported LFS headers, which are lfs.h, lfs_inode.h, and (for now)
lfs_extern.h.
 1.14 06-Jun-2013  dholland Cleanups and hacks to make lfs userland stuff build:
- lfs_cksum.c doesn't actually need ulfs_inode.h any more.
- neither does lfs_itimes.c.
- add hacks to fsck_lfs to make it compile.
- add hacks to newfs_lfs to make it compile.
- fix warning in ulfs_quota.c when quotas are fully disabled
(as I guess is happening with the rumpity version)

XXX: This commit adds -I${NETBSDSRCDIR}/sys to the Makefiles for
XXX: fsck_lfs, newfs_lfs, and lfs_cleanerd. This needs to be cleaned
XXX: up ASAP; but I consider this less problematic in the short term
XXX: than spewing ulfs_*.h into /usr/include.
 1.13 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.12 28-Apr-2008  martin branches: 1.12.34; 1.12.44;
Remove clause 3 and 4 from TNF licenses
 1.11 02-Jan-2008  ad branches: 1.11.6; 1.11.8; 1.11.10;
Merge vmlocking2 to head.
 1.10 23-Jun-2006  yamt branches: 1.10.14; 1.10.30; 1.10.36; 1.10.40; 1.10.44;
fix a simonb-timecounters regression.
the precision of getnanotime() is not suitable for file timestamps.
esp. when it's nfs-exported.

- introduce vfs_timestamp().
(the name is from freebsd. currently merely a wrapper of nanotime())
- for ufs-like filesystems, use it rather than getnanotime().

XXX check other filesystems.
 1.9 07-Jun-2006  kardel branches: 1.9.2; 1.9.4;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.8 15-May-2006  christos branches: 1.8.2;
we need <sys/kauth.h> for the kernel.
 1.7 15-May-2006  christos Don't include <sys/kauth.h>; breaks userland (newfs_lfs)
 1.6 14-May-2006  elad integrate kauth.
 1.5 19-Mar-2006  rtr don't bother checking whether ts == NULL before assigning, since we
know that it is.
solves coverity 2725 / run 6
 1.4 11-Dec-2005  christos branches: 1.4.4; 1.4.6; 1.4.8; 1.4.10; 1.4.12;
merge ktrace-lwp.
 1.3 30-Oct-2005  simonb branches: 1.3.2;
We don't need <sys/systm.h> here.
 1.2 13-Sep-2005  christos branches: 1.2.2;
redefine panic if we are a user program.
 1.1 13-Sep-2005  christos split out lfs_itimes(). It is used in fsck_lfs.
 1.2.2.1 02-Nov-2005  yamt sync with head.
 1.3.2.2 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.2.1 30-Oct-2005  skrll file lfs_itimes.c was added on branch ktrace-lwp on 2005-11-10 14:12:32 +0000
 1.4.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.4.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.4.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.4.10.1 19-Apr-2006  elad sync with head.
 1.4.8.3 26-Jun-2006  yamt sync with head.
 1.4.8.2 24-May-2006  yamt sync with head.
 1.4.8.1 01-Apr-2006  yamt sync with head.
 1.4.6.4 01-Jun-2006  kardel Sync with head.
 1.4.6.3 22-Apr-2006  simonb Sync with head.
 1.4.6.2 05-Feb-2006  simonb In the *itimes functions, just call getnanotime() at the start of
the function and use the result if needed, rather than the previous
conditional calls/assignments method. The code is clearer this way,
and benchmarks at about the same speed.
 1.4.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.4.4.1 09-Sep-2006  rpaulo sync with head
 1.8.2.1 19-Jun-2006  chap Sync with head.
 1.9.4.4 21-Jan-2008  yamt sync with head
 1.9.4.3 30-Dec-2006  yamt sync with head.
 1.9.4.2 21-Jun-2006  yamt sync with head.
 1.9.4.1 07-Jun-2006  yamt file lfs_itimes.c was added on branch yamt-lazymbuf on 2006-06-21 15:12:39 +0000
 1.9.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.10.44.1 02-Jan-2008  bouyer Sync with HEAD
 1.10.40.2 19-Dec-2007  ad Use a global lfs_lock.
 1.10.40.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.10.36.1 18-Feb-2008  mjf Sync with HEAD.
 1.10.30.1 09-Jan-2008  matt sync with HEAD
 1.10.14.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.11.10.1 16-May-2008  yamt sync with head.
 1.11.8.1 18-May-2008  yamt sync with head.
 1.11.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.12.44.2 03-Dec-2017  jdolecek update from HEAD
 1.12.44.1 23-Jun-2013  tls resync from head
 1.12.34.1 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.15.10.2 28-Aug-2017  skrll Sync with HEAD
 1.15.10.1 22-Sep-2015  skrll Sync with HEAD
 1.19.10.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.4 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.
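A rough userland sketch of the kind of statistics LFCNFILESTATS is described as reporting. The struct, the function name, and the exact definition of a gap (any direct block whose address does not immediately follow its predecessor's) are assumptions made for illustration; the kernel's actual structures and definitions may differ:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result record; not the kernel's structure. */
struct file_stats {
	size_t   nblocks;	/* number of direct blocks examined */
	size_t   ngaps;		/* discontinuities between neighbours */
	uint64_t gapdist;	/* sum of the sizes of those jumps */
};

/*
 * addrs[i] is assumed to be the disk address, in block units, of
 * logical block i.  A block is contiguous with its predecessor when
 * its address is exactly one greater.
 */
struct file_stats
file_gap_stats(const uint64_t *addrs, size_t n)
{
	struct file_stats st = { .nblocks = n, .ngaps = 0, .gapdist = 0 };

	assert(addrs != NULL || n == 0);
	for (size_t i = 1; i < n; i++) {
		uint64_t expected = addrs[i - 1] + 1;

		if (addrs[i] == expected)
			continue;
		st.ngaps++;
		/* Distance counts in either direction on disk. */
		st.gapdist += addrs[i] > expected ?
		    addrs[i] - expected : expected - addrs[i];
	}
	return st;
}
```

Under these assumptions, a file that LFCNREWRITEFILE had fully coalesced would report zero gaps, while one processed by LFCNSCRAMBLE would report many.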

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.3 20-Jun-2016  dholland u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.2 12-Aug-2015  dholland Widen several of the fields of BLOCK_INFO to 64 bits.

Keep the old BLOCK_INFO as BLOCK_INFO_70, and version the fcntls that
use it.

Note that BLOCK_INFO_70 has 64-bit padding issues so that it's
different on 32-bit and 64-bit machines. This has been fixed. However,
BLOCK_INFO also contains a pointer, so compat32 stuff for 32-on-64 is
still needed and doesn't currently exist.
 1.1 28-Jul-2013  dholland branches: 1.1.2; 1.1.6; 1.1.10; 1.1.12;
Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.1.12.2 09-Jul-2016  skrll Sync with HEAD
 1.1.12.1 22-Sep-2015  skrll Sync with HEAD
 1.1.10.3 03-Dec-2017  jdolecek update from HEAD
 1.1.10.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.10.1 28-Jul-2013  tls file lfs_kernel.h was added on branch tls-maxphys on 2014-08-20 00:04:44 +0000
 1.1.6.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.6.1 28-Jul-2013  yamt file lfs_kernel.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.1.2.2 28-Aug-2013  rmind sync with head
 1.1.2.1 28-Jul-2013  rmind file lfs_kernel.h was added on branch rmind-smpnet on 2013-08-28 23:59:38 +0000
 1.27 11-Apr-2023  riastradh lfs: Assert page identity doesn't change.

Forgot what I was debugging when I inserted a relookup in my local
tree months or years ago, but whatever it was, if that solved a
problem, this KDASSERT will make the problem more obvious.
 1.26 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.25 17-Mar-2020  ad Tweak the March 14th change to make page waits interlocked by pg->interlock.
Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude
possible bugs.
 1.24 14-Mar-2020  ad Make uvm_pagemarkdirty() responsible for putting vnodes onto the syncer
work list. Proposed on tech-kern@.
 1.23 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.22 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.21 23-Feb-2020  riastradh Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.
 1.20 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.19 31-Dec-2019  ad branches: 1.19.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.18 20-Dec-2019  ad Fix lfs_putpages() for bsize < nbpg.
 1.17 15-Dec-2019  ad Merge from yamt-pagecache:

- do gang lookup of pages using radixtree.
- remove now unused uvm_object::uo_memq and vm_page::listq.queue.
 1.16 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.15 19-Aug-2017  maya branches: 1.15.4; 1.15.8;
Ask some question about the code in a XXX comment
 1.14 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.13 05-Jun-2017  maya Correct confusion between i_flag and i_flags
These will have to be renamed.

Spotted by Riastradh, thanks!
 1.12 04-Jun-2017  hannken Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.
 1.11 01-Apr-2017  maya branches: 1.11.6;
Switch lfs_writer_daemon to use condvar instead of mtsleep.
track thread existence with struct lwp instead of pid + lid,
it's more useful from ddb.
 1.10 30-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.9 04-Oct-2016  christos branches: 1.9.2;
Grr, the optimizer on mips64 can't handle this... Use MIN_PAGE_SIZE.
 1.8 21-Jul-2016  christos Don't do variable stack allocations for systems with non-const PAGE_SIZE;
instead assume that the smallest pagesize is 1024.
 1.7 12-Aug-2015  dholland branches: 1.7.2;
Make 32-bit and 64-bit versions of SEGSUM.
Also fix some of the FINFO handling as it's closely entangled.
 1.6 02-Aug-2015  dholland Second batch of 64 -> 32 truncations in lfs, along with more minor
tidyups and corrections in passing.
 1.5 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.4 25-Jul-2015  martin Use accessors in DEBUG and DIAGNOSTIC code as well
 1.3 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.2 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)
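As an illustration of why accessors make it possible to switch between 32-bit and 64-bit forms at run time, here is a minimal sketch. All structure and function names are hypothetical and the superblocks are reduced to two fields; this is not the layout or naming used in lfs.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced on-disk superblocks. */
struct sb32 { uint32_t version; uint32_t size; };
struct sb64 { uint32_t version; uint64_t size; };

/* Hypothetical in-memory file system state. */
struct fs {
	bool is64;		/* which on-disk form is in use */
	union {
		struct sb32 u32;
		struct sb64 u64;
	} sb;
};

/* Callers always see a 64-bit value, whatever is stored on disk. */
uint64_t
fs_get_size(const struct fs *fs)
{
	return fs->is64 ? fs->sb.u64.size : fs->sb.u32.size;
}

void
fs_set_size(struct fs *fs, uint64_t v)
{
	if (fs->is64) {
		fs->sb.u64.size = v;
	} else {
		/* The 32-bit form cannot represent larger values. */
		assert(v <= UINT32_MAX);
		fs->sb.u32.size = (uint32_t)v;
	}
}
```

Code written against such accessors never names a structure member directly, which is also what removes the need for fake structure member macros.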

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.1 16-May-2014  dholland branches: 1.1.2; 1.1.4; 1.1.8; 1.1.10;
Move lfs_getpages and lfs_putpages to their own file.
 1.1.10.4 28-Aug-2017  skrll Sync with HEAD
 1.1.10.3 05-Dec-2016  skrll Sync with HEAD
 1.1.10.2 05-Oct-2016  skrll Sync with HEAD
 1.1.10.1 22-Sep-2015  skrll Sync with HEAD
 1.1.8.3 03-Dec-2017  jdolecek update from HEAD
 1.1.8.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.8.1 16-May-2014  tls file lfs_pages.c was added on branch tls-maxphys on 2014-08-20 00:04:44 +0000
 1.1.4.2 10-Aug-2014  tls Rebase.
 1.1.4.1 16-May-2014  tls file lfs_pages.c was added on branch tls-earlyentropy on 2014-08-10 06:56:58 +0000
 1.1.2.2 18-May-2014  rmind sync with head
 1.1.2.1 16-May-2014  rmind file lfs_pages.c was added on branch rmind-smpnet on 2014-05-18 17:46:21 +0000
 1.7.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.7.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.7.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.9.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.11.6.2 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.11.6.1 04-Jun-2017  bouyer pullup the following revisions, requested by hannken in ticket #2:
src/share/man/man9/fstrans.9 1.25
src/sys/kern/vfs_mount.c 1.66
src/sys/kern/vfs_subr.c 1.468
src/sys/kern/vfs_trans.c 1.46
src/sys/kern/vfs_vnode.c 1.94, 1.95, 1.96
src/sys/kern/vnode_if.c 1.105, 1.106
src/sys/kern/vnode_if.sh 1.65, 1.66
src/sys/kern/vnode_if.src 1.76
src/sys/miscfs/genfs/genfs_io.c 1.69
src/sys/miscfs/genfs/genfs_vnops.c 1.196, 1.197
src/sys/miscfs/genfs/layer_extern.h 1.40
src/sys/miscfs/genfs/layer_vfsops.c 1.51
src/sys/miscfs/genfs/layer_vnops.c 1.67
src/sys/miscfs/nullfs/null_vnops.c 1.42
src/sys/miscfs/overlay/overlay_vnops.c 1.24
src/sys/miscfs/umapfs/umap_vnops.c 1.60
src/sys/rump/include/rump/rumpvnode_if.h 1.29, 1.30
src/sys/rump/librump/rumpkern/emul.c 1.182
src/sys/rump/librump/rumpvfs/rumpvnode_if.c 1.29, 1.30
src/sys/sys/fstrans.h 1.11
src/sys/sys/vnode.h 1.278
src/sys/sys/vnode_if.h 1.100, 1.101
src/sys/sys/vnode_impl.h 1.14, 1.15
src/sys/ufs/lfs/lfs_pages.c 1.12

Vnode state, lock and fstrans cleanup:
- Rename vnode state "VS_ACTIVE" to "VS_LOADED" and add synthetic
state "VS_ACTIVE" to assert a loaded vnode with usecount > 0.

- Redo FSTRANS in vnode_if.c and use it for VOP_LOCK and VOP_UNLOCK.

- Cleanup the genfs lock operations.

- Make "struct vnode_impl" member "vi_lock" a krwlock_t again.

- Remove the lock type argument from fstrans_start and
fstrans_start_nowait,
remove now unused FSTRANS state "FSTRANS_SUSPENDING".
 1.15.8.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.
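The marker-node technique can be sketched in userland with the TAILQ macros from <sys/queue.h> and a pthread mutex standing in for lfs_lock. Everything below (struct node, chain_foreach, sum_and_drop) is hypothetical and only illustrates the idea; in particular, a real implementation must also keep the visited node alive, for example by holding a reference, while the lock is dropped:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

struct node {
	TAILQ_ENTRY(node) link;
	bool is_marker;		/* true only for iterator placeholders */
	int value;
};
TAILQ_HEAD(chain, node);

typedef void visit_fn(struct chain *, struct node *, void *);

/*
 * Visit every real node.  The iterator's position is kept in the list
 * itself as a marker, so it stays valid even if neighbouring nodes are
 * removed while the lock is dropped around the callback.
 */
void
chain_foreach(struct chain *head, pthread_mutex_t *lock, visit_fn *visit,
    void *arg)
{
	struct node marker = { .is_marker = true };
	struct node *n;

	assert(visit != NULL);
	pthread_mutex_lock(lock);
	TAILQ_INSERT_HEAD(head, &marker, link);
	while ((n = TAILQ_NEXT(&marker, link)) != NULL) {
		/* Step the marker past n while the lock is still held. */
		TAILQ_REMOVE(head, &marker, link);
		TAILQ_INSERT_AFTER(head, n, &marker, link);
		if (n->is_marker)
			continue;	/* some other iterator's marker */
		pthread_mutex_unlock(lock);
		visit(head, n, arg);	/* may unlink other nodes */
		pthread_mutex_lock(lock);
	}
	TAILQ_REMOVE(head, &marker, link);
	pthread_mutex_unlock(lock);
}

/* Example callback: add up values and unlink one chosen node. */
struct visit_ctx { pthread_mutex_t *lock; int sum; int drop_value; };

void
sum_and_drop(struct chain *head, struct node *n, void *arg)
{
	struct visit_ctx *ctx = arg;
	struct node *victim;

	pthread_mutex_lock(ctx->lock);
	ctx->sum += n->value;
	TAILQ_FOREACH(victim, head, link) {
		if (!victim->is_marker && victim->value == ctx->drop_value) {
			TAILQ_REMOVE(head, victim, link);
			break;
		}
	}
	pthread_mutex_unlock(ctx->lock);
}
```

A loop that saved a pointer to the next node before dropping the lock would be left holding a stale pointer when that node is unlinked; re-reading the marker's successor after reacquiring the lock avoids that.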

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
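A minimal userland sketch of the try-enter idea in (b), with a pthread mutex standing in for lfs_lock. The structure and function names are hypothetical stand-ins, not the kernel's lfs_writer_tryenter, and the blocking lfs_writer_enter path (which waits for dirops to drain) is not modelled:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct lfs. */
struct fs_state {
	pthread_mutex_t lock;	/* plays the role of lfs_lock */
	unsigned dirops;	/* directory operations in flight */
	unsigned writer;	/* nonzero while a writer excludes dirops */
};

/*
 * Take the writer only if no dirops are pending.  The test and the
 * increment happen under a single hold of the lock, so the state
 * cannot change between them.  Never sleeps waiting for dirops.
 */
bool
writer_tryenter(struct fs_state *fs)
{
	bool entered = false;

	pthread_mutex_lock(&fs->lock);
	if (fs->dirops == 0) {
		fs->writer++;
		entered = true;
	}
	pthread_mutex_unlock(&fs->lock);
	return entered;
}

void
writer_leave(struct fs_state *fs)
{
	pthread_mutex_lock(&fs->lock);
	assert(fs->writer > 0);
	fs->writer--;
	pthread_mutex_unlock(&fs->lock);
}
```

Because the try-enter variant never waits, a caller that already holds the seglock can use it opportunistically without inverting the lfs_writer -> lfs_seglock order.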

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
    lfs_vflush
      -> lfs_seglock
      -> lfs_segwait(SEGM_CKP)
      -> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.15.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.19.2.2 29-Feb-2020  ad Sync with head.
 1.19.2.1 17-Jan-2020  ad Sync with head.
 1.26 01-Nov-2025  perseant Create a new LFS inode flag, IN_DEAD, to indicate that a file's last
reference, other than those that come with VU_DIROP or IN_CLEANING and
the one the caller holds, has been dropped. Check and apply this flag
in lfs_orphan(), and call lfs_orphan() on close if the link count is
zero. Change the signature of lfs_orphan to facilitate.

Make test t_vfsops:lfs_tfhremove expect success.

Closes PR kern/43745.
 1.25 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted, for example).
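
As a rough illustration of the interest tracking described in the last item, the sketch below keeps a per-object union of the event bits that watchers asked for and skips the expensive delivery path when nobody asked for the event. Every name here (`watched_sketch`, `EV_WRITE_SKETCH`, `watch_add`, `notify`) is hypothetical and is not taken from the kevent implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event bits; not the real NOTE_* values. */
#define EV_WRITE_SKETCH		0x1u
#define EV_DELETE_SKETCH	0x2u

/* Hypothetical per-object state. */
struct watched_sketch {
	uint32_t interest;	/* union of all watchers' event masks */
	unsigned delivered;	/* slow-path deliveries, for illustration */
};

/* Register a watcher's interest by folding its mask into the union. */
static void
watch_add(struct watched_sketch *w, uint32_t events)
{
	w->interest |= events;
}

/*
 * Deliver an event only if some watcher asked for it. Returns 1 if the
 * slow path ran, 0 if it was skipped.
 */
static int
notify(struct watched_sketch *w, uint32_t event)
{
	if ((w->interest & event) == 0)
		return 0;	/* no one cares: no locks, no list walk */
	w->delivered++;		/* real code would lock and walk watchers */
	return 1;
}
```

With only a delete watcher registered, a write notification returns without touching the slow path.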

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), using a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.24 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.23 23-Feb-2020  riastradh Call lfs_orphan in lfs_rename while we're still in the dirop.
 1.22 10-Jun-2017  maya branches: 1.22.6; 1.22.10; 1.22.12;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.21 20-Jun-2016  dholland branches: 1.21.10;
One more batch of already-synced ufs changes:

ufs_extern.h 1.79 is equivalent to ulfs_extern.h 1.14
ufsmount.h 1.43 is (roughly) equivalent to lfs_extern.h 1.102
ufs_inode.c 1.94 does not apply to lfs
ufs_inode.c 1.95 does not apply to lfs either
ufs_readwrite.c 1.108 is equivalent to ulfs_readwrite.c 1.8
ufs_readwrite.c 1.109 is equivalent to ulfs_readwrite.c 1.9
ufs_readwrite.c 1.110 is equivalent to ulfs_readwrite.c 1.10
ufs_readwrite.c 1.111 does not apply to lfs
ufs_readwrite.c 1.112 is equivalent to ulfs_readwrite.c 1.11
ufs_readwrite.c 1.113 is equivalent to ulfs_readwrite.c 1.13
ufs_readwrite.c 1.114 is equivalent to ulfs_readwrite.c 1.14
ufs_readwrite.c 1.115 is equivalent to ulfs_readwrite.c 1.15
ufs_readwrite.c 1.116-1.118 do not apply to lfs
ufs_readwrite.c 1.119-1.120 are equivalent to ulfs_readwrite.c 1.16
ufs_rename.c 1.12 is equivalent to lfs_rename.c 1.8
ufs_vnops.c 1.226 is equivalent to ulfs_vnops.c 1.22 and lfs_vnops.c 1.270
ufs_vnops.c 1.227 is equivalent to ulfs_vnops.c 1.23
ufs_vnops.c 1.228-1.229 are equivalent to ulfs_vnops.c 1.24
ufs_vnops.c 1.230 is equivalent to ulfs_vnops.c 1.25 and lfs_vnops.c 1.271
ufs_vnops.c 1.231 originated in lfs
ufs_vnops.c 1.232 does not apply to lfs
 1.20 20-Jun-2016  dholland fix typo in previous
 1.19 20-Jun-2016  dholland Merge ufs_rename.c 1.11: ufs_gro_genealogy: use vcache_get() to lookup DOTDOT.
 1.18 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.17 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.16 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
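
A minimal sketch of the two-halves idea, assuming a hypothetical header layout and a low-half-first ordering (neither is taken from the real LFS64 structures): because the widest member is 32 bits, the structure only needs 32-bit alignment, and the accessors reassemble the 64-bit inode number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout; field order and names are assumptions. */
struct dirheader64_sketch {
	uint32_t ino_lo;	/* low 32 bits of the inode number */
	uint32_t ino_hi;	/* high 32 bits of the inode number */
	uint16_t reclen;
	uint8_t  type;
	uint8_t  namlen;
};

/* No member is wider than 32 bits, so 4-byte alignment suffices. */
_Static_assert(_Alignof(struct dirheader64_sketch) == 4,
    "header must not require 64-bit alignment");

/* Reassemble the inode number from its halves. */
static uint64_t
dir_getino(const struct dirheader64_sketch *dh)
{
	return ((uint64_t)dh->ino_hi << 32) | dh->ino_lo;
}

/* Split the inode number into its halves. */
static void
dir_setino(struct dirheader64_sketch *dh, uint64_t ino)
{
	dh->ino_lo = (uint32_t)(ino & 0xffffffffu);
	dh->ino_hi = (uint32_t)(ino >> 32);
}
```

Callers go through the accessors only, so the split representation never leaks into the rest of the code.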
 1.15 21-Sep-2015  dholland Oops; LFS_DIRECTSIZ() is going to need the fs as an argument.

Also, it turns out that dirhash needs a compile-time-constant version
of LFS_DIRECTSIZ(LFS_MAXNAMLEN+1), independent of 64-vs-32, so create
LFS_MAXDIRENTRYSIZE for this. Sigh.
 1.14 20-Sep-2015  dholland Clean up struct lfs_dirtemplate.
 1.13 15-Sep-2015  dholland Pass around struct lfs_dirheader instead of struct lfs_direct.
 1.12 15-Sep-2015  dholland Add an accessor function for directory names.
 1.11 15-Sep-2015  dholland Kill off ulfs_makedirentry; just pass the data to ulfs_direnter instead.
For now, move one copy of the code that allocates and fills in a
temporary struct lfs_direct to the top of ulfs_direnter; but it should
go away shortly.
 1.10 15-Sep-2015  dholland Add and use accessor functions for more of the directory entry fields.
 1.9 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.8 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.7 17-May-2014  dholland branches: 1.7.2; 1.7.6; 1.7.8;
Remove the DIROP macros. They are evil, especially the CREATE ones.

This results in some duplicate logic in the creation vnops (symlink,
mknod, create, mkdir) but we will probably be able to factor it out in
a more sensible way later.

Now the creation vnops call getnewvnode explicitly instead of under
multiple layers of obscure gunk. Then we explicitly do lfs_set_dirop,
and afterwards lfs_unset_dirop.
 1.6 06-Feb-2014  hannken branches: 1.6.2;
Move fstrans_start()/fstrans_done() into genfs_insane_rename() to protect
the complete rename operation like we do for all other vnode operations.
 1.5 28-Jan-2014  martin Quell a gcc 4.8 maybe-uninitialized false positive
 1.4 28-Jul-2013  dholland branches: 1.4.2;
Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.3 28-Jul-2013  dholland Remove the now-pointless ulfs ops macros.
 1.2 20-Jul-2013  dholland branches: 1.2.2;
G/C unused pieces.
 1.1 20-Jul-2013  dholland Collect the pieces of lfs rename into lfs_rename.c, and sprinkle static.
 1.2.2.2 23-Jul-2013  riastradh sync with HEAD
 1.2.2.1 20-Jul-2013  riastradh file lfs_rename.c was added on branch riastradh-drm2 on 2013-07-23 21:07:38 +0000
 1.4.2.3 18-May-2014  rmind sync with head
 1.4.2.2 28-Aug-2013  rmind sync with head
 1.4.2.1 28-Jul-2013  rmind file lfs_rename.c was added on branch rmind-smpnet on 2013-08-28 23:59:38 +0000
 1.6.2.1 10-Aug-2014  tls Rebase.
 1.7.8.4 28-Aug-2017  skrll Sync with HEAD
 1.7.8.3 09-Jul-2016  skrll Sync with HEAD
 1.7.8.2 22-Sep-2015  skrll Sync with HEAD
 1.7.8.1 06-Apr-2015  skrll Sync with HEAD
 1.7.6.3 03-Dec-2017  jdolecek update from HEAD
 1.7.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.6.1 17-May-2014  tls file lfs_rename.c was added on branch tls-maxphys on 2014-08-20 00:04:44 +0000
 1.7.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.2.1 17-May-2014  yamt file lfs_rename.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.21.10.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some questions about the code in an XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.22.12.1 29-Feb-2020  ad Sync with head.
 1.22.10.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.
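
The marker technique can be sketched with the `<sys/queue.h>` TAILQ macros. This is an illustration under assumptions, not the lfs code: `node`, `visit_all`, and `remove_old_successor` are hypothetical names, and the points where a real implementation would drop and retake the lock are only marked by comments. The key point is that iteration resumes from the marker, which nobody else removes, rather than from a saved next pointer that may have been freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element. */
struct node {
	TAILQ_ENTRY(node) link;
	bool is_marker;		/* true only for iteration markers */
	int value;
};
TAILQ_HEAD(nodelist, node);

/*
 * Visit every real node. The callback may remove the visited node or
 * any other real node, but must leave markers alone. Returns the
 * number of nodes visited.
 */
static int
visit_all(struct nodelist *head,
    void (*visit)(struct nodelist *, struct node *))
{
	struct node marker = { .is_marker = true };
	struct node *n = TAILQ_FIRST(head);
	int visited = 0;

	while (n != NULL) {
		if (n->is_marker) {	/* another iterator's marker */
			n = TAILQ_NEXT(n, link);
			continue;
		}
		TAILQ_INSERT_AFTER(head, n, &marker, link);
		/* a real implementation would drop the lock here */
		visit(head, n);
		/* ... and retake it here */
		visited++;
		n = TAILQ_NEXT(&marker, link);
		TAILQ_REMOVE(head, &marker, link);
	}
	return visited;
}

/* Example callback: simulate removal of the node that followed n. */
static void
remove_old_successor(struct nodelist *head, struct node *n)
{
	struct node *m = TAILQ_NEXT(n, link);		/* the marker */
	struct node *victim = TAILQ_NEXT(m, link);	/* old successor */

	if (victim != NULL && !victim->is_marker)
		TAILQ_REMOVE(head, victim, link);
}
```

Had the loop saved `TAILQ_NEXT(n, link)` before calling the callback, it would have continued from a node that is no longer on the list.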

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
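
Resolution (b) can be sketched in userland C with a pthread mutex standing in for lfs_lock. This is a hedged illustration, not the kernel code: `fs_sketch`, `writer_tryenter`, and `writer_leave` are hypothetical names, and the real lfs_writer_tryenter may differ in detail. What it shows is the test of the dirop count and the writer increment happening inside one lock hold, so no dirop can start between them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct lfs. */
struct fs_sketch {
	pthread_mutex_t lock;	/* plays the role of lfs_lock */
	unsigned dirops;	/* active directory operations */
	unsigned writer;	/* nonzero while a writer excludes dirops */
};

/*
 * Become a writer only if no dirops are active. Never sleeps waiting
 * for dirops, so it is safe to call with other locks held.
 */
static bool
writer_tryenter(struct fs_sketch *fs)
{
	bool ok;

	pthread_mutex_lock(&fs->lock);
	ok = (fs->dirops == 0);
	if (ok)
		fs->writer++;
	pthread_mutex_unlock(&fs->lock);
	return ok;
}

/* Release a writer hold taken by writer_tryenter. */
static void
writer_leave(struct fs_sketch *fs)
{
	pthread_mutex_lock(&fs->lock);
	assert(fs->writer > 0);
	fs->writer--;
	pthread_mutex_unlock(&fs->lock);
}
```

A caller that gets `false` back simply skips the opportunistic work instead of blocking, which is what avoids the lock-order inversion.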

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.
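
Point 2 can be illustrated with a small sketch, assuming GCC or Clang attribute syntax in place of the __aligned and __packed macros. The record layouts and names (`rec32_sketch`, `rec64_sketch`, `rec_sketch`, `rec_getino`) are hypothetical. The static assertions check that neither the 64-bit record nor the union demands more than 4-byte alignment, so the compiler cannot assume 8-byte alignment when only the 32-bit member is used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit on-disk record. */
struct rec32_sketch {
	uint32_t ino;
	uint32_t len;
};

/* Hypothetical 64-bit on-disk record, only 4-byte aligned on disk. */
struct rec64_sketch {
	uint64_t ino;
	uint32_t len;
} __attribute__((aligned(4), packed));

/* Both forms are reached through one union. */
union rec_sketch {
	struct rec32_sketch r32;
	struct rec64_sketch r64;
} __attribute__((aligned(4), packed));

_Static_assert(_Alignof(struct rec64_sketch) == 4,
    "64-bit record must tolerate 4-byte alignment");
_Static_assert(_Alignof(union rec_sketch) == 4,
    "union must not claim 64-bit alignment");
_Static_assert(sizeof(union rec_sketch) == 12,
    "no tail padding beyond 4-byte rounding");

/* Read the inode number from whichever form is in use. */
static uint64_t
rec_getino(const union rec_sketch *r, int is64)
{
	return is64 ? r->r64.ino : r->r32.ino;
}
```

Dropping the attributes from the union would, on typical 64-bit ABIs, raise its alignment to 8 and reintroduce the assumption the commit message warns about.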

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.22.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.40 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
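
The kind of statistics that LFCNFILESTATS is described as reporting above can be sketched as follows. This is only one plausible reading, stated as an assumption: a gap is counted wherever a block's disk address is not exactly `step` past its predecessor, and the gap distance is the absolute difference from that expected address. The names `filestats_sketch` and `count_gaps` are hypothetical and do not reflect the kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result record. */
struct filestats_sketch {
	size_t  nblocks;	/* number of direct blocks examined */
	size_t  ngaps;		/* discontinuities between neighbours */
	int64_t gapdist;	/* sum of absolute gap sizes */
};

/*
 * Walk the block addresses in file order and count every place where
 * the next block is not contiguous with the previous one.
 */
static struct filestats_sketch
count_gaps(const int64_t *daddr, size_t n, int64_t step)
{
	struct filestats_sketch st = { n, 0, 0 };
	size_t i;

	for (i = 1; i < n; i++) {
		int64_t expected = daddr[i - 1] + step;
		int64_t d = daddr[i] - expected;

		if (d != 0) {
			st.ngaps++;
			st.gapdist += (d < 0) ? -d : d;
		}
	}
	return st;
}
```

A fully coalesced file would report zero gaps under this definition.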
 1.39 14-Oct-2025  perseant Check the existing inode address against LFS_UNUSED_DADDR before checking
whether it is in the same segment, to prevent a byte undercount in segment 0
during roll forward. This was most often expressed in the fs/lfs/t_rfw rfw64
test case, though it affected both 32- and 64-bit LFSs equally.
 1.38 06-Oct-2025  perseant Don't stop recovery when we find a partial-segment with neither inodes nor
finfos. Under normal conditions, we should never be producing such a partial
segment. However, these do sometimes appear and they need not prevent us
from continuing.
 1.37 17-Sep-2025  perseant Add working in-kernel roll forward.
 1.36 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.35 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.34 01-Jan-2019  hannken branches: 1.34.6;
Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
 1.33 10-Dec-2018  maxv Remove unused mbuf.h includes.
 1.32 03-Oct-2015  dholland branches: 1.32.16; 1.32.18;
Use the new IINFO in the rfw code, eliminating hardwired 32-bit values.
 1.31 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.30 19-Aug-2015  dholland Part two of dinodes; use the same union everywhere.
(previously the ufs-derived code had things set up slightly different)

Remove a bunch of associated mess.
 1.29 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.28 12-Aug-2015  dholland Provide 32-bit and 64-bit versions of FINFO.

This also entailed sorting out part of struct segment, as that
contains a pointer into the current FINFO data.
 1.27 12-Aug-2015  dholland Make 32-bit and 64-bit versions of SEGSUM.
Also fix some of the FINFO handling as it's closely entangled.
 1.26 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.25 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.24 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.23 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.22 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.
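
A minimal sketch of the accessor approach, with hypothetical structure and function names (`sb32_sketch`, `sb64_sketch`, `lfs_sketch`, `sb_getavail`, `sb_setavail`) that are not the real lfs definitions: callers never touch a superblock field directly, so the 32-bit or 64-bit form can be chosen at run time inside the accessor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit and 64-bit on-disk superblock forms. */
struct sb32_sketch {
	uint32_t version;
	uint32_t avail;
};
struct sb64_sketch {
	uint32_t version;
	uint32_t pad;
	uint64_t avail;
};

/* Hypothetical in-memory file system state. */
struct lfs_sketch {
	bool is64;		/* which on-disk form is in use */
	union {
		struct sb32_sketch s32;
		struct sb64_sketch s64;
	} sb;
};

/* Read the field from whichever form this file system uses. */
static uint64_t
sb_getavail(const struct lfs_sketch *fs)
{
	return fs->is64 ? fs->sb.s64.avail : fs->sb.s32.avail;
}

/* Write the field, truncating to 32 bits for the 32-bit form. */
static void
sb_setavail(struct lfs_sketch *fs, uint64_t val)
{
	if (fs->is64)
		fs->sb.s64.avail = val;
	else
		fs->sb.s32.avail = (uint32_t)val;
}
```

Because these are real functions rather than fake member macros, the compiler type-checks every use.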

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.21 16-Jul-2015  dholland Don't cast the return value of malloc.
 1.20 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.19 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.18 28-Jul-2013  dholland branches: 1.18.6;
Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.17 18-Jun-2013  christos branches: 1.17.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.16 08-Jun-2013  dholland Stick LFS_ in front of IFMT, IFIFO, IFREG, etc. so as not to conflict
with the UFS copies of these symbols. (Which themselves ought to have
UFS_ stuck on.)
 1.15 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.14 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.13 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.12 22-Feb-2009  ad branches: 1.12.12; 1.12.22;
PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.11 16-May-2008  hannken branches: 1.11.6; 1.11.12;
Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.10 28-Apr-2008  martin branches: 1.10.2;
Remove clause 3 and 4 from TNF licenses
 1.9 02-Jan-2008  ad branches: 1.9.6; 1.9.8; 1.9.10;
Merge vmlocking2 to head.
 1.8 12-Dec-2007  he Make this build again, as part of sys/lkm/dev/vnd/:
- lfs_truncate() has lost its lwp argument.
- Cast from void* to char* before doing pointer arithmetic.
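
A minimal sketch of the cast named in the second item of 1.8. The helper byte_offset() is hypothetical and is not taken from the NetBSD sources. Arithmetic on void * is only a compiler extension, whereas char * arithmetic is defined by ISO C in units of one byte.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (an assumption for illustration only): return a
 * pointer `off` bytes past `base`.  The cast to char * makes the byte-wise
 * arithmetic well defined instead of relying on void * arithmetic.
 */
static void *
byte_offset(void *base, size_t off)
{
	assert(base != NULL);
	return (char *)base + off;
}
```

A caller would write byte_offset(buf, 3) where older code wrote buf + 3 on a void * buffer.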
 1.7 12-Dec-2007  ad Fix a stray brelse() that got missed.
 1.6 12-Dec-2007  lukem defflag LFS_KERNEL_RFW (in opt_lfs.h).
Note: lfs_rfw.c doesn't compile if you define the option; locking API fallout?
 1.5 10-Oct-2007  ad branches: 1.5.4; 1.5.6; 1.5.8; 1.5.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.4 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.3 01-Sep-2006  perseant branches: 1.3.4; 1.3.10; 1.3.16; 1.3.30; 1.3.32; 1.3.34;
Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.2 20-Jul-2006  perseant branches: 1.2.4;
Oops, commit the correct version of lfs_rfw.c. The roll-forward functionality
is known not to work in this version (as it did not previously) but it should
at least compile.
 1.1 20-Jul-2006  perseant Separate the (non-working) LFS kernel roll-forward code into its own file,
lfs_rfw.c.
 1.2.4.3 03-Sep-2006  yamt sync with head.
 1.2.4.2 11-Aug-2006  yamt sync with head
 1.2.4.1 20-Jul-2006  yamt file lfs_rfw.c was added on branch yamt-pdpolicy on 2006-08-11 15:47:37 +0000
 1.3.34.1 14-Oct-2007  yamt sync with head.
 1.3.32.2 09-Jan-2008  matt sync with HEAD
 1.3.32.1 06-Nov-2007  matt sync with HEAD
 1.3.30.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.3.16.3 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.3.16.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.3.16.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.3.10.4 21-Jan-2008  yamt sync with head
 1.3.10.3 27-Oct-2007  yamt sync with head.
 1.3.10.2 30-Dec-2006  yamt sync with head.
 1.3.10.1 01-Sep-2006  yamt file lfs_rfw.c was added on branch yamt-lazymbuf on 2006-12-30 20:51:01 +0000
 1.3.4.2 09-Sep-2006  rpaulo sync with head
 1.3.4.1 01-Sep-2006  rpaulo file lfs_rfw.c was added on branch rpaulo-netinet-merge-pcb on 2006-09-09 03:00:00 +0000
 1.5.10.2 02-Jan-2008  bouyer Sync with HEAD
 1.5.10.1 13-Dec-2007  bouyer Sync with HEAD
 1.5.8.1 13-Dec-2007  yamt sync with head.
 1.5.6.5 28-Dec-2007  ad Make it compile.
 1.5.6.4 26-Dec-2007  ad Sync with head.
 1.5.6.3 19-Dec-2007  ad Use a global lfs_lock.
 1.5.6.2 19-Dec-2007  ad Fix some more problems w/lfs on this branch.
 1.5.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.5.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.9.10.2 04-May-2009  yamt sync with head.
 1.9.10.1 16-May-2008  yamt sync with head.
 1.9.8.1 18-May-2008  yamt sync with head.
 1.9.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.10.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.11.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.11.6.1 03-Mar-2009  skrll Sync with HEAD.
 1.12.22.4 03-Dec-2017  jdolecek update from HEAD
 1.12.22.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.12.22.2 23-Jun-2013  tls resync from head
 1.12.22.1 25-Feb-2013  tls resync with head
 1.12.12.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.12.12.1 23-Jan-2013  yamt sync with head
 1.17.2.1 28-Aug-2013  rmind sync with head
 1.18.6.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.18.6.3 22-Sep-2015  skrll Sync with HEAD
 1.18.6.2 06-Jun-2015  skrll Sync with HEAD
 1.18.6.1 06-Apr-2015  skrll Sync with HEAD
 1.32.18.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.32.18.1 10-Jun-2019  christos Sync with HEAD
 1.32.16.2 18-Jan-2019  pgoyette Synch with HEAD
 1.32.16.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.34.6.1 17-Jan-2020  ad Sync with head.
 1.297 04-Nov-2025  perseant Remove su_flags array, replacing it with a new flag SEGUSE_READY.
Segments progress from having su_nbytes==0 to SEGUSE_EMPTY to SEGUSE_READY
to clean, progressing to the next step after a checkpoint.
 1.296 03-Nov-2025  perseant Be more careful about only setting IN_CLEANING in lfs_setclean() and clearing
it in lfs_clrclean(). Prevents a crash from re-removing an entry from the
lfs_cleanhd TAILQ.
 1.295 29-Oct-2025  perseant Use IINFOSIZE and LFS_BLKPTRSIZE, rather than sizeof(int32_t), to represent
the size of inode numbers and logical block numbers, respectively, in the
segment summary header. Prevents an overrun in LFS64.
 1.294 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
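
The kind of measurement LFCNFILESTATS reports in 1.294 can be sketched as follows. This is an illustration under stated assumptions, not the kernel's code: struct gap_stats, count_gaps() and the blkstep parameter are hypothetical names, and a "gap" is assumed here to mean any pair of consecutive direct blocks whose disk addresses are not exactly blkstep apart.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result record; the real fcntl's structure may differ. */
struct gap_stats {
	size_t   nblocks;	/* number of direct blocks examined */
	size_t   ngaps;		/* discontinuities between neighbours */
	uint64_t gapdist;	/* sum of the sizes of those discontinuities */
};

/*
 * Walk the disk addresses of a file's direct blocks in logical order.
 * Each block is expected to sit `blkstep` address units after the
 * previous one; any other placement counts as one gap.
 */
static struct gap_stats
count_gaps(const int64_t *daddr, size_t n, int64_t blkstep)
{
	struct gap_stats st = { n, 0, 0 };
	size_t i;

	for (i = 1; i < n; i++) {
		int64_t expect = daddr[i - 1] + blkstep;

		if (daddr[i] != expect) {
			st.ngaps++;
			st.gapdist += (uint64_t)(daddr[i] > expect ?
			    daddr[i] - expect : expect - daddr[i]);
		}
	}
	return st;
}
```

Under these assumptions a fully coalesced file, such as one rewritten with LFCNREWRITEFILE, would report zero gaps.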
 1.293 17-Sep-2025  perseant Add working in-kernel roll forward.
 1.292 17-Sep-2025  perseant Use a workqueue to handle the superblock callback.
 1.291 17-Sep-2025  perseant Add routines to check freelist consistency if compiled with DEBUG and
conditional on a kernel variable manipulated via sysctl.
Add checks before and after each routine that modifies the free list.
#if 0 a section of lfs_vfree() that was intended to keep the free list ordered
but instead corrupted it.
 1.290 04-Sep-2025  perseant Copy the flags from a full partial segment to its continuation, if
a continuation is necessary, so that partial-segment collections marked
with SS_DIROP|SS_CONT are properly completed with a partial-segment marked
SS_DIROP (without SS_CONT). Necessary for roll-forward.
 1.289 02-Sep-2025  perseant Use a workqueue to handle cluster iodone, rather than doing it in interrupt context.
 1.288 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.287 13-Aug-2020  riastradh Skip unlinked inodes.

They no longer matter on disk so we don't need to write anything out
for them.
 1.286 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.285 23-Feb-2020  riastradh Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):

(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.

(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:

(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.

(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
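
Resolution (b) in 1.285 can be sketched in userland as below. This is a hedged illustration, not the kernel implementation: struct fs_sketch and its functions are hypothetical, and a pthread mutex stands in for lfs_lock. The point is only that the test for "no dirops pending" and the writer acquisition happen under a single hold of the lock, so the state cannot change between them.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct lfs. */
struct fs_sketch {
	pthread_mutex_t lock;	/* plays the role of lfs_lock */
	unsigned dirops;	/* directory operations in progress */
	unsigned writer;	/* writers currently holding off dirops */
};

static void
fs_sketch_init(struct fs_sketch *fs)
{
	pthread_mutex_init(&fs->lock, NULL);
	fs->dirops = 0;
	fs->writer = 0;
}

/*
 * Become a writer only if no dirops are pending.  The check and the
 * increment share one critical section; an unlocked check followed by
 * a separate blocking enter is the race described in case (b).
 */
static bool
writer_tryenter(struct fs_sketch *fs)
{
	bool ok;

	pthread_mutex_lock(&fs->lock);
	ok = (fs->dirops == 0);
	if (ok)
		fs->writer++;
	pthread_mutex_unlock(&fs->lock);
	return ok;
}

static void
writer_leave(struct fs_sketch *fs)
{
	pthread_mutex_lock(&fs->lock);
	fs->writer--;
	pthread_mutex_unlock(&fs->lock);
}
```

A caller that gets false back simply skips the opportunistic work rather than waiting while holding the seglock.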
 1.284 23-Feb-2020  riastradh Change some cheap KDASSERT into KASSERT.
 1.283 22-Feb-2020  ad Make LFS/rump play nice with aiodoned removal.

PR kern/55004 (Hundreds of file system tests now fail on real hardware)
 1.282 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.281 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.280 08-Dec-2019  ad branches: 1.280.2;
Revert previous. No performance gain worth the potential headaches
with buffers in these contexts.
 1.279 08-Dec-2019  ad Avoid thundering herd: cv_broadcast(&bp->b_busy) -> cv_signal(&bp->b_busy)
 1.278 03-Sep-2018  riastradh branches: 1.278.4;
Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
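
The truncation that 1.278 is guarding against can be shown in a few lines. This is a standalone sketch that assumes a 32-bit unsigned int; uimin() and MIN are re-declared locally for the demonstration, and min_via_function() and min_via_macro() are hypothetical wrappers.

```c
#include <stdint.h>

/* Function form: parameters are unsigned int, so wider arguments are
 * converted (and silently truncated) at the call. */
static unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: may evaluate an argument twice, but keeps operand types. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical callers passing 64-bit values to each form. */
static uint64_t
min_via_function(uint64_t a, uint64_t b)
{
	return uimin(a, b);	/* implicit 64 -> 32 bit conversion here */
}

static uint64_t
min_via_macro(uint64_t a, uint64_t b)
{
	return MIN(a, b);	/* comparison done in 64 bits */
}
```

With arguments 0x100000001 and 0x100000005 the macro form yields 0x100000001, while the function form yields 1 because only the low 32 bits reach the comparison.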
 1.277 09-Jun-2018  zafer branches: 1.277.2;
Add missing b_cflags and b_oflags.
Ok dholland@
Addresses PR kern/42342 by Yoshihiro Nakajima
 1.276 06-Jun-2018  maya Remove duplicate ;
 1.275 20-Aug-2017  maya branches: 1.275.2;
XXX question our double-flushing of dirops
 1.274 26-Jul-2017  maya change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar

XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
 1.273 26-Jul-2017  maya Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
 1.272 15-Jun-2017  maya It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.

lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.

Fixes a lot of LFS deadlocks. PR kern/52301

Many thanks to dholland for help analyzing coredumps
 1.271 12-Jun-2017  maya Use continue to denote the no-op loop, to match NetBSD style.
Add a newline for extra clarity.
 1.270 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.269 06-Apr-2017  maya branches: 1.269.6;
don't guard lfs_sbactive or lfs_log with splbio, lfs_lock is plenty.
 1.268 06-Apr-2017  maya remove deprecated comment (and move it below assert)
there's no spl dance for I/O here.
 1.267 06-Apr-2017  maya Provide a LFS_ENTER_LOG (__nothing) in the !DEBUG case.
so I can drop lots of #ifdef DEBUG around this macro. NFCI
 1.266 06-Apr-2017  maya Drop single use macro LFS_BCLEAN_LOG with an inlined implementation.

The LFS_ENTER_LOG macro currently grabs lfs_lock, so I'd like to have just one
name for it.
 1.265 01-Apr-2017  riastradh KASSERT(mutex_owned(vp->v_interlock)) in vnode iterator selector.
 1.264 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT

Replace some #if DEBUG by this too. DEBUG is only for expensive
assertions; these are not.
 1.263 19-Oct-2015  dholland branches: 1.263.2; 1.263.4;
improve some panic messages
 1.262 10-Oct-2015  dholland Fix minor bitrot in #if 0 or otherwise disabled code.
 1.261 10-Oct-2015  dholland Use accessors for some more indirect block manipulations.
 1.260 03-Oct-2015  dholland Use IINFO in lfs_writeinode().
(both the kernel and the userland copies)
 1.259 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.258 21-Aug-2015  hannken lfs_writevnodes: replace mnt_vnodelist traversal with vfs_vnode_iterator.
 1.257 19-Aug-2015  dholland Part two of dinodes; use the same union everywhere.
(previously the ufs-derived code had things set up slightly differently)

Remove a bunch of associated mess.
 1.256 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.255 12-Aug-2015  dholland Provide 32-bit and 64-bit versions of FINFO.

This also entailed sorting out part of struct segment, as that
contains a pointer into the current FINFO data.
 1.254 12-Aug-2015  dholland Make 32-bit and 64-bit versions of SEGSUM.
Also fix some of the FINFO handling as it's closely entangled.
 1.253 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.252 12-Aug-2015  dholland Make 32-bit and 64-bit versions of CLEANERINFO.

XXX: while this is written to disk, it seems like much of it would
XXX: be better set up as a commpage shared with the cleaner.
 1.251 02-Aug-2015  dholland Pass the fs object to LFS_MAX_DADDR so it can check lfs_is64.

Remove some hackish intentional 64->32 truncations next to the checks
using LFS_MAX_DADDR, and tackle the problem they handled in bmap
instead.

The problem: the magic block pointer value UNWRITTEN has magic value
-2, and if it's not handled specifically, uint32 -> uint64 promotion
turns it into 4294967294, which then causes consternation and
monkeyhouse downstream.

What's here is still kind of a hack, but it's a step forward.
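
The promotion problem described in 1.251 can be reproduced in isolation. The sketch below is not the bmap change itself; UNWRITTEN_SKETCH, widen_naive() and widen_checked() are hypothetical names, and the special-casing shown is only one assumed way to handle the sentinel.

```c
#include <stdint.h>

/* Stand-in for the magic block pointer value UNWRITTEN (-2). */
#define UNWRITTEN_SKETCH ((int64_t)-2)

/* Plain zero-extension: the 32-bit pattern of -2 becomes 4294967294. */
static int64_t
widen_naive(uint32_t daddr32)
{
	return (int64_t)(uint64_t)daddr32;
}

/* Handle the sentinel specifically before widening everything else. */
static int64_t
widen_checked(uint32_t daddr32)
{
	if (daddr32 == (uint32_t)UNWRITTEN_SKETCH)
		return UNWRITTEN_SKETCH;
	return (int64_t)daddr32;
}
```

Ordinary block addresses pass through both functions unchanged; only the sentinel differs.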
 1.250 02-Aug-2015  dholland Add a (draft) 64-bit superblock. Make things build again.

Add pieces of support for using both superblock types where
convenient, and specifically to the superblock accessors, but don't
actually enable it anywhere.

First substantive step on PR 50000.
 1.249 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.248 02-Aug-2015  dholland Make i_eff_nblks in the in-memory inode 64 bits wide.
 1.247 02-Aug-2015  dholland Fix catastrophic bug in lfs_rewind() that changed segment numbers
(lfs_curseg/lfs_nextseg in the superblock) using the wrong units.
These fields are for whatever reason the start addresses of segments
(measured in frags) rather than the segment numbers 0..n.

This only apparently affects dumping from a mounted fs; however, it
trashes the fs.

I would really, really like to have a static analysis tool that can
keep track of the units things are measured in, since fs code is full
of conversion macros and the macros are named inscrutable things like
"sntod" whose letters don't necessarily even correspond to the units
they convert. It is surprising that more of these are not wrong.
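
One way to get part of the unit checking wished for in 1.247 from the compiler alone is to wrap each unit in its own struct type. This is a sketch of that general technique, not something the commit does: segnum_sk, segaddr_sk and both functions are hypothetical, and the layout (fixed frags per segment plus a start offset) is an assumption for the example.

```c
#include <stdint.h>

/* Distinct wrapper types: mixing them up is a compile error. */
typedef struct { uint32_t n; } segnum_sk;	/* segment number 0..n */
typedef struct { int64_t frag; } segaddr_sk;	/* segment start, in frags */

/* Convert a segment number to its start address (assumed layout). */
static segaddr_sk
seg_to_addr(segnum_sk sn, int64_t frags_per_seg, int64_t start)
{
	segaddr_sk a = { start + (int64_t)sn.n * frags_per_seg };
	return a;
}

/* Advance a start address by one whole segment, staying in frag units. */
static segaddr_sk
next_seg_addr(segaddr_sk cur, int64_t frags_per_seg)
{
	segaddr_sk a = { cur.frag + frags_per_seg };
	return a;
}
```

Passing a segnum_sk where a segaddr_sk is expected fails to compile, which is the class of mistake the lfs_rewind() bug belonged to.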
 1.246 02-Aug-2015  dholland Second batch of 64 -> 32 truncations in lfs, along with more minor
tidyups and corrections in passing.
 1.245 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.244 25-Jul-2015  martin Use accessors in DEBUG and DIAGNOSTIC code as well
 1.243 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.242 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.241 07-Jun-2015  hannken Fix copy and paste errors from last commits.
- Kernel i386/ALL and amd64/ALL compile again.
- Resolves CID 1304138 (DEADCODE) and 1304139 (IDENTICAL_BRANCHES).
 1.240 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.239 31-May-2015  hannken Use VFS_PROTOS() for lfs.
Rename conflicting struct lfs field "lfs_start" to "lfs_s0addr".

No functional change.
 1.238 20-Apr-2015  riastradh Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.
 1.237 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.236 24-Mar-2014  hannken branches: 1.236.4; 1.236.6;
- Make VI_XLOCK, VI_CLEAN and VI_LOCKSHARE private to kern/vfs_*.c.
- Make vwait() static.
- Add vdead_check() to check a vnode for being or becoming dead.

Discussed on tech-kern.

Welcome to 6.99.38
 1.235 18-Mar-2014  hannken Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37
 1.234 17-Mar-2014  hannken Change vismarker() to VI_MARKER for lfs_writevnodes().
This operation has to be changed to vfs_vnode_iterator.
 1.233 29-Oct-2013  hannken Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25
 1.232 17-Oct-2013  christos - remove unused variables
- add debug ifdefs for debugging variables
- __USE() where appropriate.
 1.231 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.230 18-Jun-2013  christos branches: 1.230.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.229 08-Jun-2013  dholland ulfs_dir.h has been emptied; remove it.
 1.228 08-Jun-2013  dholland Stick LFS_ in front of IFMT, IFIFO, IFREG, etc. so as not to conflict
with the UFS copies of these symbols. (Which themselves ought to have
UFS_ stuck on.)
 1.227 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.226 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.225 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.224 16-Feb-2012  perseant branches: 1.224.2;
Pass t_renamerace and t_rmdirrace tests.

Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.

Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.

Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace creates a lot of inode blocks on a tiny empty disk).
 1.223 02-Jan-2012  perseant branches: 1.223.2;

* Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.222 11-Jul-2011  hannken branches: 1.222.2; 1.222.6;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.221 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.220 03-Apr-2011  rmind branches: 1.220.2;
- Use offsetof() in VOPARG_OFFSETOF() instead of re-implementing it.
- Remove VDESC_NOMAP_VPP and VDESC_VPP_WILLRELE.
- Remove VRELEL_NOINACTIVE and VRELEL_ONHEAD.
 1.219 02-Apr-2011  rmind Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.
 1.218 23-Mar-2011  rmind G/C count_lock_queue (unused for 12 years)
 1.217 21-Jul-2010  hannken branches: 1.217.2;
Make holding v_interlock mandatory for callers of vget().

Announced some time ago on tech-kern.
 1.216 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.215 16-Feb-2010  mlelstv branches: 1.215.2;
Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.214 07-Aug-2009  wiz branches: 1.214.2;
Add missing parenthesis in #ifdef LFS_USE_B_INVAL.
From Henning Petersen in PR 41841.
 1.213 02-Jun-2008  ad branches: 1.213.8; 1.213.18; 1.213.22;
Use atomics to maintain v_usecount.
 1.212 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.211 28-Apr-2008  martin branches: 1.211.2;
Remove clause 3 and 4 from TNF licenses
 1.210 27-Mar-2008  ad branches: 1.210.2; 1.210.4;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.209 15-Feb-2008  ad branches: 1.209.6;
The buffer LOCKED flag need not be under the protection of bufcache_lock,
BUSY is enough.
 1.208 27-Jan-2008  pooka Replace vrelel() 010101-mania with a flags parameter. However,
leave flags unimplemented for a while (no change in functionality).
 1.207 02-Jan-2008  ad Merge vmlocking2 to head.
 1.206 10-Oct-2007  ad branches: 1.206.4; 1.206.6; 1.206.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.205 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.204 09-Aug-2007  pooka branches: 1.204.2; 1.204.4;
Instead of having lfs muck directly about with vnode free lists,
introduce vrele2(), which allows releasing vnodes the way lfs
sometimes wants it:
+ without calling inactive
+ inserting the vnode at the head of the freelist (this is a very
questionable optimization that isn't even enabled by default,
but I went along with the same semantics for now)
 1.203 29-Jul-2007  ad branches: 1.203.4; 1.203.6;
It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.202 12-Jul-2007  rmind branches: 1.202.2;
Implementation of per-CPU work-queues support for workqueue(9) interface.
WQ_PERCPU flag for workqueue and additional argument for workqueue_enqueue()
to assign a CPU might be used. Notes:
- For now, a list is used for workqueue_queue, which is non-optimal,
and will be replaced with an array indexed by CPU ID.
- The data structures should be changed to be cache-friendly.

Reviewed by: <yamt>, <tech-kern>
 1.201 30-Jun-2007  pooka Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.200 16-May-2007  perseant Change references to SEGM_W_DIROPS to SEGM_CKP, and replace the logic that
formerly used SEGM_W_DIROPS in lfs_segwrite() appropriately. This prevents
a problem in which processes could get stuck in "buffers" sleep forever.
 1.199 17-Apr-2007  perseant Install a new sysctl, vfs.lfs.ignore_lazy_sync, which causes LFS to ignore
the "smooth" syncer, as if vfs.sync.*delay = 0, but only for LFS. The
default is "on", i.e., ignore lazy sync.

Reduce the amount of polling/busy-waiting done by lfs_putpages(). To
accomplish this, copied genfs_putpages() and modified it to indicate which
page it was that caused it to return with EDEADLK. fsync()/fdatasync()
should no longer ever fail with EAGAIN, and should not consume huge
quantities of cpu.

Also, try to make dirops less likely to be written as the result of a
VOP_PUTPAGES(), while ensuring that they are written regularly.
 1.198 04-Mar-2007  christos branches: 1.198.2; 1.198.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.197 23-Feb-2007  perseant Reverse the order of searching the vnode list in lfs_writevnodes(). This
should speed up e.g. "chown -R" on LFS filesystems; e.g. it shows a 100%
increase in the 'seq_stat' column of bonnie++.
 1.196 21-Dec-2006  yamt branches: 1.196.2;
merge yamt-splraiseipl branch.

- finish implementing splraiseipl (and makeiplcookie).
http://mail-index.NetBSD.org/tech-kern/2006/07/01/0000.html
- complete workqueue(9) and fix its ipl problem, which is reported
to cause audio skipping.
- fix netbt (at least compilation problems) for some ports.
- fix PR/33218.
 1.195 16-Nov-2006  christos branches: 1.195.2; 1.195.4;
__unused removal on arguments; approved by core.
 1.194 20-Oct-2006  reinoud Replace the LIST structure mp->mnt_vnodelist with a TAILQ structure since all
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earliest last.

An extra benefit is the removal of the ugly hack from the Berkeley days on
LFS.

In the process, I've also replaced the various hand-written loop variations
with the TAILQ_FOREACH() macro.
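
The ordering difference described in the entry above can be sketched in userland with <sys/queue.h>. This is an illustration under assumptions, not the kernel code: struct demo_node and both functions are hypothetical, and it assumes the old list was built by head insertion and the new queue by tail insertion. A LIST filled with LIST_INSERT_HEAD() is walked newest-first, while a TAILQ filled with TAILQ_INSERT_TAIL() is walked oldest-first.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a vnode: an id plus both kinds of linkage. */
struct demo_node {
	int id;
	LIST_ENTRY(demo_node) l_link;
	TAILQ_ENTRY(demo_node) t_link;
};

LIST_HEAD(demo_list, demo_node);
TAILQ_HEAD(demo_tailq, demo_node);

/* Head-insert nodes[0..n-1] into a LIST; record traversal order in out[]. */
void
demo_list_order(struct demo_node *nodes, size_t n, int *out)
{
	struct demo_list head;
	struct demo_node *np;
	size_t i;

	assert(nodes != NULL && out != NULL);
	LIST_INIT(&head);
	for (i = 0; i < n; i++)
		LIST_INSERT_HEAD(&head, &nodes[i], l_link);
	i = 0;
	LIST_FOREACH(np, &head, l_link)
		out[i++] = np->id;
}

/* Tail-insert nodes[0..n-1] into a TAILQ; record traversal order in out[]. */
void
demo_tailq_order(struct demo_node *nodes, size_t n, int *out)
{
	struct demo_tailq head;
	struct demo_node *np;
	size_t i;

	assert(nodes != NULL && out != NULL);
	TAILQ_INIT(&head);
	for (i = 0; i < n; i++)
		TAILQ_INSERT_TAIL(&head, &nodes[i], t_link);
	i = 0;
	TAILQ_FOREACH(np, &head, t_link)
		out[i++] = np->id;
}
```

With three nodes inserted in the order 1, 2, 3, the LIST walk visits them in reverse and the TAILQ walk visits them in insertion order.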
 1.193 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.192 04-Oct-2006  christos fix empty if
 1.191 28-Sep-2006  perseant Use lockstatus instead of a homebrewed locking system to control
LFCNWRAPSTOP and LFCNWRAPGO.

Be less verbose about the various looping checks: use log() rather than
printf(), and only log anything if we are really looping ("count = 2" is
not an error condition).

Allow dirops sleeping on available space to be interruptible.
 1.190 02-Sep-2006  christos branches: 1.190.2; 1.190.4;
remove impossible test
 1.189 01-Sep-2006  perseant Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.188 20-Jul-2006  perseant Note partial segments that are written by the cleaner, to help out the
roll-forward agent.
 1.187 20-Jul-2006  perseant Loop on the check for lfs_nowrap, so we don't allow a process to squeeze by.
 1.186 20-Jul-2006  perseant Don't try to write all the vnodes, when the cleaner needs a vnode to be
recycled.
 1.185 29-Jun-2006  perseant Don't wake up the cleaner if the filesystem is unwrappable, and fix the
compatibility fcntls.

Also includes one-line fixes for an MP locking bug and a zero-length FINFO
problem that manifested during testing.
 1.184 24-Jun-2006  perseant Change LFCNWRAP{STOP,GO} to make them more suitable for snapshotting; in
particular, the caller can now choose whether to wait for the condition
to be met, and if the caller of LFCNWRAPSTOP dies or otherwise closes
the descriptor, the filesystem is started again. Updated the ckckp
regression test to use the new semantics.

dump_lfs(8) now uses the fcntls to implement LFS-style snapshotting through
the -X flag, addressing PR#33457 albeit not using fss(4). Fixed a couple
other problems with dump_lfs that manifested themselves during testing.
 1.183 23-Jun-2006  yamt fix a simonb-timecounters regression.
the precision of getnanotime() is not suitable for file timestamps.
esp. when it's nfs-exported.

- introduce vfs_timestamp().
(the name is from freebsd. currently merely a wrapper of nanotime())
- for ufs-like filesystems, use it rather than getnanotime().

XXX check other filesystems.
 1.182 07-Jun-2006  kardel branches: 1.182.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.181 20-May-2006  perseant Fix a bug in which FINFOs were written with a version number of zero.
Add assertions and add this to the DEBUG fip test in lfs_writeseg.
 1.180 18-May-2006  perseant branches: 1.180.2;
Break out the finfo array manipulation code into two new functions,
lfs_acquire_finfo() and lfs_release_finfo(). Add a debugging check
for zero-length finfo arrays in the segment summary to avoid future
regressions.
 1.179 14-May-2006  elad integrate kauth.
 1.178 12-May-2006  perseant Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.

Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.177 01-May-2006  perseant Don't ever partially write dirops, even if we need the cleaner to run.
This increases the chances of the "no clean segments" panic slightly,
but allows us to run the ckckp regression test successfully to completion.
 1.176 30-Apr-2006  perseant Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are included as well.
 1.175 22-Apr-2006  perseant Regression test improvements:

Move the stop for LFCNWRAPSTOP to the point at which writing at segment 0
is really about to commence, since this is what the test expects (and
incidentally what a snapshotting utility wants as well).

More correctly reconstruct the on-disk state at every checkpoint, rather
than relying on the entire state at the point of wrapping to be accurate
(that is only true the first time we wrap). Add a "make abort" target to
make rerunning the test more convenient when it has failed and we're done
analyzing the failure.
 1.174 17-Apr-2006  perseant Introduce two fcntl calls that freeze the filesystem right at the point
where segment 0 is being considered for writing. This allows for automated
checkpoint validity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.

Include a regression test that does such scanning.

When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small). Discovered by using the aforementioned
regression test.
 1.173 13-Apr-2006  perseant Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.

Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.172 07-Apr-2006  perseant Several minor bug fixes:

* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.171 24-Mar-2006  perseant Improvements to LFS's paging mechanism, to wit:

* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.

* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.170 17-Mar-2006  tls From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.169 04-Jan-2006  yamt branches: 1.169.2; 1.169.4; 1.169.6; 1.169.8; 1.169.10;
- add simple functions to allocate/free a buffer for i/o.
- make bufpool static.
 1.168 11-Dec-2005  christos branches: 1.168.2;
merge ktrace-lwp.
 1.167 26-Sep-2005  yamt always use nanotime rather than time.
it's bad to mix nanotime and time because it sometimes
makes timestamps go backwards.
 1.166 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.165 19-Aug-2005  christos 64 bit inode changes.
 1.164 29-May-2005  christos branches: 1.164.2;
- sprinkle const
- avoid shadow variables.
 1.163 23-Apr-2005  perseant Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing. Tested in both
directions, and everything appears to work happily, but ymmv.
 1.162 19-Apr-2005  perseant Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.161 18-Apr-2005  perseant Check the to-be-on-disk consistency of directories as well (correct a typo
in an earlier commit).
 1.160 14-Apr-2005  perseant Keep track of the highest block held by an LFS inode, so that we can
be assured that the last byte of a file is always allocated. Previously
a file extension could cause the filesystem to be flushed, writing an
inconsistent inode to disk. Although this condition would be corrected
the next time blocks were written to disk, an intervening crash would leave
the filesystem in an inconsistent state, leaving fsck_lfs to complain
of an inode "partially truncated".
 1.159 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.158 08-Mar-2005  perseant branches: 1.158.2;
Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.157 26-Feb-2005  perry nuke trailing whitespace
 1.156 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
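
The bfree reporting rule in the entry above (never below zero, never above lfs_dsize) amounts to a clamp. This is a minimal sketch under assumptions, not the lfs_statvfs() code: demo_report_bfree() and its parameters are hypothetical, and the counters are assumed to be signed 64-bit block counts.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp a possibly negative free-block count to
 * the range [0, dsize] before reporting it to userland.
 */
int64_t
demo_report_bfree(int64_t bfree, int64_t dsize)
{
	assert(dsize >= 0);
	if (bfree < 0)
		return 0;
	if (bfree > dsize)
		return dsize;
	return bfree;
}
```

Keeping the internal counter signed while clamping only the reported value lets internal code still see the deficit, and userland never sees a nonsensical figure.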
 1.155 18-Sep-2004  yamt branches: 1.155.4; 1.155.6;
change some members of struct buf from long to int.
ride on 2.0H.
 1.154 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.153 19-May-2004  yamt lfs_cluster_aiodone: turn an invariant condition into an assertion.
 1.152 09-Mar-2004  yamt branches: 1.152.4;
calculate data checksum inline.
 1.151 09-Mar-2004  yamt use correct segment size. this fixes memory corruption when using lfsv1.
 1.150 29-Jan-2004  yamt lfs_update_single: add an assertion.
 1.149 28-Jan-2004  yamt eliminate tricky usages of VOP_STRATEGY which are (no longer?) necessary.
 1.148 25-Jan-2004  hannken Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.147 10-Jan-2004  yamt store an i/o priority hint in struct buf for buffer queue discipline.
 1.146 17-Dec-2003  yamt set VBWAIT when waiting v_numoutput to be drained.
 1.145 17-Dec-2003  yamt remove a redundant substitution.
 1.144 04-Dec-2003  yamt use b_private rather than b_saveaddr.
XXX LFS_USE_B_INVAL
 1.143 07-Nov-2003  yamt - tweak lfs_update_single()'s prototype so that it can be used by
roll-forward code.
- reduce code duplication using the above in update_meta()
this also fixes fragment accounting.
 1.142 25-Oct-2003  christos Fix uninitialized variable warnings.
 1.141 18-Oct-2003  yamt be more strict about sa->vp.
(make sure the last lfs_updatemeta in lfs_putpages takes effect.)
 1.140 18-Oct-2003  simonb Remove assigned-to but otherwise unused variable.
 1.139 17-Oct-2003  yamt add comments and tweak code a little for readability.
(no behaviour changes)
 1.138 14-Oct-2003  yamt remove a redundant definition of LFS_MAX_ACTIVE.
 1.137 08-Oct-2003  yamt - a comment.
- bcopy -> memcpy
- increase 'p' only when needed.
 1.136 03-Oct-2003  yamt assertions.
 1.135 03-Oct-2003  yamt reassignbuf() when lfs_writeseg() takes away B_DELWRI.
 1.134 03-Oct-2003  yamt when inactivating segments, compare segment numbers correctly.
 1.133 29-Sep-2003  yamt remove redundant prototypes.
 1.132 07-Sep-2003  yamt - buffer cache MP locks.
- avoid changing buffer state on the free queue.
 1.131 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.130 30-Jul-2003  yamt using normal bufcache buffer for cluster buffer head.
 1.129 23-Jul-2003  yamt KNF.
 1.128 12-Jul-2003  yamt - wrap long lines.
- remove a mysterious blank line.
 1.127 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.126 02-Jul-2003  yamt use queue.h macros.
 1.125 02-Jul-2003  yamt - add new functions, lfs_writer_enter/leave, and use them instead of
duplicated code fragments.
- add an assertion.
 1.124 29-Jun-2003  fvdl branches: 1.124.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.123 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.122 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.121 18-May-2003  yamt make is_sequential a callback in order to achieve better lfs write clustering.

since lfs always rewrites blocks into the new segment,
the current on-disk place of the block doesn't affect write clustering.

ok'ed by Konrad Schroder.
 1.120 23-Apr-2003  perseant Make LFS work better (though still not "well") as an NFS-exported
filesystem (and other things that needed to be fixed before the tests
would complete), to wit:

* Include the fs ident in the filehandle; improve stale filehandle checks.

* Change definition of blksize() to use the on-dinode size instead of
the inode's i_size, so that fsck_lfs will work properly again.

* Use b_interlock in lfs_vtruncbuf.

* Postpone dirop reclamation until after the seglock has been released,
so that lfs_truncate is not called with the segment lock held.

* Don't loop in lfs_fsync(), just write everything and wait.

* Be more careful about the interlock/uobjlock in lfs_putpages: when we
lose this lock, we have to resynchronize dirtiness of pages in each
block.

* Be sure to always write indirect blocks and update metadata in
lfs_putpages; fixes a bug that caused blocks to be accounted to the
wrong segment.
 1.119 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.118 01-Apr-2003  yamt add assertions and a debug check.
 1.117 28-Mar-2003  fvdl The checkpoint loop always used (multiples of) lfs_sepb as the number
of segments to mark. However, this may be much more than lfs_nseg.

Originally this wasn't a big problem, since only the structures in the
diskblock were changed, but nowadays there's a mirror of the segflags
in the in-core superblock. This problem caused the code to walk
way past the end of that allocated area, causing memory corruption
in other kernel structures. So, use lfs_nseg as the maximum, as it should be.

While here, simplify the loop; it had become an obfuscated piece of
code over time.
 1.116 28-Mar-2003  perseant Add a sleeper count, to prevent the cleaner from panicking the kernel
when the filesystem is unmounted, relocking the Ifile when its lock is
draining. (We can't use vfs_busy() since the process is sleeping for a
good long time.) Clean up / organize lfs.h, while I'm here.

In lfs_update_single, assert that disk addresses are either negative, or
are still positive when converted to int32_t, to prevent recurrence of a
negative/positive block problem.
 1.115 21-Mar-2003  perseant KNF (space after keywords).
 1.114 21-Mar-2003  perseant Use VONWORKLST as a heuristic for vnode emptiness, rather than exhaustively
checking the memq.

Take greater care not to dirty the Ifile vnode when unmounting the filesystem.
This should fix a "(vp->v_flag & VONWORKLST) == 0" assertion panic in vgonel
that could occur when unmounting.

Do not allow the Ifile to be mapped for writing.
 1.113 20-Mar-2003  yamt lfs_writevnodes:
in the case of "starting over", kick lfs_writeseg
in order to avoid deadlock in check_dirty.
 1.112 20-Mar-2003  perseant Don't break out of Ifile-writing loop in lfs_segwrite until nothing is left.
Note however that blocks can be added to the Ifile even when the segment
block is held because of inodes' atime. Do not panic with "dirty blocks"
if these blocks are present.
 1.111 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.110 15-Mar-2003  kristerw ISO C requires a statement after a label.
 1.109 11-Mar-2003  perseant - Get rid of unused #ifdefs LFS_NO_PAGEMOVE and LFS_MALLOC_SUMMARY (both
always true) and accompanying dead code.

- When constructing write clusters in lfs_writeseg, if the block we are
about to add is itself a cluster from GOP_WRITE, don't put a cluster
in a cluster, just write the GOP_WRITE cluster on its own. This seems
to represent a slight performance gain on my test machine.

- Charge someone's rusage for writes on LFSes. It's difficult to tell
who the "right" process to charge is; just charge whoever triggered
the write.
 1.108 08-Mar-2003  perseant Take away "#ifdef LFS_UBC".
 1.107 08-Mar-2003  perseant Add an lfs_strategy() that checks to make sure we're not trying to read
where the cleaner is trying to write, instead of tying up the "live"
buffers (or pages).

Fix a bug in the LFS_UBC case where oversized buffers would not be
checksummed correctly, causing uncleanable segments.

Make sure that wakeup(fs->lfs_iocount) is done if fs->lfs_iocount is 1
as well as 0, since we wait in some places for it to drop to 1.

Activate all pages that make it into lfs_gop_write without the segment
lock held, since they must have been dirtied very recently, even if
PG_DELWRI is not set.
 1.106 04-Mar-2003  perseant Make sure we hold the uobjlock when checking for dirty pages, in lfs_vflush.
Note that pages can become dirty without our knowing it, anyway; don't
panic if that happens.
 1.105 02-Mar-2003  perseant Account SEGUSE_ACTIVE correctly so that the automatic segment cleaning
actually happens.

Add a new fcntl call that will write the minimum necessary to checkpoint
(i.e., for on-disk directory structure to be consistent, not including
updates to file data) so that the cleaner can clean segments more quickly
without sacrificing three-way commit for cleaning.
 1.104 23-Feb-2003  perseant Fix a buffer overflow bug in the LFS_UBC case that manifested itself
either as a mysterious UVM error or as "panic: dirty bufs". Verify
maximum size in lfs_malloc.

Teach lfs_updatemeta and lfs_shellsort about oversized cluster blocks from
lfs_gop_write.

When unwiring pages in lfs_gop_write, deactivate them, under the theory
that the pagedaemon wanted to free them last we knew.
 1.103 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.102 19-Feb-2003  yamt acquire v_interlock before calling VOP_PUTPAGES.
 1.101 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.100 05-Feb-2003  pk Make the buffer cache code MP-safe.
 1.99 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.98 29-Jan-2003  yamt don't use daddr_t for segment summary since it's an on-disk structure.
 1.97 29-Jan-2003  simonb Remove variable that is only assigned to but not referenced.
 1.96 27-Jan-2003  yamt make these compilable with lfs debug options.
(follow daddr_t change)

XXX maybe segment number should be 64bit.
 1.95 27-Jan-2003  kleink Further printf format fixes in the wake of daddr_t.

Note that PRI?64 and long long int arguments aren't made for each other,
nor are %lld and int64_t arguments.
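
The pairing rule in the note above can be shown in a short user-space sketch. The function names are hypothetical; the point is only that an `int64_t` argument goes with `PRId64`, while `%lld` needs an explicit cast to `long long`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* int64_t argument: use the PRId64 macro so the format matches on every ABI. */
static int
format_daddr(char *buf, size_t len, int64_t daddr)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%" PRId64, daddr);
}

/* %lld format: cast the argument to long long explicitly. */
static int
format_daddr_ll(char *buf, size_t len, int64_t daddr)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%lld", (long long)daddr);
}
```

Both forms are portable. The problem arises when an `int64_t` is passed to `%lld` without the cast, because on LP64 ports `int64_t` is typically `long`, not `long long`.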
 1.94 25-Jan-2003  kleink Fix further printf format warnings for DEBUG, in the wake of daddr_t
having changed.
 1.93 25-Jan-2003  tron Use PRId64 instead of hard coding "%lld" to fix build problems under
LP64 ports.
 1.92 25-Jan-2003  tron Fix printf() format strings problems caused by "daddr_t" change.
 1.91 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.90 08-Jan-2003  yamt backout wrong assertions that i added.
 1.89 08-Jan-2003  yamt add assertions.
 1.88 31-Dec-2002  yamt write ifile only when it has dirty buffers.
 1.87 17-Dec-2002  yamt no need for cleaner to hold vnode locks.
cleaner and normal vnode operations are synchronized enough by
seglock/fraglock and buf's B_BUSY-ness.
 1.86 17-Dec-2002  yamt use ufs_daddr_t instead of int where appropriate.
 1.85 14-Dec-2002  yamt in lfs_writefile, check v_type==VNON earlier.
to avoid null dereference with DEBUG_LFS_VERBOSE.
 1.84 13-Dec-2002  yamt save a segment write when doing checkpoint.
 1.83 12-Dec-2002  yamt correct DIAGNOSTIC code for duplicated inodes in a segment and su_nbytes.
 1.82 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.81 22-Sep-2002  jdolecek don't need <sys/conf.h> here
 1.80 06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.79 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.78 24-May-2002  perseant Fix a couple of instances where reassignbuf() was not done at splbio.

Tested on i386.
 1.77 23-May-2002  perseant Back out rev 1.174 of vfs_subr.c, because the splbio() wasn't protecting
enough to be useful, and broadening it so that it did would have meant
that operations possibly requiring synchronous disk activity would have
to be done in splbio(). This clearly was not going to work.

Worked around this in the LFS case by having lfs_cluster_callback put an
extra hold on the vnode before calling biodone(), and taking the hold
off without HOLDRELE's problematic list swapping. lfs_vunref() will take
care of that---in thread context---on the next write if need be.

Also, ensure that the list walking in lfs_{writevnodes,segunlock,gather}
takes into account the possibility that the list may change
underneath it (possibly because it itself deleted an element).

Tested on i386, test-compiled on alpha.
 1.76 20-May-2002  perseant branches: 1.76.2;
Protect v_freelist with splbio(), since HOLDRELE can be called in
interrupt context (through brelvp). (LFS may be the only subsystem
affected by this problem.)

Tested on i386.
 1.75 17-May-2002  perseant use macros from <sys/queue.h>
 1.74 14-May-2002  perseant branches: 1.74.2;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.73 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.72 08-Nov-2001  lukem add RCSID
 1.71 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.70 26-Jul-2001  jdolecek branches: 1.70.2; 1.70.4;
lfs_writeseg(): make el_size a size_t (cosmetic only, no functional change)
 1.69 13-Jul-2001  perseant Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.68 30-May-2001  mrg branches: 1.68.2; 1.68.4;
use _KERNEL_OPT
 1.67 09-Jan-2001  joff branches: 1.67.2;
If DIAGNOSTIC and the segment writer gets a badly sized buffer, panic()
instead of silently corrupting the filesystem.
 1.66 03-Dec-2000  perseant Get rid of some old unnecessary code that cleared B_NEEDCOMMIT from buffers in
lfs_writeseg (possibly after they had been freed).

If MALLOCLOG is defined, make lfs_newbuf and lfs_freebuf pass along the
caller's file and line to _malloc and _free.
 1.65 30-Nov-2000  jdolecek only include opt_ddb.h for !LKM
 1.64 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.63 27-Nov-2000  perseant If LFS_DO_ROLLFORWARD is defined, roll forward from the older checkpoint
on mount, through the newer checkpoint and on through any newer
partial-segments that may have been written but not checkpointed because
of an intervening crash.

LFS_DO_ROLLFORWARD is not defined by default.
 1.62 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.61 12-Nov-2000  perseant Do not needlessly dirty segment table blocks during lfs_segwrite,
preventing needless disk activity when the filesystem is idle. (PR #10979.)
 1.60 12-Nov-2000  toshii Fix obsolete comments in lfs_writeinode since rev. 1.27.
New comments are mostly from perseant, with my additions.
 1.59 09-Sep-2000  perseant oops
 1.58 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.57 09-Sep-2000  perseant Fix a buffer-cache corrupting bug in lfs_writeseg, where brelse could
be improperly used on an already-queued buffer.
 1.56 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero at fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.55 04-Jul-2000  perseant Fix errors observed while trying to fill the filesystem with yesterday's
fixes:

- Write copies of bfree and avail in the CLEANERINFO block, so the
cleaner doesn't have to guess which superblock has the current
information (if indeed any do).

- Tighten up accounting of lfs_avail (more needs to be done).

- When cleansing indirect blocks of UNWRITTEN, make sure not to mark
them clean, since they'll need to be rewritten later.
 1.54 03-Jul-2000  perseant i_lfs_effnblks fixes. Put debugging printfs under #ifdef DEBUG_LFS.
 1.53 03-Jul-2000  perseant Allow the number of free segments reserved for the cleaner to be
parametrized in the filesystem, defaulting to MIN_FREE_SEGS = 2 but set
to something more reasonable at newfs_lfs time.

Note the number of blocks that have been scheduled for writing but which
are not yet on disk in an inode extension, i_lfs_effnblks. Move
i_ffs_effnlink out of the ffs extension and onto the main inode, since
it's used all over the shared code and the lfs extension would clobber
it.

At inode write time, indirect blocks and inode-held blocks of inodes
that have i_lfs_effnblks != i_ffs_blocks are cleansed of UNWRITTEN disk
addresses, so that these never make it to disk.
 1.52 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.51 27-Jun-2000  perseant From John Evans <jevans@cray.com>: use datosn() to convert to segment
number, when remarking the current segment ACTIVE. See PR #10463.
 1.50 22-Jun-2000  perseant Update lfs_vunref for the fact that now a vnode can be locked with no
references (locked for VOP_INACTIVE at the end of vrele) and it's okay.
Check the return value of lfs_vref where appropriate.
Fixes PR #s 10285 and 10352.
 1.49 06-Jun-2000  perseant branches: 1.49.2;
Protect inode free list with seglock, instead of separate lock, so that
the head of the inode free list (on the superblock) always matches the
rest of the free list (in the ifile).

Protect lfs_fragextend with seglock, to prevent the segment byte count
fudging from making its way to disk.

Don't try to inactivate dirop vnodes that are still in the middle of
their dirop (may address PR#10285).
 1.48 31-May-2000  fredb Make this build. (Balance parentheses.)
 1.47 31-May-2000  perseant update for IN_ACCESSED changes
 1.46 27-May-2000  perseant branches: 1.46.2;
Prevent dirops from getting around lfs_check and wedging the buffer cache.
All the dirop vnops now mark the inodes with a new flag, IN_ADIROP, which
is removed as soon as the dirop is done (as opposed to VDIROP which stays
until the file is written). To address one issue raised in PR#9357.
 1.45 19-May-2000  thorpej NULL != 0
 1.44 10-May-2000  perseant stop vnode reference leak introduced in patch to PR#9994
 1.43 05-May-2000  perseant Change the way LFS does block accounting, from trying to infer from the
buffer cache flags, to marking the inode and/or indirect blocks with a
special disk address UNWRITTEN==-2 when a block is accounted for. (This
address is never written to disk, but only used in-core. This is essentially
the same method of block accounting as on the UBC branch, where the buffer
headers don't exist.) Make sure that truncation is handled properly,
especially in the case of holey files.

Fixes PR#9994.
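
A sketch of the UNWRITTEN convention follows. The value -2 comes from the message above; the helper names are hypothetical, and replacing UNWRITTEN with 0 in the on-disk copy is an assumption made for this example (0 being treated here as "no block").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-core only: block is accounted for but has no disk location yet. */
#define UNWRITTEN ((int32_t)-2)

/* Count blocks that are accounted for but not yet assigned an address. */
static size_t
count_unwritten(const int32_t *addrs, size_t n)
{
	size_t count = 0;

	assert(addrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (addrs[i] == UNWRITTEN)
			count++;
	return count;
}

/*
 * Build the on-disk copy of a block-address array, making sure the
 * in-core marker never reaches the disk.  (Assumption: it becomes 0.)
 */
static void
cleanse_for_disk(int32_t *disk, const int32_t *incore, size_t n)
{
	assert((disk != NULL && incore != NULL) || n == 0);
	for (size_t i = 0; i < n; i++)
		disk[i] = (incore[i] == UNWRITTEN) ? 0 : incore[i];
}
```

Keeping the marker in the address slot itself means the accounting does not depend on whether a buffer header for the block still exists.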
 1.42 30-Mar-2000  augustss Remove register declarations.
 1.41 13-Mar-2000  soren Fix doubled 'the's in comments.
 1.40 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.39 16-Jan-2000  perseant Fix a problem in my changes of Dec 14th, that prevents removed vnodes
from being inactivated under some conditions. Removed vnodes are now
inactivated when the VDIROP flag is cleared, and to prevent block
accounting problems this clearing has been postponed until
lfs_segunlock.
 1.38 14-Jan-2000  perseant Better handling of various combinations of cleaning, vnode flushing, and
dirop writing. In particular, lfs_writevnodes now writes all buffers from
a flushed vnode whether cleaning or not, and the same with the Ifile; and
lfs_segwrite does not attempt to write data from other non-cleaning vnodes,
even if a vnode is being flushed.
 1.37 03-Dec-1999  perseant Handle the case of a vnode flush while dirops are active correctly in
lfs_segwrite. Also, make sure a flush is called in SET_DIROP before sleeping
on its results. Addresses PR #8863.
 1.36 17-Nov-1999  perseant Fix spllevel problem with superblock exclusion and with segment write throttle.
May address PR#8383.
 1.35 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.34 12-Nov-1999  perseant Back out my patch of the 8th (to address unreferenced inode problem).
Apparently this needs more thought.
 1.33 09-Nov-1999  perseant If ifile blocks were written before dirops were complete, and then the
system crashed, inodes could be allocated that were not referenced. (Though
not a serious problem, it evidences itself in phase 4 of fsck_lfs.) Fix
this by marking if_daddr with UNASSIGNED before the inodes are actually
written; at mount time the ifile is checked for UNASSIGNED entries and
any that are found are linked back into the free list. (The latter
functionality should move into the roll-forward agent when it materializes.)
 1.32 06-Nov-1999  perseant branches: 1.32.2;
Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.31 01-Oct-1999  mycroft branches: 1.31.2; 1.31.4; 1.31.6;
Fix printf() formats.
 1.30 03-Sep-1999  perseant Make changes that will allow an LFS filesystem to be used as the root
filesystem. In particular,

- Fix mknod deadlock, described in PR 8172.
- Enable lfs_mountroot.
- Make lfs_writevnodes treat filesystems mounted on lfs device nodes properly,
by flushing that device rather than trying to add blocks to the device inode.

This, in combination with lfs boot blocks, will allow operation of an all-lfs
system.
 1.29 08-Jul-1999  wrstuden Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
 1.28 17-Jun-1999  tls squash some compiler warnings on debug printfs by casting to int
 1.27 15-Jun-1999  perseant Minor changes to the segment live bytes calculation. In particular, fixed
a bug in fragment extension that could run the count negative. Also, don't
overcount for inodes, and don't count segment summaries. Thus, for empty
segments the live bytes count should now be exactly zero.
 1.26 12-Apr-1999  perseant Make sure that the wakeup occurs for vnodes that lfs_update might be sleeping
on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which have dirty
buffers), by marking them with the appropriate flag if dirtybuffers were added
while the write was in progress.
 1.25 12-Apr-1999  perseant Better checking for held inode locks in lfs_fastvget, for a number of error
conditions. Also change the default setting of lfs_clean_vnhead to 0, which
seems to make the locking problems go away (although this is difficult to
test as I can't reliably reproduce them).
 1.24 12-Apr-1999  perseant Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed,
then immediately reloaded, their dinodes were located in an inode block
which was not on disk at the advertized location, nor in the cache (although
it would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.
 1.23 30-Mar-1999  perseant branches: 1.23.2;
Add initialization to quell compiler warning (only on some platforms?)
 1.22 30-Mar-1999  perseant Move variable initialization to the top of lfs_vflush
 1.21 29-Mar-1999  perseant lfs_truncate calls vinvalbuf to invalidate all currently-held buffers, which
in turn forces a flush of the vnode, whether or not it is involved in a dirop.
(This can happen during a remove or rmdir, when the directory is shrunk.)
Because of the nature of dirops, however, flushing a vnode involved in a dirop
is disallowed (and was marked with a panic). This patch has lfs_truncate
call a specialized vinvalbuf that only invalidates buffers following the new
end-of-file, and thus does not require a flush. Also the panic is demoted,
in case I missed any other path to lfs_vflush.
 1.20 25-Mar-1999  perseant Make sysctl variable lfs_clean_vnhead do what it was supposed to do,
namely, toggle whether vnodes loaded only for cleaning (as opposed to
normal filesystem use) are freed to the *head* of the vnode free list,
rather than the tail. This should avoid a possible cache flushing
effect, if the cleaner cleans a segment containing a large number of
live inodes.
 1.19 25-Mar-1999  perseant Fixes to make dirops and lfs_vflush play together well. In particular,
if we are short on vnodes, lfs_vflush from another process can grab a
vnode that lfs_markv has already processed but not yet written; but
lfs_markv holds the seglock. When lfs_vflush gets around to writing it,
the context for copyin is gone. So, now lfs_markv calls copyin itself,
rather than having lfs_writeseg do it.
 1.18 25-Mar-1999  perseant Lock buffers with B_BUSY between data checksum calculation and write, so
some other process doesn't change the data after it was checksummed.
 1.17 25-Mar-1999  perseant Change lfs_sb_cksum to use offsetof() instead of an inlined version.

Fix lfs_vref/lfs_vunref to ignore VXLOCKed vnodes that are also being
flushed.

Improve the debugging messages somewhat.
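
The offsetof() approach to a superblock checksum can be sketched as below. The structure layout and the simple additive sum are stand-ins chosen for illustration, not the actual LFS superblock or checksum function; only the use of offsetof() to bound the checksummed region reflects the change described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical superblock: the checksum covers every field before cksum. */
struct sb_sketch {
	uint32_t magic;
	uint32_t version;
	uint32_t size;
	uint32_t cksum;
	uint32_t pad[4];     /* not covered by the checksum */
};

/* Sum the 32-bit words that precede the cksum field. */
static uint32_t
sb_cksum_sketch(const struct sb_sketch *sb)
{
	size_t len = offsetof(struct sb_sketch, cksum);
	uint32_t word, sum = 0;

	assert(sb != NULL);
	for (size_t off = 0; off + sizeof(word) <= len; off += sizeof(word)) {
		memcpy(&word, (const unsigned char *)sb + off, sizeof(word));
		sum += word;
	}
	return sum;
}
```

With offsetof() the covered length follows the structure definition automatically if fields are added before the checksum field.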
 1.16 25-Mar-1999  perseant clean up unused/required #ifdefs
 1.15 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.14 09-Nov-1998  mycroft GC the B_CACHE bit.
 1.13 23-Oct-1998  thorpej Use DINODE_SIZE rather than sizeof(struct dinode).
 1.12 11-Sep-1998  pk PR#6032: define fixed sized on-disk superblock structure.
 1.11 08-May-1998  kleink Fix some arithmetics lossage on typeless pointers.
 1.10 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.9 13-Jun-1997  pk TIMESPEC_TO_TIMEVAL => TIMEVAL_TO_TIMESPEC
 1.8 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
	union {
		struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
		struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
	} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.7 12-Oct-1996  christos revert previous kprintf changes
 1.6 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.5 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.4 09-Feb-1996  christos lfs prototypes
 1.3 21-Aug-1994  cgd C syntax fix, and syscall args style (For later.)
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.23.2.10 20-Jan-2000  he Pull up revision 1.39 (requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.23.2.9 15-Jan-2000  he Pull up revision 1.38 (requested by perseant):
Handle flushing a vnode during cleaning, and cleaning the Ifile,
more correctly, avoiding possible disk corruption in some cases.
 1.23.2.8 15-Jan-2000  he Pull up revision 1.30 (requested by perseant):
Address problems related to using an LFS filesystem as the root
filesystem, including mknod hangs. Fixes PR#8172 and PR#9072.
 1.23.2.7 18-Dec-1999  he Pull up revision 1.37 (requested by perseant):
Handle the case of a vnode flush while dirops are active correctly
in lfs_segwrite. Also, make sure a flush is called in SET_DIROP
before sleeping on its results. Addresses PR#8863.
 1.23.2.6 17-Dec-1999  he Pull up revision 1.32 (requested by perseant):
Address locking protocol error for inode hash, and make the
maximum number of active dirops a global quantity.
 1.23.2.5 16-Dec-1999  he Pull up revision 1.36 (requested by perseant):
Fix spllevel problem with superblock exclusion and with write
throttle. Addresses PR#8383.
 1.23.2.4 10-Oct-1999  cgd pull up rev 1.31 from trunk (requested by mycroft):
Fix potential overflow of v_usecount and v_writecount (and panics
resulting from this) by widening them to `long'. Mostly affects
systems where maxvnodes>=32768.
 1.23.2.3 03-Sep-1999  he Pull up revision 1.28:
Fix a printf format bug that gives compiler warnings/errors on
64-bit platforms, fixing PR#8241. (perseant)
 1.23.2.2 25-Jun-1999  perry pullup 1.26->1.27 (perseant)
 1.23.2.1 13-Apr-1999  perseant branches: 1.23.2.1.2; 1.23.2.1.4;
Pull-up of changes made to the trunk on Sunday [1.23->1.26], to wit:

Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.

Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).

Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed, then
immediately reloaded, their dinodes were located in an inode block which
was not on disk at the advertized location, nor in the cache (although it
would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.

Better checking for held inode locks in lfs_fastvget, for a number of
error conditions. Also change the default setting of lfs_clean_vnhead to
0, which seems to make the locking problems go away (although this is
difficult to test as I can't reliably reproduce them).

Make sure that the wakeup occurs for vnodes that lfs_update might be
sleeping on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which
have dirty buffers), by marking them with the appropriate flag if
dirtybuffers were added while the write was in progress.

Fix block counting during file truncation, if not truncating to zero.

Disallow threshold-initiated cache flush when dirops are active. Also,
make SET_ENDOP use lfs_check instead of inlining most of it.

Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).

Check the superblock version field, and refuse to mount the filesystem if
the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and
boundaries, etc., which would not affect existing fields in the
superblock, but which would drastically affect the filesystem, to be
smoothly integrated at a later date.
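
A minimal sketch of the version check described in the last paragraph above.
The struct, the constant, and the choice of EINVAL as the error are
assumptions made for illustration; the log does not say which error the
mount code returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: highest on-disk format version this code understands. */
#define SKETCH_MAX_KNOWN_VERSION 1u

/* Hypothetical cut-down superblock holding only the version field. */
struct sketch_superblock {
    uint32_t version;
};

/*
 * Returns 0 if the filesystem may be mounted, or an errno value if the
 * superblock advertises a format newer than we know about.
 */
int sketch_check_version(const struct sketch_superblock *sb)
{
    assert(sb != NULL);
    if (sb->version > SKETCH_MAX_KNOWN_VERSION)
        return EINVAL;          /* assumed error code */
    return 0;
}
```

Refusing unknown higher versions lets a later format change reuse the same
superblock fields without old kernels misreading the rest of the disk.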
 1.23.2.1.4.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.23.2.1.2.4 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
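
The change of test from (foo == (daddr_t)-1) to (foo < 0) can be sketched as
follows. The type and macro names are hypothetical stand-ins; only the values
-1 and -2 and the two comparisons come from the message above.

```c
#include <stdint.h>

typedef int32_t daddr_sketch_t;   /* hypothetical stand-in for daddr_t */

#define SKETCH_UNASSIGNED ((daddr_sketch_t)-1)  /* no block at all */
#define SKETCH_UNWRITTEN  ((daddr_sketch_t)-2)  /* accounted for, but no
                                                   fixed disk location yet */

/* Old-style test: recognises only the single magic value -1. */
int sketch_no_location_old(daddr_sketch_t addr)
{
    return addr == SKETCH_UNASSIGNED;
}

/* New-style test: any negative address means "not on disk". */
int sketch_no_location_new(daddr_sketch_t addr)
{
    return addr < 0;
}
```

The old test would treat an "unwritten" block as if it had a real disk
address; the new one covers both magic values with a single comparison.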
 1.23.2.1.2.3 02-Aug-1999  thorpej Update from trunk.
 1.23.2.1.2.2 21-Jun-1999  thorpej Correct a printf format now that vnode flags are an int (in the uvm_vnode
structure).
 1.23.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.31.6.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.31.6.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.31.4.2 15-Nov-1999  fvdl Sync with -current
 1.31.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.31.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.31.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.31.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.31.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.32.2.2 06-Nov-1999  perseant Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.32.2.1 06-Nov-1999  perseant file lfs_segment.c was added on branch comdex-fall-1999 on 1999-11-06 20:33:06 +0000
 1.46.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.49.2.4 03-Feb-2001  he Pull up revisions 1.60-1.62 (requested by perseant):
o Don't write anything if the filesystem is idle (PR#10979).
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
 1.49.2.3 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.49.2.2 28-Jun-2000  perseant pull up active current segment patch from trunk
 1.49.2.1 22-Jun-2000  perseant Pull up lfs_vunref fix from the trunk.
 1.67.2.13 08-Jan-2003  thorpej Oh my aching HEAD.
 1.67.2.12 08-Jan-2003  thorpej Sync with HEAD.
 1.67.2.11 03-Jan-2003  thorpej Sync with HEAD.
 1.67.2.10 19-Dec-2002  thorpej Sync with HEAD.
 1.67.2.9 18-Oct-2002  nathanw Catch up to -current.
 1.67.2.8 01-Aug-2002  nathanw Catch up to -current.
 1.67.2.7 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
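
A self-contained sketch of the macro described above. The structures are
cut down to the one field needed, and every name ending in _sketch is
hypothetical; the real definitions live in the kernel headers.

```c
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures. */
struct proc { int p_pid; };
struct lwp  { struct proc *l_proc; };

/* Hypothetical stand-in for the kernel's notion of the current LWP. */
struct lwp *curlwp_sketch = NULL;

/* Safe to reference even when there is no current LWP. */
#define curproc_sketch \
    ((curlwp_sketch) ? (curlwp_sketch)->l_proc : NULL)

/*
 * Referencing curproc_sketch is always safe; dereferencing the result
 * still needs a NULL check. Returns -1 when there is no current process.
 */
int current_pid_sketch(void)
{
    struct proc *p = curproc_sketch;
    return (p != NULL) ? p->p_pid : -1;
}
```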
 1.67.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.67.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.67.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.67.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.67.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.67.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.68.4.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.68.4.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.68.4.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.68.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.68.4.1 03-Aug-2001  lukem update to -current
 1.68.2.3 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
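
The fsb addressing unit described above can be sketched as a pair of
conversions. The struct layout and function names are hypothetical and are
not the kernel's macros; the only facts taken from the message are that
addresses are counted in fragments and that v1 filesystems keep
1 fsb == 1 sector.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-filesystem parameters. */
struct lfs_sketch {
    uint32_t version;     /* on-disk format version: 1 or 2 */
    uint32_t frag_shift;  /* log2(fragment size in bytes) */
    uint32_t sect_shift;  /* log2(sector size in bytes), e.g. 9 */
};

/* Number of bits separating an fsb address from a sector address. */
static uint32_t fsb_shift_sketch(const struct lfs_sketch *fs)
{
    assert(fs->frag_shift >= fs->sect_shift);
    if (fs->version == 1)
        return 0;                       /* v1: 1 fsb == 1 sector */
    return fs->frag_shift - fs->sect_shift;
}

uint64_t fsb_to_sector_sketch(const struct lfs_sketch *fs, uint64_t fsb)
{
    return fsb << fsb_shift_sketch(fs);
}

uint64_t sector_to_fsb_sketch(const struct lfs_sketch *fs, uint64_t sector)
{
    return sector >> fsb_shift_sketch(fs);
}
```

For example, with 8k fragments on 512-byte sectors the shift is 4, so fsb 3
corresponds to sector 48 on a v2 filesystem and to sector 3 on v1.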
 1.68.2.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.68.2.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.70.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.70.2.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.74.2.3 15-Jul-2002  gehenna catch up with -current.
 1.74.2.2 20-Jun-2002  gehenna catch up with -current.
 1.74.2.1 30-May-2002  gehenna Catch up with -current.
 1.76.2.3 20-Jun-2002  lukem Pull up revision 1.79 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.76.2.2 02-Jun-2002  tv Pull up revision 1.78 (requested by perseant in ticket #135):
Fix a couple of instances where reassignbuf() was not done at splbio.
Tested on i386.
 1.76.2.1 02-Jun-2002  tv Pull up revision 1.77 (requested by perseant in ticket #132):
Back out rev 1.174 of vfs_subr.c, because the splbio() wasn't protecting
enough to be useful, and broadening it so that it did would have meant
that operations possibly requiring synchronous disk activity would have
to be done in splbio(). This clearly was not going to work.
Worked around this in the LFS case by having lfs_cluster_callback put an
extra hold on the vnode before calling biodone(), and taking the hold
off without HOLDRELE's problematic list swapping. lfs_vunref() will take
care of that---in thread context---on the next write if need be.
Also, ensure that the list walking in lfs_{writevnodes,segunlock,gather}
takes into account the possibility that the list may change
underneath it (possibly because it itself deleted an element).
Tested on i386, test-compiled on alpha.
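
The list-walking concern in the last item above follows a common pattern:
capture the successor before the loop body has a chance to unlink the
current element. The sketch below uses a plain singly linked list with
hypothetical names; the kernel lists and the LFS functions themselves are
more involved.

```c
#include <assert.h>
#include <stddef.h>

struct node_sketch {
    int dirty;                  /* nonzero: keep on the list */
    struct node_sketch *next;
};

/*
 * Walk the list and unlink every clean node. The successor is saved
 * before the current node may be removed, so the walk never follows a
 * pointer out of a node that is no longer on the list.
 * Returns the number of nodes removed.
 */
int prune_clean_sketch(struct node_sketch **headp)
{
    struct node_sketch *n, *next, *prev = NULL;
    int removed = 0;

    assert(headp != NULL);
    for (n = *headp; n != NULL; n = next) {
        next = n->next;         /* capture before n may be unlinked */
        if (n->dirty) {
            prev = n;
            continue;
        }
        if (prev != NULL)
            prev->next = next;
        else
            *headp = next;
        n->next = NULL;
        removed++;
    }
    return removed;
}
```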
 1.124.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.124.2.9 08-Mar-2005  skrll Sync with HEAD.
 1.124.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.124.2.7 24-Sep-2004  skrll Sync with HEAD.
 1.124.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.124.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.124.2.4 25-Aug-2004  skrll Sync with HEAD.
 1.124.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.124.2.2 03-Aug-2004  skrll Sync with HEAD
 1.124.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.152.4.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
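
The bfree handling in the list above (a signed internal count, never
reported below zero or above lfs_dsize) amounts to a clamp. This is a
sketch with a hypothetical function name; how lfs_statfs actually computes
the value is not shown in this log.

```c
#include <stdint.h>

/*
 * Clamp a signed free-block count to the range [0, dsize] before
 * reporting it. Without the clamp, a negative count interpreted as an
 * unsigned quantity would show up as an enormous amount of free space.
 */
uint64_t reported_bfree_sketch(int64_t bfree, uint64_t dsize)
{
    if (bfree < 0)
        return 0;
    if ((uint64_t)bfree > dsize)
        return dsize;
    return (uint64_t)bfree;
}
```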

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.155.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.155.4.1 29-Apr-2005  kent sync with -current
 1.158.2.13 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.158.2.12 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.93
sys/ufs/lfs/lfs.h: revision 1.106
sys/ufs/lfs/lfs_vfsops.c: revision 1.209
sys/ufs/lfs/lfs_vnops.c: revision 1.175
sys/ufs/lfs/lfs_segment.c: revision 1.178
Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.
Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.158.2.11 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.171
sys/ufs/lfs/lfs_extern.h: revision 1.81
sys/ufs/lfs/lfs_segment.c: revision 1.177
Don't ever partially write dirops, even if we need the cleaner to run.
This increases the chances of the "no clean segments" panic slightly,
but allows us to run the ckckp regression test successfully to completion.
 1.158.2.10 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.104
sys/ufs/lfs/lfs_vfsops.c: revision 1.206
sys/ufs/lfs/lfs_vnops.c: revision 1.170
sys/ufs/lfs/lfs_extern.h: revision 1.80
sys/ufs/lfs/lfs_segment.c: revision 1.176
sys/ufs/lfs/lfs_inode.c: revision 1.103 via patch
sys/ufs/lfs/lfs_alloc.c: revision 1.90
Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().
A couple of locking fixes are also included.
 1.158.2.9 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_segment.c: revision 1.175
Regression test improvements:
Move the stop for LFCNWRAPSTOP to the point at which writing at segment 0
is really about to commence, since this is what the test expects (and
incidentally what a snapshotting utility wants as well).
More correctly reconstruct the on-disk state at every checkpoint, rather
than relying on the entire state at the point of wrapping to be accurate
(that is only true the first time we wrap). Add a "make abort" target to
make rerunning the test more convenient when it has failed and we're done
analyzing the failure.
 1.158.2.8 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.103
sys/ufs/lfs/lfs_segment.c: revision 1.174
sys/ufs/lfs/lfs_vnops.c: revision 1.168
Introduce two fcntl calls that freeze the filesystem right at the point
where segment 0 is being considered for writing. This allows for automated
checkpoint validity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.
Include a regression test that does such scanning.
When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small). Discovered by using the aforementioned
regression test.
 1.158.2.7 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.102
sys/ufs/lfs/lfs_segment.c: revision 1.173
sys/ufs/lfs/lfs_vnops.c: revision 1.167 via patch
sys/ufs/lfs/lfs_bio.c: revision 1.91
Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.
Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.158.2.6 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_balloc.c: revision 1.60
sys/ufs/lfs/lfs_syscalls.c: revision 1.111
sys/ufs/lfs/lfs_segment.c: revision 1.172
sys/ufs/lfs/lfs_vnops.c: revision 1.163
Several minor bug fixes:
* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.158.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.158
sys/ufs/lfs/lfs_subr.c: revision 1.57
sys/ufs/lfs/lfs_segment.c: revision 1.171
sys/ufs/lfs/lfs.h: revision 1.97
sys/ufs/lfs/lfs_vfsops.c: revision 1.195
sys/ufs/lfs/lfs_extern.h: revision 1.76
Improvements to LFS's paging mechanism, to wit:
* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.
* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.158.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_segment.c: revision 1.170
sys/ufs/lfs/lfs.h: revision 1.96
sys/ufs/lfs/lfs_vfsops.c: revision 1.194
sys/ufs/lfs/lfs_syscalls.c: revision 1.109
From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.158.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.153
sys/ufs/lfs/lfs_debug.c: revision 1.32
sys/ufs/lfs/lfs_alloc.c: revision 1.84
sys/ufs/lfs/lfs_vfsops.c: revision 1.185
sys/ufs/lfs/lfs_segment.c: revision 1.165
64 bit inode changes.
 1.158.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.158.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.164.2.8 27-Feb-2008  yamt sync with head.
 1.164.2.7 04-Feb-2008  yamt sync with head.
 1.164.2.6 21-Jan-2008  yamt sync with head
 1.164.2.5 27-Oct-2007  yamt sync with head.
 1.164.2.4 03-Sep-2007  yamt sync with head.
 1.164.2.3 26-Feb-2007  yamt sync with head.
 1.164.2.2 30-Dec-2006  yamt sync with head.
 1.164.2.1 21-Jun-2006  yamt sync with head.
 1.168.2.1 15-Jan-2006  yamt sync with head.
 1.169.10.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.169.10.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.169.8.3 11-May-2006  elad sync with head
 1.169.8.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.169.8.1 19-Apr-2006  elad sync with head.
 1.169.6.6 03-Sep-2006  yamt sync with head.
 1.169.6.5 11-Aug-2006  yamt sync with head
 1.169.6.4 26-Jun-2006  yamt sync with head.
 1.169.6.3 24-May-2006  yamt sync with head.
 1.169.6.2 11-Apr-2006  yamt sync with head
 1.169.6.1 01-Apr-2006  yamt sync with head.
 1.169.4.3 01-Jun-2006  kardel Sync with head.
 1.169.4.2 22-Apr-2006  simonb Sync with head.
 1.169.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.169.2.1 09-Sep-2006  rpaulo sync with head
 1.180.2.1 19-Jun-2006  chap Sync with head.
 1.182.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.190.4.3 10-Dec-2006  yamt sync with head.
 1.190.4.2 22-Oct-2006  yamt use workqueue for aiodoned.
 1.190.4.1 22-Oct-2006  yamt sync with head
 1.190.2.2 12-Jan-2007  ad Sync with head.
 1.190.2.1 18-Nov-2006  ad Sync with head.
 1.195.4.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.195.2.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.196.2.4 17-May-2007  yamt sync with head.
 1.196.2.3 07-May-2007  yamt sync with head.
 1.196.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.196.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.198.4.1 11-Jul-2007  mjf Sync with head.
 1.198.2.13 01-Oct-2007  ad Make it compile (XXX not correct).
 1.198.2.12 28-Aug-2007  yamt - mark aiodone workqueue MPSAFE.
- make lfs callbacks acquire kernel_lock by themselves.

ok'ed by Andrew Doran.
 1.198.2.11 28-Aug-2007  yamt make this compilable with DEBUG.
 1.198.2.10 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.198.2.9 20-Aug-2007  ad Sync with HEAD.
 1.198.2.8 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.198.2.7 15-Jul-2007  ad Sync with head.
 1.198.2.6 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.198.2.5 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.198.2.4 08-Jun-2007  ad Sync with head.
 1.198.2.3 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.198.2.2 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.198.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.202.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.203.6.2 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
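
The convention described above can be sketched with a cut-down buffer
structure. The struct and function names are hypothetical; only the idea of
recording failure in b_error rather than by setting a bit in b_flags comes
from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for a buffer header. */
struct buf_sketch {
    int b_flags;    /* needs locking the driver should not know about */
    int b_error;    /* owned by whoever completes the I/O request */
};

/* Driver side: report a failed transfer without touching b_flags. */
void io_fail_sketch(struct buf_sketch *bp, int error)
{
    assert(bp != NULL);
    bp->b_error = error;
}

/* Consumer side: a nonzero b_error means the request failed. */
int io_failed_sketch(const struct buf_sketch *bp)
{
    assert(bp != NULL);
    return bp->b_error != 0;
}
```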
 1.203.6.1 29-Jul-2007  ad file lfs_segment.c was added on branch matt-mips64 on 2007-07-29 13:31:15 +0000
 1.203.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.203.4.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.204.4.1 14-Oct-2007  yamt sync with head.
 1.204.2.3 23-Mar-2008  matt sync with HEAD
 1.204.2.2 09-Jan-2008  matt sync with HEAD
 1.204.2.1 06-Nov-2007  matt sync with HEAD
 1.206.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.206.6.4 19-Dec-2007  ad Use a global lfs_lock.
 1.206.6.3 19-Dec-2007  ad Fix some more problems w/lfs on this branch.
 1.206.6.2 19-Dec-2007  ad Get lfs mostly working.
 1.206.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.206.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.209.6.3 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.209.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.209.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.210.4.5 11-Aug-2010  yamt sync with head.
 1.210.4.4 11-Mar-2010  yamt sync with head
 1.210.4.3 19-Aug-2009  yamt sync with head.
 1.210.4.2 04-May-2009  yamt sync with head.
 1.210.4.1 16-May-2008  yamt sync with head.
 1.210.2.2 04-Jun-2008  yamt sync with head
 1.210.2.1 18-May-2008  yamt sync with head.
 1.211.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.213.22.2 09-Nov-2015  snj Fix ticket #1974 fallout.
 1.213.22.1 07-Nov-2015  snj Pull up following revision(s) (requested by dholland in ticket #1974):
sys/ufs/lfs/lfs_segment.c: revision 1.247 via patch
Fix catastrophic bug in lfs_rewind() that changed segment numbers
(lfs_curseg/lfs_nextseg in the superblock) using the wrong units.
These fields are for whatever reason the start addresses of segments
(measured in frags) rather than the segment numbers 0..n.
This only apparently affects dumping from a mounted fs; however, it
trashes the fs.
I would really, really like to have a static analysis tool that can
keep track of the units things are measured in, since fs code is full
of conversion macros and the macros are named inscrutable things like
"sntod" whose letters don't necessarily even correspond to the units
they convert. It is surprising that more of these are not wrong.
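
The unit confusion described in this commit message (segment numbers versus segment start addresses measured in frags) is the kind of error that distinct wrapper types can catch at compile time. What follows is only a sketch under stated assumptions, not NetBSD code: the names `segnum_t`, `fragaddr_t`, `struct toy_fs`, `seg_to_frag`, `frag_to_seg` and `next_seg_start` are hypothetical, and the layout (equal-sized segments starting at frag address 0) is a simplification of what the real conversion macros handle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unit-tagged types: distinct structs cannot be mixed by accident. */
typedef struct { uint32_t v; } segnum_t;   /* segment number 0..n-1 */
typedef struct { int64_t v; } fragaddr_t;  /* disk address measured in frags */

/* Assumed toy layout: equal-sized segments, the first one at frag address 0. */
struct toy_fs {
	uint32_t nseg;          /* number of segments */
	int64_t frags_per_seg;  /* frags in each segment */
};

/* Convert a segment number to the frag address where that segment starts. */
fragaddr_t
seg_to_frag(const struct toy_fs *fs, segnum_t sn)
{
	assert(sn.v < fs->nseg);
	return (fragaddr_t){ .v = (int64_t)sn.v * fs->frags_per_seg };
}

/* Convert any frag address to the number of the segment containing it. */
segnum_t
frag_to_seg(const struct toy_fs *fs, fragaddr_t a)
{
	assert(a.v >= 0 && a.v < (int64_t)fs->nseg * fs->frags_per_seg);
	return (segnum_t){ .v = (uint32_t)(a.v / fs->frags_per_seg) };
}

/*
 * Advance a "next segment" field that is stored as a start address:
 * convert to a segment number, step, wrap, and convert back.
 */
fragaddr_t
next_seg_start(const struct toy_fs *fs, fragaddr_t cur)
{
	segnum_t sn = frag_to_seg(fs, cur);

	sn.v = (sn.v + 1) % fs->nseg;
	return seg_to_frag(fs, sn);
}
```

With these types, passing a `segnum_t` where a `fragaddr_t` is expected is a compile error rather than a silent corruption, at the cost of writing `.v` wherever the raw number is needed.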
 1.213.18.2 09-Nov-2015  snj Fix ticket #1974 fallout.
 1.213.18.1 07-Nov-2015  snj Pull up following revision(s) (requested by dholland in ticket #1974):
sys/ufs/lfs/lfs_segment.c: revision 1.247 via patch
Fix catastrophic bug in lfs_rewind() that changed segment numbers
(lfs_curseg/lfs_nextseg in the superblock) using the wrong units.
These fields are for whatever reason the start addresses of segments
(measured in frags) rather than the segment numbers 0..n.
This only apparently affects dumping from a mounted fs; however, it
trashes the fs.
I would really, really like to have a static analysis tool that can
keep track of the units things are measured in, since fs code is full
of conversion macros and the macros are named inscrutable things like
"sntod" whose letters don't necessarily even correspond to the units
they convert. It is surprising that more of these are not wrong.
 1.213.8.2 09-Nov-2015  sborrill Fix breakage from ticket #1974
 1.213.8.1 07-Nov-2015  snj Pull up following revision(s) (requested by dholland in ticket #1974):
sys/ufs/lfs/lfs_segment.c: revision 1.247 via patch
Fix catastrophic bug in lfs_rewind() that changed segment numbers
(lfs_curseg/lfs_nextseg in the superblock) using the wrong units.
These fields are for whatever reason the start addresses of segments
(measured in frags) rather than the segment numbers 0..n.
This only apparently affects dumping from a mounted fs; however, it
trashes the fs.
I would really, really like to have a static analysis tool that can
keep track of the units things are measured in, since fs code is full
of conversion macros and the macros are named inscrutable things like
"sntod" whose letters don't necessarily even correspond to the units
they convert. It is surprising that more of these are not wrong.
 1.214.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.214.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.215.2.4 21-Apr-2011  rmind sync with head
 1.215.2.3 05-Mar-2011  rmind sync with head
 1.215.2.2 03-Jul-2010  rmind sync with head
 1.215.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.217.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.220.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.222.6.1 18-Feb-2012  mrg merge to -current.
 1.222.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.222.2.3 23-Jan-2013  yamt sync with head
 1.222.2.2 17-Apr-2012  yamt sync with head
 1.222.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.223.2.2 15-Nov-2015  bouyer Pull up following revision(s) (requested by dholland in ticket #1319):
sys/ufs/lfs/lfs_segment.c: revision 1.247 via patch
Fix catastrophic bug in lfs_rewind() that changed segment numbers
(lfs_curseg/lfs_nextseg in the superblock) using the wrong units.
These fields are for whatever reason the start addresses of segments
(measured in frags) rather than the segment numbers 0..n.
This only apparently affects dumping from a mounted fs; however, it
trashes the fs.
I would really, really like to have a static analysis tool that can
keep track of the units things are measured in, since fs code is full
of conversion macros and the macros are named inscrutable things like
"sntod" whose letters don't necessarily even correspond to the units
they convert. It is surprising that more of these are not wrong.
 1.223.2.1 17-Mar-2012  bouyer Pull up following revision(s) (requested by perseant in ticket #116):
sys/ufs/lfs/lfs_alloc.c: revision 1.112
tests/fs/vfs/t_rmdirrace.c: revision 1.9
tests/fs/vfs/t_renamerace.c: revision 1.25
sys/ufs/lfs/lfs_vnops.c: revision 1.240
sys/ufs/lfs/lfs_segment.c: revision 1.224
sys/ufs/lfs/lfs_bio.c: revision 1.122
sys/ufs/lfs/lfs_vfsops.c: revision 1.294
sbin/newfs_lfs/make_lfs.c: revision 1.19
sys/ufs/lfs/lfs.h: revision 1.136
Pass t_renamerace and t_rmdirrace tests.
Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.
Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.
Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace creates a lot of inode blocks on a tiny empty disk).
 1.224.2.4 03-Dec-2017  jdolecek update from HEAD
 1.224.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.224.2.2 23-Jun-2013  tls resync from head
 1.224.2.1 25-Feb-2013  tls resync with head
 1.230.2.2 18-May-2014  rmind sync with head
 1.230.2.1 28-Aug-2013  rmind sync with head
 1.236.6.5 28-Aug-2017  skrll Sync with HEAD
 1.236.6.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.236.6.3 22-Sep-2015  skrll Sync with HEAD
 1.236.6.2 06-Jun-2015  skrll Sync with HEAD
 1.236.6.1 06-Apr-2015  skrll Sync with HEAD
 1.236.4.1 04-Aug-2015  snj Pull up following revision(s) (requested by dholland in ticket #932):
sys/ufs/lfs/lfs_segment.c: revision 1.247 via patch
Fix catastrophic bug in lfs_rewind() that changed segment numbers
(lfs_curseg/lfs_nextseg in the superblock) using the wrong units.
These fields are for whatever reason the start addresses of segments
(measured in frags) rather than the segment numbers 0..n.
This only apparently affects dumping from a mounted fs; however, it
trashes the fs.
I would really, really like to have a static analysis tool that can
keep track of the units things are measured in, since fs code is full
of conversion macros and the macros are named inscrutable things like
"sntod" whose letters don't necessarily even correspond to the units
they convert. It is surprising that more of these are not wrong.
 1.263.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.263.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.263.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.269.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.275.2.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.275.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.277.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.277.2.1 10-Jun-2019  christos Sync with HEAD
 1.278.4.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.280.2.2 29-Feb-2020  ad Sync with head.
 1.280.2.1 17-Jan-2020  ad Sync with head.
 1.107 04-Nov-2025  perseant Remove su_flags array, replacing it with a new flag SEGUSE_READY.
Segments progress from having su_nbytes==0 to SEGUSE_EMPTY to SEGUSE_READY
to clean, progressing to the next step after a checkpoint.
 1.106 03-Nov-2025  perseant Be more careful about only setting IN_CLEANING in lfs_setclean() and clearing
it in lfs_clrclean(). Prevents a crash from re-removing an entry from the
lfs_cleanhd TAILQ.
 1.105 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.104 04-Sep-2025  perseant Copy the flags from a full partial segment to its continuation, if
a continuation is necessary, so that partial-segment collections marked
with SS_DIROP|SS_CONT are properly completed with a partial-segment marked
SS_DIROP (without SS_CONT). Necessary for roll-forward.
 1.103 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.102 23-Feb-2020  riastradh Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.
 1.101 23-Feb-2020  ad Fix !DIAGNOSTIC compile
 1.100 23-Feb-2020  riastradh lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.
 1.99 23-Feb-2020  riastradh Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):

(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.

(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:

(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.

(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
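
Resolution (b) above amounts to moving the "no dirops pending" test inside the same critical section that claims the writer state. The sketch below illustrates that idea in userland with a pthread mutex. It is not the kernel implementation: `struct toy_lfs` and every `toy_*` function are hypothetical stand-ins, and the dirop side fails instead of sleeping so the example stays single-threaded.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant pieces of the in-memory fs state. */
struct toy_lfs {
	pthread_mutex_t lock;  /* plays the role of lfs_lock */
	unsigned dirops;       /* directory operations in progress */
	unsigned writer;       /* nonzero: new dirops must not start */
};

void
toy_init(struct toy_lfs *fs)
{
	pthread_mutex_init(&fs->lock, NULL);
	fs->dirops = 0;
	fs->writer = 0;
}

/*
 * Claim the writer state only if no dirops are pending. The test and
 * the increment happen under one lock hold, so the answer cannot go
 * stale between checking and claiming.
 */
bool
toy_writer_tryenter(struct toy_lfs *fs)
{
	bool ok;

	pthread_mutex_lock(&fs->lock);
	ok = (fs->dirops == 0);
	if (ok)
		fs->writer++;
	pthread_mutex_unlock(&fs->lock);
	return ok;
}

void
toy_writer_leave(struct toy_lfs *fs)
{
	pthread_mutex_lock(&fs->lock);
	assert(fs->writer > 0);
	fs->writer--;
	pthread_mutex_unlock(&fs->lock);
}

/* Start a dirop unless a writer is active (a real dirop would sleep). */
bool
toy_dirop_trystart(struct toy_lfs *fs)
{
	bool ok;

	pthread_mutex_lock(&fs->lock);
	ok = (fs->writer == 0);
	if (ok)
		fs->dirops++;
	pthread_mutex_unlock(&fs->lock);
	return ok;
}

void
toy_dirop_done(struct toy_lfs *fs)
{
	pthread_mutex_lock(&fs->lock);
	assert(fs->dirops > 0);
	fs->dirops--;
	pthread_mutex_unlock(&fs->lock);
}
```

The racy pattern this replaces would read `fs->dirops` without the lock and only then enter the writer state, leaving a window in which a dirop can start between the two steps.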
 1.98 23-Feb-2020  riastradh Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.
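
The marker-node technique named in this commit can be illustrated with the `TAILQ` macros from `<sys/queue.h>`. This is a generic sketch, not the lfs_dchainhd code: `struct node`, `walk_with_marker` and `drop_successor` are hypothetical, and the callback stands in for work done while the list lock is dropped. Because the walker re-reads the element after its own marker on every step, it never holds a pointer to a node that someone else may have unlinked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

struct node {
	TAILQ_ENTRY(node) link;
	int value;
	bool is_marker;  /* true for placeholder entries that carry no data */
};
TAILQ_HEAD(nodelist, node);

/* Callback run for each real node; it may unlink other real nodes. */
typedef void (*visit_fn)(struct nodelist *, struct node *);

/* Visit every real node, tolerating removals made during each visit. */
int
walk_with_marker(struct nodelist *head, visit_fn visit)
{
	struct node marker = { .is_marker = true };
	struct node *cur;
	int visited = 0;

	TAILQ_INSERT_HEAD(head, &marker, link);
	while ((cur = TAILQ_NEXT(&marker, link)) != NULL) {
		/* Move our marker past cur before doing any work. */
		TAILQ_REMOVE(head, &marker, link);
		TAILQ_INSERT_AFTER(head, cur, &marker, link);
		if (cur->is_marker)
			continue;  /* another walker's marker */
		visit(head, cur);  /* the lock would be dropped here */
		visited++;
	}
	TAILQ_REMOVE(head, &marker, link);
	return visited;
}

/* Example visitor: unlink the real node just after the walker's marker. */
void
drop_successor(struct nodelist *head, struct node *cur)
{
	struct node *m = TAILQ_NEXT(cur, link);  /* the walker's marker */
	struct node *victim = (m != NULL) ? TAILQ_NEXT(m, link) : NULL;

	if (victim != NULL && !victim->is_marker)
		TAILQ_REMOVE(head, victim, link);
}
```

A plain "save the next pointer, then visit" loop would dereference the removed successor here; the marker version simply finds whatever now follows the marker.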
 1.97 26-Jul-2017  maya branches: 1.97.4; 1.97.8; 1.97.10;
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar

XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
 1.96 26-Jul-2017  maya Deduplicate sanity check that seglock is held on segunlock
 1.95 19-Jun-2017  maya Ifdef out KDASSERT which fires on my machine.
 1.94 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.93 08-Jun-2017  chs move some buffer cache internals declarations from buf.h to vfs_bio.c.
this is needed to avoid name conflicts with ZFS and also
makes it clearer that other code shouldn't be messing with these.
remove the LFS debug code that poked around in bufqueues and
remove the BQ_EMPTY bufqueue since nothing uses it anymore.
provide a function to let LFS and wapbl read the value of nbuf for now.
 1.92 06-Apr-2017  maya branches: 1.92.6;
don't guard lfs_sbactive or lfs_log with splbio, lfs_lock is plenty.
 1.91 06-Apr-2017  maya don't guard lfs_reshash with splbio, lfs_lock is plenty
 1.90 06-Apr-2017  maya if DEBUG panic => KDASSERT. reduces ifdefs. NFC
 1.89 06-Apr-2017  maya Provide a LFS_ENTER_LOG (__nothing) in the !DEBUG case.
so I can drop lots of #ifdef DEBUG around this macro. NFCI
 1.88 01-Apr-2017  maya Keep on holding lfs_lock when calling cv_broadcast

pointed out by skrll, thanks.
 1.87 01-Apr-2017  maya switch lfs_dirops to condvar (from mtsleep)
 1.86 03-Oct-2015  dholland branches: 1.86.2; 1.86.4;
Use IINFO in lfs_writeinode().
(both the kernel and the userland copies)
 1.85 12-Aug-2015  dholland Make 32-bit and 64-bit versions of CLEANERINFO.

XXX: while this is written to disk, it seems like much of it would
XXX: be better set up as a commpage shared with the cleaner.
 1.84 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.83 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.82 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.81 16-Jul-2015  dholland Don't cast the return value of malloc.
 1.80 28-Jul-2013  dholland branches: 1.80.6;
Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.79 18-Jun-2013  christos branches: 1.79.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.78 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.77 02-Jan-2012  perseant branches: 1.77.6;

* Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.76 25-Jun-2010  hannken branches: 1.76.8; 1.76.12;
Undo last commit and don't try to lock vnodes in lfs_unmark_dirop()
as we may deadlock trying to write the superblock.

Should fix PR #43503 Can't create device nodes on LFS.
 1.75 24-Jun-2010  hannken Clean up vnode lock operations:

- VOP_LOCK(vp, flags): Limit the set of allowed flags to LK_EXCLUSIVE,
LK_SHARED and LK_NOWAIT. LK_INTERLOCK is no longer allowed as it
makes no sense here.

- VOP_ISLOCKED(vp): Remove the return value LK_EXCLOTHER, which has been
unused for some time. Mark this operation as "diagnostic only".
Making a lock decision based on this operation is no longer allowed.

Discussed on tech-kern.
 1.74 16-Feb-2010  mlelstv branches: 1.74.2;
Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.73 28-Apr-2008  martin branches: 1.73.20;
Remove clause 3 and 4 from TNF licenses
 1.72 02-Jan-2008  ad branches: 1.72.6; 1.72.8; 1.72.10;
Merge vmlocking2 to head.
 1.71 10-Oct-2007  ad branches: 1.71.4; 1.71.6; 1.71.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.70 15-May-2007  tnn branches: 1.70.6; 1.70.8; 1.70.10;
Add missing underscore to wchan name.
 1.69 18-Apr-2007  perseant Add/change a couple of comments about locking restrictions.
 1.68 12-Mar-2007  ad branches: 1.68.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.67 21-Feb-2007  thorpej branches: 1.67.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.66 15-Feb-2007  ad branches: 1.66.2;
Replace some uses of lockmgr() / simplelocks.
 1.65 16-Nov-2006  christos branches: 1.65.2; 1.65.4;
__unused removal on arguments; approved by core.
 1.64 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.63 04-Oct-2006  christos fix empty if
 1.62 15-Sep-2006  perseant branches: 1.62.2;
Don't re-mark a locked inode with IN_MODIFIED after writing it to disk,
if we ourselves hold the lock. This prevents e.g. mknod from hanging
indefinitely.

Also, always use the return value from VOP_ISLOCKED to determine whether
we hold the lock or someone else does, rather than looking into the lock
structure ourselves.
 1.61 01-Sep-2006  perseant branches: 1.61.2;
Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.60 29-Jun-2006  perseant Don't wake up the cleaner if the filesystem is unwrappable, and fix the
compatibility fcntls.

Also includes one-line fixes for an MP locking bug and a zero-length FINFO
problem that manifested during testing.
 1.59 04-May-2006  perseant branches: 1.59.4;
Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.

Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.58 07-Apr-2006  perseant Make the segment lock aware of LWPs. Fixes a (somewhat confusing)
"lockmgr: pid 3997, not exclusive lockholder 3997, unlocking" panic I
encountered while running blogbench on an LFS.
 1.57 24-Mar-2006  perseant Improvements to LFS's paging mechanism, to wit:

* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.

* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.56 14-Jan-2006  yamt branches: 1.56.2; 1.56.4; 1.56.6; 1.56.8; 1.56.10;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.55 11-Dec-2005  christos branches: 1.55.2;
merge ktrace-lwp.
 1.54 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.53 29-May-2005  christos branches: 1.53.2; 1.53.4;
- sprinkle const
- avoid shadow variables.
 1.52 16-Apr-2005  perseant Use lfs_malloc() to manage the blkiov arrays that the cleaner functions use,
since the cleaner is likely to operate in a low-memory condition.
 1.51 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.50 08-Mar-2005  perseant branches: 1.50.2;
Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.49 26-Feb-2005  perry nuke trailing whitespace
 1.48 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
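
The bfree reporting rule in the list above (never report a value below zero or above lfs_dsize) amounts to a clamp. A minimal sketch follows; the helper name report_bfree and its 64-bit types are assumptions made for illustration and are not taken from lfs_statvfs(9).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's code): clamp a signed
 * free-block count into the range [0, dsize] before it is
 * reported to userland.
 */
static uint64_t
report_bfree(int64_t bfree, int64_t dsize)
{
	assert(dsize >= 0);
	if (bfree < 0)
		return 0;		/* never report a negative count */
	if (bfree > dsize)
		return (uint64_t)dsize;	/* never report more than the fs holds */
	return (uint64_t)bfree;
}
```

Clamping at the reporting boundary lets the internal counter stay signed, as the entry describes, while tools such as df(1) only ever see a plausible value.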
 1.47 09-Mar-2004  yamt branches: 1.47.6; 1.47.8; 1.47.10;
use correct segment size. this fixes memory corruption when using lfsv1.
 1.46 21-Dec-2003  simonb Fix usage of fifth argument to pool_init().
 1.45 14-Oct-2003  dbj add mnt_iflag field to struct mount for internal flags
mv MNT_GONE, MNT_UNMOUNT and MNT_WANTRDWR to this field
additionally add mnt_writeopcountupper and mnt_writeopcountlower fields
in preparation for pending write suspension support work
bump kernel version to 1.6ZD
 1.44 07-Sep-2003  yamt use LFS_DEBUG_COUNTLOCKED macro.
 1.43 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.42 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.41 02-Jul-2003  yamt - add new functions, lfs_writer_enter/leave, and use them instead of
duplicated code fragments.
- add an assertion.
 1.40 23-Apr-2003  perseant branches: 1.40.2;
Make LFS work better (though still not "well") as an NFS-exported
filesystem (and other things that needed to be fixed before the tests
would complete), to wit:

* Include the fs ident in the filehandle; improve stale filehandle checks.

* Change definition of blksize() to use the on-dinode size instead of
the inode's i_size, so that fsck_lfs will work properly again.

* Use b_interlock in lfs_vtruncbuf.

* Postpone dirop reclamation until after the seglock has been released,
so that lfs_truncate is not called with the segment lock held.

* Don't loop in lfs_fsync(), just write everything and wait.

* Be more careful about the interlock/uobjlock in lfs_putpages: when we
lose this lock, we have to resynchronize dirtiness of pages in each
block.

* Be sure to always write indirect blocks and update metadata in
lfs_putpages; fixes a bug that caused blocks to be accounted to the
wrong segment.
 1.39 21-Mar-2003  perseant KNF (space after keywords).
 1.38 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.37 11-Mar-2003  perseant - Get rid of unused #ifdefs LFS_NO_PAGEMOVE and LFS_MALLOC_SUMMARY (both
always true) and accompanying dead code.

- When constructing write clusters in lfs_writeseg, if the block we are
about to add is itself a cluster from GOP_WRITE, don't put a cluster
in a cluster, just write the GOP_WRITE cluster on its own. This seems
to represent a slight performance gain on my test machine.

- Charge someone's rusage for writes on LFSes. It's difficult to tell
who the "right" process to charge is; just charge whoever triggered
the write.
 1.36 08-Mar-2003  perseant Add an lfs_strategy() that checks to make sure we're not trying to read
where the cleaner is trying to write, instead of tying up the "live"
buffers (or pages).

Fix a bug in the LFS_UBC case where oversized buffers would not be
checksummed correctly, causing uncleanable segments.

Make sure that wakeup(fs->lfs_iocount) is done if fs->lfs_iocount is 1
as well as 0, since we wait in some places for it to drop to 1.

Activate all pages that make it into lfs_gop_write without the segment
lock held, since they must have been dirtied very recently, even if
PG_DELWRI is not set.
 1.35 04-Mar-2003  perseant Don't add dirty blocks to the ifile in lfs_segunlock, if we're trying to
unmount the filesystem. This avoids a "dirty blocks" panic.
 1.34 23-Feb-2003  perseant Fix a buffer overflow bug in the LFS_UBC case that manifested itself
either as a mysterious UVM error or as "panic: dirty bufs". Verify
maximum size in lfs_malloc.

Teach lfs_updatemeta and lfs_shellsort about oversized cluster blocks from
lfs_gop_write.

When unwiring pages in lfs_gop_write, deactivate them, under the theory
that the pagedaemon wanted to free them last we knew.
 1.33 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.32 19-Feb-2003  yamt add debug code to lfs_free.
 1.31 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
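
The "two checkpoints" rule in the last item above could be sketched as follows. The struct layout, field names and the checkpoint_sweep function are hypothetical; the sketch only models the stated rule that a segment seen dirty and empty at two consecutive checkpoints can be cleaned without further work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment state, for illustration only. */
struct seg {
	bool		dirty;		/* segment has been written to */
	uint32_t	live_bytes;	/* bytes of live data remaining */
	bool		was_empty;	/* dirty and empty at the previous checkpoint */
};

/*
 * Run at each checkpoint.  A segment that is dirty and empty now and
 * was also dirty and empty at the previous checkpoint is marked clean.
 * Returns the number of segments cleaned by this call.
 */
static unsigned
checkpoint_sweep(struct seg *segs, size_t nsegs)
{
	unsigned cleaned = 0;

	assert(segs != NULL);
	for (size_t i = 0; i < nsegs; i++) {
		struct seg *s = &segs[i];

		if (s->dirty && s->live_bytes == 0) {
			if (s->was_empty) {
				s->dirty = false;	/* summarily cleaned */
				s->was_empty = false;
				cleaned++;
			} else {
				s->was_empty = true;	/* first sighting */
			}
		} else {
			s->was_empty = false;
		}
	}
	return cleaned;
}
```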
 1.30 29-Jan-2003  yamt don't use daddr_t for segment summary since it's an on-disk structure.
 1.29 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.28 11-Jul-2002  perseant Remove lying comment on SEGM_PROT seglock.
 1.27 06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.26 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.25 24-May-2002  perseant Fix a couple of instances where reassignbuf() was not done at splbio.

Tested on i386.
 1.24 23-May-2002  perseant Back out rev 1.174 of vfs_subr.c, because the splbio() wasn't protecting
enough to be useful, and broadening it so that it did would have meant
that operations possibly requiring synchronous disk activity would have
to be done in splbio(). This clearly was not going to work.

Worked around this in the LFS case by having lfs_cluster_callback put an
extra hold on the vnode before calling biodone(), and taking the hold
off without HOLDRELE's problematic list swapping. lfs_vunref() will take
care of that---in thread context---on the next write if need be.

Also, ensure that the list walking in lfs_{writevnodes,segunlock,gather}
takes into account the possibility that the list may change
underneath it (possibly because it itself deleted an element).

Tested on i386, test-compiled on alpha.
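
The list-walking fix described above follows a common pattern: fetch the next pointer before the current element can be unlinked. The sketch below uses a hand-rolled singly linked list rather than the kernel's vnode lists; struct node and prune are hypothetical names, and it does not model changes made by other threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list element, for illustration only. */
struct node {
	int		val;
	struct node	*next;
};

/*
 * Walk the list and unlink every node with a negative value.
 * The successor is saved before the current node is touched, so
 * unlinking the current node does not derail the walk.
 * Returns the number of nodes left on the list.
 */
static int
prune(struct node **head)
{
	struct node **pp, *n, *next;
	int kept = 0;

	assert(head != NULL);
	pp = head;
	for (n = *head; n != NULL; n = next) {
		next = n->next;		/* saved before n may be unlinked */
		if (n->val < 0) {
			*pp = next;	/* unlink n */
			n->next = NULL;
		} else {
			pp = &n->next;
			kept++;
		}
	}
	return kept;
}
```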
 1.23 17-May-2002  perseant branches: 1.23.2;
use macros from <sys/queue.h>
 1.22 14-May-2002  perseant branches: 1.22.2;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.21 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.20 08-Nov-2001  lukem add RCSID
 1.19 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.18 13-Jul-2001  perseant branches: 1.18.4;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.17 09-Sep-2000  perseant branches: 1.17.2; 1.17.4; 1.17.6;
Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().
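
The reservation scheme in the first kernel item above can be sketched as a pair of counters. This is a single-threaded model under stated assumptions: struct fs_space, try_reserve and finish_write are hypothetical names, and a false return stands in for the point where the real code would unlock the inode and sleep until the cleaner makes room.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical space accounting, for illustration only. */
struct fs_space {
	int64_t	avail;	/* blocks available for writing */
	int64_t	ravail;	/* blocks reserved by writes in progress */
};

/*
 * Try to reserve nblocks.  A false return models the case where the
 * caller would unlock the inode and sleep until the cleaner has freed
 * enough space.
 */
static bool
try_reserve(struct fs_space *fs, int64_t nblocks)
{
	assert(nblocks >= 0);
	if (fs->ravail + nblocks > fs->avail)
		return false;
	fs->ravail += nblocks;
	return true;
}

/*
 * After allocation: drop the reservation and charge the blocks that
 * were actually used (assumed here to be at most the reserved amount).
 */
static void
finish_write(struct fs_space *fs, int64_t reserved, int64_t used)
{
	assert(used >= 0 && used <= reserved);
	assert(reserved <= fs->ravail);
	fs->ravail -= reserved;
	fs->avail -= used;
}
```

Reserving the maximum up front and returning the unused part afterwards is what keeps space free for the cleaner to run in.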

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments through N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this is unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.16 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.15 06-Jun-2000  perseant branches: 1.15.2;
Don't try to inactivate dirop vnodes that are still in the middle of
their dirop.
 1.14 05-May-2000  perseant branches: 1.14.2;
Change the way LFS does block accounting, from trying to infer from the
buffer cache flags, to marking the inode and/or indirect blocks with a
special disk address UNWRITTEN==-2 when a block is accounted for. (This
address is never written to disk, but only used in-core. This is essentially
the same method of block accounting as on the UBC branch, where the buffer
headers don't exist.) Make sure that truncation is handled properly,
especially in the case of holey files.

Fixes PR#9994.
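
The in-core marker described above can be sketched as follows. Only the value UNWRITTEN == -2 comes from the log entry; struct incore_inode, NBLK, account_block and assign_disk_addr are hypothetical names, and using 0 for a hole is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UNWRITTEN	((int32_t)-2)	/* value given in the log entry */
#define NBLK		4		/* hypothetical number of block slots */

/* Hypothetical in-core inode, for illustration only. */
struct incore_inode {
	int32_t	addr[NBLK];	/* 0 = hole, UNWRITTEN = accounted, > 0 = on disk */
	int32_t	nblocks;	/* blocks accounted to this file */
};

/*
 * Account for a block the first time it is allocated in core.
 * Returns false if the block was already accounted for, so a block
 * is never counted twice.
 */
static bool
account_block(struct incore_inode *ip, int lbn)
{
	assert(lbn >= 0 && lbn < NBLK);
	if (ip->addr[lbn] != 0)
		return false;
	ip->addr[lbn] = UNWRITTEN;
	ip->nblocks++;
	return true;
}

/*
 * When the block is written, the marker is replaced by a real address;
 * UNWRITTEN itself never goes to disk.
 */
static void
assign_disk_addr(struct incore_inode *ip, int lbn, int32_t daddr)
{
	assert(lbn >= 0 && lbn < NBLK);
	assert(daddr > 0);
	ip->addr[lbn] = daddr;
}
```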
 1.13 30-Mar-2000  augustss Remove register declarations.
 1.12 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.11 16-Jan-2000  perseant Make sure that vnodes are locked when inactivated (e.g. by the cleaner)
 1.10 16-Jan-2000  perseant Fix a problem in my changes of Dec 14th, that prevents removed vnodes
from being inactivated under some conditions. Removed vnodes are now
inactivated when the VDIROP flag is cleared, and to prevent block
accounting problems this clearing has been postponed until
lfs_segunlock.
 1.9 25-Mar-1999  perseant branches: 1.9.2; 1.9.8; 1.9.14;
clean up unused/required #ifdefs
 1.8 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.7 25-Aug-1998  thorpej Add some braces to make egcs happy.
 1.6 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.5 12-Oct-1996  christos revert previous kprintf changes
 1.4 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.3 09-Feb-1996  christos lfs prototypes
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.9.14.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.9.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.9.2.1 20-Jan-2000  he Pull up revision 1.10 (requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.14.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.15.2.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.17.6.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.17.6.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.17.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.17.6.1 03-Aug-2001  lukem update to -current
 1.17.4.3 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.
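
As a rough illustration of the addressing change described above, the sketch below converts an "fsb" address to a sector number. The helper name, its parameters, and the way the version is passed in are assumptions made for this example; they are not taken from the LFS sources.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a disk address in "fsb" units to a sector
 * number. For version 1 filesystems the compatibility rule quoted above
 * applies (1 fsb == 1 sector); otherwise one fsb spans frag_size bytes.
 */
static int64_t
fsb_to_sector(int64_t fsb, uint32_t frag_size, uint32_t sector_size,
    int version)
{
	if (version == 1)
		return fsb;
	return fsb * (int64_t)(frag_size / sector_size);
}
```

With 8k fragments and 512-byte sectors, each fsb in this sketch covers 16 sectors, while a v1 filesystem keeps the one-to-one mapping.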

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.17.4.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.17.4.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.17.2.7 01-Aug-2002  nathanw Catch up to -current.
 1.17.2.6 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
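
A minimal userland sketch of the NULL-safe macro described above. The struct definitions and the global curlwp variable are stand-ins invented for this example; only the shape of the macro follows the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct proc and struct lwp. */
struct proc { int p_pid; };
struct lwp  { struct proc *l_proc; };

/* Stand-in for the "current LWP" pointer; NULL when there is none. */
static struct lwp *curlwp = NULL;

/* Referencing curproc never dereferences a NULL curlwp. */
#define curproc ((curlwp) ? (curlwp)->l_proc : NULL)

/* Dereferencing the result is still the caller's problem. */
static int
current_pid_or(int fallback)
{
	struct proc *p = curproc;

	return (p != NULL) ? p->p_pid : fallback;
}
```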
 1.17.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.17.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.17.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.17.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.17.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.18.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.22.2.3 15-Jul-2002  gehenna catch up with -current.
 1.22.2.2 20-Jun-2002  gehenna catch up with -current.
 1.22.2.1 30-May-2002  gehenna Catch up with -current.
 1.23.2.3 20-Jun-2002  lukem Pull up revision 1.26 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.23.2.2 02-Jun-2002  tv Pull up revision 1.25 (requested by perseant in ticket #135):
Fix a couple of instances where reassignbuf() was not done at splbio.
Tested on i386.
 1.23.2.1 02-Jun-2002  tv Pull up revision 1.24 (requested by perseant in ticket #132):
Back out rev 1.174 of vfs_subr.c, because the splbio() wasn't protecting
enough to be useful, and broadening it so that it did would have meant
that operations possibly requiring synchronous disk activity would have
to be done in splbio(). This clearly was not going to work.
Worked around this in the LFS case by having lfs_cluster_callback put an
extra hold on the vnode before calling biodone(), and taking the hold
off without HOLDRELE's problematic list swapping. lfs_vunref() will take
care of that---in thread context---on the next write if need be.
Also, ensure that the list walking in lfs_{writevnodes,segunlock,gather}
takes into account the possibility that the list may change
underneath it (possibly because it itself deleted an element).
Tested on i386, test-compiled on alpha.
 1.40.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.40.2.5 08-Mar-2005  skrll Sync with HEAD.
 1.40.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.40.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.40.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.40.2.1 03-Aug-2004  skrll Sync with HEAD
 1.47.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.47.8.1 29-Apr-2005  kent sync with -current
 1.47.6.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
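
The lfs_statfs(9) item in the list above (never report bfree below zero or above lfs_dsize) amounts to a clamp on a signed count. A minimal sketch, with a hypothetical function name and plain int64_t arguments standing in for the superblock fields:

```c
#include <stdint.h>

/*
 * Hypothetical helper: bound a signed free-block count to [0, dsize]
 * before it is reported. A negative count handed to an unsigned field
 * would otherwise show up as an enormous amount of free space.
 */
static int64_t
clamp_bfree(int64_t bfree, int64_t dsize)
{
	if (bfree < 0)
		return 0;
	if (bfree > dsize)
		return dsize;
	return bfree;
}
```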

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.50.2.6 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.50.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.92
sys/ufs/lfs/lfs.h: revision 1.105
sys/ufs/lfs/lfs_vfsops.c: revision 1.207
sys/ufs/lfs/lfs_subr.c: revision 1.59
sys/ufs/lfs/lfs_vnops.c: revision 1.173
sys/ufs/lfs/lfs_bio.c: revision 1.92
Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.
Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.50.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_subr.c: revision 1.58
sys/ufs/lfs/lfs.h: revision 1.98
Make the segment lock aware of LWPs. Fixes a (somewhat confusing)
"lockmgr: pid 3997, not exclusive lockholder 3997, unlocking" panic I
encountered while running blogbench on an LFS.
 1.50.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.158
sys/ufs/lfs/lfs_subr.c: revision 1.57
sys/ufs/lfs/lfs_segment.c: revision 1.171
sys/ufs/lfs/lfs.h: revision 1.97
sys/ufs/lfs/lfs_vfsops.c: revision 1.195
sys/ufs/lfs/lfs_extern.h: revision 1.76
Improvements to LFS's paging mechanism, to wit:
* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.
* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.50.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.50.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.53.4.1 20-Oct-2005  yamt adapt ufs.
 1.53.2.6 21-Jan-2008  yamt sync with head
 1.53.2.5 27-Oct-2007  yamt sync with head.
 1.53.2.4 03-Sep-2007  yamt sync with head.
 1.53.2.3 26-Feb-2007  yamt sync with head.
 1.53.2.2 30-Dec-2006  yamt sync with head.
 1.53.2.1 21-Jun-2006  yamt sync with head.
 1.55.2.1 15-Jan-2006  yamt sync with head.
 1.56.10.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.56.10.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.56.8.2 11-May-2006  elad sync with head
 1.56.8.1 19-Apr-2006  elad sync with head.
 1.56.6.5 03-Sep-2006  yamt sync with head.
 1.56.6.4 11-Aug-2006  yamt sync with head
 1.56.6.3 24-May-2006  yamt sync with head.
 1.56.6.2 11-Apr-2006  yamt sync with head
 1.56.6.1 01-Apr-2006  yamt sync with head.
 1.56.4.2 01-Jun-2006  kardel Sync with head.
 1.56.4.1 22-Apr-2006  simonb Sync with head.
 1.56.2.1 09-Sep-2006  rpaulo sync with head
 1.59.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.61.2.1 18-Nov-2006  ad Sync with head.
 1.62.2.2 10-Dec-2006  yamt sync with head.
 1.62.2.1 22-Oct-2006  yamt sync with head
 1.65.4.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.65.2.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.66.2.4 17-May-2007  yamt sync with head.
 1.66.2.3 07-May-2007  yamt sync with head.
 1.66.2.2 24-Mar-2007  yamt sync with head.
 1.66.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.67.4.5 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.67.4.4 08-Jun-2007  ad Sync with head.
 1.67.4.3 21-Mar-2007  ad GC the simplelock/spinlock debugging stuff.
 1.67.4.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.67.4.1 13-Mar-2007  ad Sync with head.
 1.68.2.1 11-Jul-2007  mjf Sync with head.
 1.70.10.1 14-Oct-2007  yamt sync with head.
 1.70.8.2 09-Jan-2008  matt sync with HEAD
 1.70.8.1 06-Nov-2007  matt sync with HEAD
 1.70.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.71.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.71.6.2 19-Dec-2007  ad Use a global lfs_lock.
 1.71.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.71.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.72.10.3 11-Aug-2010  yamt sync with head.
 1.72.10.2 11-Mar-2010  yamt sync with head
 1.72.10.1 16-May-2008  yamt sync with head.
 1.72.8.1 18-May-2008  yamt sync with head.
 1.72.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.73.20.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.73.20.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.74.2.1 03-Jul-2010  rmind sync with head
 1.76.12.1 18-Feb-2012  mrg merge to -current.
 1.76.8.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.76.8.1 17-Apr-2012  yamt sync with head
 1.77.6.3 03-Dec-2017  jdolecek update from HEAD
 1.77.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.77.6.1 23-Jun-2013  tls resync from head
 1.79.2.1 28-Aug-2013  rmind sync with head
 1.80.6.3 28-Aug-2017  skrll Sync with HEAD
 1.80.6.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.80.6.1 22-Sep-2015  skrll Sync with HEAD
 1.86.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.86.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.92.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some questions about the code in an XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.97.10.1 29-Feb-2020  ad Sync with head.
 1.97.8.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.
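
The marker technique mentioned above can be sketched in userland with the <sys/queue.h> TAILQ macros. The node type, the is_marker field, and both function names are assumptions made for this example and do not mirror the real lfs_dchainhd code; the point is only that the walker trusts its own marker rather than a cached "next" pointer.

```c
#include <stddef.h>
#include <sys/queue.h>

struct node {
	int value;
	int is_marker;			/* nonzero for placeholder nodes */
	TAILQ_ENTRY(node) link;
};
TAILQ_HEAD(nodelist, node);

/*
 * Visit every real node. The marker keeps our place in the list, so the
 * callback may remove any real node, including the one that follows the
 * node being visited.
 */
static int
visit_all(struct nodelist *head,
    void (*visit)(struct nodelist *, struct node *))
{
	struct node marker = { .is_marker = 1 };
	struct node *n;
	int visited = 0;

	n = TAILQ_FIRST(head);
	while (n != NULL) {
		if (n->is_marker) {	/* skip other walkers' markers */
			n = TAILQ_NEXT(n, link);
			continue;
		}
		TAILQ_INSERT_AFTER(head, n, &marker, link);
		/* In a kernel, the list lock would be dropped here... */
		visit(head, n);
		visited++;
		/* ...and re-taken here; only the marker is trusted now. */
		n = TAILQ_NEXT(&marker, link);
		TAILQ_REMOVE(head, &marker, link);
	}
	return visited;
}

/* Example callback: unlink the node a naive loop would have cached. */
static void
drop_following(struct nodelist *head, struct node *n)
{
	struct node *m = TAILQ_NEXT(n, link);	/* the walker's marker */
	struct node *victim = (m != NULL) ? TAILQ_NEXT(m, link) : NULL;

	if (victim != NULL && !victim->is_marker)
		TAILQ_REMOVE(head, victim, link);
}
```

A loop that saved the next pointer before calling the callback would follow a pointer to a node that is no longer on the list; here the walk simply continues from whatever now follows the marker.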

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
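
Case (b) above can be sketched as follows. This is a userland approximation using a pthread mutex in place of lfs_lock; the struct, its fields, and the function names are assumptions made for this example, not the kernel's definitions.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, much-reduced stand-in for the relevant filesystem state. */
struct fs_state {
	pthread_mutex_t lock;	/* plays the role of lfs_lock */
	int dirops;		/* active directory operations */
	int writer;		/* writers that have excluded dirops */
};

/*
 * Become a writer only if no dirops are active. The test and the
 * increment happen under the same lock, so the state cannot change
 * between them.
 */
static bool
writer_tryenter(struct fs_state *fs)
{
	bool ok;

	pthread_mutex_lock(&fs->lock);
	ok = (fs->dirops == 0);
	if (ok)
		fs->writer++;
	pthread_mutex_unlock(&fs->lock);
	return ok;
}

static void
writer_leave(struct fs_state *fs)
{
	pthread_mutex_lock(&fs->lock);
	fs->writer--;
	pthread_mutex_unlock(&fs->lock);
}
```

The unlocked version of the test (read dirops, then separately take the writer state) is what leaves the window described in (b); folding both steps into one critical section closes it without ever waiting while the seglock is held.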

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.
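
A small sketch of the layout effect described above, spelled with the GCC/Clang attribute syntax. The struct and its field names are invented for this example and are not the real lfs64 directory entry; the static assertions only show that a 64-bit member can live in a record with 32-bit alignment and no padding.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-bit record header; field names are illustrative only. */
struct dirent64_hdr {
	uint64_t ino;		/* 64-bit value, only 4-byte aligned on disk */
	uint16_t reclen;
	uint8_t  type;
	uint8_t  namlen;
} __attribute__((aligned(4), packed));

/* Compile-time checks in the spirit of the CTASSERTs mentioned earlier. */
_Static_assert(sizeof(struct dirent64_hdr) == 12, "no padding inserted");
_Static_assert(_Alignof(struct dirent64_hdr) == 4, "32-bit alignment");

/* Copy a record out of a byte buffer and return its 64-bit field. */
static uint64_t
read_ino_at(const unsigned char *block, size_t off)
{
	struct dirent64_hdr h;

	memcpy(&h, block + off, sizeof(h));
	return h.ino;
}
```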

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.97.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.179 04-Nov-2025  perseant Remove su_flags array, replacing it with a new flag SEGUSE_READY.
Segments progress from having su_nbytes==0 to SEGUSE_EMPTY to SEGUSE_READY
to clean, progressing to the next step after a checkpoint.
 1.178 03-Nov-2025  perseant Be more careful about only setting IN_CLEANING in lfs_setclean() and clearing
it in lfs_clrclean(). Prevents a crash from re-removing an entry from the
lfs_cleanhd TAILQ.
 1.177 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
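
The kind of statistics LFCNFILESTATS is described as reporting can be approximated as below. This is only a guess at the bookkeeping: the struct, the function name, and the exact definition of a "gap" (any neighbouring pair of blocks whose addresses are not consecutive) are assumptions made for this example, not the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>

struct gap_stats {
	size_t  nblocks;	/* blocks examined */
	size_t  ngaps;		/* discontinuities between neighbouring blocks */
	int64_t gapdist;	/* sum of the absolute jump sizes */
};

/* Walk a file's block addresses in logical order and tally the jumps. */
static struct gap_stats
count_gaps(const int64_t *daddr, size_t n)
{
	struct gap_stats st = { n, 0, 0 };

	for (size_t i = 1; i < n; i++) {
		/* Zero when block i directly follows block i - 1 on disk. */
		int64_t jump = daddr[i] - (daddr[i - 1] + 1);

		if (jump != 0) {
			st.ngaps++;
			st.gapdist += (jump < 0) ? -jump : jump;
		}
	}
	return st;
}
```

Under these assumptions a fully coalesced file reports zero gaps, which is the state LFCNREWRITEFILE is described as aiming for.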
 1.176 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.175 26-Jul-2017  maya branches: 1.175.4; 1.175.10;
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar

XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
 1.174 17-Apr-2017  hannken branches: 1.174.4;
Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.173 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT

Replace some #if DEBUG by this too. DEBUG is only for expensive
assertions; these are not.
 1.172 15-Oct-2015  dholland branches: 1.172.2; 1.172.4;
Move stuff from struct ulfsmount to struct lfs.
 1.171 10-Oct-2015  dholland Fix minor bitrot in #if 0 or otherwise disabled code.
 1.170 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.169 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.168 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.167 12-Aug-2015  dholland Make 32-bit and 64-bit versions of CLEANERINFO.

XXX: while this is written to disk, it seems like much of it would
XXX: be better set up as a commpage shared with the cleaner.
 1.166 12-Aug-2015  dholland Move the security checks for lfs_bmapv/lfs_markv into those functions.
(instead of the system call entry points)

Avoids duplication.

While touching these, pass the lwp around instead of the proc -- the
latter was there for no other reason than because once upon a time
struct proc was the first argument of all syscalls.

(For that matter, why not just use curlwp instead of passing it around
all over the place? The cost of passing it to every syscall probably
exceeds the cost of loading it from curcpu, even on machines where
it's not just kept in a register all the time.)
 1.165 12-Aug-2015  dholland Fix assorted 64->32 truncations related to BLOCK_INFO.

Also make note of a cleaner limitation: it seems that when it goes to
coalesce discontiguous files, it mallocs an array with one BLOCK_INFO
for every block in the file. Therefore, with 64-bit LFS, on a 32-bit
platform it will be possible to have files large enough to overflow
the cleaner's address space. Currently these will be skipped and cause
warnings via syslog.

At some point someone should rewrite the logic to coalesce files to
use chunks of some reasonable size, as discontinuity between such
chunks is immaterial and mallocing this much space is silly and
fragile. Also, the kernel only accepts up to 65536 blocks at a time
for bmapv and markv, so processing more than this at once probably
isn't useful and may not even work currently. I don't want to change
this around just now as it's not entirely trivial.
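
A rough sketch of the chunked approach suggested above, not the cleaner's actual code. The 65536 limit comes from this entry; submit_in_chunks, submit_fn and count_blocks are hypothetical names, and the callback stands in for one bmapv or markv request.

```c
#include <stddef.h>

#define SKETCH_MAX_BLOCKS 65536	/* per-call limit quoted in the log entry */

/* Hypothetical callback standing in for one bmapv/markv request. */
typedef int (*submit_fn)(size_t first, size_t count, void *ctx);

/*
 * Walk nblocks entries in chunks of at most SKETCH_MAX_BLOCKS.
 * Returns the number of chunks submitted, or -1 if a submission fails.
 */
static long
submit_in_chunks(size_t nblocks, submit_fn submit, void *ctx)
{
	long nchunks = 0;
	size_t done = 0;

	while (done < nblocks) {
		size_t n = nblocks - done;

		if (n > SKETCH_MAX_BLOCKS)
			n = SKETCH_MAX_BLOCKS;
		if (submit(done, n, ctx) != 0)
			return -1;
		done += n;
		nchunks++;
	}
	return nchunks;
}

/* Example callback: totals the number of blocks it was handed. */
static int
count_blocks(size_t first, size_t count, void *ctx)
{
	(void)first;
	*(size_t *)ctx += count;
	return 0;
}
```

With this shape the caller only ever needs storage for one chunk at a time, rather than one BLOCK_INFO per block in the file.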
 1.164 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.163 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.162 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.161 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.
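
To illustrate why accessor functions make a 32/64-bit switch possible, here is a heavily reduced sketch. All names (sketch_lfs, sketch_dlfs32, sketch_dlfs64, is64) are hypothetical and the layouts are not the real on-disk superblock; the point is only that callers go through a function instead of a fake structure member macro.

```c
#include <stdint.h>

/* Hypothetical, heavily reduced on-disk superblock layouts. */
struct sketch_dlfs32 { uint32_t dlfs_version; uint32_t dlfs_size; };
struct sketch_dlfs64 { uint32_t dlfs_version; uint64_t dlfs_size; };

/* In-memory structure: one of the two on-disk forms plus a selector. */
struct sketch_lfs {
	union {
		struct sketch_dlfs32 u32;
		struct sketch_dlfs64 u64;
	} dlfs;
	int is64;
};

/* Read the size field from whichever form is in use. */
static inline uint64_t
sketch_lfs_get_size(const struct sketch_lfs *fs)
{
	return fs->is64 ? fs->dlfs.u64.dlfs_size : fs->dlfs.u32.dlfs_size;
}

/* Write the size field; the 32-bit form truncates the value. */
static inline void
sketch_lfs_set_size(struct sketch_lfs *fs, uint64_t val)
{
	if (fs->is64)
		fs->dlfs.u64.dlfs_size = val;
	else
		fs->dlfs.u32.dlfs_size = (uint32_t)val;
}
```

Because every access is a function call, the choice of layout is made in one place per field rather than at every use site.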

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.160 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.159 31-May-2015  hannken Make lfs_fastvget() private to lfs_syscalls.c, change it to take the
BLOCK_INFO and vnode lock type instead of the inode disk address and
return the vnode locked.

Change lfs_markv() and lfs_bmapv() to work on locked vnodes.
 1.158 31-May-2015  hannken Use VFS_PROTOS() for lfs.
Rename conflicting struct lfs field "lfs_start" to "lfs_s0addr".

No functional change.
 1.157 20-Apr-2015  riastradh Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.
 1.156 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.155 17-Apr-2014  pgoyette branches: 1.155.4;
s/null/NULL/ to fix build break

Hello, xtos!
 1.154 17-Apr-2014  christos CID/1203190: Fix NULL deref
 1.153 09-Apr-2014  riastradh Take vp->v_interlock before vdead_check in lfs_bmapv.

XXX This code is a pile of bodge that needs a serious rototill anyway.
 1.152 24-Mar-2014  hannken branches: 1.152.2;
- Make VI_XLOCK, VI_CLEAN and VI_LOCKSHARE private to kern/vfs_*.c.
- Make vwait() static.
- Add vdead_check() to check a vnode for being or becoming dead.

Discussed on tech-kern.

Welcome to 6.99.38
 1.151 05-Mar-2014  hannken Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.
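
The calling pattern for the iterator can be sketched in userland with mock types. Everything prefixed mock_ below is hypothetical: the function shapes mirror the three prototypes listed in this entry, but the bodies are a plain linked-list walk with none of the locking or reference counting the kernel interface handles.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Mock types; the real struct vnode and struct mount are far richer. */
struct mock_vnode { int id; struct mock_vnode *next; };
struct mock_mount { struct mock_vnode *first; };
struct mock_iterator { struct mock_vnode *next; };

static void
mock_iterator_init(struct mock_mount *mp, struct mock_iterator **marker)
{
	*marker = malloc(sizeof(**marker));
	if (*marker != NULL)
		(*marker)->next = mp->first;
}

static void
mock_iterator_destroy(struct mock_iterator *marker)
{
	free(marker);
}

/* Returns false with *vpp == NULL when done, true with *vpp != NULL otherwise. */
static bool
mock_iterator_next(struct mock_iterator *marker, struct mock_vnode **vpp)
{
	if (marker == NULL || marker->next == NULL) {
		*vpp = NULL;
		return false;
	}
	*vpp = marker->next;
	marker->next = (*vpp)->next;
	return true;
}

/* Caller-side loop: count the vnodes on a mount. */
static int
mock_count_vnodes(struct mock_mount *mp)
{
	struct mock_iterator *marker;
	struct mock_vnode *vp;
	int n = 0;

	mock_iterator_init(mp, &marker);
	while (mock_iterator_next(marker, &vp))
		n++;
	mock_iterator_destroy(marker);
	return n;
}
```

The caller's loop contains no list or vnode mutex handling, which is the simplification the entry describes.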

Discussed on tech-kern.

Welcome to 6.99.34
 1.150 29-Oct-2013  hannken Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25
 1.149 07-Oct-2013  dholland Remove stray KERNEL_UNLOCK_ONE() in error path of lfs_markv().
From Wolfgang Stukenbrock in PR 44370.

This error path is only reachable if lfs_markv is handed an out of
range inode number, so it's unlikely that it gets tickled very often.

It isn't clear to me that we need the kernel lock in here at all, as
the path to lfs_markv that's actually used at this point (via fcntl)
doesn't take it. But, one thing at a time.
 1.148 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.147 18-Jun-2013  christos branches: 1.147.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.146 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.145 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.144 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.143 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.142 13-Mar-2012  elad branches: 1.142.2;
Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.141 15-Jan-2012  perseant Corrections to part of rev 1.140. lfs_bmapv, not lfs_markv, marks vnodes
LFSI_BMAP and recycles them. This greatly reduces the writing leakage
occurring when the filesystem has no space available for non-cleaning
writes.
 1.140 02-Jan-2012  perseant * Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.139 12-Jun-2011  rmind branches: 1.139.2; 1.139.6;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.138 01-Jul-2010  hannken branches: 1.138.6;
Remove vlockmgr(). Generic vnode lock operations now use a rwlock located
in the vnode. All LK_* flags move from sys/lock.h to sys/vnode.h. Calls
to vlockmgr() in file systems get replaced with VOP_LOCK() or VOP_UNLOCK().

Welcome to 5.99.34.

Discussed on tech-kern.
 1.137 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.136 16-Feb-2010  mlelstv branches: 1.136.2;
Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.135 13-Sep-2009  tsutsui branches: 1.135.2;
Move declaration of ufs_hashlock into <ufs/ufs_extern.h> from each c source.
 1.134 11-Jan-2009  christos merge christos-time_t
 1.133 16-May-2008  hannken branches: 1.133.6;
Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.132 06-May-2008  ad branches: 1.132.2;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.131 30-Apr-2008  ad PR kern/38135 vfs_busy/vfs_trybusy confusion

The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
 1.130 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.129 21-Apr-2008  ad branches: 1.129.2;
Acquire kernel_lock directly in LFS syscalls.
 1.128 30-Jan-2008  ad branches: 1.128.6; 1.128.8; 1.128.10;
PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
 1.127 30-Jan-2008  ad Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
 1.126 02-Jan-2008  ad Merge vmlocking2 to head.
 1.125 20-Dec-2007  dsl Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
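
A small sketch of the before and after prototypes, using stand-in types so it compiles outside the kernel. sketch_register_t, sketch_lwp, sketch_foo_args and both functions are hypothetical; only the shape of the signatures follows this entry.

```c
#include <stdint.h>

typedef intptr_t sketch_register_t;	/* stand-in for register_t */
struct sketch_lwp { int l_id; };	/* stand-in for struct lwp */

/* Hypothetical argument structure for a syscall "foo". */
struct sketch_foo_args { int fd; int flags; };

/* Old style: untyped pointer, converted inside the function. */
static int
sketch_foo_old(struct sketch_lwp *l, void *v, sketch_register_t *retval)
{
	struct sketch_foo_args *uap = v;

	(void)l;
	*retval = uap->fd + uap->flags;
	return 0;
}

/* New style: typed, const argument structure checked by the compiler. */
static int
sketch_foo_new(struct sketch_lwp *l, const struct sketch_foo_args *uap,
    sketch_register_t *retval)
{
	(void)l;
	*retval = uap->fd + uap->flags;
	return 0;
}
```

With the const-qualified form, an accidental write into the argument structure becomes a compile error, which matches the entry's note about fixing compat code so it does not write into 'uap'.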
 1.124 10-Oct-2007  ad branches: 1.124.4; 1.124.6; 1.124.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.123 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.122 04-Mar-2007  christos branches: 1.122.2; 1.122.14; 1.122.16; 1.122.18;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.121 15-Feb-2007  ad branches: 1.121.2;
Replace some uses of lockmgr() / simplelocks.
 1.120 09-Feb-2007  ad Merge newlock2 to head.
 1.119 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.118 16-Nov-2006  christos branches: 1.118.2; 1.118.4;
__unused removal on arguments; approved by core.
 1.117 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.116 01-Sep-2006  perseant branches: 1.116.2; 1.116.4;
Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.115 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.114 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.113 14-May-2006  elad branches: 1.113.2;
integrate kauth.
 1.112 18-Apr-2006  perseant Get rid of the LFS_FORCE_WRITE case. We never really used it, and it could
panic the kernel if the cleaner daemon passed the right combination of arguments.
Coverity CID 2741.
 1.111 07-Apr-2006  perseant Several minor bug fixes:

* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.110 19-Mar-2006  rtr init struct vnode *vp = NULL
coverity 2724 / run 6
XXX in future runs coverity may complain about deref NULL now but comment
on line 382 indicates this should not be possible
 1.109 17-Mar-2006  tls From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.108 11-Dec-2005  christos branches: 1.108.4; 1.108.6; 1.108.8; 1.108.10; 1.108.12;
merge ktrace-lwp.
 1.107 25-May-2005  perseant branches: 1.107.2;
Don't update lfs_stats.segs_reclaimed if we're not keeping statistics.
Patch from Juan RP.
 1.106 20-May-2005  perseant Keep track of the number of segments reclaimed, since the cleaner doesn't
do this anymore (it hasn't for quite some time). Add a couple of conditional
debugging messages to indicate why segments are not cleaned, in the event
that lfs_segclean is used.

Make the LFCNSEGWAITALL fcntl work again.
 1.105 16-Apr-2005  perseant Use lfs_malloc() to manage the blkiov arrays that the cleaner functions use,
since the cleaner is likely to operate in a low-memory condition.
 1.104 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.103 08-Mar-2005  perseant branches: 1.103.2;
Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.102 26-Feb-2005  perry nuke trailing whitespace
 1.101 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
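
The bfree reporting rule from the list above (never report less than zero or more than lfs_dsize) amounts to a clamp. The helper below is a hypothetical sketch, not the code in lfs_statvfs; the 16TB symptom mentioned above is presumably what a negative count looks like once it lands in an unsigned field.

```c
#include <stdint.h>

/*
 * Hypothetical helper: clamp a signed free-block count into the
 * range [0, dsize] before it is reported to userland.
 */
static int64_t
sketch_clamp_bfree(int64_t bfree, int64_t dsize)
{
	if (bfree < 0)
		return 0;
	if (bfree > dsize)
		return dsize;
	return bfree;
}
```

Keeping the internal counter signed while clamping only at the reporting boundary lets the cleaner still see the negative values the list says it must be aware of.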
 1.100 04-Dec-2003  yamt branches: 1.100.6; 1.100.8; 1.100.10;
use b_private rather than b_saveaddr.
XXX LFS_USE_B_INVAL
 1.99 07-Nov-2003  yamt fix spec vnode aliasing.
 1.98 10-Sep-2003  yamt g/c CHECK_COPYIN.
 1.97 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.96 30-Jul-2003  yamt - check EROFS earlier in lfs_markv.
- remove wrong error recovery code (fake buffers are never on bufqueue)
and put a comment instead.
 1.95 30-Jul-2003  yamt remove an unused definition of LFS_VREF_THRESHOLD.
 1.94 02-Jul-2003  yamt use queue.h macros.
 1.93 29-Jun-2003  fvdl branches: 1.93.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.92 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.91 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.90 17-May-2003  nakayama Avoid "comparison is always false" warning in gcc 3.3 w/ 64-bit size_t.
 1.89 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.88 20-Mar-2003  yamt fix "more than one fragment" panics;
direct and indirect block pointers are not valid in the case of shortlinks.
while i'm here, move duplicated code in lfs_vget/fastvget into a new
function, lfs_vinit.
 1.87 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.86 08-Mar-2003  perseant Only #define LFS if not already defined.
 1.85 08-Mar-2003  perseant Add an lfs_strategy() that checks to make sure we're not trying to read
where the cleaner is trying to write, instead of tying up the "live"
buffers (or pages).

Fix a bug in the LFS_UBC case where oversized buffers would not be
checksummed correctly, causing uncleanable segments.

Make sure that wakeup(fs->lfs_iocount) is done if fs->lfs_iocount is 1
as well as 0, since we wait in some places for it to drop to 1.

Activate all pages that make it into lfs_gop_write without the segment
lock held, since they must have been dirtied very recently, even if
PG_DELWRI is not set.
 1.84 24-Feb-2003  perseant Add lfs_ioctl vnode op, with ioctls to take over cleaner system call
functionality (not including segment clean, since that is now done
automatically as checkpoints happen).
 1.83 23-Feb-2003  simonb Remove assigned-to but not used variable.
 1.82 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.81 18-Feb-2003  perseant Make it compile again, grr....
 1.80 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.79 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.78 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.77 26-Dec-2002  yamt don't try to write all blocks passed to lfs_markv at once
since it likely causes buf starvation.
 1.76 21-Dec-2002  yamt add a XXX comment
 1.75 18-Dec-2002  yamt correct/add assertion.
 1.74 17-Dec-2002  yamt no need for cleaner to hold vnode locks.
cleaner and normal vnode operations are synchronized enough by
seglock/fraglock and buf's B_BUSY-ness.
 1.73 24-Nov-2002  yamt in lfs_fakebuf, make corresponding buffer busy to avoid
reading blocks that aren't written yet.
it's needed because we'll update metadata in lfs_updatemeta
before data pointed by them is actually written to disk.

XXX should be solved with fake inode/indirect blocks instead?
 1.72 24-Nov-2002  yamt blksize() macro shouldn't be used for indirect blocks.
this fixes "getblk: block size invariant failed" panic.
PR 18977.
 1.71 03-Aug-2002  itojun correct range check, have overflow check, fix type mismatches,
for cmap args and some other calls. from openbsd
 1.70 07-Jul-2002  briggs Fix a printf format warning.
 1.69 06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.68 20-Jun-2002  perseant Don't bomb out of lfs_bmapv if the caller is requesting blocks that
live in the current segment. There's nothing wrong with this, and
it is necessary for the correct operation of the coalescer.
 1.67 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.66 06-Jun-2002  perseant Let lfs_bmapv fill in the bi_size member of the BLOCK_INFO structure,
as well as bi_daddr. This lets the cleaner have an idea of what the size
of this block was at the time it was written without having to refer to
a segment header (e.g., in the file coalescing case).

Tested on i386.
 1.65 14-May-2002  perseant branches: 1.65.2; 1.65.4;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.64 12-May-2002  matt Eliminate commons.
 1.63 18-Dec-2001  chs use the new compatibility routines to allow mmap() to work
(in the same non-coherent fashion that it worked pre-UBC)
until someone has time to do it the right way.
 1.62 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.61 08-Nov-2001  lukem add RCSID
 1.60 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.59 15-Sep-2001  chs branches: 1.59.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.
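
The requirement above that struct genfs_node be the first element of the filesystem-specific vnode data relies on a common C idiom: a pointer to a structure can be converted to a pointer to its first member. A minimal sketch, using made-up demo_* types rather than the real kernel structures:

```c
#include <assert.h>
#include <stddef.h>

struct demo_genfs_node { int g_flags; };   /* stand-in for struct genfs_node */

/* Hypothetical filesystem-specific node. */
struct demo_fs_node {
    struct demo_genfs_node n_gnode;        /* must be first */
    int n_private;                         /* filesystem-private data */
};

struct demo_vnode { void *v_data; };       /* stand-in for the vnode's private pointer */

/* Generic code reaches the shared part without knowing the filesystem's type. */
struct demo_genfs_node *demo_vtog(struct demo_vnode *vp)
{
    assert(vp != NULL && vp->v_data != NULL);
    return (struct demo_genfs_node *)vp->v_data;
}
```

If the shared structure were placed anywhere but first, the cast in demo_vtog would point into the middle of the filesystem's private data.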

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.58 03-Aug-2001  jdolecek branches: 1.58.2;
Constrain 'blkcnt' of lfs_markv() syscall by 64KB. Reviewed by
Konrad Schroder <perseant@NetBSD.org>.
 1.57 13-Jul-2001  perseant Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.56 03-Dec-2000  perseant branches: 1.56.2; 1.56.4; 1.56.6;
Call uvm_vnp_setsize() in lfs_{fast,}vget to set initial vnode size.
 1.55 30-Nov-2000  jdolecek no need to include fs_lfs.h, define LFS directly
 1.54 27-Nov-2000  perseant If LFS_DO_ROLLFORWARD is defined, roll forward from the older checkpoint
on mount, through the newer checkpoint and on through any newer
partial-segments that may have been written but not checkpointed because
of an intervening crash.

LFS_DO_ROLLFORWARD is not defined by default.
 1.53 22-Nov-2000  perseant Protect lfs_{bmapv,markv} with vfs_{un,}busy. Fix a reference/lock leak
in an error case in lfs_markv. Change the vfs_getvfs() error to return
ENOENT, for consistency with failure of vfs_busy().

99% of this patch was from Jesse Off <joff@gci-net.com> (PR #11547).
 1.52 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.51 21-Oct-2000  toshii In lfs_fastvget(), initialize i_lfs_effnblks correctly.
 1.50 20-Oct-2000  perseant Do not increment the clean segment counter, if a segment that the cleaner
is trying to clean is already clean (e.g., if two lfs_cleanerds are running
at once.)
 1.49 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments through N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.48 13-Jul-2000  thorpej XXX Use of hzto() return value needs to be double-checked here.
 1.47 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero at fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.46 04-Jul-2000  perseant Fix errors observed while trying to fill the filesystem with yesterday's
fixes:

- Write copies of bfree and avail in the CLEANERINFO block, so the
cleaner doesn't have to guess which superblock has the current
information (if indeed any do).

- Tighten up accounting of lfs_avail (more needs to be done).

- When cleansing indirect blocks of UNWRITTEN, make sure not to mark
them clean, since they'll need to be rewritten later.
 1.45 03-Jul-2000  fvdl Correct typo in previous.
 1.44 30-Jun-2000  fvdl Rearrange code around getnewvnode as was already done for ffs, to avoid
locking against oneself because getnewvnode recycles a softdep-using vnode.
 1.43 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.42 22-Jun-2000  perseant Update lfs_vunref for the fact that now a vnode can be locked with no
references (locked for VOP_INACTIVE at the end of vrele) and it's okay.
Check the return value of lfs_vref where appropriate.
Fixes PR #s 10285 and 10352.
 1.41 30-Mar-2000  augustss branches: 1.41.4;
Remove register declarations.
 1.40 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.39 16-Jan-2000  perseant correct typo (reference uninitialized variable)
 1.38 14-Jan-2000  perseant Expand the category of "metadata" in lfs_markv to include Ifile data blocks.
This prevents a rare condition in which Ifile "ifile" blocks, that is, the
blocks of the ifile which point VOP_VGET at the inode block containing the
requested inode, from being "unwritten" when cleaning during intense disk
activity.
 1.37 23-Nov-1999  fvdl Be more careful to block bio interrupts for some data structures. There
were at least a few missed cases where vp->v_{clean,dirty}blkhd were
unprotected since the softdep/trickle sync merge.
 1.36 21-Nov-1999  perseant Initialize i_ffs_effnlink, so every file doesn't look like it's already been
deleted for the purpose of dirops (particularly create and mkdir). Addresses
PR#8815.
 1.35 12-Nov-1999  perseant Back out my patch of the 8th (to address unreferenced inode problem).
Apparently this needs more thought.
 1.34 09-Nov-1999  perseant If ifile blocks were written before dirops were complete, and then the
system crashed, inodes could be allocated that were not referenced. (Though
not a serious problem, it evidences itself in phase 4 of fsck_lfs.) Fix
this by marking if_daddr with UNASSIGNED before the inodes are actually
written; at mount time the ifile is checked for UNASSIGNED entries and
any that are found are linked back into the free list. (The latter
functionality should move into the roll-forward agent when it materializes.)
 1.33 08-Jul-1999  wrstuden branches: 1.33.2; 1.33.4; 1.33.8;
Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
 1.32 09-Jun-1999  drochner complete the previous
reindent syscall args
 1.31 09-Jun-1999  christos prefix the lfs syscalls with sys_
 1.30 14-Apr-1999  perseant Fix lost lock in lfs_markv -- a typo-class bug, obvious when you look at it.
 1.29 12-Apr-1999  perseant Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).
 1.28 12-Apr-1999  perseant Better checking for held inode locks in lfs_fastvget, for a number of error
conditions. Also change the default setting of lfs_clean_vnhead to 0, which
seems to make the locking problems go away (although this is difficult to
test as I can't reliably reproduce them).
 1.27 11-Apr-1999  perseant Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.
 1.26 29-Mar-1999  perseant branches: 1.26.2;
Fix unit mismatch in debugging code in lfs_segclean; also put it properly
within `#ifdef DEBUG_LFS'.
 1.25 25-Mar-1999  perseant Fixes to make dirops and lfs_vflush play together well. In particular,
if we are short on vnodes, lfs_vflush from another process can grab a
vnode that lfs_markv has already processed but not yet written; but
lfs_markv holds the seglock. When lfs_vflush gets around to writing it,
the context for copyin is gone. So, now lfs_markv calls copyin itself,
rather than having lfs_writeseg do it.
 1.24 25-Mar-1999  perseant Change lfs_sb_cksum to use offsetof() instead of an inlined version.

Fix lfs_vref/lfs_vunref to ignore VXLOCKed vnodes that are also being
flushed.

Improve the debugging messages somewhat.
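
The offsetof() change to lfs_sb_cksum mentioned in revision 1.24 can be illustrated with a toy superblock. The layout and the additive checksum below are assumptions made for the example (the log does not describe the real algorithm or structure); the point is only that offsetof() gives the number of bytes that precede the checksum field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical superblock layout; only the position of the checksum matters. */
struct demo_sb {
    uint32_t magic;
    uint32_t version;
    uint32_t nseg;
    uint32_t cksum;   /* covers every byte that precedes it */
    uint32_t pad;     /* not covered by the checksum */
};

/* Simple byte-wise additive checksum over the fields before 'cksum'. */
uint32_t demo_sb_cksum(const struct demo_sb *sb)
{
    const unsigned char *p = (const unsigned char *)sb;
    size_t len = offsetof(struct demo_sb, cksum);
    uint32_t sum = 0;

    assert(sb != NULL);
    for (size_t i = 0; i < len; i++)
        sum += p[i];
    return sum;
}
```

Because the length comes from offsetof(), fields added before the checksum are covered automatically, with no hand-computed pointer arithmetic to update.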
 1.23 25-Mar-1999  perseant clean up unused/required #ifdefs
 1.22 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd says they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.21 09-Nov-1998  mycroft GC the B_CACHE bit.
 1.20 23-Oct-1998  thorpej Use DINODE_SIZE rather than sizeof(struct dinode).
 1.19 15-Sep-1998  pk Apply patch from PR#5542: buffer overflow in lfs_markv().
 1.18 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.17 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.16 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.15 19-Feb-1998  thorpej Include the LFS option header.
 1.14 13-Jan-1998  thorpej Nuke spurious semicolon, from Konrad Schroder <perseant@hhhh.org>.
 1.13 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
union {
    struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
    struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din;
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.12 12-Oct-1996  christos revert previous kprintf changes
 1.11 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.10 09-Feb-1996  christos lfs prototypes
 1.9 21-Sep-1995  thorpej Make system calls conform to a standard prototype and bring those
prototypes into scope.
 1.8 21-Mar-1995  mycroft Update to use timer{add,sub}().
 1.7 14-Dec-1994  mycroft Sync with CSRG.
 1.6 11-Dec-1994  mycroft Use __timeradd(), not timevaladd().
 1.5 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.4 21-Aug-1994  cgd C syntax fix, and syscall args style (For later.)
 1.3 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.2 16-Jun-1994  mycroft This i_flags should be i_flag.
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.26.2.4 20-Jan-2000  he Pull up revision 1.40 (requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.26.2.3 15-Jan-2000  he Pull up revision 1.38 (requested by perseant):
Expand the category of "metadata" in lfs_markv to include Ifile
data blocks. This prevents a rare condition in which certain
Ifile blocks are "unwritten" when cleaning during intense disk
activity.
 1.26.2.2 15-Apr-1999  perseant branches: 1.26.2.2.2;
Pull up 1.29->1.30; trivial fix for the forgotten lock problem.
 1.26.2.1 13-Apr-1999  perseant Pull-up of changes made to the trunk on Sunday [1.27-1.28], to wit:

Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.

Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).

Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed, then
immediately reloaded, their dinodes were located in an inode block which
was not on disk at the advertized location, nor in the cache (although it
would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.

Better checking for held inode locks in lfs_fastvget, for a number of
error conditions. Also change the default setting of lfs_clean_vnhead to
0, which seems to make the locking problems go away (although this is
difficult to test as I can't reliably reproduce them).

Make sure that the wakeup occurs for vnodes that lfs_update might be
sleeping on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which
have dirty buffers), by marking them with the appropriate flag if
dirtybuffers were added while the write was in progress.

Fix block counting during file truncation, if not truncating to zero.

Disallow threshold-initiated cache flush when dirops are active. Also,
make SET_ENDOP use lfs_check instead of inlining most of it.

Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).

Check the superblock version field, and refuse to mount the filesystem if
the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and
boundaries, etc., which would not affect existing fields in the
superblock, but which would drastically affect the filesystem, to be
smoothly integrated at a later date.
 1.26.2.2.2.3 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
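
The effect of adding a second negative magic address, as described in the last item above, can be sketched like this. The demo_* names are hypothetical, int32_t is only a stand-in for daddr_t, and the meaning given to -1 here ("no block") is an assumption; the log itself only defines -2 as "unwritten".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int32_t demo_daddr_t;                /* stand-in for daddr_t */

#define DEMO_NOBLOCK   ((demo_daddr_t)-1)    /* assumed meaning: no block at all */
#define DEMO_UNWRITTEN ((demo_daddr_t)-2)    /* accounted for, no disk location yet */

/* True only for addresses that name a real location on disk. */
bool demo_has_disk_location(demo_daddr_t d)
{
    /* A plain (d != -1) test would wrongly accept DEMO_UNWRITTEN. */
    return d >= 0;
}

/* True if space for the block has been accounted for, on disk or not. */
bool demo_is_accounted(demo_daddr_t d)
{
    assert(d >= DEMO_UNWRITTEN);   /* no other negative values exist in this sketch */
    return d != DEMO_NOBLOCK;
}
```

Testing the sign rather than comparing against -1 keeps existing callers correct without their having to know about each magic value.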
 1.26.2.2.2.2 02-Aug-1999  thorpej Update from trunk.
 1.26.2.2.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.33.8.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.33.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.33.4.1 15-Nov-1999  fvdl Sync with -current
 1.33.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.33.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.33.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.41.4.7 16-Aug-2001  tv Pullup [jdolecek]:

sys/ufs/lfs/lfs_syscalls.c 1.58 via patch

Constrain "blkcnt" of lfs_markv() syscall by 64KB.
 1.41.4.6 03-Feb-2001  he Pull up revisions 1.50,1.52-1.53 (requested by perseant):
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
o Don't clean the same segment twice.
o Fix locking and reference leaks in lfs_markv, on error (PR#11547).
 1.41.4.5 01-Nov-2000  tv Pullup 1.51 [toshii]:
In lfs_fastvget(), initialize i_lfs_effnblks correctly.
 1.41.4.4 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.41.4.3 13-Jul-2000  thorpej Pull up rev. 1.48:
XXX Use of hzto() return value needs to be double-checked here.
 1.41.4.2 03-Jul-2000  fvdl pullup the fixes from the trunk to not hold ufs_hashlock across
getnewvnode()
 1.41.4.1 22-Jun-2000  perseant Pull up lfs_vunref fix from the trunk.
 1.56.6.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.56.6.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.56.6.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.56.6.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.56.6.1 03-Aug-2001  lukem update to -current
 1.56.4.3 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.56.4.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.56.4.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.56.2.14 29-Dec-2002  thorpej Sync with HEAD.
 1.56.2.13 19-Dec-2002  thorpej Sync with HEAD.
 1.56.2.12 11-Dec-2002  thorpej Sync with HEAD.
 1.56.2.11 13-Aug-2002  nathanw Catch up to -current.
 1.56.2.10 01-Aug-2002  nathanw Catch up to -current.
 1.56.2.9 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.56.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.56.2.7 29-May-2002  nathanw #include <sys/sa.h> before <sys/syscallargs.h>, to provide sa_upcall_t
now that <sys/param.h> doesn't include <sys/sa.h>.

(Behold the Power of Ed)
 1.56.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.56.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.56.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.56.2.3 24-Aug-2001  nathanw A few files and lwp/proc conversions I missed in the last big update.
GENERIC runs again.
 1.56.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.56.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.58.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.59.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.65.4.2 07-Aug-2002  lukem Pull up revision 1.71 (requested by itojun in ticket #616):
correct range check, have overflow check, fix type mismatches,
for cmap args and some other calls. from openbsd
 1.65.4.1 20-Jun-2002  lukem Pull up revision 1.67 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.65.2.3 29-Aug-2002  gehenna catch up with -current.
 1.65.2.2 15-Jul-2002  gehenna catch up with -current.
 1.65.2.1 20-Jun-2002  gehenna catch up with -current.
 1.93.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.93.2.8 08-Mar-2005  skrll Sync with HEAD.
 1.93.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.93.2.6 30-Oct-2004  skrll Reduced diff to HEAD by restoring the struct proc * argument to lfs_bmapv
 1.93.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.93.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.93.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.93.2.2 03-Aug-2004  skrll Sync with HEAD
 1.93.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.100.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.100.8.1 29-Apr-2005  kent sync with -current
 1.100.6.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
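
The lfs_statfs(9) item in the kernel notes above (never report bfree < 0 or
higher than lfs_dsize) amounts to clamping a signed count before it is handed
out as an unsigned one; a negative count reinterpreted as unsigned would show
up as an enormous free-space figure. A minimal sketch, assuming a hypothetical
helper clamp_bfree() with hypothetical parameter names; this is not the
NetBSD code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the NetBSD sources): clamp a
 * signed free-block count into the range [0, dsize] before reporting
 * it as an unsigned quantity.
 */
static uint64_t
clamp_bfree(int64_t bfree, int64_t dsize)
{
	assert(dsize >= 0);

	if (bfree < 0)
		return 0;		/* never report a negative count */
	if (bfree > dsize)
		return (uint64_t)dsize;	/* never report more than the data area */
	return (uint64_t)bfree;
}
```

With this sketch, clamp_bfree(-1, 100) yields 0 and clamp_bfree(150, 100)
yields 100.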
 1.103.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_syscalls.c: revision 1.112
Get rid of the LFS_FORCE_WRITE case. We never really used it, and it could
panic the kernel if the cleaner daemon passed the right combination of arguments.
Coverity CID 2741.
 1.103.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_balloc.c: revision 1.60
sys/ufs/lfs/lfs_syscalls.c: revision 1.111
sys/ufs/lfs/lfs_segment.c: revision 1.172
sys/ufs/lfs/lfs_vnops.c: revision 1.163
Several minor bug fixes:
* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.103.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_segment.c: revision 1.170
sys/ufs/lfs/lfs.h: revision 1.96
sys/ufs/lfs/lfs_vfsops.c: revision 1.194
sys/ufs/lfs/lfs_syscalls.c: revision 1.109
From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.103.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.180
sys/ufs/lfs/lfs_syscalls.c: revision 1.106
sys/ufs/lfs/lfs.h: revision 1.87
Keep track of the number of segments reclaimed, since the cleaner doesn't
do this anymore (it hasn't for quite some time). Add a couple of conditional
debugging messages to indicate why segments are not cleaned, in the event
that lfs_segclean is used.
Make the LFCNSEGWAITALL fcntl work again.

sys/ufs/lfs/lfs_syscalls.c: revision 1.107
Don't update lfs_stats.segs_reclaimed if we're not keeping statistics.
Patch from Juan RP.
 1.103.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.107.2.7 04-Feb-2008  yamt sync with head.
 1.107.2.6 21-Jan-2008  yamt sync with head
 1.107.2.5 27-Oct-2007  yamt sync with head.
 1.107.2.4 03-Sep-2007  yamt sync with head.
 1.107.2.3 26-Feb-2007  yamt sync with head.
 1.107.2.2 30-Dec-2006  yamt sync with head.
 1.107.2.1 21-Jun-2006  yamt sync with head.
 1.108.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.108.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.108.10.5 11-May-2006  elad sync with head
 1.108.10.4 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.108.10.3 19-Apr-2006  elad sync with head.
 1.108.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.108.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.108.8.6 03-Sep-2006  yamt sync with head.
 1.108.8.5 11-Aug-2006  yamt sync with head
 1.108.8.4 26-Jun-2006  yamt sync with head.
 1.108.8.3 24-May-2006  yamt sync with head.
 1.108.8.2 11-Apr-2006  yamt sync with head
 1.108.8.1 01-Apr-2006  yamt sync with head.
 1.108.6.3 01-Jun-2006  kardel Sync with head.
 1.108.6.2 22-Apr-2006  simonb Sync with head.
 1.108.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.108.4.1 09-Sep-2006  rpaulo sync with head
 1.113.2.1 19-Jun-2006  chap Sync with head.
 1.116.4.2 10-Dec-2006  yamt sync with head.
 1.116.4.1 22-Oct-2006  yamt sync with head
 1.116.2.3 30-Jan-2007  ad Remove support for SA. Ok core@.
 1.116.2.2 12-Jan-2007  ad Sync with head.
 1.116.2.1 18-Nov-2006  ad Sync with head.
 1.118.4.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.118.2.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.121.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.122.18.1 14-Oct-2007  yamt sync with head.
 1.122.16.3 23-Mar-2008  matt sync with HEAD
 1.122.16.2 09-Jan-2008  matt sync with HEAD
 1.122.16.1 06-Nov-2007  matt sync with HEAD
 1.122.14.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.122.2.4 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.122.2.3 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.122.2.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.122.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.124.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.124.6.5 26-Dec-2007  ad Sync with head.
 1.124.6.4 19-Dec-2007  ad Use a global lfs_lock.
 1.124.6.3 19-Dec-2007  ad Fix some more problems w/lfs on this branch.
 1.124.6.2 19-Dec-2007  ad Get lfs mostly working.
 1.124.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.124.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.128.10.1 18-May-2008  yamt sync with head.
 1.128.8.2 01-Nov-2008  christos Sync with head.
 1.128.8.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.128.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.128.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.129.2.5 11-Aug-2010  yamt sync with head.
 1.129.2.4 11-Mar-2010  yamt sync with head
 1.129.2.3 16-Sep-2009  yamt sync with head
 1.129.2.2 04-May-2009  yamt sync with head.
 1.129.2.1 16-May-2008  yamt sync with head.
 1.132.2.3 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.132.2.2 14-May-2008  wrstuden Per discussion with ad, remove most of the #include <sys/sa.h> lines
as they were including sa.h just for the type(s) needed for syscallargs.h.

Instead, create a new file, sys/satypes.h, which contains just the
types needed for syscallargs.h. Yes, there's only one now, but that
may change and it's probably more likely to change if it'd be difficult
to handle. :-)

Per discussion with matt at n dot o, add an include of satypes.h to
sigtypes.h. Upcall handlers are kinda signal handlers, and signalling
is the header file that's already included for syscallargs.h that
most closely matches SA.

This shaves about 3000 lines off of the diff of the branch relative
to the base. That also represents about 18% of the total before this
checkin.

I think this reduction is a very good thing.
 1.132.2.1 10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.133.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.135.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.135.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.136.2.3 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.136.2.2 03-Jul-2010  rmind sync with head
 1.136.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.138.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.139.6.2 05-Apr-2012  mrg sync to latest -current.
 1.139.6.1 18-Feb-2012  mrg merge to -current.
 1.139.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.139.2.2 23-Jan-2013  yamt sync with head
 1.139.2.1 17-Apr-2012  yamt sync with head
 1.142.2.4 03-Dec-2017  jdolecek update from HEAD
 1.142.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.142.2.2 23-Jun-2013  tls resync from head
 1.142.2.1 25-Feb-2013  tls resync with head
 1.147.2.2 18-May-2014  rmind sync with head
 1.147.2.1 28-Aug-2013  rmind sync with head
 1.152.2.1 10-Aug-2014  tls Rebase.
 1.155.4.5 28-Aug-2017  skrll Sync with HEAD
 1.155.4.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.155.4.3 22-Sep-2015  skrll Sync with HEAD
 1.155.4.2 06-Jun-2015  skrll Sync with HEAD
 1.155.4.1 06-Apr-2015  skrll Sync with HEAD
 1.172.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.172.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.172.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.174.4.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.175.10.1 29-Feb-2020  ad Sync with head.
 1.175.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.392 04-Nov-2025  perseant Remove su_flags array, replacing it with a new flag SEGUSE_READY.
Segments progress from having su_nbytes==0 to SEGUSE_EMPTY to SEGUSE_READY
to clean, progressing to the next step after a checkpoint.
 1.391 20-Oct-2025  perseant Correct handling of B_MODIFY in lfs_resize_fs to avoid leaving some old
file-entry data in the segment table.
 1.390 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
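
The per-inode figures described for LFCNFILESTATS above (number of direct
blocks, number of gaps between them, total gap distance) can be illustrated
with a short sketch. Everything here is an assumption for illustration: the
struct gap_stats layout, the count_gaps() name, the use of a fixed block
length, and the choice to measure a gap as the absolute distance from where
the next block would have been if contiguous. The kernel's actual definitions
may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result record; not the kernel's structure. */
struct gap_stats {
	size_t   nblocks;	/* number of direct blocks examined */
	size_t   ngaps;		/* discontinuities between neighbours */
	uint64_t gapdist;	/* sum of the absolute gap distances */
};

/*
 * Walk an array of disk addresses, one per direct block, and count
 * every place where a block does not start immediately after its
 * predecessor.  blklen is the length of one block in address units.
 */
static struct gap_stats
count_gaps(const int64_t *addr, size_t n, int64_t blklen)
{
	struct gap_stats st = { n, 0, 0 };

	assert(blklen > 0);

	for (size_t i = 1; i < n; i++) {
		int64_t expected = addr[i - 1] + blklen;

		if (addr[i] != expected) {
			int64_t d = addr[i] - expected;

			st.ngaps++;
			st.gapdist += (uint64_t)(d < 0 ? -d : d);
		}
	}
	return st;
}
```

In this sketch a fully coalesced file has ngaps == 0, which is the state
LFCNREWRITEFILE aims for and LFCNSCRAMBLE deliberately destroys.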
 1.389 29-Sep-2025  perseant Use the symbolic name MNT_WAIT when calling VFS_SYNC. No functional change.
 1.388 19-Sep-2025  perseant Interpret the "waitfor" argument to lfs_sync to match the passed values.
Not every sync needs to be a synchronous checkpoint.
 1.387 17-Sep-2025  perseant Add working in-kernel roll forward.
 1.386 17-Sep-2025  perseant Use a workqueue to handle the superblock callback.
 1.385 17-Sep-2025  perseant Add routines to check freelist consistency if compiled with DEBUG and
conditional on a kernel variable manipulated via sysctl.
Add checks before and after each routine that modifies the free list.
#if 0 a section of lfs_vfree() that was intended to keep the free list ordered
but instead corrupted it.
 1.384 02-Sep-2025  perseant Use a workqueue to handle cluster iodone, rather than doing it in interrupt context.
 1.383 30-Dec-2024  hannken Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.382 19-Mar-2022  hannken branches: 1.382.4; 1.382.10;
Remove now unused VV_LOCKSWORK, all file systems support locking.

Remove unused predicates vn_locked() and vn_anylocked().

Welcome to 9.99.95
 1.381 31-Jul-2021  andvar s/threshhold/threshold
 1.380 05-Sep-2020  riastradh branches: 1.380.6;
Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.379 04-Aug-2020  riastradh Mark lfs vnodes with VV_LOCKSWORK, same as ffs.
 1.378 04-Apr-2020  ad Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.377 16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.376 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.375 14-Mar-2020  ad OR into bp->b_cflags; don't overwrite.
 1.374 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.373 23-Feb-2020  riastradh Dust off the orphan detection code and try to make it work.
 1.372 23-Feb-2020  riastradh Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.
 1.371 23-Feb-2020  riastradh Teach lfs to transition ro<->rw.
 1.370 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.369 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.368 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.367 31-Dec-2019  ad branches: 1.367.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.366 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.365 28-May-2019  msaitoh branches: 1.365.2;
s/recieve/receive/
 1.364 01-Jan-2019  hannken Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
 1.363 10-Dec-2018  maxv Remove unused mbuf.h includes.
 1.362 28-May-2018  chs branches: 1.362.2;
add a genfs method to allow a file system to limit the range of pages
that are given to a single GOP_WRITE() call. needed by ZFS.
 1.361 28-Oct-2017  pgoyette branches: 1.361.2;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
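
The format-string rules listed above can be illustrated in ordinary userland C. This is only a sketch: format_hist_entry() is a hypothetical helper, not part of kernhist(9), and it uses snprintf(3) rather than the kernel history macros. It shows a pointer being cast through uintptr_t to uintmax_t and printed with "%#jx", and a 32-bit counter widened to uintmax_t and printed with "%ju".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernhist(9) interface): every argument is
 * widened to uintmax_t and printed with the 'j' length modifier.
 */
static int
format_hist_entry(char *buf, size_t len, const void *ptr, uint32_t count)
{
	assert(buf != NULL && len > 0);

	/* Go through uintptr_t first to avoid pointer/integer size warnings. */
	uintmax_t p = (uintmax_t)(uintptr_t)ptr;
	uintmax_t c = (uintmax_t)count;

	/* "%#jx" replaces "%p"; "%ju" replaces "%u" and friends. */
	return snprintf(buf, len, "obj=%#jx count=%ju", p, c);
}
```

Because every argument has the same width, a consumer that stores the arguments in a fixed-size array and replays the format string later does not need to know the original type of each one.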
 1.360 26-Jul-2017  maya change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar

XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
 1.359 17-Apr-2017  hannken branches: 1.359.2; 1.359.4;
Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.358 17-Apr-2017  hannken Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
 1.357 13-Apr-2017  hannken Switch lfs_flush() and lfs_writerd() to mountlist iterator.
 1.356 01-Apr-2017  maya Switch lfs_writer_daemon to use condvar instead of mtsleep.
track thread existence with struct lwp instead of pid + lid,
it's more useful from ddb.
 1.355 01-Apr-2017  maya switch lfs_dirops to condvar (from mtsleep)
 1.354 01-Apr-2017  maya switch lfs_sleepers to condvar (from mtsleep)
 1.353 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT

Replace some #if DEBUG by this too. DEBUG is only for expensive
assertions; these are not.
 1.352 17-Feb-2017  hannken Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.351 07-Jul-2016  msaitoh branches: 1.351.2; 1.351.4;
KNF. Remove extra spaces. No functional change.
 1.350 20-Jun-2016  dholland Merge -r1.44 of ufs_extattr.c and related change -r1.302 of ffs_vfops.c:
fix use-after-free on failed unmount with extended attributes enabled.
 1.349 19-Oct-2015  dholland Set the legacy ulfs fstype field to ULFS2 when mounting lfs64. Oops.
 1.348 15-Oct-2015  dholland Enable mounting lfs64 volumes.
 1.347 15-Oct-2015  dholland Move stuff from struct ulfsmount to struct lfs.
 1.346 10-Oct-2015  dholland Remove no longer needed explicit 32->64 sign extension.

This is the last 32-bit-on-disk item among those that were either
already tagged or readily discoverable.
 1.345 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.344 01-Sep-2015  dholland Make the inode fields in the 64-bit superblock 64 bits wide.
Reasoning as before.

Note that I am not going through and checking for 64->32 truncations
in inode numbers; I'm sure there are quite a few, but that's a project
for later.
 1.343 01-Sep-2015  dholland Add byteswapping to the dinode accessors.

This prevents regressions in the ulfs code when switching to the new
accessors. Note that while adding byteswapping to the other accessors
is straightforward, I haven't done it yet; and that also is not enough
to make LFS_EI work, because there are places lying around that bypass
the accessors for one reason and another and all of them need to be
updated. That is going to have to wait for a later day as LFS_EI is
not on the critical path right now.
 1.342 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.341 19-Aug-2015  dholland Part two of dinodes; use the same union everywhere.
(previously the ufs-derived code had things set up slightly different)

Remove a bunch of associated mess.
 1.340 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.339 12-Aug-2015  dholland Provide 32-bit and 64-bit versions of FINFO.

This also entailed sorting out part of struct segment, as that
contains a pointer into the current FINFO data.
 1.338 12-Aug-2015  dholland Make 32-bit and 64-bit versions of SEGSUM.
Also fix some of the FINFO handling as it's closely entangled.
 1.337 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.336 12-Aug-2015  dholland Make 32-bit and 64-bit versions of CLEANERINFO.

XXX: while this is written to disk, it seems like much of it would
XXX: be better set up as a commpage shared with the cleaner.
 1.335 12-Aug-2015  dholland Fix botched syscall_package. HI CHRISTOS
 1.334 02-Aug-2015  dholland Pass the fs object to LFS_MAX_DADDR so it can check lfs_is64.

Remove some hackish intentional 64->32 truncations next to the checks
using LFS_MAX_DADDR, and tackle the problem they handled in bmap
instead.

The problem: the magic block pointer value UNWRITTEN has magic value
-2, and if it's not handled specifically, uint32 -> uint64 promotion
turns it into 4294967294, which then causes consternation and
monkeyhouse downstream.

What's here is still kind of a hack, but it's a step forward.
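
The promotion problem described above can be reproduced in a few lines of C. This is a sketch, not the lfs bmap code: SKETCH_UNWRITTEN and both functions are hypothetical names, and the check shown is only one possible way to keep the magic value intact when widening a 32-bit on-disk block pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the magic block pointer value UNWRITTEN. */
#define SKETCH_UNWRITTEN ((int64_t)-2)

/* Naive widening zero-extends, so 0xfffffffe becomes 4294967294. */
static int64_t
widen_naive(uint32_t ondisk)
{
	return (int64_t)ondisk;
}

/* Widening that handles the magic value specifically, so it stays -2. */
static int64_t
widen_checked(uint32_t ondisk)
{
	int64_t r;

	if (ondisk == (uint32_t)SKETCH_UNWRITTEN)
		r = SKETCH_UNWRITTEN;
	else
		r = (int64_t)ondisk;

	/* The result is either the magic value or a non-negative address. */
	assert(r == SKETCH_UNWRITTEN || r >= 0);
	return r;
}
```

Any comparison against the magic value made after widen_naive() will silently fail, which is the kind of downstream confusion the commit message refers to.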
 1.333 02-Aug-2015  dholland Add a (draft) 64-bit superblock. Make things build again.

Add pieces of support for using both superblock types where
convenient, and specifically to the superblock accessors, but don't
actually enable it anywhere.

First substantive step on PR 50000.
 1.332 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.331 02-Aug-2015  dholland Second batch of 64 -> 32 truncations in lfs, along with more minor
tidyups and corrections in passing.
 1.330 02-Aug-2015  dholland Fix assorted 64 -> 32 truncations in lfs. Also, some minor tidyups and
corrections in passing.
 1.329 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.328 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.327 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
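
A minimal sketch of the accessor idea, under stated assumptions: the structures below are hypothetical and reduced to a single field, and the sketch_ names are illustrative rather than taken from lfs.h. The point is that callers go through a function that picks the 32-bit or 64-bit on-disk form at run time, instead of through a macro that pretends to be a structure member.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-field superblocks; the real ones have many more fields. */
struct sketch_dlfs32 { uint32_t dlfs_avail; };
struct sketch_dlfs64 { uint64_t dlfs_avail; };

struct sketch_lfs {
	int is64;			/* which on-disk form is in use */
	union {
		struct sketch_dlfs32 u_32;
		struct sketch_dlfs64 u_64;
	} sb;
};

static inline uint64_t
sketch_lfs_sb_getavail(const struct sketch_lfs *fs)
{
	return fs->is64 ? fs->sb.u_64.dlfs_avail : fs->sb.u_32.dlfs_avail;
}

static inline void
sketch_lfs_sb_setavail(struct sketch_lfs *fs, uint64_t val)
{
	if (fs->is64) {
		fs->sb.u_64.dlfs_avail = val;
	} else {
		/* A 32-bit superblock cannot hold a wider value. */
		assert(val <= UINT32_MAX);
		fs->sb.u_32.dlfs_avail = (uint32_t)val;
	}
}
```

An accessor of this shape is also a natural place to add byte swapping later, which is how the commit message's remark about LFS_EI can be read.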
 1.326 16-Jul-2015  dholland Don't cast the return value of malloc.
 1.325 07-Jun-2015  hannken Fix copy and paste errors from last commits.
- Kernel i386/ALL and amd64/ALL compile again.
- Resolves CID 1304138 (DEADCODE) and 1304139 (IDENTICAL_BRANCHES).
 1.324 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.323 31-May-2015  hannken Use VFS_PROTOS() for lfs.
Rename conflicting struct lfs field "lfs_start" to "lfs_s0addr".

No functional change.
 1.322 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.321 16-Apr-2014  maxv branches: 1.321.4;
An (un)privileged user can easily make the kernel dereference a NULL
pointer.

The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).

ok christos@
 1.320 24-Mar-2014  hannken branches: 1.320.2;
- Make VI_XLOCK, VI_CLEAN and VI_LOCKSHARE private to kern/vfs_*.c.
- Make vwait() static.
- Add vdead_check() to check a vnode for being or becoming dead.

Discussed on tech-kern.

Welcome to 6.99.38
 1.319 23-Mar-2014  hannken Change all vfsops to use C99 designated initializers.

No functional changes intended.
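
For readers unfamiliar with the idiom, a C99 designated initializer for an operations table looks roughly like the following. The structure is a hypothetical, cut-down stand-in (the real struct vfsops has many more members), and all sketch_ names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced ops table. */
struct sketch_vfsops {
	const char *vfs_name;
	int (*vfs_mount)(const char *path);
	int (*vfs_unmount)(int flags);
	int (*vfs_sync)(void);
};

static int sketch_mount(const char *path) { return path == NULL ? -1 : 0; }
static int sketch_unmount(int flags) { (void)flags; return 0; }

/* Members are named, so order is irrelevant and omitted ones are NULL. */
static const struct sketch_vfsops sketch_ops = {
	.vfs_name = "sketchfs",
	.vfs_unmount = sketch_unmount,
	.vfs_mount = sketch_mount,
};

static int
sketch_do_mount(const struct sketch_vfsops *ops, const char *path)
{
	assert(ops->vfs_mount != NULL);
	return ops->vfs_mount(path);
}
```

Compared with a positional initializer, adding or reordering members in the structure does not silently shift which function ends up in which slot.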
 1.318 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.317 27-Nov-2013  christos Change the queue.3 *_END(&head) macros to NULL. Since we don't have CIRCLEQ
anymore, all the macros expand to NULL anyway, so this improves readability.
Requested by rmind@
 1.316 23-Nov-2013  christos change the mountlist CIRCLEQ into a TAILQ
 1.315 17-Oct-2013  christos - remove unused variables
- add debug ifdefs for debugging variables
- __USE() where appropriate.
 1.314 30-Sep-2013  hannken Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>
 1.313 28-Jul-2013  dholland Merge in some of the stuff for supporting the extended attributes code.
 1.312 28-Jul-2013  dholland Add more of the bits for supporting quotas.
 1.311 28-Jul-2013  dholland Bring in a copy of ffs_quota2_mount() for reference.
Add stuff to struct lfs that it needs to initialize.
Clear these fields in mount as there's no on-disk support for quota2;
but this increases the chances of being able to add it (or something
like it) in the future.
 1.310 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.309 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.308 28-Jul-2013  dholland Get rid of the ulfs_ops table as we only have one fs in here now.
 1.307 18-Jun-2013  christos branches: 1.307.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.306 17-Jun-2013  christos LFS module does not depend on FFS anymore. (NAKAJIMA Yoshihiro)
 1.305 10-Jun-2013  hannken Make DEBUG kernel compile: di_u.inumber -> di_inumber
 1.304 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.303 08-Jun-2013  dholland Stick LFS_ in front of IFMT, IFIFO, IFREG, etc. so as not to conflict
with the UFS copies of these symbols. (Which themselves ought to have
UFS_ stuck on.)
 1.302 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.301 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.300 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.299 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.298 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.297 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
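
The calling convention described above (no buffer is returned on error, so callers must not release one) can be sketched in userland C. Nothing here is the real bread(9) interface: sketch_bread(), sketch_brelse() and struct sketch_buf are hypothetical and use malloc(3) in place of the buffer cache.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct sketch_buf { int blkno; };

static int sketch_live_bufs;	/* buffers handed out and not yet released */

static void
sketch_brelse(struct sketch_buf *bp)
{
	assert(bp != NULL);
	free(bp);
	sketch_live_bufs--;
}

/* On error, *bpp is NULL and there is nothing for the caller to release. */
static int
sketch_bread(int blkno, struct sketch_buf **bpp)
{
	struct sketch_buf *bp;

	*bpp = NULL;
	if (blkno < 0)
		return EIO;
	bp = malloc(sizeof(*bp));
	if (bp == NULL)
		return ENOMEM;
	bp->blkno = blkno;
	sketch_live_bufs++;
	*bpp = bp;
	return 0;
}

static int
sketch_read_block(int blkno)
{
	struct sketch_buf *bp;
	int error;

	error = sketch_bread(blkno, &bp);
	if (error)
		return error;	/* no release here: no buffer was returned */
	sketch_brelse(bp);
	return 0;
}
```

Under this convention the error path in every caller shrinks to a plain return, and a release on a buffer that was never handed out cannot happen.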
 1.296 30-Apr-2012  rmind branches: 1.296.2;
- Replace some malloc(9) uses with kmem(9).
- G/C M_IPMOPTS, M_IPMADDR and M_BWMETER.
 1.295 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.294 16-Feb-2012  perseant Pass t_renamerace and t_rmdirrace tests.

Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.

Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.

Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace creates a lot of inode blocks on a tiny empty disk).
 1.293 04-Jan-2012  perseant branches: 1.293.2;
lfs_writerd thread exits when no more LFSs are mounted.
 1.292 02-Jan-2012  perseant * Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.291 14-Nov-2011  hannken branches: 1.291.4;
VOP_OPEN() needs a locked vnode. All these copy-and-pasted xxxfs_mount()
implementations need more review.
 1.290 11-Jul-2011  hannken branches: 1.290.2;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.289 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.288 06-Mar-2011  bouyer branches: 1.288.2;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.287 24-Jun-2010  hannken branches: 1.287.2; 1.287.4;
Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.286 02-Mar-2010  pooka branches: 1.286.2;
load lfs syscalls in modload
 1.285 02-Mar-2010  pooka /*
* XXX: Get extra reference to LFS vfsops. This prevents unload,
* but also prevents kernel panic due to text being unloaded
* from below lfs_writerd. When lfs_writerd can exit, remove
* this!!!
*/
 1.284 18-Feb-2010  eeh Fix root filesystem support.
 1.283 16-Feb-2010  mlelstv Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.282 08-Jan-2010  pooka branches: 1.282.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.281 07-Dec-2009  eeh Fix some more hangs and deadlocks.
 1.280 17-Nov-2009  pooka Create unwind log in global variable instead of automatic variable.

memory leak spotted by njoly's valgrind run
 1.279 29-Oct-2009  eeh Fix up numoutput accounting.
 1.278 13-Sep-2009  tsutsui Move declaration of ufs_hashlock into <ufs/ufs_extern.h> from each c source.
 1.277 05-Aug-2009  pooka Compensate v_numoutput & nestbuf for lfs's rather peculiar I/O habits.
 1.276 05-Aug-2009  pooka remember to nestiobuf_done() too
 1.275 05-Aug-2009  pooka Use nestiobuf instead of homerolled equivalent.
 1.274 29-Jun-2009  dholland Convert 67 namei call sites to use namei_simple, in these functions:

check_console, veriexecclose, veriexec_delete, veriexec_file_add,
emul_find_root, coff_load_shlib (sh3 version), coff_load_shlib,
compat_20_sys_statfs, compat_20_netbsd32_statfs,
ELFNAME2(netbsd32,probe_noteless), darwin_sys_statfs,
ibcs2_sys_statfs, ibcs2_sys_statvfs, linux_sys_uselib,
osf1_sys_statfs, sunos_sys_statfs, sunos32_sys_statfs,
ultrix_sys_statfs, do_sys_mount, fss_create_files (3 of 4),
adosfs_mount, cd9660_mount, coda_ioctl, coda_mount, ext2fs_mount,
ffs_mount, filecore_mount, hfs_mount, lfs_mount, msdosfs_mount,
ntfs_mount, sysvbfs_mount, udf_mount, union_mount, sys_chflags,
sys_lchflags, sys_chmod, sys_lchmod, sys_chown, sys_lchown,
sys___posix_chown, sys___posix_lchown, sys_link, do_sys_pstatvfs,
sys_quotactl, sys_revoke, sys_truncate, do_sys_utimes, sys_extattrctl,
sys_extattr_set_file, sys_extattr_set_link, sys_extattr_get_file,
sys_extattr_get_link, sys_extattr_delete_file,
sys_extattr_delete_link, sys_extattr_list_file, sys_extattr_list_link,
sys_setxattr, sys_lsetxattr, sys_getxattr, sys_lgetxattr,
sys_listxattr, sys_llistxattr, sys_removexattr, sys_lremovexattr

All have been scrutinized (several times, in fact) and compile-tested,
but not all have been explicitly tested in action.

XXX: While I haven't (intentionally) changed the use or nonuse of
XXX: TRYEMULROOT in any of these places, I'm not convinced all the
XXX: uses are correct; an audit might be desirable.
 1.273 07-May-2009  elad Use genfs_can_mount().
 1.272 04-Apr-2009  ad Turn up the volume on the warning message a bit.
 1.271 15-Mar-2009  cegger ansify function definitions
 1.270 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.269 13-Nov-2008  ad branches: 1.269.4;
These depend on ffs.
 1.268 13-Nov-2008  ad Remove #ifdef LFS from the ufs code.
 1.267 28-Jun-2008  rumble branches: 1.267.2; 1.267.4; 1.267.6;
Fix lkm fallout from previous sysctl changes. This largely duplicates
sysctl creation code, but lkms are going away soon(ish) anyway.

Spotted by Chris Gilbert.
 1.266 28-Jun-2008  rumble Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.265 24-May-2008  nakayama branches: 1.265.2;
s/log file system/log-structured file system/
 1.264 20-May-2008  ad Don't moan about LFS unless the mount succeeds.
 1.263 18-May-2008  ad Until these get fixed or replaced:

WARNING: the foo file system is experimental and may be unstable
 1.262 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.261 10-May-2008  rumble Convert file systems to dynamically attach with the new module interface.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.

As a consequence, most of the file systems can now be loaded as new style
modules.

Quick sanity check by ad@.
 1.260 06-May-2008  ad branches: 1.260.2;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.259 30-Apr-2008  ad PR kern/38135 vfs_busy/vfs_trybusy confusion

The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
 1.258 29-Apr-2008  ad kern/38135 vfs_busy/vfs_trybusy confusion

The symptom was that file systems would occasionally not appear
in output from 'df' or 'mount' if the system was busy. Resolution:

- Make mount locks work somewhat like vm_map locks.
- vfs_trybusy() now only fails if the mount is gone, or if someone is
unmounting the file system. Simple contention on mnt_lock doesn't
cause it to fail.
- vfs_busy() will wait even if the file system is being unmounted.
 1.257 29-Apr-2008  ad PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.256 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.255 30-Jan-2008  ad branches: 1.255.6; 1.255.8; 1.255.10;
PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
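The reference-counting scheme described in 1.255 can be sketched roughly as follows. This is a user-space illustration only: `struct mount_sketch`, `struct vfsops_sketch` and the helper names are hypothetical stand-ins, not the kernel's actual structures or functions, and C11 atomics stand in for whatever primitives the kernel uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's struct vfsops / struct mount. */
struct vfsops_sketch {
	atomic_uint refcnt;		/* one reference per mount using these ops */
};

struct mount_sketch {
	atomic_uint refcnt;		/* the mount's own ref plus one per vnode */
	struct vfsops_sketch *ops;
	bool destroyed;			/* set when the last reference is dropped */
};

static void
mount_sketch_init(struct mount_sketch *mp, struct vfsops_sketch *ops)
{
	atomic_init(&mp->refcnt, 1);
	mp->ops = ops;
	mp->destroyed = false;
	atomic_fetch_add(&ops->refcnt, 1);	/* the mount references its vfsops */
}

/* Called when a vnode becomes associated with the mount. */
static void
mount_sketch_ref(struct mount_sketch *mp)
{
	atomic_fetch_add(&mp->refcnt, 1);
}

/* Drop one reference; returns true if it was the last one. */
static bool
mount_sketch_rele(struct mount_sketch *mp)
{
	unsigned int old = atomic_fetch_sub(&mp->refcnt, 1);

	assert(old > 0);
	if (old != 1)
		return false;
	atomic_fetch_sub(&mp->ops->refcnt, 1);	/* give back the vfsops ref */
	mp->destroyed = true;
	return true;
}
```

In this sketch a forced unmount only drops the mount's own reference; the structure stays valid until the last vnode that still points at it lets go.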
 1.254 28-Jan-2008  dholland Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.253 24-Jan-2008  ad specfs changes for PR kern/37717 (raidclose() is no longer called on
shutdown). There are still problems with device access and a PR will be
filed.

- Kill checkalias(). Allow multiple vnodes to reference a single device.

- Don't play dangerous tricks with block vnodes to ensure that only one
vnode can describe a block device. Instead, prohibit concurrent opens of
block devices. As a bonus remove the unreliable code that prevents
multiple file system mounts on the same device. It's no longer needed.

- Track opens by vnode and by device. Issue cdev_close() when the last open
goes away, instead of abusing vnode::v_usecount to tell if the device is
open.
 1.252 02-Jan-2008  ad Merge vmlocking2 to head.
 1.251 12-Dec-2007  lukem defflag LFS_KERNEL_RFW (in opt_lfs.h).
Note: lfs_rfw.c doesn't compile if you define the option; locking API fallout?
 1.250 08-Dec-2007  pooka branches: 1.250.2; 1.250.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.249 26-Nov-2007  pooka branches: 1.249.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.248 22-Nov-2007  yamt lfs_mountroot: use vfs_destroy.
 1.247 10-Nov-2007  rmind Use PRI_BIO for kthreads instead of PINOD. Fixes a missed case of priority
inversion, which caused LFS to fire some assertions.

Reported by Kurt Schreiner on <current-users>.
 1.246 10-Oct-2007  ad branches: 1.246.2; 1.246.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.245 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.244 31-Jul-2007  pooka branches: 1.244.2; 1.244.4; 1.244.6; 1.244.8;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.243 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
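A minimal sketch of the convention described in 1.243, using a hypothetical `struct buf_sketch` rather than the kernel's `struct buf`: the driver reports failure by storing an error code in `b_error` and leaves `b_flags` alone.

```c
#include <assert.h>
#include <errno.h>

#define B_BUSY_SKETCH	0x01	/* hypothetical flag, for illustration only */

/* Hypothetical stand-in for the two fields discussed in the log entry. */
struct buf_sketch {
	int b_flags;	/* guarded by locking the driver need not understand */
	int b_error;	/* owned by whoever completes the I/O request */
};

/* Driver side: record the failure in b_error, never touch b_flags. */
static void
driver_sketch_fail(struct buf_sketch *bp, int error)
{
	assert(error != 0);
	bp->b_error = error;
}

/* Consumer side: a non-zero b_error means the request failed. */
static int
buf_sketch_result(const struct buf_sketch *bp)
{
	return bp->b_error;
}
```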
 1.242 26-Jul-2007  pooka Use eopnotsupp() instead of vfs_stdsuspendctl() and retire the latter.
 1.241 23-Jul-2007  ad Work around the ufs_haslock/ufs_ihash_lock deadlock. From a patch
posted by Blair Sadewitz.
 1.240 17-Jul-2007  christos branches: 1.240.2;
Eliminate MFSNAMELEN
 1.239 17-Jul-2007  pooka Make set_statvfs_info() take a parameter for the vfs name instead
of always retrieving it from mp->mnt_op->vfs_name

christos ok
 1.238 12-Jul-2007  dsl Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.237 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.236 30-Jun-2007  pooka Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.235 16-May-2007  perseant Change references to SEGM_W_DIROPS to SEGM_CKP, and replace the logic that
formerly used SEGM_W_DIROPS in lfs_segwrite() appropriately. This prevents
a problem in which processes could get stuck in "buffers" sleep forever.
 1.234 17-Apr-2007  perseant Install a new sysctl, vfs.lfs.ignore_lazy_sync, which causes LFS to ignore
the "smooth" syncer, as if vfs.sync.*delay = 0, but only for LFS. The
default is "on", i.e., ignore lazy sync.

Reduce the amount of polling/busy-waiting done by lfs_putpages(). To
accomplish this, copied genfs_putpages() and modified it to indicate which
page it was that caused it to return with EDEADLK. fsync()/fdatasync()
should no longer ever fail with EAGAIN, and should not consume huge
quantities of cpu.

Also, try to make dirops less likely to be written as the result of a
VOP_PUTPAGES(), while ensuring that they are written regularly.
 1.233 13-Mar-2007  ad Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.232 12-Mar-2007  ad branches: 1.232.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.231 22-Feb-2007  thorpej branches: 1.231.4;
TRUE -> true, FALSE -> false
 1.230 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.229 18-Feb-2007  ad Release ufs_hashlock before calling ungetnewvnode().
 1.228 15-Feb-2007  ad branches: 1.228.2;
Destroy the fraglock on unmount.
 1.227 15-Feb-2007  ad Replace some uses of lockmgr() / simplelocks.
 1.226 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.225 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.224 16-Nov-2006  christos branches: 1.224.2; 1.224.4;
__unused removal on arguments; approved by core.
 1.223 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.222 04-Oct-2006  christos fix empty if
 1.221 28-Sep-2006  perseant Use lockstatus instead of a homebrewed locking system to control
LFCNWRAPSTOP and LFCNWRAPGO.

Be less verbose about the various looping checks: use log() rather than
printf(), and only log anything if we are really looping ("count = 2" is
not an error condition).

Allow dirops sleeping on available space to be interruptible.
 1.220 02-Sep-2006  christos branches: 1.220.2; 1.220.4;
- add missing initializers
- comment out impossible code
 1.219 01-Sep-2006  perseant Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.218 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.217 20-Jul-2006  perseant Separate the (non-working) LFS kernel roll-forward code into its own file,
lfs_rfw.c.
 1.216 13-Jul-2006  martin Fix alignment problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.215 06-Jul-2006  perseant Fix a typo that caused a "multiple free" panic on unmounting a resized lfs.
 1.214 29-Jun-2006  perseant Don't wake up the cleaner if the filesystem is unwrappable, and fix the
compatibility fcntls.

Also includes one-line fixes for an MP locking bug and a zero-length FINFO
problem that manifested during testing.
 1.213 24-May-2006  perseant branches: 1.213.2;
Read the inode version number from a more reliable source, quelling a
diagnostic assertion panic.
 1.212 18-May-2006  perseant branches: 1.212.2;
Break out the finfo array manipulation code into two new functions,
lfs_acquire_finfo() and lfs_release_finfo(). Add a debugging check
for zero-length finfo arrays in the segment summary to avoid future
regressions.
 1.211 18-May-2006  perseant Don't duplicate the LFS_STARVED_FOR_SEGS check (an oversight that came
in with rev 1.210).
 1.210 14-May-2006  elad integrate kauth.
 1.209 12-May-2006  perseant Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.

Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.208 10-May-2006  mrg quell GCC 4.1 uninitialised variable warnings.

XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree..
 1.207 04-May-2006  perseant Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.

Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.206 30-Apr-2006  perseant Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are also included.
 1.205 18-Apr-2006  perseant Don't roll forward if we aren't given a process context. Coverity CID 1076.
 1.204 15-Apr-2006  christos Coverity CID 2499: Fix uninitialized variable use.
 1.203 10-Apr-2006  perseant Remove mostly useless BUFPAGES warning message from lfs_{un,}mount.
 1.202 10-Apr-2006  perseant Optimize the free list search a little more; in particular use words
instead of bytes for the index, and never search below fs->lfs_freehd.

Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).

Free the bitmap when we unmount the filesystem.
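The search described in 1.202 might look something like the sketch below. It assumes a bitmap in which a set bit marks a free inode and a caller-supplied lower bound playing the role of fs->lfs_freehd; the function name, the `INO_NONE` marker and the 32-bit word size are assumptions made for illustration. Indices are unsigned throughout, which sidesteps the kind of signedness assumption the entry mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD	32u
#define INO_NONE	UINT32_MAX	/* hypothetical "nothing found" marker */

/*
 * Return the lowest set bit at index >= start in a bitmap of nbits bits,
 * or INO_NONE if there is none. Words that are entirely zero are skipped
 * with a single comparison instead of being scanned bit by bit.
 */
static uint32_t
bitmap_first_set_from(const uint32_t *map, uint32_t nbits, uint32_t start)
{
	uint32_t nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;
	uint32_t first = start / BITS_PER_WORD;

	assert(map != NULL);
	for (uint32_t w = first; w < nwords; w++) {
		uint32_t word = map[w];

		/* In the first word examined, ignore bits below 'start'. */
		if (w == first)
			word &= UINT32_MAX << (start % BITS_PER_WORD);
		if (word == 0)
			continue;
		for (uint32_t b = 0; b < BITS_PER_WORD; b++) {
			if (word & (UINT32_C(1) << b)) {
				uint32_t idx = w * BITS_PER_WORD + b;
				return idx < nbits ? idx : INO_NONE;
			}
		}
	}
	return INO_NONE;
}
```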
 1.201 10-Apr-2006  perseant Correct a locking bug in the recent pager optimization.
 1.200 08-Apr-2006  perseant Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.199 08-Apr-2006  perseant Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.198 31-Mar-2006  perseant Handle the "filesystem is clean" flag correctly when upgrading from
read-only to read-write mount. This makes "root on lfs" work for me,
although it looks like a different traceback from PR#32667.
 1.197 30-Mar-2006  yamt some cleanups after the introduction of GOP_SIZE_MEM flag.
- remove GOP_SIZE_READ/GOP_SIZE_WRITE flags.
they have not been used since the change.
- ufs_balloc_range: remove code which has been no-op since the change.
thanks Konrad Schroder for explaining the original intention of the code.
- ffs_gop_size: don't extend past eof, in the case of GOP_SIZE_MEM.
otherwise genfs_getpages ends up allocating pages past eof unnecessarily.
 1.196 28-Mar-2006  perseant Double-checkpoint on unmount. This ensures that vnodes belonging to removed
files are really freed, preventing occasional spurious EBUSY returns from
vflush().
 1.195 24-Mar-2006  perseant Improvements to LFS's paging mechanism, to wit:

* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.

* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.194 17-Mar-2006  tls From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.193 21-Feb-2006  thorpej branches: 1.193.2; 1.193.4; 1.193.6;
Use device_class() instead of accessing dv_class directly.
 1.192 14-Jan-2006  yamt branches: 1.192.2; 1.192.4;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.191 04-Jan-2006  yamt - add simple functions to allocate/free a buffer for i/o.
- make bufpool static.
 1.190 11-Dec-2005  christos branches: 1.190.2;
merge ktrace-lwp.
 1.189 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.188 27-Sep-2005  yamt branches: 1.188.2;
introduce "ufs_ops" and use it for ITIMES.
 1.187 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.186 23-Aug-2005  christos Don't overload MAXNAMLEN, use a separate constant for each filesystem type.
 1.185 19-Aug-2005  christos 64 bit inode changes.
 1.184 23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.183 28-Jun-2005  yamt branches: 1.183.2;
- constify genfs_ops.
- use member designators.
 1.182 09-Jun-2005  atatat Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.
 1.181 29-May-2005  christos - sprinkle const
- avoid shadow variables.
 1.180 20-May-2005  perseant Keep track of the number of segments reclaimed, since the cleaner doesn't
do this anymore (it hasn't for quite some time). Add a couple of conditional
debugging messages to indicate why segments are not cleaned, in the event
that lfs_segclean is used.

Make the LFCNSEGWAITALL fcntl work again.
 1.179 20-May-2005  perseant Fill in the lfs_fsmnt field in the superblock when we mount the filesystem,
so fsck(8) can tell where it was last mounted.
 1.178 04-May-2005  perseant Don't let the pager_map deadlock avoidance code in lfs_putpages() write
segments containing zero-block FINFO records. These records cause segments
to become uncleanable, which would eventually result in a "no clean segments"
panic.
 1.177 23-Apr-2005  perseant Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing. Tested in both
directions, and everything appears to work happily, but ymmv.
 1.176 19-Apr-2005  perseant Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.175 16-Apr-2005  perseant Remove left-over reference to "lfs_blist", for _LKM case.
 1.174 16-Apr-2005  perseant Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
 1.173 14-Apr-2005  perseant Consolidate the hash table we use to maintain the integrity of lfs_avail
into a single, system-wide table, rather than having a separate hash table
per inode. Significantly reduces the "system" cpu usage of your average
file write.
 1.172 14-Apr-2005  perseant Keep track of the highest block held by an LFS inode, so that we can
be assured that the last byte of a file is always allocated. Previously
a file extension could cause the filesystem to be flushed, writing an
inconsistent inode to disk. Although this condition would be corrected
the next time blocks were written to disk, an intervening crash would leave
the filesystem in an inconsistent state, leaving fsck_lfs to complain
of an inode "partially truncated".
 1.171 08-Apr-2005  perseant Clean up the handling of the pager_map deadlock in lfs_putpages, after
realizing that it is safe to sleep the second time through the loop.
 1.170 06-Apr-2005  perseant Fix some locking issues that appeared with the simple_lock work.
Address a "pager_map" deadlock in lfs_putpages().
 1.169 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.168 29-Mar-2005  thorpej - Define a VFS_ATTACH() macro that places a reference to a vfsops structure
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
 1.167 08-Mar-2005  simonb branches: 1.167.2;
Tab Police.
 1.166 08-Mar-2005  perseant Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.165 04-Mar-2005  perseant Move "ifile is too large for your NBUFS/BUFPAGES" messages into a function.
Use log(9) to warn the user instead of printf(9). Since the theory is that
the Ifile is "always in cache", but the greater performance risk is
when the inode entries can't be held in cache, note these two cases
separately, at different log levels (notice and warning, respectively).
 1.164 26-Feb-2005  perry nuke trailing whitespace
 1.163 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
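The bfree item in 1.163 (a signed free-block count that must never be reported as negative or as larger than the data size) amounts to clamping before the value reaches an unsigned field. The helper below is a hypothetical sketch, not the code in lfs_statvfs(); a negative count stored unclamped in an unsigned field would show up as a very large number, which is the kind of misreport the entry describes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp a signed free-block count into [0, dsize] before it is stored
 * in an unsigned statistics field. Hypothetical helper for illustration.
 */
static uint64_t
clamp_bfree(int64_t bfree, int64_t dsize)
{
	assert(dsize >= 0);
	if (bfree < 0)
		return 0;
	if (bfree > dsize)
		return (uint64_t)dsize;
	return (uint64_t)bfree;
}
```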
 1.162 11-Jan-2005  mycroft branches: 1.162.2; 1.162.4;
Rearrange some code slightly to avoid uninitialized variable warnings.
 1.161 09-Jan-2005  mycroft Whoops -- move the location of the VOP_OPEN()/VOP_CLOSE(), et al, from
foo_mountfs() to foo_mount(), to match the new mountroot API.
Also, for ext2fs and lfs, copy some restructuring from ffs to allow changing
file system parameters without specifying the device name.
(ntfs could use some more work.)
 1.160 09-Jan-2005  mycroft Rework the mountroot interface so that vfs_mountroot() opens the root device
and just passes it on to the file system functions. This avoids opening and
closing the device several times.

Mentioned on tech-kern some time ago, IIRC. I've been running this for a
long time.
 1.159 02-Jan-2005  thorpej Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
 1.158 16-Aug-2004  mycroft Make sure to set IMNT_DTYPE here...
 1.157 15-Aug-2004  mycroft Need to set um_dirblksiz here...
 1.156 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.155 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.154 05-Jul-2004  pk Call inittodr() from main(). Let file system code set the recorded `last
update' time (if any) through the new function setrootfstime().
 1.153 30-May-2004  yamt lfs_gop_write: assert that ifile never comes here.
 1.152 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.151 25-May-2004  atatat Sysctl descriptions under vfs subtree
 1.150 20-May-2004  atatat Explicitly call pool_init() (and pool_destroy()) when being built as
an _LKM.

This adds pools to the list of things that lkms must do manually
because they're set up with link sets. Not that there's anything
wrong with link sets, but that we need to try harder to remember that
lkms are second class citizens. Of a sort.
 1.149 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.148 22-Apr-2004  yamt lfs_statvfs: report f_frsize correctly.
 1.147 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.146 27-Mar-2004  atatat branches: 1.146.2;
Manually attach malloc types when being built as an lkm.
 1.145 24-Mar-2004  atatat Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.144 26-Feb-2004  oster Add a missing:

pool_destroy(&lfs_dinode_pool);

to lfs_done().

Approved-by: yamt
 1.143 28-Jan-2004  he Let the cast to (long long) for using the result as a printf argument
apply to the whole expression, not just the first factor.
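The point of rev 1.143 can be illustrated with a small userland sketch. The helper name and operand types below are hypothetical and are not taken from the LFS source; the sketch only shows a cast wrapping the whole product so that the argument type matches "%lld".

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the LFS source): format the product of
 * two counters.  A cast binds tighter than '*', so writing
 * (long long)blocks * bsize would convert only `blocks`, and the type
 * of the product would then depend on the type of `bsize`.  Wrapping
 * the whole expression guarantees a long long argument for "%lld".
 */
int
format_product(char *buf, size_t len, unsigned int blocks, unsigned long bsize)
{
	return snprintf(buf, len, "%lld", (long long)(blocks * bsize));
}
```

Note that the multiplication itself is still carried out in the operands' own type; the cast only fixes the type handed to the format function.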
 1.142 28-Jan-2004  yamt use bufmem instead of bufpages to make lfs a little less broken.
 1.141 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.140 07-Nov-2003  yamt - tweak lfs_update_single()'s prototype so that it can be used by
roll-forward code.
- reduce code duplication using the above in update_meta()
this also fixes fragment accounting.
 1.139 07-Nov-2003  yamt fix spec vnode aliasing.
 1.138 07-Nov-2003  yamt - tell filesize changes to vm when roll-forwarding data blocks.
- handle fragment extension better during roll-forward.
- related assertions.
 1.137 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.136 25-Oct-2003  christos Fix uninitialized variable warnings.
 1.135 14-Oct-2003  dbj add mnt_iflag field to struct mount for internal flags
mv MNT_GONE, MNT_UNMOUNT and MNT_WANTRDWR to this field
additionally add mnt_writeopcountupper and mnt_writeopcountlower fields
in preparation for pending write suspension support work
bump kernel version to 1.6ZD
 1.134 14-Oct-2003  yamt add a prototype of check_segsum().
 1.133 14-Oct-2003  yamt when roll-forwarding, check segment serial numbers correctly.
 1.132 14-Oct-2003  yamt add a missing fsbtodb() to read a correct block for roll-forwarding.
 1.131 07-Sep-2003  yamt comments on lfs_issequential_hole.
 1.130 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.129 23-Jul-2003  yamt add parenthesis missed in rev.1.127.
 1.128 23-Jul-2003  yamt whitespace
 1.127 23-Jul-2003  yamt add KASSERTs in lfs_issequential_hole.
 1.126 12-Jul-2003  yamt more MP locks.
 1.125 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.124 02-Jul-2003  yamt use queue.h macros.
 1.123 02-Jul-2003  yamt use VFSTOUFS macro.
 1.122 02-Jul-2003  yamt - add new functions, lfs_writer_enter/leave, and use them instead of
duplicated code fragments.
- add an assertion.
 1.121 29-Jun-2003  fvdl branches: 1.121.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.120 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.119 28-Jun-2003  bouyer Adapt for struct proc* -> struct lwp* changes.
 1.118 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.117 18-May-2003  yamt make is_sequential a callback in order to achieve better lfs write clustering.

since lfs always rewrites blocks into the new segment,
the current on-disk location of a block doesn't affect write clustering.

ok'ed by Konrad Schroder.
 1.116 29-Apr-2003  perseant Restrict the run of cluster blocks to on-disk contiguous blocks (back out
part of rev 1.115), to avoid writing over holes. This is the lesser of two
evils, to be replaced soon.
 1.115 23-Apr-2003  perseant Make LFS work better (though still not "well") as an NFS-exported
filesystem (and other things that needed to be fixed before the tests
would complete), to wit:

* Include the fs ident in the filehandle; improve stale filehandle checks.

* Change definition of blksize() to use the on-dinode size instead of
the inode's i_size, so that fsck_lfs will work properly again.

* Use b_interlock in lfs_vtruncbuf.

* Postpone dirop reclamation until after the seglock has been released,
so that lfs_truncate is not called with the segment lock held.

* Don't loop in lfs_fsync(), just write everything and wait.

* Be more careful about the interlock/uobjlock in lfs_putpages: when we
lose this lock, we have to resynchronize dirtiness of pages in each
block.

* Be sure to always write indirect blocks and update metadata in
lfs_putpages; fixes a bug that caused blocks to be accounted to the
wrong segment.
 1.114 16-Apr-2003  christos PR/1796: John Kohl: statfs misbehaves under chrooted environments.

- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While I was there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
 1.113 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.112 28-Mar-2003  perseant Add a sleeper count, to prevent the cleaner from panicking the kernel
when the filesystem is unmounted, relocking the Ifile when its lock is
draining. (We can't use vfs_busy() since the process is sleeping for a
good long time.) Clean up / organize lfs.h, while I'm here.

In lfs_update_single, assert that disk addresses are either negative, or
are still positive when converted to int32_t, to prevent recurrence of a
negative/positive block problem.
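The assertion described for lfs_update_single can be sketched in userland C as a range check. This is one interpretation under stated assumptions: the function name is hypothetical, zero is treated as acceptable, and the non-negative case is written as an explicit upper bound rather than a narrowing cast, so the exact kernel assertion may differ.

```c
#include <stdint.h>

/*
 * Hypothetical check (not the kernel's code): a 64-bit in-core disk
 * address may be stored in a 32-bit on-disk field if it is negative,
 * or if it is non-negative and still non-negative after narrowing,
 * i.e. no larger than INT32_MAX.
 */
int
daddr_fits_ondisk(int64_t daddr)
{
	return daddr < 0 || daddr <= INT32_MAX;
}
```

Comparing against INT32_MAX avoids relying on the implementation-defined result of converting an out-of-range value to int32_t.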
 1.111 21-Mar-2003  dsl Use 'void *' instead of 'caddr_t' in prototypes of VOP_IOCTL, VOP_FCNTL
and VOP_ADVLOCK, delete casts from callers (and some to copyin/out).
 1.110 21-Mar-2003  perseant KNF (space after keywords).
 1.109 21-Mar-2003  perseant Use VONWORKLST as a heuristic for vnode emptiness, rather than exhaustively
checking the memq.

Take greater care not to dirty the Ifile vnode when unmounting the filesystem.
This should fix a "(vp->v_flag & VONWORKLST) == 0" assertion panic in vgonel
that could occur when unmounting.

Do not allow the Ifile to be mapped for writing.
 1.108 21-Mar-2003  yamt make this compilable with DIAGNOSTIC and without DEBUG.
fix PR 20827 from FUKAUMI Naoki.
 1.107 20-Mar-2003  yamt fix "more than one fragment" panics;
direct and indirect block pointers are not valid in the case of shortlinks.
while i'm here, move duplicated code in lfs_vget/fastvget into a new
function, lfs_vinit.
 1.106 18-Mar-2003  perseant Remember to destroy lfs_inoext_pool when closing up the LFS subsystem.
 1.105 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.104 08-Mar-2003  perseant Take away "#ifdef LFS_UBC".
 1.103 08-Mar-2003  perseant Add an lfs_strategy() that checks to make sure we're not trying to read
where the cleaner is trying to write, instead of tying up the "live"
buffers (or pages).

Fix a bug in the LFS_UBC case where oversized buffers would not be
checksummed correctly, causing uncleanable segments.

Make sure that wakeup(fs->lfs_iocount) is done if fs->lfs_iocount is 1
as well as 0, since we wait in some places for it to drop to 1.

Activate all pages that make it into lfs_gop_write without the segment
lock held, since they must have been dirtied very recently, even if
PG_DELWRI is not set.
 1.102 02-Mar-2003  perseant Account SEGUSE_ACTIVE correctly so that the automatic segment cleaning
actually happens.

Add a new fcntl call that will write the minimum necessary to checkpoint
(i.e., for on-disk directory structure to be consistent, not including
updates to file data) so that the cleaner can clean segments more quickly
without sacrificing three-way commit for cleaning.
 1.101 01-Mar-2003  yamt use pid_t for pid.
 1.100 01-Mar-2003  perseant Be careful to always zero pages on truncation/fragment extension,
in the case where the filesystem block size is larger than PAGE_SIZE.
 1.99 25-Feb-2003  thorpej Add a new BUF_INIT() macro which initializes b_dep and b_interlock, and
use it. This fixes a few places where either b_dep or b_interlock were
not properly initialized.
 1.98 25-Feb-2003  yamt fix simplelocks
 1.97 23-Feb-2003  perseant Fix a buffer overflow bug in the LFS_UBC case that manifested itself
either as a mysterious UVM error or as "panic: dirty bufs". Verify
maximum size in lfs_malloc.

Teach lfs_updatemeta and lfs_shellsort about oversized cluster blocks from
lfs_gop_write.

When unwiring pages in lfs_gop_write, deactivate them, under the theory
that the pagedaemon wanted to free them last we knew.
 1.96 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.95 19-Feb-2003  yamt workaround for "another flush is..." infinite loop in writerd.
if we're writerd, sleep in lfs_flush until another writer goes away
instead of busy-looping in writerd.
 1.94 19-Feb-2003  yamt wire the pages instead of just dequeue'ing them.
advised by Chuck Silvers.
 1.93 19-Feb-2003  yamt init b_interlock.
 1.92 19-Feb-2003  yamt init b_interlock.
 1.91 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.90 29-Jan-2003  yamt don't use daddr_t for segment summary since it's an on-disk structure.
 1.89 27-Jan-2003  yamt make these compilable with lfs debug options.
(follow daddr_t change)

XXX maybe segment number should be 64bit.
 1.88 25-Jan-2003  kleink Fix further printf format warnings for DEBUG, in the wake of daddr_t
having changed.
 1.87 25-Jan-2003  tron Use PRId64 instead of hard coding "%lld" to fix build problems under
LP64 ports.
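As an illustration of the fix in rev 1.87, the sketch below formats a 64-bit value with the PRId64 macro from <inttypes.h> instead of a hard-coded "%lld". The helper name is hypothetical, and int64_t stands in for the widened daddr_t.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical helper: print a 64-bit block address portably.
 * PRId64 expands to the correct conversion specifier for int64_t on
 * both ILP32 and LP64 platforms, whereas "%lld" is only correct where
 * int64_t happens to be long long.
 */
int
format_daddr(char *buf, size_t len, int64_t daddr)
{
	return snprintf(buf, len, "%" PRId64, daddr);
}
```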
 1.86 25-Jan-2003  tron Fix printf() format strings problems caused by "daddr_t" change.
 1.85 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.84 12-Jan-2003  yamt - zerofill struct lfs when allocating it.
- use M_ZERO instead of memset after malloc.
 1.83 24-Nov-2002  yamt lfs_sync should wait at lfs_writer, not lfs_dirops.
PR 18973.
 1.82 27-Sep-2002  provos remove trailing \n in panic(). approved by perry.
 1.81 21-Sep-2002  christos MNT_GETARGS support
 1.80 06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as constant structures in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in the port-dependent majors.<arch>
by using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- Backward compatibility for loading block/character device
switches through the LKM framework is broken. This is necessary to convert
from block/character device major to device name at runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- At compile time, the device major number list is packed into the kernel and
the LKM framework will refer to it to assign device major numbers dynamically.
 1.79 30-Jul-2002  soren Die, qaddr_t, die! - mnt_data in struct mount is already effectively
a void *, so stop pretending otherwise.
 1.78 06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.77 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.76 17-May-2002  perseant branches: 1.76.2;
use macros from <sys/queue.h>
 1.75 16-May-2002  thorpej Fix LP64 printf format warning.
 1.74 14-May-2002  perseant branches: 1.74.2;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.73 12-May-2002  matt Eliminate commons.
 1.72 08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.71 18-Dec-2001  chs use the new compatibility routines to allow mmap() to work
(in the same non-coherent fashion that it worked pre-UBC)
until someone has time to do it the right way.
 1.70 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.69 08-Nov-2001  lukem add RCSID
 1.68 15-Sep-2001  chs branches: 1.68.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.67 15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.66 13-Jul-2001  perseant branches: 1.66.2;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.65 30-May-2001  mrg branches: 1.65.2; 1.65.4;
use _KERNEL_OPT
 1.64 26-Jan-2001  itohy branches: 1.64.2;
Call inittodr() from lfs_mountroot() so that the system time is set properly
when booted from LFS.
 1.63 22-Jan-2001  jdolecek make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.62 03-Dec-2000  perseant Call uvm_vnp_setsize() in lfs_{fast,}vget to set initial vnode size.
 1.61 03-Dec-2000  chs don't forget to set um_lognindir (now required by ufs_bmaparray()).
 1.60 27-Nov-2000  perseant If LFS_DO_ROLLFORWARD is defined, roll forward from the older checkpoint
on mount, through the newer checkpoint and on through any newer
partial-segments that may have been written but not checkpointed because
of an intervening crash.

LFS_DO_ROLLFORWARD is not defined by default.
 1.59 14-Nov-2000  perseant Initialize the cleaner information in the Ifile from the same info from
the superblock at fs mount time, enabling the previous patch to fsck_lfs.
Patch from Jesse Off <joff@gci-net.com> (Closes PR #11470).
 1.58 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().
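The lfs_ravail reservation scheme described in the first kernel item above can be sketched as a pair of counters. This is a minimal userland model, assuming hypothetical names (struct space_sketch, space_reserve, space_commit). The real code sleeps until the cleaner frees space, where this sketch simply reports failure.

```c
/*
 * Hypothetical model of block reservation (not the LFS code).
 * avail:  blocks currently free for writing
 * ravail: blocks promised to writers that have not allocated yet
 */
struct space_sketch {
	long avail;
	long ravail;
};

/*
 * Try to reserve n blocks.  Returns 1 on success.  Returns 0 when the
 * unreserved space is too small; a kernel implementation would unlock
 * the inode and sleep here until the cleaner made room.
 */
int
space_reserve(struct space_sketch *s, long n)
{
	if (s->avail - s->ravail < n)
		return 0;
	s->ravail += n;
	return 1;
}

/*
 * After the write has allocated `used` blocks out of a reservation of
 * `reserved` blocks, drop the reservation and charge the real usage.
 */
void
space_commit(struct space_sketch *s, long reserved, long used)
{
	s->ravail -= reserved;
	s->avail -= used;
}
```

Reserving the maximum up front and charging actual usage afterwards keeps a large write from consuming the space the cleaner itself needs in order to make progress.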

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this is unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.57 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero at fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.56 03-Jul-2000  perseant Allow the number of free segments reserved for the cleaner to be
parametrized in the filesystem, defaulting to MIN_FREE_SEGS = 2 but set
to something more reasonable at newfs_lfs time.

Note the number of blocks that have been scheduled for writing but which
are not yet on disk in an inode extension, i_lfs_effnblks. Move
i_ffs_effnlink out of the ffs extension and onto the main inode, since
it's used all over the shared code and the lfs extension would clobber
it.

At inode write time, indirect blocks and inode-held blocks of inodes
that have i_lfs_effnblks != i_ffs_blocks are cleansed of UNWRITTEN disk
addresses, so that these never make it to disk.
 1.55 30-Jun-2000  fvdl Rearrange code around getnewvnode as was already done for ffs, to avoid
locking against oneself because getnewvnode recycles a softdep-using vnode.
 1.54 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.53 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.52 27-May-2000  perseant branches: 1.52.4;
Prevent dirops from getting around lfs_check and wedging the buffer cache.
All the dirop vnops now mark the inodes with a new flag, IN_ADIROP, which
is removed as soon as the dirop is done (as opposed to VDIROP which stays
until the file is written). To address one issue raised in PR#9357.
 1.51 19-May-2000  thorpej NULL != 0
 1.50 29-Apr-2000  perseant Test whether the filesystem is an LFS before trying to read the alternate
superblock (whose disk address is stored in the primary superblock). Also,
refuse to mount a filesystem whose superblocks overlap or where the alt.
superblock has a lower disk address than the primary superblock.

Solves PR#10001.
 1.49 23-Apr-2000  perseant Fix problems outlined in PR#9926:
- lfs_truncate extends the file if called with length > i_ffs_size;
- lfs_truncate errors out if called with length < 0;
- lfs_balloc block accounting corrected for the case of blocks read
into the cache before they exist on disk;
- mp->mnt_stat.f_iosize is initialized in lfs_mountfs.
 1.48 30-Mar-2000  augustss Remove register declarations.
 1.47 16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
 1.46 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.45 21-Nov-1999  perseant Initialize i_ffs_effnlink, so every file doesn't look like it's already been
deleted for the purpose of dirops (particularly create and mkdir). Addresses
PR#8815.
 1.44 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.43 12-Nov-1999  perseant Back out my patch of the 8th (to address unreferenced inode problem).
Apparently this needs more thought.
 1.42 09-Nov-1999  perseant If ifile blocks were written before dirops were complete, and then the
system crashed, inodes could be allocated that were not referenced. (Though
not a serious problem, it evidences itself in phase 4 of fsck_lfs.) Fix
this by marking if_daddr with UNASSIGNED before the inodes are actually
written; at mount time the ifile is checked for UNASSIGNED entries and
any that are found are linked back into the free list. (The latter
functionality should move into the roll-forward agent when it materializes.)
 1.41 06-Nov-1999  perseant branches: 1.41.2;
Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.40 20-Oct-1999  enami Check if the type of device node isn't VBAD before touching v_specinfo. If
the device vnode is revoked, the field is NULL and touching it causes null
pointer dereference.
 1.39 18-Oct-1999  wrstuden branches: 1.39.2; 1.39.4;
Catch a few cases missed earlier where we need to lock the vnode before
calling VOP_CLOSE().
 1.38 08-Sep-1999  augustss branches: 1.38.2;
Add #include <sys/device.h> so this file compiles again.
 1.37 08-Sep-1999  sommerfeld Avoid dereferencing NULL rootvp if booting diskless.
 1.36 03-Sep-1999  perseant Make changes that will allow an LFS filesystem to be used as the root
filesystem. In particular,

- Fix mknod deadlock, described in PR 8172.
- Enable lfs_mountroot.
- Make lfs_writevnodes treat filesystems mounted on lfs device nodes properly,
by flushing that device rather than trying to add blocks to the device inode.

This, in combination with lfs boot blocks, will allow operation of an all-lfs
system.
 1.35 17-Jul-1999  wrstuden Adjust mountroot routines to vrele rootvp in case of mount error. Closes
PR 7977 by Neil Carson, <neil@brini.com>.
 1.34 01-Jun-1999  perseant Fixed lfs_update (and related functions) so that calls from lfs_fsync
will DTRT with vnodes marked VDIROP. In particular, the message
"flushing VDIROP" will no longer appear, and the filesystem will remain
stable in the event of a crash.

This was particularly a problem with NFS-exported LFSes, since fsync
was called on every file close.
 1.33 04-May-1999  scottr Include opt_ddb.h so we will get the Debugger() prototype.
 1.32 12-Apr-1999  perseant Check the superblock version field, and refuse to mount the filesystem
if the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and boundaries,
etc., which would not affect existing fields in the superblock, but which
would drastically affect the filesystem, to be smoothly integrated at a
later date.
 1.31 11-Apr-1999  perseant Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).
 1.30 11-Apr-1999  perseant Mark the current segment with SEGUSE_ACTIVE at mount time, rather than waiting
for the first write. If this is not done, the cleaner may try to clean the
current segment out from under the writer if the filesystem is mounted after
a crash (or any other time that the dirty:clean segment ratio is high enough).
 1.29 04-Apr-1999  mycroft Fix obvious bugs:
* The MNT_UPDATE case had a null pointer dereference. (This is a good example
of why blindly adding bogus initializers is a FUNDAMENTALLY BAD IDEA!)
* Make sure the whole ufsmount is zeroed, as the export code relies on this.
* If we decided to use the second/alternate superblock, make sure to copy the
in-core version from the right buffer.
Also, reenable NFS exporting.
 1.28 25-Mar-1999  perseant branches: 1.28.2;
clean up unused/required #ifdefs
 1.27 24-Mar-1999  tron Don't include "opt_uvm.h" any more.
 1.26 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.25 26-Feb-1999  wrstuden Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.24 11-Sep-1998  pk PR#6032: define fixed sized on-disk superblock structure.
 1.23 01-Sep-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for LFS inodes.
 1.22 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.21 22-Jun-1998  sommerfe defopt for options FIFO
 1.20 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.19 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.18 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time; other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.17 01-Mar-1998  fvdl Remove accidentally enabled lfs_mountroot from vfsops struct.
 1.16 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.15 18-Feb-1998  thorpej Place a pointer to an array of our vnodeopv_desc *'s in our vfsops
structure, for use by vfs_attach().
 1.14 16-Oct-1997  mjacob In calculating the f_bavail field, don't take 32 bit quantities and
multiply them by 90 (to be divided by 100) and expect them to be sane
for very large values (I was getting a negative 'avail' count).
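The overflow described in the 1.14 entry can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the function names bavail_32 and bavail_64 are hypothetical, and the sketch only shows why a 32-bit product of a large block count and 90 can come out negative while a 64-bit intermediate does not.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; these are not NetBSD functions. */

/* 90% of nblocks with the product held in 32 bits: it wraps for large inputs. */
int32_t
bavail_32(int32_t nblocks)
{
	/* Multiply as unsigned so the wraparound itself is well defined in C. */
	uint32_t prod = (uint32_t)nblocks * 90u;

	/* Reinterpreting the wrapped product as signed yields a negative count
	 * on the usual two's-complement platforms. */
	return (int32_t)prod / 100;
}

/* Same computation with a 64-bit intermediate: no wraparound. */
int64_t
bavail_64(int32_t nblocks)
{
	return (int64_t)nblocks * 90 / 100;
}
```

For small block counts both helpers agree; they diverge once nblocks * 90 no longer fits in 31 bits, which is the "negative avail" symptom the entry reports.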
 1.13 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
    union {
        struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
        struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
    } i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.12 22-Dec-1996  cgd Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.11 25-Mar-1996  pk Appease gcc: unused variables if !QUOTA
 1.10 09-Feb-1996  christos lfs prototypes
 1.9 18-Jun-1995  cgd don't assume the f_fsnamelen is nul-truncated or longer than MFSNAMELEN
 1.8 09-Mar-1995  mycroft copy*str() should use size_t.
 1.7 08-Mar-1995  cgd size for copyinstr should be u_long
 1.6 18-Jan-1995  mycroft Clean up the code to frob mnt_stat a bit.
 1.5 18-Jan-1995  mycroft Turn mountlist into a CIRCLEQ, and handle setting and checking of MNT_ROOTFS
differently.
 1.4 15-Dec-1994  mycroft Call foo_statfs() from a common place when mounting.
 1.3 14-Dec-1994  mycroft Sync with CSRG.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.28.2.9 05-May-2000  he Pull up revision 1.50 (requested by perseant):
Sanity check the superblock before trying to use it to find the
alt superblock; sanity check the disk address of the alt superblock
to avoid deadlocking when trying to read it with the primary
superblock buffer still busy. Fixes PR#10001.
 1.28.2.8 29-Mar-2000  he Pull up revision 1.38 (requested by simonb):
Prevent lfs_mountroot() from attempting to use a network device
as root.
(This revision is needed on some NetBSD platforms.)
 1.28.2.7 29-Mar-2000  he Pull up revision 1.37 (requested by pk):
Prevent lfs_mountroot() from attempting to use a network device
as root.
 1.28.2.6 20-Jan-2000  he Pull up revision 1.46 (via patch, requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.28.2.5 15-Jan-2000  he Pull up revision 1.36 (requested by perseant):
Address problems related to using an LFS filesystem as the root
filesystem, including mknod hangs. Fixes PR#8172 and PR#9072.
 1.28.2.4 17-Dec-1999  he Pull up revision 1.41 (requested by perseant):
Address locking protocol error for inode hash, and make the
maximum number of active dirops a global quantity.
 1.28.2.3 17-Dec-1999  he Pull up revision 1.34 (via patch, requested by perseant):
Avoid flushing vnodes involved in a dirop, making lfs' promise
of "no fsck needed, even in the event of a crash" closer to
reality.
 1.28.2.2 19-Oct-1999  he Pull up revision 1.39 (requested by wrstuden):
Catch a few cases missed earlier where we need to lock the vnode before
calling VOP_CLOSE().
 1.28.2.1 13-Apr-1999  perseant branches: 1.28.2.1.2;
Pull-up of changes made to the trunk on Sunday [1.30->1.32], to wit:

Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.

Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).

Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed, then
immediately reloaded, their dinodes were located in an inode block which
was not on disk at the advertized location, nor in the cache (although it
would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.

Better checking for held inode locks in lfs_fastvget, for a number of
error conditions. Also change the default setting of lfs_clean_vnhead to
0, which seems to make the locking problems go away (although this is
difficult to test as I can't reliably reproduce them).

Make sure that the wakeup occurs for vnodes that lfs_update might be
sleeping on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which
have dirty buffers), by marking them with the appropriate flag if
dirtybuffers were added while the write was in progress.

Fix block counting during file truncation, if not truncating to zero.

Disallow threshold-initiated cache flush when dirops are active. Also,
make SET_ENDOP use lfs_check instead of inlining most of it.

Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).

Check the superblock version field, and refuse to mount the filesystem if
the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and
boundaries, etc., which would not affect existing fields in the
superblock, but which would drastically affect the filesystem, to be
smoothly integrated at a later date.
 1.28.2.1.2.3 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
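The change from (foo == (daddr_t)-1) to (foo < 0) in the last item can be sketched like this. The type and macro names below are hypothetical stand-ins chosen for the illustration (the entry itself only gives the value -2 and the word "unwritten"); the point is that a single sign test covers every negative marker value.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's disk address type. */
typedef int32_t sketch_daddr_t;

/* Hypothetical names for the two negative marker values. */
#define SKETCH_NO_BLOCK   ((sketch_daddr_t)-1)	/* no block at this position */
#define SKETCH_UNWRITTEN  ((sketch_daddr_t)-2)	/* accounted for, no disk location yet */

/* Returns 1 when the address names a real disk location, 0 otherwise. */
int
sketch_has_disk_location(sketch_daddr_t daddr)
{
	/* Comparing against -1 alone would treat -2 as a valid address;
	 * testing the sign rejects both markers. */
	return daddr >= 0;
}
```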
 1.28.2.1.2.2 02-Aug-1999  thorpej Update from trunk.
 1.28.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.38.2.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.38.2.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.39.4.3 15-Nov-1999  fvdl Sync with -current
 1.39.4.2 03-Nov-1999  fvdl Give ufs_ihashget an extra argument: the flags passed to vget() for
locking. This way we can avoid locking against ourselves when
ufs_ihashget is called during the flushing of metadata. XXX

Also, comment out a VOP_FSYNC call that I think is now unneeded, and
put a diagnostic printf there to check if this still happens.
 1.39.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.39.2.5 11-Feb-2001  bouyer Sync with HEAD.
 1.39.2.4 08-Dec-2000  bouyer Sync with HEAD.
 1.39.2.3 22-Nov-2000  bouyer Sync with HEAD.
 1.39.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.39.2.1 20-Oct-1999  thorpej Sync w/ trunk.
 1.41.2.2 06-Nov-1999  perseant Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.41.2.1 06-Nov-1999  perseant file lfs_vfsops.c was added on branch comdex-fall-1999 on 1999-11-06 20:33:07 +0000
 1.52.4.3 03-Feb-2001  he Pull up revision 1.59 (requested by perseant):
o Initialize cleaner info from superblock, making fsck_lfs'
accounting of lfs_nclean work.
 1.52.4.2 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.52.4.1 03-Jul-2000  fvdl pullup the fixes from the trunk to not hold ufs_hashlock across
getnewvnode()
 1.64.2.15 15-Jan-2003  thorpej Sync with HEAD.
 1.64.2.14 11-Dec-2002  thorpej Sync with HEAD.
 1.64.2.13 18-Oct-2002  nathanw Catch up to -current.
 1.64.2.12 17-Sep-2002  nathanw Catch up to -current.
 1.64.2.11 01-Aug-2002  nathanw Catch up to -current.
 1.64.2.10 15-Jul-2002  nathanw Whitespace.
 1.64.2.9 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
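A minimal sketch of the distinction drawn in the 1.64.2.9 entry between referencing and dereferencing curproc. The structures here are reduced, hypothetical stubs (the real struct proc and struct lwp have many more members), and curlwp is modeled as a plain global pointer for the purpose of the example.

```c
#include <stddef.h>

/* Reduced stub types for illustration only. */
struct proc { int p_pid; };
struct lwp  { struct proc *l_proc; };

/* Modeled as a plain global here; NULL means "no current LWP". */
struct lwp *curlwp = NULL;

/* Evaluating curproc is always safe: it yields NULL when curlwp is NULL. */
#define curproc ((curlwp) ? (curlwp)->l_proc : NULL)

/* Callers still have to check the result before dereferencing it. */
int
sketch_current_pid(void)
{
	struct proc *p = curproc;

	return (p != NULL) ? p->p_pid : -1;
}
```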
 1.64.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.64.2.7 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.64.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.64.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.64.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.64.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.64.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.64.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.65.4.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.65.4.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.65.4.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.65.4.3 16-Mar-2002  jdolecek Catch up with -current.
 1.65.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.65.4.1 03-Aug-2001  lukem update to -current
 1.65.2.4 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.65.2.3 29-Jun-2001  perseant fix comment in light of roll_id
 1.65.2.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.65.2.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.66.2.3 01-Oct-2001  fvdl Catch up with -current.
 1.66.2.2 26-Sep-2001  fvdl * add a VCLONED vnode flag that indicates a vnode representing a cloned
device.
* rename REVOKEALL to REVOKEALIAS, and add a REVOKECLONE flag, to pass
to VOP_REVOKE
* the revoke system call will revoke all aliases, as before, but not the
clones
* vdevgone is called when detaching a device, so make it use REVOKECLONE
to get rid of all clones as well
* clean up all uses of VOP_OPEN wrt. locking.
* add a few VOPS to spec_vnops that need to do something when it's a
clone vnode (access and getattr)
* add a copy of the vnode vattr structure of the original 'master' vnode
to the specinfo of a cloned vnode. could possibly redirect getattr to
the 'master' vnode, but this has issues with revoke
* add a vdev_reassignvp function that disassociates a vnode from its
original device, and reassociates it with the specified dev_t. to be
used by cloning devices only, in case a new minor is allocated.
* change all direct references in drivers to v_devcookie and v_rdev
to vdev_privdata(vp) and vdev_rdev(vp). for diagnostic purposes
when debugging race conditions that still exist wrt. locking and
revoking vnodes.
* make the locking state of a vnode consistent when passed to
d_open and d_close (unlocked). locked would be better, but has
some deadlock issues
 1.66.2.1 18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.68.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.74.2.5 29-Aug-2002  gehenna catch up with -current.
 1.74.2.4 15-Jul-2002  gehenna catch up with -current.
 1.74.2.3 20-Jun-2002  gehenna catch up with -current.
 1.74.2.2 30-May-2002  gehenna Catch up with -current.
 1.74.2.1 16-May-2002  gehenna Use devsw APIs for checking validity of major numbers.
 1.76.2.1 20-Jun-2002  lukem Pull up revision 1.77 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.121.2.13 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.121.2.12 01-Apr-2005  skrll Sync with HEAD.
 1.121.2.11 08-Mar-2005  skrll Sync with HEAD.
 1.121.2.10 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.121.2.9 17-Jan-2005  skrll Sync with HEAD.
 1.121.2.8 27-Oct-2004  skrll Remove the struct lwp * arguments from qsync and ufs_checkpath that are
no longer (read: were never) required.
 1.121.2.7 21-Sep-2004  skrll Fix the sync with head I botched.
 1.121.2.6 18-Sep-2004  skrll Sync with HEAD.
 1.121.2.5 25-Aug-2004  skrll Sync with HEAD.
 1.121.2.4 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.121.2.3 03-Aug-2004  skrll Sync with HEAD
 1.121.2.2 19-Aug-2003  skrll LWPify
 1.121.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.146.2.1 29-May-2004  tron branches: 1.146.2.1.2;
Pull up revision 1.151 (requested by atatat in ticket #393):
Sysctl descriptions under vfs subtree
 1.146.2.1.2.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
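
Two of the kernel items above (the int32_t round trip for UNWRITTEN entries and the clamping of bfree in lfs_statfs) can be illustrated with a small sketch. This is not the code from the commit: the helper names are hypothetical, and the marker value -2 is assumed only for illustration. The first pair of helpers shows why a 32-bit on-disk entry holding a negative marker must be widened as a signed value. The last helper shows a clamp that keeps a signed free-block count inside [0, dsize] before it is reported.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; none of these names
 * exist in the LFS sources.
 */

/* Widen a 32-bit on-disk entry as unsigned: a stored -2 becomes 4294967294. */
static int64_t
widen_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}

/*
 * Widen through int32_t first so that a stored -2 stays -2.
 * (Converting an out-of-range uint32_t to int32_t is implementation-defined
 * before C23; two's-complement platforms yield the expected value.)
 */
static int64_t
widen_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}

/* Clamp a signed free-block count into [0, dsize] before reporting it. */
static int64_t
clamp_bfree(int64_t bfree, int64_t dsize)
{
	assert(dsize >= 0);
	if (bfree < 0)
		return 0;
	if (bfree > dsize)
		return dsize;
	return bfree;
}
```

With these definitions, widen_unsigned(0xFFFFFFFE) gives a large positive block address, the kind of value an upper-bound assertion would reject, while widen_signed(0xFFFFFFFE) preserves the marker. Similarly, a negative bfree that reaches userland unclamped can be read as an enormous unsigned quantity, which is presumably how a full filesystem came to report terabytes free.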

Cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.162.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.162.2.1 29-Apr-2005  kent sync with -current
 1.167.2.22 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.167.2.21 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.93
sys/ufs/lfs/lfs.h: revision 1.106
sys/ufs/lfs/lfs_vfsops.c: revision 1.209
sys/ufs/lfs/lfs_vnops.c: revision 1.175
sys/ufs/lfs/lfs_segment.c: revision 1.178
Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.
Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.167.2.20 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.92
sys/ufs/lfs/lfs.h: revision 1.105
sys/ufs/lfs/lfs_vfsops.c: revision 1.207
sys/ufs/lfs/lfs_subr.c: revision 1.59
sys/ufs/lfs/lfs_vnops.c: revision 1.173
sys/ufs/lfs/lfs_bio.c: revision 1.92
Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.
Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
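
As a rough illustration of the default described above, the sketch below computes (lfs_minfreeseg / 2 + 1) with integer division. The function name is hypothetical, and the "sanity checking" the message mentions is not reproduced because the log does not describe it.

```c
#include <assert.h>

/*
 * Hypothetical helper: default number of segments reserved for the
 * cleaner, derived from the minimum-free-segments parameter.
 * Integer division truncates, so 7 / 2 + 1 == 4.
 */
static int
default_resvseg(int minfreeseg)
{
	assert(minfreeseg >= 0);
	return minfreeseg / 2 + 1;
}
```

For example, a filesystem with a minimum of 10 free segments would reserve 6 for the cleaner under this rule.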
 1.167.2.19 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.104
sys/ufs/lfs/lfs_vfsops.c: revision 1.206
sys/ufs/lfs/lfs_vnops.c: revision 1.170
sys/ufs/lfs/lfs_extern.h: revision 1.80
sys/ufs/lfs/lfs_segment.c: revision 1.176
sys/ufs/lfs/lfs_inode.c: revision 1.103 via patch
sys/ufs/lfs/lfs_alloc.c: revision 1.90
Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().
A couple of locking fixes are also included.
 1.167.2.18 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.205
Don't roll forward if we aren't given a process context. Coverity CID 1076.
 1.167.2.17 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.204 via patch
Coverity CID 2499: Fix uninitialized variable use.
 1.167.2.16 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.203 via patch
Remove mostly useless BUFPAGES warning message from lfs_{un,}mount.
 1.167.2.15 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.101
sys/ufs/lfs/lfs_vfsops.c: revision 1.202
sys/ufs/lfs/lfs_alloc.c: revision 1.88
Optimize the free list search a little more; in particular use words
instead of bytes for the index, and never search below fs->lfs_freehd.
Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).
Free the bitmap when we unmount the filesystem.
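
The word-at-a-time search described above can be sketched as follows. This is an illustrative reconstruction, not the code from lfs_alloc.c: the function name, the convention that a set bit means "free", and the use of 32-bit words are all assumptions. The search starts at the word containing the lower bound (standing in for fs->lfs_freehd), masks off bits below it, and uses unsigned indices with an explicit "not found" return so that nothing depends on the index type being signed.

```c
#include <stdint.h>

#define BITS_PER_WORD 32u

/*
 * Hypothetical sketch: return the index of the first set bit at or
 * above 'start', or 'nbits' if there is none. Whole words that are
 * zero are skipped with a single comparison.
 */
static uint32_t
first_set_from(const uint32_t *map, uint32_t nbits, uint32_t start)
{
	uint32_t nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;
	uint32_t w, word, bit;

	if (start >= nbits)
		return nbits;

	w = start / BITS_PER_WORD;
	/* Ignore bits below 'start' in the first word examined. */
	word = map[w] & (~(uint32_t)0 << (start % BITS_PER_WORD));

	for (;;) {
		if (word != 0) {
			/* Locate the lowest set bit in this word. */
			for (bit = 0; (word & 1u) == 0; bit++)
				word >>= 1;
			bit += w * BITS_PER_WORD;
			return bit < nbits ? bit : nbits;
		}
		if (++w >= nwords)
			return nbits;
		word = map[w];
	}
}
```

Returning nbits as the "none found" value, rather than a negative number, is one way to avoid the signedness assumption that the message describes as a bug in the earlier version.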
 1.167.2.14 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.201
Correct a locking bug in the recent pager optimization.
 1.167.2.13 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.200
sys/ufs/lfs/lfs_vnops.c: revision 1.164
sys/ufs/lfs/lfs_inode.c: revision 1.101
sys/ufs/lfs/lfs_extern.h: revision 1.78
sys/ufs/lfs/lfs.h: revision 1.100
Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.167.2.12 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.87
sys/ufs/lfs/lfs.h: revision 1.99
sys/ufs/lfs/lfs_vfsops.c: revision 1.199
sys/ufs/lfs/lfs_extern.h: revision 1.77 via patch
Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.167.2.11 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.198
sys/ufs/lfs/lfs_vnops.c: revision 1.161
Handle the "filesystem is clean" flag correctly when upgrading from
read-only to read-write mount. This makes "root on lfs" work for me,
although it looks like a different traceback from PR#32667.
 1.167.2.10 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.196
Double-checkpoint on unmount. This ensures that vnodes belonging to removed
files are really freed, preventing occasional spurious EBUSY returns from
vflush().
 1.167.2.9 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.158
sys/ufs/lfs/lfs_subr.c: revision 1.57
sys/ufs/lfs/lfs_segment.c: revision 1.171
sys/ufs/lfs/lfs.h: revision 1.97
sys/ufs/lfs/lfs_vfsops.c: revision 1.195
sys/ufs/lfs/lfs_extern.h: revision 1.76
Improvements to LFS's paging mechanism, to wit:
* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.
* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
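
The EAGAIN protocol in the first item can be pictured with a toy model. Everything here is hypothetical (the struct, the function names, and the way "waiting for the cleaner" simply frees one segment); it only shows the control flow: the write reports EAGAIN when no clean segment is left, and the caller drops its lock, waits, retakes the lock, and retries.

```c
#include <errno.h>

/* Toy stand-in for filesystem state; not an LFS structure. */
struct toy_fs {
	int clean_segments;	/* segments available for writing */
	int lock_held;		/* 1 while the caller holds the "segment lock" */
	int waits;		/* number of times we waited for the cleaner */
};

/* Pretend write: fails with EAGAIN when no clean segment is left. */
static int
toy_write(struct toy_fs *fs)
{
	if (fs->clean_segments == 0)
		return EAGAIN;
	fs->clean_segments--;
	return 0;
}

/* Pretend cleaner: each wait makes one more segment clean. */
static void
toy_wait_for_cleaner(struct toy_fs *fs)
{
	fs->clean_segments++;
	fs->waits++;
}

/* Caller side: on EAGAIN, drop the lock, wait, retake it, retry. */
static int
toy_putpages(struct toy_fs *fs)
{
	int error, had_lock;

	while ((error = toy_write(fs)) == EAGAIN) {
		had_lock = fs->lock_held;
		fs->lock_held = 0;		/* let the cleaner make progress */
		toy_wait_for_cleaner(fs);
		fs->lock_held = had_lock;	/* retake before retrying */
	}
	return error;
}
```

Dropping the lock before waiting is the essential point: if the caller kept it, the cleaner could never obtain it to free a segment, and both sides would block indefinitely.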
 1.167.2.8 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_segment.c: revision 1.170
sys/ufs/lfs/lfs.h: revision 1.96
sys/ufs/lfs/lfs_vfsops.c: revision 1.194
sys/ufs/lfs/lfs_syscalls.c: revision 1.109
From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.167.2.7 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.153
sys/ufs/lfs/lfs_debug.c: revision 1.32
sys/ufs/lfs/lfs_alloc.c: revision 1.84
sys/ufs/lfs/lfs_vfsops.c: revision 1.185
sys/ufs/lfs/lfs_segment.c: revision 1.165
64 bit inode changes.
 1.167.2.6 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.167.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.180
sys/ufs/lfs/lfs_syscalls.c: revision 1.106
sys/ufs/lfs/lfs.h: revision 1.87
Keep track of the number of segments reclaimed, since the cleaner doesn't
do this anymore (it hasn't for quite some time). Add a couple of conditional
debugging messages to indicate why segments are not cleaned, in the event
that lfs_segclean is used.
Make the LFCNSEGWAITALL fcntl work again.
 1.167.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.179
Fill in the lfs_fsmnt field in the superblock when we mount the filesystem,
so fsck(8) can tell where it was last mounted.
 1.167.2.3 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3), but in a different way from the suggested fix.
(to keep our current behaviour, i.e. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed out by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed out by Chuck Silvers.
 1.167.2.2 18-May-2005  snj Pull up revision 1.178 (requested by perseant in ticket #311):
Don't let the pager_map deadlock avoidance code in lfs_putpages() write
segments containing zero-block FINFO records. These records cause segments
to become uncleanable, which would eventually result in a "no clean segments"
panic.
 1.167.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.183.2.9 04-Feb-2008  yamt sync with head.
 1.183.2.8 21-Jan-2008  yamt sync with head
 1.183.2.7 07-Dec-2007  yamt sync with head
 1.183.2.6 15-Nov-2007  yamt sync with head.
 1.183.2.5 27-Oct-2007  yamt sync with head.
 1.183.2.4 03-Sep-2007  yamt sync with head.
 1.183.2.3 26-Feb-2007  yamt sync with head.
 1.183.2.2 30-Dec-2006  yamt sync with head.
 1.183.2.1 21-Jun-2006  yamt sync with head.
 1.188.2.2 29-Oct-2005  yamt use lfs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.188.2.1 20-Oct-2005  yamt adapt ufs.
 1.190.2.2 01-Mar-2006  yamt sync with head.
 1.190.2.1 15-Jan-2006  yamt sync with head.
 1.192.4.2 01-Jun-2006  kardel Sync with head.
 1.192.4.1 22-Apr-2006  simonb Sync with head.
 1.192.2.1 09-Sep-2006  rpaulo sync with head
 1.193.6.3 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.193.6.2 31-Mar-2006  tron Merge 2006-03-31 NetBSD-current into the "peter-altq" branch.
 1.193.6.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.193.4.4 11-May-2006  elad sync with head
 1.193.4.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.193.4.2 19-Apr-2006  elad sync with head.
 1.193.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.193.2.6 03-Sep-2006  yamt sync with head.
 1.193.2.5 11-Aug-2006  yamt sync with head
 1.193.2.4 26-Jun-2006  yamt sync with head.
 1.193.2.3 24-May-2006  yamt sync with head.
 1.193.2.2 11-Apr-2006  yamt sync with head
 1.193.2.1 01-Apr-2006  yamt sync with head.
 1.212.2.1 19-Jun-2006  chap Sync with head.
 1.213.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.220.4.2 10-Dec-2006  yamt sync with head.
 1.220.4.1 22-Oct-2006  yamt sync with head
 1.220.2.3 01-Feb-2007  ad Sync with head.
 1.220.2.2 12-Jan-2007  ad Sync with head.
 1.220.2.1 18-Nov-2006  ad Sync with head.
 1.224.4.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.224.2.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.228.2.4 17-May-2007  yamt sync with head.
 1.228.2.3 07-May-2007  yamt sync with head.
 1.228.2.2 24-Mar-2007  yamt sync with head.
 1.228.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.231.4.13 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress testing.
 1.231.4.12 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.231.4.11 21-Aug-2007  yamt fix some races around pagedaemon and uvm_wait. ok'ed by Andrew Doran.
 1.231.4.10 20-Aug-2007  ad Sync with HEAD.
 1.231.4.9 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.231.4.8 15-Jul-2007  ad Sync with head.
 1.231.4.7 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.231.4.6 08-Jun-2007  ad Sync with head.
 1.231.4.5 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.231.4.4 10-Apr-2007  ad Nuke the deferred kthread creation stuff, as it's no longer needed.
Pointed out by thorpej@.
 1.231.4.3 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.231.4.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.231.4.1 13-Mar-2007  ad Sync with head.
 1.232.2.1 11-Jul-2007  mjf Sync with head.
 1.240.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.244.8.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.244.8.1 31-Jul-2007  pooka file lfs_vfsops.c was added on branch matt-mips64 on 2007-07-31 21:14:21 +0000
 1.244.6.1 14-Oct-2007  yamt sync with head.
 1.244.4.3 23-Mar-2008  matt sync with HEAD
 1.244.4.2 09-Jan-2008  matt sync with HEAD
 1.244.4.1 06-Nov-2007  matt sync with HEAD
 1.244.2.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.244.2.3 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.244.2.2 11-Nov-2007  joerg Sync with HEAD.
 1.244.2.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.246.4.4 18-Feb-2008  mjf Sync with HEAD.
 1.246.4.3 27-Dec-2007  mjf Sync with HEAD.
 1.246.4.2 08-Dec-2007  mjf Sync with HEAD.
 1.246.4.1 19-Nov-2007  mjf Sync with HEAD.
 1.246.2.2 22-Nov-2007  bouyer Sync with HEAD
 1.246.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.249.2.5 26-Dec-2007  ad Sync with head.
 1.249.2.4 19-Dec-2007  ad Use a global lfs_lock.
 1.249.2.3 19-Dec-2007  ad Fix some more problems w/lfs on this branch.
 1.249.2.2 19-Dec-2007  ad Get lfs mostly working.
 1.249.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.250.4.2 02-Jan-2008  bouyer Sync with HEAD
 1.250.4.1 13-Dec-2007  bouyer Sync with HEAD
 1.250.2.1 13-Dec-2007  yamt sync with head.
 1.255.10.8 11-Aug-2010  yamt sync with head.
 1.255.10.7 11-Mar-2010  yamt sync with head
 1.255.10.6 16-Sep-2009  yamt sync with head
 1.255.10.5 19-Aug-2009  yamt sync with head.
 1.255.10.4 18-Jul-2009  yamt sync with head.
 1.255.10.3 16-May-2009  yamt sync with head
 1.255.10.2 04-May-2009  yamt sync with head.
 1.255.10.1 16-May-2008  yamt sync with head.
 1.255.8.2 04-Jun-2008  yamt sync with head
 1.255.8.1 18-May-2008  yamt sync with head.
 1.255.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.255.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.255.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.260.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.260.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.265.2.1 03-Jul-2008  simonb Sync with head.
 1.267.6.2 25-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.267.6.1 04-Apr-2009  snj branches: 1.267.6.1.4; 1.267.6.1.6; 1.267.6.1.10;
Pull up following revision(s) (requested by ad in ticket #662):
sys/ufs/lfs/lfs_vfsops.c: revision 1.272
Turn up the volume on the warning message a bit.
 1.267.6.1.10.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.267.6.1.6.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.267.6.1.4.1 09-Feb-2012  matt Change to use the updated uvm_pageout_* signature.
 1.267.4.3 28-Apr-2009  skrll Sync with HEAD.
 1.267.4.2 03-Mar-2009  skrll Sync with HEAD.
 1.267.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.267.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.269.4.2 23-Jul-2009  jym Sync with HEAD.
 1.269.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.282.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.282.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.286.2.4 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.286.2.3 21-Apr-2011  rmind sync with head
 1.286.2.2 03-Jul-2010  rmind sync with head
 1.286.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.287.4.1 09-Feb-2011  bouyer Various build fixes
 1.287.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.288.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.290.2.5 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.290.2.4 23-Jan-2013  yamt sync with head
 1.290.2.3 23-May-2012  yamt sync with head.
 1.290.2.2 17-Apr-2012  yamt sync with head
 1.290.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.291.4.3 02-Jun-2012  mrg sync to latest -current.
 1.291.4.2 05-Apr-2012  mrg sync to latest -current.
 1.291.4.1 18-Feb-2012  mrg merge to -current.
 1.293.2.2 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
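The last fix above amounts to remembering the size that was allocated instead of trusting a length the file system may have rewritten. A minimal userland sketch of that pattern follows. The names `do_mount_sketch`, `fake_fs_mount` and `max_len` are hypothetical, and malloc/free stand in for kmem_alloc/kmem_free; this is not the code from vfs_syscalls.c.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical file-system hook: like the real ones, it may rewrite *data_len. */
static int
fake_fs_mount(void *data, size_t *data_len)
{
	(void)data;
	*data_len = 0;
	return 0;
}

/*
 * Returns the size that would be handed back to the allocator,
 * or 0 if the request is rejected up front.
 */
size_t
do_mount_sketch(size_t data_len, size_t max_len)
{
	size_t alloc_len;
	void *data;

	/* Reject zero-sized and oversized requests before allocating. */
	if (data_len == 0 || data_len > max_len)
		return 0;

	alloc_len = data_len;		/* remember what we allocate */
	data = malloc(alloc_len);
	if (data == NULL)
		return 0;

	(void)fake_fs_mount(data, &data_len);	/* data_len may change here */

	/* In the kernel this would be kmem_free(data, alloc_len). */
	free(data);
	return alloc_len;
}
```

Keeping `alloc_len` separate means the free side never depends on what the file system did to `data_len`.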
 1.293.2.1 17-Mar-2012  bouyer branches: 1.293.2.1.4; 1.293.2.1.6;
Pull up following revision(s) (requested by perseant in ticket #116):
sys/ufs/lfs/lfs_alloc.c: revision 1.112
tests/fs/vfs/t_rmdirrace.c: revision 1.9
tests/fs/vfs/t_renamerace.c: revision 1.25
sys/ufs/lfs/lfs_vnops.c: revision 1.240
sys/ufs/lfs/lfs_segment.c: revision 1.224
sys/ufs/lfs/lfs_bio.c: revision 1.122
sys/ufs/lfs/lfs_vfsops.c: revision 1.294
sbin/newfs_lfs/make_lfs.c: revision 1.19
sys/ufs/lfs/lfs.h: revision 1.136
Pass t_renamerace and t_rmdirrace tests.
Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.
Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.
Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace creates a lot of inode blocks on a tiny empty disk).
 1.293.2.1.6.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.293.2.1.4.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.296.2.4 03-Dec-2017  jdolecek update from HEAD
 1.296.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.296.2.2 23-Jun-2013  tls resync from head
 1.296.2.1 25-Feb-2013  tls resync with head
 1.307.2.2 18-May-2014  rmind sync with head
 1.307.2.1 28-Aug-2013  rmind sync with head
 1.320.2.1 10-Aug-2014  tls Rebase.
 1.321.4.6 28-Aug-2017  skrll Sync with HEAD
 1.321.4.5 09-Jul-2016  skrll Sync with HEAD
 1.321.4.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.321.4.3 22-Sep-2015  skrll Sync with HEAD
 1.321.4.2 06-Jun-2015  skrll Sync with HEAD
 1.321.4.1 06-Apr-2015  skrll Sync with HEAD
 1.351.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.351.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.351.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.351.2.1 20-Jul-2016  pgoyette Adapt machine-independent code to the new {b,c}devsw reference-counting
(using localcount(9)). All callers of {b,c}devsw_lookup() now call
{b,c}devsw_lookup_acquire() which retains a reference on the 'struct
{b,c}devsw'. This reference must be released by the caller once it is
finished with the structure's content (or other data that would disappear
if the 'struct {b,c}devsw' were to disappear).
 1.359.4.2 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
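The format-string rules listed above can be illustrated in plain userland C. The sketch below is only an illustration with hypothetical helper names (`format_hist_ptr`, `format_hist_num`); it uses snprintf(3) rather than the kernhist(9) macros themselves.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a pointer the way the commit describes: cast to uintptr_t first,
 * then widen to uintmax_t, and print with the 'j' length modifier.
 * Returns whatever snprintf() returns.
 */
int
format_hist_ptr(char *buf, size_t len, const void *p)
{
	return snprintf(buf, len, "%#jx", (uintmax_t)(uintptr_t)p);
}

/* Numeric arguments are widened the same way and printed with "%ju". */
int
format_hist_num(char *buf, size_t len, unsigned int v)
{
	return snprintf(buf, len, "%ju", (uintmax_t)v);
}
```

Going through uintptr_t first is what avoids the "pointer to integer of a different size" warning on platforms where pointers are narrower than uintmax_t.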
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.359.4.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some questions about the code in an XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.359.2.1 27-Apr-2017  pgoyette Restore all work from the former pgoyette-localcount branch (which is
now abandoned due to a cvs merge botch).

The branch now builds, and installs via anita. There are still some
problems (cgd is non-functional and all atf tests time-out) but they
will get resolved soon.
 1.361.2.3 18-Jan-2019  pgoyette Synch with HEAD
 1.361.2.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.361.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.362.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.362.2.1 10-Jun-2019  christos Sync with HEAD
 1.365.2.2 07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1934):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383 (patch)
sys/ufs/ffs/ffs_vfsops.c: revision 1.384 (patch)

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Setting/clearing fs->fs_fmod while mounting, updating a mount, or unmounting
is safe because these operations run exclusively: either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.365.2.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.
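The marker-node idea can be sketched in userland with the TAILQ macros from <sys/queue.h>. Everything here (`struct node`, `walk_with_marker`, `drop_successor`, `list_length`) is a hypothetical illustration, not the lfs_dchainhd code, and the locking is only indicated in comments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

struct node {
	TAILQ_ENTRY(node) link;
	int value;
	bool is_marker;
};
TAILQ_HEAD(nodelist, node);

typedef void (*visit_fn)(struct nodelist *, struct node *);

/*
 * Visit every non-marker node. The visitor may remove any node except
 * the walker's marker, including the one that would have been "next".
 * Returns the number of nodes visited.
 */
int
walk_with_marker(struct nodelist *head, visit_fn visit)
{
	struct node marker = { .is_marker = true };
	struct node *n = TAILQ_FIRST(head);
	int visited = 0;

	while (n != NULL) {
		if (n->is_marker) {	/* skip other walkers' markers */
			n = TAILQ_NEXT(n, link);
			continue;
		}
		TAILQ_INSERT_AFTER(head, n, &marker, link);
		/* A real walker would drop the list lock here... */
		visit(head, n);
		visited++;
		/* ...and retake it here before touching the list again. */
		n = TAILQ_NEXT(&marker, link);
		TAILQ_REMOVE(head, &marker, link);
	}
	return visited;
}

/* Example visitor: removes the node that follows the marker, if any. */
void
drop_successor(struct nodelist *head, struct node *n)
{
	struct node *m = TAILQ_NEXT(n, link);	/* the walker's marker */
	struct node *succ = (m != NULL) ? TAILQ_NEXT(m, link) : NULL;

	if (succ != NULL)
		TAILQ_REMOVE(head, succ, link);
}

/* Count the entries currently on the list. */
size_t
list_length(struct nodelist *head)
{
	struct node *n;
	size_t len = 0;

	TAILQ_FOREACH(n, head, link)
		len++;
	return len;
}
```

Because the walker resumes from its own marker rather than from a saved "next" pointer, removing the following node during the visit cannot leave it holding a stale pointer.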

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
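Resolution (b) can be sketched with a pthread mutex standing in for lfs_lock. The structure and function names below (`fs_state`, `writer_tryenter`, `writer_leave`) are hypothetical and only illustrate the "test and acquire under one lock" idea; they are not the LFS implementation.

```c
#include <pthread.h>
#include <stdbool.h>

struct fs_state {
	pthread_mutex_t lock;	/* plays the role of lfs_lock */
	int dirops;		/* directory operations in progress */
	int writer;		/* count of writer holds */
};

/*
 * Take a writer hold only if no dirops are active. The test and the
 * increment happen under the same lock, so the state cannot change
 * between them.
 */
bool
writer_tryenter(struct fs_state *fs)
{
	bool ok;

	pthread_mutex_lock(&fs->lock);
	ok = (fs->dirops == 0);
	if (ok)
		fs->writer++;
	pthread_mutex_unlock(&fs->lock);
	return ok;
}

/* Release a hold taken by writer_tryenter(). */
void
writer_leave(struct fs_state *fs)
{
	pthread_mutex_lock(&fs->lock);
	fs->writer--;
	pthread_mutex_unlock(&fs->lock);
}
```

A caller that gets `false` back simply skips the opportunistic work instead of blocking while it holds the seglock.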

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.
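The effect of the attributes can be checked in userland. This sketch assumes a GCC- or Clang-compatible compiler and spells the attributes out directly, on the assumption that NetBSD's __aligned(4) and __packed macros expand to them; `struct rec64` and `read_ino_at` are hypothetical names, not LFS structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A record with a 64-bit field that is only guaranteed 32-bit alignment. */
struct rec64 {
	uint64_t ino;
	uint16_t reclen;
	uint8_t type;
	uint8_t namlen;
} __attribute__((packed, aligned(4)));

/* The attributes cap the alignment at 4 and leave no padding. */
_Static_assert(_Alignof(struct rec64) == 4, "record is 4-byte aligned");
_Static_assert(sizeof(struct rec64) == 12, "record has no padding");

/* Read the inode number of a record stored at a 4-byte-aligned offset. */
uint64_t
read_ino_at(const unsigned char *buf, size_t off)
{
	struct rec64 r;

	memcpy(&r, buf + off, sizeof r);	/* safe at any alignment */
	return r.ino;
}
```

With the attributes in place the compiler no longer assumes 8-byte alignment for `ino`, which is the property the commit relies on for records packed into directory blocks.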

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.367.2.3 29-Feb-2020  ad Sync with head.
 1.367.2.2 19-Jan-2020  ad Set IMNT_SHRLOOKUP and use it for the in-cache case. Need to check what
more can be done with tmpfs though, it can probably do the whole lookup.
 1.367.2.1 17-Jan-2020  ad Sync with head.
 1.380.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.382.10.1 02-Aug-2025  perseant Sync with HEAD
 1.382.4.1 07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1037):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_vfsops.c: revision 1.384

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Setting/clearing fs->fs_fmod while mounting, updating a mount, or unmounting
is safe because these operations run exclusively: either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.347 03-Nov-2025  perseant Be more careful about only setting IN_CLEANING in lfs_setclean() and clearing
it in lfs_clrclean(). Prevents a crash from re-removing an entry from the
lfs_cleanhd TAILQ.
 1.346 01-Nov-2025  perseant Create a new LFS inode flag, IN_DEAD, to indicate that a file's last
reference, other than those that come with VU_DIROP or IN_CLEANING and
the one the caller holds, has been dropped. Check and apply this flag
in lfs_orphan(), and call lfs_orphan() on close if the link count is
zero. Change the signature of lfs_orphan to facilitate.

Make test t_vfsops:lfs_tfhremove expect success.

Closes PR kern/43745.
 1.345 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
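The kind of statistics LFCNFILESTATS reports can be sketched as follows. This is a guess at the arithmetic, not the kernel code: it assumes a "gap" is any pair of consecutive direct blocks whose disk addresses are not adjacent, and that the gap distance is the absolute difference from the adjacent address. `struct gap_stats` and `measure_gaps` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct gap_stats {
	size_t nblocks;		/* number of direct blocks examined */
	size_t ngaps;		/* discontinuities between neighbours */
	uint64_t gap_total;	/* sum of the gap distances */
};

/*
 * addr[i] is the (assumed) disk address of the file's i-th direct block,
 * in block units. Blocks are contiguous when addr[i] == addr[i-1] + 1.
 */
struct gap_stats
measure_gaps(const int64_t *addr, size_t n)
{
	struct gap_stats st = { n, 0, 0 };
	size_t i;

	for (i = 1; i < n; i++) {
		int64_t delta = addr[i] - (addr[i - 1] + 1);

		if (delta != 0) {
			st.ngaps++;
			st.gap_total += (uint64_t)(delta < 0 ? -delta : delta);
		}
	}
	return st;
}
```

A coalescing agent could compare `ngaps` before and after a rewrite to see whether the file became more compact.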
 1.344 01-Oct-2025  perseant Align case labels with 8-character tab stops. No functional change.
 1.343 17-Sep-2025  perseant Add working in-kernel roll forward.
 1.342 06-Sep-2025  perseant Lock the vnode before calling lfs_set_dirop, to meet the conditions of
the assertion. Fixes a regression introduced in rev 1.341.
 1.341 05-Sep-2025  perseant Protect the changed link count of the linked vnode with {,UN}MARK_DIROP
in lfs_link(). Necessary for roll-forward.
 1.340 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.,
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), using a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.339 18-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.
 1.338 18-Jul-2021  dholland Use macros for the canned parts of device and fifo vnode op tables.

Add GENFS_SPECOP_ENTRIES and GENFS_FIFOOP_ENTRIES macros that contain
the portion of the vnode ops table declaration that is
(conservatively) the same in every fs. Use these in every fs that
supports devices and/or fifos with separate ops tables.

Note that ptyfs works differently (it has one type of vnode with
open-coded dispatch to the specfs code, which I haven't changed in
this commit) and rump/librump/rumpvfs/rumpfs.c has an indirect dynamic
dispatch that already does more or less the same thing, which I also
haven't changed.

Also note that this anticipates a few bits in the next changeset here
and there, and adds missing but unreachable calls in some cases (e.g.
most fses weren't defining whiteout on devices and fifos, but it isn't
reachable there), and it changes parsepath on devices and fifos to
genfs_badop from genfs_parsepath (but it's not reachable there
either).

It appears that devices in kernfs were missing kqfilter, so it's
possible that using kqueue on /kern/rootdev will explode.

And finally note that the ops declaration tables aren't
order-dependent. (Other than vop_default_desc has to come first.)
Otherwise this wouldn't work.
 1.337 29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
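
As a rough sketch of the default behaviour described above (consume up
to the first '/'), assuming a simplified interface: sketch_parsepath is
a hypothetical name, and it takes only the path string and returns a
length, whereas the real vnode op also receives the directory vnode and
reports its result differently.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: return how many bytes of `path` form the first
 * component, i.e. the offset of the first '/' or of the terminating NUL.
 */
static size_t
sketch_parsepath(const char *path)
{
	size_t len = 0;

	assert(path != NULL);
	while (path[len] != '\0' && path[len] != '/')
		len++;
	return len;
}
```

A file system that wanted a different rule would return a different
length from the same string, which is why only the string is passed.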
 1.336 05-Sep-2020  riastradh branches: 1.336.6;
Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.335 05-Sep-2020  riastradh Revert "ufs: Prevent mkdir from choking on deleted directories."

This change made no sense and should not have been committed.
 1.334 05-Sep-2020  riastradh ufs: Prevent mkdir from choking on deleted directories.

Fix some missing uvm_vnp_setsize in screw cases while here.
 1.333 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.332 13-Apr-2020  ad Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.
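
A userland approximation of such a wrapper, using C11 <stdatomic.h>
rather than the kernel's atomic_load_relaxed(). The struct and function
names here are hypothetical, and the real vnode has many more fields.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, stripped-down vnode: only the reference count. */
struct sketch_vnode {
	atomic_uint usecount;
};

/*
 * Read the reference count with relaxed ordering. Callers get a
 * snapshot only; the value may change as soon as it is returned.
 */
static unsigned int
sketch_vrefcnt(struct sketch_vnode *vp)
{
	assert(vp != NULL);
	return atomic_load_explicit(&vp->usecount, memory_order_relaxed);
}
```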
 1.331 23-Feb-2020  ad branches: 1.331.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.330 23-Feb-2020  riastradh Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.
 1.329 23-Feb-2020  riastradh Take a reference and fix assertions in lfs_flush_dirops.

Fixes panic:

KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.
 1.328 23-Feb-2020  riastradh Change some cheap KDASSERT into KASSERT.
 1.327 23-Feb-2020  riastradh Assert lfs_writer where I think we can now prove it.
 1.326 23-Feb-2020  riastradh Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.
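
The marker-node technique can be sketched in userland with the TAILQ
macros from <sys/queue.h>. This is an illustration under assumptions,
not the lfs code: struct node, walk_with_marker and drop_successor are
hypothetical, there is no locking, and the callback stands in for
whatever may happen while the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

struct node {
	TAILQ_ENTRY(node) link;
	bool is_marker;		/* markers carry no payload and are skipped */
	int value;
};
TAILQ_HEAD(nodelist, node);

/*
 * Visit every real node. visit() may remove real nodes from the list,
 * including the one that would have been visited next; it must leave
 * the marker alone. Returns the number of nodes visited.
 */
static int
walk_with_marker(struct nodelist *head,
    void (*visit)(struct nodelist *, struct node *))
{
	struct node marker = { .is_marker = true };
	struct node *n;
	int visited = 0;

	assert(head != NULL && visit != NULL);
	TAILQ_INSERT_HEAD(head, &marker, link);
	while ((n = TAILQ_NEXT(&marker, link)) != NULL) {
		/* Move the marker past n before doing any work on n. */
		TAILQ_REMOVE(head, &marker, link);
		TAILQ_INSERT_AFTER(head, n, &marker, link);
		if (n->is_marker)
			continue;	/* someone else's marker */
		/* In the kernel, the lock would be dropped around this call. */
		visit(head, n);
		visited++;
	}
	TAILQ_REMOVE(head, &marker, link);
	return visited;
}

/* Example callback: remove the next real node after n, if any. */
static void
drop_successor(struct nodelist *head, struct node *n)
{
	struct node *next = TAILQ_NEXT(n, link);

	while (next != NULL && next->is_marker)
		next = TAILQ_NEXT(next, link);
	if (next != NULL)
		TAILQ_REMOVE(head, next, link);
}
```

The walker never holds a pointer to the next real node across the
callback; it always re-reads the successor of its own marker, so
removing that node is harmless.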
 1.325 18-Sep-2019  christos branches: 1.325.2;
Add newly created vnodes to the namei cache. The rest of the filesystems
already did that (or they don't support writing). Discussed in tech-kern.
 1.324 20-Jun-2019  christos branches: 1.324.2;
unifdef -DLFS_READWRITE ulfs_readwrite.c
 1.323 01-Jan-2019  hannken Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
 1.322 11-Aug-2018  zafer In lfs_mkdir fix wrong return path in case of EMLINK which causes a panic. Also, check earlier before setting up dirop.
 1.321 20-Aug-2017  maya branches: 1.321.2; 1.321.4;
Fix typo in comment
 1.320 19-Aug-2017  maya Not much point doing anything after a panic call
 1.319 19-Aug-2017  maya Consistently use {,UN}MARK_VNODE macros rather than function calls.
 1.318 26-Jul-2017  maya change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar

XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
 1.317 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.316 05-Jun-2017  maya Correct confusion between i_flag and i_flags
These will have to be renamed.

Spotted by Riastradh, thanks!
 1.315 26-May-2017  riastradh branches: 1.315.2;
Make VOP_RECLAIM do the last unlock of the vnode.

VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.

We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
 1.314 26-Apr-2017  riastradh Change VOP_REMOVE and VOP_RMDIR to preserve lock/ref on dvp.

No change to vp -- the plan is to replace the node by the
componentname in the vop parameters, and let all directory vops do
lookups internally.

Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2017/04/17/msg021825.html
 1.313 11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.312 11-Apr-2017  riastradh Fix non-DIAGNOSTIC build by using vp outside KASSERT too.
 1.311 11-Apr-2017  riastradh Sprinkle lock ownership assertions.
 1.310 01-Apr-2017  maya Switch lfs_writer_daemon to use condvar instead of mtsleep.
track thread existence with struct lwp instead of pid + lid,
it's more useful from ddb.
 1.309 01-Apr-2017  maya switch lfs_dirops to condvar (from mtsleep)
 1.308 01-Apr-2017  maya switch lfs_sleepers to condvar (from mtsleep)
 1.307 30-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.306 16-Mar-2017  maya actually cast to unsigned long long and use %llu. certainly not use hex (oops)
suggested by dh
 1.305 15-Mar-2017  maya print inode number in an assert I keep hitting and the adjacent one.
use PRIx64 for printing inode number elsewhere.
 1.304 13-Jul-2016  maya branches: 1.304.2; 1.304.4;
Fix a deadlock

ok dholland@
 1.303 20-Jun-2016  dholland In lfs_mknod, don't release dvp until done with it. This was exposed a
while back when I removed a sketchy preprocessor macro scheme, but I'd
left it the way it was at the time and marked it for later. Now I
guess it's later.

Also don't randomly use both dvp and ap->a_dvp; they're the same, so
pick one and stick to it.
 1.302 20-Jun-2016  dholland One more batch of already-synced ufs changes:

ufs_extern.h 1.79 is equivalent to ulfs_extern.h 1.14
ufsmount.h 1.43 is (roughly) equivalent to lfs_extern.h 1.102
ufs_inode.c 1.94 does not apply to lfs
ufs_inode.c 1.95 does not apply to lfs either
ufs_readwrite.c 1.108 is equivalent to ulfs_readwrite.c 1.8
ufs_readwrite.c 1.109 is equivalent to ulfs_readwrite.c 1.9
ufs_readwrite.c 1.110 is equivalent to ulfs_readwrite.c 1.10
ufs_readwrite.c 1.111 does not apply to lfs
ufs_readwrite.c 1.112 is equivalent to ulfs_readwrite.c 1.11
ufs_readwrite.c 1.113 is equivalent to ulfs_readwrite.c 1.13
ufs_readwrite.c 1.114 is equivalent to ulfs_readwrite.c 1.14
ufs_readwrite.c 1.115 is equivalent to ulfs_readwrite.c 1.15
ufs_readwrite.c 1.116-1.118 does not apply to lfs
ufs_readwrite.c 1.119-1.120 are equivalent to ulfs_readwrite.c 1.16
ufs_rename.c 1.12 is equivalent to lfs_rename.c 1.8
ufs_vnops.c 1.226 is equivalent to ulfs_vnops.c 1.22 and lfs_vnops.c 1.270
ufs_vnops.c 1.227 is equivalent to ulfs_vnops.c 1.23
ufs_vnops.c 1.228-1.229 are equivalent to ulfs_vnops.c 1.24
ufs_vnops.c 1.230 is equivalent to ulfs_vnops.c 1.25 and lfs_vnops.c 1.271
ufs_vnops.c 1.231 originated in lfs
ufs_vnops.c 1.232 does not apply to lfs
 1.301 20-Jun-2016  dholland With the previous we seem to have the changes from -r1.225 of ufs_vnops.c.
(as that was stuff from moving ffs to the new vcache and lfs has also been
moved, this is not surprising)
 1.300 20-Jun-2016  dholland ulfs_makeinode -> lfs_makeinode
 1.299 20-Jun-2016  dholland Merge (effectively) -r1.78 of ufs_extern.h: shift ulfs_makeinode to
lfs_vnops.c and make it file-static there, as that's the only place
it's used.
 1.298 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.297 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.296 19-Jun-2016  dholland we already have ufs_lookup.c 1.125 and ufs_vnops.c 1.218.
 1.295 19-Jun-2016  dholland missed one
(probably this should be tracked in some way other than pasting rcsid
comments, but it's what we've got)
 1.294 19-Jun-2016  dholland Merge -r1.216 of ufs_vnops.c: comments about maxsymlinklen handling
 1.293 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
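
A minimal sketch of the split-inode idea, assuming a hypothetical
layout: the struct, its field names, the order of the two halves and
the accessor names below are illustrative and are not taken from the
LFS headers. Byte order is whatever the host uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit directory header; not the LFS on-disk layout. */
struct sketch_dirheader64 {
	uint32_t ino_lo;	/* low 32 bits of the inode number */
	uint32_t ino_hi;	/* high 32 bits of the inode number */
	uint16_t reclen;
	uint8_t  type;
	uint8_t  namlen;
};

/* No member is wider than 32 bits, so 32-bit alignment suffices. */
static_assert(_Alignof(struct sketch_dirheader64) == _Alignof(uint32_t),
    "header must not require 64-bit alignment");

/* Reassemble the 64-bit inode number from its two halves. */
static uint64_t
sketch_dir_getino(const struct sketch_dirheader64 *dh)
{
	return ((uint64_t)dh->ino_hi << 32) | dh->ino_lo;
}

/* Split a 64-bit inode number into its two halves. */
static void
sketch_dir_setino(struct sketch_dirheader64 *dh, uint64_t ino)
{
	dh->ino_lo = (uint32_t)(ino & 0xffffffffu);
	dh->ino_hi = (uint32_t)(ino >> 32);
}
```

Callers go through the accessors and never see the two halves, which
is what allows the 32-bit and 64-bit entry forms to share code.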
 1.292 21-Sep-2015  dholland Oops; LFS_DIRECTSIZ() is going to need the fs as an argument.

Also, it turns out that dirhash needs a compile-time-constant version
of LFS_DIRECTSIZ(LFS_MAXNAMLEN+1), independent of 64-vs-32, so create
LFS_MAXDIRENTRYSIZE for this. Sigh.
 1.291 20-Sep-2015  dholland Clean up struct lfs_dirtemplate.
 1.290 15-Sep-2015  dholland Kill off ulfs_makedirentry; just pass the data to ulfs_direnter instead.
For now, move one copy of the code that allocates and fills in a
temporary struct lfs_direct to the top of ulfs_direnter; but it should
go away shortly.
 1.289 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.288 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.287 19-Aug-2015  dholland Part two of dinodes; use the same union everywhere.
(previously the ufs-derived code had things set up slightly different)

Remove a bunch of associated mess.
 1.286 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.285 12-Aug-2015  dholland Make 32-bit and 64-bit versions of SEGSUM.
Also fix some of the FINFO handling as it's closely entangled.
 1.284 12-Aug-2015  dholland Make 32-bit and 64-bit versions of CLEANERINFO.

XXX: while this is written to disk, it seems like much of it would
XXX: be better set up as a commpage shared with the cleaner.
 1.283 12-Aug-2015  dholland Widen several of the fields of BLOCK_INFO to 64 bits.

Keep the old BLOCK_INFO as BLOCK_INFO_70, and version the fcntls that
use it.

Note that BLOCK_INFO_70 has 64-bit padding issues so that it's
different on 32-bit and 64-bit machines. This has been fixed. However,
BLOCK_INFO also contains a pointer, so compat32 stuff for 32-on-64 is
still needed and doesn't currently exist.
 1.282 12-Aug-2015  dholland Move the security checks for lfs_bmapv/lfs_markv into those functions.
(instead of the system call entry points)

Avoids duplication.

While touching these, pass the lwp around instead of the proc -- the
latter was there for no other reason than because once upon a time
struct proc was the first argument of all syscalls.

(For that matter, why not just use curlwp instead of passing it around
all over the place? The cost of passing it to every syscall probably
exceeds the cost of loading it from curcpu, even on machines where
it's not just kept in a register all the time.)
 1.281 03-Aug-2015  dholland Simplify some leftover code and remove some old assertions.

Last year when I killed off some evil dirop-related macros, I added
these assertions because if the things they asserted weren't true we'd
be leaking vnodes. Well, it seems that the code at the time did leak
vnodes, so certain failure cases (e.g. mkdir with disk full) would
assert. Nobody apparently tripped on this in the past fourteen months,
until I broke balloc so it always failed (unrelatedly) while working
on some LFS64 changes.

However, the vnode leak has since been removed by hannken@ as part of
the vnode cache changes, so the assertions are now superfluous;
instead, just make sure *vpp gets nulled on failure, and don't worry
about whether or not VU_DIROP is set as it shouldn't matter any more.

XXX: there's still a lot of gratuitous pointer aliasing in here that
should be tidied away.
 1.280 02-Aug-2015  dholland lfs_cleanint[] in the in-memory superblock needs to have 64-bit entries.
 1.279 02-Aug-2015  dholland Make i_eff_nblks in the in-memory inode 64 bits wide.
 1.278 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.277 26-Jul-2015  hannken lfs_flush_pchain: replace vget() with vcache_get().
 1.276 25-Jul-2015  martin Use accessors in DEBUG and DIAGNOSTIC code as well
 1.275 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.274 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.
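
The accessor approach might look roughly like this. It is a sketch
under assumptions: struct sketch_lfs, the two superblock structs, the
is64 flag and the accessor names are hypothetical, and only one field
is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit and 64-bit on-disk superblock forms. */
struct sketch_sb32 { uint32_t nfiles; uint32_t avail; };
struct sketch_sb64 { uint64_t nfiles; uint64_t avail; };

/* Hypothetical in-memory fs structure holding one of the two forms. */
struct sketch_lfs {
	bool is64;
	union {
		struct sketch_sb32 u32;
		struct sketch_sb64 u64;
	} dlfs;
};

/* Read "avail" from whichever superblock form is in use. */
static inline uint64_t
sketch_getavail(const struct sketch_lfs *fs)
{
	return fs->is64 ? fs->dlfs.u64.avail : fs->dlfs.u32.avail;
}

/* Write "avail"; the 32-bit form cannot hold larger values. */
static inline void
sketch_setavail(struct sketch_lfs *fs, uint64_t val)
{
	if (fs->is64) {
		fs->dlfs.u64.avail = val;
	} else {
		assert(val <= UINT32_MAX);
		fs->dlfs.u32.avail = (uint32_t)val;
	}
}
```

Unlike a fake structure member macro, the accessor is a real function
with a typed argument, so the 32/64 decision lives in one place.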

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.273 07-Jun-2015  hannken Fix copy and paste errors from last commits.
- Kernel i386/ALL and amd64/ALL compile again.
- Resolves CID 1304138 (DEADCODE) and 1304139 (IDENTICAL_BRANCHES).
 1.272 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.271 20-Apr-2015  riastradh Make VOP_LINK return directory still locked and referenced.

Ride 7.99.10 bump.
 1.270 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.269 25-Jul-2014  dholland branches: 1.269.2; 1.269.4;
Add VOP_FALLOCATE and VOP_FDISCARD to every vnode ops table I can
find.

The filesystem ones all call genfs_eopnotsupp - right now I am only
implementing the plumbing and we can implement fallocate and/or
fdiscard for files later.

The device ones call spec_fallocate (which is also genfs_eopnotsupp)
and spec_fdiscard, which dispatches to the device-level op.

The fifo ones all call vn_fifo_bypass, which also ends up being
EOPNOTSUPP.
 1.268 17-May-2014  dholland Merge ulfs_create into lfs_create.
 1.267 17-May-2014  dholland Merge ulfs_mkdir into lfs_mkdir.
 1.266 17-May-2014  dholland Merge ulfs_symlink into lfs_symlink.
 1.265 17-May-2014  dholland Move the ulfs-level (copy of ufs) vnops for symlink, create, and mkdir
into lfs_vnops.c preparatory to folding them into the lfs entry points.

(lfs_vnops.c now has four licenses. sigh.)
 1.264 17-May-2014  dholland Remove the DIROP macros. They are evil, especially the CREATE ones.

This results in some duplicate logic in the creation vnops (symlink,
mknod, create, mkdir) but we will probably be able to factor it out in
a more sensible way later.

Now the creation vnops call getnewvnode explicitly instead of under
multiple layers of obscure gunk. Then we explicitly do lfs_set_dirop,
and afterwards lfs_unset_dirop.
 1.263 16-May-2014  dholland Move lfs_getpages and lfs_putpages to their own file.
 1.262 24-Mar-2014  hannken branches: 1.262.2;
- Make VI_XLOCK, VI_CLEAN and VI_LOCKSHARE private to kern/vfs_*.c.
- Make vwait() static.
- Add vdead_check() to check a vnode for being or becoming dead.

Discussed on tech-kern.

Welcome to 6.99.38
 1.261 23-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30
 1.260 17-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29
 1.259 18-Oct-2013  christos use __USE() in the right place, instead of (void)var.
 1.258 17-Oct-2013  christos - remove unused variables
- add debug ifdefs for debugging variables
- __USE() where appropriate.
 1.257 29-Jul-2013  dholland Fix build both with and without options LFS_EI.
 1.256 29-Jul-2013  dholland Revert previous; it is wrong.
 1.255 28-Jul-2013  pgoyette Remove unused variable to fix the build.
 1.254 28-Jul-2013  dholland Merge the extattr VOPs from ffs.
As these do nothing besides dispatch to ulfs_extattr.c it wasn't
exactly hard.

This might just make extended attributes work on lfs...
 1.253 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.252 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.251 21-Jul-2013  dholland Merge logic from ulfs_close(), ulfs_getattr(), and ulfs_strategy()
into the preexisting lfs_*() versions of these functions, and delete
the unused ulfs copies.
 1.250 20-Jul-2013  dholland Merge ulfs_mknod into lfs_mknod, which was missing some bits.
 1.249 20-Jul-2013  dholland Collect the pieces of lfs rename into lfs_rename.c, and sprinkle static.
 1.248 18-Jun-2013  christos branches: 1.248.2; 1.248.4;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.247 08-Jun-2013  dholland ulfs_dir.h has been emptied; remove it.
 1.246 08-Jun-2013  dholland Stick LFS_ in front of IFMT, IFIFO, IFREG, etc. so as not to conflict
with the UFS copies of these symbols. (Which themselves ought to have
UFS_ stuck on.)
 1.245 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.244 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.243 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.242 09-May-2012  riastradh branches: 1.242.2;
Adapt ffs, lfs, and ext2fs to use genfs_rename.

ok dholland, rmind
 1.241 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.240 16-Feb-2012  perseant Pass t_renamerace and t_rmdirrace tests.

Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.

Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.

Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace creates a lot of inode blocks on a tiny empty disk).
 1.239 02-Jan-2012  perseant branches: 1.239.2;

* Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.238 20-Sep-2011  chs branches: 1.238.2; 1.238.6;
strengthen the assertions about pages existing during block allocation,
which were incorrectly relaxed last year. add some comments so that
the intent of these is hopefully clearer.

in ufs_balloc_range(), don't free pages or mark them dirty if
allocating their backing store failed. this fixes PR 45369.
 1.237 12-Jul-2011  dholland Pass the ufs_lookup_results pointer around instead of fetching it from
the inode in the guts of ufs. Now, in VOPs where i_crap is used it is
used (directly) only immediately on entry to the VOP call and then
passed around by reference.

Except for rename, which needs explicit sorting out. The code in
ufs_wapbl_rename is unchanged in behavior but I'm increasingly
inclined to think it's wrong.
 1.236 11-Jul-2011  hannken Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.235 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.234 05-Jan-2011  martin branches: 1.234.6;
Avoid NULL deref inside a KASSERT, as discussed on tech-kern.
 1.233 02-Jan-2011  dholland Remove the special refcount behavior (adding an extra reference to the
parent dir) associated with SAVESTART in relookup().

Check all call sites to make sure that SAVESTART wasn't set while
calling relookup(); if it was, adjust the refcount behavior. Remove
related references to SAVESTART.

The only code that was reaching the extra ref was msdosfs_rename,
where the refcount behavior was already fairly broken and/or gross;
repair it.

Add a dummy 4th argument to relookup to make sure code that hasn't
been inspected won't compile. (This will go away next time the
relookup semantics change, which they will.)
 1.232 18-Dec-2010  eeh Byebye deadlock.
 1.231 04-Aug-2010  hannken Free the on disk inodes in the reclaim routine.
 1.230 29-Jul-2010  hannken Add vm page flag PG_MARKER and use it to tag dummy marker pages
in genfs_do_putpages() and uao_put().
Use 'v_uobj.uo_npages' to check for an empty memq.
Put some assertions where these marker pages may not appear.

Ok: YAMAMOTO Takashi <yamt@netbsd.org>
 1.229 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.228 24-Jun-2010  hannken Clean up vnode lock operations:

- VOP_LOCK(vp, flags): Limit the set of allowed flags to LK_EXCLUSIVE,
LK_SHARED and LK_NOWAIT. LK_INTERLOCK is no longer allowed as it
makes no sense here.

- VOP_ISLOCKED(vp): Remove the return value LK_EXCLOTHER, which has
been unused for some time. Mark this operation as "diagnostic only".
Making a lock decision based on this operation is no longer allowed.

Discussed on tech-kern.
 1.227 29-Mar-2010  pooka Stop exposing fifofs internals and leave only fifo_vnodeop_p visible.
 1.226 07-Dec-2009  eeh branches: 1.226.2; 1.226.4;
Fix some more hangs and deadlocks.
 1.225 17-Nov-2009  eeh This should fix a deadlock.
 1.224 05-Nov-2009  pooka Include compat code by default.
 1.223 30-Oct-2009  christos compile without COMPAT_50
 1.222 29-Oct-2009  christos PR/42246: NAKAJIMA Yoshihiro: provide COMPAT_50 for LFS
 1.221 07-May-2009  elad Replace KAUTH_GENERIC_ISSUSER with a better alternative.
 1.220 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.219 16-Jan-2009  yamt branches: 1.219.2;
one more change which i forgot to commit with
UVM_PAGE_HASH_PENALTY -> UVM_PAGE_TREE_PENALTY rename.
noticed by Andreas Wrede.
 1.218 24-Jun-2008  gmcgarry branches: 1.218.4; 1.218.6; 1.218.12;
fcntl(4) says the command is type int. lfs_fcntl() comment says u_long. The implementation says int. Synchronise comment with documentation and cast to int before comparison.
 1.217 04-Jun-2008  ad branches: 1.217.2;
vm_page: put TAILQ_ENTRY into a union with LIST_ENTRY, so we can use both.
 1.216 28-Apr-2008  martin branches: 1.216.2;
Remove clause 3 and 4 from TNF licenses
 1.215 25-Jan-2008  ad branches: 1.215.6; 1.215.8; 1.215.10;
Remove VOP_LEASE. Discussed on tech-kern.
 1.214 02-Jan-2008  ad Merge vmlocking2 to head.
 1.213 26-Nov-2007  pooka branches: 1.213.2; 1.213.6;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.212 10-Oct-2007  ad branches: 1.212.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.211 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.210 29-Jul-2007  ad branches: 1.210.4; 1.210.6; 1.210.8; 1.210.10;
It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.209 27-Jul-2007  pooka Change unused fflags parameter in VOP_MMAP to prot and pass in
desired vm protection.
 1.208 10-Jul-2007  perseant branches: 1.208.2;
Move the "vp = NULL" assignment after the code that requires vp != NULL.
Reported by Chris Ross on current-users.
 1.207 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.206 24-Apr-2007  perseant Get rid of our own private copy of genfs_putpages, having adapted the real
genfs_putpages to suit our purposes.
 1.205 17-Apr-2007  perseant Fix another locking protocol error in lfs_fsync().
 1.204 17-Apr-2007  perseant Fix MP locking protocol violations introduced in my previous commit.
 1.203 17-Apr-2007  perseant Install a new sysctl, vfs.lfs.ignore_lazy_sync, which causes LFS to ignore
the "smooth" syncer, as if vfs.sync.*delay = 0, but only for LFS. The
default is "on", i.e., ignore lazy sync.

Reduce the amount of polling/busy-waiting done by lfs_putpages(). To
accomplish this, copied genfs_putpages() and modified it to indicate which
page it was that caused it to return with EDEADLK. fsync()/fdatasync()
should no longer ever fail with EAGAIN, and should not consume huge
quantities of cpu.

Also, try to make dirops less likely to be written as the result of a
VOP_PUTPAGES(), while ensuring that they are written regularly.
 1.202 05-Apr-2007  perseant correct comment for lfs_putpages
 1.201 04-Mar-2007  christos branches: 1.201.2; 1.201.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.200 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.199 20-Feb-2007  ad Call genfs_node_destroy() where appropriate.
 1.198 09-Feb-2007  ad branches: 1.198.2;
Merge newlock2 to head.
 1.197 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.196 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.195 03-Jan-2007  perseant Change VONWORKLST handling to better match its other uses; in particular,
check memq and clear VWRITEMAPDIRTY at the same time.
 1.194 09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.193 16-Nov-2006  christos branches: 1.193.2;
__unused removal on arguments; approved by core.
 1.192 20-Oct-2006  reinoud Replace the LIST structure mp->mnt_vnodelist with a TAILQ structure since all
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earliest last.

An extra benefit is the removal of the ugly hack from the Berkeley days on
LFS.

In the process, I've also replaced the various hand-written loop variations
with the TAILQ_FOREACH() macro.
 1.191 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.190 28-Sep-2006  perseant Use lockstatus instead of a homebrewed locking system to control
LFCNWRAPSTOP and LFCNWRAPGO.

Be less verbose about the various looping checks: use log() rather than
printf(), and only log anything if we are really looping ("count = 2" is
not an error condition).

Allow dirops sleeping on available space to be interruptible.
 1.189 15-Sep-2006  perseant branches: 1.189.2;
Don't remark a locked inode with IN_MODIFIED after writing it to disk,
if we ourselves hold the lock. This prevents e.g. mknod from hanging
indefinitely.

Also, always use the return value from VOP_ISLOCKED to determine whether
we hold the lock or someone else does, rather than looking into the lock
structure ourselves.
 1.188 01-Sep-2006  perseant branches: 1.188.2;
Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.187 06-Aug-2006  martin Fix size confusion with lfs_fhandle - and as it now turns out to be the same
as the lfs compat_30_fhandle, g/c the latter.
Add an alias for the LFCNIFILEFH fcntl, so that binaries compiled in the
meantime (with too large lfs_fhandle) continue to work.

This makes vfs_cleanerd work again after the kernel checks filehandle size
more strictly (problem reported by Kurt Schreiner on current-users).
 1.186 31-Jul-2006  martin Make filehandles opaque to userland
 1.185 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.184 20-Jul-2006  perseant Move the kauth checks up front, so that all new LFS fcntl calls are subject
to the check for superuser privilege.
 1.183 13-Jul-2006  martin Apply _KERNEL_OPT
 1.182 13-Jul-2006  martin Version the lfs_cleanerd internal fcntl() for filehandles too,
so old cleaners should work with newer kernels.
 1.181 13-Jul-2006  martin Fix alignment problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.180 29-Jun-2006  perseant Don't wake up the cleaner if the filesystem is unwrappable, and fix the
compatibility fcntls.

Also includes one-line fixes for an MP locking bug and a zero-length FINFO
problem that manifested during testing.
 1.179 24-Jun-2006  perseant Change LFCNWRAP{STOP,GO} to make them more suitable for snapshotting; in
particular, the caller can now choose whether to wait for the condition
to be met, and if the caller of LFCNWRAPSTOP dies or otherwise closes
the descriptor, the filesystem is started again. Updated the ckckp
regression test to use the new semantics.

dump_lfs(8) now uses the fcntls to implement LFS-style snapshotting through
the -X flag, addressing PR#33457 albeit not using fss(4). Fixed a couple
other problems with dump_lfs that manifested themselves during testing.
 1.178 18-May-2006  perseant branches: 1.178.4;
Break out the finfo array manipulation code into two new functions,
lfs_acquire_finfo() and lfs_release_finfo(). Add a debugging check
for zero-length finfo arrays in the segment summary to avoid future
regressions.
 1.177 17-May-2006  perseant Don't be quite so eager to error out from lfs_putpages() when pages are
busy; if we've sensed a possible 3-way deadlock and are not the pagedaemon,
relock and try again.
 1.176 14-May-2006  elad integrate kauth.
 1.175 12-May-2006  perseant Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.

Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.174 04-May-2006  perseant Change VOP_FCNTL to take an unlocked vnode. Approved by wrstuden@.
 1.173 04-May-2006  perseant Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.

Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.172 02-May-2006  perseant Fix a "locking against myself": lfs_flush_dirops() doesn't need to lock the
vnodes to write their blocks, since it holds the segment lock.
 1.171 01-May-2006  perseant Don't ever partially write dirops, even if we need the cleaner to run.
This increases the chances of the "no clean segments" panic slightly,
but allows us to run the ckckp regression test successfully to completion.
 1.170 30-Apr-2006  perseant Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are included as well.
 1.169 18-Apr-2006  perseant Yet another MP locking issue.
 1.168 17-Apr-2006  perseant Introduce two fcntl calls that freeze the filesystem right at the point
where segment 0 is being considered for writing. This allows for automated
checkpoint validity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.

Include a regression test that does such scanning.

When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small). Discovered by using the aforementioned
regression test.
 1.167 13-Apr-2006  perseant Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.

Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.166 11-Apr-2006  perseant Another MP locking fix.
 1.165 10-Apr-2006  perseant Don't leak vnode references if we fail to lock a vnode in lfs_flush_pchain().
Also fix another (probably only academic) simple_lock protocol error.
 1.164 08-Apr-2006  perseant Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.163 07-Apr-2006  perseant Several minor bug fixes:

* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.162 01-Apr-2006  perseant Make sure we unlock to zero when avoiding 3-way deadlock; otherwise we
simply have a different form of deadlock.
 1.161 31-Mar-2006  perseant Handle the "filesystem is clean" flag correctly when upgrading from
read-only to read-write mount. This makes "root on lfs" work for me,
although it looks like a different traceback from PR#32667.
 1.160 30-Mar-2006  yamt some cleanups after the introduction of GOP_SIZE_MEM flag.
- remove GOP_SIZE_READ/GOP_SIZE_WRITE flags.
they have not been used since the change.
- ufs_balloc_range: remove code which has been no-op since the change.
thanks Konrad Schroder for explaining the original intention of the code.
- ffs_gop_size: don't extend past eof, in the case of GOP_SIZE_MEM.
otherwise genfs_getpages ends up allocating pages past eof unnecessarily.
 1.159 28-Mar-2006  perseant Don't let the pagedaemon wait for pages, since that is just asking for
a deadlock.
 1.158 24-Mar-2006  perseant Improvements to LFS's paging mechanism, to wit:

* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.

* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.157 11-Dec-2005  christos branches: 1.157.4; 1.157.6; 1.157.8; 1.157.10; 1.157.12;
merge ktrace-lwp.
 1.156 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.155 13-Sep-2005  christos branches: 1.155.2;
split out lfs_itimes(). It is used in fsck_lfs.
 1.154 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.153 19-Aug-2005  christos 64 bit inode changes.
 1.152 29-May-2005  christos branches: 1.152.2;
- sprinkle const
- avoid shadow variables.
 1.151 20-May-2005  perseant VOP_LOCK drops the interlock; pick it up again to avoid an "already unlocked"
panic in lfs_putpages.
 1.150 27-Apr-2005  perseant Recognize that we hold the v_interlock when relocking after a flush in
lfs_putpages.
 1.149 25-Apr-2005  skrll Use the right arg structure for lfs_setattr, i.e. s/getattr/setattr/.
 1.148 23-Apr-2005  perseant Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing. Tested in both
directions, and everything appears to work happily, but ymmv.
 1.147 19-Apr-2005  perseant Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.146 18-Apr-2005  perseant Check for the inode having been previously freed, in UNMARK_VNODE().
Avoids a panic when calling mkdir() on a full filesystem.
 1.145 16-Apr-2005  perseant Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
 1.144 16-Apr-2005  perseant Use lfs_malloc() to manage the blkiov arrays that the cleaner functions use,
since the cleaner is likely to operate in a low-memory condition.
 1.143 14-Apr-2005  perseant Tabify leading whitespace
 1.142 14-Apr-2005  perseant Consolidate the hash table we use to maintain the integrity of lfs_avail
into a single, system-wide table, rather than having a separate hash table
per inode. Significantly reduces the "system" cpu usage of your average
file write.
 1.141 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.140 25-Mar-2005  perseant Don't sleep while holding the vnode interlock. Should take care of the
first panic case in PR #26043.
 1.139 24-Mar-2005  chs avoid the need for recursive locking in lfs_flush_dirops() by unlocking
the vnode around the call to it in the caller.
 1.138 23-Mar-2005  perseant Make LFS dirops get their vnode first, before incrementing the dirop count,
to prevent a deadlock trying to call VOP_PUTPAGES() on a VDIROP vnode.
This can happen when a stacked filesystem is mounted on top of an LFS: an
LFS dirop needs to get a vnode, which is available from the upper layer.
The corresponding lower layer vnode, however, is VDIROP, so the upper layer
can't be cleaned out since its VOP_PUTPAGES() is passed through to the lower
layer, which waits for dirops to drain before it can proceed. Deadlock.

Tweak ufs_makeinode() and ufs_mkdir() to pass the a_vpp argument through
to VOP_VALLOC().

Partially addresses PR # 26043, though it probably does not completely fix
the problem described there.
 1.137 08-Mar-2005  simonb branches: 1.137.2;
Tab Police.
 1.136 08-Mar-2005  perseant Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.135 26-Feb-2005  perry nuke trailing whitespace
 1.134 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
 1.133 25-Jan-2005  wrstuden Extend fsync_range(2) to support the FDISKSYNC flag, which requests
that the sync be propagated out through the disk drive caches.
 1.132 22-Apr-2004  yamt branches: 1.132.4; 1.132.6;
check_dirty: fix another PHOLD leak. ("goto top" path)
 1.131 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.130 20-Apr-2004  yamt check_dirty: plug a PHOLD leak. from Greg Oster.
 1.129 26-Feb-2004  yamt branches: 1.129.4;
lfs_putpages: fix a simple_lock mismatch.
 1.128 26-Jan-2004  hannken Fix xxx_strategy() to use the vnode arg instead of bp->b_vp.
 1.127 25-Jan-2004  hannken Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.126 16-Dec-2003  yamt - reduce code duplication.
- use boolean_t where appropriate.
 1.125 16-Dec-2003  yamt g/c lfs_no_inactive.
 1.124 25-Nov-2003  yamt use FINFOSIZE macro.
 1.123 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.122 25-Oct-2003  christos Fix uninitialized variable warnings.
 1.121 21-Oct-2003  fvdl Correct preempt() calls.
 1.120 18-Oct-2003  yamt be more strict about sa->vp.
(make sure the last lfs_updatemeta in lfs_putpages takes effect.)
 1.119 14-Oct-2003  dbj add mnt_iflag field to struct mount for internal flags
mv MNT_GONE, MNT_UNMOUNT and MNT_WANTRDWR to this field
additionally add mnt_writeopcountupper and mnt_writeopcountlower fields
in preparation for pending write suspension support work
bump kernel version to 1.6ZD
 1.118 24-Sep-2003  yamt fix a bug of lfs.

genfs_getpages() can read in more blocks than it should due to faked filesize
of lfs_gop_size(). it's a security problem and it makes gcc3 "internal error"

to fix this,
- in genfs_getpages(), always calculate diskeof and memeof separately
so that filesystems (in this case, lfs) can use different strategies
for them.
- introduce GOP_SIZE_MEM flag and use it to request in-core filesize.
(it was an intention of GOP_SIZE_READ,
but after the above change _READ is not a straightforward name)

after this, no one uses GOP_SIZE_{READ,WRITE} anymore but leave them for now.
 1.117 23-Sep-2003  yamt cleanup IN_ADIROP/VDIROP handling a little.
 1.116 23-Sep-2003  yamt remove unnecessary externs of lfs_do_flush.
 1.115 20-Sep-2003  yamt some comments
 1.114 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.113 12-Jul-2003  yamt more MP locks.
 1.112 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.111 02-Jul-2003  yamt - add new functions, lfs_writer_enter/leave, and use them instead of
duplicated code fragments.
- add an assertion.
 1.110 02-Jul-2003  yamt drain dirops before acquiring seglock. otherwise it might deadlock.
PR/20676 (Karl Knutsson)
 1.109 29-Jun-2003  fvdl branches: 1.109.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.108 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.107 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.106 07-May-2003  ragge Add a missing ifdef DDB.
 1.105 02-May-2003  perseant Correct arguments to check_dirty, ensuring that all pages in a block are
written if any of them are dirty. Pointed out by yamt.
 1.104 27-Apr-2003  yamt fix a comment.
 1.103 23-Apr-2003  perseant Make LFS work better (though still not "well") as an NFS-exported
filesystem (and other things that needed to be fixed before the tests
would complete), to wit:

* Include the fs ident in the filehandle; improve stale filehandle checks.

* Change definition of blksize() to use the on-dinode size instead of
the inode's i_size, so that fsck_lfs will work properly again.

* Use b_interlock in lfs_vtruncbuf.

* Postpone dirop reclamation until after the seglock has been released,
so that lfs_truncate is not called with the segment lock held.

* Don't loop in lfs_fsync(), just write everything and wait.

* Be more careful about the interlock/uobjlock in lfs_putpages: when we
lose this lock, we have to resynchronize dirtiness of pages in each
block.

* Be sure to always write indirect blocks and update metadata in
lfs_putpages; fixes a bug that caused blocks to be accounted to the
wrong segment.
 1.102 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.101 01-Apr-2003  yamt lfs_strategy is used only for read.
 1.100 28-Mar-2003  perseant Add a sleeper count, to prevent the cleaner from panicking the kernel
when the filesystem is unmounted, relocking the Ifile when its lock is
draining. (We can't use vfs_busy() since the process is sleeping for a
good long time.) Clean up / organize lfs.h, while I'm here.

In lfs_update_single, assert that disk addresses are either negative, or
are still positive when converted to int32_t, to prevent recurrence of a
negative/positive block problem.
 1.99 22-Mar-2003  perseant Unlock ifile inode during streamlined VOP_INACTIVE.
 1.98 21-Mar-2003  perseant KNF (space after keywords).
 1.97 21-Mar-2003  perseant Use VONWORKLST as a heuristic for vnode emptiness, rather than exhaustively
checking the memq.

Take greater care not to dirty the Ifile vnode when unmounting the filesystem.
This should fix a "(vp->v_flag & VONWORKLST) == 0" assertion panic in vgonel
that could occur when unmounting.

Do not allow the Ifile to be mapped for writing.
 1.96 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.95 08-Mar-2003  perseant Take away "#ifdef LFS_UBC".
 1.94 08-Mar-2003  perseant Add an lfs_strategy() that checks to make sure we're not trying to read
where the cleaner is trying to write, instead of tying up the "live"
buffers (or pages).

Fix a bug in the LFS_UBC case where oversized buffers would not be
checksummed correctly, causing uncleanable segments.

Make sure that wakeup(fs->lfs_iocount) is done if fs->lfs_iocount is 1
as well as 0, since we wait in some places for it to drop to 1.

Activate all pages that make it into lfs_gop_write without the segment
lock held, since they must have been dirtied very recently, even if
PG_DELWRI is not set.
 1.93 04-Mar-2003  perseant Make sure we hold the uobjlock when checking for dirty pages, in lfs_vflush.
Note that pages can become dirty without our knowing it, anyway; don't
panic if that happens.
 1.92 02-Mar-2003  perseant Account SEGUSE_ACTIVE correctly so that the automatic segment cleaning
actually happens.

Add a new fcntl call that will write the minimum necessary to checkpoint
(i.e., for on-disk directory structure to be consistent, not including
updates to file data) so that the cleaner can clean segments more quickly
without sacrificing three-way commit for cleaning.
 1.91 01-Mar-2003  yamt use pid_t for pid.
 1.90 25-Feb-2003  perseant Make fs-specific fcntl macros take three arguments (approved wrstuden).
Let LFS use fcntl for cleaner functions.
 1.89 24-Feb-2003  perseant Add lfs_ioctl vnode op, with ioctls to take over cleaner system call
functionality (not including segment clean, since that is now done
automatically as checkpoints happen).
 1.88 23-Feb-2003  perseant Fix a buffer overflow bug in the LFS_UBC case that manifested itself
either as a mysterious UVM error or as "panic: dirty bufs". Verify
maximum size in lfs_malloc.

Teach lfs_updatemeta and lfs_shellsort about oversized cluster blocks from
lfs_gop_write.

When unwiring pages in lfs_gop_write, deactivate them, under the theory
that the pagedaemon wanted to free them last we knew.
 1.87 22-Feb-2003  yamt fix simple_lock/unlock mismatches.
 1.86 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.85 19-Feb-2003  yamt wire the pages instead of just dequeue'ing them.
advised by Chuck Silvers.
 1.84 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
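
The two-checkpoint rule in the last item of revision 1.84 can be sketched as follows. This is only an illustration, not the kernel code: `struct sketch_seg`, its fields, and `sketch_checkpoint_seg` are hypothetical names, and the real segment usage table carries more state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-segment bookkeeping, for illustration only. */
struct sketch_seg {
    bool dirty;         /* segment has been written since it was last cleaned */
    uint32_t nbytes;    /* live bytes remaining in the segment */
    bool empty_before;  /* was dirty and empty at the previous checkpoint */
};

/*
 * Called once per segment at each checkpoint.
 * Returns true if the segment was summarily cleaned.
 */
bool sketch_checkpoint_seg(struct sketch_seg *s)
{
    bool dirty_and_empty = s->dirty && s->nbytes == 0;

    if (dirty_and_empty && s->empty_before) {
        /* Second checkpoint in a row: safe to mark clean. */
        s->dirty = false;
        s->empty_before = false;
        return true;
    }
    /* Remember the state for the next checkpoint. */
    s->empty_before = dirty_and_empty;
    return false;
}
```

A segment that stays dirty and empty is cleaned on the second call; any live bytes seen in between reset the count.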
 1.83 03-Feb-2003  perseant Don't call a dirop within a dirop: if lfs_rename is actually deleting
a link, call lfs_remove directly before starting dirop rather than
having ufs_rename do it.
 1.82 30-Jan-2003  yamt there's no need to treat VOP_WHITEOUT as dirop
because it modifies only one inode.
 1.81 25-Jan-2003  kleink Fix further printf format warnings for DEBUG, in the wake of daddr_t
having changed.
 1.80 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.79 08-Jan-2003  yamt for lfs_remove/lfs_rmdir, keep removed vnodes marked VDIROP.
(backout parts of rev.1.40)
otherwise, directory structures can be corrupted because checkpoints can
occur via eg. lfs_vflush before parent directory is written.
 1.78 08-Jan-2003  yamt in set_dirop/endop, use normal vref/vrele instead of lfs versions
so that we don't miss lfs_inactivate.
 1.77 08-Jan-2003  yamt add assertions.
 1.76 08-Jan-2003  yamt use lfs_unmark_vnode instead of duplicated code fragments.
 1.75 29-Dec-2002  yamt backout assertions in lfs_inactive.
they can be false when unmounting forcibly.
 1.74 28-Dec-2002  christos fix compile problem.
 1.73 28-Dec-2002  yamt avoid warnings without DIAGNOSTIC.

pointed out by Andreas Wrede.
 1.72 28-Dec-2002  yamt dirop inode can't be passed to lfs_inactivate.
 1.71 28-Dec-2002  yamt - in lfs_reserve, vref vnodes that we're locking so that cleaner doesn't
try to reclaim them.
(workaround for deadlock noted in the comment in lfs_reserveavail)
- in lfs_rename, mark vnodes which are being moved as well as directory vnodes.
 1.70 26-Dec-2002  yamt - in lfs_reserve, reserve locked buffer count as well.
- don't wait for locking buf in lfs_bwrite_ext to avoid deadlocks.
- skip lfs_reserve when we're doing dirop.
reserve more (for lfs_truncate) in set_dirop instead.

this mostly solves PR 18972. (and hopefully PR 19196)
 1.69 24-Nov-2002  yamt correct locking for lfs_rmdir. PR 18976.
 1.68 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.67 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.66 22-Sep-2002  jdolecek don't need <sys/conf.h> here
 1.65 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.64 17-May-2002  perseant branches: 1.64.2;
use macros from <sys/queue.h>
 1.63 14-May-2002  perseant branches: 1.63.2;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.62 27-Apr-2002  perseant Make exported LFSes not panic on the first file create.
 1.61 11-Feb-2002  perseant Include the space taken by inodes in the count made by lfs_check();
make VOP_SETATTR call lfs_check. This prevents large numbers of inode
changes (say, at the end of tar(1)) from filling the buffer cache.
 1.60 18-Dec-2001  chs use the new compatibility routines to allow mmap() to work
(in the same non-coherent fashion that it worked pre-UBC)
until someone has time to do it the right way.
 1.59 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.58 08-Nov-2001  lukem add RCSID
 1.57 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.56 22-Sep-2001  sommerfeld branches: 1.56.2;
Add fifo_putpages() placebo so that the vnode's uobj is unlocked.
 1.55 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.54 24-Aug-2001  chs branches: 1.54.2;
disable mmap() for LFS until it is fixed.
 1.53 17-Aug-2001  chs add getpages/putpages entries for spec vnodes.
 1.52 24-Jul-2001  assar change vop_symlink and vop_mknod to return vpp (the created node)
refed, so that the caller can actually use it. update callers and
file systems that implement these vnode operations
 1.51 13-Jul-2001  perseant Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.50 22-Jan-2001  jdolecek branches: 1.50.2; 1.50.4; 1.50.6;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.49 18-Nov-2000  toshii Make buildable again.
The previous commit was a backout of rev. 1.45, which must be an accident.
 1.48 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.47 12-Nov-2000  perseant Do not needlessly dirty segment table blocks during lfs_segwrite,
preventing needless disk activity when the filesystem is idle. (PR #10979.)
 1.46 14-Oct-2000  perseant In lfs_truncate, don't overcount the real blocks removed from the inode,
when deallocating a fragment that has not made it to disk yet.

Also, during dirops, give the directory vnode an extra reference in
SET_DIROP, to ensure its continued existence during SET_ENDOP, preventing
a possible NULL-dereference there.

These two changes should close PR #11064.
 1.45 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.
 1.44 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments through N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this is unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
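
The lfs_ravail reservation described in the kernel notes of revision 1.44 might look roughly like this. It is a minimal sketch under stated assumptions: `struct sketch_fs` and both functions are hypothetical, and where the kernel unlocks the inode and sleeps until the cleaner frees space, the sketch simply returns false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-filesystem counters. */
struct sketch_fs {
    int64_t avail;   /* blocks available for new writes */
    int64_t ravail;  /* blocks reserved by writes still in progress */
};

/* Reserve the maximum a write might need before letting it proceed. */
bool sketch_reserve(struct sketch_fs *fs, int64_t nblocks)
{
    if (fs->avail - fs->ravail < nblocks)
        return false;  /* the kernel would sleep here instead */
    fs->ravail += nblocks;
    return true;
}

/* After allocation: drop the reservation, charge only the blocks used. */
void sketch_release(struct sketch_fs *fs, int64_t reserved, int64_t used)
{
    fs->ravail -= reserved;
    fs->avail -= used;
}
```

Because a write reserves its worst case up front, concurrent writers cannot jointly overcommit the space the cleaner still needs.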
 1.43 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero at fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.42 01-Jul-2000  perseant Move SET_ENDOP after vrele to avoid deactivating vnode twice, if
SET_ENDOP triggers a write.
 1.41 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.40 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.39 22-Jun-2000  perseant Update lfs_vunref for the fact that now a vnode can be locked with no
references (locked for VOP_INACTIVE at the end of vrele) and it's okay.
Check the return value of lfs_vref where appropriate.
Fixes PR #s 10285 and 10352.
 1.38 31-May-2000  perseant branches: 1.38.2;
update for IN_ACCESSED changes
 1.37 27-May-2000  perseant branches: 1.37.2;
Prevent dirops from getting around lfs_check and wedging the buffer cache.
All the dirop vnops now mark the inodes with a new flag, IN_ADIROP, which
is removed as soon as the dirop is done (as opposed to VDIROP which stays
until the file is written). To address one issue raised in PR#9357.
 1.36 13-May-2000  perseant Change the semantics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be waited for, or not, accordingly.

Closes PR#8996.
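
The flag-based semantics of revision 1.36 can be illustrated as below. The flag names come from the log entry; the bit values, the `fs_sync_dirops` policy parameter, and the function name are assumptions made for this sketch only.

```c
#include <stdbool.h>

/* Flag names are from the log entry; the bit values are assumed. */
#define UPDATE_WAIT  0x01
#define UPDATE_DIROP 0x02

/*
 * Hypothetical decision helper for a filesystem's update routine.
 * fs_sync_dirops stands for "this filesystem wants directory
 * operations written synchronously".
 */
bool sketch_update_should_wait(int flags, bool fs_sync_dirops)
{
    if (flags & UPDATE_WAIT)
        return true;               /* caller explicitly asked to block */
    if ((flags & UPDATE_DIROP) && fs_sync_dirops)
        return true;               /* the hint is honoured by this fs */
    return false;                  /* e.g. LFS or FFS+softdep: do not wait */
}
```

With a boolean argument the caller decided whether to block; with the hint, each filesystem applies its own policy to dirop updates.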
 1.35 30-Mar-2000  augustss Remove register declarations.
 1.34 15-Dec-1999  perseant Fix error returns on lfs vnops so that locks and reference counts are
preserved. Handle dirop accounting in lfs_vfree for this case as well.
May address PR#8823.
 1.33 03-Dec-1999  perseant Handle the case of a vnode flush while dirops are active correctly in
lfs_segwrite. Also, make sure a flush is called in SET_DIROP before sleeping
on its results. Addresses PR #8863.
 1.32 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.31 06-Nov-1999  perseant branches: 1.31.2;
Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.30 05-Nov-1999  perseant Better fix for PR# 8577: before setting dirops, check for cross-device
rename and error out. This avoids possible problems with attempting
rename between two LFSs.
 1.29 01-Nov-1999  perseant Check that the destination vnode is on an LFS before trying to twiddle
its superblock. Fixes PR#8577.
 1.28 03-Sep-1999  perseant branches: 1.28.2; 1.28.4; 1.28.6;
Make changes that will allow an LFS filesystem to be used as the root
filesystem. In particular,

- Fix mknod deadlock, described in PR 8172.
- Enable lfs_mountroot.
- Make lfs_writevnodes treat filesystems mounted on lfs device nodes properly,
by flushing that device rather than trying to add blocks to the device inode.

This, in combination with lfs boot blocks, will allow operation of an all-lfs
system.
 1.27 03-Aug-1999  wrstuden Add support for fcntl(2) to generate VOP_FCNTL calls. Any fcntl
call with F_FSCTL set and F_SETFL calls generate calls to a new
fileop fo_fcntl. Add genfs_fcntl() and soo_fcntl() which return 0
for F_SETFL and EOPNOTSUPP otherwise. Have all leaf filesystems
use genfs_fcntl().

Reviewed by: thorpej
Tested by: wrstuden
 1.26 12-Apr-1999  perseant Disallow threshold-initiated cache flush when dirops are active. Also, make
SET_ENDOP use lfs_check instead of inlining most of it.
 1.25 29-Mar-1999  perseant branches: 1.25.2;
Fix the other missing dirop wakeup
 1.24 25-Mar-1999  perseant Since dirop vnodes can't be flushed, they hold a reference until their
dirop is completely written to disk. This means that ordinary calls to
ufs vnops which would ordinarily call VOP_INACTIVE through vrele/vput,
don't. This patch detects that condition after such vnops have been
run, and calls VOP_INACTIVE if it would ordinarily have been called by
the ufs call.
 1.23 25-Mar-1999  perseant clean up unused/required #ifdefs
 1.22 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd says they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.21 05-Mar-1999  mycroft Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.20 06-Nov-1998  cgd argument to dbtob needs to be cast to u_quad_t here to avoid shift lossage
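
The "shift lossage" fixed in revision 1.20 is the usual narrow-shift overflow. A minimal sketch, assuming 512-byte device blocks (a shift of 9) and using hypothetical function names rather than the real dbtob() macro:

```c
#include <stdint.h>

/* Assumed for illustration: 512-byte device blocks, so the shift is 9. */
#define SKETCH_DEV_BSHIFT 9

/* Shift performed in 32 bits: the high bits of the byte count are lost. */
uint32_t sketch_dbtob_narrow(uint32_t blocks)
{
    return blocks << SKETCH_DEV_BSHIFT;
}

/* Widen to 64 bits first, then shift: the full byte count survives. */
uint64_t sketch_dbtob_wide(uint32_t blocks)
{
    return (uint64_t)blocks << SKETCH_DEV_BSHIFT;
}
```

For 0x800000 blocks (4 GB), the narrow version wraps to 0 while the widened version yields 4294967296.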
 1.19 01-Sep-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for LFS inodes.
 1.18 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.17 22-Jun-1998  sommerfe defopt for options FIFO
 1.16 05-Jun-1998  kleink Convert fsync vnode operator implementations and usage from the old `waitfor'
argument and MNT_WAIT/MNT_NOWAIT to `flags' and FSYNC_WAIT.
 1.15 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.14 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added an "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now an union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the time of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.13 07-Sep-1996  mycroft Implement poll(2).
 1.12 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.11 11-May-1996  mycroft Change VOP_UPDATE() semantics:
* Make 2nd and 3rd args timespecs, not timevals.
* Consistently pass a Boolean as the 4th arg (except in LFS).
Also, fix ffs_update() and lfs_update() to actually change the nsec fields.
 1.10 09-Feb-1996  christos lfs prototypes
 1.9 09-Feb-1996  mycroft Fix vop_link, vop_symlink, and vop_remove semantics in several ways:
* Change the argument names to vop_link so they actually make sense.
* Implement vop_link and vop_symlink for all file systems, so they do proper
cleanup.
* Require the file system to decide whether or not linking and unlinking of
directories is allowed, and disable it for all current file systems.
 1.8 01-Feb-1996  jtc Rename struct timespec fields to conform to POSIX.1b
 1.7 15-Jun-1995  cgd compensate for timeval/timespec/stat structure changes.
 1.6 14-Dec-1994  mycroft Sync with CSRG.
 1.5 13-Dec-1994  mycroft Not ready for part of the previous change yet...
 1.4 13-Dec-1994  mycroft Turn lease_check() into a vnode op, per CSRG.
 1.3 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.25.2.6 15-Jan-2000  he Pull up revision 1.34 (requested by perseant):
Fix error returns on lfs vnops so that locks and reference counts
are preserved. Handle dirop accounting in lfs_vfree for this
case as well. Addresses PR#8823.
 1.25.2.5 15-Jan-2000  he Pull up revision 1.28 (requested by perseant):
Address problems related to using an LFS filesystem as the root
filesystem, including mknod hangs. Fixes PR#8172 and PR#9072.
 1.25.2.4 18-Dec-1999  he Pull up revision 1.33 (requested by perseant):
Handle the case of a vnode flush while dirops are active correctly
in lfs_segwrite. Also, make sure a flush is called in SET_DIROP
before sleeping on its results. Addresses PR#8863.
 1.25.2.3 17-Dec-1999  he Pull up revision 1.31 (requested by perseant):
Address locking protocol error for inode hash, and make the
maximum number of active dirops a global quantity.
 1.25.2.2 08-Nov-1999  cgd pull up revs 1.29-1.30 from trunk (requested by perseant):
Check for cross-device rename before setting up dirop markers in
lfs_rename. Addresses PR#8577.
 1.25.2.1 13-Apr-1999  perseant branches: 1.25.2.1.2;
Pull-up of changes made to the trunk on Sunday [1.25->1.26], to wit:

Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.

Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).

Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed, then
immediately reloaded, their dinodes were located in an inode block which
was not on disk at the advertized location, nor in the cache (although it
would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.

Better checking for held inode locks in lfs_fastvget, for a number of
error conditions. Also change the default setting of lfs_clean_vnhead to
0, which seems to make the locking problems go away (although this is
difficult to test as I can't reliably reproduce them).

Make sure that the wakeup occurs for vnodes that lfs_update might be
sleeping on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which
have dirty buffers), by marking them with the appropriate flag if
dirtybuffers were added while the write was in progress.

Fix block counting during file truncation, if not truncating to zero.

Disallow threshold-initiated cache flush when dirops are active. Also,
make SET_ENDOP use lfs_check instead of inlining most of it.

Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).

Check the superblock version field, and refuse to mount the filesystem if
the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and
boundaries, etc., which would not affect existing fields in the
superblock, but which would drastically affect the filesystem, to be
smoothly integrated at a later date.
 1.25.2.1.2.5 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
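
A minimal sketch of the two negative disk addresses mentioned in the last item, assuming a signed 64-bit address type. All names here are hypothetical; only the values -1 and -2 and the change from an equality test to (foo < 0) come from the log entry.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t sketch_daddr_t;	/* stand-in for the kernel's daddr_t */

/* The older sentinel that callers used to compare against with ==. */
#define SKETCH_NO_ADDR   ((sketch_daddr_t)-1)
/* "Unwritten": space is accounted for, but no disk location is assigned. */
#define SKETCH_UNWRITTEN ((sketch_daddr_t)-2)

/* Replaces the old (d != -1) test: any negative value has no location. */
static bool
sketch_has_disk_location(sketch_daddr_t d)
{
	return d >= 0;
}

/* True when the block's space has been counted, on disk or not. */
static bool
sketch_is_accounted(sketch_daddr_t d)
{
	return d >= 0 || d == SKETCH_UNWRITTEN;
}
```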
 1.25.2.1.2.4 11-Jul-1999  chs add placeholders for getpages/putpages.
 1.25.2.1.2.3 02-Jul-1999  thorpej Take two at making a non-converted LFS work in a UBC kernel.
 1.25.2.1.2.2 21-Jun-1999  thorpej Pull in ffs_extern.h to get ffs_balloc_range() prototype for
ufs_readwrite.c
 1.25.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.28.6.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.28.6.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.28.4.2 15-Nov-1999  fvdl Sync with -current
 1.28.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.28.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.28.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.28.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.31.2.2 06-Nov-1999  perseant Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.31.2.1 06-Nov-1999  perseant file lfs_vnops.c was added on branch comdex-fall-1999 on 1999-11-06 20:33:07 +0000
 1.37.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.38.2.6 03-Feb-2001  he Pull up revisions 1.47-1.49 (requested by perseant):
o Don't write anything if the filesystem is idle (PR#10979).
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
 1.38.2.5 14-Dec-2000  he Pull up revision 1.45 (requested by fvdl):
Improve NFS performance, possibly by as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.38.2.4 01-Nov-2000  tv Fix pullup of 1.46 [perseant, toshii]:
In lfs_truncate, don't overcount the real blocks removed from the inode,
when deallocating a fragment that has not made it to disk yet.

Also, during dirops, give the directory vnode an extra reference in
SET_DIROP, to ensure its continued existence during SET_ENDOP, preventing
a possible NULL-dereference there.

These two changes should close PR #11064.
 1.38.2.3 01-Nov-2000  tv Pullup 1.46 [perseant, toshii]:
In lfs_truncate, don't overcount the real blocks removed from the inode,
when deallocating a fragment that has not made it to disk yet.

Also, during dirops, give the directory vnode an extra reference in
SET_DIROP, to ensure its continued existence during SET_ENDOP, preventing
a possible NULL-dereference there.

These two changes should close PR #11064.
 1.38.2.2 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.38.2.1 22-Jun-2000  perseant Pull up lfs_vunref fix from the trunk.
 1.50.6.9 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.50.6.8 26-Sep-2002  jdolecek hook in genfs_kqfilter(), kevents seem to work fine
 1.50.6.7 23-Sep-2002  jdolecek add spec kqfilter vnode op
 1.50.6.6 22-Sep-2002  jdolecek add fifo_kqfilter() to fifo ops, to switch on support for kevents
 1.50.6.5 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.50.6.4 16-Mar-2002  jdolecek Catch up with -current.
 1.50.6.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.50.6.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.50.6.1 03-Aug-2001  lukem update to -current
 1.50.4.2 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.50.4.1 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.50.2.15 08-Jan-2003  thorpej Sync with HEAD.
 1.50.2.14 08-Jan-2003  thorpej Sync with HEAD.
 1.50.2.13 29-Dec-2002  thorpej Sync with HEAD.
 1.50.2.12 11-Dec-2002  thorpej Sync with HEAD.
 1.50.2.11 11-Nov-2002  nathanw Catch up to -current
 1.50.2.10 18-Oct-2002  nathanw Catch up to -current.
 1.50.2.9 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
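
The NULL-safe macro described above can be illustrated in isolation. This is a sketch with hypothetical sketch_ names standing in for the kernel's struct proc, struct lwp, curlwp and curproc; it is not the contents of proc.h.

```c
#include <stddef.h>

struct sketch_proc { int pid; };
struct sketch_lwp  { struct sketch_proc *l_proc; };

/* Stand-in for curlwp; NULL models "no current LWP". */
static struct sketch_lwp *sketch_curlwp;

/* Safe to reference even when there is no current LWP. */
#define sketch_curproc \
	((sketch_curlwp) ? (sketch_curlwp)->l_proc : NULL)
```

Referencing the macro never faults, but a caller that dereferences the result must still check it for NULL first.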
 1.50.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.50.2.7 28-Feb-2002  nathanw Catch up to -current.
 1.50.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.50.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.50.2.4 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.50.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.50.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.50.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.54.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.56.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.63.2.2 20-Jun-2002  gehenna catch up with -current.
 1.63.2.1 30-May-2002  gehenna Catch up with -current.
 1.64.2.1 20-Jun-2002  lukem Pull up revision 1.65 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.109.2.11 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.109.2.10 01-Apr-2005  skrll Sync with HEAD.
 1.109.2.9 08-Mar-2005  skrll Sync with HEAD.
 1.109.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.109.2.7 04-Feb-2005  skrll Sync with HEAD.
 1.109.2.6 30-Oct-2004  skrll Reduced diff to HEAD by restoring the struct proc * argument to lfs_bmapv
 1.109.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.109.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.109.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.109.2.2 03-Aug-2004  skrll Sync with HEAD
 1.109.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review; I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.129.4.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
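
The bfree reporting rule in the list above (never below zero, never above lfs_dsize) amounts to a clamp. A minimal sketch, assuming signed 64-bit block counts and a hypothetical helper name; the real lfs_statfs code may compute this differently:

```c
#include <stdint.h>

/*
 * Clamp the free-block count reported to userland into [0, dsize].
 * bfree is signed because, per the log, it can drop below zero.
 */
static int64_t
sketch_reported_bfree(int64_t bfree, int64_t dsize)
{
	if (bfree < 0)
		return 0;
	if (bfree > dsize)
		return dsize;
	return bfree;
}
```

Without the lower bound, a small negative count converted to an unsigned field would show up as an enormous amount of free space.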

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.132.6.3 26-Mar-2005  yamt sync with head.
 1.132.6.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.132.6.1 12-Feb-2005  yamt sync with head.
 1.132.4.1 29-Apr-2005  kent sync with -current
 1.137.2.25 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.137.2.24 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.177
Don't be quite so eager to error out from lfs_putpages() when pages are
busy; if we've sensed a possible 3-way deadlock and are not the pagedaemon,
relock and try again.
 1.137.2.23 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.93
sys/ufs/lfs/lfs.h: revision 1.106
sys/ufs/lfs/lfs_vfsops.c: revision 1.209
sys/ufs/lfs/lfs_vnops.c: revision 1.175
sys/ufs/lfs/lfs_segment.c: revision 1.178
Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.
Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.137.2.22 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.92
sys/ufs/lfs/lfs.h: revision 1.105
sys/ufs/lfs/lfs_vfsops.c: revision 1.207
sys/ufs/lfs/lfs_subr.c: revision 1.59
sys/ufs/lfs/lfs_vnops.c: revision 1.173
sys/ufs/lfs/lfs_bio.c: revision 1.92
Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.
Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
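
For reference, the default lfs_resvseg value quoted above works out as below. The helper name is hypothetical, and the sanity checking the entry mentions is not specified there, so it is left out of this sketch.

```c
/* Default reserved-segment count for existing filesystems, per the log. */
static unsigned
sketch_default_resvseg(unsigned minfreeseg)
{
	return minfreeseg / 2 + 1;	/* integer division rounds down */
}
```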
 1.137.2.21 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.172
Fix a "locking against myself": lfs_flush_dirops() doesn't need to lock the
vnodes to write their blocks, since it holds the segment lock.
 1.137.2.20 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.171
sys/ufs/lfs/lfs_extern.h: revision 1.81
sys/ufs/lfs/lfs_segment.c: revision 1.177
Don't ever partially write dirops, even if we need the cleaner to run.
This increases the chances of the "no clean segments" panic slightly,
but allows us to run the ckckp regression test successfully to completion.
 1.137.2.19 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.104
sys/ufs/lfs/lfs_vfsops.c: revision 1.206
sys/ufs/lfs/lfs_vnops.c: revision 1.170
sys/ufs/lfs/lfs_extern.h: revision 1.80
sys/ufs/lfs/lfs_segment.c: revision 1.176
sys/ufs/lfs/lfs_inode.c: revision 1.103 via patch
sys/ufs/lfs/lfs_alloc.c: revision 1.90
Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().
A couple of locking fixes are also included.
 1.137.2.18 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.169
Yet another MP locking issue.
 1.137.2.17 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.103
sys/ufs/lfs/lfs_segment.c: revision 1.174
sys/ufs/lfs/lfs_vnops.c: revision 1.168
Introduce two fcntl calls that freeze the filesystem right at the point
where segment 0 is being considered for writing. This allows for automated
checkpoint validity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.
Include a regression test that does such scanning.
When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small). Discovered by using the aforementioned
regression test.
 1.137.2.16 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.102
sys/ufs/lfs/lfs_segment.c: revision 1.173
sys/ufs/lfs/lfs_vnops.c: revision 1.167 via patch
sys/ufs/lfs/lfs_bio.c: revision 1.91
Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.
Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.137.2.15 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.166
Another MP locking fix.
 1.137.2.14 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.165
Don't leak vnode references if we fail to lock a vnode in lfs_flush_pchain().
Also fix another (probably only academic) simple_lock protocol error.
 1.137.2.13 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.200
sys/ufs/lfs/lfs_vnops.c: revision 1.164
sys/ufs/lfs/lfs_inode.c: revision 1.101
sys/ufs/lfs/lfs_extern.h: revision 1.78
sys/ufs/lfs/lfs.h: revision 1.100
Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.137.2.12 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_balloc.c: revision 1.60
sys/ufs/lfs/lfs_syscalls.c: revision 1.111
sys/ufs/lfs/lfs_segment.c: revision 1.172
sys/ufs/lfs/lfs_vnops.c: revision 1.163
Several minor bug fixes:
* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.137.2.11 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.162
Make sure we unlock to zero when avoiding 3-way deadlock; otherwise we
simply have a different form of deadlock.
 1.137.2.10 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.198
sys/ufs/lfs/lfs_vnops.c: revision 1.161
Handle the "filesystem is clean" flag correctly when upgrading from
read-only to read-write mount. This makes "root on lfs" work for me,
although it looks like a different traceback from PR#32667.
 1.137.2.9 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.159
Don't let the pagedaemon wait for pages, since that is just asking for
a deadlock.
 1.137.2.8 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.158
sys/ufs/lfs/lfs_subr.c: revision 1.57
sys/ufs/lfs/lfs_segment.c: revision 1.171
sys/ufs/lfs/lfs.h: revision 1.97
sys/ufs/lfs/lfs_vfsops.c: revision 1.195
sys/ufs/lfs/lfs_extern.h: revision 1.76
Improvements to LFS's paging mechanism, to wit:
* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.
* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.137.2.7 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.153
sys/ufs/lfs/lfs_debug.c: revision 1.32
sys/ufs/lfs/lfs_alloc.c: revision 1.84
sys/ufs/lfs/lfs_vfsops.c: revision 1.185
sys/ufs/lfs/lfs_segment.c: revision 1.165
64 bit inode changes.
 1.137.2.6 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.137.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.151
VOP_LOCK drops the interlock; pick it up again to avoid an "already unlocked"
panic in lfs_putpages.
 1.137.2.4 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.137.2.3 30-Mar-2005  tron Pull up revision 1.140 (requested by perseant in ticket #74):
Don't sleep while holding the vnode interlock. Should take care of the
first panic case in PR #26043.
 1.137.2.2 30-Mar-2005  tron Pull up revision 1.139 (requested by perseant in ticket #74):
avoid the need for recursive locking in lfs_flush_dirops() by unlocking
the vnode around the call to this in the caller.
 1.137.2.1 30-Mar-2005  tron Pull up revision 1.138 (requested by perseant in ticket #74):
Make LFS dirops get their vnode first, before incrementing the dirop
count, to prevent a deadlock trying to call VOP_PUTPAGES() on a VDIROP
vnode. This can happen when a stacked filesystem is mounted on top of an
LFS: an LFS dirop needs to get a vnode, which is available from the upper
layer. The corresponding lower layer vnode, however, is VDIROP, so the
upper layer can't be cleaned out since its VOP_PUTPAGES() is passed
through to the lower layer, which waits for dirops to drain before it can
proceed. Deadlock.
Tweak ufs_makeinode() and ufs_mkdir() to pass the a_vpp argument through
to VOP_VALLOC().
Partially addresses PR #26043, though it probably does not completely fix
the problem described there.
 1.152.2.8 04-Feb-2008  yamt sync with head.
 1.152.2.7 21-Jan-2008  yamt sync with head
 1.152.2.6 07-Dec-2007  yamt sync with head
 1.152.2.5 27-Oct-2007  yamt sync with head.
 1.152.2.4 03-Sep-2007  yamt sync with head.
 1.152.2.3 26-Feb-2007  yamt sync with head.
 1.152.2.2 30-Dec-2006  yamt sync with head.
 1.152.2.1 21-Jun-2006  yamt sync with head.
 1.155.2.2 29-Oct-2005  yamt use lfs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.155.2.1 20-Oct-2005  yamt adapt ufs.
 1.157.12.3 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.157.12.2 31-Mar-2006  tron Merge 2006-03-31 NetBSD-current into the "peter-altq" branch.
 1.157.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.157.10.5 11-May-2006  elad sync with head
 1.157.10.4 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.157.10.3 19-Apr-2006  elad sync with head.
 1.157.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.157.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.157.8.6 03-Sep-2006  yamt sync with head.
 1.157.8.5 11-Aug-2006  yamt sync with head
 1.157.8.4 26-Jun-2006  yamt sync with head.
 1.157.8.3 24-May-2006  yamt sync with head.
 1.157.8.2 11-Apr-2006  yamt sync with head
 1.157.8.1 01-Apr-2006  yamt sync with head.
 1.157.6.2 01-Jun-2006  kardel Sync with head.
 1.157.6.1 22-Apr-2006  simonb Sync with head.
 1.157.4.1 09-Sep-2006  rpaulo sync with head
 1.178.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.188.2.4 01-Feb-2007  ad Sync with head.
 1.188.2.3 30-Jan-2007  ad Remove support for SA. Ok core@.
 1.188.2.2 12-Jan-2007  ad Sync with head.
 1.188.2.1 18-Nov-2006  ad Sync with head.
 1.189.2.2 10-Dec-2006  yamt sync with head.
 1.189.2.1 22-Oct-2006  yamt sync with head
 1.193.2.3 25-Nov-2007  xtraeme Pull up following revision(s) (requested by christos in ticket #994):
sys/ufs/lfs/lfs_vnops.c: revision 1.208 (patch)
Move the "vp = NULL" assignment after the code that requires vp != NULL.
Reported by Chris Ross on current-users.
 1.193.2.2 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.193.2.1 17-Feb-2007  tron branches: 1.193.2.1.2;
Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.193.2.1.2.2 06-Jan-2008  wrstuden Catch up to netbsd-4.0 release.
 1.193.2.1.2.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.198.2.4 07-May-2007  yamt sync with head.
 1.198.2.3 15-Apr-2007  yamt sync with head.
 1.198.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.198.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.201.4.1 11-Jul-2007  mjf Sync with head.
 1.201.2.12 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress testing.
 1.201.2.11 20-Aug-2007  ad Sync with HEAD.
 1.201.2.10 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.201.2.9 15-Jul-2007  ad Sync with head.
 1.201.2.8 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.201.2.7 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.201.2.6 08-Jun-2007  ad Sync with head.
 1.201.2.5 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.201.2.4 10-Apr-2007  ad Sync with head.
 1.201.2.3 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.201.2.2 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.201.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.208.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.210.10.2 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.210.10.1 29-Jul-2007  ad file lfs_vnops.c was added on branch matt-mips64 on 2007-07-29 13:31:16 +0000
 1.210.8.1 14-Oct-2007  yamt sync with head.
 1.210.6.3 23-Mar-2008  matt sync with HEAD
 1.210.6.2 09-Jan-2008  matt sync with HEAD
 1.210.6.1 06-Nov-2007  matt sync with HEAD
 1.210.4.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.210.4.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.212.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.212.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.213.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.213.2.4 19-Dec-2007  ad Use a global lfs_lock.
 1.213.2.3 19-Dec-2007  ad Fix some more problems w/lfs on this branch.
 1.213.2.2 19-Dec-2007  ad Get lfs mostly working.
 1.213.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.215.10.5 11-Aug-2010  yamt sync with head.
 1.215.10.4 11-Mar-2010  yamt sync with head
 1.215.10.3 16-May-2009  yamt sync with head
 1.215.10.2 04-May-2009  yamt sync with head.
 1.215.10.1 16-May-2008  yamt sync with head.
 1.215.8.2 17-Jun-2008  yamt sync with head.
 1.215.8.1 18-May-2008  yamt sync with head.
 1.215.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.215.6.3 29-Jun-2008  mjf Sync with HEAD.
 1.215.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.215.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.216.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.216.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.217.2.1 27-Jun-2008  simonb Sync with head.
 1.218.12.1 29-Feb-2012  matt Deal with UVM_PAGE_OWN changes.
 1.218.6.1 19-May-2012  riz Apply patch (requested by buhrow in ticket #1759):


sys/ufs/lfs/lfs_vnops.c patch
sys/ufs/ufs/inode.h patch
sys/ufs/ufs/ufs_extern.h patch
sys/ufs/ufs/ufs_lookup.c patch
sys/ufs/ufs/ufs_vnops.c patch
sys/ufs/ufs/ufs_wapbl.c patch

Port dholland's ufs_rename locking changes to netbsd-5.
[buhrow, ticket #1759]

Hello. More testing has revealed a minor misunderstanding between the
vnode API in -current and 5.x. The below patch, against NetBSD-5.1
sources, rolls all the accumulated patches into one patch set. With this
patch, I believe you can now run with WAPBL, softdep or traditional ufs
semantics with heavy file loads and avoid panics due to resource exhaustion
and/or tstile deadlocks. Testing has been done on I386, both uniprocessor
and multiprocessor, and on Sparc machines in uniprocessor mode, though I
think multiprocessor Sparc would be fine as well. Since these changes are
machine independent, I don't anticipate any issues on any platform. It is
my hope that modulo any final issues that come up in the final round of
testing I'm currently performing, these patches will be ready to be pulled
up into the NetBSD-5 branch.
Finally, I'd like to thank mouse@ and hannken@ for their help and
patience in helping me track down and test the final versions of these
patches. With their assistance, I'm confident these patches make NetBSD-5
a much more stable and robust operating environment in a variety of
settings.
 1.218.4.2 03-Mar-2009  skrll Sync with HEAD.
 1.218.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.219.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.226.4.6 21-May-2011  rmind Fix the build.
 1.226.4.5 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.226.4.4 05-Mar-2011  rmind sync with head
 1.226.4.3 03-Jul-2010  rmind sync with head
 1.226.4.2 30-May-2010  rmind sync with head
 1.226.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.226.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.226.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.234.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.238.6.3 02-Jun-2012  mrg sync to latest -current.
 1.238.6.2 05-Apr-2012  mrg sync to latest -current.
 1.238.6.1 18-Feb-2012  mrg merge to -current.
 1.238.2.6 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.238.2.5 23-Jan-2013  yamt sync with head
 1.238.2.4 23-May-2012  yamt sync with head.
 1.238.2.3 17-Apr-2012  yamt sync with head
 1.238.2.2 06-Nov-2011  yamt remove pg->listq and uobj->memq
 1.238.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.239.2.2 27-Aug-2016  bouyer Pull up following revision(s) (requested by dholland in ticket #1389):
sys/ufs/lfs/lfs_vnops.c: revision 1.304
Fix a deadlock
ok dholland@
 1.239.2.1 17-Mar-2012  bouyer Pull up following revision(s) (requested by perseant in ticket #116):
sys/ufs/lfs/lfs_alloc.c: revision 1.112
tests/fs/vfs/t_rmdirrace.c: revision 1.9
tests/fs/vfs/t_renamerace.c: revision 1.25
sys/ufs/lfs/lfs_vnops.c: revision 1.240
sys/ufs/lfs/lfs_segment.c: revision 1.224
sys/ufs/lfs/lfs_bio.c: revision 1.122
sys/ufs/lfs/lfs_vfsops.c: revision 1.294
sbin/newfs_lfs/make_lfs.c: revision 1.19
sys/ufs/lfs/lfs.h: revision 1.136
Pass t_renamerace and t_rmdirrace tests.
Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.
Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.
Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace creates a lot of inode blocks on a tiny empty disk).
 1.242.2.4 03-Dec-2017  jdolecek update from HEAD
 1.242.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.242.2.2 23-Jun-2013  tls resync from head
 1.242.2.1 25-Feb-2013  tls resync with head
 1.248.4.1 23-Jul-2013  riastradh sync with HEAD
 1.248.2.2 18-May-2014  rmind sync with head
 1.248.2.1 28-Aug-2013  rmind sync with head
 1.262.2.1 10-Aug-2014  tls Rebase.
 1.269.4.6 28-Aug-2017  skrll Sync with HEAD
 1.269.4.5 05-Oct-2016  skrll Sync with HEAD
 1.269.4.4 09-Jul-2016  skrll Sync with HEAD
 1.269.4.3 22-Sep-2015  skrll Sync with HEAD
 1.269.4.2 06-Jun-2015  skrll Sync with HEAD
 1.269.4.1 06-Apr-2015  skrll Sync with HEAD
 1.269.2.2 14-Jul-2016  martin Pull up following revision(s) (requested by dholland in ticket #1205):
sys/ufs/lfs/lfs_vnops.c: revision 1.304
Fix a deadlock
ok dholland@
 1.269.2.1 06-Aug-2015  snj Apply patch (requested by dholland in ticket #935):
Comment out some KASSERTs.
 1.304.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.304.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.304.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.315.2.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some questions about the code in an XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.321.4.4 21-Apr-2020  martin Sync with HEAD
 1.321.4.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.321.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.321.4.1 10-Jun-2019  christos Sync with HEAD
 1.321.2.2 18-Jan-2019  pgoyette Synch with HEAD
 1.321.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.324.2.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
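
Resolution (b) above can be pictured with a small userland sketch. This is
only an illustration under stated assumptions: struct ex_fs,
ex_writer_tryenter() and ex_writer_leave() are hypothetical names, a pthread
mutex stands in for lfs_lock, and none of it is the kernel code itself. The
point it shows is that the dirops == 0 test and the writer increment happen
inside one critical section, so the state cannot change between them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical miniature of the state in the log; not the kernel's struct lfs. */
struct ex_fs {
	pthread_mutex_t lock;	/* plays the role of lfs_lock */
	int dirops;		/* directory operations in flight */
	int writer;		/* writers currently excluding dirops */
};

/* Test dirops == 0 and take the writer count in one critical section. */
static bool
ex_writer_tryenter(struct ex_fs *fs)
{
	bool entered = false;

	pthread_mutex_lock(&fs->lock);
	if (fs->dirops == 0) {
		fs->writer++;
		entered = true;
	}
	pthread_mutex_unlock(&fs->lock);
	return entered;
}

/* Drop the writer count taken by a successful tryenter. */
static void
ex_writer_leave(struct ex_fs *fs)
{
	pthread_mutex_lock(&fs->lock);
	assert(fs->writer > 0);
	fs->writer--;
	pthread_mutex_unlock(&fs->lock);
}
```

A caller in this sketch would skip the opportunistic work whenever
ex_writer_tryenter() returns false, rather than waiting for dirops to drain.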

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.325.2.1 29-Feb-2020  ad Sync with head.
 1.331.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.336.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.9 30-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.8 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT

Replace some #if DEBUG by this too. DEBUG is only for expensive
assertions; these are not.
 1.7 01-Sep-2015  dholland branches: 1.7.2; 1.7.4;
Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.6 02-Aug-2015  dholland Pass the fs object to LFS_MAX_DADDR so it can check lfs_is64.

Remove some hackish intentional 64->32 truncations next to the checks
using LFS_MAX_DADDR, and tackle the problem they handled in bmap
instead.

The problem: the magic block pointer value UNWRITTEN has magic value
-2, and if it's not handled specifically, uint32 -> uint64 promotion
turns it into 4294967294, which then causes consternation and
monkeyhouse downstream.

What's here is still kind of a hack, but it's a step forward.
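
The promotion problem described above can be reproduced in a few lines of
userland C. This is a sketch under assumptions, not the bmap change itself:
EX_UNWRITTEN, ex_widen_naive() and ex_widen_checked() are hypothetical names
chosen for illustration. It shows that zero-extending the 32-bit pattern of
-2 yields 4294967294, and that checking for the marker before widening keeps
it at -2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the UNWRITTEN marker (-2) named in the log. */
#define EX_UNWRITTEN ((int64_t)-2)

/* Naive widening: zero-extends, so the 32-bit marker becomes 4294967294. */
static int64_t
ex_widen_naive(uint32_t daddr32)
{
	return (int64_t)(uint64_t)daddr32;
}

/* Widening that checks for the marker's 32-bit pattern before promoting. */
static int64_t
ex_widen_checked(uint32_t daddr32)
{
	if (daddr32 == (uint32_t)EX_UNWRITTEN)
		return EX_UNWRITTEN;
	return (int64_t)daddr32;
}
```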
 1.5 28-Jul-2013  dholland branches: 1.5.4; 1.5.8;
Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.4 06-Jun-2013  dholland branches: 1.4.2; 1.4.4;
Remove stray references to ext2fs, chfs, ffs, and mfs.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.4.4.1 28-Aug-2013  rmind sync with head
 1.4.2.4 03-Dec-2017  jdolecek update from HEAD
 1.4.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.2.2 23-Jun-2013  tls resync from head
 1.4.2.1 06-Jun-2013  tls file ulfs_bmap.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.5.8.2 28-Aug-2017  skrll Sync with HEAD
 1.5.8.1 22-Sep-2015  skrll Sync with HEAD
 1.5.4.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.4.1 28-Jul-2013  yamt file ulfs_bmap.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.7.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.7.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.7.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.9 19-Apr-2018  christos s/static inline/static __inline/g for consistency.
 1.8 20-Jun-2016  dholland branches: 1.8.16;
u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.7 19-Jun-2016  dholland we already have changes here comparable to ufs_bswap.h -r1.20 and -r1.21.
 1.6 18-Oct-2013  christos branches: 1.6.4; 1.6.8;
use __USE() in the right place, instead of (void)var.
 1.5 17-Oct-2013  christos - remove unused variables
- add debug ifdefs for debugging variables
- __USE() where appropriate.
 1.4 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.3 06-Jun-2013  dholland branches: 1.3.2; 1.3.4;
Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.3.4.2 18-May-2014  rmind sync with head
 1.3.4.1 28-Aug-2013  rmind sync with head
 1.3.2.4 03-Dec-2017  jdolecek update from HEAD
 1.3.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.2.2 23-Jun-2013  tls resync from head
 1.3.2.1 06-Jun-2013  tls file ulfs_bswap.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.6.8.1 09-Jul-2016  skrll Sync with HEAD
 1.6.4.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.4.1 18-Oct-2013  yamt file ulfs_bswap.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.8.16.1 22-Apr-2018  pgoyette Sync with HEAD
 1.13 20-Jun-2016  dholland Massedit u_int{8,16,32,64}_t to uint{8,16,32,64}_t. This effectively
merges ufs/dinode.h 1.25.
 1.12 19-Jun-2016  dholland we are actually synced with ufs/dinode.h 1.24 and ufs/dir.h 1.25.
 1.11 08-Jun-2013  dholland branches: 1.11.2; 1.11.10; 1.11.14;
Move a comment to lfs.h that belongs better there.
 1.10 08-Jun-2013  dholland Move more symbols to lfs.h:
LFS_DIRBLKSIZ
LFS_DIRECTSIZ
LFS_DIRSIZ
LFS_OLDDIRFMT
LFS_NEWDIRFMT
LFS_IFTODT
LFS_DTTOIF
ULFS{,1,2}_MAXSYMLINKLEN
 1.9 08-Jun-2013  dholland Move stuff to lfs.h that's needed by userland:
LFS_DT_*
ULFS_ROOTINO
ULFS_WINO
struct lfs_direct
struct lfs_dirtemplate
struct lfs_odirtemplate
struct ulfs_args

Also fix FFS_MAXNAMLEN -> LFS_MAXNAMLEN in several places.
 1.8 08-Jun-2013  dholland struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.7 08-Jun-2013  dholland Now move LFS_IFMT and friends from ulfs_dinode.h to lfs.h.
 1.6 08-Jun-2013  dholland Stick LFS_ in front of IFMT, IFIFO, IFREG, etc. so as not to conflict
with the UFS copies of these symbols. (Which themselves ought to have
UFS_ stuck on.)
 1.5 08-Jun-2013  dholland Move the dinode (on-disk inode) structures to lfs.h, since they are
and will be obviously required by userland tools that need to read
the on-disk structures.

Also, DINODE{1,2}_SIZE -> LFS_DINODE{1,2}_SIZE.
 1.4 06-Jun-2013  dholland Remove references to Apple UFS.
 1.3 06-Jun-2013  dholland Cleanups to reduce symbol and header exposure:
- move struct ufid from ulfs_inode.h to lfs.h
- lfs.h needs sys/mount.h and sys/pool.h
- ulfs_quota2_subr.c needs lfs_inode.h
- remove ulfs_inode.h from lfs.h in favor of ulfs_dinode.h
- move ULFS_NDADDR, ULFS_NIADDR, ULFS_NXADDR from ulfs_dinode.h to lfs.h
- remove ulfs_dinode.h from lfs.h
- add lfs.h to ulfs_dinode.h
 1.2 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.11.14.1 09-Jul-2016  skrll Sync with HEAD
 1.11.10.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.11.10.1 08-Jun-2013  yamt file ulfs_dinode.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.11.2.3 03-Dec-2017  jdolecek update from HEAD
 1.11.2.2 23-Jun-2013  tls resync from head
 1.11.2.1 08-Jun-2013  tls file ulfs_dinode.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.9 08-Jun-2013  dholland ulfs_dir.h has been emptied; remove it.
 1.8 08-Jun-2013  dholland Move more symbols to lfs.h:
LFS_DIRBLKSIZ
LFS_DIRECTSIZ
LFS_DIRSIZ
LFS_OLDDIRFMT
LFS_NEWDIRFMT
LFS_IFTODT
LFS_DTTOIF
ULFS{,1,2}_MAXSYMLINKLEN
 1.7 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.6 08-Jun-2013  dholland Move stuff to lfs.h that's needed by userland:
LFS_DT_*
ULFS_ROOTINO
ULFS_WINO
struct lfs_direct
struct lfs_dirtemplate
struct lfs_odirtemplate
struct ulfs_args

Also fix FFS_MAXNAMLEN -> LFS_MAXNAMLEN in several places.
 1.5 08-Jun-2013  dholland struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.4 08-Jun-2013  dholland Split the definitions suitable for userland out of ulfs_inode.h into
lfs_inode.h. Since fsck_lfs, newfs_lfs, and lfs_cleanerd want to reuse
the inode structure for their own internal use, and some of them share
parts of the kernel code as well, the best way forward is to provide a
relatively sanitized header that doesn't bring in stray material.

Shuffle a few other definitions around so that lfs_inode.h depends
only on lfs.h.

Install lfs_inode.h into /usr/include.
 1.3 06-Jun-2013  dholland Remove references to Apple UFS.
 1.2 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.19 07-Aug-2022  simonb If UFS or LFS dirhash is enabled in the kernel, set the dirhash cache
size dependent on memory size. If less than 128MB of memory, default
to no cache. With 128MB of memory or more, use a maximum cache size of
1/64th of memory; cap maximum default cache size to 32MB (for systems
with 2GB of memory or more).

The dirhash cache sizes are still explicitly settable by sysctl(8) or
by adding relevant entry(s) to sysctl.conf(5).
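
The sizing rule described in this entry can be sketched as follows. This is
an illustration only, not the committed code: the helper name
dirhash_default_maxmem() and the MiB macro are hypothetical, and only the
thresholds (128MB, 1/64th of memory, 32MB cap) come from the log message.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (UINT64_C(1024) * 1024)

/*
 * Hypothetical helper: default dirhash cache limit, in bytes, for a
 * machine with physmem_bytes of memory. The name is an assumption;
 * the thresholds follow the log message above.
 */
static uint64_t
dirhash_default_maxmem(uint64_t physmem_bytes)
{
	uint64_t limit;

	/* Less than 128MB of memory: default to no cache. */
	if (physmem_bytes < 128 * MiB)
		return 0;

	/* Otherwise use 1/64th of memory... */
	limit = physmem_bytes / 64;

	/* ...capped at 32MB, which is reached at 2GB of memory. */
	if (limit > 32 * MiB)
		limit = 32 * MiB;

	return limit;
}
```

Under this sketch a 1GB machine would get a 16MB default and anything at or
above 2GB would get 32MB; an explicit sysctl setting would still override
the default.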
 1.18 14-Mar-2020  ad - Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.
 1.17 20-Jun-2016  dholland branches: 1.17.18;
Merge -r1.37 of ufs_dirhash.c:
clear i_dirhash sooner, but what lock protects it?
 1.16 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.15 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.14 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
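
A minimal sketch of the "two 32-bit values" idea, for illustration. The
struct layout, field names, and accessor names below are hypothetical and
do not reproduce the real LFS64 definitions; the order of the two halves in
particular is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical directory entry header. Splitting the inode number into
 * two 32-bit fields means no member needs more than 32-bit alignment.
 */
struct dirheader64_sketch {
	uint32_t ino_lo;	/* low 32 bits of the inode number (assumed order) */
	uint32_t ino_hi;	/* high 32 bits of the inode number */
	uint16_t reclen;	/* record length */
	uint8_t  type;		/* file type */
	uint8_t  namlen;	/* name length */
};

/* Accessor: reassemble the 64-bit inode number from its halves. */
static uint64_t
dirheader64_get_ino(const struct dirheader64_sketch *dh)
{
	return ((uint64_t)dh->ino_hi << 32) | dh->ino_lo;
}

/* Accessor: store a 64-bit inode number as two 32-bit halves. */
static void
dirheader64_set_ino(struct dirheader64_sketch *dh, uint64_t ino)
{
	dh->ino_lo = (uint32_t)(ino & UINT32_C(0xffffffff));
	dh->ino_hi = (uint32_t)(ino >> 32);
}
```

Callers go through the accessors rather than touching the fields, which is
the pattern the neighbouring entries describe for the other directory entry
fields.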
 1.13 21-Sep-2015  dholland Oops; LFS_DIRECTSIZ() is going to need the fs as an argument.

Also, it turns out that dirhash needs a compile-time-constant version
of LFS_DIRECTSIZ(LFS_MAXNAMLEN+1), independent of 64-vs-32, so create
LFS_MAXDIRENTRYSIZE for this. Sigh.
 1.12 15-Sep-2015  dholland Pass around struct lfs_dirheader instead of struct lfs_direct.
 1.11 15-Sep-2015  dholland Add an accessor function for directory names.
 1.10 15-Sep-2015  dholland Add and use accessor functions for more of the directory entry fields.
 1.9 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.8 25-Feb-2014  pooka branches: 1.8.4; 1.8.8;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.7 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.6 08-Jun-2013  dholland branches: 1.6.2; 1.6.4;
ulfs_dir.h has been emptied; remove it.
 1.5 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.4 08-Jun-2013  dholland struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.6.4.2 18-May-2014  rmind sync with head
 1.6.4.1 28-Aug-2013  rmind sync with head
 1.6.2.4 03-Dec-2017  jdolecek update from HEAD
 1.6.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.2.2 23-Jun-2013  tls resync from head
 1.6.2.1 08-Jun-2013  tls file ulfs_dirhash.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.8.8.2 09-Jul-2016  skrll Sync with HEAD
 1.8.8.1 22-Sep-2015  skrll Sync with HEAD
 1.8.4.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.4.1 25-Feb-2014  yamt file ulfs_dirhash.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.17.18.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.12 19-Aug-2021  andvar s/memry/memory+s/softare/software/+s/grapics/graphics+s/ouput/output
 1.11 27-Dec-2019  msaitoh s/inital/initial/
 1.10 20-Jun-2016  dholland branches: 1.10.18;
u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.9 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.8 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
 1.7 21-Sep-2015  dholland Oops; LFS_DIRECTSIZ() is going to need the fs as an argument.

Also, it turns out that dirhash needs a compile-time-constant version
of LFS_DIRECTSIZ(LFS_MAXNAMLEN+1), independent of 64-vs-32, so create
LFS_MAXDIRENTRYSIZE for this. Sigh.
 1.6 15-Sep-2015  dholland Pass around struct lfs_dirheader instead of struct lfs_direct.
 1.5 08-Jun-2013  dholland branches: 1.5.2; 1.5.10; 1.5.14;
DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.4 08-Jun-2013  dholland Move stuff to lfs.h that's needed by userland:
LFS_DT_*
ULFS_ROOTINO
ULFS_WINO
struct lfs_direct
struct lfs_dirtemplate
struct lfs_odirtemplate
struct ulfs_args

Also fix FFS_MAXNAMLEN -> LFS_MAXNAMLEN in several places.
 1.3 08-Jun-2013  dholland struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.2 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.5.14.2 09-Jul-2016  skrll Sync with HEAD
 1.5.14.1 22-Sep-2015  skrll Sync with HEAD
 1.5.10.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.10.1 08-Jun-2013  yamt file ulfs_dirhash.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.5.2.3 03-Dec-2017  jdolecek update from HEAD
 1.5.2.2 23-Jun-2013  tls resync from head
 1.5.2.1 08-Jun-2013  tls file ulfs_dirhash.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.10.18.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.18 10-Feb-2024  andvar Fix various typos in comments, log messages and documentation.
 1.17 29-Jun-2021  dholland Add containment for the cloning devices hack in vn_open.

Cloning devices (and also things like /dev/stderr) work by allocating
a struct file, stuffing it in the file table (which is a layer
violation), stuffing the file descriptor number for it in a magic
field of struct lwp (which is gross), and then "failing" with one of
two magic errnos, EDUPFD or EMOVEFD.

Before this commit, all callers of vn_open in the kernel (there are
quite a few) were expected to check for these errors and handle the
situation. Needless to say, none of them except for open() itself did,
resulting in internal negative errnos being returned to userspace.

This hack is fairly deeply rooted and cannot be eliminated all at
once. This commit adds logic to handle the magic errnos inside
vn_open; now on success vn_open returns either a vnode or an integer
file descriptor, along with a flag that says whether the underlying
code requested EDUPFD or EMOVEFD. Callers not prepared to cope with
file descriptors can pass NULL for the extra return values, in which
case if a file descriptor would be produced vn_open fails with
EOPNOTSUPP.

Since I'm rearranging vn_open's signature anyway, stop exposing struct
nameidata. Instead, take three arguments: an optional vnode to use as
the starting point (like openat()), the path, and additional namei
flags to use, restricted to NOCHROOT and TRYEMULROOT. (Other namei
behavior, e.g. NOFOLLOW, can be requested via the open flags.)

This change requires a kernel bump. Ride the one an hour ago.
(That was supposed to be coordinated; did not intend to let an hour
slip by. My fault.)
 1.16 16-May-2020  christos branches: 1.16.6;
Add ACL support for FFS. From FreeBSD.
 1.15 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.14 09-Nov-2016  dholland branches: 1.14.16; 1.14.22;
Apply ufs_extattr.c 1.48:
Explain why the lock in here needs to be recursive. Related to PR 46997.

ufs_extattr 1.47 was also committed directly here, so this file is still
fully synced with it.
 1.13 07-Jul-2016  msaitoh branches: 1.13.2;
KNF. Remove extra spaces. No functional change.
 1.12 20-Jun-2016  dholland Merge -r1.46 of ufs_extattr.c: Fix uninitialized mutex usage
 1.11 20-Jun-2016  dholland Merge -r1.45 of ufs_extattr.c:
Fix UFS1 extended attribute backend autocreation deadlock
 1.10 20-Jun-2016  dholland Merge -r1.44 of ufs_extattr.c and related change -r1.302 of ffs_vfops.c:
fix use-after-free on failed unmount with extended attributes enabled.
 1.9 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.8 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.7 07-Feb-2014  hannken branches: 1.7.4; 1.7.8;
Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.6 08-Jun-2013  dholland branches: 1.6.2; 1.6.4;
ulfs_dir.h has been emptied; remove it.
 1.5 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.6.4.1 18-May-2014  rmind sync with head
 1.6.2.4 03-Dec-2017  jdolecek update from HEAD
 1.6.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.2.2 23-Jun-2013  tls resync from head
 1.6.2.1 08-Jun-2013  tls file ulfs_extattr.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.7.8.2 05-Dec-2016  skrll Sync with HEAD
 1.7.8.1 09-Jul-2016  skrll Sync with HEAD
 1.7.4.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.4.1 07-Feb-2014  yamt file ulfs_extattr.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.13.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.14.22.1 17-Jan-2020  ad Sync with head.
 1.14.16.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.16.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.3 20-Jun-2016  dholland Merge -r1.11 of extattr.h:
Bump UFS1 extended attribute max name length to 256
 1.2 06-Jun-2013  dholland branches: 1.2.2; 1.2.10; 1.2.14;
Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.2.14.1 09-Jul-2016  skrll Sync with HEAD
 1.2.10.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.10.1 06-Jun-2013  yamt file ulfs_extattr.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.2.2.3 03-Dec-2017  jdolecek update from HEAD
 1.2.2.2 23-Jun-2013  tls resync from head
 1.2.2.1 06-Jun-2013  tls file ulfs_extattr.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.26 18-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.
 1.25 17-Jan-2020  ad branches: 1.25.10;
VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.24 20-Jun-2016  dholland branches: 1.24.18; 1.24.24;
One more batch of already-synced ufs changes:

ufs_extern.h 1.79 is equivalent to ulfs_extern.h 1.14
ufsmount.h 1.43 is (roughly) equivalent to lfs_extern.h 1.102
ufs_inode.c 1.94 does not apply to lfs
ufs_inode.c 1.95 does not apply to lfs either
ufs_readwrite.c 1.108 is equivalent to ulfs_readwrite.c 1.8
ufs_readwrite.c 1.109 is equivalent to ulfs_readwrite.c 1.9
ufs_readwrite.c 1.110 is equivalent to ulfs_readwrite.c 1.10
ufs_readwrite.c 1.111 does not apply to lfs
ufs_readwrite.c 1.112 is equivalent to ulfs_readwrite.c 1.11
ufs_readwrite.c 1.113 is equivalent to ulfs_readwrite.c 1.13
ufs_readwrite.c 1.114 is equivalent to ulfs_readwrite.c 1.14
ufs_readwrite.c 1.115 is equivalent to ulfs_readwrite.c 1.15
ufs_readwrite.c 1.116-1.118 does not apply to lfs
ufs_readwrite.c 1.119-1.120 are equivalent to ulfs_readwrite.c 1.16
ufs_rename.c 1.12 is equivalent to lfs_rename.c 1.8
ufs_vnops.c 1.226 is equivalent to ulfs_vnops.c 1.22 and lfs_vnops.c 1.270
ufs_vnops.c 1.227 is equivalent to ulfs_vnops.c 1.23
ufs_vnops.c 1.228-1.229 are equivalent to ulfs_vnops.c 1.24
ufs_vnops.c 1.230 is equivalent to ulfs_vnops.c 1.25 and lfs_vnops.c 1.271
ufs_vnops.c 1.231 originated in lfs
ufs_vnops.c 1.232 does not apply to lfs
 1.23 20-Jun-2016  dholland Merge (effectively) -r1.78 of ufs_extern.h: shift ulfs_makeinode to
lfs_vnops.c and make it file-static there, as that's the only place
it's used.
 1.22 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.21 19-Jun-2016  dholland Update the ufs versions these files are synced with by 1: the
201306016 commit by hannken@ that removed references to ffs_snapgone
in ufs doesn't need to be synced into lfs.
 1.20 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
 1.19 15-Sep-2015  dholland Pass around struct lfs_dirheader instead of struct lfs_direct.
 1.18 15-Sep-2015  dholland Kill off the ulfs_direct_cache pool.
We no longer allocate temporary struct directs, so we don't need a
pool for them.
 1.17 15-Sep-2015  dholland Kill off ulfs_makedirentry; just pass the data to ulfs_direnter instead.
For now, move one copy of the code that allocates and fills in a
temporary struct lfs_direct to the top of ulfs_direnter; but it should
go away shortly.
 1.16 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.15 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.14 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.13 25-May-2014  hannken branches: 1.13.4;
Remove ulfs_checkpath() and ulfs_readdotdot(). These are relics
from the pre-genfs_rename era.
 1.12 17-May-2014  dholland branches: 1.12.2;
Merge ulfs_mkdir into lfs_mkdir.
 1.11 17-May-2014  dholland Merge ulfs_symlink into lfs_symlink.
 1.10 28-Jul-2013  dholland branches: 1.10.2;
Bring in a copy of ffs_quota2_mount() for reference.
Add stuff to struct lfs that it needs to initialize.
Clear these fields in mount as there's no on-disk support for quota2;
but this increases the chances of being able to add it (or something
like it) in the future.
 1.9 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.8 20-Jul-2013  dholland G/C unused pieces.
 1.7 20-Jul-2013  dholland Collect the pieces of lfs rename into lfs_rename.c, and sprinkle static.
 1.6 08-Jun-2013  dholland branches: 1.6.2; 1.6.4; 1.6.6;
struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.5 06-Jun-2013  dholland Fix some exposed symbols:
LOSTFOUNDINO -> LFS_LOSTFOUNDINO
struct ufid -> struct ulfs_ufid
 1.4 06-Jun-2013  dholland Apparently we also need to cut and paste ffs_snapgone() in order to be
able to link the ufs code.

Instead of actually cutting and pasting it (as it depends on ffs-only
things) implement it as panic. Probably we'll be able to demonstrate
later that it's unreachable.

XXX: Someone should add snapgone to struct ufs_ops in ufs/ufsmount.h,
XXX: and fix ufs/ufs_lookup.c to not hardwire ffs.
 1.3 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.2 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.6.6.1 23-Jul-2013  riastradh sync with HEAD
 1.6.4.2 18-May-2014  rmind sync with head
 1.6.4.1 28-Aug-2013  rmind sync with head
 1.6.2.4 03-Dec-2017  jdolecek update from HEAD
 1.6.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.2.2 23-Jun-2013  tls resync from head
 1.6.2.1 08-Jun-2013  tls file ulfs_extern.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.10.2.1 10-Aug-2014  tls Rebase.
 1.12.2.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.12.2.1 17-May-2014  yamt file ulfs_extern.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.13.4.4 09-Jul-2016  skrll Sync with HEAD
 1.13.4.3 22-Sep-2015  skrll Sync with HEAD
 1.13.4.2 06-Jun-2015  skrll Sync with HEAD
 1.13.4.1 06-Apr-2015  skrll Sync with HEAD
 1.24.24.1 17-Jan-2020  ad Sync with head.
 1.24.18.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.25.10.1 01-Aug-2021  thorpej Sync with HEAD.
 1.6 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.5 20-Apr-2015  riastradh Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.
 1.4 27-Feb-2014  hannken branches: 1.4.4; 1.4.8;
The current implementation of vn_lock() is racy. Modification of
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.

- Pass down LK_RETRY to the lock operation (hint for deadfs only).

- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.

- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.

With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.

Addresses PR kern/37706 (Forced unmount of file systems is unsafe)

Discussed on tech-kern.

Welcome to 6.99.33
 1.3 06-Jun-2013  dholland branches: 1.3.2; 1.3.4;
Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.3.4.1 18-May-2014  rmind sync with head
 1.3.2.4 03-Dec-2017  jdolecek update from HEAD
 1.3.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.2.2 23-Jun-2013  tls resync from head
 1.3.2.1 06-Jun-2013  tls file ulfs_ihash.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.4.8.1 06-Jun-2015  skrll Sync with HEAD
 1.4.4.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.4.1 27-Feb-2014  yamt file ulfs_ihash.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.26 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.25 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.24 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.23 31-Dec-2019  ad branches: 1.23.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400-fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.22 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.21 28-Oct-2017  pgoyette branches: 1.21.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
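
The format-string rules in revision 1.21 can be shown with plain snprintf(3). This is a sketch only: `format_hist_entry()` is a hypothetical helper, not part of kernhist(9), and the message text is invented. It demonstrates the two conventions the commit names: every argument is widened to uintmax_t and printed with the 'j' length modifier, and a pointer is cast to uintptr_t before being cast to uintmax_t and printed with "%#jx" instead of "%p".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one pointer and one counter the way the
 * commit message describes. Returns what snprintf() returns, or -1 if
 * no buffer space was supplied.
 */
static int
format_hist_entry(char *buf, size_t len, const void *ptr, uint32_t count)
{
	if (buf == NULL || len == 0)
		return -1;

	/* Pointer goes through uintptr_t first, then uintmax_t. */
	uintmax_t a0 = (uintmax_t)(uintptr_t)ptr;
	/* Plain integers are simply widened. */
	uintmax_t a1 = (uintmax_t)count;

	/* "%#jx" stands in for "%p"; "%ju" stands in for "%u". */
	return snprintf(buf, len, "obj=%#jx count=%ju", a0, a1);
}
```

Because every argument has the same width, a consumer that only sees the format string and an array of uintmax_t values can print the record without knowing the original argument types.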
 1.20 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.19 26-May-2017  riastradh branches: 1.19.2;
Eliminate crusty debugging sludge.

We have a mostly sane vnode lifecycle now. If this needs debugging,
it should be done once at the call site of VOP_RECLAIM.
 1.18 11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.17 30-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.16 20-Aug-2016  hannken branches: 1.16.2;
Remove now obsolete operation vcache_remove().

Welcome to 7.99.36
 1.15 20-Jun-2016  dholland branches: 1.15.2;
One more batch of already-synced ufs changes:

ufs_extern.h 1.79 is equivalent to ulfs_extern.h 1.14
ufsmount.h 1.43 is (roughly) equivalent to lfs_extern.h 1.102
ufs_inode.c 1.94 does not apply to lfs
ufs_inode.c 1.95 does not apply to lfs either
ufs_readwrite.c 1.108 is equivalent to ulfs_readwrite.c 1.8
ufs_readwrite.c 1.109 is equivalent to ulfs_readwrite.c 1.9
ufs_readwrite.c 1.110 is equivalent to ulfs_readwrite.c 1.10
ufs_readwrite.c 1.111 does not apply to lfs
ufs_readwrite.c 1.112 is equivalent to ulfs_readwrite.c 1.11
ufs_readwrite.c 1.113 is equivalent to ulfs_readwrite.c 1.13
ufs_readwrite.c 1.114 is equivalent to ulfs_readwrite.c 1.14
ufs_readwrite.c 1.115 is equivalent to ulfs_readwrite.c 1.15
ufs_readwrite.c 1.116-1.118 does not apply to lfs
ufs_readwrite.c 1.119-1.120 are equivalent to ulfs_readwrite.c 1.16
ufs_rename.c 1.12 is equivalent to lfs_rename.c 1.8
ufs_vnops.c 1.226 is equivalent to ulfs_vnops.c 1.22 and lfs_vnops.c 1.270
ufs_vnops.c 1.227 is equivalent to ulfs_vnops.c 1.23
ufs_vnops.c 1.228-1.229 are equivalent to ulfs_vnops.c 1.24
ufs_vnops.c 1.230 is equivalent to ulfs_vnops.c 1.25 and lfs_vnops.c 1.271
ufs_vnops.c 1.231 originated in lfs
ufs_vnops.c 1.232 does not apply to lfs
 1.14 20-Jun-2016  dholland Merge ufs_inode.c 1.93: missing unlock on error path.
 1.13 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.12 14-Nov-2015  pgoyette Remove historic references to wapbl.
 1.11 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.10 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.9 28-Jul-2013  dholland branches: 1.9.4; 1.9.6; 1.9.8;
Remove the now-pointless ulfs ops macros.
 1.8 28-Jul-2013  dholland Get rid of the ulfs_ops table as we only have one fs in here now.
 1.7 08-Jun-2013  dholland branches: 1.7.2; 1.7.4;
There is no WAPBL in LFS.
 1.6 08-Jun-2013  dholland mp->mnt_wapbl and mp->mnt_wapbl_replay are always NULL in here.
 1.5 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.7.4.1 28-Aug-2013  rmind sync with head
 1.7.2.4 03-Dec-2017  jdolecek update from HEAD
 1.7.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.2.2 23-Jun-2013  tls resync from head
 1.7.2.1 08-Jun-2013  tls file ulfs_inode.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.9.8.6 28-Aug-2017  skrll Sync with HEAD
 1.9.8.5 05-Oct-2016  skrll Sync with HEAD
 1.9.8.4 09-Jul-2016  skrll Sync with HEAD
 1.9.8.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.9.8.2 22-Sep-2015  skrll Sync with HEAD
 1.9.8.1 06-Jun-2015  skrll Sync with HEAD
 1.9.6.1 10-Jul-2016  martin Pull up following revision(s) (requested by dholland in ticket #1188):
sys/ufs/lfs/ulfs_inode.c: revision 1.14
Merge ufs_inode.c 1.93: missing unlock on error path.
 1.9.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.9.4.1 28-Jul-2013  yamt file ulfs_inode.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.15.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.16.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.19.2.2 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.19.2.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.21.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.23.2.2 29-Feb-2020  ad Sync with head.
 1.23.2.1 17-Jan-2020  ad Sync with head.
 1.25 17-Feb-2024  mlelstv Whitespace.
 1.24 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.23 08-Jun-2017  chs move some buffer cache internals declarations from buf.h to vfs_bio.c.
this is needed to avoid name conflicts with ZFS and also
makes it clearer that other code shouldn't be messing with these.
remove the LFS debug code that poked around in bufqueues and
remove the BQ_EMPTY bufqueue since nothing uses it anymore.
provide a function to let LFS and wapbl read the value of nbuf for now.
 1.22 21-Jun-2016  dholland branches: 1.22.10;
Revert version 1.19 (make ufid_ino in struct ulfs_ufid 64-bit) -- via
a twisty maze of marginal if not illegal type punning it breaks the
cleaner.

This will need to be done over, but it requires substantially more
mechanism and compat ioctls. Booo.
 1.21 20-Jun-2016  dholland u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.20 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.19 20-Jun-2016  dholland Merge -r1.67 of ufs/inode.h: make the inode field of a filehandle
64-bit instead of truncating to 32 bits. Note that if you're serving
nfs off lfs (but I don't think you are as I think there are known
fatal problems doing so) you'll need to reboot your clients after this
change.

I've used a 64-bit value explicitly instead of ino_t (as in the ufs
structure) because this is a structure whose size ought to be well
defined. I remember some discussion of this when the ufs change was
committed, but not the conclusion (if any) -- if anyone hates this it
can be changed to ino_t easily enough.
 1.18 20-Jun-2016  dholland Merge ufs/inode.h 1.66: remove i_hash from struct inode. This is the
hash table entry link from the old per-fs vnode cache and we don't
need it any more.
 1.17 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.16 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.15 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.14 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.13 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
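
The accessor approach from revision 1.13 might look roughly like the sketch below. All names here (`fs_sketch`, `sb32_sketch`, `sb64_sketch`, `fs_get_avail()`, `fs_set_avail()`) are hypothetical and are not the real lfs.h definitions; the point is only that callers go through a function that picks the 32-bit or 64-bit on-disk form at run time, instead of a macro pretending to be a structure member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical on-disk forms; the real superblocks have many more fields. */
struct sb32_sketch { uint32_t avail; };
struct sb64_sketch { uint64_t avail; };

/* Hypothetical in-memory fs structure holding one of the two forms. */
struct fs_sketch {
	bool is64;
	union {
		struct sb32_sketch s32;
		struct sb64_sketch s64;
	} sb;
};

/* Read the field in whichever form this file system uses. */
static uint64_t
fs_get_avail(const struct fs_sketch *fs)
{
	return fs->is64 ? fs->sb.s64.avail : fs->sb.s32.avail;
}

/* Write the field; the 32-bit form cannot hold larger values. */
static void
fs_set_avail(struct fs_sketch *fs, uint64_t val)
{
	if (fs->is64) {
		fs->sb.s64.avail = val;
	} else {
		assert(val <= UINT32_MAX);
		fs->sb.s32.avail = (uint32_t)val;
	}
}
```

An accessor is also a natural place to add byte swapping later, which is presumably what makes handling LFS_EI tidier, though that step is not shown here.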
 1.12 17-May-2014  dholland branches: 1.12.2; 1.12.6;
Remove the DIROP macros. They are evil, especially the CREATE ones.

This results in some duplicate logic in the creation vnops (symlink,
mknod, create, mkdir) but we will probably be able to factor it out in
a more sensible way later.

Now the creation vnops call getnewvnode explicitly instead of under
multiple layers of obscure gunk. Then we explicitly do lfs_set_dirop,
and afterwards lfs_unset_dirop.
 1.11 18-Mar-2014  riastradh branches: 1.11.2;
Merge riastradh-drm2 to HEAD.
 1.10 20-Jul-2013  dholland Collect the pieces of lfs rename into lfs_rename.c, and sprinkle static.
 1.9 18-Jun-2013  christos branches: 1.9.2; 1.9.4; 1.9.6;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.8 18-Jun-2013  dholland Tuck away a bunch of symbols that don't need to be public.
 1.7 08-Jun-2013  dholland ulfs_dir.h has been emptied; remove it.
 1.6 08-Jun-2013  dholland Split the definitions suitable for userland out of ulfs_inode.h into
lfs_inode.h. Since fsck_lfs, newfs_lfs, and lfs_cleanerd want to reuse
the inode structure for their own internal use, and some of them share
parts of the kernel code as well, the best way forward is to provide a
relatively sanitized header that doesn't bring in stray material.

Shuffle a few other definitions around so that lfs_inode.h depends
only on lfs.h.

Install lfs_inode.h into /usr/include.
 1.5 06-Jun-2013  dholland Cleanups to reduce symbol and header exposure:
- move struct ufid from ulfs_inode.h to lfs.h
- lfs.h needs sys/mount.h and sys/pool.h
- ulfs_quota2_subr.c needs lfs_inode.h
- remove ulfs_inode.h from lfs.h in favor of ulfs_dinode.h
- move ULFS_NDADDR, ULFS_NIADDR, ULFS_NXADDR from ulfs_dinode.h to lfs.h
- remove ulfs_dinode.h from lfs.h
- add lfs.h to ulfs_dinode.h
 1.4 06-Jun-2013  dholland Remove stray references to ext2fs, chfs, ffs, and mfs.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.9.6.1 23-Jul-2013  riastradh sync with HEAD
 1.9.4.2 18-May-2014  rmind sync with head
 1.9.4.1 28-Aug-2013  rmind sync with head
 1.9.2.4 03-Dec-2017  jdolecek update from HEAD
 1.9.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.9.2.2 23-Jun-2013  tls resync from head
 1.9.2.1 18-Jun-2013  tls file ulfs_inode.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.11.2.1 10-Aug-2014  tls Rebase.
 1.12.6.3 28-Aug-2017  skrll Sync with HEAD
 1.12.6.2 09-Jul-2016  skrll Sync with HEAD
 1.12.6.1 22-Sep-2015  skrll Sync with HEAD
 1.12.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.12.2.1 17-May-2014  yamt file ulfs_inode.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.22.10.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.48 08-Sep-2024  rillig fix a/an grammar in obvious cases
 1.47 06-Aug-2022  andvar branches: 1.47.10;
s/blity/bility/ in various words, mainly in comments.
 1.46 05-Sep-2020  riastradh Revert "ufs: Prevent mkdir from choking on deleted directories."

This change made no sense and should not have been committed.
 1.45 05-Sep-2020  riastradh ufs: Prevent mkdir from choking on deleted directories.

Fix some missing uvm_vnp_setsize in screw cases while here.
 1.44 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.43 04-Apr-2020  ad Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.42 14-Mar-2020  ad - Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.
 1.41 10-Jun-2017  maya branches: 1.41.6; 1.41.12;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.40 30-Mar-2017  hannken branches: 1.40.6;
Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.39 20-Jun-2016  dholland branches: 1.39.2; 1.39.4;
Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.38 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.37 19-Jun-2016  dholland we already have ufs_lookup.c 1.125 and ufs_vnops.c 1.218.
 1.36 19-Jun-2016  dholland Update the ufs versions these files are synced with by 1: the
201306016 commit by hannken@ that removed references to ffs_snapgone
in ufs doesn't need to be synced into lfs.
 1.35 14-Nov-2015  pgoyette Remove historic references to wapbl.
 1.34 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
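
The split-inode layout described above can be sketched roughly as follows. This is an illustration only: the struct name, the field names, and the choice of which half holds the low bits are assumptions, not the actual LFS definitions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 64-bit directory header. The inode number is kept as two
 * 32-bit halves so the struct needs no more than 32-bit alignment.
 */
struct toy_dirheader64 {
	uint32_t dh_ino_lo;	/* low 32 bits of the inode number (assumed order) */
	uint32_t dh_ino_hi;	/* high 32 bits of the inode number */
	uint16_t dh_reclen;
	uint8_t  dh_type;
	uint8_t  dh_namlen;
};

/* No member is wider than 32 bits, so alignment matches uint32_t. */
static_assert(_Alignof(struct toy_dirheader64) == _Alignof(uint32_t),
    "directory header must not require 64-bit alignment");

/* Accessor: reassemble the 64-bit inode number from its halves. */
static inline uint64_t
toy_dir_getino(const struct toy_dirheader64 *dh)
{
	return ((uint64_t)dh->dh_ino_hi << 32) | dh->dh_ino_lo;
}

/* Accessor: split a 64-bit inode number into the two stored halves. */
static inline void
toy_dir_setino(struct toy_dirheader64 *dh, uint64_t ino)
{
	dh->dh_ino_lo = (uint32_t)(ino & 0xffffffffu);
	dh->dh_ino_hi = (uint32_t)(ino >> 32);
}
```

Callers go through the accessors only, so the stored representation can differ from the value type without touching the rest of the code.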
 1.33 21-Sep-2015  dholland Oops; LFS_DIRECTSIZ() is going to need the fs as an argument.

Also, it turns out that dirhash needs a compile-time-constant version
of LFS_DIRECTSIZ(LFS_MAXNAMLEN+1), independent of 64-vs-32, so create
LFS_MAXDIRENTRYSIZE for this. Sigh.
 1.32 15-Sep-2015  dholland Pass around struct lfs_dirheader instead of struct lfs_direct.
 1.31 15-Sep-2015  dholland Add an accessor function for directory names.
 1.30 15-Sep-2015  dholland Tidyups/fixes preparatory to making d_name[] in struct lfs_direct size
0 instead of size LFS_MAXNAMLEN+1, and preparatory to having accessor
functions for d_name. In particular, don't create prototype entries
and copy them, and access the name field only for directory structures
that are in buffers with space for the name to exist.
 1.29 15-Sep-2015  dholland Tidy up ulfs_direnter: don't malloc a temporary struct lfs_direct
and double-copy it. Just write to the destination buffer.
 1.28 15-Sep-2015  dholland Kill off ulfs_makedirentry; just pass the data to ulfs_direnter instead.
For now, move one copy of the code that allocates and fills in a
temporary struct lfs_direct to the top of ulfs_direnter; but it should
go away shortly.
 1.27 15-Sep-2015  dholland Add and use accessor functions for more of the directory entry fields.
 1.26 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.25 11-Jul-2015  mlelstv mp->mnt_stat.f_flag is never set. Use the mnt_flag directly.
This will now actually prevent the 'bad dir' panic if the filesystem
is read-only.
 1.24 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.23 28-Mar-2015  maxv Remove the 'cred' argument from breadn(), and update the man page
accordingly.

ok hannken@
 1.22 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.21 03-Jun-2014  joerg branches: 1.21.4;
Introduce two helper functions to centralise the namecache statistics
in vfs_cache.c. Use consistent locking around the per-cpu data.
 1.20 25-May-2014  hannken Remove ulfs_checkpath() and ulfs_readdotdot(). These are relics
from the pre-genfs_rename era.
 1.19 07-Feb-2014  hannken branches: 1.19.2; 1.19.4;
Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.18 28-Jan-2014  martin Bogus gcc 4.8 maybe-used-uninitialized warning
 1.17 25-Oct-2013  martin Mark a diagnostic-only variable
 1.16 17-Oct-2013  christos - remove unused variables
- add debug ifdefs for debugging variables
- __USE() where appropriate.
 1.15 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.14 28-Jul-2013  dholland Remove the now-pointless ulfs ops macros.
 1.13 28-Jul-2013  dholland Get rid of the ulfs_ops table as we only have one fs in here now.
 1.12 18-Jun-2013  christos branches: 1.12.2; 1.12.4;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.11 08-Jun-2013  dholland ulfs_dir.h has been emptied; remove it.
 1.10 08-Jun-2013  dholland There is no WAPBL in LFS.
 1.9 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.8 08-Jun-2013  dholland Move stuff to lfs.h that's needed by userland:
LFS_DT_*
ULFS_ROOTINO
ULFS_WINO
struct lfs_direct
struct lfs_dirtemplate
struct lfs_odirtemplate
struct ulfs_args

Also fix FFS_MAXNAMLEN -> LFS_MAXNAMLEN in several places.
 1.7 08-Jun-2013  dholland struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.6 06-Jun-2013  dholland Apparently we also need to cut and paste ffs_snapgone() in order to be
able to link the ufs code.

Instead of actually cutting and pasting it (as it depends on ffs-only
things) implement it as panic. Probably we'll be able to demonstrate
later that it's unreachable.

XXX: Someone should add snapgone to struct ufs_ops in ufs/ufsmount.h,
XXX: and fix ufs/ufs_lookup.c to not hardwire ffs.
 1.5 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.12.4.2 18-May-2014  rmind sync with head
 1.12.4.1 28-Aug-2013  rmind sync with head
 1.12.2.4 03-Dec-2017  jdolecek update from HEAD
 1.12.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.12.2.2 23-Jun-2013  tls resync from head
 1.12.2.1 18-Jun-2013  tls file ulfs_lookup.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.19.4.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.19.4.1 07-Feb-2014  yamt file ulfs_lookup.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.19.2.1 10-Aug-2014  tls Rebase.
 1.21.4.5 28-Aug-2017  skrll Sync with HEAD
 1.21.4.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.21.4.3 22-Sep-2015  skrll Sync with HEAD
 1.21.4.2 06-Jun-2015  skrll Sync with HEAD
 1.21.4.1 06-Apr-2015  skrll Sync with HEAD
 1.39.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.39.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.40.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.41.12.1 19-Jan-2020  ad Set IMNT_SHRLOOKUP and use it for the in-cache case. Need to check what
more can be done with tmpfs though, it can probably do the whole lookup.
 1.41.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.47.10.1 02-Aug-2025  perseant Sync with HEAD
 1.13 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.12 28-Jun-2014  dholland branches: 1.12.4;
Revert the following changes:

src/sys/sys/quotactl.h 1.37
src/sys/compat/netbsd32/netbsd32.h 1.101
src/sys/compat/netbsd32/netbsd32_netbsd.c 1.188, 1.189
src/sys/kern/vfs_quotactl.c 1.39
src/sys/kern/vfs_syscalls.c 1.483
src/sys/ufs/lfs/ulfs_quota.c 1.11
src/sys/ufs/ufs/ufs_quota.c 1.116
src/lib/libquota/quota_kernel.c 1.5

and do them correctly.

If you're going to change the name of something, you need to change
the name of *all* the things with the same name, not just a handful,
and you should change it to something similar so it still matches the
rest of the system rather than just picking an arbitrarily different
name.

Hi, Joerg.

To wit, rename the quotactl "delete" operation to "del", because
"delete" is a reserved word in C++ and for some reason Joerg wants to
run internal interfaces used only by C code through his C++ compiler.
Do not rename it to "remove" instead, because this doesn't match
libquota or the rest of the usage throughout the system; and rename
all the related identifiers, not just the ones that blew the mind of
Joerg's C++ compiler.

Because this is not a user-facing API (the only userland consumer of
sys/quotactl.h is libquota) it is sort of ok to make arbitrary
source-incompatible changes; however, by the same token it's completely
unnecessary. If it *were* a user-facing API that someone might have a
semi-rational reason to want to run a C++ compiler on, it would be
incorrect to change it at this point.
 1.11 12-Jun-2014  joerg Don't use a C++ keyword as field name.
 1.10 22-Nov-2013  dholland branches: 1.10.2; 1.10.4;
fix typo; hi christos
 1.9 16-Nov-2013  dholland This is now equivalent to ufs_quota.c -r1.115.

(it isn't quite the same textually in a few places but this doesn't
really matter)
 1.8 18-Oct-2013  christos fix unused variable warnings
 1.7 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.6 06-Jun-2013  dholland branches: 1.6.2; 1.6.4;
Cleanups and hacks to make lfs userland stuff build:
- lfs_cksum.c doesn't actually need ulfs_inode.h any more.
- neither does lfs_itimes.c.
- add hacks to fsck_lfs to make it compile.
- add hacks to newfs_lfs to make it compile.
- fix warning in ulfs_quota.c when quotas are fully disabled
(as I guess is happening with the rumpity version)

XXX: This commit adds -I${NETBSDSRCDIR}/sys to the Makefiles for
XXX: fsck_lfs, newfs_lfs, and lfs_cleanerd. This needs to be cleaned
XXX: up ASAP; but I consider this less problematic in the short term
XXX: than spewing ulfs_*.h into /usr/include.
 1.5 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.6.4.2 18-May-2014  rmind sync with head
 1.6.4.1 28-Aug-2013  rmind sync with head
 1.6.2.4 03-Dec-2017  jdolecek update from HEAD
 1.6.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.2.2 23-Jun-2013  tls resync from head
 1.6.2.1 06-Jun-2013  tls file ulfs_quota.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.10.4.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.10.4.1 22-Nov-2013  yamt file ulfs_quota.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.10.2.1 10-Aug-2014  tls Rebase.
 1.12.4.1 09-Jul-2016  skrll Sync with HEAD
 1.7 20-Jun-2016  dholland u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.6 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.5 28-Jun-2014  dholland branches: 1.5.4;
Revert the following changes:

src/sys/sys/quotactl.h 1.37
src/sys/compat/netbsd32/netbsd32.h 1.101
src/sys/compat/netbsd32/netbsd32_netbsd.c 1.188, 1.189
src/sys/kern/vfs_quotactl.c 1.39
src/sys/kern/vfs_syscalls.c 1.483
src/sys/ufs/lfs/ulfs_quota.c 1.11
src/sys/ufs/ufs/ufs_quota.c 1.116
src/lib/libquota/quota_kernel.c 1.5

and do them correctly.

If you're going to change the name of something, you need to change
the name of *all* the things with the same name, not just a handful,
and you should change it to something similar so it still matches the
rest of the system rather than just picking an arbitrarily different
name.

Hi, Joerg.

To wit, rename the quotactl "delete" operation to "del", because
"delete" is a reserved word in C++ and for some reason Joerg wants to
run internal interfaces used only by C code through his C++ compiler.
Do not rename it to "remove" instead, because this doesn't match
libquota or the rest of the usage throughout the system; and rename
all the related identifiers, not just the ones that blew the mind of
Joerg's C++ compiler.

Because this is not a user-facing API (the only userland consumer of
sys/quotactl.h is libquota) it is sort of ok to make arbitrary
source-incompatible changes; however, by the same token it's completely
unnecessary. If it *were* a user-facing API that someone might have a
semi-rational reason to want to run a C++ compiler on, it would be
incorrect to change it at this point.
 1.4 06-Jun-2013  dholland branches: 1.4.2; 1.4.8; 1.4.10;
Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.4.10.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.10.1 06-Jun-2013  yamt file ulfs_quota.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.4.8.1 10-Aug-2014  tls Rebase.
 1.4.2.4 03-Dec-2017  jdolecek update from HEAD
 1.4.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.2.2 23-Jun-2013  tls resync from head
 1.4.2.1 06-Jun-2013  tls file ulfs_quota.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.5.4.1 09-Jul-2016  skrll Sync with HEAD
 1.12 29-Jun-2021  dholland Add containment for the cloning devices hack in vn_open.

Cloning devices (and also things like /dev/stderr) work by allocating
a struct file, stuffing it in the file table (which is a layer
violation), stuffing the file descriptor number for it in a magic
field of struct lwp (which is gross), and then "failing" with one of
two magic errnos, EDUPFD or EMOVEFD.

Before this commit, all callers of vn_open in the kernel (there are
quite a few) were expected to check for these errors and handle the
situation. Needless to say, none of them except for open() itself did,
resulting in internal negative errnos being returned to userspace.

This hack is fairly deeply rooted and cannot be eliminated all at
once. This commit adds logic to handle the magic errnos inside
vn_open; now on success vn_open returns either a vnode or an integer
file descriptor, along with a flag that says whether the underlying
code requested EDUPFD or EMOVEFD. Callers not prepared to cope with
file descriptors can pass NULL for the extra return values, in which
case if a file descriptor would be produced vn_open fails with
EOPNOTSUPP.

Since I'm rearranging vn_open's signature anyway, stop exposing struct
nameidata. Instead, take three arguments: an optional vnode to use as
the starting point (like openat()), the path, and additional namei
flags to use, restricted to NOCHROOT and TRYEMULROOT. (Other namei
behavior, e.g. NOFOLLOW, can be requested via the open flags.)

This change requires a kernel bump. Ride the one an hour ago.
(That was supposed to be coordinated; did not intend to let an hour
slip by. My fault.)
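
The calling convention described above (return either a vnode or a file descriptor, and fail with EOPNOTSUPP when the caller passes NULL for the extra return values) can be modelled in userspace roughly as below. Everything here is a toy: `toy_open`, `struct toy_lookup`, and the parameter names are hypothetical and do not reproduce the real vn_open signature.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vnode { int id; };

/* Hypothetical stand-in for what the lower layer produced. */
struct toy_lookup {
	struct toy_vnode *vp;	/* non-NULL for an ordinary open */
	int fd;			/* meaningful only when vp is NULL */
	bool domove;		/* true models EMOVEFD, false models EDUPFD */
};

/*
 * Toy open: ret_vp must be non-NULL; ret_domove and ret_fd may be NULL
 * if the caller cannot cope with a file descriptor result.
 */
static int
toy_open(const struct toy_lookup *res, struct toy_vnode **ret_vp,
    bool *ret_domove, int *ret_fd)
{
	if (res->vp != NULL) {
		/* Ordinary case: hand back the vnode. */
		*ret_vp = res->vp;
		if (ret_fd != NULL)
			*ret_fd = -1;
		return 0;
	}
	/* A descriptor was produced; callers that opted out get an error. */
	if (ret_domove == NULL || ret_fd == NULL)
		return EOPNOTSUPP;
	*ret_vp = NULL;
	*ret_domove = res->domove;
	*ret_fd = res->fd;
	return 0;
}
```

The point of the shape is that the magic condition is handled in one place, so a caller that ignores it gets an ordinary positive errno instead of an internal one.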
 1.11 20-Jun-2016  dholland branches: 1.11.34;
Merge -r1.20 and -r1.21 of ufs_quota1.c: widen before multiplying.
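
As a general illustration of "widen before multiplying" (not the actual quota code; the function names and the block-count example are assumptions), compare a product computed in 32 bits with one whose operand is widened first:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Narrow form: with a 32-bit unsigned int, both operands are 32-bit, so
 * the product wraps modulo 2^32 before it is converted to the 64-bit
 * return type.
 */
static uint64_t
toy_blocks_to_bytes_narrow(uint32_t nblocks, uint32_t blocksize)
{
	return nblocks * blocksize;
}

/*
 * Widened form: casting one operand to 64 bits first makes the whole
 * multiplication happen in 64 bits.
 */
static uint64_t
toy_blocks_to_bytes_wide(uint32_t nblocks, uint32_t blocksize)
{
	return (uint64_t)nblocks * blocksize;
}
```

With 8388608 blocks of 512 bytes the true product is exactly 2^32, which the narrow form cannot represent and the widened form can.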
 1.10 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.9 26-Jul-2015  hannken Remove bogus "mutex_enter(&mntvnode_lock)".
 1.8 24-May-2014  christos branches: 1.8.4;
Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.
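
The idea of a selector passed to an iterator can be sketched in plain C as below. This is a userspace toy under stated assumptions: the types, the array-backed iterator, and every `toy_` name are hypothetical and are not the vfs_vnode_iterator interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vnode { int id; bool dirty; };

/* Selector callback: return true for vnodes the caller wants to see. */
typedef bool (*toy_selector_t)(void *cookie, const struct toy_vnode *vp);

struct toy_iter {
	struct toy_vnode *vec;	/* backing array standing in for the mount's vnode list */
	size_t n;
	size_t pos;
};

/*
 * Return the next vnode accepted by the selector, or NULL at the end.
 * Rejected vnodes are skipped inside the iterator, so the caller never
 * has to take (and then drop) a reference on them.
 */
static struct toy_vnode *
toy_iter_next(struct toy_iter *it, toy_selector_t sel, void *cookie)
{
	while (it->pos < it->n) {
		struct toy_vnode *vp = &it->vec[it->pos++];
		if (sel == NULL || sel(cookie, vp))
			return vp;
	}
	return NULL;
}

/* Example selector: only dirty vnodes are interesting. */
static bool
toy_select_dirty(void *cookie, const struct toy_vnode *vp)
{
	(void)cookie;
	return vp->dirty;
}
```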
 1.7 17-Mar-2014  hannken branches: 1.7.2; 1.7.4;
Change lfsquota1_handle_cmd_quotaon() and lfs_q1sync()
to use vfs_vnode_iterator.
 1.6 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.5 08-Jun-2013  dholland branches: 1.5.2; 1.5.4;
mp->mnt_wapbl and mp->mnt_wapbl_replay are always NULL in here.
 1.4 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.5.4.2 18-May-2014  rmind sync with head
 1.5.4.1 28-Aug-2013  rmind sync with head
 1.5.2.4 03-Dec-2017  jdolecek update from HEAD
 1.5.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.2.2 23-Jun-2013  tls resync from head
 1.5.2.1 08-Jun-2013  tls file ulfs_quota1.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.7.4.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.4.1 17-Mar-2014  yamt file ulfs_quota1.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.7.2.1 10-Aug-2014  tls Rebase.
 1.8.4.2 09-Jul-2016  skrll Sync with HEAD
 1.8.4.1 22-Sep-2015  skrll Sync with HEAD
 1.11.34.1 01-Aug-2021  thorpej Sync with HEAD.
 1.5 20-Jun-2016  dholland u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.4 06-Jun-2013  dholland branches: 1.4.2; 1.4.10; 1.4.14;
Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.4.14.1 09-Jul-2016  skrll Sync with HEAD
 1.4.10.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.10.1 06-Jun-2013  yamt file ulfs_quota1.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.4.2.3 03-Dec-2017  jdolecek update from HEAD
 1.4.2.2 23-Jun-2013  tls resync from head
 1.4.2.1 06-Jun-2013  tls file ulfs_quota1.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.4 25-Jul-2021  skrll #include <sys/param.h> for COHERENCY_UNIT (and KNF)
 1.3 06-Jun-2013  dholland branches: 1.3.2; 1.3.10; 1.3.54;
Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.3.54.1 01-Aug-2021  thorpej Sync with HEAD.
 1.3.10.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.3.10.1 06-Jun-2013  yamt file ulfs_quota1_subr.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.3.2.2 23-Jun-2013  tls resync from head
 1.3.2.1 06-Jun-2013  tls file ulfs_quota1_subr.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.35 28-May-2022  andvar s/grabing/grabbing/ in comments.
 1.34 15-Oct-2021  andvar fix typos in comments.
 1.33 05-Dec-2020  thorpej Remove unnecessary inclusion of <sys/timevar.h>.
 1.32 17-Jan-2020  ad branches: 1.32.6;
VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.31 10-Jun-2017  maya branches: 1.31.6; 1.31.12;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.30 30-Mar-2017  hannken branches: 1.30.6;
Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.29 20-Nov-2016  riastradh branches: 1.29.2;
KASSERT(mutex_owner(...)) ---> KASSERT(mutex_owned(...))
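
Why the two assertions differ can be shown with a small userspace model. The assumption here is that an "owner" query reports whoever holds the lock, while an "owned" query asks whether the caller holds it; the `toy_` types and functions are hypothetical, and the toy takes the caller explicitly where a kernel primitive would use the current LWP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_lwp { int id; };
struct toy_mutex { struct toy_lwp *owner; };	/* NULL when unlocked */

/* "Who holds this?" Non-NULL whenever anyone at all holds the mutex. */
static struct toy_lwp *
toy_mutex_owner(const struct toy_mutex *m)
{
	return m->owner;
}

/* "Do I hold this?" True only when the given caller is the holder. */
static bool
toy_mutex_owned(const struct toy_mutex *m, const struct toy_lwp *self)
{
	return m->owner == self;
}
```

An assertion written against the owner query passes as long as some other thread holds the lock, which is not the property a lock-held assertion is meant to check.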
 1.28 07-Jul-2016  msaitoh branches: 1.28.2;
KNF. Remove extra spaces. No functional change.
 1.27 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.26 20-Jun-2016  dholland Merge some cosmetic changes from ffs_quota2.c 1.5. I didn't merge the
whitespace changes.
 1.25 20-Jun-2016  dholland Remove stray 'n' in file. silly control key...
 1.24 20-Jun-2016  dholland Merge ufs_quota2.c 1.37: set grace time if lowering the limit causes
the user/group to now be over quota. From Edgar Fuß.
 1.23 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.22 14-Nov-2015  pgoyette Remove historic references to wapbl.
 1.21 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.20 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.19 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.18 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.17 08-Dec-2014  justin Avoid uninitialized variable error in some cases with gcc
 1.16 28-Jun-2014  dholland branches: 1.16.4;
Revert the following changes:

src/sys/sys/quotactl.h 1.37
src/sys/compat/netbsd32/netbsd32.h 1.101
src/sys/compat/netbsd32/netbsd32_netbsd.c 1.188, 1.189
src/sys/kern/vfs_quotactl.c 1.39
src/sys/kern/vfs_syscalls.c 1.483
src/sys/ufs/lfs/ulfs_quota.c 1.11
src/sys/ufs/ufs/ufs_quota.c 1.116
src/lib/libquota/quota_kernel.c 1.5

and do them correctly.

If you're going to change the name of something, you need to change
the name of *all* the things with the same name, not just a handful,
and you should change it to something similar so it still matches the
rest of the system rather than just picking an arbitrarily different
name.

Hi, Joerg.

To wit, rename the quotactl "delete" operation to "del", because
"delete" is a reserved word in C++ and for some reason Joerg wants to
run internal interfaces used only by C code through his C++ compiler.
Do not rename it to "remove" instead, because this doesn't match
libquota or the rest of the usage throughout the system; and rename
all the related identifiers, not just the ones that blew the mind of
Joerg's C++ compiler.

Because this is not a user-facing API (the only userland consumer of
sys/quotactl.h is libquota) it is sort of ok to make arbitrary
source-incompatible changes; however, by the same token it's completely
unnecessary. If it *were* a user-facing API that someone might have a
semi-rational reason to want to run a C++ compiler on, it would be
incorrect to change it at this point.
 1.15 18-Oct-2013  christos branches: 1.15.2; 1.15.4;
fix unused variable warnings
 1.14 18-Oct-2013  christos use __USE() in the right place, instead of (void)var.
 1.13 29-Jul-2013  dholland Fix build both with and without options LFS_EI.
 1.12 29-Jul-2013  dholland Revert previous; it is wrong.
 1.11 28-Jul-2013  pgoyette Remove more unused variables to unbreak the build.
 1.10 28-Jul-2013  dholland Bring in a copy of ffs_quota2_mount() for reference.
Add stuff to struct lfs that it needs to initialize.
Clear these fields in mount as there's no on-disk support for quota2;
but this increases the chances of being able to add it (or something
like it) in the future.
 1.9 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.8 28-Jul-2013  dholland Remove the now-pointless ulfs ops macros.
 1.7 28-Jul-2013  dholland Get rid of the ulfs_ops table as we only have one fs in here now.
 1.6 08-Jun-2013  dholland branches: 1.6.2; 1.6.4;
There is no WAPBL in LFS.
 1.5 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.6.4.2 18-May-2014  rmind sync with head
 1.6.4.1 28-Aug-2013  rmind sync with head
 1.6.2.4 03-Dec-2017  jdolecek update from HEAD
 1.6.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.2.2 23-Jun-2013  tls resync from head
 1.6.2.1 08-Jun-2013  tls file ulfs_quota2.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.15.4.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.15.4.1 18-Oct-2013  yamt file ulfs_quota2.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.15.2.1 10-Aug-2014  tls Rebase.
 1.16.4.6 28-Aug-2017  skrll Sync with HEAD
 1.16.4.5 05-Dec-2016  skrll Sync with HEAD
 1.16.4.4 09-Jul-2016  skrll Sync with HEAD
 1.16.4.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.16.4.2 22-Sep-2015  skrll Sync with HEAD
 1.16.4.1 06-Apr-2015  skrll Sync with HEAD
 1.28.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.28.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.29.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.30.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.31.12.1 17-Jan-2020  ad Sync with head.
 1.31.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.32.6.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.4 06-Jun-2013  dholland branches: 1.4.2; 1.4.10;
Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.4.10.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.10.1 06-Jun-2013  yamt file ulfs_quota2.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.4.2.2 23-Jun-2013  tls resync from head
 1.4.2.1 06-Jun-2013  tls file ulfs_quota2.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.7 24-Aug-2023  andvar s/defaut/default/ in comments.
 1.6 06-Jun-2013  dholland branches: 1.6.2; 1.6.10;
Cleanups to reduce symbol and header exposure:
- move struct ufid from ulfs_inode.h to lfs.h
- lfs.h needs sys/mount.h and sys/pool.h
- ulfs_quota2_subr.c needs lfs_inode.h
- remove ulfs_inode.h from lfs.h in favor of ulfs_dinode.h
- move ULFS_NDADDR, ULFS_NIADDR, ULFS_NXADDR from ulfs_dinode.h to lfs.h
- remove ulfs_dinode.h from lfs.h
- add lfs.h to ulfs_dinode.h
 1.5 06-Jun-2013  dholland Remove stray references to ext2fs, chfs, ffs, and mfs.
 1.4 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.6.10.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.10.1 06-Jun-2013  yamt file ulfs_quota2_subr.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.6.2.2 23-Jun-2013  tls resync from head
 1.6.2.1 06-Jun-2013  tls file ulfs_quota2_subr.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.4 08-Jun-2013  dholland branches: 1.4.2; 1.4.10;
Split the definitions suitable for userland out of ulfs_inode.h into
lfs_inode.h. Since fsck_lfs, newfs_lfs, and lfs_cleanerd want to reuse
the inode structure for their own internal use, and some of them share
parts of the kernel code as well, the best way forward is to provide a
relatively sanitized header that doesn't bring in stray material.

Shuffle a few other definitions around so that lfs_inode.h depends
only on lfs.h.

Install lfs_inode.h into /usr/include.
 1.3 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.2 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.4.10.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.10.1 08-Jun-2013  yamt file ulfs_quotacommon.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.4.2.2 23-Jun-2013  tls resync from head
 1.4.2.1 08-Jun-2013  tls file ulfs_quotacommon.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.28 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.,
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), using a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.27 23-Apr-2020  ad PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)

- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
somewhere. Use it to decide whether to do direct-mapped copy, rather than
poking around directly in the vnode in ubc_uiomove(), which is ugly and
doesn't work for tmpfs. It would be nicer to contain all this in UVM but
the filesystem provides the needed locking here (VV_MAPPED) and to
reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS(). Pass in UBC_ISMAPPED where
appropriate.
 1.26 23-Feb-2020  ad branches: 1.26.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.25 20-Jun-2019  christos branches: 1.25.4;
unifdef -DLFS_READWRITE ulfs_readwrite.c
 1.24 10-Jun-2017  maya branches: 1.24.6;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.23 30-Mar-2017  hannken branches: 1.23.6;
Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.22 20-Jun-2016  dholland branches: 1.22.2; 1.22.4;
One more batch of already-synced ufs changes:

ufs_extern.h 1.79 is equivalent to ulfs_extern.h 1.14
ufsmount.h 1.43 is (roughly) equivalent to lfs_extern.h 1.102
ufs_inode.c 1.94 does not apply to lfs
ufs_inode.c 1.95 does not apply to lfs either
ufs_readwrite.c 1.108 is equivalent to ulfs_readwrite.c 1.8
ufs_readwrite.c 1.109 is equivalent to ulfs_readwrite.c 1.9
ufs_readwrite.c 1.110 is equivalent to ulfs_readwrite.c 1.10
ufs_readwrite.c 1.111 does not apply to lfs
ufs_readwrite.c 1.112 is equivalent to ulfs_readwrite.c 1.11
ufs_readwrite.c 1.113 is equivalent to ulfs_readwrite.c 1.13
ufs_readwrite.c 1.114 is equivalent to ulfs_readwrite.c 1.14
ufs_readwrite.c 1.115 is equivalent to ulfs_readwrite.c 1.15
ufs_readwrite.c 1.116-1.118 do not apply to lfs
ufs_readwrite.c 1.119-1.120 are equivalent to ulfs_readwrite.c 1.16
ufs_rename.c 1.12 is equivalent to lfs_rename.c 1.8
ufs_vnops.c 1.226 is equivalent to ulfs_vnops.c 1.22 and lfs_vnops.c 1.270
ufs_vnops.c 1.227 is equivalent to ulfs_vnops.c 1.23
ufs_vnops.c 1.228-1.229 are equivalent to ulfs_vnops.c 1.24
ufs_vnops.c 1.230 is equivalent to ulfs_vnops.c 1.25 and lfs_vnops.c 1.271
ufs_vnops.c 1.231 originated in lfs
ufs_vnops.c 1.232 does not apply to lfs
 1.21 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.20 23-Nov-2015  mlelstv fix assertion checking that bufrd function is used only for large symlinks
that aren't embedded in the inode.
 1.19 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.18 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.17 12-Apr-2015  riastradh Strip IO_JOURNALLOCKED, PGO_JOURNALLOCKED out of ulfs_readwrite.c.

These are vestigial from ufs_readwrite.c with wapbl -- lfs does not
have a journal, but only the explicit wapbl calls, not these flags,
got ripped out in the transition to ulfs_readwrite.c.
 1.16 12-Apr-2015  riastradh Same putpages->kassert in ulfs_readwrite.c
 1.15 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.14 28-Mar-2015  riastradh Let I/O errors override inode update errors in UFS.

Fixes tests/fs/vfs/t_io:read_fault for UFS.
 1.13 28-Mar-2015  maxv Remove the 'cred' argument from breadn(), and update the man page
accordingly.

ok hannken@
 1.12 28-Mar-2015  riastradh Make some comments match better in ulfs_readwrite.
 1.11 28-Mar-2015  riastradh Factor out post-read/write inode updates in UFS.
 1.10 28-Mar-2015  riastradh Turn some `#if DIAGNOSTIC' into KASSERT.
 1.9 27-Mar-2015  riastradh Tighten some kasserts in ufs_bufio code paths.
 1.8 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.7 17-Oct-2013  christos branches: 1.7.4; 1.7.8;
- remove unused variables
- add debug ifdefs for debugging variables
- __USE() where appropriate.
 1.6 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.5 28-Jul-2013  dholland Remove the now-pointless ulfs ops macros.
 1.4 18-Jun-2013  christos branches: 1.4.2; 1.4.4;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.3 08-Jun-2013  dholland There is no WAPBL in LFS.
 1.2 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.4.4.2 18-May-2014  rmind sync with head
 1.4.4.1 28-Aug-2013  rmind sync with head
 1.4.2.4 03-Dec-2017  jdolecek update from HEAD
 1.4.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.2.2 23-Jun-2013  tls resync from head
 1.4.2.1 18-Jun-2013  tls file ulfs_readwrite.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.7.8.5 28-Aug-2017  skrll Sync with HEAD
 1.7.8.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.7.8.3 22-Sep-2015  skrll Sync with HEAD
 1.7.8.2 06-Jun-2015  skrll Sync with HEAD
 1.7.8.1 06-Apr-2015  skrll Sync with HEAD
 1.7.4.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.4.1 17-Oct-2013  yamt file ulfs_readwrite.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.22.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.22.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.23.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.24.6.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.24.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.25.4.1 29-Feb-2020  ad Sync with head.
 1.26.4.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.10 20-Jul-2013  dholland Collect the pieces of lfs rename into lfs_rename.c, and sprinkle static.
 1.9 19-Jun-2013  dholland branches: 1.9.2; 1.9.4; 1.9.6;
Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.8 08-Jun-2013  dholland ulfs_dir.h has been emptied; remove it.
 1.7 08-Jun-2013  dholland There is no WAPBL in LFS.
 1.6 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.5 08-Jun-2013  dholland struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.4 08-Jun-2013  dholland Stick LFS_ in front of IFMT, IFIFO, IFREG, etc. so as not to conflict
with the UFS copies of these symbols. (Which themselves ought to have
UFS_ stuck on.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.9.6.1 23-Jul-2013  riastradh sync with HEAD
 1.9.4.1 28-Aug-2013  rmind sync with head
 1.9.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.9.2.2 23-Jun-2013  tls resync from head
 1.9.2.1 19-Jun-2013  tls file ulfs_rename.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.4 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.3 14-Nov-2015  pgoyette Remove historic references to wapbl.
 1.2 08-Jun-2013  dholland branches: 1.2.2; 1.2.10; 1.2.14;
There is no WAPBL in LFS.
 1.1 06-Jun-2013  dholland Apparently we also need to cut and paste ffs_snapgone() in order to be
able to link the ufs code.

Instead of actually cutting and pasting it (as it depends on ffs-only
things) implement it as panic. Probably we'll be able to demonstrate
later that it's unreachable.

XXX: Someone should add snapgone to struct ufs_ops in ufs/ufsmount.h,
XXX: and fix ufs/ufs_lookup.c to not hardwire ffs.
 1.2.14.1 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.2.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.10.1 08-Jun-2013  yamt file ulfs_snapshot.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.2.2.3 03-Dec-2017  jdolecek update from HEAD
 1.2.2.2 23-Jun-2013  tls resync from head
 1.2.2.1 08-Jun-2013  tls file ulfs_snapshot.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.17 01-Nov-2025  perseant Create a new LFS inode flag, IN_DEAD, to indicate that a file's last
reference, other than those that come with VU_DIROP or IN_CLEANING and
the one the caller holds, has been dropped. Check and apply this flag
in lfs_orphan(), and call lfs_orphan() on close if the link count is
zero. Change the signature of lfs_orphan to facilitate.

Make test t_vfsops:lfs_tfhremove expect success.

Closes PR kern/43745.
 1.16 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.15 22-Dec-2019  ad branches: 1.15.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.
 1.14 10-Dec-2018  maxv Remove unused mbuf.h includes.
 1.13 17-Apr-2017  hannken branches: 1.13.10; 1.13.12;
Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.12 20-Jun-2016  dholland branches: 1.12.2; 1.12.4;
Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.11 15-Sep-2015  dholland Kill off the ulfs_direct_cache pool.
We no longer allocate temporary struct directs, so we don't need a
pool for them.
 1.10 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.9 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.8 08-Jun-2013  dholland branches: 1.8.2; 1.8.10; 1.8.14;
struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.7 06-Jun-2013  dholland Fix some exposed symbols:
LOSTFOUNDINO -> LFS_LOSTFOUNDINO
struct ufid -> struct ulfs_ufid
 1.6 06-Jun-2013  dholland Cleanups to reduce symbol and header exposure:
- move struct ufid from ulfs_inode.h to lfs.h
- lfs.h needs sys/mount.h and sys/pool.h
- ulfs_quota2_subr.c needs lfs_inode.h
- remove ulfs_inode.h from lfs.h in favor of ulfs_dinode.h
- move ULFS_NDADDR, ULFS_NIADDR, ULFS_NXADDR from ulfs_dinode.h to lfs.h
- remove ulfs_dinode.h from lfs.h
- add lfs.h to ulfs_dinode.h
 1.5 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.8.14.3 28-Aug-2017  skrll Sync with HEAD
 1.8.14.2 22-Sep-2015  skrll Sync with HEAD
 1.8.14.1 06-Jun-2015  skrll Sync with HEAD
 1.8.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.10.1 08-Jun-2013  yamt file ulfs_vfsops.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.8.2.3 03-Dec-2017  jdolecek update from HEAD
 1.8.2.2 23-Jun-2013  tls resync from head
 1.8.2.1 08-Jun-2013  tls file ulfs_vfsops.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.12.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.12.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.13.12.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.13.12.1 10-Jun-2019  christos Sync with HEAD
 1.13.10.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.15.2.1 17-Jan-2020  ad Sync with head.
 1.56 27-Mar-2022  christos add a kauth vnode check for creating links
 1.55 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted, for example).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), using a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.54 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.53 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.52 28-Oct-2017  pgoyette Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
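The format-string rules described in the 1.52 entry above can be sketched in userland C. This is only an illustration under stated assumptions: `struct hist_entry`, `hist_ptr()` and `hist_format()` are hypothetical names, not the kernhist(9) interface. It shows every argument stored as `uintmax_t`, numeric conversions carrying the `j` length modifier, and pointers logged with `%#jx` after a cast through `uintptr_t`.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record: every argument is stored as uintmax_t. */
struct hist_entry {
	const char *fmt;	/* format string, not a literal at the call site */
	uintmax_t   arg[2];
};

/* Pointers go through uintptr_t first to avoid size-mismatch warnings. */
static uintmax_t
hist_ptr(const void *p)
{
	return (uintmax_t)(uintptr_t)p;
}

/*
 * Because each argument has the same width, a non-literal format that
 * uses only 'j' conversions consumes exactly one slot per conversion.
 */
static int
hist_format(char *buf, size_t len, const struct hist_entry *e)
{
	return snprintf(buf, len, e->fmt, e->arg[0], e->arg[1]);
}
```

A caller would fill in an entry with a format such as `"ino=%ju vp=%#jx"`, store a counter in `arg[0]` and `hist_ptr(vp)` in `arg[1]`, and then call `hist_format()`.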
 1.51 07-Aug-2017  dholland Tidy up ufs_readdir. First step only; there's plenty more that could be
done to improve this code.
 1.50 04-Aug-2017  maya fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size

from dholland
XXX more wrong
 1.49 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.48 26-Apr-2017  riastradh branches: 1.48.4;
Change VOP_REMOVE and VOP_RMDIR to preserve lock/ref on dvp.

No change to vp -- the plan is to replace the node by the
componentname in the vop parameters, and let all directory vops do
lookups internally.

Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2017/04/17/msg021825.html
 1.47 11-Apr-2017  riastradh Sprinkle lock ownership assertions.
 1.46 30-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.45 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT

Replace some #if DEBUG by this too. DEBUG is only for expensive
assertions; these are not.
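The pattern in the 1.45 entry above, replacing an `#if DIAGNOSTIC` block that calls panic with a single assertion, can be sketched in userland C. `SKETCH_DIAGNOSTIC`, `SKETCH_KASSERT`, `sketch_panic()` and `blocks_needed()` are hypothetical stand-ins for illustration. They are not the kernel's definitions.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_DIAGNOSTIC 1	/* assumption: cheap checks are compiled in */

/* Stand-in for a kernel panic: report the failed check and abort. */
static void
sketch_panic(const char *expr, const char *file, int line)
{
	fprintf(stderr, "assertion \"%s\" failed: %s:%d\n", expr, file, line);
	abort();
}

#ifdef SKETCH_DIAGNOSTIC
#define SKETCH_KASSERT(e) \
	((e) ? (void)0 : sketch_panic(#e, __FILE__, __LINE__))
#else
#define SKETCH_KASSERT(e) ((void)0)
#endif

/*
 * Before: #ifdef SKETCH_DIAGNOSTIC / if (bsize == 0) panic / #endif.
 * After: one assertion line that vanishes when checks are disabled.
 */
static size_t
blocks_needed(size_t bytes, size_t bsize)
{
	SKETCH_KASSERT(bsize != 0);
	return (bytes + bsize - 1) / bsize;
}
```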
 1.44 20-Jun-2016  dholland branches: 1.44.2; 1.44.4;
One more batch of already-synced ufs changes:

ufs_extern.h 1.79 is equivalent to ulfs_extern.h 1.14
ufsmount.h 1.43 is (roughly) equivalent to lfs_extern.h 1.102
ufs_inode.c 1.94 does not apply to lfs
ufs_inode.c 1.95 does not apply to lfs either
ufs_readwrite.c 1.108 is equivalent to ulfs_readwrite.c 1.8
ufs_readwrite.c 1.109 is equivalent to ulfs_readwrite.c 1.9
ufs_readwrite.c 1.110 is equivalent to ulfs_readwrite.c 1.10
ufs_readwrite.c 1.111 does not apply to lfs
ufs_readwrite.c 1.112 is equivalent to ulfs_readwrite.c 1.11
ufs_readwrite.c 1.113 is equivalent to ulfs_readwrite.c 1.13
ufs_readwrite.c 1.114 is equivalent to ulfs_readwrite.c 1.14
ufs_readwrite.c 1.115 is equivalent to ulfs_readwrite.c 1.15
ufs_readwrite.c 1.116-1.118 does not apply to lfs
ufs_readwrite.c 1.119-1.120 are equivalent to ulfs_readwrite.c 1.16
ufs_rename.c 1.12 is equivalent to lfs_rename.c 1.8
ufs_vnops.c 1.226 is equivalent to ulfs_vnops.c 1.22 and lfs_vnops.c 1.270
ufs_vnops.c 1.227 is equivalent to ulfs_vnops.c 1.23
ufs_vnops.c 1.228-1.229 are equivalent to ulfs_vnops.c 1.24
ufs_vnops.c 1.230 is equivalent to ulfs_vnops.c 1.25 and lfs_vnops.c 1.271
ufs_vnops.c 1.231 originated in lfs
ufs_vnops.c 1.232 does not apply to lfs
 1.43 20-Jun-2016  dholland With the previous we seem to have the changes from -r1.225 of ufs_vnops.c.
(as that was stuff from moving ffs to the new vcache and lfs has also been
moved, this is not surprising)
 1.42 20-Jun-2016  dholland Merge (effectively) -r1.78 of ufs_extern.h: shift ulfs_makeinode to
lfs_vnops.c and make it file-static there, as that's the only place
it's used.
 1.41 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.40 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.39 19-Jun-2016  dholland we already have ufs_lookup.c 1.125 and ufs_vnops.c 1.218.
 1.38 19-Jun-2016  dholland note that we're synced with ufs_vnops.c -r1.217 and ufsmount.h -r1.41
(those changes removed lfs hooks from ufs so shouldn't be merged across)
 1.37 19-Jun-2016  dholland Merge -r1.216 of ufs_vnops.c: comments about maxsymlinklen handling
 1.36 19-Jun-2016  dholland Merge -r1.215 of ufs_vnops.c: the speed limit is 80
(-r1.214 was ffs-only)
 1.35 14-Nov-2015  pgoyette Remove historic references to wapbl.
 1.34 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
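The layout idea in the 1.34 entry above, a 64-bit inode number kept as two 32-bit values so the structure needs only 32-bit alignment, can be sketched as follows. This is a sketch under assumptions: `struct dirheader64`, its field names, the order of the two halves and the accessor names are hypothetical, and byte swapping is ignored. It is not the actual LFS64 definition.

```c
#include <stdint.h>

/* Hypothetical header: no member is wider than 32 bits. */
struct dirheader64 {
	uint32_t ino_lo;	/* low 32 bits of the inode number */
	uint32_t ino_hi;	/* high 32 bits of the inode number */
	uint16_t reclen;	/* record length */
	uint8_t  type;		/* entry type */
	uint8_t  namlen;	/* name length */
};

/* Accessors hide the split so callers only see a 64-bit value. */
static void
dir_set_ino(struct dirheader64 *dh, uint64_t ino)
{
	dh->ino_lo = (uint32_t)(ino & 0xffffffffu);
	dh->ino_hi = (uint32_t)(ino >> 32);
}

static uint64_t
dir_get_ino(const struct dirheader64 *dh)
{
	return ((uint64_t)dh->ino_hi << 32) | dh->ino_lo;
}
```

Keeping all access behind accessors like these is also what lets 32-bit and 64-bit variants of the entry share the same calling code.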
 1.33 21-Sep-2015  dholland Oops; LFS_DIRECTSIZ() is going to need the fs as an argument.

Also, it turns out that dirhash needs a compile-time-constant version
of LFS_DIRECTSIZ(LFS_MAXNAMLEN+1), independent of 64-vs-32, so create
LFS_MAXDIRENTRYSIZE for this. Sigh.
 1.32 15-Sep-2015  dholland Pass around struct lfs_dirheader instead of struct lfs_direct.
 1.31 15-Sep-2015  dholland Add an accessor function for directory names.
 1.30 15-Sep-2015  dholland Kill off ulfs_makedirentry; just pass the data to ulfs_direnter instead.
For now, move one copy of the code that allocates and fills in a
temporary struct lfs_direct to the top of ulfs_direnter; but it should
go away shortly.
 1.29 15-Sep-2015  dholland Add and use accessor functions for more of the directory entry fields.
 1.28 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.27 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.26 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.25 20-Apr-2015  riastradh Make VOP_LINK return directory still locked and referenced.

Ride 7.99.10 bump.
 1.24 20-Apr-2015  riastradh Fix more dvp->v_mount after vput(dvp).
 1.23 27-Mar-2015  riastradh Tighten some kasserts in ufs_bufio code paths.
 1.22 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.21 17-May-2014  dholland branches: 1.21.2; 1.21.6;
Move the ulfs-level (copy of ufs) vnops for symlink, create, and mkdir
into lfs_vnops.c preparatory to folding them into the lfs entry points.

(lfs_vnops.c now has four licenses. sigh.)
 1.20 23-Jan-2014  hannken branches: 1.20.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30
 1.19 17-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29
 1.18 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.17 28-Jul-2013  dholland Remove the now-pointless ulfs ops macros.
 1.16 28-Jul-2013  dholland Remove ulfsspec_close and ulfsfifo_close as they're not used.
 1.15 21-Jul-2013  dholland Merge logic from ulfs_close(), ulfs_getattr(), and ulfs_strategy()
into the preexisting lfs_*() versions of these functions, and delete
the unused ulfs copies.
 1.14 20-Jul-2013  dholland Remove ulfs_mknod, which is not used.
 1.13 08-Jun-2013  dholland branches: 1.13.2; 1.13.4; 1.13.6;
ulfs_dir.h has been emptied; remove it.
 1.12 08-Jun-2013  dholland There is no WAPBL in LFS.
 1.11 08-Jun-2013  dholland mp->mnt_wapbl and mp->mnt_wapbl_replay are always NULL in here.
 1.10 08-Jun-2013  dholland Merge -r1.213 of ufs_vnops.c:

Committed By: kardel
Date: Sat Jun 8 05:47:02 UTC 2013

fix clearing of system-flags (schg, sappnd). clearing system flags is
possible again at securelevel < 1.
reviewed by christos@
 1.9 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.8 08-Jun-2013  dholland struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.7 08-Jun-2013  dholland Stick LFS_ in front of IFMT, IFIFO, IFREG, etc. so as not to conflict
with the UFS copies of these symbols. (Which themselves ought to have
UFS_ stuck on.)
 1.6 06-Jun-2013  dholland Remove stray references to ext2fs, chfs, ffs, and mfs.
 1.5 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.13.6.1 23-Jul-2013  riastradh sync with HEAD
 1.13.4.2 18-May-2014  rmind sync with head
 1.13.4.1 28-Aug-2013  rmind sync with head
 1.13.2.4 03-Dec-2017  jdolecek update from HEAD
 1.13.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.13.2.2 23-Jun-2013  tls resync from head
 1.13.2.1 08-Jun-2013  tls file ulfs_vnops.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.20.2.1 10-Aug-2014  tls Rebase.
 1.21.6.6 28-Aug-2017  skrll Sync with HEAD
 1.21.6.5 09-Jul-2016  skrll Sync with HEAD
 1.21.6.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.21.6.3 22-Sep-2015  skrll Sync with HEAD
 1.21.6.2 06-Jun-2015  skrll Sync with HEAD
 1.21.6.1 06-Apr-2015  skrll Sync with HEAD
 1.21.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.21.2.1 17-May-2014  yamt file ulfs_vnops.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.44.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.44.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.44.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.48.4.2 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.48.4.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.6 08-Jun-2013  dholland G/C
 1.5 08-Jun-2013  dholland There is no WAPBL in LFS.
 1.4 06-Jun-2013  dholland Remove stray references to ext2fs, chfs, ffs, and mfs.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.5 08-Jun-2013  dholland G/C
 1.4 08-Jun-2013  dholland There is no WAPBL in LFS.
 1.3 08-Jun-2013  dholland mp->mnt_wapbl and mp->mnt_wapbl_replay are always NULL in here.
 1.2 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.18 20-Jun-2016  dholland One more batch of already-synced ufs changes:

ufs_extern.h 1.79 is equivalent to ulfs_extern.h 1.14
ufsmount.h 1.43 is (roughly) equivalent to lfs_extern.h 1.102
ufs_inode.c 1.94 does not apply to lfs
ufs_inode.c 1.95 does not apply to lfs either
ufs_readwrite.c 1.108 is equivalent to ulfs_readwrite.c 1.8
ufs_readwrite.c 1.109 is equivalent to ulfs_readwrite.c 1.9
ufs_readwrite.c 1.110 is equivalent to ulfs_readwrite.c 1.10
ufs_readwrite.c 1.111 does not apply to lfs
ufs_readwrite.c 1.112 is equivalent to ulfs_readwrite.c 1.11
ufs_readwrite.c 1.113 is equivalent to ulfs_readwrite.c 1.13
ufs_readwrite.c 1.114 is equivalent to ulfs_readwrite.c 1.14
ufs_readwrite.c 1.115 is equivalent to ulfs_readwrite.c 1.15
ufs_readwrite.c 1.116-1.118 does not apply to lfs
ufs_readwrite.c 1.119-1.120 are equivalent to ulfs_readwrite.c 1.16
ufs_rename.c 1.12 is equivalent to lfs_rename.c 1.8
ufs_vnops.c 1.226 is equivalent to ulfs_vnops.c 1.22 and lfs_vnops.c 1.270
ufs_vnops.c 1.227 is equivalent to ulfs_vnops.c 1.23
ufs_vnops.c 1.228-1.229 are equivalent to ulfs_vnops.c 1.24
ufs_vnops.c 1.230 is equivalent to ulfs_vnops.c 1.25 and lfs_vnops.c 1.271
ufs_vnops.c 1.231 originated in lfs
ufs_vnops.c 1.232 does not apply to lfs
 1.17 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.16 19-Jun-2016  dholland note that we're synced with ufs_vnops.c -r1.217 and ufsmount.h -r1.41
(those changes removed lfs hooks from ufs so shouldn't be merged across)
 1.15 19-Jun-2016  dholland Update the ufs versions these files are synced with by 1: the
201306016 commit by hannken@ that removed references to ffs_snapgone
in ufs doesn't need to be synced into lfs.
 1.14 15-Oct-2015  dholland Move stuff from struct ulfsmount to struct lfs.
 1.13 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.12 28-Jul-2013  dholland branches: 1.12.4; 1.12.8;
Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.11 28-Jul-2013  dholland Remove the now-pointless ulfs ops macros.
 1.10 28-Jul-2013  dholland Get rid of the ulfs_ops table as we only have one fs in here now.
 1.9 28-Jul-2013  dholland Improve comments in struct ulfsmount.
Also rearrange it to group related items together.
 1.8 28-Jul-2013  dholland Prune unused stuff from struct ulfsmount.
 1.7 08-Jun-2013  dholland branches: 1.7.2; 1.7.4;
Move stuff to lfs.h that's needed by userland:
LFS_DT_*
ULFS_ROOTINO
ULFS_WINO
struct lfs_direct
struct lfs_dirtemplate
struct lfs_odirtemplate
struct ulfs_args

Also fix FFS_MAXNAMLEN -> LFS_MAXNAMLEN in several places.
 1.6 06-Jun-2013  dholland Remove references to Apple UFS.
 1.5 06-Jun-2013  dholland Remove stray references to ext2fs, chfs, ffs, and mfs.
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.7.4.1 28-Aug-2013  rmind sync with head
 1.7.2.4 03-Dec-2017  jdolecek update from HEAD
 1.7.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.2.2 23-Jun-2013  tls resync from head
 1.7.2.1 08-Jun-2013  tls file ulfsmount.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.12.8.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.12.8.1 06-Jun-2015  skrll Sync with HEAD
 1.12.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.12.4.1 28-Jul-2013  yamt file ulfsmount.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.2 03-Jul-1999  thorpej Nuke unneeded include file.
 1.1 12-Jun-1998  cgd branches: 1.1.10;
Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.1.10.1 02-Aug-1999  thorpej Update from trunk.
 1.32 18-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.
 1.31 02-Mar-2010  pooka branches: 1.31.78;
Make mfs_initminiroot() mandatory. This allows removing #ifdef MFS.
 1.30 28-Jun-2008  rumble branches: 1.30.16;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.29 06-May-2008  ad branches: 1.29.2; 1.29.4;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
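
The lock ordering described above can be modelled in a few lines of single-threaded C. This is a toy sketch, not the kernel code: the toy_* types and functions are hypothetical stand-ins for krwlock_t, kmutex_t and the mount update path, and real locks would block or sleep where this model just returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy, single-threaded stand-ins for krwlock_t and kmutex_t. */
struct toy_rwlock { int readers; bool writer; };
struct toy_mutex { bool held; };

struct toy_mount {
	struct toy_rwlock mnt_unmounting;	/* write locked only by unmount */
	struct toy_mutex mnt_updating;		/* serializes mount updates */
};

static bool
toy_rw_tryenter_read(struct toy_rwlock *rw)
{
	if (rw->writer)
		return false;		/* unmount in progress */
	rw->readers++;
	return true;
}

static bool
toy_rw_tryenter_write(struct toy_rwlock *rw)
{
	if (rw->writer || rw->readers > 0)
		return false;
	rw->writer = true;
	return true;
}

/* Updates take a read hold on mnt_unmounting first, then mnt_updating. */
static bool
toy_mount_update_begin(struct toy_mount *mp)
{
	assert(mp != NULL);
	if (!toy_rw_tryenter_read(&mp->mnt_unmounting))
		return false;		/* fail instead of waiting */
	if (mp->mnt_updating.held) {	/* a real mutex would sleep here */
		mp->mnt_unmounting.readers--;
		return false;
	}
	mp->mnt_updating.held = true;
	return true;
}

/* Release in the reverse order: mnt_updating, then the read hold. */
static void
toy_mount_update_end(struct toy_mount *mp)
{
	assert(mp->mnt_updating.held && mp->mnt_unmounting.readers > 0);
	mp->mnt_updating.held = false;
	mp->mnt_unmounting.readers--;
}
```

While an update holds the read side, an unmount cannot take the write side; once the unmount holds it, new updates fail immediately, matching the "fail instead of being delayed" behaviour in the message.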
 1.28 26-Mar-2008  ad branches: 1.28.2; 1.28.4;
Changes for PR kern/38291 (panic unmounting MFS /tmp):

- Reference count the mfsnode to fix an ancient bug. Only destroy when
reference count drops to zero. In mfs_start(), busy the mount and get
a reference to the mfsnode to prevent it disappearing while the server
is running. If the file system is gone already, vfs_busy() will fail.
- Always destroy the bufq.
- Use a global mfs_lock for simplicity.
- Replace use of malloc/free. Fixes broken MALLOC_TYPE change.
 1.27 02-Aug-2007  pooka branches: 1.27.24; 1.27.26;
include assumed headers
 1.26 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.25 12-Jul-2007  dsl branches: 1.25.2;
Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.24 04-Mar-2007  christos branches: 1.24.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.23 14-May-2006  elad branches: 1.23.14;
integrate kauth.
 1.22 11-Dec-2005  christos branches: 1.22.4; 1.22.6; 1.22.8; 1.22.10; 1.22.12;
merge ktrace-lwp.
 1.21 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.20 20-May-2004  atatat branches: 1.20.12;
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.

This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.

linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.19 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.18 04-Dec-2003  atatat branches: 1.18.2;
Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.17 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.16 29-Jun-2003  fvdl branches: 1.16.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.15 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.14 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.13 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.12 15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.11 19-May-2000  thorpej branches: 1.11.6; 1.11.10; 1.11.12;
Back out previous change; there is something Seriously Wrong.
 1.10 16-May-2000  thorpej Redo the way MFS does I/O to the server's address space. Instead of
queueing up buffers and awakening the MFS server process to do the I/O,
we do the I/O to the server process's address space directly using
facilities provided by UVM.

This makes it possible for buffers attempting to flush out while the
MFS is being unmounted to actually do the I/O, where before it would
fail if the server process wasn't in the MFS idle loop (i.e. had been
signaled and was attempting to exit).

Should fix kern/10122 (I can no longer reproduce the problem described
in the PR when running with these changes), and any number of other
MFS-related complaints made by people over time.
 1.9 16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading.

For each leaf filesystem, add appropriate vfs_done routine.

Also remember how many times ffs_init() was called and do
the appropriate initialization on first call only. In ffs_done(),
destroy the resources when called by the last user of ffs code.
Change mfs to call ffs_init()/ffs_done() appropriately.
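
The "initialize on first call, destroy for the last user" pattern described for ffs_init()/ffs_done() can be sketched as follows. The names toy_fs_init, toy_fs_done and the two variables are hypothetical; the real functions manage kernel resources rather than a boolean.

```c
#include <assert.h>
#include <stdbool.h>

static int toy_init_count;		/* how many users called init */
static bool toy_resources_ready;	/* stands in for the shared resources */

/* Only the first caller actually sets up the shared resources. */
static void
toy_fs_init(void)
{
	if (toy_init_count++ > 0)
		return;
	toy_resources_ready = true;
}

/* Only the last user tears the shared resources down again. */
static void
toy_fs_done(void)
{
	assert(toy_init_count > 0);
	if (--toy_init_count > 0)
		return;
	toy_resources_ready = false;
}
```

With two users (say, one file system built on top of another), the resources survive the first toy_fs_done() call and are released by the second.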
 1.8 10-Aug-1998  matthias branches: 1.8.12;
create miscfs/genfs/genfs_vnops.c:genfs_enoioctl and make all the other
filesystems use it instead of a private version.
 1.7 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.6 22-Dec-1996  cgd Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.5 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.4 09-Feb-1996  christos mfs prototypes
 1.3 14-Dec-1994  mycroft Sync with CSRG.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. They're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.8.12.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.12.1 01-Oct-2001  fvdl Catch up with -current.
 1.11.10.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.11.6.2 11-Dec-2002  thorpej Sync with HEAD.
 1.11.6.1 21-Sep-2001  nathanw Catch up to -current.
 1.16.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.16.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.16.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.16.2.2 03-Aug-2004  skrll Sync with HEAD
 1.16.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.18.2.1 23-May-2004  tron Pull up revision 1.20 (requested by atatat in ticket #374):
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.20.12.2 03-Sep-2007  yamt sync with head.
 1.20.12.1 21-Jun-2006  yamt sync with head.
 1.22.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.22.10.1 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.22.8.1 24-May-2006  yamt sync with head.
 1.22.6.1 01-Jun-2006  kardel Sync with head.
 1.22.4.1 09-Sep-2006  rpaulo sync with head
 1.23.14.1 12-Mar-2007  rmind Sync with HEAD.
 1.24.2.2 20-Aug-2007  ad Sync with HEAD.
 1.24.2.1 15-Jul-2007  ad Sync with head.
 1.25.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.27.26.2 02-Aug-2007  pooka include assumed headers
 1.27.26.1 02-Aug-2007  pooka file mfs_extern.h was added on branch matt-mips64 on 2007-08-02 12:53:31 +0000
 1.27.24.3 29-Jun-2008  mjf Sync with HEAD.
 1.27.24.2 02-Jun-2008  mjf Sync with HEAD.
 1.27.24.1 03-Apr-2008  mjf Sync with HEAD.
 1.28.4.3 11-Mar-2010  yamt sync with head
 1.28.4.2 04-May-2009  yamt sync with head.
 1.28.4.1 16-May-2008  yamt sync with head.
 1.28.2.1 18-May-2008  yamt sync with head.
 1.29.4.1 03-Jul-2008  simonb Sync with head.
 1.29.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.30.16.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.31.78.1 01-Aug-2021  thorpej Sync with HEAD.
 1.1 02-Mar-2010  pooka branches: 1.1.2; 1.1.6;
Make mfs_initminiroot() mandatory. This allows removing #ifdef MFS.
 1.1.6.2 30-Apr-2010  uebayasi Sync with HEAD.
 1.1.6.1 02-Mar-2010  uebayasi file mfs_miniroot.c was added on branch uebayasi-xip on 2010-04-30 14:44:36 +0000
 1.1.2.2 11-Mar-2010  yamt sync with head
 1.1.2.1 02-Mar-2010  yamt file mfs_miniroot.c was added on branch yamt-nfs-mp on 2010-03-11 15:04:45 +0000
 1.117 16-Feb-2025  joe remove unnecessary branches
 1.116 19-Mar-2022  hannken branches: 1.116.10;
Remove now unused VV_LOCKSWORK, all file systems support locking.

Remove unused predicates vn_locked() and vn_anylocked().

Welcome to 9.99.95
 1.115 19-Mar-2022  hannken Switch MFS device node to real vnode locking, VV_LOCKSWORK now.
 1.114 16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.113 17-Apr-2017  hannken branches: 1.113.12;
Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.112 17-Apr-2017  hannken Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
 1.111 17-Feb-2017  hannken Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.110 17-Mar-2015  hannken branches: 1.110.2; 1.110.4;
Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.109 14-Jan-2015  hannken Change mfs to use an anonymous vnode obtained with bdevvp()
for the specdev it mounts on.
 1.108 08-May-2014  hannken branches: 1.108.4;
Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.
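
A minimal userland sketch of the lookup-or-load behaviour described above. Everything here is hypothetical (toy_vcache_get, toy_vcache_remove, the fixed-size table, the integer key); the real vcache_get() and vfs_loadvnode() have different signatures and also handle locking, which this single-threaded model omits.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 8

struct toy_vnode { int key; int data; int refcnt; int in_use; };

/* Hypothetical per-file-system hook, in the role of vfs_loadvnode(). */
typedef int (*toy_loadvnode_t)(int key, struct toy_vnode *vp);

static struct toy_vnode toy_cache[TOY_CACHE_SLOTS];
static int toy_load_calls;	/* counts cache misses, for illustration */

/* Return a referenced entry; call the load hook only on a cache miss. */
static struct toy_vnode *
toy_vcache_get(int key, toy_loadvnode_t load)
{
	struct toy_vnode *freeslot = NULL;

	assert(load != NULL);
	for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
		struct toy_vnode *vp = &toy_cache[i];
		if (vp->in_use && vp->key == key) {
			vp->refcnt++;		/* cache hit */
			return vp;
		}
		if (!vp->in_use && freeslot == NULL)
			freeslot = vp;
	}
	if (freeslot == NULL || load(key, freeslot) != 0)
		return NULL;			/* table full or load failed */
	freeslot->key = key;
	freeslot->in_use = 1;
	freeslot->refcnt = 1;
	return freeslot;
}

/* Drop the entry from the cache entirely. */
static void
toy_vcache_remove(struct toy_vnode *vp)
{
	vp->in_use = 0;
	vp->refcnt = 0;
}

/* Example load hook: fills in the fs-specific part of the pair. */
static int
toy_load(int key, struct toy_vnode *vp)
{
	toy_load_calls++;
	vp->data = key * 2;
	return 0;
}
```

Two lookups of the same key trigger a single load; after toy_vcache_remove() the next lookup loads again.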

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41
 1.107 16-Apr-2014  maxv An (un)privileged user can easily make the kernel dereference a NULL
pointer.

The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).

ok christos@
 1.106 23-Mar-2014  hannken branches: 1.106.2;
Change all vfsops to use C99 designated initializers.

No functional changes intended.
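
For readers unfamiliar with the idiom, C99 designated initializers look roughly like this. The structure and its members are a cut-down, hypothetical stand-in (toy_vfsops, toyfs_*), not the real struct vfsops.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down, hypothetical operations table. */
struct toy_vfsops {
	const char *vfs_name;
	int (*vfs_mount)(const char *path);
	int (*vfs_unmount)(int flags);
	int (*vfs_sync)(void);
};

static int
toyfs_mount(const char *path)
{
	return path != NULL ? 0 : -1;
}

static int
toyfs_unmount(int flags)
{
	(void)flags;
	return 0;
}

/*
 * Each member is named explicitly, so the initializer does not depend on
 * member order, and anything omitted (vfs_sync here) is zero-initialized.
 * The positional form would be { "toyfs", toyfs_mount, toyfs_unmount, NULL }.
 */
static const struct toy_vfsops toyfs_vfsops = {
	.vfs_name = "toyfs",
	.vfs_mount = toyfs_mount,
	.vfs_unmount = toyfs_unmount,
};
```

Because members are matched by name, adding or reordering fields in the structure does not silently shift function pointers into the wrong slots.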
 1.105 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.104 23-Nov-2013  christos change the mountlist CIRCLEQ into a TAILQ
 1.103 12-Jun-2011  rmind branches: 1.103.2; 1.103.8; 1.103.12; 1.103.14; 1.103.16; 1.103.22;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.102 02-Mar-2010  pooka branches: 1.102.2; 1.102.8;
Make mfs_initminiroot() mandatory. This allows removing #ifdef MFS.
 1.101 13-Jan-2009  yamt branches: 1.101.4;
g/c BUFQ_FOO() macros and use bufq_foo() directly.
 1.100 19-Dec-2008  pgoyette Store config(1)'s root filesystem type as a text string rather than
embedding the address of its xxx_mountroot() in swapnetbsd.c. This
permits booting of kernels with hard-wired filesystem type even if the
filesystem is in a loadable module (ie, not linked into the kernel
image).

Discussed on current-users. Tested on amd64 and i386 with both hard-
wired and '?' filesystem types, and on both modular and monolithic
kernels.

Thanks to pooka@ for code review and suggestions.

Addresses my PR kern/40167
 1.99 13-Nov-2008  ad These depend on ffs.
 1.98 28-Jun-2008  rumble branches: 1.98.2; 1.98.4; 1.98.6; 1.98.12; 1.98.16;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.97 10-May-2008  rumble branches: 1.97.2;
Convert file systems to dynamically attach with the new module interface.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.

As a consequence, most of the file systems can now be loaded as new style
modules.

Quick sanity check by ad@.
 1.96 06-May-2008  ad branches: 1.96.2;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.95 30-Apr-2008  ad PR kern/38135 vfs_busy/vfs_trybusy confusion

The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
 1.94 29-Apr-2008  ad kern/38135 vfs_busy/vfs_trybusy confusion

The symptom was that sometimes file systems would occasionally not appear
in output from 'df' or 'mount' if the system was busy. Resolution:

- Make mount locks work somewhat like vm_map locks.
- vfs_trybusy() now only fails if the mount is gone, or if someone is
unmounting the file system. Simple contention on mnt_lock doesn't
cause it to fail.
- vfs_busy() will wait even if the file system is being unmounted.
 1.93 29-Apr-2008  ad PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.92 24-Apr-2008  ad branches: 1.92.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
 1.91 26-Mar-2008  ad branches: 1.91.2;
Changes for PR kern/38291 (panic unmounting MFS /tmp):

- Reference count the mfsnode to fix an ancient bug. Only destroy when
reference count drops to zero. In mfs_start(), busy the mount and get
a reference to the mfsnode to prevent it disappearing while the server
is running. If the file system is gone already, vfs_busy() will fail.
- Always destroy the bufq.
- Use a global mfs_lock for simplicity.
- Replace use of malloc/free. Fixes broken MALLOC_TYPE change.
 1.90 21-Feb-2008  ad branches: 1.90.4;
Make MFS MP-safe. Needed because of the funny tricks it plays.
 1.89 30-Jan-2008  ad branches: 1.89.2;
PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
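
The reference chain in 1.89 (vnode to mount, mount to vfsops) can be sketched as follows. This is a single-threaded illustration with hypothetical names (`rc_vfsops`, `rc_mount`, `rc_mount_*`); real kernel code would need atomic or locked counter updates.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; only the reference counts are modeled. */
struct rc_vfsops {
	int refcnt;
};

struct rc_mount {
	int refcnt;
	struct rc_vfsops *ops;
};

/* Create a mount holding one reference; the mount references its vfsops. */
static struct rc_mount *
rc_mount_create(struct rc_vfsops *ops)
{
	struct rc_mount *mp = malloc(sizeof(*mp));

	if (mp == NULL)
		return NULL;
	mp->refcnt = 1;
	mp->ops = ops;
	ops->refcnt++;
	return mp;
}

/* Each vnode associated with the mount would take a reference like this. */
static void
rc_mount_ref(struct rc_mount *mp)
{
	mp->refcnt++;
}

/*
 * Drop a reference. The mount is destroyed only when the count reaches
 * zero, at which point it gives up its vfsops reference.
 * Returns 1 if the mount was freed, 0 otherwise.
 */
static int
rc_mount_rele(struct rc_mount *mp)
{
	if (--mp->refcnt > 0)
		return 0;
	mp->ops->refcnt--;
	free(mp);
	return 1;
}
```

With this shape, a forced unmount can drop its own reference while outstanding vnodes keep the structure alive until the last one is released.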
 1.88 28-Jan-2008  dholland Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.87 25-Jan-2008  pooka spec_node_init() mfs device vnode.

fixes PR kern/37867 by Steve Woodford
 1.86 24-Jan-2008  ad specfs changes for PR kern/37717 (raidclose() is no longer called on
shutdown). There are still problems with device access and a PR will be
filed.

- Kill checkalias(). Allow multiple vnodes to reference a single device.

- Don't play dangerous tricks with block vnodes to ensure that only one
vnode can describe a block device. Instead, prohibit concurrent opens of
block devices. As a bonus remove the unreliable code that prevents
multiple file system mounts on the same device. It's no longer needed.

- Track opens by vnode and by device. Issue cdev_close() when the last open
goes away, instead of abusing vnode::v_usecount to tell if the device is
open.
 1.85 26-Nov-2007  pooka Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.84 10-Oct-2007  ad branches: 1.84.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.83 31-Jul-2007  pooka branches: 1.83.2; 1.83.4; 1.83.6; 1.83.8;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.82 26-Jul-2007  pooka Use eopnotsupp() instead of vfs_stdsuspendctl() and retire the latter.
 1.81 17-Jul-2007  pooka branches: 1.81.2;
Make set_statvfs_info() take a parameter for the vfs name instead
of always retrieving it from mp->mnt_op->vfs_name

christos ok
 1.80 12-Jul-2007  dsl Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.79 30-Jun-2007  pooka Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.78 04-Mar-2007  christos branches: 1.78.2; 1.78.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.77 09-Feb-2007  ad branches: 1.77.2;
Merge newlock2 to head.
 1.76 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.75 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.74 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.73 02-Sep-2006  christos branches: 1.73.2; 1.73.4;
add missing initializers
 1.72 15-Apr-2006  christos From my posting of April 3 to tech-kern:

My understanding is that the CLRSIG() is supposed to clear the signal
that was sent to the syncer process to prevent it from being delivered
to the syncer process in case unmounting fails, so that the syncer process
does not die while the filesystem is still mounted. The typical scenario
is, the syncer process is tsleep()ing in the kernel, and waking up when
it needs to do work. If someone sends a signal to it, eg. kill -TERM
the mfs process, then the kernel will try to unmount the mfs filesystem
before delivering the signal to the process. If that unmount fails, then
we should not really kill the process because that will hang the mount.
So we call CLRSIG() to stop the signal from being delivered.

So the first call to issignal() will return the signal number that was
sent to the syncer process (unless someone malicious was able to send
a lower numbered signal between the time tsleep() returned and we called
issignal()... something that is not really easy to do). But you are
right, we should not be calling it many times as a side effect of this
macro.

Rewrite CLRSIG() to clear all the signals and call issignal() the correct
number of times.
 1.71 11-Dec-2005  christos branches: 1.71.4; 1.71.6; 1.71.8; 1.71.10; 1.71.12;
merge ktrace-lwp.
 1.70 15-Oct-2005  yamt - change the way to specify a bufq strategy. (by string rather than by number)
- rather than embedding bufq_state in driver softc,
have a pointer to the former.
- move bufq related functions from kern/subr_disk.c to kern/subr_bufq.c.
- rename method to strategy for consistency.
- move some definitions which don't need to be exposed to the rest of kernel
from sys/bufq.h to sys/bufq_impl.h.
(is it better to move it to kern/ or somewhere?)
- fix some obvious breakage in dev/qbus/ts.c. (not tested)
 1.69 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.68 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.67 29-May-2005  christos branches: 1.67.2;
- sprinkle const
- avoid shadow variables.
 1.66 29-Mar-2005  thorpej - Define a VFS_ATTACH() macro that places a reference to a vfsops structure
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
 1.65 26-Feb-2005  perry nuke trailing whitespace
 1.64 09-Jan-2005  mycroft branches: 1.64.2; 1.64.4;
Rework the mountroot interface so that vfs_mountroot() opens the root device
and just passes it on to the file system functions. This avoids opening and
closing the device several times.

Mentioned on tech-kern some time ago, IIRC. I've been running this for a
long time.
 1.63 02-Jan-2005  thorpej Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
 1.62 28-Oct-2004  yamt move buffer queue related stuffs from buf.h to their own header, bufq.h.
 1.61 05-Jul-2004  pk Call inittodr() from main(). Let file system code set the recorded `last
update' time (if any) through the new function setrootfstime().
 1.60 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.59 25-May-2004  atatat Sysctl descriptions under vfs subtree
 1.58 21-Apr-2004  christos similar fix to enami's in the fstypename field. Not really needed, but better
safe than sorry.
 1.57 21-Apr-2004  enami Don't copy past the end of destination array boundary; the size of source
array changed due to recent statvfs change.
 1.56 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.55 24-Mar-2004  atatat branches: 1.55.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.54 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.53 14-Oct-2003  dbj add mnt_iflag field to struct mount for internal flags
mv MNT_GONE, MNT_UNMOUNT and MNT_WANTRDWR to this field
additionally add mnt_writeopcountupper and mnt_writeopcountlower fields
in preparation for pending write suspension support work
bump kernel version to 1.6ZD
 1.52 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.51 29-Jun-2003  fvdl branches: 1.51.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.50 28-Jun-2003  bouyer Adapt for struct proc* -> struct lwp* changes.
 1.49 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.48 22-Apr-2003  christos fix lkm malloc lossage.
 1.47 22-Apr-2003  christos choose the smaller size of the two strings when memcpy'ing them. A better
fix would be to strncpy and null terminate? From enami.
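
The two approaches mentioned in 1.47 (copy the smaller of the two sizes, or copy and NUL-terminate) can be combined in one helper. This is a minimal sketch; `copy_name_bounded` is a hypothetical name and not the function the commit touched.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy a name from a fixed-size source array into a fixed-size
 * destination array. Never reads past srcsize bytes, never writes past
 * dstsize bytes, and always NUL-terminates when dstsize > 0.
 */
static void
copy_name_bounded(char *dst, size_t dstsize, const char *src, size_t srcsize)
{
	const char *end;
	size_t len, n;

	if (dstsize == 0)
		return;
	/* Length of the source string, bounded by the source array size. */
	end = memchr(src, '\0', srcsize);
	len = (end != NULL) ? (size_t)(end - src) : srcsize;
	/* Copy the smaller of the two, leaving room for the terminator. */
	n = (len < dstsize - 1) ? len : dstsize - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
}
```

Truncating silently is a design choice here; a caller that must detect truncation could compare the source length against `dstsize - 1` before copying.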
 1.46 16-Apr-2003  christos PR/1796: John Kohl: statfs misbehaves under chrooted environments.

- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While I was there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
 1.45 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.44 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.43 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.42 24-Oct-2002  chs the work-around in rev. 1.37 (turn off async) wasn't enough to prevent
hangs under heavy load. so we now apply the more extreme version:
make MFS mounts "sync". fixes PRs 17128 and 17321.
 1.41 21-Sep-2002  christos MNT_GETARGS support
 1.40 21-Jul-2002  hannken Rename bufq_init() to bufq_alloc().
Add bufq_free() to remove a buffer queue.
Avoid MALLOC while holding a spinlock.

From Chuck Silvers.
 1.39 19-Jul-2002  hannken Convert to new device buffer queue interface.
 1.38 04-Mar-2002  simonb branches: 1.38.2; 1.38.6; 1.38.8;
Don't use local extern declarations for the mountroot variable or
declare local prototypes for nfs_mountroot() or md_root_setconf().
 1.37 03-Feb-2002  chs fix PR 15299 by making MFS filesystems not be "async".
in the longer term, MFS needs to be made a lot more VM-friendly.
 1.36 08-Nov-2001  lukem add RCSID
 1.35 15-Sep-2001  chs branches: 1.35.2;
add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.34 30-May-2001  mrg branches: 1.34.4; 1.34.6;
use _KERNEL_OPT
 1.33 16-Apr-2001  thorpej When unmounting a file system, acquire the syncer_lock before
vfs_busy'ing just before the dounmount() call. This is to avoid
sleeping with the mountlist_slock held -- but we must acquire
syncer_lock before vfs_busy because the syncer itself uses
syncer_lock -> vfs_busy locking order.
 1.32 24-Feb-2001  cgd branches: 1.32.2;
fix bug (pointed out as sequence point violation warning with current-ish gcc)
caused by use of makedev(major,minor++). makedev() now (since 32-bit
dev_t conversion) evaluates its second argument twice.
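
The hazard fixed in 1.32 can be illustrated with a macro that mentions its second argument twice. `MAKEDEV_SKETCH` below is a hypothetical macro loosely modeled on a split minor-number layout, not the real makedev(); the counting helper is there only to make the double evaluation observable without invoking undefined behaviour.

```c
#include <assert.h>

/* Hypothetical macro: the minor argument appears twice in the expansion. */
#define MAKEDEV_SKETCH(maj, min) \
	((((maj) & 0xfffu) << 8) | ((min) & 0xffu) | (((min) & 0xfff00u) << 12))

static unsigned eval_count;

/* Returns v unchanged and counts how many times it was evaluated. */
static unsigned
counted(unsigned v)
{
	eval_count++;
	return v;
}

/*
 * Safe pattern: read the minor number into a local, advance the caller's
 * counter as a separate statement, then pass the side-effect-free local
 * to the macro. Passing (*minor)++ directly would modify the object twice
 * without an intervening sequence point.
 */
static unsigned
alloc_dev_safe(unsigned major, unsigned *minor)
{
	unsigned m = *minor;

	(*minor)++;
	return MAKEDEV_SKETCH(major, m);
}
```

Calling `MAKEDEV_SKETCH(1, counted(5))` evaluates `counted` twice, which is harmless for a pure function but is exactly what breaks an argument such as `minor++`.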
 1.31 22-Jan-2001  jdolecek make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.30 13-Oct-2000  simonb Position comment correctly wrt last commit.
 1.29 13-Oct-2000  simonb In mfs_start(), move the handling of outstanding I/O requests to before
the check for unmounting the filesystem.

Appears to fix kern/10122 from Hitoshi Matsunawa.
 1.28 19-May-2000  thorpej branches: 1.28.4;
Back out previous change; there is something Seriously Wrong.
 1.27 16-May-2000  thorpej Redo the way MFS does I/O to the server's address space. Instead of
queueing up buffers and awakening the MFS server process to do the I/O,
we do the I/O to the server process's address space directly using
facilities provided by UVM.

This makes it possible for buffers attempting to flush out while the
MFS is being unmounted to actually do the I/O, where before it would
fail if the server process wasn't in the MFS idle loop (i.e. had been
signaled and was attempting to exit).

Should fix kern/10122 (I can no longer reproduce the problem described
in the PR when running with these changes), and any number of other
MFS-related complaints made by people over time.
 1.26 16-May-2000  thorpej Record the proc directly, not the pid, of the MFS server process,
and nuke the spare fields in the mfsnode.
 1.25 30-Mar-2000  augustss Remove register declarations.
 1.24 29-Mar-2000  simonb Remove redundant decl of rootvp - it's in <sys/systm.h>.
 1.23 16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading.

For each leaf filesystem, add appropriate vfs_done routine.

Also remember how many times ffs_init() was called and do
the appropriate initialization on first call only. In ffs_done(),
destroy the resources when called by the last user of ffs code.
Change mfs to call ffs_init()/ffs_done() appropriately.
 1.22 21-Jan-2000  thorpej Update for sys/buf.h/disksort_*() changes.
 1.21 17-Jul-1999  wrstuden branches: 1.21.2; 1.21.8;
Adjust mountroot routines to vrele rootvp in case of mount error. Closes
PR 7977 by Neil Carson, <neil@brini.com>.
 1.20 04-Apr-1999  mycroft It was silly to not make this exportable.
 1.19 26-Feb-1999  wrstuden branches: 1.19.2; 1.19.4;
Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.18 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.17 05-Jul-1998  jonathan * defopt COMPAT_{09,10,11,12,13} and COMPAT_NOMID.
TODO: revisit interaction between native compat and emul compat usage.
 1.16 01-Mar-1998  fvdl Remove accidentally enabled mfs_mountroot from vfsops struct.
 1.15 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.14 18-Feb-1998  thorpej Place a pointer to an array of our vnodeopv_desc *'s in our vfsops
structure, for use by vfs_attach().
 1.13 12-Jun-1997  mrg remove swap configuration.
 1.12 22-Feb-1997  fvdl Implement similar fix as in the NQNFS fix from BSDI, to avoid race conditions
when unmounting. It cleans up the loop a bit too.
 1.11 22-Dec-1996  cgd branches: 1.11.6;
Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.10 09-Feb-1996  christos mfs prototypes
 1.9 01-Sep-1995  mycroft Do any pending I/O before trying to unmount, per John Kohl.
 1.8 18-Jun-1995  cgd don't assume the f_fsnamelen is nul-truncated or longer than MFSNAMELEN
 1.7 09-Mar-1995  mycroft copy*str() should use size_t.
 1.6 08-Mar-1995  cgd size for copyinstr should be u_long
 1.5 18-Jan-1995  mycroft Clean up the code to frob mnt_stat a bit.
 1.4 18-Jan-1995  mycroft Turn mountlist into a CIRCLEQ, and handle setting and checking of MNT_ROOTFS
differently.
 1.3 15-Dec-1994  mycroft Call foo_statfs() from a common place when mounting.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.11.6.1 12-Mar-1997  is Merge in changes from Trunk
 1.19.4.2 02-Aug-1999  thorpej Update from trunk.
 1.19.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.19.2.1 13-Oct-2000  he Pull up revisions 1.29-1.30 (via patch, requested by simonb):
Move handling of outstanding I/O requests to before the check for
unmounting the file system. Fixes PR#10122.
 1.21.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.21.2.4 21-Apr-2001  bouyer Sync with HEAD
 1.21.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.21.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.21.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.28.4.1 17-Oct-2000  tv Pullup 1.29 and 1.30 [simonb]:
In mfs_start(), move the handling of outstanding I/O requests to before
the check for unmounting the filesystem.

Appears to fix kern/10122 from Hitoshi Matsunawa.
 1.32.2.12 11-Nov-2002  nathanw Catch up to -current
 1.32.2.11 18-Oct-2002  nathanw Catch up to -current.
 1.32.2.10 01-Aug-2002  nathanw Catch up to -current.
 1.32.2.9 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.32.2.8 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.32.2.7 17-Apr-2002  nathanw Catch up to -current.
 1.32.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.32.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.32.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.32.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.32.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.32.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.34.6.1 01-Oct-2001  fvdl Catch up with -current.
 1.34.4.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototill work
 1.34.4.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.34.4.3 16-Mar-2002  jdolecek Catch up with -current.
 1.34.4.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.34.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.35.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.38.8.1 01-Nov-2002  lukem Pull up revision 1.42 (requested by tron in ticket #941):
the work-around in rev. 1.37 (turn off async) wasn't enough to prevent
hangs under heavy load. so we now apply the more extreme version:
make MFS mounts "sync". fixes PRs 17128 and 17321.
 1.38.6.2 29-Aug-2002  gehenna catch up with -current.
 1.38.6.1 20-Jul-2002  gehenna catch up with -current.
 1.38.2.1 11-Mar-2002  thorpej Make syncer_lock an adaptive mutex and rename it to syncer_mutex.
 1.51.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.51.2.9 01-Apr-2005  skrll Sync with HEAD.
 1.51.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.51.2.7 17-Jan-2005  skrll Sync with HEAD.
 1.51.2.6 02-Nov-2004  skrll Sync with HEAD.
 1.51.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.51.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.51.2.3 03-Aug-2004  skrll Sync with HEAD
 1.51.2.2 02-Jul-2003  wrstuden Check in lwp-ification changes needed to get the evbarm/IQ80321 kernel
to compile.

only question I have is over the:
l->l_proc->p_stats->p_ru.ru_msgsnd++;
command at line 245 of dev/kttcp.c. Should we be doing per-lwp or
per-proc accounting?
 1.51.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.55.2.1 29-May-2004  tron Pull up revision 1.59 (requested by atatat in ticket #393):
Sysctl descriptions under vfs subtree
 1.64.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.64.2.1 29-Apr-2005  kent sync with -current
 1.67.2.8 27-Feb-2008  yamt sync with head.
 1.67.2.7 04-Feb-2008  yamt sync with head.
 1.67.2.6 07-Dec-2007  yamt sync with head
 1.67.2.5 27-Oct-2007  yamt sync with head.
 1.67.2.4 03-Sep-2007  yamt sync with head.
 1.67.2.3 26-Feb-2007  yamt sync with head.
 1.67.2.2 30-Dec-2006  yamt sync with head.
 1.67.2.1 21-Jun-2006  yamt sync with head.
 1.71.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.71.10.1 19-Apr-2006  elad sync with head.
 1.71.8.2 03-Sep-2006  yamt sync with head.
 1.71.8.1 24-May-2006  yamt sync with head.
 1.71.6.1 22-Apr-2006  simonb Sync with head.
 1.71.4.1 09-Sep-2006  rpaulo sync with head
 1.73.4.2 10-Dec-2006  yamt sync with head.
 1.73.4.1 22-Oct-2006  yamt sync with head
 1.73.2.5 05-Feb-2007  ad - When clearing signals dequeue siginfo first and free later, once
outside the lock perimeter.
- Push kernel_lock back in a couple of places.
- Adjust limcopy() to be MP safe (this needs redoing).
- Fix a couple of bugs noticed along the way.
- Catch up with condvar changes.
 1.73.2.4 01-Feb-2007  ad Sync with head.
 1.73.2.3 18-Nov-2006  ad Sync with head.
 1.73.2.2 21-Oct-2006  ad Checkpoint work in progress on locking and per-LWP signals. Very much a
work in progress and there is still a lot to do.
 1.73.2.1 11-Sep-2006  ad - Convert some locks to mutexes and RW locks.
- Use the proclist_lock to protect pgrps and sessions in some places.
 1.77.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.78.4.1 11-Jul-2007  mjf Sync with head.
 1.78.2.4 20-Aug-2007  ad Sync with HEAD.
 1.78.2.3 29-Jul-2007  ad Add vfs_destroy() to free mount structures. The specificdata_ref was being
leaked.
 1.78.2.2 15-Jul-2007  ad Sync with head.
 1.78.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.81.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.83.8.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.83.8.1 31-Jul-2007  pooka file mfs_vfsops.c was added on branch matt-mips64 on 2007-07-31 21:14:21 +0000
 1.83.6.1 14-Oct-2007  yamt sync with head.
 1.83.4.3 23-Mar-2008  matt sync with HEAD
 1.83.4.2 09-Jan-2008  matt sync with HEAD
 1.83.4.1 06-Nov-2007  matt sync with HEAD
 1.83.2.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.83.2.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.84.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.84.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.89.2.1 24-Mar-2008  keiichi sync with head.
 1.90.4.5 17-Jan-2009  mjf Sync with HEAD.
 1.90.4.4 29-Jun-2008  mjf Sync with HEAD.
 1.90.4.3 02-Jun-2008  mjf Sync with HEAD.
 1.90.4.2 03-Apr-2008  mjf Sync with HEAD.
 1.90.4.1 21-Feb-2008  mjf file mfs_vfsops.c was added on branch mjf-devfs2 on 2008-04-03 12:43:14 +0000
 1.91.2.1 18-May-2008  yamt sync with head.
 1.92.2.3 11-Mar-2010  yamt sync with head
 1.92.2.2 04-May-2009  yamt sync with head.
 1.92.2.1 16-May-2008  yamt sync with head.
 1.96.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.96.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.97.2.1 03-Jul-2008  simonb Sync with head.
 1.98.16.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.98.12.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.98.6.1 25-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.98.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.98.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.101.4.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.102.8.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.102.2.1 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.103.22.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
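The first three checks described in this pull-up can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the code from vfs_syscalls.c: the function name, the `max_len` and `fs_needs_data` parameters, and the choice of EINVAL are hypothetical, and the fourth point (freeing with the originally allocated size) is not covered.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: validate user-supplied mount arguments before
 * any buffer is allocated. Returns 0 if acceptable, else an errno value.
 */
static int
sketch_check_mount_args(const void *data, size_t data_len, size_t max_len,
    int fs_needs_data)
{
	assert(max_len > 0);	/* the caller must supply a real bound */

	/* A huge length must never reach the allocator. */
	if (data_len > max_len)
		return EINVAL;
	/* Data with a zero length would mean a zero-sized allocation. */
	if (data != NULL && data_len == 0)
		return EINVAL;
	/* NULL data is allowed in general; a fs that needs it rejects it. */
	if (fs_needs_data && data == NULL)
		return EINVAL;
	return 0;
}
```

In this sketch a NULL `data` with a zero length passes for a file system that takes no mount arguments, which matches the note that the kernel itself allows 'data' to be NULL.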
 1.103.16.1 18-May-2014  rmind sync with head
 1.103.14.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.103.12.2 03-Dec-2017  jdolecek update from HEAD
 1.103.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.103.8.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.103.2.1 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.106.2.1 10-Aug-2014  tls Rebase.
 1.108.4.2 28-Aug-2017  skrll Sync with HEAD
 1.108.4.1 06-Apr-2015  skrll Sync with HEAD
 1.110.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.110.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.110.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.113.12.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.116.10.1 02-Aug-2025  perseant Sync with HEAD
 1.64 19-Mar-2022  hannken Switch MFS device node to real vnode locking, VV_LOCKSWORK now.
 1.63 18-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.
 1.62 29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
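The default behaviour described above (consume everything up to the first '/') can be sketched in C as below. The function name is hypothetical and this is not the genfs_parsepath source; it only illustrates that the string alone is enough to decide how much of the path to consume.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: report how many leading bytes of `path` form
 * the next component, i.e. everything before the first '/' (or the
 * whole string if there is none).
 */
static size_t
sketch_parsepath(const char *path)
{
	size_t len = 0;

	assert(path != NULL);
	while (path[len] != '\0' && path[len] != '/')
		len++;
	return len;
}
```

A file system wanting different rules would substitute its own function with the same shape, which appears to be the point of routing the decision through a vnode op.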
 1.61 16-May-2020  christos branches: 1.61.6;
Add ACL support for FFS. From FreeBSD.
 1.60 13-Apr-2020  ad Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.
 1.59 20-Feb-2019  hannken branches: 1.59.10;
Remove superfluous VOP_UNLOCK(), vnode will be unlocked from spec_reclaim().
 1.58 26-May-2017  riastradh branches: 1.58.10;
Make VOP_RECLAIM do the last unlock of the vnode.

VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.

We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
 1.57 11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.56 14-Jan-2015  hannken branches: 1.56.2; 1.56.4;
Change mfs to use an anonymous vnode obtained with bdevvp()
for the specdev it mounts on.
 1.55 25-Jul-2014  dholland branches: 1.55.4;
Add VOP_FALLOCATE and VOP_FDISCARD to every vnode ops table I can
find.

The filesystem ones all call genfs_eopnotsupp - right now I am only
implementing the plumbing and we can implement fallocate and/or
fdiscard for files later.

The device ones call spec_fallocate (which is also genfs_eopnotsupp)
and spec_fdiscard, which dispatches to the device-level op.

The fifo ones all call vn_fifo_bypass, which also ends up being
EOPNOTSUPP.
 1.54 24-Jun-2010  hannken branches: 1.54.18; 1.54.32;
Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.53 13-Jan-2009  yamt branches: 1.53.4; 1.53.6;
g/c BUFQ_FOO() macros and use bufq_foo() directly.
 1.52 02-Jun-2008  christos branches: 1.52.6;
Revert to using specfs_fsync(); using a do-nothing mfs_fsync() does not work
because the filesystem cannot be unmounted since ffs_fsync() will loop forever
trying to empty the v_dirtyblkhd list.
 1.51 07-May-2008  ad mfs doesn't need fsync.
 1.50 06-May-2008  ad branches: 1.50.2;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.49 26-Mar-2008  ad branches: 1.49.2; 1.49.4;
Changes for PR kern/38291 (panic unmounting MFS /tmp):

- Reference count the mfsnode to fix an ancient bug. Only destroy when
reference count drops to zero. In mfs_start(), busy the mount and get
a reference to the mfsnode to prevent it disappearing while the server
is running. If the file system is gone already, vfs_busy() will fail.
- Always destroy the bufq.
- Use a global mfs_lock for simplicity.
- Replace use of malloc/free. Fixes broken MALLOC_TYPE change.
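The reference-counting rule in the first item (destroy only when the count drops to zero) can be sketched in C as follows. All names here are hypothetical stand-ins rather than the mfsnode code, and the locking that the commit provides through the global mfs_lock is deliberately left out, so the sketch is single-threaded only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the mfsnode; only the count matters here. */
struct sketch_node {
	int refcnt;
};

static int sketch_destroyed;	/* counts destructions, for illustration */

static struct sketch_node *
sketch_node_create(void)
{
	struct sketch_node *n = malloc(sizeof(*n));

	if (n != NULL)
		n->refcnt = 1;	/* the creator holds the first reference */
	return n;
}

/* Take an extra reference, e.g. while a server loop is running. */
static void
sketch_node_hold(struct sketch_node *n)
{
	assert(n->refcnt > 0);
	n->refcnt++;
}

/* Drop one reference; free the node only when the last one goes away. */
static void
sketch_node_rele(struct sketch_node *n)
{
	assert(n->refcnt > 0);
	if (--n->refcnt == 0) {
		sketch_destroyed++;
		free(n);
	}
}
```

With a hold taken before a long-running user of the node starts, an unmount that drops its own reference in the meantime no longer frees the node from under that user.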
 1.48 21-Feb-2008  ad branches: 1.48.4;
Make MFS MP-safe. Needed because of the funny tricks it plays.
 1.47 17-Jan-2008  ad branches: 1.47.2;
mfs_close: remove a broken assertion.
 1.46 26-Nov-2007  pooka branches: 1.46.6;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.45 29-Jul-2007  ad branches: 1.45.4; 1.45.6; 1.45.12; 1.45.14;
It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.44 04-Mar-2007  christos branches: 1.44.2; 1.44.10;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.43 14-May-2006  elad branches: 1.43.14;
integrate kauth.
 1.42 11-Dec-2005  christos branches: 1.42.4; 1.42.6; 1.42.8; 1.42.10; 1.42.12;
merge ktrace-lwp.
 1.41 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.40 15-Oct-2005  yamt branches: 1.40.2;
- change the way to specify a bufq strategy. (by string rather than by number)
- rather than embedding bufq_state in driver softc,
have a pointer to the former.
- move bufq related functions from kern/subr_disk.c to kern/subr_bufq.c.
- rename method to strategy for consistency.
- move some definitions which don't need to be exposed to the rest of kernel
from sys/bufq.h to sys/bufq_impl.h.
(is it better to move it to kern/ or somewhere?)
- fix some obvious breakage in dev/qbus/ts.c. (not tested)
 1.39 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.38 26-Feb-2005  perry branches: 1.38.4;
nuke trailing whitespace
 1.37 28-Oct-2004  yamt branches: 1.37.4; 1.37.6;
move buffer queue related stuff from buf.h to its own header, bufq.h.
 1.36 26-Jan-2004  hannken Fix mfs_strategy() to use the vp argument.
From YAMAMOTO Takashi <yamt@netbsd.org>.
 1.35 28-Dec-2003  dbj use symbolic V_SAVE instead of value 1 when invoking vinvalbuf
 1.34 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.33 29-Jun-2003  fvdl branches: 1.33.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.32 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.31 25-Sep-2002  thorpej Don't include <sys/map.h>.
 1.30 19-Jul-2002  hannken Convert to new device buffer queue interface.
 1.29 06-Dec-2001  chs branches: 1.29.8;
add a VOP_PUTPAGES method for all the filesystems that don't have pages,
just unlock the interlock.
 1.28 08-Nov-2001  lukem add RCSID
 1.27 22-Jan-2001  jdolecek branches: 1.27.2; 1.27.6; 1.27.8; 1.27.10;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.26 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.25 09-Oct-2000  thorpej Make sure to set the residual count to 0 after a miniroot access
or after bitbucketing I/O during shutdown.
 1.24 11-Jun-2000  sommerfeld Bitbucket MFS I/O after vfs_shutdown has started..
 1.23 19-May-2000  thorpej branches: 1.23.2;
Back out previous change; there is something Seriously Wrong.
 1.22 16-May-2000  thorpej Redo the way MFS does I/O to the server's address space. Instead of
queueing up buffers and awakening the MFS server process to do the I/O,
we do the I/O to the server process's address space directly using
facilities provided by UVM.

This makes it possible for buffers attempting to flush out while the
MFS is being unmounted to actually do the I/O, where before it would
fail if the server process wasn't in the MFS idle loop (i.e. had been
signaled and was attempting to exit).

Should fix kern/10122 (I can no longer reproduce the problem described
in the PR when running with these changes), and any number of other
MFS-related complaints made by people over time.
 1.21 16-May-2000  thorpej Record the proc directly, not the pid, of the MFS server process,
and nuke the spare fields in the mfsnode.
 1.20 30-Mar-2000  augustss Remove register declarations.
 1.19 21-Jan-2000  thorpej Update for sys/buf.h/disksort_*() changes.
 1.18 01-Oct-1999  mycroft branches: 1.18.2; 1.18.8;
Fix printf() formats.
 1.17 03-Jul-1999  thorpej Nuke unneeded include file.
 1.16 15-Mar-1999  chs branches: 1.16.2; 1.16.4;
if an mfs i/o is successful, set b_resid to 0.
this allows the vnd driver to work on mfs files.
 1.15 10-Aug-1998  matthias create miscfs/genfs/genfs_vnops.c:genfs_enoioctl and make all the other
filesystems use it instead of a private version.
 1.14 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.13 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.12 12-Oct-1996  christos revert previous kprintf changes
 1.11 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.10 07-Sep-1996  mycroft Implement poll(2).
 1.9 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.8 17-Mar-1996  christos Fix printf format strings
 1.7 21-Feb-1996  cgd in mfs_print: mfs_baseoff is a pointer, should be printed as %p, and
should NOT be cast to unsigned int.
 1.6 09-Feb-1996  christos mfs prototypes
 1.5 14-Dec-1994  mycroft Sync with CSRG.
 1.4 30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.3 29-Jun-1994  cgd New RCS ID's, take two. They're more aesthetically pleasant, and use 'NetBSD'
 1.2 18-Jun-1994  cgd kill #ifdefs for vax/tahoe w/old vm
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.16.4.1 02-Aug-1999  thorpej Update from trunk.
 1.16.2.1 10-Oct-1999  cgd pull up rev 1.18 from trunk (requested by mycroft):
Fix potential overflow of v_usecount and v_writecount (and panics
resulting from this) by widening them to `long'. Mostly affects
systems where maxvnodes>=32768.
 1.18.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.18.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.18.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.18.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.23.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.27.10.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.27.8.2 18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.27.8.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.27.6.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.27.6.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.27.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.27.2.7 18-Oct-2002  nathanw Catch up to -current.
 1.27.2.6 01-Aug-2002  nathanw Catch up to -current.
 1.27.2.5 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.27.2.4 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
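The distinction between referencing and dereferencing curproc can be seen in a small user-space C sketch. The macro is written here with balanced parentheses; the struct layouts, the static `curlwp` variable and the `sketch_curpid` helper are hypothetical stand-ins for the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures. */
struct proc { int p_pid; };
struct lwp { struct proc *l_proc; };

/* Stand-in for the kernel's notion of the current LWP. */
static struct lwp *curlwp;

/* Evaluating this is always safe, even when there is no current LWP. */
#define curproc ((curlwp) ? (curlwp)->l_proc : NULL)

/* Dereferencing is the caller's problem: check for NULL first. */
static int
sketch_curpid(void)
{
	struct proc *p = curproc;

	return (p != NULL) ? p->p_pid : -1;
}
```

Here `curproc` simply yields NULL while `curlwp` is unset, and the helper returns -1 in that case instead of following the pointer.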
 1.27.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.27.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.27.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.29.8.1 20-Jul-2002  gehenna catch up with -current.
 1.33.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.33.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.33.2.5 02-Nov-2004  skrll Sync with HEAD.
 1.33.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.33.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.33.2.2 03-Aug-2004  skrll Sync with HEAD
 1.33.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.37.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.37.4.1 29-Apr-2005  kent sync with -current
 1.38.4.5 27-Feb-2008  yamt sync with head.
 1.38.4.4 21-Jan-2008  yamt sync with head
 1.38.4.3 07-Dec-2007  yamt sync with head
 1.38.4.2 03-Sep-2007  yamt sync with head.
 1.38.4.1 21-Jun-2006  yamt sync with head.
 1.40.2.1 20-Oct-2005  yamt adapt ufs.
 1.42.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.42.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.42.8.1 24-May-2006  yamt sync with head.
 1.42.6.1 01-Jun-2006  kardel Sync with head.
 1.42.4.1 09-Sep-2006  rpaulo sync with head
 1.43.14.1 12-Mar-2007  rmind Sync with HEAD.
 1.44.10.1 15-Aug-2007  skrll Sync with HEAD.
 1.44.2.2 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.44.2.1 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.45.14.2 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.45.14.1 29-Jul-2007  ad file mfs_vnops.c was added on branch matt-mips64 on 2007-07-29 13:31:17 +0000
 1.45.12.2 18-Feb-2008  mjf Sync with HEAD.
 1.45.12.1 08-Dec-2007  mjf Sync with HEAD.
 1.45.6.2 23-Mar-2008  matt sync with HEAD
 1.45.6.1 09-Jan-2008  matt sync with HEAD
 1.45.4.1 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.46.6.1 19-Jan-2008  bouyer Sync with HEAD
 1.47.2.1 24-Mar-2008  keiichi sync with head.
 1.48.4.3 17-Jan-2009  mjf Sync with HEAD.
 1.48.4.2 03-Apr-2008  mjf Sync with HEAD.
 1.48.4.1 21-Feb-2008  mjf file mfs_vnops.c was added on branch mjf-devfs2 on 2008-04-03 12:43:14 +0000
 1.49.4.3 11-Aug-2010  yamt sync with head.
 1.49.4.2 04-May-2009  yamt sync with head.
 1.49.4.1 16-May-2008  yamt sync with head.
 1.49.2.2 04-Jun-2008  yamt sync with head
 1.49.2.1 18-May-2008  yamt sync with head.
 1.50.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.52.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.53.6.1 03-Jul-2010  rmind sync with head
 1.53.4.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.54.32.1 10-Aug-2014  tls Rebase.
 1.54.18.2 03-Dec-2017  jdolecek update from HEAD
 1.54.18.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.55.4.2 28-Aug-2017  skrll Sync with HEAD
 1.55.4.1 06-Apr-2015  skrll Sync with HEAD
 1.56.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.56.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.58.10.2 21-Apr-2020  martin Sync with HEAD
 1.58.10.1 10-Jun-2019  christos Sync with HEAD
 1.59.10.1 20-Apr-2020  bouyer Sync with HEAD
 1.61.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.3 03-Jul-1999  thorpej Nuke unneeded include file.
 1.2 29-Jun-1994  cgd branches: 1.2.30;
New RCS ID's, take two. They're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.2.30.1 02-Aug-1999  thorpej Update from trunk.
 1.22 18-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.
 1.21 26-Mar-2008  ad branches: 1.21.108;
Changes for PR kern/38291 (panic unmounting MFS /tmp):

- Reference count the mfsnode to fix an ancient bug. Only destroy when
reference count drops to zero. In mfs_start(), busy the mount and get
a reference to the mfsnode to prevent it disappearing while the server
is running. If the file system is gone already, vfs_busy() will fail.
- Always destroy the bufq.
- Use a global mfs_lock for simplicity.
- Replace use of malloc/free. Fixes broken MALLOC_TYPE change.
 1.20 21-Feb-2008  ad branches: 1.20.4;
Make MFS MP-safe. Needed because of the funny tricks it plays.
 1.19 04-Mar-2007  christos branches: 1.19.16; 1.19.32;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.18 11-Dec-2005  christos branches: 1.18.26;
merge ktrace-lwp.
 1.17 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.16 15-Oct-2005  yamt branches: 1.16.2;
- change the way to specify a bufq strategy. (by string rather than by number)
- rather than embedding bufq_state in driver softc,
have a pointer to the former.
- move bufq related functions from kern/subr_disk.c to kern/subr_bufq.c.
- rename method to strategy for consistency.
- move some definitions which don't need to be exposed to the rest of kernel
from sys/bufq.h to sys/bufq_impl.h.
(is it better to move it to kern/ or somewhere?)
- fix some obvious breakage in dev/qbus/ts.c. (not tested)
 1.15 09-Nov-2004  yamt branches: 1.15.12;
- hide bufq_state in mfsnode from userland.
- move bufq.h into obsolete set.

tested to compile pkgsrc/sysutils/lsof.
 1.14 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.13 01-Dec-2002  matt branches: 1.13.6;
Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.12 19-Jul-2002  hannken Convert to new device buffer queue interface.
 1.11 06-Dec-2001  chs branches: 1.11.8;
add a VOP_PUTPAGES method for all the filesystems that don't have pages,
just unlock the interlock.
 1.10 19-May-2000  thorpej branches: 1.10.6; 1.10.10;
Back out previous change; there is something Seriously Wrong.
 1.9 16-May-2000  thorpej Redo the way MFS does I/O to the server's address space. Instead of
queueing up buffers and awakening the MFS server process to do the I/O,
we do the I/O to the server process's address space directly using
facilities provided by UVM.

This makes it possible for buffers attempting to flush out while the
MFS is being unmounted to actually do the I/O, where before it would
fail if the server process wasn't in the MFS idle loop (i.e. had been
signaled and was attempting to exit).

Should fix kern/10122 (I can no longer reproduce the problem described
in the PR when running with these changes), and any number of other
MFS-related complaints made by people over time.
 1.8 16-May-2000  thorpej Record the proc directly, not the pid, of the MFS server process,
and nuke the spare fields in the mfsnode.
 1.7 21-Jan-2000  thorpej Update for sys/buf.h/disksort_*() changes.
 1.6 01-Mar-1998  fvdl branches: 1.6.14;
Merge with Lite2 + local changes
 1.5 07-Sep-1996  mycroft Implement poll(2).
 1.4 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.3 09-Feb-1996  christos mfs prototypes
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.6.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.10.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.10.10.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.6.3 11-Dec-2002  thorpej Sync with HEAD.
 1.10.6.2 01-Aug-2002  nathanw Catch up to -current.
 1.10.6.1 08-Jan-2002  nathanw Catch up to -current.
 1.11.8.1 20-Jul-2002  gehenna catch up with -current.
 1.13.6.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.13.6.4 14-Nov-2004  skrll Sync with HEAD.
 1.13.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.13.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.13.6.1 03-Aug-2004  skrll Sync with HEAD
 1.15.12.3 27-Feb-2008  yamt sync with head.
 1.15.12.2 03-Sep-2007  yamt sync with head.
 1.15.12.1 21-Jun-2006  yamt sync with head.
 1.16.2.1 20-Oct-2005  yamt adapt ufs.
 1.18.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.19.32.1 24-Mar-2008  keiichi sync with head.
 1.19.16.1 23-Mar-2008  matt sync with HEAD
 1.20.4.2 03-Apr-2008  mjf Sync with HEAD.
 1.20.4.1 21-Feb-2008  mjf file mfsnode.h was added on branch mjf-devfs2 on 2008-04-03 12:43:14 +0000
 1.21.108.1 01-Aug-2021  thorpej Sync with HEAD.
 1.7 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.6 31-Jul-2008  simonb branches: 1.6.16; 1.6.22; 1.6.24;
Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.5 11-Dec-2005  christos branches: 1.5.32; 1.5.70; 1.5.74; 1.5.76; 1.5.78; 1.5.80;
merge ktrace-lwp.
 1.4 28-Aug-2005  thorpej Experimental support for extended attributes on UFS1 file systems, using a
backing file per attribute type indexed by inode number to hold the extended
attributes.

This is working pretty well on my test systems, except for the "autostart"
feature. I need someone with a better handle on the VFS locking protocol
to go over that.

This is a work-in-progress. There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.

Adapted from FreeBSD.
 1.3 24-Jan-2005  rumble branches: 1.3.8;
Remove dirhash.h.
 1.2 23-Jan-2005  rumble Bring in Ian Dowse's Dirhash from FreeBSD. Hash tables of
directories are created on the fly and used to increase
performance by circumventing ufs_lookup's linear search.

Dirhash is enabled by the UFS_DIRHASH option, but not
by default.
 1.1 12-Jun-1998  cgd branches: 1.1.50; 1.1.58;
Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.1.58.1 29-Apr-2005  kent sync with -current
 1.1.50.2 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.50.1 24-Jan-2005  skrll Sync with HEAD.
 1.3.8.1 21-Jun-2006  yamt sync with head.
 1.5.80.1 19-Oct-2008  haad Sync with HEAD.
 1.5.78.1 28-Jul-2008  simonb Add support for creating a WAPBL log in the filesystem. Will
create an in-filesystem log on first "mount -o log" if one doesn't
exist, and will then continue to use same log in the future. See
(soon to be added) wapbl(4) for more info.

Adds a new B_CONTIG low-level allocation flag that uses hints in
"struct ffs_inode_ext" to lay out an ffs file's data contiguously.

Thanks to Greg Oster for helping with the design of this and to
Antti Kantee for code review and suggestions.
 1.5.76.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.5.74.1 04-May-2009  yamt sync with head.
 1.5.70.1 28-Sep-2008  mjf Sync with HEAD.
 1.5.32.1 30-Mar-2007  mjf Add initial implementation of transaction API.
 1.6.24.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.
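
The layout described above (a header with a free list and a hash table, and entries chained by file offset) might be pictured like this. It is a sketch under stated assumptions: the struct names, field names, field widths and `DEMO_HASH_SIZE` are all hypothetical and do not reproduce the actual on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HASH_SIZE 4	/* assumed; the real size depends on block size */

/* Hypothetical header at offset 0 of the quota file. */
struct demo_qheader {
	uint64_t free_head;		/* offset of first free entry, 0 if none */
	uint64_t hash[DEMO_HASH_SIZE];	/* chain heads for in-use entries */
};

/* Hypothetical quota entry; entries are linked by offset within the file. */
struct demo_qentry {
	uint64_t next;		/* offset of next entry, 0 ends the chain */
	uint32_t id;		/* uid or gid */
	uint32_t pad;
	uint64_t limit;
};

/* Walk the hash chain for id; return the entry's file offset, or 0. */
static uint64_t
demo_qlookup(const uint8_t *file, size_t size, uint32_t id)
{
	struct demo_qheader h;
	struct demo_qentry e;
	uint64_t off;

	assert(size >= sizeof(h));
	memcpy(&h, file, sizeof(h));
	for (off = h.hash[id % DEMO_HASH_SIZE]; off != 0; off = e.next) {
		if (off > size - sizeof(e))
			return 0;	/* offset out of range: treat as not found */
		memcpy(&e, file + off, sizeof(e));
		if (e.id == id)
			return off;
	}
	return 0;
}
```

Using offsets rather than pointers keeps the structure position-independent, so the same walk works on a buffer read straight from disk.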

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) supports are completed (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.6.22.1 06-Jun-2011  jruoho Sync with HEAD.
 1.6.16.1 21-Apr-2011  rmind sync with head
 1.2 30-Jan-2023  andvar s/isses/issues/
 1.1 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.1 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.25 22-Jan-2016  dholland u_int{16,32,64}_t -> uint{16,32,64}_t, for the benefit of userland.
 1.24 09-Jun-2013  dholland branches: 1.24.10;
Remove lfs-only inumber field (and its supporting union) from struct
ufs1_dinode.
 1.23 09-Jun-2013  dholland Get rid of this copy of the accessor macro for di_u.inumber; it breaks
the build now.
 1.22 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.21 28-Jun-2009  ad branches: 1.21.12; 1.21.22;
+/*
+ * NOTE: COORDINATE ON-DISK FORMAT CHANGES WITH THE FREEBSD PROJECT.
+ */
 1.20 12-May-2009  ad Add di_modrev to the inode, for NFSv4. From FreeBSD.
 1.19 11-Dec-2005  christos branches: 1.19.74; 1.19.90;
merge ktrace-lwp.
 1.18 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.17 02-Apr-2003  fvdl branches: 1.17.2;
Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.16 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.15 06-Jan-2003  wiz writable, not writeable.
 1.14 28-Sep-2002  dbj Add support for the Apple UFS variation on ffs
This is the bulk of PR #17345

The general approach is to use a run time determinable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
 1.13 27-Jul-2001  lukem multiple include protection
 1.12 15-Nov-1999  fvdl branches: 1.12.4; 1.12.6; 1.12.10;
Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.11 03-Aug-1999  drochner branches: 1.11.2; 1.11.4; 1.11.8;
clean up inclusion of "opt_ffs.h" and use of "FFS_EI" a bit
 1.10 08-Jul-1999  wrstuden Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
 1.9 23-Oct-1998  thorpej branches: 1.9.8;
Define a symbolic constant to represent the size of a dinode.
 1.8 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.7 15-Jun-1995  cgd compensate for timeval/timespec/stat structure changes.
 1.6 21-Dec-1994  mycroft Add RCS ids where missing.
 1.5 14-Dec-1994  mycroft Sync with CSRG.
 1.4 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.3 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.2 14-Jun-1994  mycroft Fix compatibility with old fastlinks.
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.9.8.1 02-Aug-1999  thorpej Update from trunk.
 1.11.8.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.11.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.11.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.11.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.10.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.12.10.1 03-Aug-2001  lukem update to -current
 1.12.6.3 07-Jan-2003  thorpej Sync with HEAD.
 1.12.6.2 18-Oct-2002  nathanw Catch up to -current.
 1.12.6.1 24-Aug-2001  nathanw Catch up with -current.
 1.12.4.1 25-Nov-2001  he Pull up revision 1.13 (requested by lukem):
Multiple include protection.
 1.17.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.17.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.17.2.1 03-Aug-2004  skrll Sync with HEAD
 1.19.90.2 23-Jul-2009  jym Sync with HEAD.
 1.19.90.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.19.74.2 18-Jul-2009  yamt sync with head.
 1.19.74.1 16-May-2009  yamt sync with head
 1.21.22.3 03-Dec-2017  jdolecek update from HEAD
 1.21.22.2 23-Jun-2013  tls resync from head
 1.21.22.1 25-Feb-2013  tls resync with head
 1.21.12.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.21.12.1 23-Jan-2013  yamt sync with head
 1.24.10.1 19-Mar-2016  skrll Sync with HEAD
 1.27 05-May-2019  christos Add more comments to explain what we are doing.
 1.26 05-May-2019  christos Zero out all the dirent padding not just one byte, to avoid kernel memory
disclosure (from https://svnweb.freebsd.org/base?view=revision&revision=347066)
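
The idea of clearing all padding rather than a single byte can be illustrated as below. This is a sketch only: `struct demo_dirent`, `demo_dirsiz()` and `demo_set_name()` are hypothetical, and the 4-byte rounding is an assumption modelled on a typical UFS-style directory record, not a copy of the committed code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAXNAMLEN 255

/* Hypothetical directory record with an 8-byte fixed header. */
struct demo_dirent {
	uint32_t d_ino;
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
	char     d_name[DEMO_MAXNAMLEN + 1];
};

/* Record size for a name of namlen bytes: header + name + NUL, rounded up to 4. */
static size_t
demo_dirsiz(size_t namlen)
{
	return offsetof(struct demo_dirent, d_name) +
	    ((namlen + 1 + 3) & ~(size_t)3);
}

/* Store the name and zero every byte after it up to the rounded size. */
static void
demo_set_name(struct demo_dirent *de, const char *name)
{
	size_t namlen = strlen(name);
	size_t used, total;

	assert(namlen <= DEMO_MAXNAMLEN);
	used = offsetof(struct demo_dirent, d_name) + namlen;
	total = demo_dirsiz(namlen);

	memcpy(de->d_name, name, namlen);
	/* Clear the terminator and all alignment padding, not just one byte. */
	memset((char *)de + used, 0, total - used);
	de->d_namlen = (uint8_t)namlen;
	de->d_reclen = (uint16_t)total;
}
```

If only the terminating byte were cleared, the remaining padding bytes would carry whatever the buffer previously held when the record is copied out.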
 1.25 01-Sep-2015  dholland branches: 1.25.10; 1.25.18;
Pull over comments on struct direct's type/reclen byte swapping from LFS.
 1.24 19-Jun-2013  dholland branches: 1.24.10;
Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.23 09-Jun-2013  dholland Stick UFS_ in front of these symbols:
DIRBLKSIZ
DIRECTSIZ
DIRSIZ
OLDDIRFMT
NEWDIRFMT

Part of PR 47909.
 1.22 07-Jun-2013  dholland typo in comment
 1.21 22-Jul-2009  dholland branches: 1.21.12; 1.21.22;
typo in comment
 1.20 11-Dec-2005  christos branches: 1.20.74; 1.20.90;
merge ktrace-lwp.
 1.19 23-Aug-2005  christos Don't overload MAXNAMLEN, use a separate constant for each filesystem type.
 1.18 19-Aug-2005  christos now that we've changed the _DIRENT_ALIGN macro, provide a d_fileno for struct
direct
 1.17 07-Aug-2003  agc branches: 1.17.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.16 19-Apr-2003  christos branches: 1.16.2;
PR/2996: Greg Oster: Fix comment to match reality. Names are padded to
four bytes, but the bytes are not null.
 1.15 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.14 28-Sep-2002  dbj Add support for the Apple UFS variation on ffs
This is the bulk of PR #17345

The general approach is to use a run time determinable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
 1.13 06-Feb-2002  lukem #undef DIRBLKSIZ before #define-ing it
 1.12 31-Jan-2002  tv #undef MAXNAMLEN before defining it; this lets ufs/ufs/dir.h be used
properly on non-NetBSD hosts with makefs(8).
 1.11 16-Nov-2001  lukem move code to calculate size of direct for a given namlen to separate
DIRECTSIZ() macro, and use this to implement a (now shorter) DIRSIZ().
inspired by freebsd
 1.10 18-Mar-1998  bouyer branches: 1.10.20; 1.10.26; 1.10.30;
Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time, other metadata are byteswapped
when used (as they are accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
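
The "swap when used" approach for metadata read directly from the buffer cache might look like the sketch below. The names `demo_bswap32()`, `demo_rw32()` and the `needswap` flag are hypothetical illustrations and are not taken from ffs_bswap.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t
demo_bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	    ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Return a metadata field in host byte order.  needswap would be derived
 * from a per-mount flag saying the file system is in non-native order.
 */
static uint32_t
demo_rw32(uint32_t v, bool needswap)
{
	return needswap ? demo_bswap32(v) : v;
}

/* Sanity helper: swapping twice must give back the original value. */
static void
demo_check_roundtrip(uint32_t v)
{
	assert(demo_bswap32(demo_bswap32(v)) == v);
}
```

Because the swap is its own inverse, the same accessor can serve both reads and writes of a field.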
 1.9 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.8 09-Mar-1996  scottr Remove hack to work around <sys/dirent.h> DIRSIZ conflict
 1.7 29-Feb-1996  gwr Need to #undef DIRSIZ from <sys/dirent.h> before we redefine it.
 1.6 15-Jun-1995  cgd compensate for timeval/timespec/stat structure changes.
 1.5 21-Dec-1994  mycroft Add RCS ids where missing.
 1.4 14-Dec-1994  mycroft Sync with CSRG.
 1.3 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.10.30.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.10.30.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.10.30.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.26.4 11-Dec-2002  thorpej Sync with HEAD.
 1.10.26.3 18-Oct-2002  nathanw Catch up to -current.
 1.10.26.2 28-Feb-2002  nathanw Catch up to -current.
 1.10.26.1 08-Jan-2002  nathanw Catch up to -current.
 1.10.20.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.16.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.16.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.16.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.16.2.1 03-Aug-2004  skrll Sync with HEAD
 1.17.16.1 21-Jun-2006  yamt sync with head.
 1.20.90.1 23-Jul-2009  jym Sync with HEAD.
 1.20.74.1 19-Aug-2009  yamt sync with head.
 1.21.22.2 03-Dec-2017  jdolecek update from HEAD
 1.21.22.1 23-Jun-2013  tls resync from head
 1.21.12.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.24.10.1 22-Sep-2015  skrll Sync with HEAD
 1.25.18.1 10-Jun-2019  christos Sync with HEAD
 1.25.10.1 02-Mar-2020  martin Additionally pull up the following revisions, to fix build fallout from
ticket #1511:

src/sys/ufs/ufs/dir.h 1.26
sys/ufs/ufs/ufs_lookup.c 1.149

Zero out all the dirent padding not just one byte, to avoid kernel memory
disclosure (from https://svnweb.freebsd.org/base?view=revision&revision=347066)
 1.9 19-Aug-2021  andvar s/memry/memory+s/softare/software/+s/grapics/graphics+s/ouput/output
 1.8 27-Dec-2019  msaitoh s/inital/initial/
 1.7 09-Jun-2013  dholland branches: 1.7.34;
Stick UFS_ in front of these symbols:
DIRBLKSIZ
DIRECTSIZ
DIRSIZ
OLDDIRFMT
NEWDIRFMT

Part of PR 47909.
 1.6 04-Jun-2008  ad branches: 1.6.32; 1.6.42;
- Tidy up the locking a bit.
- Use atomics/kmem_alloc/pool_cache.
 1.5 09-Jul-2007  ad branches: 1.5.28; 1.5.30; 1.5.32; 1.5.34;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.4 11-Dec-2005  christos branches: 1.4.30; 1.4.32;
merge ktrace-lwp.
 1.3 23-Aug-2005  christos Don't overload MAXNAMLEN, use a separate constant for each filesystem type.
 1.2 26-Feb-2005  perry branches: 1.2.4; 1.2.6;
nuke trailing whitespace
 1.1 23-Jan-2005  rumble branches: 1.1.2; 1.1.4;
Bring in Ian Dowse's Dirhash from FreeBSD. Hash tables of
directories are created on the fly and used to increase
performance by circumventing ufs_lookup's linear search.

Dirhash is enabled by the UFS_DIRHASH option, but not
by default.
 1.1.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.2.3 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.1.2.2 24-Jan-2005  skrll Sync with HEAD.
 1.1.2.1 23-Jan-2005  skrll file dirhash.h was added on branch ktrace-lwp on 2005-01-24 08:36:05 +0000
 1.2.6.2 03-Sep-2007  yamt sync with head.
 1.2.6.1 21-Jun-2006  yamt sync with head.
 1.2.4.2 29-Apr-2005  kent sync with -current
 1.2.4.1 26-Feb-2005  kent file dirhash.h was added on branch kent-audio2 on 2005-04-29 11:29:39 +0000
 1.4.32.1 11-Jul-2007  mjf Sync with head.
 1.4.30.1 13-Apr-2007  ad Enable the dirhash locking, and add some comments from FreeBSD.
 1.5.34.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.5.32.1 04-May-2009  yamt sync with head.
 1.5.30.1 17-Jun-2008  yamt sync with head.
 1.5.28.1 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.6.42.1 23-Jun-2013  tls resync from head
 1.6.32.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.34.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.12 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.11 19-Dec-2014  manu branches: 1.11.18; 1.11.28;
Bump UFS1 extended attribute max name length to 256

For extended attribute name max length, kernel filesystem-independent
code uses either EXTATTR_MAXNAMELEN (BSD API) or XATTR_NAME_MAX (Linux API),
which are both defined as KERNEL_NAME_MAX and fit the Linux limit of 255
without trailing \0.

UFS1 code had a lower limit that broke Linux compatibility. We can bump
the limit without sacrificing backward compatibility, because:

1) There is no API exposing this limit outside the kernel. Upper kernel
layers have a larger limit and handle the increase without a hitch

2) Each attribute has its own backing store in the filesystem, the name
of the backing store matching the attribute name. A newer kernel can
create/read/write backing store for longer attribute names and will
have no problem with existing shorter names.
 1.10 09-Oct-2011  chs branches: 1.10.8; 1.10.12; 1.10.28; 1.10.30;
add forward declarations for the VOP args structures
so that fstat can include this file.
 1.9 17-Jun-2011  manu Add mount -o extattr option to enable extended attributes (currently only
for UFS1).
Remove kernel option for EA backing store autocreation and do it by
default. Add a sysctl so that autocreated attribute size can be modified.
 1.8 30-Jan-2008  ad branches: 1.8.42;
Replace use of lockmgr.
 1.7 25-Jan-2008  pooka Destroy extattr lock when destroying extattrs associated with the
mountpoint. Make stopping extattrs always successful to facilitate
always being able to free resources.
 1.6 26-Nov-2007  pooka Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.5 30-Jun-2007  pooka branches: 1.5.6; 1.5.8; 1.5.14;
Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.4 14-May-2006  elad branches: 1.4.6; 1.4.20; 1.4.22;
integrate kauth.
 1.3 23-Dec-2005  rpaulo branches: 1.3.4; 1.3.6; 1.3.8; 1.3.10; 1.3.12;
Convert UFS_EXTATTR to struct lwp.
 1.2 11-Dec-2005  christos merge ktrace-lwp.
 1.1 28-Aug-2005  thorpej branches: 1.1.6;
Experimental support for extended attributes on UFS1 file systems, using a
backing file per attribute type indexed by inode number to hold the extended
attributes.

This is working pretty well on my test systems, except for the "autostart"
feature. I need someone with a better handle on the VFS locking protocol
to go over that.

This is a work-in-progress. There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.

Adapted from FreeBSD.
 1.1.6.2 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.6.1 28-Aug-2005  skrll file extattr.h was added on branch ktrace-lwp on 2005-11-10 14:12:39 +0000
 1.3.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.3.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.3.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.3.8.1 24-May-2006  yamt sync with head.
 1.3.6.1 01-Jun-2006  kardel Sync with head.
 1.3.4.1 09-Sep-2006  rpaulo sync with head
 1.4.22.1 11-Jul-2007  mjf Sync with head.
 1.4.20.1 15-Jul-2007  ad Sync with head.
 1.4.6.5 04-Feb-2008  yamt sync with head.
 1.4.6.4 07-Dec-2007  yamt sync with head
 1.4.6.3 03-Sep-2007  yamt sync with head.
 1.4.6.2 21-Jun-2006  yamt sync with head.
 1.4.6.1 14-May-2006  yamt file extattr.h was added on branch yamt-lazymbuf on 2006-06-21 15:12:39 +0000
 1.5.14.2 18-Feb-2008  mjf Sync with HEAD.
 1.5.14.1 08-Dec-2007  mjf Sync with HEAD.
 1.5.8.2 23-Mar-2008  matt sync with HEAD
 1.5.8.1 09-Jan-2008  matt sync with HEAD
 1.5.6.1 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.8.42.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.10.30.1 06-Apr-2015  skrll Sync with HEAD
 1.10.28.1 22-Dec-2014  msaitoh Pull up following revision(s) (requested by manu in ticket #350):
sys/ufs/ufs/extattr.h: revision 1.11
Bump UFS1 extended attribute max name length to 256
For extended attribute name max length, kernel filesystem-independent
code uses either EXTATTR_MAXNAMELEN (BSD API) or XATTR_NAME_MAX (Linux API),
which are both defined as KERNEL_NAME_MAX and fit the Linux limit of 255
without trailing \0.
UFS1 code had a lower limit that broke Linux compatibility. We can bump
the limit without sacrificing backward compatibility, because:
1) There is no API exposing this limit outside the kernel. Upper kernel
layers have a larger limit and handle the increase without a hitch
2) Each attribute has its own backing store in the filesystem, the name
of the backing store matching the attribute name. A newer kernel can
create/read/write backing store for longer attribute names and will
have no problem with existing shorter names.
 1.10.12.1 03-Dec-2017  jdolecek update from HEAD
 1.10.8.1 22-Dec-2014  msaitoh Pull up following revision(s) (requested by manu in ticket #1218):
sys/ufs/ufs/extattr.h: revision 1.11
Bump UFS1 extended attribute max name length to 256
For extended attribute name max length, kernel filesystem-independent
code uses either EXTATTR_MAXNAMELEN (BSD API) or XATTR_NAME_MAX (Linux API),
which are both defined as KERNEL_NAME_MAX and fit the Linux limit of 255
without trailing \0.
UFS1 code had a lower limit that broke Linux compatibility. We can bump
the limit without sacrificing backward compatibility, because:
1) There is no API exposing this limit outside the kernel. Upper kernel
layers have a larger limit and handle the increase without a hitch
2) Each attribute has its own backing store in the filesystem, the name
of the backing store matching the attribute name. A newer kernel can
create/read/write backing store for longer attribute names and will
have no problem with existing shorter names.
 1.11.28.1 20-Apr-2020  bouyer Sync with HEAD
 1.11.18.1 21-Apr-2020  martin Sync with HEAD
 1.79 23-Mar-2022  andvar fix few typos for word "previous(ly)" in comments.
 1.78 20-Aug-2020  christos Don't cache id's for vnodes that have ACLs. ok chs@
 1.77 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.76 20-Aug-2017  maya branches: 1.76.4; 1.76.14;
update the comment to the current IFMT/permissions location
 1.75 14-Aug-2016  jdolecek again remove IN_E4EXTENTS; it's not used anywhere any more, and it's better to keep fs-specific flags out of generic headers anyway
 1.74 04-Aug-2016  jdolecek move i_e2fs_* defines from ufs/inode.h to ext2fs/ext2fs_dinode.h, where they belong; they don't seem to be used anywhere else than ext2fs code any more
 1.73 04-Aug-2016  jdolecek rename struct ext2fs_dinode attribute e2di_dacl to correct
e2di_size_high; even Linux ext2 filesystem code actually uses it
unconditionally this way and ext4 code finally also calls it that way
in their struct definition too; if there was any trace of this for other
purpose it's long gone
 1.72 03-Jun-2016  christos branches: 1.72.2;
ext4 extents glue
 1.71 26-May-2014  dholland branches: 1.71.4;
Fix previous. Anyone have a brown paper bag?
 1.70 26-May-2014  ryoon Close comments
 1.69 26-May-2014  dholland Remove lfs-only inode flags.
 1.68 17-May-2014  martin Reorder struct ufid members to avoid padding (and save 4 bytes) on some
architectures.
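The padding effect that revision 1.68 refers to can be illustrated with a minimal sketch. The struct and member names below are hypothetical stand-ins, not the actual struct ufid definition; the sketch only assumes a layout with two 16-bit members, one 32-bit member, and one 64-bit member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: the 64-bit member follows two 16-bit members.
 * Where uint64_t is aligned to 8 bytes, the compiler pads before the
 * 64-bit member and again at the end of the struct.
 */
struct fid_padded {
	uint16_t len;
	uint16_t pad;
	uint64_t ino;
	int32_t  gen;
};

/* Same members, reordered so the 32-bit member fills the gap. */
struct fid_packed {
	uint16_t len;
	uint16_t pad;
	int32_t  gen;
	uint64_t ino;
};

/* Bytes saved by the reordering on the current architecture (may be 0). */
size_t fid_bytes_saved(void)
{
	return sizeof(struct fid_padded) - sizeof(struct fid_packed);
}
```

How much is saved depends on the ABI's alignment rules, which is why the saving applies only to some architectures.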
 1.67 14-May-2014  martin Make filehandles on UFS based filesystems use proper 64bit inodes.
32bit restriction noticed by Taylor R Campbell.
 1.66 08-May-2014  hannken Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41
 1.65 09-Jun-2013  dholland branches: 1.65.2; 1.65.6;
Remove lfs-only inumber field (and its supporting union) from struct
ufs1_dinode.
 1.64 19-Nov-2012  jakllsch - Add e2di_version, e2di_nblock_high, e2di_facl_high fields to ext2fs_dinode.

- Update i_e2fs_ aliases to match.

- ext2fs_bswap support for these ext2fs_dinode fields.

(e2di_version and e2di_facl_high replace previously reserved fields.
e2di_nblock_high was formerly e2di_nfrag and e2di_fsize, however these
are currently defined in e2fsprogs as only being relevant for HURD.)
 1.63 19-Nov-2012  jakllsch Move i_e2fs_rdev define to be adjacent to the field it aliases.
 1.62 04-Jun-2012  riastradh branches: 1.62.2;
Use two separate comments for stub where IN_RENAME was.
 1.61 04-Jun-2012  riastradh Kill the IN_RENAME in-core inode flag in ufs and ext2fs.

Now that rename works we need not wave this sort of voodoo at it.

ok dholland
 1.60 05-May-2012  yamt comments and cosmetics. no functional changes.
 1.59 02-Jan-2012  perseant * Remove PGO_RECLAIM during lfs_putpages()'s call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.58 12-Jul-2011  dholland branches: 1.58.2; 1.58.6;
Currently, ufs_lookup produces five auxiliary results that are left in
the vnode when lookup returns and fished out again later.

1. Create struct ufs_lookup_results to hold these.

2. Call the ufs_lookup_results instance in struct inode "i_crap" to be
clear about exactly what's going on, and to distinguish the lookup
results from respectable members of struct inode.

3. Update references to these members in the directory access
subroutines.

4. Include preliminary infrastructure for checking that the i_crap
being used is still valid when it's used. This doesn't actually do
anything yet.

5. Update the way ufs_wapbl_rename manipulates these elements to use
the new data structures. I have not changed the manipulation; it may
or may not be correct but I continue to suspect that it is not.

The word of the day is "stigmergy".
 1.57 28-Jul-2010  hannken ext2fs,ffs: free on disk inodes in the reclaim routine.
Remove now unneeded vnode flag VI_FREEING.

Welcome to 5.99.38.

Ok: Andrew Doran <ad@netbsd.org>
 1.56 22-Feb-2009  ad branches: 1.56.2; 1.56.4;
PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.55 23-Nov-2008  mrg branches: 1.55.4;
add support for 32 bit uid/gid fields in ext2, but only do so
when the revision is > REV0.
 1.54 23-Sep-2008  christos branches: 1.54.2; 1.54.4;
fix reversed comment, from anon ymous
 1.53 31-Jul-2008  simonb Be consistent with #define<tab>.
 1.52 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.51 09-Jan-2008  ad branches: 1.51.6; 1.51.10; 1.51.12; 1.51.14; 1.51.16;
Go back to freeing on disk inodes in the inactive routine. It would be
better not to do this, but it rules out potential side effects with softdep.
 1.50 07-Jan-2008  ad Fix 'panic: softdep_update_inodeblock: update failed'.
 1.49 02-Jan-2008  ad Merge vmlocking2 to head.
 1.48 09-Apr-2007  pooka branches: 1.48.10; 1.48.16; 1.48.18; 1.48.22;
fix comment: struct fid is in fstypes.h now
 1.47 04-Mar-2007  christos branches: 1.47.2; 1.47.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.46 11-Dec-2005  christos branches: 1.46.26;
merge ktrace-lwp.
 1.45 27-Sep-2005  yamt introduce "ufs_ops" and use it for ITIMES.
 1.44 12-Sep-2005  christos - access the ffs and ext2fs itimes functions through a pointer, so that
if the filesystem is not compiled in the kernel still links. Probably
a better solution is to use weak symbols.
- move the filesystem-specific itime macros to the filesystem header files.
 1.43 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.42 19-Aug-2005  christos 64 bit inode changes.
 1.41 26-Feb-2005  perry branches: 1.41.4;
nuke trailing whitespace
 1.40 23-Jan-2005  rumble branches: 1.40.2;
Bring in Ian Dowse's Dirhash from FreeBSD. Hash tables of
directories are created on the fly and used to increase
performance by circumventing ufs_lookup's linear search.

Dirhash is enabled by the UFS_DIRHASH option, but not
by default.
 1.39 14-Aug-2004  mycroft branches: 1.39.4;
Push atime/mtime updates even further -- into the reclaim path, so they happen
rarely in the normal case. (Note: This happens at reboot/shutdown time because
all file systems are unmounted.)

Also, for IN_MODIFY, use IN_ACCESSED, not IN_MODIFIED; otherwise "ls -l" of
your device node or FIFO would cause the time stamps to get written too
quickly.
 1.38 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.37 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.36 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.35 15-May-2003  kristerw branches: 1.35.2;
The C language does not permit statements of the form
(X ? Y : Z) = 0;
even though gcc handles this by a stupid extension.

Transform these to correct C.

Approved by fvdl.
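A minimal sketch of the kind of transformation revision 1.35 describes, assuming two int targets. The function names are hypothetical and the actual changes in the tree may have taken a different form.

```c
#include <assert.h>

/*
 * Not valid C: "(use_a ? a : b) = 0;"
 * The result of the conditional operator is not an lvalue.
 */

/* Portable form 1: select a pointer, then store through it. */
void clear_selected_ptr(int use_a, int *a, int *b)
{
	*(use_a ? a : b) = 0;
}

/* Portable form 2: a plain if/else. */
void clear_selected_if(int use_a, int *a, int *b)
{
	if (use_a)
		*a = 0;
	else
		*b = 0;
}
```

Both forms store to exactly one of the two objects and do not rely on a compiler extension.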
 1.34 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.33 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.32 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.31 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.30 26-Nov-2002  yamt eliminate i_ino from in-core inode
and use local variable instead.

ok'ed by Frank van der Linden.
 1.29 06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.28 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.27 18-Dec-2001  fvdl branches: 1.27.8; 1.27.10;
Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.26 26-Oct-2001  lukem this needs <ufs/ufs/quota.h>, so pull it in
 1.25 15-Sep-2001  chs branches: 1.25.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.24 05-Jun-2001  mrg branches: 1.24.4; 1.24.6;
only include "fs_lfs.h" if _KERNEL_OPT.
 1.23 10-Jan-2001  chs branches: 1.23.2;
attach the softdep pagecache pseudo-buffers to the inode
so we can find them quickly in the softdep truncate path.
 1.22 06-Jul-2000  perseant Fix so non-kernel code will compile (_LKM)
 1.21 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero at fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.20 03-Jul-2000  perseant Allow the number of free segments reserved for the cleaner to be
parametrized in the filesystem, defaulting to MIN_FREE_SEGS = 2 but set
to something more reasonable at newfs_lfs time.

Note the number of blocks that have been scheduled for writing but which
are not yet on disk in an inode extension, i_lfs_effnblks. Move
i_ffs_effnlink out of the ffs extension and onto the main inode, since
it's used all over the shared code and the lfs extension would clobber
it.

At inode write time, indirect blocks and inode-held blocks of inodes
that have i_lfs_effnblks != i_ffs_blocks are cleansed of UNWRITTEN disk
addresses, so that these never make it to disk.
 1.19 29-May-2000  mycroft branches: 1.19.2;
Pull in IN_ACCESSED changes and some MNT_LAZY `bug fixes' from FFS to EXT2FS.
 1.18 29-May-2000  mycroft Add a new inode flag called IN_ACCESSED. This is used in place of IN_MODIFIED
to record that the atime was updated. In ffs_update(), we only do synchronous
writes if something *other* than the atime was changed.
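The rule stated in revision 1.18 can be sketched as a small predicate. The flag values and the function name are hypothetical; the real definitions live in ufs/ufs/inode.h and the real decision is made inside ffs_update().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define DEMO_IN_ACCESSED 0x01	/* only the atime was updated */
#define DEMO_IN_MODIFIED 0x02	/* something other than the atime changed */

/*
 * A synchronous write is warranted only when the caller asked to wait
 * and the inode carries a change other than an atime update.
 */
bool demo_needs_sync_write(unsigned flags, bool wait_requested)
{
	return wait_requested && (flags & DEMO_IN_MODIFIED) != 0;
}
```

With this split, an inode that has only DEMO_IN_ACCESSED set never forces a synchronous write.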
 1.17 27-May-2000  perseant branches: 1.17.2;
Prevent dirops from getting around lfs_check and wedging the buffer cache.
All the dirop vnops now mark the inodes with a new flag, IN_ADIROP, which
is removed as soon as the dirop is done (as opposed to VDIROP which stays
until the file is written). To address one issue raised in PR#9357.
 1.16 18-Nov-1999  enami Define i_e2fs_rdev.
 1.15 18-Nov-1999  enami Cosmetic changes; fix indentation and usage of white spaces.
 1.14 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.13 08-Jul-1999  wrstuden branches: 1.13.2; 1.13.4; 1.13.8;
Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
 1.12 09-Mar-1999  perseant branches: 1.12.4;
Add IN_CLEANING flag for LFS
 1.11 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.10 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
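The union-plus-accessor-macro pattern from revision 1.10 can be sketched as follows. The demo_* types and their fields are hypothetical, heavily trimmed stand-ins for the real on-disk inode structures; only the i_din union shape and the i_ffs_* / i_e2fs_* naming style come from the log message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-ins for the two on-disk inode formats. */
struct demo_ffs_dinode  { uint16_t di_mode;   uint64_t di_size; };
struct demo_e2fs_dinode { uint16_t e2di_mode; uint32_t e2di_size; };

struct demo_inode {
	union {
		struct demo_ffs_dinode  ffs_din;
		struct demo_e2fs_dinode e2fs_din;
	} i_din;
};

/* Accessor macros in the style of the i_ffs_* and i_e2fs_* defines. */
#define i_ffs_mode   i_din.ffs_din.di_mode
#define i_ffs_size   i_din.ffs_din.di_size
#define i_e2fs_mode  i_din.e2fs_din.e2di_mode
#define i_e2fs_size  i_din.e2fs_din.e2di_size

/* Shared code can then use a short name for the per-fs field. */
uint64_t demo_ffs_size(const struct demo_inode *ip)
{
	return ip->i_ffs_size;
}
```

The macros keep call sites short while the union lets both on-disk formats occupy the same storage in the in-core inode.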
 1.9 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.8 15-Jun-1995  cgd compensate for timeval/timespec/stat structure changes.
 1.7 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.6 21-Dec-1994  mycroft Add RCS ids where missing.
 1.5 14-Dec-1994  mycroft Sync with CSRG.
 1.4 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.3 30-Jun-1994  cgd fix the definition of a dev_t
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.12.4.1 02-Aug-1999  thorpej Update from trunk.
 1.13.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.13.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.13.2.2 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.13.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.17.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.19.2.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.23.2.7 11-Dec-2002  thorpej Sync with HEAD.
 1.23.2.6 01-Aug-2002  nathanw Catch up to -current.
 1.23.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.23.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.23.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.23.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.23.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.24.6.1 01-Oct-2001  fvdl Catch up with -current.
 1.24.4.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.24.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.24.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.25.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.27.10.1 20-Jun-2002  lukem Pull up revision 1.28 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.27.8.2 15-Jul-2002  gehenna catch up with -current.
 1.27.8.1 20-Jun-2002  gehenna catch up with -current.
 1.35.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.35.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.35.2.5 24-Jan-2005  skrll Sync with HEAD.
 1.35.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.35.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.35.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.35.2.1 03-Aug-2004  skrll Sync with HEAD
 1.39.4.1 29-Apr-2005  kent sync with -current
 1.40.2.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.41.4.3 21-Jan-2008  yamt sync with head
 1.41.4.2 03-Sep-2007  yamt sync with head.
 1.41.4.1 21-Jun-2006  yamt sync with head.
 1.46.26.2 15-Apr-2007  yamt sync with head.
 1.46.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.47.4.1 11-Jul-2007  mjf Sync with head.
 1.47.2.2 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress test.
 1.47.2.1 10-Apr-2007  ad Sync with head.
 1.48.22.3 10-Jan-2008  bouyer Sync with HEAD
 1.48.22.2 08-Jan-2008  bouyer Sync with HEAD
 1.48.22.1 02-Jan-2008  bouyer Sync with HEAD
 1.48.18.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.48.16.1 18-Feb-2008  mjf Sync with HEAD.
 1.48.10.2 23-Mar-2008  matt sync with HEAD
 1.48.10.1 09-Jan-2008  matt sync with HEAD
 1.51.16.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.51.16.1 19-Oct-2008  haad Sync with HEAD.
 1.51.14.1 28-Jul-2008  simonb Add support for creating a WAPBL log in the filesystem. Will
create an in-filesystem log on first "mount -o log" if one doesn't
exist, and will then continue to use same log in the future. See
(soon to be added) wapbl(4) for more info.

Adds a new B_CONTIG low-level allocation flag that uses hints in
"struct ffs_inode_ext" to lay out an ffs file's data contiguously.

Thanks to Greg Oster for helping with the design of this and to
Antti Kantee for code review and suggestions.
 1.51.12.2 10-Oct-2008  skrll Sync with HEAD.
 1.51.12.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.51.10.2 11-Aug-2010  yamt sync with head.
 1.51.10.1 04-May-2009  yamt sync with head.
 1.51.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.51.6.1 28-Sep-2008  mjf Sync with HEAD.
 1.54.4.2 19-May-2012  riz Apply patch (requested by buhrow in ticket #1759):


sys/ufs/lfs/lfs_vnops.c patch
sys/ufs/ufs/inode.h patch
sys/ufs/ufs/ufs_extern.h patch
sys/ufs/ufs/ufs_lookup.c patch
sys/ufs/ufs/ufs_vnops.c patch
sys/ufs/ufs/ufs_wapbl.c patch

Port dholland's ufs_rename locking changes to netbsd-5.
[buhrow, ticket #1759]

Hello. More testing has revealed a minor misunderstanding between the
vnode API in -current and 5.x. The below patch, against NetBSD-5.1
sources, rolls all the accumulated patches into one patch set. With this
patch, I believe you can now run with WAPBL, softdep or traditional ufs
semantics with heavy file loads and avoid panics due to resource exhaustion
and/or tstile deadlocks. Testing has been done on I386, both uniprocessor
and multiprocessor, and on Sparc machines in uniprocessor mode, though I
think multiprocessor Sparc would be fine as well. Since these changes are
machine independent, I don't anticipate any issues on any platform. It is
my hope that modulo any final issues that come up in the final round of
testing I'm currently performing, these patches will be ready to be pulled
up into the NetBSD-5 branch.
Finally, I'd like to thank mouse@ and hannken@ for their help and
patience in helping me track down and test the final versions of these
patches. With their assistance, I'm confident these patches make NetBSD-5
a much more stable and robust operating environment in a variety of
settings.
 1.54.4.1 29-Nov-2008  snj Pull up following revision(s) (requested by mrg in ticket #147):
sys/ufs/ext2fs/ext2fs_alloc.c: revision 1.37
sys/ufs/ext2fs/ext2fs_bswap.c: revision 1.14
sys/ufs/ext2fs/ext2fs_dinode.h: revision 1.17
sys/ufs/ext2fs/ext2fs_lookup.c: revision 1.56
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.83
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.140
sys/ufs/ufs/inode.h: revision 1.55
add support for 32 bit uid/gid fields in ext2, but only do so for
when the revision is > REV0.
 1.54.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.54.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.55.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.56.4.1 05-Mar-2011  rmind sync with head
 1.56.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.58.6.2 02-Jun-2012  mrg sync to latest -current.
 1.58.6.1 18-Feb-2012  mrg merge to -current.
 1.58.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.58.2.4 16-Jan-2013  yamt sync with (a bit old) head
 1.58.2.3 30-Oct-2012  yamt sync with head
 1.58.2.2 23-May-2012  yamt sync with head.
 1.58.2.1 17-Apr-2012  yamt sync with head
 1.62.2.4 03-Dec-2017  jdolecek update from HEAD
 1.62.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.62.2.2 23-Jun-2013  tls resync from head
 1.62.2.1 25-Feb-2013  tls resync with head
 1.65.6.1 10-Aug-2014  tls Rebase.
 1.65.2.1 18-May-2014  rmind sync with head
 1.71.4.3 28-Aug-2017  skrll Sync with HEAD
 1.71.4.2 05-Oct-2016  skrll Sync with HEAD
 1.71.4.1 09-Jul-2016  skrll Sync with HEAD
 1.72.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.76.14.1 20-Apr-2020  bouyer Sync with HEAD
 1.76.4.1 21-Apr-2020  martin Sync with HEAD
 1.2 01-Mar-1998  fvdl Remove extraneous files from Lite2 merge.
 1.1 01-Mar-1998  fvdl branches: 1.1.1;
Initial revision
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.30 26-Aug-2012  dholland Move INITQFNAMES to the right header file.
 1.29 29-Jan-2012  dholland Rename static inline "helper" functions:
ufsclass2qtype -> quota_idtype_to_ufs
qtype2ufsclass -> quota_idtype_from_ufs

The reason for the direction of "ufs" changing is that the old names
were among the symbols using "ufs" to mean "fs-independent". So the
old names were for translating "ufsclass" (fs-independent quota id
type) to "qtype" (ufs-specific quota id type) and vice versa.

These functions are used in only two places, both of which are
inappropriate, so at some point they should probably be removed.
They're also identity transformations so not particularly helpful,
unless one were to make a careful and concerted effort to distinguish
the ufs quota code numbers from the fs-independent ones. This has not
been done and is probably impossible without support from a program
verifier, and maybe not even then.

They are static inline, so no compat concerns arise.

Also adjust the symbols they use to avoid <quota/quotaprop.h>.
 1.28 25-Mar-2011  bouyer branches: 1.28.4; 1.28.8;
Don't include quota/quotaprop.h for tools.
 1.27 24-Mar-2011  bouyer Add a new libquota library, which contains some blocks to build and/or
parse quota plists; as well as a getfsquota() function to retrieve quotas
for a single id from a single filesystem (whatever filesystem this is:
a local quota-enabled fs or NFS). This is built on functions getufsquota()
(for local filesystems with UFS-like quotas) and getnfsquota();
which are also available to userland programs.
move functions from quota2_subr.c to libquota or libprop as appropriate,
and adjust in-tree quota tools.
move some declarations from kernel headers to either sys/quota.h or
quota/quota.h as appropriate. ufs/ufs/quota.h still installed because
it's needed by other installed ufs headers.
ufs/ufs/quota1.h still installed as a quick&dirty way to get a code
using the old quotactl() to compile (just include ufs/ufs/quota1.h instead of
ufs/ufs/quota.h - old code won't compile without this change and this is
on purpose).
Discussed on tech-kern@ and tech-net@ (long thread, but not much about
libquota itself ...)
 1.26 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.25 10-Jul-2007  hannken branches: 1.25.56; 1.25.62; 1.25.64;
Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criterions, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.24 23-Jun-2007  hannken If a quota-enabled file system has 65536 active vnodes for one uid
the reference counter of the corresponding struct dquot will overflow.

Change the type of the reference counter from u_int16_t to u_int32_t and
add an assertion to check for overflow.

Observed and tested by Edgar Fuß.

Welcome to 4.99.21 (struct dquot and therefore struct inode changed layout)
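
The overflow described in revision 1.24 can be reproduced in isolation. The
following is a minimal sketch, not the kernel code: the struct and function
names are hypothetical stand-ins for the reference counter in struct dquot.
It shows a 16-bit counter wrapping to zero after 65536 references, and a
32-bit counter guarded by an assertion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real struct dquot has many more fields. */
struct dq16 { uint16_t cnt; };
struct dq32 { uint32_t cnt; };

/* Take n references on a 16-bit counter; it wraps modulo 65536. */
static uint16_t
ref16(struct dq16 *dq, uint32_t n)
{
    while (n-- > 0)
        dq->cnt++;
    return dq->cnt;
}

/* Take n references on a 32-bit counter, asserting it never wraps. */
static uint32_t
ref32(struct dq32 *dq, uint32_t n)
{
    while (n-- > 0) {
        assert(dq->cnt < UINT32_MAX);
        dq->cnt++;
    }
    return dq->cnt;
}
```

With the 16-bit counter, the 65536th reference makes the count look like
zero, so the structure appears unreferenced while it is still in use.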
 1.23 04-Mar-2007  christos branches: 1.23.2; 1.23.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.22 14-May-2006  elad branches: 1.22.12; 1.22.14; 1.22.18;
integrate kauth.
 1.21 27-Dec-2005  chs branches: 1.21.4; 1.21.6; 1.21.8; 1.21.10; 1.21.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.
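
A small illustration of why conditionally present structure fields change
the ABI. This is only a sketch: both struct names and all field names are
hypothetical, and the two structs stand for the same structure compiled
without and with a debugging option.

```c
#include <assert.h>
#include <stddef.h>

/* Layout when the debugging field is compiled out (hypothetical). */
struct obj_plain {
    int flags;
    int refs;
};

/* Layout when the debugging field is compiled in (hypothetical). */
struct obj_diag {
    int flags;
    int lock_owner;   /* present only in the debugging build */
    int refs;
};

/* Offset of 'refs' as seen by code built without the option. */
static size_t
refs_offset_plain(void)
{
    return offsetof(struct obj_plain, refs);
}

/* Offset of 'refs' as seen by code built with the option. */
static size_t
refs_offset_diag(void)
{
    return offsetof(struct obj_diag, refs);
}
```

A module built against one layout would read the wrong field from a kernel
built with the other, which is what making the field always present avoids.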
 1.20 11-Dec-2005  christos merge ktrace-lwp.
 1.19 10-Jul-2005  thorpej - Use ANSI function decls.
- Sprinkle some static.
 1.18 27-Apr-2004  jrf branches: 1.18.10; 1.18.12; 1.18.14; 1.18.16;
First pass for some caddr_t removal and changes to get rid of it where we
no longer use and/or need it

- removed casts from unionfs, deadfs and fdesc
(there are more to hunt down still)
- changed vfs_quotactl args argument from caddr_t to void *
- changed vfs_quotactl structures/callers to reflect the api change

Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
 1.17 07-Aug-2003  agc branches: 1.17.4;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.16 29-Jun-2003  fvdl branches: 1.16.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.15 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.14 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.13 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.12 06-Nov-2001  simonb Remove superfluous semicolon.
 1.11 26-Oct-2001  lukem rename inclusion protection define from _QUOTA_ to _UFS_UFS_QUOTA_H_
 1.10 15-Sep-2001  chs branches: 1.10.2;
add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.9 23-Feb-2001  eeh branches: 1.9.2; 1.9.6; 1.9.8;
Use int32_t for on-disk time_t values.
 1.8 16-Mar-2000  jdolecek branches: 1.8.4;
Change ufs_init() to keep global count of how many times it was called.
Resources are initialized still just once (on first call).

Add ufs_done(), which takes care of freeing all resources allocated in
ufs_init(). The resources are freed only when last user of the code exits.
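
The counting scheme described for ufs_init() and ufs_done() in revision 1.8
can be sketched as follows. This is an illustration under assumptions, not
the kernel code: example_init(), example_done() and the shared table are
hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

static int init_count;    /* number of users that have called example_init() */
static int *shared_table; /* hypothetical resource shared by all users */

/* Allocate the shared resources on the first call only. */
static void
example_init(void)
{
    if (init_count++ > 0)
        return;
    shared_table = calloc(64, sizeof(*shared_table));
    assert(shared_table != NULL);
}

/* Free the shared resources only when the last user is done. */
static void
example_done(void)
{
    assert(init_count > 0);
    if (--init_count > 0)
        return;
    free(shared_table);
    shared_table = NULL;
}
```

Two users that each call example_init() share one allocation, and the table
survives until both have called example_done().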
 1.7 28-Sep-1996  christos branches: 1.7.28;
Add prototype for quotactl.
 1.6 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.5 21-Dec-1994  mycroft #include sys/queue.h, but also hide kernel structures in #ifdef KERNEL.
 1.4 13-Dec-1994  mycroft Sync with CSRG.
 1.3 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.7.28.2 12-Mar-2001  bouyer Sync with HEAD.
 1.7.28.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.4.1 25-Nov-2001  he Pull up revision 1.9 (requested by lukem):
Use int32_t for on-disk time_t representation.
Convert %q_ to %ll_ in print formats.
 1.9.8.1 01-Oct-2001  fvdl Catch up with -current.
 1.9.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.9.2.3 11-Dec-2002  thorpej Sync with HEAD.
 1.9.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.9.2.1 21-Sep-2001  nathanw Catch up to -current.
 1.10.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.16.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.16.2.5 27-Oct-2004  skrll Remove the struct lwp * arguments from qsync and ufs_checkpath that are
no longer (read: were never) required.
 1.16.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.16.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.16.2.2 03-Aug-2004  skrll Sync with HEAD
 1.16.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.17.4.1 11-Aug-2007  bouyer Pull up following revision(s) (requested by hannken in ticket #11334):
sys/ufs/ufs/ufs_quota.c: revision 1.46
sys/ufs/ufs/quota.h: revision 1.24
If a quota-enabled file system has 65536 active vnodes for one uid
the reference counter of the corresponding struct dquot will overflow.
Change the type of the reference counter from u_int16_t to u_int32_t and
add an assertion to check for overflow.
Observed and tested by Edgar Fuß.
 1.18.16.1 28-Jun-2007  ghen Pull up following revision(s) (requested by hannken in ticket #1807):
sys/ufs/ufs/ufs_quota.c: revision 1.46
sys/ufs/ufs/quota.h: revision 1.24
sys/sys/param.h: patch
If a quota-enabled file system has 65536 active vnodes for one uid
the reference counter of the corresponding struct dquot will overflow.
Change the type of the reference counter from u_int16_t to u_int32_t and
add an assertion to check for overflow.
Bump kernel version as LKM's depending on UFS internals will have to be
recompiled after this change (discussed and approved on tech-kern).
 1.18.14.1 28-Jun-2007  ghen Pull up following revision(s) (requested by hannken in ticket #1807):
sys/ufs/ufs/ufs_quota.c: revision 1.46
sys/ufs/ufs/quota.h: revision 1.24
sys/sys/param.h: patch
If a quota-enabled file system has 65536 active vnodes for one uid
the reference counter of the corresponding struct dquot will overflow.
Change the type of the reference counter from u_int16_t to u_int32_t and
add an assertion to check for overflow.
Bump kernel version as LKM's depending on UFS internals will have to be
recompiled after this change (discussed and approved on tech-kern).
 1.18.12.2 03-Sep-2007  yamt sync with head.
 1.18.12.1 21-Jun-2006  yamt sync with head.
 1.18.10.1 28-Jun-2007  ghen Pull up following revision(s) (requested by hannken in ticket #1807):
sys/ufs/ufs/ufs_quota.c: revision 1.46
sys/ufs/ufs/quota.h: revision 1.24
sys/sys/param.h: patch
If a quota-enabled file system has 65536 active vnodes for one uid
the reference counter of the corresponding struct dquot will overflow.
Change the type of the reference counter from u_int16_t to u_int32_t and
add an assertion to check for overflow.
Bump kernel version as LKM's depending on UFS internals will have to be
recompiled after this change (discussed and approved on tech-kern).
 1.21.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.21.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.21.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.21.8.1 24-May-2006  yamt sync with head.
 1.21.6.1 01-Jun-2006  kardel Sync with head.
 1.21.4.1 09-Sep-2006  rpaulo sync with head
 1.22.18.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.22.14.1 12-Mar-2007  rmind Sync with HEAD.
 1.22.12.1 28-Jun-2007  ghen Pull up following revision(s) (requested by hannken in ticket #747):
sys/ufs/ufs/ufs_quota.c: revision 1.46
sys/ufs/ufs/quota.h: revision 1.24
sys/sys/param.h: patch
If a quota-enabled file system has 65536 active vnodes for one uid
the reference counter of the corresponding struct dquot will overflow.
Change the type of the reference counter from u_int16_t to u_int32_t and
add an assertion to check for overflow.
Bump kernel version as LKM's depending on UFS internals will have to be
recompiled after this change (discussed and approved on tech-kern).
 1.23.4.1 11-Jul-2007  mjf Sync with head.
 1.23.2.1 15-Jul-2007  ad Sync with head.
 1.25.64.3 09-Feb-2011  bouyer Various build fixes
 1.25.64.2 31-Jan-2011  bouyer Rename definitions for limits Q2V_* to QL_* and move from quota2.h to quota.h.
 1.25.64.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) supports are completed (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
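
The quota file layout described in the 1.25.64.1 message above (a header
holding a free-list pointer and a hash table, with entries chained by file
offset) can be pictured with the following sketch. It is an illustration
only: the struct layouts, field names, HASH_SIZE and the helper functions
are all hypothetical and are not the on-disk format from quota2.h. The
"file" here is just a byte buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_SIZE 4   /* hypothetical; the log says it depends on block size */

struct hdr {
    uint64_t free_head;           /* offset of first free entry, 0 = none */
    uint64_t buckets[HASH_SIZE];  /* offset of first in-use entry per bucket */
};

struct entry {
    uint64_t next;    /* offset of next entry in the same list, 0 = end */
    uint32_t id;      /* uid or gid this entry describes */
    uint32_t pad;
    uint64_t limit;
};

/* Lay out a header followed by as many chained free entries as fit. */
static void
format(unsigned char *file, size_t size)
{
    struct hdr h;
    struct entry e;
    uint64_t off, next;

    memset(file, 0, size);
    memset(&h, 0, sizeof(h));
    memset(&e, 0, sizeof(e));
    h.free_head = (sizeof(h) + sizeof(e) <= size) ? sizeof(h) : 0;
    for (off = sizeof(h); off + sizeof(e) <= size; off = next) {
        next = off + sizeof(e);
        e.next = (next + sizeof(e) <= size) ? next : 0;
        memcpy(file + off, &e, sizeof(e));
    }
    memcpy(file, &h, sizeof(h));
}

/* Move one entry from the free list to the head of the id's hash chain. */
static uint64_t
insert(unsigned char *file, uint32_t id, uint64_t limit)
{
    struct hdr h;
    struct entry e;
    uint64_t off;

    memcpy(&h, file, sizeof(h));
    off = h.free_head;
    if (off == 0)
        return 0;   /* the log says a new block is allocated in this case */
    memcpy(&e, file + off, sizeof(e));
    h.free_head = e.next;
    e.next = h.buckets[id % HASH_SIZE];
    e.id = id;
    e.pad = 0;
    e.limit = limit;
    h.buckets[id % HASH_SIZE] = off;
    memcpy(file + off, &e, sizeof(e));
    memcpy(file, &h, sizeof(h));
    return off;
}

/* Walk the id's hash chain by offset; return 1 and fill *out if found. */
static int
lookup(const unsigned char *file, uint32_t id, struct entry *out)
{
    struct hdr h;
    uint64_t off;

    memcpy(&h, file, sizeof(h));
    for (off = h.buckets[id % HASH_SIZE]; off != 0; off = out->next) {
        memcpy(out, file + off, sizeof(*out));
        if (out->id == id)
            return 1;
    }
    return 0;
}
```

Offset 0 is always inside the header, so it can serve as the end-of-list
marker in this sketch.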
 1.25.62.1 06-Jun-2011  jruoho Sync with HEAD.
 1.25.56.1 21-Apr-2011  rmind sync with head
 1.28.8.1 18-Feb-2012  mrg merge to -current.
 1.28.4.2 30-Oct-2012  yamt sync with head
 1.28.4.1 17-Apr-2012  yamt sync with head
 1.7 26-Aug-2012  dholland Move INITQFNAMES to the right header file.
 1.6 29-Jan-2012  dholland Change dqblk_to_quotaval() from quota1_subr.c to dqblk_to_quotavals(),
and pass in two single quotaval structs (for blocks and inodes)
instead of an array of (implicitly) QUOTA_NLIMITS quotaval structs
indexed by constants from quotaprop.h.

Note: because this code is used by COMPAT_50 as well as ufs, this
change requires a kernel version bump. (The code is also used by
edquota, but via .PATH so it's not ABI-sensitive there.)
 1.5 25-Nov-2011  dholland branches: 1.5.2;
Rename struct ufs_quota_entry -> struct quotaval.
 1.4 20-Nov-2011  dholland Reshuffle decls among the quota headers so everything is in the place
it should be:
- stuff for the proplib interface goes in <quota/quotaprop.h>
- stuff for userlevel only goes in <quota/quota.h>
- stuff shared between user and kernel goes in <sys/quota.h>

Note that <quota/quota.h> and <quota/quotaprop.h> are expected to be
moved or removed later on... one thing at a time.

Update include directives in other files as needed.
 1.3 24-Mar-2011  bouyer branches: 1.3.2; 1.3.6; 1.3.8;
Add a new libquota library, which contains some blocks to build and/or
parse quota plists; as well as a getfsquota() function to retrieve quotas
for a single id from a single filesystem (whatever filesystem this is:
a local quota-enabled fs or NFS). This is built on functions getufsquota()
(for local filesystems with UFS-like quotas) and getnfsquota();
which are also available to userland programs.
move functions from quota2_subr.c to libquota or libprop as appropriate,
and adjust in-tree quota tools.
move some declarations from kernel headers to either sys/quota.h or
quota/quota.h as appropriate. ufs/ufs/quota.h still installed because
it's needed by other installed ufs headers.
ufs/ufs/quota1.h still installed as a quick&dirty way to get a code
using the old quotactl() to compile (just include ufs/ufs/quota1.h instead of
ufs/ufs/quota.h - old code won't compile without this change and this is
on purpose).
Discussed on tech-kern@ and tech-net@ (long thread, but not much about
libquota itself ...)
 1.2 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.1 20-Jan-2011  bouyer branches: 1.1.2;
file quota1.h was initially added on branch bouyer-quota2.
 1.1.2.3 09-Feb-2011  bouyer Various build fixes
 1.1.2.2 28-Jan-2011  bouyer Add conversion functions between old and new format.
 1.1.2.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) supports are completed (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.3.8.2 30-Oct-2012  yamt sync with head
 1.3.8.1 17-Apr-2012  yamt sync with head
 1.3.6.2 06-Jun-2011  jruoho Sync with HEAD.
 1.3.6.1 24-Mar-2011  jruoho file quota1.h was added on branch jruoho-x86intr on 2011-06-06 09:10:18 +0000
 1.3.2.2 21-Apr-2011  rmind sync with head
 1.3.2.1 24-Mar-2011  rmind file quota1.h was added on branch rmind-uvmplock on 2011-04-21 01:42:21 +0000
 1.5.2.1 18-Feb-2012  mrg merge to -current.
 1.8 22-Feb-2023  riastradh ufs: Nix trailing whitespace and tidy up some other minor KNF.
 1.7 29-Jan-2012  dholland Change dqblk_to_quotaval() from quota1_subr.c to dqblk_to_quotavals(),
and pass in two single quotaval structs (for blocks and inodes)
instead of an array of (implicitly) QUOTA_NLIMITS quotaval structs
indexed by constants from quotaprop.h.

Note: because this code is used by COMPAT_50 as well as ufs, this
change requires a kernel version bump. (The code is also used by
edquota, but via .PATH so it's not ABI-sensitive there.)
 1.6 25-Nov-2011  dholland branches: 1.6.2;
Rename struct ufs_quota_entry -> struct quotaval.
 1.5 20-Nov-2011  dholland Reshuffle decls among the quota headers so everything is in the place
it should be:
- stuff for the proplib interface goes in <quota/quotaprop.h>
- stuff for userlevel only goes in <quota/quota.h>
- stuff shared between user and kernel goes in <sys/quota.h>

Note that <quota/quota.h> and <quota/quotaprop.h> are expected to be
moved or removed later on... one thing at a time.

Update include directives in other files as needed.
 1.4 07-Jun-2011  bouyer branches: 1.4.2;
Fix bad cut'n'paste in copyright. Pointed out by dyoung@
 1.3 24-Mar-2011  bouyer branches: 1.3.2; 1.3.4; 1.3.6;
Add a new libquota library, which contains some blocks to build and/or
parse quota plists; as well as a getfsquota() function to retrieve quotas
for a single id from a single filesystem (whatever filesystem this is:
a local quota-enabled fs or NFS). This is built on functions getufsquota()
(for local filesystems with UFS-like quotas) and getnfsquota();
which are also available to userland programs.
move functions from quota2_subr.c to libquota or libprop as appropriate,
and adjust in-tree quota tools.
move some declarations from kernel headers to either sys/quota.h or
quota/quota.h as appropriate. ufs/ufs/quota.h still installed because
it's needed by other installed ufs headers.
ufs/ufs/quota1.h still installed as a quick&dirty way to get a code
using the old quotactl() to compile (just include ufs/ufs/quota1.h instead of
ufs/ufs/quota.h - old code won't compile without this change and this is
on purpose).
Discussed on tech-kern@ and tech-net@ (long thread, but not much about
libquota itself ...)
 1.2 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.1 28-Jan-2011  bouyer branches: 1.1.2;
file quota1_subr.c was initially added on branch bouyer-quota2.
 1.1.2.3 03-Feb-2011  bouyer Change semantic of limits to allow up to the limit included (instead of
up to one less than the limit: I feel that if my limit is 1000 inodes
I should be able to create 1000 files, not 999).
Keep the previous semantic for quota1 dquot, the conversion functions
will add or remove 1 when converting limits from/to the new format.

Adjust test for this change.
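
The off-by-one conversion described in 1.1.2.3 can be sketched like this.
It is one reading of the log, not the code from quota1_subr.c: the function
names are hypothetical, NO_LIMIT stands in for UQUAD_MAX, and the old
format is assumed to use 0 for "no limit", as the quota2 notes elsewhere in
this log imply.

```c
#include <assert.h>
#include <stdint.h>

#define NO_LIMIT UINT64_MAX   /* stands in for UQUAD_MAX in the new format */

/* Old semantic: 0 means no limit, otherwise the count must stay below lim. */
static int
old_allows(uint32_t lim, uint64_t count)
{
    return lim == 0 || count < lim;
}

/* New semantic: the count may reach the limit itself. */
static int
new_allows(uint64_t lim, uint64_t count)
{
    return count <= lim;
}

/* Old-format limit to new-format limit: remove 1, map 0 to "no limit". */
static uint64_t
old_to_new(uint32_t lim)
{
    return lim == 0 ? NO_LIMIT : (uint64_t)lim - 1;
}

/*
 * New-format limit to old-format limit: add 1, map "no limit" to 0.
 * Values that do not fit in 32 bits are truncated in this sketch.
 */
static uint32_t
new_to_old(uint64_t lim)
{
    return lim == NO_LIMIT ? 0 : (uint32_t)(lim + 1);
}
```

Under this reading, an old limit of 1000 and a new limit of 999 allow
exactly the same counts, so converting in either direction preserves
behaviour.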
 1.1.2.2 31-Jan-2011  bouyer Catch up with Q2V -> QL renaming
Enforce limits for quota2.
pass quota type (*QUOTA) and limit type (QL_*) to
KAUTH_REQ_SYSTEM_FS_QUOTA_NOLIMIT, to make it possible to skip
limit checks for some quota type only if a listener wants to.
 1.1.2.1 28-Jan-2011  bouyer Add conversion functions between old and new format.
 1.3.6.2 06-Jun-2011  jruoho Sync with HEAD.
 1.3.6.1 24-Mar-2011  jruoho file quota1_subr.c was added on branch jruoho-x86intr on 2011-06-06 09:10:18 +0000
 1.3.4.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.3.2.3 12-Jun-2011  rmind sync with head
 1.3.2.2 21-Apr-2011  rmind sync with head
 1.3.2.1 24-Mar-2011  rmind file quota1_subr.c was added on branch rmind-uvmplock on 2011-04-21 01:42:21 +0000
 1.4.2.1 17-Apr-2012  yamt sync with head
 1.6.2.1 18-Feb-2012  mrg merge to -current.
 1.11 22-Feb-2023  riastradh ufs: Nix trailing whitespace and tidy up some other minor KNF.
 1.10 25-Oct-2017  jdolecek fix tyop, PR kern/52653 by Edgar Fuss
 1.9 05-Feb-2012  dholland branches: 1.9.6;
Migrate one last leftover bit (used only by the kernel now) to
sys/ufs/ufs and remove the old quota headers and no-longer-used shared
code. Ok by releng.
 1.8 29-Jan-2012  dholland quota2_check_limit() is used in only one place, so don't stuff it in a
header file.
 1.7 29-Jan-2012  dholland Remove now-unused declarations from quota2.h.
 1.6 29-Jan-2012  dholland Remove references to <quota/quotaprop.h> in src/sys/ufs.
The remaining references in the kernel are in vfs_quotactl.c, the
compat_50 code for the old quotactl (to be fixed up), and the
code compiled from src/common/lib/libquota.
 1.5 07-Jun-2011  bouyer branches: 1.5.2; 1.5.6;
Fix bad cut'n'paste in copyright. Pointed out by dyoung@
 1.4 24-Mar-2011  bouyer branches: 1.4.2; 1.4.4; 1.4.6;
Add a new libquota library, which contains some blocks to build and/or
parse quota plists; as well as a getfsquota() function to retrieve quotas
for a single id from a single filesystem (whatever filesystem this is:
a local quota-enabled fs or NFS). This is built on functions getufsquota()
(for local filesystems with UFS-like quotas) and getnfsquota();
which are also available to userland programs.
move functions from quota2_subr.c to libquota or libprop as appropriate,
and adjust in-tree quota tools.
move some declarations from kernel headers to either sys/quota.h or
quota/quota.h as appropriate. ufs/ufs/quota.h still installed because
it's needed by other installed ufs headers.
ufs/ufs/quota1.h still installed as a quick&dirty way to get a code
using the old quotactl() to compile (just include ufs/ufs/quota1.h instead of
ufs/ufs/quota.h - old code won't compile without this change and this is
on purpose).
Discussed on tech-kern@ and tech-net@ (long thread, but not much about
libquota itself ...)
 1.3 09-Mar-2011  dholland typo in comment
 1.2 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.1 20-Jan-2011  bouyer branches: 1.1.2;
file quota2.h was initially added on branch bouyer-quota2.
 1.1.2.6 03-Feb-2011  bouyer factor out code to check a quota against its limits.
 1.1.2.5 31-Jan-2011  bouyer Rename definitions for limits Q2V_* to QL_* and move from quota2.h to quota.h.
 1.1.2.4 29-Jan-2011  bouyer Describe how the on-disk structures are protected from concurrent access,
and try to implement it.
 1.1.2.3 28-Jan-2011  bouyer Introduce quota2_ufs_rwq2v() and quota2_ufs_rwq2e() functions, which
byteswap a quota2_val or quota2_entry if needed.
Use this to get quota2_entry in host order before calling q2etoprop().

quota2_walk_list() will byteswap the offset if needed to leave
it in FS byte order in callers.
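
A conditional byteswap helper of the kind described above might look like the sketch below. It is not the code of quota2_ufs_rwq2v(): the structure layout, the field names and the sketch_ prefix are all hypothetical. It only assumes that every field is 64 bits wide, as the quota2 log entries state for the on-disk format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reverse the byte order of a 64-bit value. */
static uint64_t
sketch_bswap64(uint64_t v)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xff);
		v >>= 8;
	}
	return r;
}

/* Hypothetical, simplified quota value; all fields are 64 bits wide. */
struct sketch_q2val {
	uint64_t hardlimit;
	uint64_t softlimit;
	uint64_t cur;
};

/* Copy src to dst, swapping each field only when needswap is set. */
static void
sketch_rwq2v(const struct sketch_q2val *src, struct sketch_q2val *dst,
    bool needswap)
{
	dst->hardlimit = needswap ? sketch_bswap64(src->hardlimit) : src->hardlimit;
	dst->softlimit = needswap ? sketch_bswap64(src->softlimit) : src->softlimit;
	dst->cur = needswap ? sketch_bswap64(src->cur) : src->cur;
}
```

Because the swap is its own inverse, the same helper serves both the read path (FS order to host order) and the write path (host order to FS order).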
 1.1.2.2 21-Jan-2011  bouyer Add support for quotactl("getall") command, and convert repquota to new
world.
 1.1.2.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list; each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.
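
The layout described in this paragraph can be modelled roughly as below. This is a sketch only, not the on-disk format from quota2.h: the structure names, the field set, the fixed hash size, the modulo hash and the use of offset 0 as an "end of list" marker are all assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Assumed fixed size; the log says the real size depends on the block size. */
#define SKETCH_HASH_SIZE 4

/* Hypothetical header at offset 0 of the quota file. */
struct sketch_header {
	uint64_t free_head;			/* offset of first free entry, 0 = none */
	uint64_t hash[SKETCH_HASH_SIZE];	/* offset of first in-use entry per bucket */
};

/* Hypothetical quota entry; "next" is an offset within the file. */
struct sketch_entry {
	uint64_t next;		/* next entry in the same list, 0 = end */
	uint32_t id;		/* uid or gid */
	uint64_t limit;
};

/*
 * Look up the entry for id by walking a single hash chain.
 * Returns the entry's offset in the file, or 0 if there is none.
 */
static uint64_t
sketch_lookup(const unsigned char *file, uint32_t id)
{
	struct sketch_header hdr;
	struct sketch_entry e;
	uint64_t off;

	memcpy(&hdr, file, sizeof(hdr));
	for (off = hdr.hash[id % SKETCH_HASH_SIZE]; off != 0; off = e.next) {
		memcpy(&e, file + off, sizeof(e));
		if (e.id == id)
			return off;
	}
	return 0;
}
```

Offset 0 can serve as the end marker in this model because the header always occupies the start of the file, so no entry can live there.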

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) supports are completed (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.4.6.2 06-Jun-2011  jruoho Sync with HEAD.
 1.4.6.1 24-Mar-2011  jruoho file quota2.h was added on branch jruoho-x86intr on 2011-06-06 09:10:18 +0000
 1.4.4.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.4.2.3 12-Jun-2011  rmind sync with head
 1.4.2.2 21-Apr-2011  rmind sync with head
 1.4.2.1 24-Mar-2011  rmind file quota2.h was added on branch rmind-uvmplock on 2011-04-21 01:42:21 +0000
 1.5.6.1 18-Feb-2012  mrg merge to -current.
 1.5.2.1 17-Apr-2012  yamt sync with head
 1.9.6.1 03-Dec-2017  jdolecek update from HEAD
 1.3 24-Mar-2011  bouyer Add a new libquota library, which contains some blocks to build and/or
parse quota plists; as well as a getfsquota() function to retrieve quotas
for a single id from a single filesystem (whatever filesystem this is:
a local quota-enabled fs or NFS). This is built on functions getufsquota()
(for local filesystems with UFS-like quotas) and getnfsquota();
which are also available to userland programs.
move functions from quota2_subr.c to libquota or libprop as appropriate,
and adjust in-tree quota tools.
move some declarations from kernel headers to either sys/quota.h or
quota/quota.h as appropriate. ufs/ufs/quota.h still installed because
it's needed by other installed ufs headers.
ufs/ufs/quota1.h still installed as a quick&dirty way to get a code
using the old quotactl() to compile (just include ufs/ufs/quota1.h instead of
ufs/ufs/quota.h - old code won't compile without this change and this is
on purpose).
Discussed on tech-kern@ and tech-net@ (long thread, but not much about
libquota itself ...)
 1.2 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.1 20-Jan-2011  bouyer branches: 1.1.2;
file quota2_prop.c was initially added on branch bouyer-quota2.
 1.1.2.5 11-Feb-2011  bouyer Remove key "quota version", it doesn't serve any purpose
 1.1.2.4 31-Jan-2011  bouyer Catch up with Q2V -> QL renaming
Enforce limits for quota2.
pass quota type (*QUOTA) and limit type (QL_*) to
KAUTH_REQ_SYSTEM_FS_QUOTA_NOLIMIT, to make it possible to skip
limit checks for some quota type only if a listener wants to.
 1.1.2.3 30-Jan-2011  bouyer Implement 'set' command for quota2.
 1.1.2.2 21-Jan-2011  bouyer Add support for quotactl("getall") command, and convert repquota to new
world.
 1.1.2.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list; each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) supports are completed (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.3 24-Mar-2011  bouyer Add a new libquota library, which contains some blocks to build and/or
parse quota plists; as well as a getfsquota() function to retrieve quotas
for a single id from a single filesystem (whatever filesystem this is:
a local quota-enabled fs or NFS). This is built on functions getufsquota()
(for local filesystems with UFS-like quotas) and getnfsquota();
which are also available to userland programs.
move functions from quota2_subr.c to libquota or libprop as appropriate,
and adjust in-tree quota tools.
move some declarations from kernel headers to either sys/quota.h or
quota/quota.h as appropriate. ufs/ufs/quota.h still installed because
it's needed by other installed ufs headers.
ufs/ufs/quota1.h still installed as a quick&dirty way to get a code
using the old quotactl() to compile (just include ufs/ufs/quota1.h instead of
ufs/ufs/quota.h - old code won't compile without this change and this is
on purpose).
Discussed on tech-kern@ and tech-net@ (long thread, but not much about
libquota itself ...)
 1.2 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.1 20-Jan-2011  bouyer branches: 1.1.2;
file quota2_prop.h was initially added on branch bouyer-quota2.
 1.1.2.3 30-Jan-2011  bouyer Implement 'set' command for quota2.
 1.1.2.2 21-Jan-2011  bouyer Add support for quotactl("getall") command, and convert repquota to new
world.
 1.1.2.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list; each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) supports are completed (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.7 24-Aug-2023  andvar s/defaut/default/ in comments.
 1.6 22-Feb-2023  riastradh ufs: Nix trailing whitespace and tidy up some other minor KNF.
 1.5 05-Feb-2012  dholland Migrate one last leftover bit (used only by the kernel now) to
sys/ufs/ufs and remove the old quota headers and no-longer-used shared
code. Ok by releng.
 1.4 07-Jun-2011  bouyer branches: 1.4.2; 1.4.6;
Fix bad cut'n'paste in copyright. Pointed out by dyoung@
 1.3 24-Mar-2011  bouyer branches: 1.3.2; 1.3.4; 1.3.6;
Add a new libquota library, which contains some blocks to build and/or
parse quota plists; as well as a getfsquota() function to retrieve quotas
for a single id from a single filesystem (whatever filesystem this is:
a local quota-enabled fs or NFS). This is built on functions getufsquota()
(for local filesystems with UFS-like quotas) and getnfsquota();
which are also available to userland programs.
move functions from quota2_subr.c to libquota or libprop as appropriate,
and adjust in-tree quota tools.
move some declarations from kernel headers to either sys/quota.h or
quota/quota.h as appropriate. ufs/ufs/quota.h still installed because
it's needed by other installed ufs headers.
ufs/ufs/quota1.h still installed as a quick&dirty way to get a code
using the old quotactl() to compile (just include ufs/ufs/quota1.h instead of
ufs/ufs/quota.h - old code won't compile without this change and this is
on purpose).
Discussed on tech-kern@ and tech-net@ (long thread, but not much about
libquota itself ...)
 1.2 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.1 20-Jan-2011  bouyer branches: 1.1.2;
file quota2_subr.c was initially added on branch bouyer-quota2.
 1.1.2.6 03-Feb-2011  bouyer Change semantic of limits to allow up to the limit included (instead of
up to one less than the limit: I feel that if my limit is 1000 inodes
I should be able to create 1000 files, not 999).
Keep the previous semantic for quota1 dquot, the conversion functions
will add or remove 1 when converting limits from/to the new format.

Adjust test for this change.
 1.1.2.5 03-Feb-2011  bouyer factor out code to check a quota against its limits.
 1.1.2.4 31-Jan-2011  bouyer Catch up with Q2V -> QL renaming
Enforce limits for quota2.
pass quota type (*QUOTA) and limit type (QL_*) to
KAUTH_REQ_SYSTEM_FS_QUOTA_NOLIMIT, to make it possible to skip
limit checks for some quota type only if a listener wants to.
 1.1.2.3 28-Jan-2011  bouyer Add RCSID
 1.1.2.2 28-Jan-2011  bouyer Introduce quota2_ufs_rwq2v() and quota2_ufs_rwq2e() functions, which
byteswap a quota2_val or quota2_entry if needed.
Use this to get quota2_entry in host order before calling q2etoprop().

quota2_walk_list() will byteswap the offset if needed to leave
it in FS byte order in callers.
 1.1.2.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list; each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) supports are completed (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.3.6.2 06-Jun-2011  jruoho Sync with HEAD.
 1.3.6.1 24-Mar-2011  jruoho file quota2_subr.c was added on branch jruoho-x86intr on 2011-06-06 09:10:18 +0000
 1.3.4.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.3.2.3 12-Jun-2011  rmind sync with head
 1.3.2.2 21-Apr-2011  rmind sync with head
 1.3.2.1 24-Mar-2011  rmind file quota2_subr.c was added on branch rmind-uvmplock on 2011-04-21 01:42:21 +0000
 1.4.6.1 18-Feb-2012  mrg merge to -current.
 1.4.2.1 17-Apr-2012  yamt sync with head
 1.5 22-Feb-2023  riastradh ufs: Nix trailing whitespace and tidy up some other minor KNF.
 1.4 26-Nov-2021  christos use MNT_NFS4ACLS instead of MNT_ACLS (which was changed before to mean
MNT_POSIX1EACLS)
 1.3 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted, for example).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), which uses a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.2 10-Oct-2021  thorpej Use VN_KNOTE() to send our NOTE_ATTRIB and NOTE_REVOKE events.
 1.1 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.54 17-Nov-2022  chs Restore backward compatibility of UFS2 with previous NetBSD releases by
disabling support in UFS2 for extended attributes (including ACLs).
Add a new variant of UFS2 called "UFS2ea" that does support extended attributes.
Add new fsck_ffs operations "-c ea" and "-c no-ea" to convert file systems
from UFS2 to UFS2ea and vice-versa (both of which delete all existing extended
attributes in the process).
 1.53 20-Apr-2020  christos handle negative small block numbers for extattr
 1.52 18-Mar-2017  riastradh branches: 1.52.14; 1.52.24;
#if DIAGNOSTIC panic ---> KASSERT
 1.51 01-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().
 1.50 22-Jan-2013  dholland branches: 1.50.14; 1.50.18; 1.50.22;
Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.49 06-Mar-2011  bouyer branches: 1.49.4; 1.49.14;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.48 27-Mar-2008  ad branches: 1.48.26; 1.48.32; 1.48.34;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.47 02-Jan-2008  ad branches: 1.47.6;
Merge vmlocking2 to head.
 1.46 08-Oct-2007  ad branches: 1.46.4; 1.46.6; 1.46.10;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.45 17-May-2007  hannken branches: 1.45.6; 1.45.8; 1.45.10;
Fstrans_start() always returns zero, so change its type to void.
 1.44 22-Feb-2007  thorpej branches: 1.44.4; 1.44.6;
TRUE -> true, FALSE -> false
 1.43 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.42 29-Jan-2007  hannken branches: 1.42.2;
Change fstrans enum types to upper case.
No functional change.

From Antti Kantee <pooka@netbsd.org>
 1.41 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.40 04-Apr-2006  pavel branches: 1.40.8;
Correct typo in a panic message.
 1.39 11-Dec-2005  christos branches: 1.39.4; 1.39.6; 1.39.8; 1.39.10; 1.39.12;
merge ktrace-lwp.
 1.38 08-Oct-2005  chs avoid the need for a bogus initializer.
 1.37 12-Aug-2005  jmmv Drop extra word from comment.
 1.36 10-Jul-2005  thorpej - Use ANSI function decls.
- Sprinkle some static.
 1.35 26-Feb-2005  perry branches: 1.35.4;
nuke trailing whitespace
 1.34 15-Dec-2004  mycroft branches: 1.34.2; 1.34.4;
Remove some unnecessary (int32_t) casts that would cause us to screw up the
top bit in block addresses.

Also, change some daddr_t->int32_t casts (mostly as arguments to ufs_rw32(),
where they would get promoted anyway) to u_int32_t.
 1.33 15-Sep-2004  yamt ufs_getlbns:
- fix an integer overflow when calculating lbns of indirect blocks.
- remove a redundant calculation of blockcnt.
 1.32 15-Aug-2004  mycroft Repair some FFS_EI code for ufsmount changes.
 1.31 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.30 24-Jul-2004  dbj remove incorrect casts that limit some uses of daddr_t to 31 bits
this fixes problems using ffs2 with more than 2^31 sectors (~1tb)
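
The effect of such a cast, and the arithmetic behind the "~1tb" figure, can be sketched as follows. The type and function names are hypothetical stand-ins, and the 512-byte sector size is an assumption rather than something stated in the log.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's 64-bit daddr_t. */
typedef int64_t sketch_daddr_t;

/* Assumed sector size, used only for the capacity arithmetic. */
#define SKETCH_SECTOR_BYTES 512

/* The problematic pattern: a block address squeezed through int32_t. */
static sketch_daddr_t
sketch_through_int32(sketch_daddr_t blk)
{
	/*
	 * Converting an out-of-range value to int32_t is
	 * implementation-defined in C; either way the result
	 * cannot represent addresses of 2^31 and above.
	 */
	return (sketch_daddr_t)(int32_t)blk;
}

/* Bytes addressable with nsectors sectors of the assumed size. */
static uint64_t
sketch_capacity(uint64_t nsectors)
{
	return nsectors * SKETCH_SECTOR_BYTES;
}
```

Small addresses survive the cast unchanged, but sector 2^31 does not. Under the 512-byte assumption, 2^31 sectors correspond to 2^40 bytes, which is where the roughly 1 TB boundary comes from.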
 1.29 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.28 27-Feb-2004  uwe branches: 1.28.2;
Shut up gcc3 warning that `metalbn' might be used uninitialized.
XXX: The warning is bogus and only triggered on sh3.
 1.27 25-Jan-2004  hannken Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.26 10-Jan-2004  yamt store an i/o priority hint in struct buf for buffer queue discipline.
 1.25 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.24 23-Jul-2003  yamt cast UFS1 on-disk block pointers to int32_t before assigning them to daddr_t.
it's needed for LFS because UNWRITTEN is a negative number.
 1.23 18-May-2003  yamt branches: 1.23.2;
make is_sequential a callback in order to achieve better lfs write clustering.

since lfs always rewrites blocks into the new segment,
the current on-disk place of the block doesn't affect write clustering.

ok'ed by Konrad Schroder.
 1.22 23-Apr-2003  tls Correct use of MAXBSIZE where MAXPHYS was intended. This is a necessary
first step towards per-device MAXPHYS, and has the beneficial side effect
of allowing clustering to MAXPHYS even on systems that need to run with
a reduced MAXBSIZE to get more metadata buffers.
 1.21 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.20 21-Mar-2003  fvdl LFS likes to store negative values in the dinode block pointers, so
make sure to cast the value back to int32_t after it was changed
by ufs_rw32, before passing it to blkptrtodb.
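A minimal sketch of the sign-extension issue described above, assuming a 64-bit daddr_t and an unsigned 32-bit value as it would come back from a byte-swap helper. The function names are hypothetical, and -2 is used only as an example of a negative marker. The sketch also assumes the usual two's-complement behaviour for the uint32_t to int32_t conversion:

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t daddr_t;	/* assumption: 64-bit disk address */

/*
 * Widening the unsigned 32-bit on-disk value directly zero-extends it,
 * so a stored negative marker turns into a large positive block number.
 */
static daddr_t
widen_without_cast(uint32_t ondisk)
{
	return ondisk;
}

/*
 * Casting to int32_t first makes the later widening sign-extend, so a
 * stored negative marker stays negative.
 */
static daddr_t
widen_with_cast(uint32_t ondisk)
{
	assert(sizeof(daddr_t) > sizeof(int32_t));
	return (int32_t)ondisk;
}
```

With the bit pattern 0xFFFFFFFE, the first helper yields 4294967294 while the second yields -2, which is what code comparing against a negative marker expects.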
 1.19 09-Feb-2003  perseant Allow negative values other than UNASSIGNED to be returned from ufs_bmap;
fixes a bug introduced in the 64-bit daddr_t conversion, that manifests
itself in LFS with kernels compiled with the FFS_EI option.
 1.18 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.17 11-May-2002  enami branches: 1.17.4;
Add comment that getblk() in ufs_bmaparray() returns an error only if
we are pagedaemon.
 1.16 13-Nov-2001  chs some callers of ufs_bmaparray() in LFS depend on *nump being set to 0 for
direct blocks, so restore that behaviour.
 1.15 10-Nov-2001  chs fix the previous change: use the correct test for a block number
referring to a direct block.
 1.14 08-Nov-2001  chs only call ufs_getlbns() for blocks which involve indirects, and assert
that this is so. use a shift instead of a multiply in one place.
 1.13 08-Nov-2001  lukem add RCSID
 1.12 06-Nov-2001  simonb Remove some variables that are set but never used.
 1.11 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.10 27-Nov-2000  chs branches: 1.10.2; 1.10.6; 1.10.10;
Initial integration of the Unified Buffer Cache project.
 1.9 30-Mar-2000  augustss Remove register declarations.
 1.8 13-Jun-1998  kleink branches: 1.8.14;
KNF, mostly of FFS_EI changes.
 1.7 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time, other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.6 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.5 17-Jul-1997  fvdl When allocating a new block, store the result obtained through counting
indirect blocks in a 64 bit integer, to prevent overflows when computing
NINDIR^3
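A minimal sketch of the overflow, assuming 8 KB blocks and 4-byte block pointers so that NINDIR is 2048 (an illustrative figure, not taken from the log). The helper name is hypothetical. NINDIR^3 is then 2^33, which does not fit in a 32-bit integer but is computed safely when the accumulator is 64 bits wide:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of data blocks reachable through `levels` levels of indirection,
 * given `nindir` block pointers per indirect block. The product is
 * accumulated in 64 bits so that nindir^3 cannot wrap.
 */
static int64_t
blocks_at_level(int32_t nindir, int levels)
{
	int64_t count = 1;
	int i;

	assert(nindir > 0 && levels >= 0);
	for (i = 0; i < levels; i++)
		count *= nindir;
	return count;
}
```

For nindir = 2048, three levels give 8589934592 blocks, well above INT32_MAX.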
 1.4 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.3 09-Feb-1996  christos ufs prototype changes
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.3 01-Mar-1998  fvdl Import some files that were changed after Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.8.14.2 08-Dec-2000  bouyer Sync with HEAD.
 1.8.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.10.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.10.6.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.10.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.2.5 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.10.2.4 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.10.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.10.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.10.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.17.4.1 26-Aug-2003  tron Pull up revision 1.22 (requested by tls in ticket #1434):
Correct use of MAXBSIZE where MAXPHYS was intended. This is a necessary
first step towards per-device MAXPHYS, and has the beneficial side effect
of allowing clustering to MAXPHYS even on systems that need to run with
a reduced MAXBSIZE to get more metadata buffers.
 1.23.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.23.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.23.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.23.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.23.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.23.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.23.2.1 03-Aug-2004  skrll Sync with HEAD
 1.28.2.3 06-Apr-2005  tron Apply patch (requested by mycroft in ticket #1035):
Fix a silent truncation problem that could cause corruption with large
FFSv1 file systems.
 1.28.2.2 18-Sep-2004  he branches: 1.28.2.2.2;
Pull up revision 1.33 (requested by yamt in ticket #859):
Fix an integer overflow when calculating lbns of indirect
blocks, and remove a redundant calculation of blockcnt.
 1.28.2.1 28-Jul-2004  tron Pull up revision 1.30 via patch (requested by dbj in ticket #722):
remove incorrect casts that limit some uses of daddr_t to 31 bits
this fixes problems using ffs2 with more than 2^31 sectors (~1tb)
 1.28.2.2.2.1 06-Apr-2005  tron Apply patch (requested by mycroft in ticket #1035):
Fix a silent truncation problem that could cause corruption with large
FFSv1 file systems.
 1.34.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.34.2.1 29-Apr-2005  kent sync with -current
 1.35.4.5 21-Jan-2008  yamt sync with head
 1.35.4.4 27-Oct-2007  yamt sync with head.
 1.35.4.3 03-Sep-2007  yamt sync with head.
 1.35.4.2 26-Feb-2007  yamt sync with head.
 1.35.4.1 21-Jun-2006  yamt sync with head.
 1.39.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.39.10.1 19-Apr-2006  elad sync with head.
 1.39.8.1 11-Apr-2006  yamt sync with head
 1.39.6.1 22-Apr-2006  simonb Sync with head.
 1.39.4.1 09-Sep-2006  rpaulo sync with head
 1.40.8.1 01-Feb-2007  ad Sync with head.
 1.42.2.2 17-May-2007  yamt sync with head.
 1.42.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.44.6.1 11-Jul-2007  mjf Sync with head.
 1.44.4.4 30-Aug-2007  ad Reduce diffs to head.
 1.44.4.3 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.44.4.2 08-Jun-2007  ad Sync with head.
 1.44.4.1 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.45.10.1 14-Oct-2007  yamt sync with head.
 1.45.8.2 09-Jan-2008  matt sync with HEAD
 1.45.8.1 06-Nov-2007  matt sync with HEAD
 1.45.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.46.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.46.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.46.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.47.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.48.34.1 18-Feb-2011  bouyer Add a new inode flag, SF_SNAPINVAL, to be set on SF_SNAPSHOT inodes when
the snapshot is invalid.
Set SF_SNAPSHOT | SF_SNAPINVAL early when initializing a snapshot inode,
so that quotas are bypassed for allocations on this inode.
Set SF_SNAPSHOT | SF_SNAPINVAL (instead of clearing SF_SNAPSHOT) when
expuge()ing a snapshot inode, so that userland tools working on the
snapshot (e.g. fsck or dump) can properly handle this inode.

The main point at this time is to have fsck_ffs -X properly compute quotas;
as a bonus persistent snapshots files won't show up in a dump(8) from a
snapshot.

This may also help speeding up taking snapshots, by bypassing expuge()
for snapshot inodes completely (but this needs more thought).


Briefly discussed with hannken@ in private mail.
 1.48.32.1 06-Jun-2011  jruoho Sync with HEAD.
 1.48.26.1 21-Apr-2011  rmind sync with head
 1.49.14.4 03-Dec-2017  jdolecek update from HEAD
 1.49.14.3 25-Feb-2013  tls resync with head
 1.49.14.2 10-Feb-2013  tls Add an accessor -- ufs_maxphys() -- to check the maximum transfer size
for a given UFS mountpoint, and move the code from mount that finds
the underlying disk and resets the mountpoint max transfer size into a
utility function, ufs_update_maxphys().

Add a global serial number that counts disk property changes to which
filesystems are meant to accommodate themselves. Make ufs_maxphys()
check it. This is a sort of flag-polling interface that avoids callbacks
into the filesystem code, but will require freezing filesystems and
draining in-flight transactions before a decrease in size that is
mandatory (like attaching a disk with a smaller maximum transfer size
as a spare in a RAIDframe set), rather than "advisory", like finding
out set geometry from a RAID controller long after boot and deciding
a smaller transfer size would be optimal, can be signalled. Still, the
"advisory" case is the common one so this is progress.

Make a bit of an example of RAIDframe by making it bump this new
serial number when disks are added to the subsystem. I will attack
one of the hardware RAID drivers (probably arcmsr) next.
 1.49.14.1 09-Oct-2012  bouyer Use mnt_maxphys not MAXPHYS to limit the size of I/O to disk.
Now the read-ahead code does issue 512k requests to disk.
 1.49.4.1 23-Jan-2013  yamt sync with head
 1.50.22.1 21-Apr-2017  bouyer Sync with HEAD
 1.50.18.1 20-Mar-2017  pgoyette Sync with HEAD
 1.50.14.1 28-Aug-2017  skrll Sync with HEAD
 1.52.24.1 20-Apr-2020  bouyer Sync with HEAD
 1.52.14.1 21-Apr-2020  martin Sync with HEAD
 1.23 19-Apr-2018  christos s/static inline/static __inline/g for consistency.
 1.22 08-Feb-2017  rin branches: 1.22.12;
Add smaller versions of fsck_ffs(8) and newfs(8) for install media, where
support for Endian-Independent FFS and Apple UFS is disabled unless FFS_EI=1
and APPLE_UFS=1 are added to CRUNCHENV, respectively.

This reduces the size of ramdisk image for atari by over 15KB.

Thanks tsutsui and christos for their useful comments.
 1.21 29-Apr-2016  christos branches: 1.21.2; 1.21.4;
use variables that could be unused.
 1.20 19-Oct-2013  mrg branches: 1.20.6;
convert ufs_rw{16,32,64}() into real inline functions in all cases,
so that they consume their second arguments properly.
 1.19 19-Oct-2009  bouyer branches: 1.19.12; 1.19.22; 1.19.26;
Remove clauses 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the boring work!
 1.18 29-Jan-2006  dsl branches: 1.18.72;
Make almost everything #include <sys/bswap.h> instead of <machine/bswap.h>
The bswap.h and endian.h files are all rather incestuous, but I want to
get the constant folding stuff into one place - sys/bswap.h
 1.17 24-Dec-2005  perry branches: 1.17.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.16 11-Dec-2005  christos merge ktrace-lwp.
 1.15 10-Jul-2005  thorpej - Use ANSI function decls.
- Sprinkle some static.
 1.14 15-Aug-2004  mycroft branches: 1.14.12;
Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.13 05-Oct-2003  bouyer Remove references to University of California from my copyright notices.
 1.12 16-Apr-2003  yamt branches: 1.12.2;
ufs_rwXX: cast to unsigned even without FFS_EI to avoid
possible nasty signed vs unsigned problems.
 1.11 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.10 31-Jan-2002  tv Revert previous. This is actually being done a better way.
 1.9 31-Jan-2002  tv For makefs, only include <machine/bswap.h> if it exists.
 1.8 26-Jul-2001  lukem fix spelo
 1.7 30-May-2001  mrg branches: 1.7.4;
use _KERNEL_OPT
 1.6 15-May-2000  bouyer branches: 1.6.6;
Sync copyright notice.
 1.5 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.4 03-Aug-1999  drochner branches: 1.4.2; 1.4.4; 1.4.8;
clean up inclusion of "opt_ffs.h" and use of "FFS_EI" a bit
 1.3 15-Jan-1999  bouyer Move the bswap functions from libutil to libc (this bumps the
minor of libc and the major of libutil). For little-endian architectures
merge the bnswap() assembly versions with ntoh* and hton* using symbol
aliasing. Use symbol renaming for the bswap function in this case to avoid
namespace pollution.
Declare bswap* in machine/bswap.h, not machine/endian.h. For little-endian
machines, common code for inline macros go in machine/byte_swap.h
Sync libkern with libc.
Adjust #include in kernel sources for machine/bswap.h.
 1.2 12-Nov-1998  thorpej defopt FFS_EI
 1.1 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time, other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
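A minimal sketch of the swap-on-access idea described above, assuming a per-mount "needs swap" flag that is passed down to each accessor. The function name rw32_sketch is hypothetical and is not the routine from the tree; it only illustrates that a field is returned unchanged for a native-order file system and byte-reversed otherwise:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return a 32-bit on-disk field in host order. `needswap` is 1 when the
 * file system was written in the opposite byte order, 0 otherwise.
 */
static uint32_t
rw32_sketch(uint32_t v, int needswap)
{
	assert(needswap == 0 || needswap == 1);
	if (!needswap)
		return v;
	return ((v & 0x000000ffu) << 24) |
	    ((v & 0x0000ff00u) << 8) |
	    ((v & 0x00ff0000u) >> 8) |
	    ((v & 0xff000000u) >> 24);
}
```

Because the swap is its own inverse, the same accessor can serve for both reading a field from a buffer and preparing a value to be stored back.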
 1.4.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.4.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.4.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.6.6.4 11-Dec-2002  thorpej Sync with HEAD.
 1.6.6.3 28-Feb-2002  nathanw Catch up to -current.
 1.6.6.2 24-Aug-2001  nathanw Catch up with -current.
 1.6.6.1 21-Jun-2001  nathanw Catch up to -current.
 1.7.4.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.7.4.1 03-Aug-2001  lukem update to -current
 1.12.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.12.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.12.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.12.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.12.2.1 03-Aug-2004  skrll Sync with HEAD
 1.14.12.1 21-Jun-2006  yamt sync with head.
 1.17.2.1 01-Feb-2006  yamt sync with head.
 1.18.72.1 11-Mar-2010  yamt sync with head
 1.19.26.1 18-May-2014  rmind sync with head
 1.19.22.2 03-Dec-2017  jdolecek update from HEAD
 1.19.22.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.19.12.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.20.6.2 28-Aug-2017  skrll Sync with HEAD
 1.20.6.1 29-May-2016  skrll Sync with HEAD
 1.21.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.21.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.22.12.1 22-Apr-2018  pgoyette Sync with HEAD
 1.41 07-Aug-2022  simonb If UFS or LFS dirhash is enabled in the kernel, set the dirhash cache
size dependent on memory size. If less than 128MB of memory, default
to no cache. With 128MB of memory or more, use a maximum cache size of
1/64th of memory; cap maximum default cache size to 32MB (for systems
with 2GB of memory or more).

The dirhash cache sizes are still explicitly settable by sysctl(8) or
by adding relevant entry(s) to sysctl.conf(5).
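The sizing rule in this log message can be sketched as follows. This is an illustration of the stated policy only, not the code from the commit; the function name and the use of a byte count as input are assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define MB (UINT64_C(1024) * 1024)

/*
 * Default dirhash cache size in bytes for a machine with `physmem` bytes
 * of memory: none below 128MB, otherwise 1/64th of memory, capped at 32MB.
 */
static uint64_t
dirhash_default_cache(uint64_t physmem)
{
	uint64_t size;

	if (physmem < 128 * MB)
		return 0;
	size = physmem / 64;
	assert(size >= 2 * MB);	/* follows from physmem >= 128MB */
	if (size > 32 * MB)
		size = 32 * MB;
	return size;
}
```

Under this rule a 128MB machine gets 2MB, a 1GB machine gets 16MB, and the 32MB cap is reached at 2GB, which agrees with the figures quoted in the message.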
 1.40 16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.39 14-Mar-2020  ad - Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.
 1.38 08-Mar-2020  chs in ufsdirhash_free(), only examine dh->dh_onlist after taking the
dirhashlist lock. if we skip the lock then we might see that
dh_onlist is zero while ufsdirhash_recycle() is still working on
the dirhash. the symptom I saw was that ufsdirhash_free() would
try to destroy the dh_lock mutex while it was still held.
 1.37 20-Dec-2014  christos branches: 1.37.18; 1.37.22;
clear i_dirhash sooner, but what lock protects it?
 1.36 25-Feb-2014  pooka branches: 1.36.6;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.35 09-Jun-2013  dholland branches: 1.35.2;
Stick UFS_ in front of these symbols:
DIRBLKSIZ
DIRECTSIZ
DIRSIZ
OLDDIRFMT
NEWDIRFMT

Part of PR 47909.
 1.34 05-Oct-2009  rmind branches: 1.34.12; 1.34.22;
ufsdirhash_recycle(): modify ufs_dirhashmem atomically.
 1.33 30-May-2009  hannken ufsdirhash_lookup(): call ufs_blkatoff() with "modify == false".
This buffer is used read-only here and from caller.
 1.32 06-May-2009  rmind Revert previous until the problem is understood.
 1.31 04-May-2009  rmind ufsdirhash_recycle():
- Fix ufs_dirhashmem modification (do it atomically).
- Fix a memory leak.

OK by <ad>.
 1.30 18-Mar-2009  cegger bcmp -> memcmp
 1.29 18-Mar-2009  cegger Ansify function definitions w/o arguments. Generated with sed.
 1.28 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.27 03-Jul-2008  ad branches: 1.27.4; 1.27.10;
ufsdirhash_build: missing unlock in failure path.
 1.26 28-Jun-2008  rumble Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.25 16-Jun-2008  skd Add some locking, runs with DIAGNOSTIC.
 1.24 15-Jun-2008  skd Fix two cases where we would panic locking against ourselves.
 1.23 04-Jun-2008  ad branches: 1.23.2;
- Tidy up the locking a bit.
- Use atomics/kmem_alloc/pool_cache.
 1.22 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.21 03-Jan-2008  ad branches: 1.21.6; 1.21.8; 1.21.10; 1.21.12;
Use pool_cache.
 1.20 08-Oct-2007  ad branches: 1.20.4; 1.20.10;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.19 22-Jul-2007  rumble branches: 1.19.4; 1.19.6; 1.19.8; 1.19.10;
Add missing RCSID.
 1.18 10-Jul-2007  hannken branches: 1.18.2;
Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criteria, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.17 09-Jul-2007  ad Fix merge botch.
 1.16 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.15 30-Jun-2007  pooka Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.14 12-Mar-2007  ad branches: 1.14.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.13 04-Mar-2007  christos branches: 1.13.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.12 09-Feb-2007  ad branches: 1.12.2;
Merge newlock2 to head.
 1.11 19-Mar-2006  matt branches: 1.11.8;
More MALLOC -> malloc changes.
 1.10 14-Jan-2006  yamt branches: 1.10.2; 1.10.4; 1.10.6; 1.10.8; 1.10.10;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.9 14-Jan-2006  yamt make ufsdirhash_pool static.
 1.8 13-Jan-2006  yamt ufsdirhash_build: yield cpu when looping on directory entries.
 1.7 11-Dec-2005  christos branches: 1.7.2;
merge ktrace-lwp.
 1.6 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.5 10-Jul-2005  thorpej branches: 1.5.2;
Defflag UFS_DIRHASH.
 1.4 20-Jun-2005  atatat branches: 1.4.2;
Change the rest of the sysctl subsystem to use const consistently.
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
 1.3 31-May-2005  christos s/buf/sbuf.
 1.2 26-Feb-2005  perry branches: 1.2.4;
nuke trailing whitespace
 1.1 23-Jan-2005  rumble branches: 1.1.2; 1.1.4;
Bring in Ian Dowse's Dirhash from FreeBSD. Hash tables of
directories are created on the fly and used to increase
performance by circumventing ufs_lookup's linear search.

Dirhash is enabled by the UFS_DIRHASH option, but not
by default.
 1.1.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.2.3 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.1.2.2 24-Jan-2005  skrll Sync with HEAD.
 1.1.2.1 23-Jan-2005  skrll file ufs_dirhash.c was added on branch ktrace-lwp on 2005-01-24 08:36:05 +0000
 1.2.4.2 29-Apr-2005  kent sync with -current
 1.2.4.1 26-Feb-2005  kent file ufs_dirhash.c was added on branch kent-audio2 on 2005-04-29 11:29:39 +0000
 1.4.2.5 21-Jan-2008  yamt sync with head
 1.4.2.4 27-Oct-2007  yamt sync with head.
 1.4.2.3 03-Sep-2007  yamt sync with head.
 1.4.2.2 26-Feb-2007  yamt sync with head.
 1.4.2.1 21-Jun-2006  yamt sync with head.
 1.5.2.1 20-Oct-2005  yamt adapt ufs.
 1.7.2.1 15-Jan-2006  yamt sync with head.
 1.10.10.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.10.8.1 19-Apr-2006  elad sync with head.
 1.10.6.1 01-Apr-2006  yamt sync with head.
 1.10.4.1 22-Apr-2006  simonb Sync with head.
 1.10.2.1 09-Sep-2006  rpaulo sync with head
 1.11.8.1 30-Jan-2007  ad Remove support for SA. Ok core@.
 1.12.2.2 24-Mar-2007  yamt sync with head.
 1.12.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.13.2.6 20-Aug-2007  ad Sync with HEAD.
 1.13.2.5 15-Jul-2007  ad Sync with head.
 1.13.2.4 15-Jul-2007  ad Sync with head.
 1.13.2.3 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.13.2.2 13-Apr-2007  ad Enable the dirhash locking, and add some comments from FreeBSD.
 1.13.2.1 13-Mar-2007  ad Sync with head.
 1.14.2.1 11-Jul-2007  mjf Sync with head.
 1.18.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.19.10.2 22-Jul-2007  rumble Add missing RCSID.
 1.19.10.1 22-Jul-2007  rumble file ufs_dirhash.c was added on branch matt-mips64 on 2007-07-22 21:12:28 +0000
 1.19.8.1 14-Oct-2007  yamt sync with head.
 1.19.6.2 09-Jan-2008  matt sync with HEAD
 1.19.6.1 06-Nov-2007  matt sync with HEAD
 1.19.4.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.20.10.1 08-Jan-2008  bouyer Sync with HEAD
 1.20.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.21.12.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.21.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.21.10.3 11-Mar-2010  yamt sync with head
 1.21.10.2 20-Jun-2009  yamt sync with head
 1.21.10.1 04-May-2009  yamt sync with head.
 1.21.8.2 17-Jun-2008  yamt sync with head.
 1.21.8.1 18-May-2008  yamt sync with head.
 1.21.6.4 28-Sep-2008  mjf Sync with HEAD.
 1.21.6.3 29-Jun-2008  mjf Sync with HEAD.
 1.21.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.21.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.23.2.2 03-Jul-2008  simonb Sync with head.
 1.23.2.1 18-Jun-2008  simonb Sync with head.
 1.27.10.2 23-Jul-2009  jym Sync with HEAD.
 1.27.10.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.27.4.2 28-Apr-2009  skrll Sync with HEAD.
 1.27.4.1 03-Mar-2009  skrll Sync with HEAD.
 1.34.22.3 03-Dec-2017  jdolecek update from HEAD
 1.34.22.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.34.22.1 23-Jun-2013  tls resync from head
 1.34.12.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.35.2.1 18-May-2014  rmind sync with head
 1.36.6.1 06-Apr-2015  skrll Sync with HEAD
 1.37.22.1 08-Mar-2020  martin Pull up following revision(s) (requested by chs in ticket #767):

sys/ufs/ufs/ufs_dirhash.c: revision 1.38

in ufsdirhash_free(), only examine dh->dh_onlist after taking the
dirhashlist lock. if we skip the lock then we might see that
dh_onlist is zero while ufsdirhash_recycle() is still working on
the dirhash. the symptom I saw was that ufsdirhash_free() would
try to destroy the dh_lock mutex while it was still held.
 1.37.18.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.2 01-Mar-1998  fvdl Remove extraneous files from Lite2 merge.
 1.1 01-Mar-1998  fvdl branches: 1.1.1;
Initial revision
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.55 10-Feb-2024  andvar Fix various typos in comments, log messages and documentation.
 1.54 22-Feb-2023  riastradh ufs: Nix trailing whitespace and tidy up some other minor KNF.
 1.53 29-Jun-2021  dholland Add containment for the cloning devices hack in vn_open.

Cloning devices (and also things like /dev/stderr) work by allocating
a struct file, stuffing it in the file table (which is a layer
violation), stuffing the file descriptor number for it in a magic
field of struct lwp (which is gross), and then "failing" with one of
two magic errnos, EDUPFD or EMOVEFD.

Before this commit, all callers of vn_open in the kernel (there are
quite a few) were expected to check for these errors and handle the
situation. Needless to say, none of them except for open() itself did,
resulting in internal negative errnos being returned to userspace.

This hack is fairly deeply rooted and cannot be eliminated all at
once. This commit adds logic to handle the magic errnos inside
vn_open; now on success vn_open returns either a vnode or an integer
file descriptor, along with a flag that says whether the underlying
code requested EDUPFD or EMOVEFD. Callers not prepared to cope with
file descriptors can pass NULL for the extra return values, in which
case if a file descriptor would be produced vn_open fails with
EOPNOTSUPP.

Since I'm rearranging vn_open's signature anyway, stop exposing struct
nameidata. Instead, take three arguments: an optional vnode to use as
the starting point (like openat()), the path, and additional namei
flags to use, restricted to NOCHROOT and TRYEMULROOT. (Other namei
behavior, e.g. NOFOLLOW, can be requested via the open flags.)

This change requires a kernel bump. Ride the one an hour ago.
(That was supposed to be coordinated; did not intend to let an hour
slip by. My fault.)
 1.52 16-May-2020  christos branches: 1.52.6;
Add ACL support for FFS. From FreeBSD.
 1.51 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.50 19-Aug-2019  christos branches: 1.50.2;
- KNF more
- print the full path in error messages
 1.49 19-Aug-2019  christos - return (foo) -> return foo
- normalize all error messages to use __func__
- add more messages for startup failure (should perhaps auto-create
the attributes directory and the user and system subdirs?)
- factor out common code
 1.48 09-Nov-2016  dholland branches: 1.48.16;
Explain why the lock in here needs to be recursive. Related to PR 46997.
 1.47 07-Jul-2016  msaitoh branches: 1.47.2;
KNF. Remove extra spaces. No functional change.
 1.46 19-Nov-2014  manu branches: 1.46.2;
Fix uninitialized mutex usage

We use extended attribute mount mutex before testing if it had been
initialized, and as reported by Christos, this caused panic with
LOCKDEBUG. Fix it by testing before using.
 1.45 15-Nov-2014  manu Fix UFS1 extended attribute backend autocreation deadlock

UFS1 extended attribute backend autocreation goes through a vn_open()
to create the backend file, and this forces us to release the lock
on the target node, in case the target is within the parents of the
backend file. That created a window within which another thread could
acquire a lock on the target vnode and deadlock waiting for the
mount extended attribute lock.

We fix the problem by also releasing the mount extended attribute lock
when calling vn_open(), but that lets another thread race us for backend
creation. We just detect this using O_EXCL for vn_open() and by checking
for EEXIST return code. If we are raced, we fail backend creation but
this is not a problem since another thread succeeded on it: we just have
to use the result.
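
A minimal userland sketch of the race detection described above, assuming
open(2) with O_CREAT | O_EXCL as a stand-in for the kernel vn_open() call.
The function name create_backing_file() is hypothetical and is not taken
from ufs_extattr.c; the sketch only shows how EEXIST separates "another
thread won the race" from a real failure.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical userland analogue of the backend autocreation race check.
 * Returns 1 if this caller created the file, 0 if another caller created
 * it first (EEXIST), and -1 on any other error (errno is left set).
 */
int
create_backing_file(const char *path)
{
	int fd;

	assert(path != NULL);

	/* O_EXCL makes creation fail if the file already exists. */
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd >= 0) {
		close(fd);
		return 1;	/* we created the backing file */
	}
	if (errno == EEXIST)
		return 0;	/* raced: use the file the other caller made */
	return -1;		/* genuine failure */
}
```

A caller would treat both 1 and 0 as "the backing file now exists" and go
on to use it; only -1 is reported as an error.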
 1.44 14-Nov-2014  manu Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.
 1.43 07-Feb-2014  hannken branches: 1.43.4;
Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.42 09-Jun-2013  dholland branches: 1.42.2;
Stick UFS_ in front of these symbols:
DIRBLKSIZ
DIRECTSIZ
DIRSIZ
OLDDIRFMT
NEWDIRFMT

Part of PR 47909.
 1.41 08-Dec-2012  manu Remove always-true condition and note that the current code is suboptimal.
 1.40 10-Sep-2012  manu branches: 1.40.2;
Fix unmount returning EBUSY if an attribute was autocreated: we hold
a useless reference that we never gave back
 1.39 01-May-2012  manu Return ENODATA when no attribute is found, like Linux does. After
all we decided to adopt the Linux API, therefore there is rationale
to stick to it.

No standard tells us what to do, and our extended attribute API has not
been used in a release, therefore we do not break anything, and we get
more easily compatible with programs using the Linux extended attribute
API.

Note that FreeBSD and MacOS X return ENOATTR. FreeBSD has its own API
and MacOS X has a Linux-like API. How did the world get so complicated?
 1.38 26-Mar-2012  wiz Fix wrong variable in snprintf. From Henning Petersen in PR 46259,
ok manu@
 1.37 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.36 27-Jan-2012  para branches: 1.36.2;
converting readdir in ffs ext2fs from malloc(9) to kmem(9)
while there allocate ufs mount structs from kmem(9) too
preceding kmem-vmem-pool-patch

releng@ acknowledged
 1.35 07-Jul-2011  manu branches: 1.35.2; 1.35.6;
Fix locking protocol to avoid a panic on extattrctl stop and on umount.
 1.34 04-Jul-2011  manu Add a flag to VOP_LISTEXTATTR(9) so that the vnode interface can tell the
filesystem in which format extended attribute shall be listed.

There are currently two formats:
- NUL-terminated strings, used for listxattr(2), this is the default.
- one byte length-prefixed, non NUL-terminated strings, used for
extattr_list_file(2), which is obtained by setting the
EXTATTR_LIST_PREFIXLEN flag to VOP_LISTEXTATTR(9)

This approach avoids the need for converting the list back and forth, except
in libperfuse, since FUSE uses NUL-terminated strings, and the kernel may
have requested EXTATTR_LIST_PREFIXLEN.
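
To illustrate the two list formats named above, here is a small sketch
that converts a buffer of NUL-terminated names into one-byte
length-prefixed names. The function name nul_list_to_prefixlen() and its
signature are assumptions made for this example; it is not the conversion
code used by libperfuse or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Convert NUL-terminated names ("a\0bc\0") into one-byte length-prefixed,
 * non NUL-terminated names ("\1a\2bc"). Each entry takes strlen(name) + 1
 * bytes in either format, so the output is as long as the input.
 * Returns the number of bytes written, or -1 if the last name is not
 * terminated, a name is longer than 255 bytes, or dst is too small.
 */
long
nul_list_to_prefixlen(const char *src, size_t srclen,
    unsigned char *dst, size_t dstlen)
{
	size_t in = 0, out = 0;

	assert(src != NULL && dst != NULL);

	while (in < srclen) {
		const char *end = memchr(src + in, '\0', srclen - in);
		size_t n;

		if (end == NULL)
			return -1;	/* unterminated final name */
		n = (size_t)(end - (src + in));
		if (n > 255 || out + 1 + n > dstlen)
			return -1;	/* does not fit in one length byte or in dst */
		dst[out++] = (unsigned char)n;	/* length prefix */
		memcpy(dst + out, src + in, n);	/* name without its NUL */
		out += n;
		in += n + 1;			/* skip the name and its NUL */
	}
	return (long)out;
}
```

Because both encodings spend exactly one extra byte per name, a caller can
size the destination buffer from the source length alone.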
 1.33 27-Jun-2011  manu Implement extended attribute listing for UFS1.

Modify lsextattr(8) so that it does not expect each attribute name to be
prefixed by its length. This enables extattr_list_(file|link|fd) to
return a buffer matching its documentation. This also makes the interface
similar to what Linux and FUSE do, which is nice for interoperability.

Note that since we had no EA implementation supporting listing, we do
not break anything.
 1.32 17-Jun-2011  manu Add mount -o extattr option to enable extended attributes (currently only
for UFS1).
Remove kernel option for EA backing store autocreation and do it by
default. Add a sysctl so that autocreated attribute size can be modified.
 1.31 15-Jun-2011  manu Improve UFS1 extended attributes usability
- autocreate attribute backing file for new attributes
- autoload attributes when issuing extattrctl start
- when autoloading attributes, do not display garbage warning when looking
up entries that got ENOENT
 1.30 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.29 10-May-2011  manu branches: 1.29.2;
Fix filesystem root leaked lock when using UFS_EXTATTR_AUTOSTART.
This way, statvfs(2) calls obtained by df(1) or umount(8) will no
longer sleep forever in the kernel to acquire the lock.
 1.28 30-Nov-2010  dholland branches: 1.28.2;
Abolish struct componentname's cn_pnbuf. Use the path buffer in the
pathbuf object passed to namei as work space instead. (For now a pnbuf
pointer appears in struct nameidata, to support certain unclean things
that haven't been fixed yet, but it will be going away in the future.)

This removes the need for the SAVENAME and HASBUF namei flags.
 1.27 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.26 10-May-2009  dholland branches: 1.26.2; 1.26.4;
The lwp member of struct componentname was removed a long time ago.
Fix broken build with UFS_EXTATTR_AUTOSTART by removing it here as well.
 1.25 18-Mar-2009  cegger bzero -> memset
 1.24 18-Mar-2009  cegger Ansify function definitions w/o arguments. Generated with sed.
 1.23 17-Dec-2008  cegger branches: 1.23.2;
kill MALLOC and FREE macros.
 1.22 13-Nov-2008  ad _KERNEL_OPT
 1.21 21-Mar-2008  ad branches: 1.21.4; 1.21.10; 1.21.12; 1.21.14;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.20 30-Jan-2008  ad branches: 1.20.6;
Replace use of lockmgr.
 1.19 25-Jan-2008  ad - Fix a couple of locking botches.
- Remove calls to VOP_LEASE().
 1.18 25-Jan-2008  pooka Destroy extattr lock when destroying extattrs associated with the
mountpoint. Make stopping extattrs always successful to facilitate
always being able to free resources.
 1.17 11-Dec-2007  lukem use __KERNEL_RCSID() instead of __RCSID()
 1.16 26-Nov-2007  pooka branches: 1.16.2; 1.16.4; 1.16.6;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.15 10-Jul-2007  hannken branches: 1.15.6; 1.15.8; 1.15.14;
Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criteria, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.14 30-Jun-2007  pooka Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.13 04-Mar-2007  christos branches: 1.13.2; 1.13.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.12 04-Jan-2007  elad branches: 1.12.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.11 09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.10 23-Jul-2006  ad branches: 1.10.4; 1.10.6; 1.10.8;
Use the LWP cached credentials where sane.
 1.9 21-May-2006  cube branches: 1.9.4;
Include <sys/kauth.h> because it's needed.
 1.8 14-May-2006  elad branches: 1.8.2;
integrate kauth.
 1.7 01-Mar-2006  yamt branches: 1.7.2; 1.7.4; 1.7.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.6 23-Dec-2005  rpaulo branches: 1.6.2; 1.6.4; 1.6.6;
Convert UFS_EXTATTR to struct lwp.
 1.5 11-Dec-2005  christos merge ktrace-lwp.
 1.4 22-Sep-2005  rpaulo branches: 1.4.6; 1.4.8;
In ufs_extattr_stop(), if we haven't started yet, errno must be set
before bailing out.

From FreeBSD.
 1.3 12-Sep-2005  rpaulo Add missing '$' in __RCSID().
 1.2 12-Sep-2005  rpaulo In ufs_extattr_start(), unlock uepm_lock when bailing out.

Ok'd Jason Thorpe.
 1.1 28-Aug-2005  thorpej Experimental support for extended attributes on UFS1 file systems, using a
backing file per attribute type indexed by inode number to hold the extended
attributes.

This is working pretty well on my test systems, except for the "autostart"
feature. I need someone with a better handle on the VFS locking protocol
to go over that.

This is a work-in-progress. There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.

Adapted from FreeBSD.
 1.4.8.2 18-Nov-2005  yamt - associate read-ahead context to vnode, rather than file.
- revert VOP_READ prototype.
 1.4.8.1 15-Nov-2005  yamt - adapt to the new prototype of VOP_READ.
- adapt ext2fs and union.
 1.4.6.2 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.4.6.1 22-Sep-2005  skrll file ufs_extattr.c was added on branch ktrace-lwp on 2005-11-10 14:12:39 +0000
 1.6.6.2 01-Jun-2006  kardel Sync with head.
 1.6.6.1 22-Apr-2006  simonb Sync with head.
 1.6.4.1 09-Sep-2006  rpaulo sync with head
 1.6.2.1 15-Jan-2006  yamt convert the rest of ufs.
 1.7.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.7.4.3 21-Apr-2006  elad use "l->l_proc" instead of "p" (that doesn't exist)
 1.7.4.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.7.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.7.2.2 11-Aug-2006  yamt sync with head
 1.7.2.1 24-May-2006  yamt sync with head.
 1.8.2.1 19-Jun-2006  chap Sync with head.
 1.9.4.9 24-Mar-2008  yamt sync with head.
 1.9.4.8 04-Feb-2008  yamt sync with head.
 1.9.4.7 21-Jan-2008  yamt sync with head
 1.9.4.6 07-Dec-2007  yamt sync with head
 1.9.4.5 03-Sep-2007  yamt sync with head.
 1.9.4.4 26-Feb-2007  yamt sync with head.
 1.9.4.3 30-Dec-2006  yamt sync with head.
 1.9.4.2 21-Jun-2006  yamt sync with head.
 1.9.4.1 21-May-2006  yamt file ufs_extattr.c was added on branch yamt-lazymbuf on 2006-06-21 15:12:39 +0000
 1.10.8.1 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.10.6.1 10-Dec-2006  yamt sync with head.
 1.10.4.1 12-Jan-2007  ad Sync with head.
 1.12.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.13.4.1 11-Jul-2007  mjf Sync with head.
 1.13.2.1 15-Jul-2007  ad Sync with head.
 1.15.14.3 18-Feb-2008  mjf Sync with HEAD.
 1.15.14.2 27-Dec-2007  mjf Sync with HEAD.
 1.15.14.1 08-Dec-2007  mjf Sync with HEAD.
 1.15.8.2 23-Mar-2008  matt sync with HEAD
 1.15.8.1 09-Jan-2008  matt sync with HEAD
 1.15.6.1 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.16.6.1 13-Dec-2007  bouyer Sync with HEAD
 1.16.4.1 11-Dec-2007  yamt sync with head.
 1.16.2.1 26-Dec-2007  ad Sync with head.
 1.20.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.20.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.21.14.4 17-Sep-2011  bouyer Pull up following revision(s) (requested by manu in ticket #1664):
sys/ufs/ufs/ufs_extattr.c: revision 1.35
Fix locking protocol to avoid a panic on extattrctl stop and on umount.
 1.21.14.3 17-Jul-2011  riz Pull up following revision(s) (requested by manu in ticket #1645):
lib/libc/sys/Makefile.inc 1.207 via patch
lib/libc/sys/extattr_get_file.2 patch
lib/libpuffs/dispatcher.c 1.34,1.36 via patch
lib/libpuffs/puffs.c 1.107 via patch
lib/libpuffs/puffs.h 1.115,1.118 via patch
sys/fs/puffs/puffs_msgif.h 1.71,1.76 via patch
sys/fs/puffs/puffs_vfsops.c 1.88 via patch
sys/fs/puffs/puffs_vnops.c 1.145,1.154 via patch
sys/kern/vfs_xattr.c 1.24-1.27 via patch
sys/kern/vnode_if.c 1.87 via patch
sys/sys/Makefile 1.133 via patch
sys/sys/extattr.h 1.6 via patch
sys/sys/vnode_if.h 1.81 via patch
sys/ufs/ffs/ffs_vnops.c patch
sys/ufs/ufs/ufs_extattr.c 1.31,1.34 via patch

* support extended attributes
* bump major due to structure growth
* add some spare space
* remove ABI silliness
Support extended attributes.
Fix multiple non compliances in our Linux-like extattr API, and make it
public so that it can be used.
Improve listxattr(2) a bit. It attempts to list both system and user
extended attributes, and it failed if the calling user did not have privilege
for reading system EA. Now we just list user EA and skip system EA if
reading them is not allowed.
Fix bug introduced in previous commit: Do not vrele() a vnode we did not
obtain.
Improve UFS1 extended attributes usability
- autocreate attribute backing file for new attributes
- autoload attributes when issuing extattrctl start
- when autoloading attributes, do not display garbage warning when looking
up entries that got ENOENT
Add a flag to VOP_LISTEXTATTR(9) so that the vnode interface can tell the
filesystem in which format extended attribute shall be listed.
There are currently two formats:
- NUL-terminated strings, used for listxattr(2), this is the default.
- one byte length-prefixed, non NUL-terminated strings, used for
extattr_list_file(2), which is obtained by setting the
EXTATTR_LIST_PREFIXLEN flag to VOP_LISTEXTATTR(9)
This approach avoids the need for converting the list back and forth, except
in libperfuse, since FUSE uses NUL-terminated strings, and the kernel may
have requested EXTATTR_LIST_PREFIXLEN.
 1.21.14.2 18-Jun-2011  bouyer Pull up following revision(s) (requested by manu in ticket #1624):
sys/ufs/ufs/ufs_extattr.c: revision 1.29 via patch
Fix filesystem root leaked lock when using UFS_EXTATTR_AUTOSTART.
This way, statvfs(2) calls obtained by df(1) or umount(8) will no
longer sleep forever in the kernel to acquire the lock.
 1.21.14.1 20-May-2011  bouyer Pull up following revision(s) (requested by manu in ticket #1619):
sys/ufs/ufs/ufs_extattr.c: revision 1.26 via patch
The lwp member of struct componentname was removed a long time ago.
Fix broken build with UFS_EXTATTR_AUTOSTART by removing it here as well.
 1.21.12.2 28-Apr-2009  skrll Sync with HEAD.
 1.21.12.1 19-Jan-2009  skrll Sync with HEAD.
 1.21.10.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.21.4.3 11-Aug-2010  yamt sync with head.
 1.21.4.2 16-May-2009  yamt sync with head
 1.21.4.1 04-May-2009  yamt sync with head.
 1.23.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.26.4.4 31-May-2011  rmind sync with head
 1.26.4.3 05-Mar-2011  rmind sync with head
 1.26.4.2 03-Jul-2010  rmind sync with head
 1.26.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.26.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.28.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.29.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.35.6.3 02-Jun-2012  mrg sync to latest -current.
 1.35.6.2 05-Apr-2012  mrg sync to latest -current.
 1.35.6.1 18-Feb-2012  mrg merge to -current.
 1.35.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.35.2.4 16-Jan-2013  yamt sync with (a bit old) head
 1.35.2.3 30-Oct-2012  yamt sync with head
 1.35.2.2 23-May-2012  yamt sync with head.
 1.35.2.1 17-Apr-2012  yamt sync with head
 1.36.2.5 04-Dec-2014  snj Pull up following revision(s) (requested by manu in ticket #1199):
sys/ufs/ufs/ufs_extattr.c: revision 1.46 via patch
Fix uninitialized mutex usage
We use extended attribute mount mutex before testing if it had been
initialized, and as reported by Christos, this caused panic with
LOCKDEBUG. Fix it by testing before using.
 1.36.2.4 04-Dec-2014  snj Pull up following revision(s) (requested by manu in ticket #1197):
sys/ufs/ufs/ufs_extattr.c: revision 1.41, 1.45
Remove always-true condition and note that the current code is
suboptimal.
--
Fix UFS1 extended attribute backend autocreation deadlock
UFS1 extended attribute backend autocreation goes through a vn_open()
to create the backend file, and this forces us to release the lock
on the target node, in case the target is within the parents of the
backend file. That created a window within which another thread could
acquire a lock on the target vnode and deadlock waiting for the
mount extended attribute lock.
We fix the problem by also releasing the mount extended attribute lock
when calling vn_open(), but that lets another thread race us for backend
creation. We just detect this using O_EXCL for vn_open() and by checking
for EEXIST return code. If we are raced, we fail backend creation but
this is not a problem since another thread succeeded on it: we just have
to use the result.
 1.36.2.3 04-Dec-2014  snj Pull up following revision(s) (requested by manu in ticket #1196):
sys/kern/vfs_mount.c: revision 1.31
sys/ufs/ffs/ffs_vfsops.c: revision 1.302
sys/ufs/ufs/ufs_extattr.c: revision 1.44
Fix use-after-free on failed unmount with extended attribute enabled
When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.
The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart
As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.
 1.36.2.2 13-Sep-2012  riz Pull up following revision(s) (requested by manu in ticket #556):
sys/ufs/ufs/ufs_extattr.c: revision 1.40
Fix unmount returning EBUSY if an attribute was autocreated: we hold
a useless reference that we never gave back
 1.36.2.1 19-May-2012  riz branches: 1.36.2.1.2;
Pull up following revision(s) (requested by manu in ticket #260):
sys/kern/vfs_xattr.c: revision 1.31
sys/ufs/ufs/ufs_extattr.c: revision 1.39
Return ENODATA when no attribute is found, like Linux does. After
all we decided to adopt the Linux API, therefore there is rationale
to stick to it.
No standard tells us what to do, and our extended attribute API has not
been used in a release, therefore we do not break anything, and we get
more easily compatible with programs using the Linux extended attribute
API.
Note that FreeBSD and MacOS X return ENOATTR. FreeBSD has its own API
and MacOS X has a Linux-like API. How did the world get so complicated?
 1.36.2.1.2.1 01-Nov-2012  matt sync with netbsd-6-0-RELEASE.
 1.40.2.4 03-Dec-2017  jdolecek update from HEAD
 1.40.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.40.2.2 23-Jun-2013  tls resync from head
 1.40.2.1 25-Feb-2013  tls resync with head
 1.42.2.1 18-May-2014  rmind sync with head
 1.43.4.3 23-Nov-2014  martin Pull up following revision(s) (requested by manu in ticket #254):
sys/ufs/ufs/ufs_extattr.c: revision 1.46
Fix uninitialized mutex usage
We use extended attribute mount mutex before testing if it had been
initialized, and as reported by Christos, this caused panic with
LOCKDEBUG. Fix it by testing before using.
 1.43.4.2 18-Nov-2014  snj Pull up following revision(s) (requested by manu in ticket #247):
sys/ufs/ufs/ufs_extattr.c: revision 1.45
Fix UFS1 extended attribute backend autocreation deadlock
UFS1 extended attribute backend autocreation goes through a vn_open()
to create the backend file, and this forces us to release the lock
on the target node, in case the target is within the parents of the
backend file. That created a window within which another thread could
acquire a lock on the target vnode and deadlock waiting for the
mount extended attribute lock.
We fix the problem by also releasing the mount extended attribute lock
when calling vn_open(), but that lets another thread race us for backend
creation. We just detect this using O_EXCL for vn_open() and by checking
for EEXIST return code. If we are raced, we fail backend creation but
this is not a problem since another thread succeeded on it: we just have
to use the result.
 1.43.4.1 18-Nov-2014  snj Pull up following revision(s) (requested by manu in ticket #246):
sys/kern/vfs_mount.c: revision 1.31
sys/ufs/ffs/ffs_vfsops.c: revision 1.302
sys/ufs/ufs/ufs_extattr.c: revision 1.44
Fix use-after-free on failed unmount with extended attribute enabled
When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.
The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart
As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.
 1.46.2.2 05-Dec-2016  skrll Sync with HEAD
 1.46.2.1 09-Jul-2016  skrll Sync with HEAD
 1.47.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.48.16.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.48.16.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.50.2.1 17-Jan-2020  ad Sync with head.
 1.52.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.88 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted, for example).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), which uses a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.87 18-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.
 1.86 16-May-2020  christos branches: 1.86.6;
Add ACL support for FFS. From FreeBSD.
 1.85 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.84 17-Jan-2020  ad branches: 1.84.4;
VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.83 28-Oct-2016  jdolecek branches: 1.83.16; 1.83.22;
reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
succeed; change wapbl_register_deallocation() to return EAGAIN
rather than panic when code hits the limit

callers changed to either loop calling ffs_truncate() using new
utility ufs_truncate_retry() if their semantics requires it, or
just ignore the failure; remove ufs_wapbl_truncate()

this fixes possible user-triggerable panic during truncate, and
resolves WAPBL performance issue with truncates of large files

PR kern/47146 and kern/49175
 1.82 12-Apr-2016  christos branches: 1.82.2;
Remove gcc hack, it does not help.
Add more const.
 1.81 12-Apr-2016  christos Provide reason to be printed in panic string.
 1.80 11-Apr-2016  christos misc cleanups, no functional change
 1.79 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.78 17-Mar-2015  hannken Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.77 29-Oct-2014  christos branches: 1.77.2;
simplify and correct.
 1.76 21-Oct-2014  slp Move and unify indirect block truncate algorithm into a separate function.

Reviewed by joerg.
 1.75 25-May-2014  hannken branches: 1.75.2;
Remove ufs_checkpath() and ufs_readdotdot(). These are relics
from the pre-genfs_rename era.
 1.74 08-May-2014  hannken Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41
 1.73 16-Jun-2013  hannken branches: 1.73.2; 1.73.6;
Add an UFS_SNAPGONE() ufs op replacing the calls
to ffs_snapgone() in ufs_lookup.c.

Ok: David Holland <dholland@netbsd.org>

Welcome to 6.99.22
 1.72 09-May-2012  riastradh branches: 1.72.2;
Adapt ffs, lfs, and ext2fs to use genfs_rename.

ok dholland, rmind
 1.71 01-Feb-2012  dholland Change the syscall API for quotas over to the new non-proplib one.

- struct vfs_quotactl_args -> struct quotactl_args
- add sys/stdint.h to sys/quotactl.h for clean userland build
- install sys/quotactl.h in /usr/include
- update set lists for same
- add new marshalling code in libquota
- add new unmarshalling code in vfs_syscalls.c
- discard proplib interpreter code in vfs_quotactl.c
- add dispatching code for the 14 quotactl ops in vfs_quotactl.c
- mark the proplib quotactl syscall obsolete
- add a new syscall number for the new quotactl syscall
- change the name of the syscall to __quotactl()
- remove the decl of the old quotactl from quota/quotaprop.h
- add a decl of the new quotactl to sys/quotactl.h
- update the libc build
- update ktruss
- remove proplib marshalling code from libquota
- update copy of syscall table in gdb ppc sources
- hack rumphijack to accommodate new quotactl name (as I recall,
pooka wanted such a name change to simplify something, but I
don't really see what/how)

This change appears to require a kernel version bump for rumpish
reasons.
 1.70 29-Jan-2012  dholland Remove the extra op argument to VFS_QUOTACTL() - the op is now stored
purely in the args structure.

This change requires a kernel version bump.
 1.69 29-Jan-2012  dholland Introduce struct vfs_quotactl_args. Use it.

This change uglifies vfs_quotactl some in order to make room for
moving operation-specific but FS-independent logic out of ufs_quota.c.

Note: this change requires a kernel version bump.
 1.68 29-Jan-2012  dholland Move the proplib-based quota command dispatching (that is, the code
that knows the magic string names for the allowed actions) out of
UFS-specific code and to fs-independent code.

This introduces QUOTACTL_* operation codes and changes the signature
of VFS_QUOTACTL() again for compile safety.

Note: this change requires a kernel version bump.
 1.67 29-Jan-2012  dholland Move the code for iterating over the multiple RPC calls in a quota
proplib XML packet to vfs_quotactl.c out of sys/ufs/ufs.

Add a dummy extra arg to VFS_QUOTACTL for compile safety.

Note: this change requires a kernel version bump.
 1.66 17-Jul-2011  dholland branches: 1.66.2; 1.66.6;
Provide correct locking for ufs_wapbl_rename. Note that this does not
fix the non-wapbl rename; that will be coming soon. This patch also
leaves a lot of the older locking-related code around in #if 0 blocks,
and there's a lot of leftover redundant logic. All that will be going
away later.

Relates to at least these PRs:

PR kern/24887
PR kern/41417
PR kern/42093
PR kern/43626

and possibly others.
 1.65 12-Jul-2011  dholland Pass the ufs_lookup_results pointer around instead of fetching it from
the inode in the guts of ufs. Now, in VOPs where i_crap is used it is
used (directly) only immediately on entry to the VOP call and then
passed around by reference.

Except for rename, which needs explicit sorting out. The code in
ufs_wapbl_rename is unchanged in behavior but I'm increasingly
inclined to think it's wrong.
 1.64 04-Apr-2011  ahoka add "struct ufid;" so we can include it without ufs/inode.h
 1.63 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.62 13-Sep-2009  tsutsui branches: 1.62.4; 1.62.6; 1.62.8;
Move declaration of ufs_hashlock into <ufs/ufs_extern.h> from each c source.
 1.61 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.60 31-May-2008  ad branches: 1.60.6; 1.60.8; 1.60.12;
XXX softdep:

If the number of deletes in progress is getting too high, newdirrem()
requests the syncer to flush faster, and in some cases will block to
prevent deletes accumulating faster than the disk can service them.

The syncer will try to lock vnodes that the remover holds locked, leading
to the syncer and remover proceeding in lockstep and making very little
overall forward progress.

Put a hook into ufs_rmdir() and ufs_remove() so that the softdep code
can pace itself without holding vnode locks if the number of deletes is
running out of control.
 1.59 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.58 25-Jan-2008  ad branches: 1.58.6; 1.58.8; 1.58.10; 1.58.12;
Remove VOP_LEASE. Discussed on tech-kern.
 1.57 03-Jan-2008  ad Use pool_cache.
 1.56 02-Jan-2008  ad Merge vmlocking2 to head.
 1.55 26-Nov-2007  pooka branches: 1.55.2; 1.55.6;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.54 09-Aug-2007  hannken branches: 1.54.2; 1.54.8;
Move the fstrans-aware lock vnops from ufs to ffs. Other ufs file systems
do not need them.

Ride on 4.99.28
 1.53 10-Jul-2007  hannken branches: 1.53.2; 1.53.6;
Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criteria, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.52 04-Mar-2007  christos branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.51 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.50 19-Jan-2007  hannken branches: 1.50.2;
New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.49 14-May-2006  elad branches: 1.49.8;
integrate kauth.
 1.48 14-Jan-2006  yamt branches: 1.48.2; 1.48.4; 1.48.6; 1.48.8; 1.48.10;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.47 11-Dec-2005  christos branches: 1.47.2;
merge ktrace-lwp.
 1.46 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.45 23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.44 10-Jul-2005  thorpej - Use ANSI function decls.
- Sprinkle some static.
 1.43 29-May-2005  christos branches: 1.43.2;
- sprinkle const
- avoid shadow variables.
 1.42 26-Feb-2005  perry branches: 1.42.2;
nuke trailing whitespace
 1.41 20-Jun-2004  hannken branches: 1.41.4; 1.41.6;
Use a pool for struct direct instead of kernel stack.
Reduces the kernel stack usage by 264 bytes.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.40 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.39 27-Apr-2004  jrf First pass for some caddr_t removal and changes to get rid of it where we
no longer use and/or need it

- removed casts from unionfs, deadfs and fdesc
(there are more to hunt down still)
- changed vfs_quotactl args argument from caddr_t to void *
- changed vfs_quotactl structures/callers to reflect the api change

Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
 1.38 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.37 05-Aug-2003  pk Pass the inode flags to set as an argument to ufs_dirrewrite().
Use it to restore the behaviour of not updating the modified time of a
directory that moves to a new parent.
 1.36 29-Jun-2003  fvdl branches: 1.36.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.35 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.34 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.33 18-May-2003  yamt make is_sequential a callback in order to achieve better lfs write clustering.

since lfs always rewrites blocks into the new segment,
the current on-disk location of a block doesn't affect write clustering.

ok'ed by Konrad Schroder.
 1.32 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.31 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.30 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.29 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.28 18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.27 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.26 15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.25 28-May-2001  chs branches: 1.25.4; 1.25.6;
add a genfs_mmap() and change all of the disk-based filesystems
to implement VOP_MMAP() with the genfs version, in preparation for
actually using this VOP.
 1.24 27-Nov-2000  chs branches: 1.24.2;
Initial integration of the Unified Buffer Cache project.
 1.23 16-Mar-2000  jdolecek Change ufs_init() to keep global count of how many times it was called.
Resources are initialized still just once (on first call).

Add ufs_done(), which takes care of freeing all resources allocated in
ufs_init(). The resources are freed only when last user of the code exits.
 1.22 14-Feb-2000  fvdl Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
 1.21 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.20 03-Aug-1999  wrstuden branches: 1.20.2; 1.20.4; 1.20.8;
Add support for fcntl(2) to generate VOP_FCNTL calls. Any fcntl
call with F_FSCTL set and F_SETFL calls generate calls to a new
fileop fo_fcntl. Add genfs_fcntl() and soo_fcntl() which return 0
for F_SETFL and EOPNOTSUPP otherwise. Have all leaf filesystems
use genfs_fcntl().

Reviewed by: thorpej
Tested by: wrstuden
 1.19 03-Aug-1999  drochner clean up inclusion of "opt_ffs.h" and use of "FFS_EI" a bit
 1.18 08-Jul-1999  wrstuden Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
 1.17 26-Feb-1999  wrstuden branches: 1.17.4;
Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.16 12-Nov-1998  thorpej defopt FFS_EI
 1.15 10-Aug-1998  matthias create miscfs/genfs/genfs_vnops.c:genfs_enoioctl and make all the other
filesystems use it instead of a private version.
 1.14 25-Jun-1998  thorpej Use genfs_lease_check()
 1.13 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.12 22-Jun-1998  sommerfe defopt for options FIFO
 1.11 13-Jun-1998  kleink KNF, mostly of FFS_EI changes.
 1.10 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time, other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.9 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.8 11-Apr-1997  kleink Implement a POSIX compliant genfs VOP_SEEK() and use it in the appropriate
places; by Chris G. Demetriou and myself.
 1.7 07-Sep-1996  mycroft Implement poll(2).
 1.6 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.5 09-Feb-1996  christos ufs prototype changes
 1.4 14-Dec-1994  mycroft Sync with CSRG.
 1.3 13-Dec-1994  mycroft Turn lease_check() into a vnode op, per CSRG.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.17.4.2 02-Aug-1999  thorpej Update from trunk.
 1.17.4.1 04-Jul-1999  chs add ufs_balloc_range().
 1.20.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.20.4.2 03-Nov-1999  fvdl Give ufs_ihashget an extra argument: the flags passed to vget() for
locking. This way we can avoid locking against ourselves when
ufs_ihashget is called during the flushing of metadata. XXX

Also, comment out a VOP_FSYNC call that I think is now unneeded, and
put a diagnostic printf there to check if this still happens.
 1.20.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.20.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.20.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.24.2.4 11-Dec-2002  thorpej Sync with HEAD.
 1.24.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.24.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.24.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.25.6.1 01-Oct-2001  fvdl Catch up with -current.
 1.25.4.3 25-Sep-2002  jdolecek switch over to genfs_kqfilter(), g/c the ufs_kqfilter() code
 1.25.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.25.4.1 10-Jul-2001  lukem * implement ufs_kqfilter(), filt_ufs*()
* add KNOTE(9) calls as appropriate
 1.36.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.36.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.36.2.6 27-Oct-2004  skrll Remove the struct lwp * arguments from qsync and ufs_checkpath that are
no longer (read: were never) required.
 1.36.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.36.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.36.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.36.2.2 03-Aug-2004  skrll Sync with HEAD
 1.36.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.41.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.41.4.1 29-Apr-2005  kent sync with -current
 1.42.2.1 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in the different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed by Chuck Silvers.
 1.43.2.6 04-Feb-2008  yamt sync with head.
 1.43.2.5 21-Jan-2008  yamt sync with head
 1.43.2.4 07-Dec-2007  yamt sync with head
 1.43.2.3 03-Sep-2007  yamt sync with head.
 1.43.2.2 26-Feb-2007  yamt sync with head.
 1.43.2.1 21-Jun-2006  yamt sync with head.
 1.47.2.1 15-Jan-2006  yamt sync with head.
 1.48.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.48.8.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.48.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.48.6.1 24-May-2006  yamt sync with head.
 1.48.4.1 01-Jun-2006  kardel Sync with head.
 1.48.2.1 09-Sep-2006  rpaulo sync with head
 1.49.8.1 01-Feb-2007  ad Sync with head.
 1.50.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.50.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.52.4.1 11-Jul-2007  mjf Sync with head.
 1.52.2.2 20-Aug-2007  ad Sync with HEAD.
 1.52.2.1 15-Jul-2007  ad Sync with head.
 1.53.6.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.53.6.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.53.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.54.8.2 18-Feb-2008  mjf Sync with HEAD.
 1.54.8.1 08-Dec-2007  mjf Sync with HEAD.
 1.54.2.2 23-Mar-2008  matt sync with HEAD
 1.54.2.1 09-Jan-2008  matt sync with HEAD
 1.55.6.2 08-Jan-2008  bouyer Sync with HEAD
 1.55.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.55.2.1 19-Dec-2007  ad Get lfs mostly working.
 1.58.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.58.10.2 16-Sep-2009  yamt sync with head
 1.58.10.1 04-May-2009  yamt sync with head.
 1.58.8.2 04-Jun-2008  yamt sync with head
 1.58.8.1 18-May-2008  yamt sync with head.
 1.58.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.60.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.60.8.1 19-May-2012  riz Apply patch (requested by buhrow in ticket #1759):


sys/ufs/lfs/lfs_vnops.c patch
sys/ufs/ufs/inode.h patch
sys/ufs/ufs/ufs_extern.h patch
sys/ufs/ufs/ufs_lookup.c patch
sys/ufs/ufs/ufs_vnops.c patch
sys/ufs/ufs/ufs_wapbl.c patch

Port dholland's ufs_rename locking changes to netbsd-5.
[buhrow, ticket #1759]

Hello. More testing has revealed a minor misunderstanding between the
vnode API in -current and 5.x. The below patch, against NetBSD-5.1
sources, rolls all the accumulated patches into one patch set. With this
patch, I believe you can now run with WAPBL, softdep or traditional ufs
semantics with heavy file loads and avoid panics due to resource exhaustion
and/or tstile deadlocks. Testing has been done on I386, both uniprocessor
and multiprocessor, and on Sparc machines in uniprocessor mode, though I
think multiprocessor Sparc would be fine as well. Since these changes are
machine independent, I don't anticipate any issues on any platform. It is
my hope that modulo any final issues that come up in the final round of
testing I'm currently performing, these patches will be ready to be pulled
up into the NetBSD-5 branch.
Finally, I'd like to thank mouse@ and hannken@ for their help and
patience in helping me track down and test the final versions of these
patches. With their assistance, I'm confident these patches make NetBSD-5
a much more stable and robust operating environment in a variety of
settings.
 1.60.6.1 03-Mar-2009  skrll Sync with HEAD.
 1.62.8.2 08-Feb-2011  bouyer Minimal hacking to make 'options QUOTA' compile again.
 1.62.8.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.
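
The layout described above can be sketched in C as follows. This is only an illustration under stated assumptions: the structure names, field names and the fixed hash size (`EX_HASHSIZE`) are hypothetical and are not the on-disk definitions used by the quota2 code. The lookup walks one hash chain by following offsets inside a file image held in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EX_HASHSIZE 8	/* assumed; the commit says the real size depends on block size */

/* Hypothetical quota entry: chained by file offset, 0 terminates a chain. */
struct ex_quota_entry {
	uint64_t next;		/* offset of the next entry in the file */
	uint32_t id;		/* uid or gid */
	uint32_t pad;
	uint64_t block_limit;
	uint64_t block_usage;
};

/* Hypothetical header: free list, default limits, then the hash table. */
struct ex_quota_header {
	uint64_t free_list;			/* offset of first free entry */
	struct ex_quota_entry defaults;		/* default quota limits */
	uint64_t hash[EX_HASHSIZE];		/* offsets of in-use chains */
};

/* Copy the entry for 'id' into *out; return 1 if found, 0 otherwise. */
int
ex_quota_lookup(const unsigned char *file, size_t size, uint32_t id,
    struct ex_quota_entry *out)
{
	struct ex_quota_header hdr;
	uint64_t off;

	assert(size >= sizeof(hdr));
	memcpy(&hdr, file, sizeof(hdr));
	for (off = hdr.hash[id % EX_HASHSIZE]; off != 0; off = out->next) {
		if (off + sizeof(*out) > size)
			return 0;	/* offset points outside the file */
		memcpy(out, file + off, sizeof(*out));
		if (out->id == id)
			return 1;
	}
	return 0;
}
```

A caller that gets 0 back would fall back to the default limits stored in the header, which matches the "default quota" behaviour listed earlier in this entry.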

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.62.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.62.4.1 21-Apr-2011  rmind sync with head
 1.66.6.2 02-Jun-2012  mrg sync to latest -current.
 1.66.6.1 18-Feb-2012  mrg merge to -current.
 1.66.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.66.2.2 23-May-2012  yamt sync with head.
 1.66.2.1 17-Apr-2012  yamt sync with head
 1.72.2.4 03-Dec-2017  jdolecek update from HEAD
 1.72.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.72.2.2 23-Jun-2013  tls resync from head
 1.72.2.1 10-Feb-2013  tls Add an accessor -- ufs_maxphys() -- to check the maximum transfer size
for a given UFS mountpoint, and move the code from mount that finds
the underlying disk and resets the mountpoint max transfer size into a
utility function, ufs_update_maxphys().

Add a global serial number that counts disk property changes to which
filesystems are meant to accommodate themselves. Make ufs_maxphys()
check it. This is a sort of flag-polling interface that avoids callbacks
into the filesystem code, but will require freezing filesystems and
draining in-flight transactions before a decrease in size that is
mandatory (like attaching a disk with a smaller maximum transfer size
as a spare in a RAIDframe set), rather than "advisory", like finding
out set geometry from a RAID controller long after boot and deciding
a smaller transfer size would be optimal, can be signalled. Still, the
"advisory" case is the common one so this is progress.

Make a bit of an example of RAIDframe by making it bump this new
serial number when disks are added to the subsystem. I will attack
one of the hardware RAID drivers (probably arcmsr) next.
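
The flag-polling idea in the entry above (a global serial number that the accessor compares against a cached value) can be sketched as below. All names here (`ex_disk_serial`, `ex_mount`, `ex_maxphys()` and so on) are hypothetical stand-ins chosen for illustration; this is not the code from the tls-maxphys branch, and it ignores locking entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global state: bumped whenever a disk property changes. */
static unsigned ex_disk_serial;
static unsigned ex_disk_maxphys = 65536;	/* assumed device limit */

/* Hypothetical per-mount cache of the maximum transfer size. */
struct ex_mount {
	unsigned seen_serial;	/* serial when maxphys was last refreshed */
	unsigned maxphys;	/* cached maximum transfer size */
};

/* Called by the disk side (e.g. when a component is added to a set). */
void
ex_disk_changed(unsigned new_maxphys)
{
	ex_disk_maxphys = new_maxphys;
	ex_disk_serial++;
}

/* Refresh the cached value from the underlying disk. */
static void
ex_update_maxphys(struct ex_mount *mp)
{
	mp->maxphys = ex_disk_maxphys;
	mp->seen_serial = ex_disk_serial;
}

/* Accessor: poll the serial number, refresh only if it moved. */
unsigned
ex_maxphys(struct ex_mount *mp)
{
	assert(mp != NULL);
	if (mp->seen_serial != ex_disk_serial)
		ex_update_maxphys(mp);
	return mp->maxphys;
}
```

The point of the design is that the disk layer never calls into the file system; the file system notices the change the next time it asks for the limit.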
 1.73.6.1 10-Aug-2014  tls Rebase.
 1.73.2.1 18-May-2014  rmind sync with head
 1.75.2.1 28-Jan-2015  martin Pull up following revision(s) (requested by christos in ticket #425):
sys/ufs/ufs/ufs_inode.c: revision 1.91-1.92
sys/ufs/ufs/ufs_vnops.c: revision 1.223-1.224
sys/ufs/ufs/ufs_extern.h: revision 1.76-1.77
sys/ufs/ffs/ffs_vfsops.c: revision 1.303-1.305
Add debugging for mount...
Merge some error returns
Check more errors
Restore apple ufs error handling.
Move and unify indirect block truncate algorithm into a separate function.
PR/39371: Tobias Nygren: Don't fail mounting root if WAPBL log is corrupt.
Patch from Sergio L. Pascual.
 1.77.2.3 05-Dec-2016  skrll Sync with HEAD
 1.77.2.2 22-Apr-2016  skrll Sync with HEAD
 1.77.2.1 06-Apr-2015  skrll Sync with HEAD
 1.82.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.83.22.1 17-Jan-2020  ad Sync with head.
 1.83.16.2 21-Apr-2020  martin Sync with HEAD
 1.83.16.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.84.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.86.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.33 08-May-2014  hannken Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41
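
The get-or-load pattern described in the entry above can be sketched in user-space C as follows. The names (`ex_cache_get()`, `ex_node`, `ex_load_counting()`) are hypothetical and the cache is a fixed array searched linearly; the real vcache_get() and vfs_loadvnode() have different signatures and handle concurrency, which this single-threaded sketch does not attempt to show.

```c
#include <assert.h>
#include <stddef.h>

#define EX_SLOTS 16	/* assumed small fixed capacity for the sketch */

/* Hypothetical cached node: key, reference count, load counter. */
struct ex_node {
	unsigned long key;
	int refs;
	int loaded;
};

typedef void (*ex_load_fn)(struct ex_node *);

static struct ex_node ex_cache[EX_SLOTS];
static int ex_used;

/* Example loader: stands in for the file system initialising its node. */
void
ex_load_counting(struct ex_node *n)
{
	n->loaded++;
}

/* Return a referenced node for 'key'; call 'load' only on a cache miss. */
struct ex_node *
ex_cache_get(unsigned long key, ex_load_fn load)
{
	struct ex_node *n;
	int i;

	assert(load != NULL);
	for (i = 0; i < ex_used; i++) {
		if (ex_cache[i].key == key) {
			ex_cache[i].refs++;	/* hit: just take a reference */
			return &ex_cache[i];
		}
	}
	if (ex_used == EX_SLOTS)
		return NULL;			/* sketch has no eviction */
	n = &ex_cache[ex_used++];
	n->key = key;
	n->refs = 1;
	n->loaded = 0;
	load(n);				/* miss: initialise exactly once */
	return n;
}
```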
 1.32 27-Feb-2014  hannken branches: 1.32.2;
The current implementation of vn_lock() is racy. Modification of
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.

- Pass down LK_RETRY to the lock operation (hint for deadfs only).

- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.

- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.

With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.

Addresses PR kern/37706 (Forced unmount of file systems is unsafe)

Discussed on tech-kern.

Welcome to 6.99.33
 1.31 12-Jun-2011  rmind branches: 1.31.2; 1.31.12; 1.31.16;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.30 21-Jul-2010  hannken branches: 1.30.6;
Make holding v_interlock mandatory for callers of vget().

Announced some time ago on tech-kern.
 1.29 01-Jul-2010  hannken Remove vlockmgr(). Generic vnode lock operations now use a rwlock located
in the vnode. All LK_* flags move from sys/lock.h to sys/vnode.h. Calls
to vlockmgr() in file systems get replaced with VOP_LOCK() or VOP_UNLOCK().

Welcome to 5.99.34.

Discussed on tech-kern.
 1.28 05-Nov-2009  bouyer branches: 1.28.2; 1.28.4;
getcleanvnode(): don't vclean() the vnode if it has gained another
reference while we were getting the v_interlock.
vget(): attempt to prevent it from returning a clean vnode:
if the vnode is being inactivated (by vrelel()), wait for
vrelel() to complete (or return EBUSY if we can't wait), and return
ENOENT if the vnode has been vclean'ed by vrelel()
Fix kern/41147 in a better way, hopefully fix other related race conditions.
 1.27 20-Sep-2009  bouyer PR kern/41147: race between nfsd and local rm
Note that the race also exists between 2 nfs clients, one of them doing the rm.
In ufs_ihashget(), vget() can return a vnode that has been vclean'ed because
vget() can sleep. After vget returns, check that vp is still connected with
ip, and that ip still points to the inode we want. This fixes the NULL
pointer dereference in ufs_fhtovp() I've been seeing on a NFS server.

XXX I have no idea why using vput() instead of
vlockmgr(vp->v_vnlock, LK_RELEASE); vrele(vp); does not work.
 1.26 05-May-2008  ad branches: 1.26.10; 1.26.18;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.25 30-Jan-2008  ad branches: 1.25.6; 1.25.8; 1.25.10;
Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
 1.24 02-Jan-2008  ad Merge vmlocking2 to head.
 1.23 28-May-2007  ad branches: 1.23.8; 1.23.14; 1.23.16; 1.23.20;
Fix lock order inversion between vnode locks and ufs_hashlock. Addresses
kern/36331 (MP deadlock between ufs_ihashget() and VOP_LOOKUP()) for ffs,
other file systems to follow. Reported by perseant@, debugged by Sverre
Froyen, patch posted/tested by Blair Sadewitz.
 1.22 27-Feb-2007  ad branches: 1.22.2; 1.22.4;
Destroy the hash locks on final unmount.
 1.21 15-Feb-2007  ad branches: 1.21.2;
Replace some uses of lockmgr() / simplelocks.
 1.20 11-Dec-2005  christos merge ktrace-lwp.
 1.19 10-Jul-2005  thorpej - Use ANSI function decls.
- Sprinkle some static.
 1.18 07-Aug-2003  agc branches: 1.18.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.17 29-Jun-2003  fvdl branches: 1.17.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.16 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.15 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.14 08-Nov-2001  lukem add RCSID
 1.13 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.12 15-Sep-2001  chs branches: 1.12.2;
add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.11 08-Nov-2000  ad branches: 1.11.2; 1.11.6; 1.11.8;
Update for hashinit() change.
 1.10 16-Mar-2000  jdolecek Change ufs_init() to keep global count of how many times it was called.
Resources are initialized still just once (on first call).

Add ufs_done(), which takes care of freeing all resources allocated in
ufs_init(). The resources are freed only when last user of the code exits.
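
The init-count scheme described in the entry above can be sketched like this. The names (`example_init()`, `example_done()`, `example_table`) are hypothetical; the sketch only shows the counting logic, not what ufs_init() and ufs_done() actually allocate.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared state, set up once no matter how many users call init. */
static int example_initcount;	/* number of outstanding init calls */
static void *example_table;	/* stand-in for the shared resources */

void
example_init(void)
{
	if (example_initcount++ > 0)
		return;		/* already initialised by an earlier caller */
	example_table = calloc(64, sizeof(void *));
}

void
example_done(void)
{
	assert(example_initcount > 0);
	if (--example_initcount > 0)
		return;		/* other users remain; keep the resources */
	free(example_table);
	example_table = NULL;
}

/* Helper for inspection: are the shared resources currently allocated? */
int
example_is_live(void)
{
	return example_table != NULL;
}
```

With two callers, the first `example_done()` leaves the table in place and only the second one frees it.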
 1.9 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.8 08-Jul-1999  wrstuden branches: 1.8.2; 1.8.4; 1.8.8;
Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
 1.7 01-Mar-1998  fvdl branches: 1.7.10;
Merge with Lite2 + local changes
 1.6 07-Feb-1998  chs add flags arg to hashinit(), to pass to malloc().
 1.5 15-Jul-1997  fvdl Give the hash lock a better name, it's not just ffs that uses it.
 1.4 06-Jul-1997  fvdl Put lock around inode hashing, because getnewvnode or MALLOC might block,
creating race conditions.
 1.3 09-Feb-1996  christos ufs prototype changes
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.7.10.1 02-Aug-1999  thorpej Update from trunk.
 1.8.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.8.4.1 03-Nov-1999  fvdl Give ufs_ihashget an extra argument: the flags passed to vget() for
locking. This way we can avoid locking against ourselves when
ufs_ihashget is called during the flushing of metadata. XXX

Also, comment out a VOP_FSYNC call that I think is now unneeded, and
put a diagnostic printf there to check if this still happens.
 1.8.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.8.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.8.1 01-Oct-2001  fvdl Catch up with -current.
 1.11.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.11.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.11.2.1 21-Sep-2001  nathanw Catch up to -current.
 1.12.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.17.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.17.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.17.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.17.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.17.2.2 03-Aug-2004  skrll Sync with HEAD
 1.17.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.18.16.5 04-Feb-2008  yamt sync with head.
 1.18.16.4 21-Jan-2008  yamt sync with head
 1.18.16.3 03-Sep-2007  yamt sync with head.
 1.18.16.2 26-Feb-2007  yamt sync with head.
 1.18.16.1 21-Jun-2006  yamt sync with head.
 1.21.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.22.4.1 11-Jul-2007  mjf Sync with head.
 1.22.2.2 09-Jun-2007  ad Sync with head.
 1.22.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.23.20.1 02-Jan-2008  bouyer Sync with HEAD
 1.23.16.2 19-Dec-2007  ad Get lfs mostly working.
 1.23.16.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.23.14.1 18-Feb-2008  mjf Sync with HEAD.
 1.23.8.2 23-Mar-2008  matt sync with HEAD
 1.23.8.1 09-Jan-2008  matt sync with HEAD
 1.25.10.2 11-Aug-2010  yamt sync with head.
 1.25.10.1 16-May-2008  yamt sync with head.
 1.25.8.1 18-May-2008  yamt sync with head.
 1.25.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.26.18.1 21-Apr-2010  matt sync to netbsd-5
 1.26.10.2 08-Nov-2009  snj Pull up following revision(s) (requested by bouyer in ticket #1129):
sys/kern/vfs_subr.c: revision 1.386
sys/ufs/ufs/ufs_ihash.c: revision 1.28
getcleanvnode(): don't vclean() the vnode if it has gained another
reference while we were getting the v_interlock.
vget(): attempt to prevent it from returning a clean vnode:
if the vnode is being inactivated (by vrelel()), wait for
vrelel() to complete (or return EBUSY if we can't wait), and return
ENOENT if the vnode has been vclean'ed by vrelel()
Fix kern/41147 in a better way, hopefully fix other related race conditions.
 1.26.10.1 28-Sep-2009  snj Pull up following revision(s) (requested by bouyer in ticket #1029):
sys/ufs/ufs/ufs_ihash.c: revision 1.27
PR kern/41147: race between nfsd and local rm
Note that the race also exists between 2 nfs clients, one of them doing the rm.
In ufs_ihashget(), vget() can return a vnode that has been vclean'ed because
vget() can sleep. After vget returns, check that vp is still connected with
ip, and that ip still points to the inode we want. This fixes the NULL
pointer dereference in ufs_fhtovp() I've been seeing on a NFS server.
XXX I have no idea why using vput() instead of
vlockmgr(vp->v_vnlock, LK_RELEASE); vrele(vp); does not work.
 1.28.4.3 05-Mar-2011  rmind sync with head
 1.28.4.2 03-Jul-2010  rmind sync with head
 1.28.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.28.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.30.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.31.16.1 18-May-2014  rmind sync with head
 1.31.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.31.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.32.2.1 10-Aug-2014  tls Rebase.
 1.112 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.111 26-Jul-2020  chs pull in a bit more FreeBSD code to allow specifying truncation of
the regular bmap (IO_NORMAL) independently of the extattr bmap (IO_EXT).
fixes fs corruption when removing extattrs in UFS2.
 1.110 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.109 23-Feb-2020  ad branches: 1.109.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.108 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.107 31-Dec-2019  ad branches: 1.107.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.106 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.105 10-Dec-2018  jdolecek put back UFS_WAPBL_JUNLOCK_ASSERT(), the underlying rw_write_held() check
doesn't actually have a race since it checks if the rwlock is held by
current lwp
 1.104 10-Dec-2018  jdolecek make UFS_WAPBL_JLOCK_ASSERT() #ifdef DIAGNOSTIC, same as the underlying
function KASSERT(), so that it actually does something; fix code using
it to actually pass correct params, so that it compiles

remove UFS_WAPBL_JUNLOCK_ASSERT(), as that is inherently racy (it's
okay on those places if the rwlock is held by other lwp); depend
on the RW_ASSERT()/LOCKDEBUG inside rw_enter() to catch the case
with wapbl rwlock held by current lwp
 1.103 28-Jan-2018  hannken branches: 1.103.4;
Make sure inode blocks and size are zero when VOP_INACTIVE()
finalises a now unlinked inode.
Counterpart of the check in ffs_newvnode().
 1.102 28-Oct-2017  pgoyette Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
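
The format-string convention described in the entry above (the 'j' length modifier, "%#jx" in place of "%p", and the double cast for pointers) can be illustrated with standard C. The helper name `ex_hist_format()` and the message text are hypothetical; the sketch uses snprintf(3) rather than the kernhist(9) macros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a pointer and a counter using only
 * uintmax_t-sized arguments, as the commit requires of history users.
 */
int
ex_hist_format(char *buf, size_t len, const void *ptr, unsigned count)
{
	assert(buf != NULL && len > 0);
	/*
	 * Pointer: cast to uintptr_t first, then widen to uintmax_t, and
	 * print with "%#jx". Integer: widen to uintmax_t, print with "%ju".
	 */
	return snprintf(buf, len, "obj=%#jx count=%ju",
	    (uintmax_t)(uintptr_t)ptr, (uintmax_t)count);
}
```

Going through uintptr_t before uintmax_t is what avoids the "pointer to integer of a different size" warning mentioned in the commit message.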
 1.101 26-May-2017  riastradh branches: 1.101.2;
Eliminate crusty debugging sludge.

We have a mostly sane vnode lifecycle now. If this needs debugging,
it should be done once at the call site of VOP_RECLAIM.
 1.100 11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.99 01-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().
 1.98 04-Jan-2017  hannken branches: 1.98.2;
Change ufs_truncate_retry() to call UFS_TRUNCATE() at least once.
Even with "newsize == ip->i_size" it must set mtime etc.

Addresses PR kern/51762 "mtime not updated by open(O_TRUNC)"
 1.97 28-Oct-2016  jdolecek reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
succeed; change wapbl_register_deallocation() to return EAGAIN
rather than panic when code hits the limit

callers changed to either loop calling ffs_truncate() using new
utility ufs_truncate_retry() if their semantics requires it, or
just ignore the failure; remove ufs_wapbl_truncate()

this fixes possible user-triggerable panic during truncate, and
resolves WAPBL performance issue with truncates of large files

PR kern/47146 and kern/49175
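
The retry loop described in the entry above can be sketched as follows. Everything here is a hypothetical stand-in: `ex_truncate()` models a truncate that makes partial progress and returns EAGAIN when a per-call budget (`EX_BUDGET`, an assumed value) is exhausted, and `ex_truncate_retry()` loops until it succeeds. The real ufs_truncate_retry() also manages journal transactions, which this sketch omits.

```c
#include <assert.h>
#include <errno.h>

#define EX_BUDGET 100	/* assumed amount that one call may release */

/* Hypothetical file: only its size matters for the sketch. */
struct ex_file {
	long size;
};

/* Shrink toward newsize; stop early with EAGAIN if the budget runs out. */
static int
ex_truncate(struct ex_file *f, long newsize)
{
	assert(newsize >= 0);
	if (f->size - newsize > EX_BUDGET) {
		f->size -= EX_BUDGET;	/* partial success */
		return EAGAIN;
	}
	if (f->size > newsize)
		f->size = newsize;
	return 0;
}

/* Call the truncate at least once, and again for as long as it says EAGAIN. */
int
ex_truncate_retry(struct ex_file *f, long newsize, int *calls)
{
	int error;

	*calls = 0;
	do {
		error = ex_truncate(f, newsize);
		(*calls)++;
	} while (error == EAGAIN);
	return error;
}
```

Using a do/while form means the underlying operation runs even when the size is already correct, so side effects such as timestamp updates are not skipped.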
 1.96 20-Aug-2016  hannken Remove now obsolete operation vcache_remove().

Welcome to 7.99.36
 1.95 13-Jun-2015  hannken branches: 1.95.2;
ufs_inactive: stop overwriting error status and return the last error seen.

Should resolve CID 1306276 (UNUSED_VALUE)
 1.94 10-Jun-2015  hannken ufs_inactive: take UFS_WAPBL_BEGIN() before calling chkiq().

Should fix PR kern/49948 (quota panic)
 1.93 15-Apr-2015  riastradh Release the glock on VOP_GETPAGES failure.

Tripped over by nick@'s failing disk, missing unlock in error branch
discovered by jmcneill@.
 1.92 29-Oct-2014  christos branches: 1.92.2;
simplify and correct.
 1.91 21-Oct-2014  slp Move and unify indirect block truncate algorithm into a separate function.

Reviewed by joerg.
 1.90 08-May-2014  hannken branches: 1.90.2;
Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41
 1.89 22-Jan-2013  dholland branches: 1.89.2; 1.89.10;
Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.88 20-Sep-2011  chs branches: 1.88.2; 1.88.12;
strengthen the assertions about pages existing during block allocation,
which were incorrectly relaxed last year. add some comments so that
the intent of these is hopefully clearer.

in ufs_balloc_range(), don't free pages or mark them dirty if
allocating their backing store failed. this fixes PR 45369.
 1.87 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.86 19-May-2011  manu branches: 1.86.2;
Call ufs_extattr_vnode_inactive before UFS_WAPBL_BEGIN, as the latter will
leave the vnode locked, and ufs_extattr_vnode_inactive does lock/unlock
 1.85 19-May-2011  rmind Remove cache_purge(9) calls from reclamation routines in the file systems,
as vclean(9) performs it for us since Lite2 merge.
 1.84 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.83 01-Sep-2010  chs branches: 1.83.2; 1.83.4;
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
 1.82 28-Jul-2010  hannken ext2fs,ffs: free on disk inodes in the reclaim routine.
Remove now unneeded vnode flag VI_FREEING.

Welcome to 5.99.38.

Ok: Andrew Doran <ad@netbsd.org>
 1.81 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.80 15-Mar-2010  hannken branches: 1.80.2;
Allow ufs_inactive() while a file system is suspending. Removes a possible
deadlock between vrele() and ffs_sync() during suspension.
 1.79 07-Feb-2010  bouyer branches: 1.79.2;
- ufs_balloc_range(): on error, only PG_RELEASED the pages that were
allocated to extend the file to the new size. Releasing all pages
may release pages that contain previously-written data not yet flushed
to disk. Should fix PR kern/35704
- {ffs,lfs,ext2fs}_truncate(): Even if the inode's size is the same as
the new length, call uvm_vnp_setsize(). *_truncate() may have been
called by *_write() in the error path (e.g. block allocation failure
because of quota or file system full), and at this point v_writesize
has been set to the desired size of the file and not reverted to the
old size. Not adjusting v_writesize to the real size causes
genfs_do_io() to write to disk past the real end of the file.
 1.78 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.77 04-Feb-2009  pooka branches: 1.77.2;
Break hold-and-wait which happens in ufs_balloc_range() when we
have pages busied and are trying to get the genfs node lock.
This causes a lock order reversal described in PR kern/40389.
This is not a proper fix and only a workaround for NetBSD 5.0.

problem first reported by simonb, patch tested by rmind
 1.76 31-Jul-2008  simonb branches: 1.76.2; 1.76.4;
Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.75 17-Jan-2008  ad branches: 1.75.6; 1.75.10; 1.75.12; 1.75.14; 1.75.16;
Fix dodgy tests of v_usecount.
 1.74 09-Jan-2008  ad Go back to freeing on disk inodes in the inactive routine. It would be
better not to do this, but it rules out potential side effects with softdep.
 1.73 03-Jan-2008  pooka make the UFS_EXTATTR case build
 1.72 02-Jan-2008  ad Merge vmlocking2 to head.
 1.71 08-Dec-2007  pooka branches: 1.71.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.70 26-Nov-2007  pooka branches: 1.70.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.69 10-Oct-2007  ad branches: 1.69.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.68 25-Sep-2007  pooka avoid variable size stack allocations
 1.67 10-Jul-2007  hannken branches: 1.67.6; 1.67.8; 1.67.10;
Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criteria, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.66 17-May-2007  hannken Fstrans_start() always returns zero, so change its type to void.
 1.65 07-Apr-2007  hannken Remove calls to now obsolete vn_start_write() and vn_finished_write().
 1.64 29-Jan-2007  hannken branches: 1.64.2; 1.64.6; 1.64.8;
Change fstrans enum types to upper case.
No functional change.

From Antti Kantee <pooka@netbsd.org>
 1.63 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.62 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.61 14-Oct-2006  yamt don't use g_glock directly.
 1.60 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.59 14-May-2006  elad branches: 1.59.8; 1.59.10;
integrate kauth.
 1.58 30-Mar-2006  yamt some cleanups after the introduction of GOP_SIZE_MEM flag.
- remove GOP_SIZE_READ/GOP_SIZE_WRITE flags.
they have not been used since the change.
- ufs_balloc_range: remove code which has been no-op since the change.
thanks Konrad Schroder for explaining the original intention of the code.
- ffs_gop_size: don't extend past eof, in the case of GOP_SIZE_MEM.
otherwise genfs_getpages ends up allocating pages past eof unnecessarily.
 1.57 23-Dec-2005  rpaulo branches: 1.57.4; 1.57.6; 1.57.8; 1.57.10; 1.57.12;
Convert UFS_EXTATTR to struct lwp.
 1.56 23-Dec-2005  yamt prevent in-core vnode being freed from getting new references.
otherwise, once the corresponding bit in the inode bitmap is cleared,
an unrelated inode with the same inode number can be allocated and
ufs_ihashget() picks a stale in-core vnode for it.

PR/32301 by Matthias Scheler.
 1.55 11-Dec-2005  christos merge ktrace-lwp.
 1.54 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.53 14-Sep-2005  yamt branches: 1.53.2;
ufs_balloc_range: correct range to clear PG_RDONLY.
fix a panic in ubc_fault.
 1.52 28-Aug-2005  thorpej Experimental support for extended attributes on UFS1 file systems, using a
backing file per attribute type indexed by inode number to hold the extended
attributes.

This is working pretty well on my test systems, except for the "autostart"
feature. I need someone with a better handle on the VFS locking protocol
to go over that.

This is a work-in-progress. There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.

Adapted from FreeBSD.
 1.51 23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.50 17-Jul-2005  yamt - introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.

- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.
 1.49 10-Jul-2005  thorpej Defflag UFS_DIRHASH.
 1.48 10-Jul-2005  thorpej - Use ANSI function decls.
- Sprinkle some static.
 1.47 23-Jan-2005  rumble branches: 1.47.6; 1.47.8;
Bring in Ian Dowse's Dirhash from FreeBSD. Hash tables of
directories are created on the fly and used to increase
performance by circumventing ufs_lookup's linear search.

Dirhash is enabled by the UFS_DIRHASH option, but not
by default.
 1.46 20-Dec-2004  dbj branches: 1.46.2;
use #if defined(_KERNEL_OPT) around opt includes
fix arg to pool_init() when _LKM is defined
 1.45 08-Oct-2004  dbj remove diagnostic check for modified inactive inodes in ufs_reclaim
this condition can occur if ufs_inactive experiences failure attempting
to write the inode out. Instead, have ufs_reclaim always call VOP_UPDATE
which will only write out the inode if there are unflushed changes
 1.44 14-Aug-2004  mycroft Push atime/mtime updates even further -- into the reclaim path, so they happen
rarely in the normal case. (Note: This happens at reboot/shutdown time because
all file systems are unmounted.)

Also, for IN_MODIFY, use IN_ACCESSED, not IN_MODIFIED; otherwise "ls -l" of
your device node or FIFO would cause the time stamps to get written too
quickly.
 1.43 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.42 05-Nov-2003  hannken Clean up the usage of vn_start_write(). At least one occurrence clobbered
previous error conditions.
If "(flags & (V_WAIT|V_PCATCH)) == V_WAIT" the return value is always zero.
Ignore the return value in these cases.

From Darrin B. Jewell.
 1.41 15-Oct-2003  hannken Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.40 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.39 29-Jun-2003  fvdl branches: 1.39.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.38 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.37 15-May-2003  kristerw The C language does not permit statements of the form
(X ? Y : Z) = 0;
even though gcc handles this by a stupid extension.

Transform these to correct C.

Approved by fvdl.
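
As an illustration of the kind of rewrite revision 1.37 describes, the sketch below shows two standard-C replacements for an assignment to a conditional expression. The struct and function names are hypothetical and are not taken from the NetBSD sources.

```c
/* Hypothetical fields for illustration; not from the NetBSD tree. */
struct counters {
	int reads;
	int writes;
};

/*
 * Not valid C (formerly accepted by gcc as an extension):
 *	(is_write ? c->writes : c->reads) = 0;
 */

/* Portable form 1: an explicit if/else. */
static void
reset_counter_ifelse(struct counters *c, int is_write)
{
	if (is_write)
		c->writes = 0;
	else
		c->reads = 0;
}

/* Portable form 2: select an address, then assign through the pointer. */
static void
reset_counter_ptr(struct counters *c, int is_write)
{
	*(is_write ? &c->writes : &c->reads) = 0;
}
```

Both forms assign to exactly one of the two fields. The if/else form is usually the easier one to read; the pointer form keeps the statement a single expression.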
 1.36 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.35 01-Mar-2003  perseant Be careful to always zero pages on truncation/fragment extension,
in the case where the filesystem block size is larger than PAGE_SIZE.
 1.34 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.33 26-Jan-2002  chs fix an error case.
 1.32 27-Dec-2001  fvdl Use softdep_change_linkcnt to note that the inode mode was set to 0.
From FreeBSD.
 1.31 18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.30 22-Nov-2001  chs we need to hold the pageq lock while calling uvm_page_unbusy() on
pages that uvm_page_unbusy() will free.
 1.29 08-Nov-2001  chs in both paths that can cause fragments to be expanded (write and truncate-up),
deal with the fragment expansion separately before the rest of the operation.
this allows us to simplify ufs_balloc_range() by not worrying about implicit
fragment expansion.

call VOP_PUTPAGES() directly for vnodes instead of
going through the UVM pager "put" vector.
 1.28 08-Nov-2001  lukem add RCSID
 1.27 10-Oct-2001  chs branches: 1.27.2;
in ufs_balloc_range(), if we extend a fragment and need to write the
fragment synchronously, update the vnode's size before doing the flush.
otherwise we might only write part of the data and cause softdep's
accounting to get out of sync. fixes PR 14201.
many thanks to enami for figuring out what was going on.
 1.26 30-Sep-2001  chs in ffs_balloc(), clean up page cache state to avoid hangs when we
get ENOSPC. as a result of this, we now skip some of the normal cleanup
in ufs_balloc_range() in the error case.
 1.25 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.24 04-Jul-2001  chs branches: 1.24.2; 1.24.4;
in ufs_balloc_range(), clear PG_RDONLY on pages which now have backing store.
fixes PR 13353.
 1.23 18-Apr-2001  enami Don't flush possibly relocated file system block if write is done
asynchronously. They will be flushed later when necessary and flushing
now makes sequential write access very slow.
 1.22 27-Feb-2001  chs branches: 1.22.2;
min() -> MIN(), max() -> MAX().
fixes more problems with file offsets > 4GB.
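
A minimal sketch of why a function-style min() can break offsets above 4GB while a macro does not, assuming the function takes 32-bit unsigned int parameters (as the kernel's inline helpers are understood to have done) and that unsigned int is 32 bits wide. The names min_u32 and MIN_SKETCH are stand-ins, not the kernel's own definitions.

```c
#include <stdint.h>

/* Stand-in for a function-style min() with 32-bit unsigned parameters. */
static unsigned int
min_u32(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-generic macro: the operands keep their own (possibly 64-bit) type. */
#define MIN_SKETCH(a, b) (((a) < (b)) ? (a) : (b))

static int64_t
clamp_with_function(int64_t off, int64_t limit)
{
	/* Both values are narrowed to unsigned int, losing the high bits. */
	return (int64_t)min_u32((unsigned int)off, (unsigned int)limit);
}

static int64_t
clamp_with_macro(int64_t off, int64_t limit)
{
	/* The comparison is done in 64 bits, so large offsets survive. */
	return MIN_SKETCH(off, limit);
}
```

With an offset of 5GB and a limit of 6GB, the function variant compares only the low 32 bits of each value, whereas the macro variant compares the full 64-bit values.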
 1.21 18-Feb-2001  chs skip truncating a file to 0 before freeing it if it's already zero-length.
 1.20 18-Feb-2001  chs fix the range args to pgo_flush() in the error path of ufs_balloc_range().
 1.19 07-Feb-2001  tsutsui Fix nested extern declaration of prtactive.
 1.18 03-Dec-2000  chs in ufs_balloc_range(), don't rely on uvm_vnp_setsize() to invalidate
pages we've allocated past the real EOF when we fail to allocate a block.
we used to play games with the VM notion of the file size but we don't do
that anymore, so uvm_vnp_setsize() doesn't do what we want anymore.
call the pager flush op instead.
 1.17 01-Dec-2000  chs make sure that pages are on a paging queue before unlocking them.
 1.16 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.15 29-May-2000  mycroft Add a new inode flag called IN_ACCESSED. This is used in place of IN_MODIFIED
to record that the atime was updated. In ffs_update(), we only do synchronous
writes if something *other* than the atime was changed.
 1.14 30-Mar-2000  augustss branches: 1.14.2;
Remove register declarations.
 1.13 05-Mar-1999  mycroft branches: 1.13.4; 1.13.8;
Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.12 04-Dec-1998  bouyer No need to #include malloc.h here.
 1.11 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.10 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.9 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.8 10-Mar-1997  mycroft Just increment the generation count. Using the time is bogus and defeats
fsirand(8).
 1.7 11-May-1996  mycroft branches: 1.7.8;
Change VOP_UPDATE() semantics:
* Make 2nd and 3rd args timespecs, not timevals.
* Consistently pass a Boolean as the 4th arg (except in LFS).
Also, fix ffs_update() and lfs_update() to actually change the nsec fields.
 1.6 09-Feb-1996  christos ufs prototype changes
 1.5 14-Dec-1994  mycroft Sync with CSRG.
 1.4 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.3 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.2 13-Jun-1994  mycroft Move definition of prtactive.
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.7.8.1 12-Mar-1997  is Merge in changes from Trunk
 1.13.8.5 21-Apr-2001  bouyer Sync with HEAD
 1.13.8.4 12-Mar-2001  bouyer Sync with HEAD.
 1.13.8.3 11-Feb-2001  bouyer Sync with HEAD.
 1.13.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.13.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.13.4.4 06-Aug-1999  chs fix math in ufs_balloc_range().
 1.13.4.3 31-Jul-1999  chs in ufs_balloc_range(), handle cases where we need to hold multiple pages
at the beginning of the range, and where the number of pages at the
beginning and end of the range are different.
 1.13.4.2 11-Jul-1999  chs in ufs_balloc_range(), read any partial pages that were already
allocated before allocating additional blocks in the same page.
hold these pages busy for the duration of the allocations and
release them when we're done.
 1.13.4.1 04-Jul-1999  chs add ufs_balloc_range().
 1.14.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.22.2.8 28-Feb-2002  nathanw Catch up to -current.
 1.22.2.7 08-Jan-2002  nathanw Catch up to -current.
 1.22.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.22.2.5 22-Oct-2001  nathanw Catch up to -current.
 1.22.2.4 08-Oct-2001  nathanw Catch up to -current.
 1.22.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.22.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.22.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.24.4.2 11-Oct-2001  fvdl Catch up with -current. Fix some bogons in the sparc64 kbd/ms
attach code. cd18xx conversion provided by mrg.
 1.24.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.24.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.24.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.27.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.39.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.39.2.8 24-Jan-2005  skrll Sync with HEAD.
 1.39.2.7 17-Jan-2005  skrll Sync with HEAD.
 1.39.2.6 19-Oct-2004  skrll Sync with HEAD
 1.39.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.39.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.39.2.3 25-Aug-2004  skrll Sync with HEAD.
 1.39.2.2 03-Aug-2004  skrll Sync with HEAD
 1.39.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.46.2.1 29-Apr-2005  kent sync with -current
 1.47.8.7 21-Jan-2008  yamt sync with head
 1.47.8.6 07-Dec-2007  yamt sync with head
 1.47.8.5 27-Oct-2007  yamt sync with head.
 1.47.8.4 03-Sep-2007  yamt sync with head.
 1.47.8.3 26-Feb-2007  yamt sync with head.
 1.47.8.2 30-Dec-2006  yamt sync with head.
 1.47.8.1 21-Jun-2006  yamt sync with head.
 1.47.6.2 04-Oct-2005  tron Pull up following revision(s) (requested by yamt in ticket #848):
sys/ufs/ufs/ufs_inode.c: revision 1.53
ufs_balloc_range: correct range to clear PG_RDONLY.
fix a panic in ubc_fault.
 1.47.6.1 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in the different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed by Chuck Silvers.
 1.53.2.1 20-Oct-2005  yamt adapt ufs.
 1.57.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.57.12.1 31-Mar-2006  tron Merge 2006-03-31 NetBSD-current into the "peter-altq" branch.
 1.57.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.57.10.2 19-Apr-2006  elad sync with head.
 1.57.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.57.8.2 24-May-2006  yamt sync with head.
 1.57.8.1 01-Apr-2006  yamt sync with head.
 1.57.6.2 01-Jun-2006  kardel Sync with head.
 1.57.6.1 22-Apr-2006  simonb Sync with head.
 1.57.4.1 09-Sep-2006  rpaulo sync with head
 1.59.10.2 10-Dec-2006  yamt sync with head.
 1.59.10.1 22-Oct-2006  yamt sync with head
 1.59.8.2 01-Feb-2007  ad Sync with head.
 1.59.8.1 18-Nov-2006  ad Sync with head.
 1.64.8.1 11-Jul-2007  mjf Sync with head.
 1.64.6.9 19-Oct-2007  ad ufs_reclaim: leave the inode on the hash chain until we have updated both
the free map and the on disk inode. At this point the about-to-be-dead
vnode has VI_XLOCK set. Together these prevent the on-disk inode from
being reused until we are completely finished with it.
 1.64.6.8 09-Oct-2007  ad Sync with head.
 1.64.6.7 08-Oct-2007  ad If an inode is being made inactive and the file has been deleted,
defer updating the on-disk inode until ufs_reclaim.
 1.64.6.6 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress test.
 1.64.6.5 15-Jul-2007  ad Sync with head.
 1.64.6.4 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.64.6.3 08-Jun-2007  ad Sync with head.
 1.64.6.2 10-Apr-2007  ad Sync with head.
 1.64.6.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.64.2.2 17-May-2007  yamt sync with head.
 1.64.2.1 15-Apr-2007  yamt sync with head.
 1.67.10.2 14-Oct-2007  yamt sync with head.
 1.67.10.1 06-Oct-2007  yamt sync with head.
 1.67.8.3 23-Mar-2008  matt sync with HEAD
 1.67.8.2 09-Jan-2008  matt sync with HEAD
 1.67.8.1 06-Nov-2007  matt sync with HEAD
 1.67.6.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.67.6.3 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.67.6.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.67.6.1 02-Oct-2007  joerg Sync with HEAD.
 1.69.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.69.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.69.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.70.2.3 30-Dec-2007  ad Fix remaining problems with ext2fs on this branch.
 1.70.2.2 26-Dec-2007  ad Sync with head.
 1.70.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.71.4.4 19-Jan-2008  bouyer Sync with HEAD
 1.71.4.3 10-Jan-2008  bouyer Sync with HEAD
 1.71.4.2 08-Jan-2008  bouyer Sync with HEAD
 1.71.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.75.16.1 19-Oct-2008  haad Sync with HEAD.
 1.75.14.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.75.12.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.75.10.4 09-Oct-2010  yamt sync with head
 1.75.10.3 11-Aug-2010  yamt sync with head.
 1.75.10.2 11-Mar-2010  yamt sync with head
 1.75.10.1 04-May-2009  yamt sync with head.
 1.75.6.1 28-Sep-2008  mjf Sync with HEAD.
 1.76.4.5 16-Jul-2011  riz Pull up following revision(s) (requested by manu in ticket #1642):
sys/ufs/ufs/ufs_inode.c: revision 1.86
Call ufs_extattr_vnode_inactive before UFS_WAPBL_BEGIN, as the latter will
leave the vnode locked, and ufs_extattr_vnode_inactive does lock/unlock
 1.76.4.4 07-Sep-2010  bouyer Pull up following revision(s) (requested by chs in ticket #1448):
sys/uvm/uvm_pager.h: revision 1.39 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.183 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.83 via patch
sys/miscfs/genfs/genfs_io.c: revision 1.40 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.20 via patch
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
 1.76.4.3 28-Mar-2010  snj Pull up following revision(s) (requested by hannken in ticket #1345):
sys/ufs/ufs/ufs_inode.c: revision 1.80
Allow ufs_inactive() while a file system is suspending. Removes a possible
deadlock between vrele() and ffs_sync() during suspension.
 1.76.4.2 22-Feb-2010  snj Pull up following revision(s) (requested by bouyer in ticket #1302):
sys/ufs/ext2fs/ext2fs_inode.c: revision 1.71
sys/ufs/ffs/ffs_inode.c: revision 1.104
sys/ufs/lfs/lfs_inode.c: revision 1.121
sys/ufs/ufs/ufs_inode.c: revision 1.79
- ufs_balloc_range(): on error, only PG_RELEASED the pages that were
allocated to extend the file to the new size. Releasing all pages
may release pages that contain previously-written data not yet flushed
to disk. Should fix PR kern/35704
- {ffs,lfs,ext2fs}_truncate(): Even if the inode's size is the same as
the new length, call uvm_vnp_setsize(). *_truncate() may have been
called by *_write() in the error path (e.g. block allocation failure
because of quota or file system full), and at this point v_writesize
has been set to the desired size of the file and not reverted to the
old size. Not adjusting v_writesize to the real size causes
genfs_do_io() to write to disk past the real end of the file.
 1.76.4.1 08-Feb-2009  snj branches: 1.76.4.1.2; 1.76.4.1.4;
Pull up following revision(s) (requested by pooka in ticket #413):
sys/ufs/ufs/ufs_inode.c: revision 1.77
Break hold-and-wait which happens in ufs_balloc_range() when we
have pages busied and are trying to get the genfs node lock.
This causes a lock order reversal described in PR kern/40389.
This is not a proper fix and only a workaround for NetBSD 5.0.
problem first reported by simonb, patch tested by rmind
 1.76.4.1.4.2 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.76.4.1.4.1 21-Apr-2010  matt sync to netbsd-5
 1.76.4.1.2.1 07-Sep-2010  bouyer Pull up following revision(s) (requested by chs in ticket #1448):
sys/uvm/uvm_pager.h: revision 1.39 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.183 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.83 via patch
sys/miscfs/genfs/genfs_io.c: revision 1.40 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.20 via patch
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
 1.76.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.77.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.79.2.3 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.79.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.79.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.80.2.5 31-May-2011  rmind sync with head
 1.80.2.4 21-Apr-2011  rmind sync with head
 1.80.2.3 05-Mar-2011  rmind sync with head
 1.80.2.2 03-Jul-2010  rmind sync with head
 1.80.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.83.4.2 14-Feb-2011  bouyer Update quota in the same WAPBL transaction as we write mode=0 back to inode.
 1.83.4.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). quota entries do not cross
filesystem block boundaries.
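
The layout described above can be sketched in userland C. This is only an illustration under stated assumptions: the struct names, field names, field widths and host byte order are hypothetical and are not the on-disk format from this commit, and the per-filesystem values and default limits that the real header carries are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: free-list head and bucket count; buckets follow. */
struct qf_header {
	uint64_t qh_free;	/* file offset of first free entry, 0 if none */
	uint64_t qh_nbuckets;	/* number of uint64_t bucket heads after header */
};

/* Hypothetical entry, chained to the next one by file offset. */
struct qf_entry {
	uint64_t qe_next;	/* file offset of next entry, 0 ends the chain */
	uint64_t qe_id;		/* uid or gid */
	uint64_t qe_limit;	/* block limit */
	uint64_t qe_used;	/* blocks in use */
};

/*
 * Look up id in a quota file image. Returns 1 and fills *out on success,
 * 0 otherwise (*out may have been overwritten while walking the chain).
 */
static int
qf_lookup(const unsigned char *img, size_t size, uint64_t id,
    struct qf_entry *out)
{
	struct qf_header hdr;
	uint64_t off;
	size_t slot, steps;

	assert(img != NULL && out != NULL);
	if (size < sizeof(hdr) + sizeof(*out))
		return 0;
	memcpy(&hdr, img, sizeof(hdr));
	if (hdr.qh_nbuckets == 0 ||
	    hdr.qh_nbuckets > (size - sizeof(hdr)) / sizeof(off))
		return 0;

	/* Bucket heads sit directly after the header. */
	slot = sizeof(hdr) + (size_t)(id % hdr.qh_nbuckets) * sizeof(off);
	memcpy(&off, img + slot, sizeof(off));

	/* Bound the walk so a corrupt, cyclic chain cannot loop forever. */
	for (steps = size / sizeof(*out); off != 0 && steps > 0; steps--) {
		if (off > size - sizeof(*out))
			return 0;
		memcpy(out, img + off, sizeof(*out));
		if (out->qe_id == id)
			return 1;
		off = out->qe_next;
	}
	return 0;
}
```

Storing links as file offsets rather than pointers is what lets the same chain be followed from an in-memory image, from the buffer cache, or by fsck_ffs(8) reading the file directly.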

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.83.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.86.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.88.12.3 03-Dec-2017  jdolecek update from HEAD
 1.88.12.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.88.12.1 25-Feb-2013  tls resync with head
 1.88.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.88.2.3 23-Jan-2013  yamt sync with head
 1.88.2.2 17-Feb-2012  yamt byebye PG_HOLE as it turned out to be unnecessary.
 1.88.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.89.10.1 10-Aug-2014  tls Rebase.
 1.89.2.1 18-May-2014  rmind sync with head
 1.90.2.2 16-Jul-2015  snj Pull up following revision(s) (requested by hannken in ticket #850):
sys/ufs/ufs/ufs_inode.c: revisions 1.93-1.95
Release the glock on VOP_GETPAGES failure.
Tripped over by nick@'s failing disk, missing unlock in error branch
discovered by jmcneill@.
--
ufs_inactive: take UFS_WAPBL_BEGIN() before calling chkiq().
Should fix PR kern/49948 (quota panic)
--
ufs_inactive: stop overwriting error status and return the last error seen.
Should resolve CID 1306276 (UNUSED_VALUE)
 1.90.2.1 28-Jan-2015  martin Pull up following revision(s) (requested by christos in ticket #425):
sys/ufs/ufs/ufs_inode.c: revision 1.91-1.92
sys/ufs/ufs/ufs_vnops.c: revision 1.223-1.224
sys/ufs/ufs/ufs_extern.h: revision 1.76-1.77
sys/ufs/ffs/ffs_vfsops.c: revision 1.303-1.305
Add debugging for mount...
Merge some error returns
Check more errors
Restore apple ufs error handling.
Move and unify indirect block truncate algorithm into a separate function.
PR/39371: Tobias Nygren: Don't fail mounting root if WAPBL log is corrupt.
Patch from Sergio L. Pascual.
 1.92.2.6 28-Aug-2017  skrll Sync with HEAD
 1.92.2.5 05-Feb-2017  skrll Sync with HEAD
 1.92.2.4 05-Dec-2016  skrll Sync with HEAD
 1.92.2.3 05-Oct-2016  skrll Sync with HEAD
 1.92.2.2 22-Sep-2015  skrll Sync with HEAD
 1.92.2.1 06-Jun-2015  skrll Sync with HEAD
 1.95.2.4 26-Apr-2017  pgoyette Sync with HEAD
 1.95.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.95.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.95.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.98.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.101.2.2 04-Feb-2018  martin Pull up following revision(s) (requested by christos in ticket #523):
sys/ufs/ffs/ffs_vfsops.c: revision 1.356
sys/ufs/ufs/ufs_inode.c: revision 1.103
Make sure inode blocks and size are zero when VOP_INACTIVE()
finalises a now unlinked inode.
Counterpart of the check in ffs_newvnode().
Prevent use-after-free where genfs_node_destroy() would destroy
a lock residing in the just freed inode data.
 1.101.2.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
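
The argument-widening rule above can be illustrated with a small userland sketch. It is not the kernhist(9) implementation: struct hist_entry, hist_ptr() and hist_format() are hypothetical names, and the only point shown is that every argument is stored as uintmax_t, pointers go through uintptr_t first, and every conversion uses the 'j' length modifier.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a history record (not the real kernhist
 * structure): a format string plus four arguments, all widened to
 * uintmax_t.
 */
struct hist_entry {
	const char *fmt;
	uintmax_t args[4];
};

/* Widen a pointer: first to uintptr_t, then to uintmax_t. */
static uintmax_t
hist_ptr(const void *p)
{
	return (uintmax_t)(uintptr_t)p;
}

/*
 * Format one record. Every conversion in fmt must carry the 'j' length
 * modifier ("%ju", "%#jx"), so each argument is consumed as exactly one
 * uintmax_t. Trailing arguments the format does not use are ignored.
 */
static int
hist_format(char *buf, size_t len, const struct hist_entry *e)
{
	assert(e != NULL && e->fmt != NULL);
	return snprintf(buf, len, e->fmt,
	    e->args[0], e->args[1], e->args[2], e->args[3]);
}
```

Because the format string here is not a literal, the compiler cannot check argument sizes against it, which is the situation the commit message calls out; giving every argument one fixed width removes the ambiguity.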
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.103.4.3 21-Apr-2020  martin Sync with HEAD
 1.103.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.103.4.1 10-Jun-2019  christos Sync with HEAD
 1.107.2.2 29-Feb-2020  ad Sync with head.
 1.107.2.1 17-Jan-2020  ad Sync with head.
 1.109.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.2 01-Mar-1998  fvdl Remove extraneous files from Lite2 merge.
 1.1 01-Mar-1998  fvdl branches: 1.1.1;
Initial revision
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.159 08-Sep-2024  rillig fix a/an grammar in obvious cases
 1.158 10-Aug-2023  mrg branches: 1.158.6;
don't assign struct pointers to smaller-than-structure regions of memory.

in all cases here, the later parts of the structure are not actually
accessed, so there are no existing bugs here beyond general UB. for the
ufs ones, this also removes some casts.

found by GCC 12.
 1.157 22-Feb-2023  riastradh ufs: Nix trailing whitespace and tidy up some other minor KNF.
 1.156 06-Aug-2022  andvar s/blity/bility/ in various words, mainly in comments.
 1.155 05-Sep-2020  riastradh Revert "ufs: Prevent mkdir from choking on deleted directories."

This change made no sense and should not have been committed.
 1.154 05-Sep-2020  riastradh ufs: Prevent mkdir from choking on deleted directories.

Fix some missing uvm_vnp_setsize in screw cases while here.
 1.153 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.152 04-Apr-2020  ad Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.151 14-Mar-2020  ad - Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.
 1.150 05-May-2019  christos branches: 1.150.4;
Add more comments to explain what we are doing.
 1.149 05-May-2019  christos Zero out all the dirent padding not just one byte, to avoid kernel memory
disclosure (from https://svnweb.freebsd.org/base?view=revision&revision=347066)
 1.148 27-Oct-2017  joerg branches: 1.148.4;
Revert printf return value change.
 1.147 27-Oct-2017  utkarsh009 [syzkaller] Cast all the printf's to (void *)
> as a result of new printf(9) declaration.
 1.146 30-Mar-2017  hannken branches: 1.146.6;
Remove now redundant calls to fstrans_start()/fstrans_done().
 1.145 29-Apr-2016  christos branches: 1.145.2; 1.145.4;
mention the PR
 1.144 29-Apr-2016  christos Split ufs_direnter further and turn off tree-vrp for the broken function.
 1.143 14-Apr-2016  christos missing ,
 1.142 14-Apr-2016  christos - match endianness logic more to the original code
- fix namlen type
- use bool more
- eat \n's from panic strings
 1.141 13-Apr-2016  christos more deduplication.
 1.140 12-Apr-2016  christos Remove gcc hack, it does not help.
Add more const.
 1.139 12-Apr-2016  christos - fix build with UFS_DIRHASH
- hide extra diagnostic info
- try to elide gcc bug
 1.138 12-Apr-2016  christos - Collect the slot-related variables in their own structure and extract
some of the slot finding and updating code into their own function.
- Add a new label "next" in the main search loop to avoid nesting and
code duplication.
- Cache some reclen and ino variables for better readability and efficiency.
 1.137 12-Apr-2016  christos Provide reason to be printed in panic string.
 1.136 11-Apr-2016  christos misc cleanups, no functional change
 1.135 11-Jul-2015  mlelstv mp->mnt_stat.f_flag is never set. Use the mnt_flag directly.
This will now actually prevent the 'bad dir' panic if the filesystem
is read-only.
 1.134 28-Mar-2015  maxv Remove the 'cred' argument from breadn(), and update the man page
accordingly.

ok hannken@
 1.133 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.132 03-Jun-2014  joerg branches: 1.132.4;
Introduce two helper functions to centralise the namecache statistics
in vfs_cache.c. Use consistent locking around the per-cpu data.
 1.131 25-May-2014  hannken Remove ufs_checkpath() and ufs_readdotdot(). These are relics
from the pre-genfs_rename era.
 1.130 08-May-2014  hannken Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41
 1.129 07-Feb-2014  hannken branches: 1.129.2;
Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.128 04-Nov-2013  christos Add 2 XXX: gcc initializations
 1.127 25-Oct-2013  martin Mark a diagnostic-only variable
 1.126 20-Oct-2013  htodd Defining needswap where needed.
 1.125 15-Sep-2013  martin Remove unused variable
 1.124 16-Jun-2013  hannken branches: 1.124.2;
Add an UFS_SNAPGONE() ufs op replacing the calls
to ffs_snapgone() in ufs_lookup.c.

Ok: David Holland <dholland@netbsd.org>

Welcome to 6.99.22
 1.123 09-Jun-2013  dholland Stick UFS_ in front of these symbols:
DIRBLKSIZ
DIRECTSIZ
DIRSIZ
OLDDIRFMT
NEWDIRFMT

Part of PR 47909.
 1.122 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.121 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.120 05-Nov-2012  dholland Excise struct componentname from the namecache.

This uglifies the interface, because several operations need to be
passed the namei flags and cache_lookup also needs for the time being
to be passed cnp->cn_nameiop. Nonetheless, it's a net benefit.

The glop should be able to go away eventually but requires structural
cleanup elsewhere first.

This change requires a kernel bump.
 1.119 05-Nov-2012  dholland Disentangle the namecache from the internals of namei.

- Move the namecache's hash computation to inside the namecache code,
instead of being spread out all over the place. Remove cn_hash from
struct componentname and delete all uses of it.

- It is no longer necessary (if it ever was) for cache_lookup and
cache_lookup_raw to clear MAKEENTRY from cnp->cn_flags for the cases
that cache_enter already checks for.

- Rearrange the interface of cache_lookup (and cache_lookup_raw) to
make it somewhat simpler, to exclude certain nonexistent error
conditions, and (most importantly) to make it not require write access
to cnp->cn_flags.

This change requires a kernel bump.
 1.118 14-Oct-2012  dholland Add an XXX comment about a broken error case in ufs_dirremove.
(this was in one of my old rename patches)
 1.117 22-Jul-2012  rmind branches: 1.117.2;
Move some of the tests for MAKEENTRY into cache_enter(9). Make some
variables in vfs_cache.c static, __read_mostly, etc.

No objection on tech-kern@.
 1.116 04-Jun-2012  riastradh Tidy up some typos and vestiges in comments after the ulr changes.
 1.115 09-May-2012  riastradh Adapt ffs, lfs, and ext2fs to use genfs_rename.

ok dholland, rmind
 1.114 05-May-2012  yamt comments and cosmetics. no functional changes.
 1.113 16-Mar-2012  hannken Fix last commit that broke lookup for dot with op DELETE.

Reviewed by: David Holland <dholland@netbsd.org>
 1.112 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.111 17-Jul-2011  dholland branches: 1.111.2; 1.111.6; 1.111.8;
Provide correct locking for ufs_wapbl_rename. Note that this does not
fix the non-wapbl rename; that will be coming soon. This patch also
leaves a lot of the older locking-related code around in #if 0 blocks,
and there's a lot of leftover redundant logic. All that will be going
away later.

Relates to at least these PRs:

PR kern/24887
PR kern/41417
PR kern/42093
PR kern/43626

and possibly others.
 1.110 14-Jul-2011  dholland Update comments on functions that take ufs_lookup_results.
 1.109 12-Jul-2011  dholland Pass the ufs_lookup_results pointer around instead of fetching it from
the inode in the guts of ufs. Now, in VOPs where i_crap is used it is
used (directly) only immediately on entry to the VOP call and then
passed around by reference.

Except for rename, which needs explicit sorting out. The code in
ufs_wapbl_rename is unchanged in behavior but I'm increasingly
inclined to think it's wrong.
 1.108 12-Jul-2011  dholland Currently, ufs_lookup produces five auxiliary results that are left in
the vnode when lookup returns and fished out again later.

1. Create struct ufs_lookup_results to hold these.

2. Call the ufs_lookup_results instance in struct inode "i_crap" to be
clear about exactly what's going on, and to distinguish the lookup
results from respectable members of struct inode.

3. Update references to these members in the directory access
subroutines.

4. Include preliminary infrastructure for checking that the i_crap
being used is still valid when it's used. This doesn't actually do
anything yet.

5. Update the way ufs_wapbl_rename manipulates these elements to use
the new data structures. I have not changed the manipulation; it may
or may not be correct but I continue to suspect that it is not.

The word of the day is "stigmergy".
 1.107 11-Jul-2011  hannken Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.106 30-Nov-2010  dholland Abolish the SAVENAME and HASBUF flags. There is now always a buffer,
so the path in a struct componentname is now always valid during VOP
calls.
 1.105 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.104 02-Mar-2010  pooka branches: 1.104.2;
Remove last #ifdef FFS. Do this by making lfs include ffs.
Could use UFS_OPS, but:

1) the lfs kernel module depends on full ffs already anyway
2) lfs is being split from ufs, so this will automatically
go away soon
3) chances of anyone wanting an lfs-only kernel are pretty slim
4) i'm too lazy to figure out how to test ffs_snapgone() is
still called properly if I change the call ;)
 1.103 08-Jan-2010  pooka branches: 1.103.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.102 28-Sep-2009  dholland Avoid nasal demons. Code of the form

vput(vp);
error = VFS_VGET(vp->v_mount, ...);

just isn't right. Because of vnode caching this *probably* never bit
anyone, except maybe under very heavy load, but still.
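
A sketch of the safe ordering, using mock types and mock functions in place of the kernel's vput() and VFS_VGET(); all names below are hypothetical stand-ins, and the real fix may differ in detail:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct mount { int m_id; };
struct vnode { struct mount *v_mount; int v_usecount; };

/* Mock of vput(): drops the caller's reference. After it returns, the
 * caller no longer owns vp and must not dereference it. */
static void mock_vput(struct vnode *vp)
{
	assert(vp->v_usecount > 0);
	vp->v_usecount--;
}

/* Mock of VFS_VGET(): only records which mount it was asked about. */
static int mock_vfs_vget(struct mount *mp, int *seen_id)
{
	if (mp == NULL)
		return 1;
	*seen_id = mp->m_id;
	return 0;
}

/* Safe ordering: copy v_mount while the reference is still held. */
static int lookup_after_release(struct vnode *vp, int *seen_id)
{
	struct mount *mp = vp->v_mount;	/* read before the release */

	mock_vput(vp);
	return mock_vfs_vget(mp, seen_id);	/* vp is not touched again */
}
```

The point is only the order of the two statements: nothing reads through vp once the reference has been given up.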
 1.101 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.100 13-Nov-2008  ad branches: 1.100.4;
_KERNEL_OPT
 1.99 31-Jul-2008  simonb branches: 1.99.2; 1.99.4; 1.99.8;
Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.98 05-Jun-2008  hannken branches: 1.98.2; 1.98.4;
ufs_blkatoff: Update comment.
 1.97 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.96 08-Dec-2007  pooka branches: 1.96.12; 1.96.14; 1.96.16; 1.96.18;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.95 26-Nov-2007  pooka branches: 1.95.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.94 10-Oct-2007  ad branches: 1.94.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.93 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.92 25-Sep-2007  pooka avoid variable size stack allocations
 1.91 23-Jul-2007  pooka branches: 1.91.4; 1.91.6; 1.91.8; 1.91.10;
comment police: DIRBLKSIZE would be too chatty and therefore the
macro is known as DIRBLKSIZ
 1.90 21-Jul-2007  ad Don't depend on uvm_extern.h pulling in proc.h.
 1.89 17-May-2007  hannken branches: 1.89.2;
Fstrans_start() always returns zero, so change its type to void.
 1.88 04-Mar-2007  christos branches: 1.88.2; 1.88.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.87 09-Feb-2007  ad branches: 1.87.2;
Merge newlock2 to head.
 1.86 07-Feb-2007  elad Add missing ')'. Noted by Paul Goyette.
 1.85 06-Feb-2007  bouyer in ufs_dirremove swap ep->d_reclen before use if needed (affect UFS_DIRHASH
only).
While there remove an unneeded swap before compare against 0 in ufs_direnter().
Both pointed out by Pawel Jakub Dawidek on tech-kern@, thanks !
 1.84 29-Jan-2007  hannken Change fstrans enum types to upper case.
No functional change.

From Antti Kantee <pooka@netbsd.org>
 1.83 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.82 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.81 09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.80 16-Nov-2006  joerg branches: 1.80.2;
LFS will never set SF_SNAPSHOT and doesn't support ffs_snapgone anyway.
So make the calls to that function conditional on the inclusion of FFS and
allow an LFS-only kernel to link.
 1.79 23-Jul-2006  ad branches: 1.79.4; 1.79.6;
Use the LWP cached credentials where sane.
 1.78 23-Jun-2006  yamt fix a simonb-timecounters regression.
the precision of getnanotime() is not suitable for file timestamps.
esp. when it's nfs-exported.

- introduce vfs_timestamp().
(the name is from freebsd. currently merely a wrapper of nanotime())
- for ufs-like filesystems, use it rather than getnanotime().

XXX check other filesystems.
 1.77 07-Jun-2006  kardel branches: 1.77.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.76 14-May-2006  elad branches: 1.76.2;
integrate kauth.
 1.75 15-Apr-2006  christos Coverity CID 1166: Add KASSERT before deref.
 1.74 30-Mar-2006  yamt some cleanups after the introduction of GOP_SIZE_MEM flag.
- remove GOP_SIZE_READ/GOP_SIZE_WRITE flags.
they have not been used since the change.
- ufs_balloc_range: remove code which has been no-op since the change.
thanks Konrad Schroder for explaining the original intention of the code.
- ffs_gop_size: don't extend past eof, in the case of GOP_SIZE_MEM.
otherwise genfs_getpages ends up allocating pages past eof unnecessarily.
 1.73 14-Jan-2006  yamt branches: 1.73.2; 1.73.4; 1.73.6; 1.73.8; 1.73.10;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.72 14-Jan-2006  yamt pull freebsd's ufs_lookup.c rev.1.53 and 1.54. PR/31873.

> ----------------------------
> revision 1.54
> date: 2001/08/26 01:25:12; author: iedowse; state: Exp; lines: +30 -12
> When compacting directories, ufs_direnter() always trusted DIRSIZ()
> to supply the number of bytes to be bcopy()'d to move an entry. If
> d_ino == 0 however, DIRSIZ() is not guaranteed to return a sensible
> length, so ufs_direnter could end up corrupting a directory during
> compaction. In practice I believe this can only happen after fsck_ffs
> has fixed a previously-corrupted directory.
>
> We now deal with any mid-block unused entries specially to avoid
> using DIRSIZ() or bcopy() on such entries. We also ensure that the
> variables 'dsize' and 'spacefree' contain meaningful values at all
> times. Add a few comments to describe better this intricate piece
> of code.
>
> The special handling of mid-block unused entries makes the
> dirhash-specific bugfix in the previous revision (1.53) now unnecessary,
> so this change removes it.
>
> Reviewed by: mckusick
> ----------------------------
> revision 1.53
> date: 2001/08/22 01:35:17; author: iedowse; state: Exp; lines: +2 -2
> When compressing directory blocks, the dirhash code didn't check
> that the directory entry was in use before attempting to find it
> in the hash structures to change its offset. Normally, unused
> entries do not need to be moved, but fsck can leave behind some
> unused entries that do. A dirhash sanity panic resulted when the
> entry to be moved was not found. Add a check that stops entries
> with d_ino == 0 from being passed to ufsdirhash_move().
 1.71 13-Jan-2006  yamt FSFMT: whitespace.
 1.70 11-Dec-2005  christos branches: 1.70.2;
merge ktrace-lwp.
 1.69 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.68 26-Sep-2005  yamt branches: 1.68.2;
always use nanotime rather than time.
it's bad to mix nanotime and time because it sometimes
makes timestamps go backwards.
 1.67 23-Aug-2005  christos Don't overload MAXNAMLEN, use a separate constant for each filesystem type.
 1.66 19-Aug-2005  christos 64 bit inode changes.
 1.65 10-Jul-2005  thorpej Defflag UFS_DIRHASH.
 1.64 10-Jul-2005  thorpej - Use ANSI function decls.
- Sprinkle some static.
 1.63 29-May-2005  christos branches: 1.63.2;
- sprinkle const
- avoid shadow variables.
 1.62 26-Feb-2005  perry nuke trailing whitespace
 1.61 23-Jan-2005  rumble branches: 1.61.2;
Bring in Ian Dowse's Dirhash from FreeBSD. Hash tables of
directories are created on the fly and used to increase
performance by circumventing ufs_lookup's linear search.

Dirhash is enabled by the UFS_DIRHASH option, but not
by default.
 1.60 17-Sep-2004  skrll branches: 1.60.4;
There's no need to pass a proc value when using UIO_SYSSPACE with
vn_rdwr(9) and uiomove(9).

OK'd by Jason Thorpe
 1.59 15-Aug-2004  mycroft Repair some FFS_EI code for ufsmount changes.
 1.58 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.57 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.56 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.55 06-Mar-2004  yamt revert ufs_lookup.c rev.1.53 (MNT_ASYNC changes)
it was redundant because our bwrite() knows about MNT_ASYNC.

ok'ed by Jaromir Dolecek and Chuck Silvers.
 1.54 08-Nov-2003  dbj comment out unnecessary IN_CHANGE|IN_UPDATE in lookup
move softdep specific lock release/regrab inside if DOINGSOFTDEP
 1.53 20-Sep-2003  jdolecek if mounted ASYNC, use delayed writes for metadata, which improves performance
of these operations significantly
based on FreeBSD ufs_lookup.c rev. 1.8, by John Dyson
 1.52 15-Sep-2003  yamt indent.
 1.51 11-Sep-2003  christos PR/15397: Jason Thorpe: directory operations on pathnames that refer to
directories and have trailing slashes should succeed. Ok'd by kjk.
Fix provided by enami.
 1.50 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.49 05-Aug-2003  pk Pass the inode flags to set as an argument to ufs_dirrewrite().
Use it to restore the behaviour of not updating the modified time of a
directory that moves to a new parent.
 1.48 23-Jul-2003  yamt yield cpu in directory entry search loop in ufs_lookup().
this loop can take a bit long time with large buffer cache.
 1.47 29-Jun-2003  fvdl branches: 1.47.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.46 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.45 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.44 15-May-2003  kristerw The C language does not permit statements of the form
(X ? Y : Z) = 0;
even though gcc handles this by a stupid extension.

Transform these to correct C.

Approved by fvdl.
 1.43 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.42 26-Nov-2002  yamt eliminate i_ino from in-core inode
and use local variable instead.

ok'ed by Frank van der Linden.
 1.41 25-Nov-2002  thorpej Avoid strict-alias warnings.
 1.40 28-Sep-2002  dbj Add support for the Apple UFS variation on ffs
This is the bulk of PR #17345

The general approach is to use a run time determinable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
 1.39 12-May-2002  matt Eliminate commons.
 1.38 18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.37 19-Nov-2001  lukem fix compile error noted by itojun in [kern/14638]
 1.36 19-Nov-2001  lukem be consistent and cache UFS_*NEEDSWAP results in more functions
 1.35 08-Nov-2001  lukem add RCSID
 1.34 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.33 26-Feb-2001  fvdl branches: 1.33.2; 1.33.6; 1.33.10;
Some bugfixes from rev 1.33 and 1.34 of this file in FreeBSD (some
in effect cosmetic). Original FreeBSD commit messages:

==
date: 2000/03/15 07:18:15; author: mckusick; state: Exp; lines: +4 -4
Bug fixes for currently harmless bugs that could rise to bite
the unwary if the code were called in slightly different ways.

[...]

2) In ufs_lookup() there is an off-by-one error in the test that checks
if dp->i_diroff is outside the range of the current directory size.
This is completely harmless, since the following while-loop condition
'dp->i_offset < endsearch' is never met, so the code immediately
does a second pass starting at dp->i_offset = 0.

3) Again in ufs_lookup(), the condition in a sanity check is wrong
for directories that are longer than one block. This bug means that
the sanity check is only effective for small directories.

Submitted by: Ian Dowse <iedowse@maths.tcd.ie>

==

date: 2000/03/09 18:54:59; author: dillon; state: Exp; lines: +2 -2
branches: 1.33.2;
In the 'found' case for ufs_lookup() the underlying bp's data was
being accessed after the bp had been released. A simple move of the
brelse() solves the problem.

Approved by: jkh
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>

==
 1.32 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.31 13-May-2000  perseant Change the semantics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be waited for, or not, accordingly.

Closes PR#8996.
 1.30 30-Mar-2000  augustss Remove register declarations.
 1.29 20-Feb-2000  wiz remove obsoleted #if defined(UVM)
 1.28 14-Feb-2000  fvdl Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
 1.27 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.26 05-Sep-1999  jdolecek branches: 1.26.2; 1.26.4; 1.26.8;
Adapt to cache_lookup() changes.

Tested by: jdolecek
Reviewed by: wrstuden
 1.25 04-Aug-1999  wrstuden Make the compiler happy..
 1.24 04-Aug-1999  wrstuden Fix tyop in previous.
 1.23 04-Aug-1999  wrstuden Modify ISDOTDOT case so that we only clear PDIRUNLOCK if we really
re-lock the parent vnode.
 1.22 30-Jul-1999  mycroft Make one code path a bit clearer.
 1.21 08-Jul-1999  wrstuden Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
 1.20 12-Feb-1999  thorpej branches: 1.20.4;
Fix printf format problems on Alpha.
 1.19 08-Sep-1998  fvdl Fix some maxsymlinklen comparisons for old filesystems that were
wrong after the byteswap changes.
 1.18 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.17 28-Jul-1998  thorpej Change the "aresid" argument of vn_rdwr() from an int * to a size_t *,
to match the new uio_resid type.
 1.16 13-Jun-1998  kleink KNF, mostly of FFS_EI changes.
 1.15 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time, other metadatas are byteswapped
when used (as they are accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.14 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.13 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
union {
	struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
	struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din;
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.12 12-May-1997  kleink When doing a CREATE, RENAME or DELETE w/ DOWHITEOUT and ISWHITEOUT lookup on a
non-existent file and the end of the pathname is reached, and this `current'
directory resides on a read-only mounted file system, don't update/prepare
its inode for the actual operation but return EROFS.
 1.11 08-May-1997  mycroft Pass the vnode type to vaccess(), and use it when checking VEXEC. Make sure
that the mode bits passed to vaccess() and returned by foo_getattr() contain
only permission bits.
 1.10 08-May-1997  mycroft VEXEC -> VLOOKUP, as appropriate.
 1.9 12-Oct-1996  christos revert previous kprintf changes
 1.8 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.7 09-Feb-1996  christos ufs prototype changes
 1.6 30-May-1995  mycroft Fix thinko in previous commit. Do this as suggested by John Kohl.
 1.5 30-May-1995  mycroft When replacing a whiteout, set i_endoff to 0, so the directory cannot be
shrunk.
 1.4 30-Dec-1994  mycroft Don't look at d_type for old format file systems.
 1.3 14-Dec-1994  mycroft Sync with CSRG.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.20.4.1 02-Aug-1999  thorpej Update from trunk.
 1.26.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.26.4.2 26-Oct-1999  fvdl Merge changes in the trickle-sync and softdep code as done by Kirk McKusick
in FreeBSD since the version that we based the branch on. Merging mostly
done by Ethan Solomita <ethan@geocast.com>.

Also, make sure the syncer thread/process isn't active when we're
unmounting a filesystem. This could wreak havoc. XXX should be done
on a per-mountpoint basis, but especially the softdep code would
end up to be a big pile of vfs_busy() calls.
 1.26.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.26.2.2 12-Mar-2001  bouyer Sync with HEAD.
 1.26.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.33.10.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.33.6.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.33.6.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.33.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.33.2.5 11-Dec-2002  thorpej Sync with HEAD.
 1.33.2.4 18-Oct-2002  nathanw Catch up to -current.
 1.33.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.33.2.2 08-Jan-2002  nathanw Catch up to -current.
 1.33.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.47.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.47.2.9 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.47.2.8 24-Jan-2005  skrll Sync with HEAD.
 1.47.2.7 27-Oct-2004  skrll Remove the struct lwp * arguments from qsync and ufs_checkpath that are
no longer (read: were never) required.
 1.47.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.47.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.47.2.4 25-Aug-2004  skrll Sync with HEAD.
 1.47.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.47.2.2 03-Aug-2004  skrll Sync with HEAD
 1.47.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.60.4.1 29-Apr-2005  kent sync with -current
 1.61.2.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.63.2.7 21-Jan-2008  yamt sync with head
 1.63.2.6 07-Dec-2007  yamt sync with head
 1.63.2.5 27-Oct-2007  yamt sync with head.
 1.63.2.4 03-Sep-2007  yamt sync with head.
 1.63.2.3 26-Feb-2007  yamt sync with head.
 1.63.2.2 30-Dec-2006  yamt sync with head.
 1.63.2.1 21-Jun-2006  yamt sync with head.
 1.68.2.1 20-Oct-2005  yamt adapt ufs.
 1.70.2.1 15-Jan-2006  yamt sync with head.
 1.73.10.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.73.10.1 31-Mar-2006  tron Merge 2006-03-31 NetBSD-current into the "peter-altq" branch.
 1.73.8.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.73.8.2 19-Apr-2006  elad sync with head.
 1.73.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.73.6.4 11-Aug-2006  yamt sync with head
 1.73.6.3 26-Jun-2006  yamt sync with head.
 1.73.6.2 24-May-2006  yamt sync with head.
 1.73.6.1 01-Apr-2006  yamt sync with head.
 1.73.4.3 01-Jun-2006  kardel Sync with head.
 1.73.4.2 22-Apr-2006  simonb Sync with head.
 1.73.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.73.2.1 09-Sep-2006  rpaulo sync with head
 1.76.2.1 19-Jun-2006  chap Sync with head.
 1.77.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.79.6.1 10-Dec-2006  yamt sync with head.
 1.79.4.5 09-Feb-2007  ad Sync with HEAD.
 1.79.4.4 01-Feb-2007  ad Sync with head.
 1.79.4.3 30-Jan-2007  ad Remove support for SA. Ok core@.
 1.79.4.2 12-Jan-2007  ad Sync with head.
 1.79.4.1 18-Nov-2006  ad Sync with head.
 1.80.2.2 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.80.2.1 16-Feb-2007  riz Pull up following revision(s) (requested by chs in ticket #419):
sys/ufs/ufs/ufs_lookup.c: revision 1.85-1.86
in ufs_dirremove swap ep->d_reclen before use if needed (affect UFS_DIRHASH
only).
While there remove an unneeded swap before compare against 0 in ufs_direnter().
Both pointed out by Pawel Jakub Dawidek on tech-kern@, thanks !
Add missing ')'. Noted by Paul Goyette.
 1.87.2.2 17-May-2007  yamt sync with head.
 1.87.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.88.4.1 11-Jul-2007  mjf Sync with head.
 1.88.2.5 09-Oct-2007  ad Sync with head.
 1.88.2.4 20-Aug-2007  ad Sync with HEAD.
 1.88.2.3 08-Jun-2007  ad Sync with head.
 1.88.2.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.88.2.1 05-Apr-2007  ad Compile fixes.
 1.89.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.91.10.2 23-Jul-2007  pooka comment police: DIRBLKSIZE would be too chatty and therefore the
macro is known as DIRBLKSIZ
 1.91.10.1 23-Jul-2007  pooka file ufs_lookup.c was added on branch matt-mips64 on 2007-07-23 14:58:05 +0000
 1.91.8.2 14-Oct-2007  yamt sync with head.
 1.91.8.1 06-Oct-2007  yamt sync with head.
 1.91.6.2 09-Jan-2008  matt sync with HEAD
 1.91.6.1 06-Nov-2007  matt sync with HEAD
 1.91.4.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.91.4.3 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.91.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.91.4.1 02-Oct-2007  joerg Sync with HEAD.
 1.94.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.94.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.95.2.1 26-Dec-2007  ad Sync with head.
 1.96.18.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.96.18.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.96.16.3 11-Aug-2010  yamt sync with head.
 1.96.16.2 11-Mar-2010  yamt sync with head
 1.96.16.1 04-May-2009  yamt sync with head.
 1.96.14.2 17-Jun-2008  yamt sync with head.
 1.96.14.1 18-May-2008  yamt sync with head.
 1.96.12.4 17-Jan-2009  mjf Sync with HEAD.
 1.96.12.3 28-Sep-2008  mjf Sync with HEAD.
 1.96.12.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.96.12.1 02-Jun-2008  mjf Sync with HEAD.
 1.98.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.98.4.1 19-Oct-2008  haad Sync with HEAD.
 1.98.2.6 27-Jun-2008  simonb Add a comma to printf format string missing in previous.
 1.98.2.5 27-Jun-2008  simonb Fix some white-space nits.
 1.98.2.4 27-Jun-2008  simonb Do the UFS_WAPBL_UPDATE() after the call to UFS_TRUNCATE(), not before.

From Greg Oster.
 1.98.2.3 27-Jun-2008  simonb Use "%#x" to print hex values instead of "%x" - this made it really hard
to tell hex from decimal values especially when no observed values had
any a-f digits.
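The difference between the two conversions can be sketched in a few lines of C. This is a minimal illustration using standard snprintf(3), not the WAPBL code itself; the helper names fmt_plain_hex and fmt_alt_hex are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format v with plain "%x" (no prefix). */
static int
fmt_plain_hex(char *buf, size_t len, unsigned int v)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%x", v);
}

/* Hypothetical helper: format v with "%#x" ("0x" prefix for nonzero values). */
static int
fmt_alt_hex(char *buf, size_t len, unsigned int v)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%#x", v);
}
```

For the value 16, the first helper writes 10, which is easily misread as decimal ten, while the second writes 0x10.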
 1.98.2.2 27-Jun-2008  simonb Use local "nameiop" variable instead of cnp->cn_nameiop as the rest of
this function already does.
 1.98.2.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.99.8.1 21-Apr-2010  matt sync to netbsd-5
 1.99.4.2 19-May-2012  riz Apply patch (requested by buhrow in ticket #1759):


sys/ufs/lfs/lfs_vnops.c patch
sys/ufs/ufs/inode.h patch
sys/ufs/ufs/ufs_extern.h patch
sys/ufs/ufs/ufs_lookup.c patch
sys/ufs/ufs/ufs_vnops.c patch
sys/ufs/ufs/ufs_wapbl.c patch

Port dholland's ufs_rename locking changes to netbsd-5.
[buhrow, ticket #1759]

Hello. More testing has revealed a minor misunderstanding between the
vnode API in -current and 5.x. The below patch, against NetBSD-5.1
sources, rolls all the accumulated patches into one patch set. With this
patch, I believe you can now run with WAPBL, softdep or traditional ufs
semantics with heavy file loads and avoid panics due to resource exhaustion
and/or tstile deadlocks. Testing has been done on I386, both uniprocessor
and multiprocessor, and on Sparc machines in uniprocessor mode, though I
think multiprocessor Sparc would be fine as well. Since these changes are
machine independent, I don't anticipate any issues on any platform. It is
my hope that modulo any final issues that come up in the final round of
testing I'm currently performing, these patches will be ready to be pulled
up into the NetBSD-5 branch.
Finally, I'd like to thank mouse@ and hannken@ for their help and
patience in helping me track down and test the final versions of these
patches. With their assistance, I'm confident these patches make NetBSD-5
a much more stable and robust operating environment in a variety of
settings.
 1.99.4.1 14-Feb-2010  bouyer Pull up following revision(s) (requested by dholland in ticket #1300):
sys/ufs/ufs/ufs_lookup.c: revision 1.102
Avoid nasal demons. Code of the form
vput(vp);
error = VFS_VGET(vp->v_mount, ...);
just isn't right. Because of vnode caching this *probably* never bit
anyone, except maybe under very heavy load, but still.
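The hazard described above can be modelled outside the kernel. The sketch below uses toy stand-ins (toy_vnode, toy_mount, toy_vput are hypothetical names, not the real kernel types or functions) and shows one safe ordering: read v_mount while the reference is still held. It is an illustration under those assumptions, not necessarily the change made in revision 1.102.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; not the real definitions. */
struct toy_mount { int id; };
struct toy_vnode {
	struct toy_mount *v_mount;
	int v_usecount;
};

/* Toy vput(): drop a reference; on the last one the vnode is "recycled". */
static void
toy_vput(struct toy_vnode *vp)
{
	assert(vp->v_usecount > 0);
	if (--vp->v_usecount == 0)
		vp->v_mount = NULL;	/* vp's fields are no longer valid */
}

/* Safe ordering: save v_mount before giving up the reference. */
static struct toy_mount *
toy_release_keep_mount(struct toy_vnode *vp)
{
	struct toy_mount *mp = vp->v_mount;

	toy_vput(vp);
	return mp;	/* callers use mp, never vp->v_mount, from here on */
}
```

In this model, dereferencing vp->v_mount after toy_vput() would yield NULL; in a real kernel the vnode may have been recycled for something else entirely.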
 1.99.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.99.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.100.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.103.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.103.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.104.2.2 05-Mar-2011  rmind sync with head
 1.104.2.1 03-Jul-2010  rmind sync with head
 1.111.8.1 12-Aug-2012  martin Pull up following revision(s) (requested by manu in ticket #484):
sys/fs/nilfs/nilfs_vnops.c: revision 1.18
sys/ufs/ufs/ufs_lookup.c: revision 1.117
sys/nfs/nfs_vnops.c: revision 1.295
sys/ufs/chfs/chfs_vnops.c: revision 1.8
sys/ufs/ext2fs/ext2fs_lookup.c: revision 1.70
sys/fs/unionfs/unionfs_vnops.c: revision 1.6
sys/kern/vfs_cache.c: revision 1.89
sys/fs/efs/efs_vnops.c: revision 1.26
sys/fs/hfs/hfs_vnops.c: revision 1.26
sys/fs/adosfs/adlookup.c: revision 1.16
sys/fs/puffs/puffs_vnops.c: revision 1.168
sys/fs/tmpfs/tmpfs_vnops.c: revision 1.98
sys/fs/ntfs/ntfs_vnops.c: revision 1.52
sys/fs/cd9660/cd9660_lookup.c: revision 1.20
sys/fs/msdosfs/msdosfs_lookup.c: revision 1.24
sys/fs/smbfs/smbfs_vnops.c: revision 1.80
sys/fs/udf/udf_vnops.c: revision 1.72
sys/fs/filecorefs/filecore_lookup.c: revision 1.14
sys/fs/puffs/puffs_node.c: revision 1.25
Move some of the tests for MAKEENTRY into cache_enter(9). Make some
variables in vfs_cache.c static, __read_mostly, etc.
No objection on tech-kern@.
 1.111.6.2 02-Jun-2012  mrg sync to latest -current.
 1.111.6.1 05-Apr-2012  mrg sync to latest -current.
 1.111.2.6 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.111.2.5 23-Jan-2013  yamt sync with head
 1.111.2.4 16-Jan-2013  yamt sync with (a bit old) head
 1.111.2.3 30-Oct-2012  yamt sync with head
 1.111.2.2 23-May-2012  yamt sync with head.
 1.111.2.1 17-Apr-2012  yamt sync with head
 1.117.2.5 03-Dec-2017  jdolecek update from HEAD
 1.117.2.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.117.2.3 23-Jun-2013  tls resync from head
 1.117.2.2 25-Feb-2013  tls resync with head
 1.117.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.124.2.1 18-May-2014  rmind sync with head
 1.129.2.1 10-Aug-2014  tls Rebase.
 1.132.4.5 28-Aug-2017  skrll Sync with HEAD
 1.132.4.4 29-May-2016  skrll Sync with HEAD
 1.132.4.3 22-Apr-2016  skrll Sync with HEAD
 1.132.4.2 22-Sep-2015  skrll Sync with HEAD
 1.132.4.1 06-Apr-2015  skrll Sync with HEAD
 1.145.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.145.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.146.6.1 02-Mar-2020  martin Additionally pull up the following revisions, to fix build fallout from
ticket #1511:

src/sys/ufs/ufs/dir.h 1.26
sys/ufs/ufs/ufs_lookup.c 1.149

Zero out all the dirent padding not just one byte, to avoid kernel memory
disclosure (from https://svnweb.freebsd.org/base?view=revision&revision=347066)
 1.148.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.148.4.1 10-Jun-2019  christos Sync with HEAD
 1.150.4.1 19-Jan-2020  ad Set IMNT_SHRLOOKUP and use it for the in-cache case. Need to check what
more can be done with tmpfs though, it can probably do the whole lookup.
 1.158.6.1 02-Aug-2025  perseant Sync with HEAD
 1.118 22-Feb-2023  riastradh ufs: Nix trailing whitespace and tidy up some other minor KNF.
 1.117 28-Jun-2014  dholland Revert the following changes:

src/sys/sys/quotactl.h 1.37
src/sys/compat/netbsd32/netbsd32.h 1.101
src/sys/compat/netbsd32/netbsd32_netbsd.c 1.188, 1.189
src/sys/kern/vfs_quotactl.c 1.39
src/sys/kern/vfs_syscalls.c 1.483
src/sys/ufs/lfs/ulfs_quota.c 1.11
src/sys/ufs/ufs/ufs_quota.c 1.116
src/lib/libquota/quota_kernel.c 1.5

and do them correctly.

If you're going to change the name of something, you need to change
the name of *all* the things with the same name, not just a handful,
and you should change it to something similar so it still matches the
rest of the system rather than just picking an arbitrarily different
name.

Hi, Joerg.

To wit, rename the quotactl "delete" operation to "del", because
"delete" is a reserved word in C++ and for some reason Joerg wants to
run internal interfaces used only by C code through his C++ compiler.
Do not rename it to "remove" instead, because this doesn't match
libquota or the rest of the usage throughout the system; and rename
all the related identifiers, not just the ones that blew the mind of
Joerg's C++ compiler.

Because this is not a user-facing API (the only userland consumer of
sys/quotactl.h is libquota) it is sort of ok to make arbitrary
source-incompatible changes; however, by the same token it's completely
unnecessary. If it *were* a user-facing API that someone might have a
semi-rational reason to want to run a C++ compiler on, it would be
incorrect to change it at this point.
 1.116 12-Jun-2014  joerg Don't use a C++ keyword as field name.
 1.115 16-Nov-2013  dholland branches: 1.115.2;
tidy the QUOTA2 blocks a bit more
 1.114 16-Nov-2013  mrg move variable use and initialisation inside the #ifdef / block that uses it.
 1.113 18-Oct-2013  christos move code inside ifdef
 1.112 09-Sep-2012  manu branches: 1.112.2; 1.112.4;
Temporary fix for quotactl authorization: it must use the effective UID
and not the real UID.

Further work is required to move the check to the kauth listener instead
of having it in UFS code.
 1.111 26-Aug-2012  dholland Move INITQFNAMES to the right header file.
 1.110 29-Jul-2012  dholland Restore accidentally lost initialization of quotatypes[].
Fixes (null) in the kernel message triggered when you go over quota, and
maybe other things. Reported by Matthew Mondor.
 1.109 18-Feb-2012  matt Eliminate a common in a header file (add a missing extern) and
declare it in the appropriate C file.
 1.108 01-Feb-2012  dholland branches: 1.108.2;
Improve the names of some members of struct quotactl_args. These are
effectively function parameter names, but since they need to be
described with the same names in the man page the choices do matter.
Some.
 1.107 01-Feb-2012  dholland Change the syscall API for quotas over to the new non-proplib one.

- struct vfs_quotactl_args -> struct quotactl_args
- add sys/stdint.h to sys/quotactl.h for clean userland build
- install sys/quotactl.h in /usr/include
- update set lists for same
- add new marshalling code in libquota
- add new unmarshalling code in vfs_syscalls.c
- discard proplib interpreter code in vfs_quotactl.c
- add dispatching code for the 14 quotactl ops in vfs_quotactl.c
- mark the proplib quotactl syscall obsolete
- add a new syscall number for the new quotactl syscall
- change the name of the syscall to __quotactl()
- remove the decl of the old quotactl from quota/quotaprop.h
- add a decl of the new quotactl to sys/quotactl.h
- update the libc build
- update ktruss
- remove proplib marshalling code from libquota
- update copy of syscall table in gdb ppc sources
- hack rumphijack to accommodate new quotactl name (as I recall,
pooka wanted such a name change to simplify something, but I
don't really see what/how)

This change appears to require a kernel version bump for rumpish
reasons.
 1.106 01-Feb-2012  dholland Add QUOTACTL_IDTYPESTAT and QUOTACTL_OBJTYPESTAT for retrieving info
about idtypes and objtypes. This avoids compiling in the names of
the id and object types.

I overlooked this last week because the proplib syscall interface has
no way to convey this information.

Renumber the operation codes again (since we still can) to insert
the new operations into the list in a semantically sensible place.

Requires kernel version bump.
 1.105 29-Jan-2012  para sprinkle some #ifdef QUOTA2 to avoid unused variables
 1.104 29-Jan-2012  dholland Remove references to <quota/quotaprop.h> in src/sys/ufs.
The remaining references in the kernel are in vfs_quotactl.c, the
compat_50 code for the old quotactl (to be fixed up), and the
code compiled from src/common/lib/libquota.
 1.103 29-Jan-2012  dholland Remove the extra op argument to VFS_QUOTACTL() - the op is now stored
purely in the args structure.

This change requires a kernel version bump.
 1.102 29-Jan-2012  dholland Tidy up the VFS_QUOTACTL interface. Renumber the command codes in a
logical order (as opposed to the previous order, which accumulated
arbitrarily), remove the separate codes for argument encoding as
there's now a 1-1 mapping between ops and argument substructures,
and assert in VFS_QUOTACTL() itself that the op in the args structure
matches the op passed directly.

This change requires a kernel version bump.
 1.101 29-Jan-2012  dholland Change QUOTACTL_GETVERSION to QUOTACTL_STAT. Add struct quotastat.

This change requires a kernel version bump.
 1.100 29-Jan-2012  dholland Move proplib bits for QUOTACTL_QUOTAOFF out of the ufs code.

This change requires a kernel version bump.
 1.99 29-Jan-2012  dholland Move the proplib bits for QUOTACTL_QUOTAON out of the ufs code.

This change requires a kernel version bump.
 1.98 29-Jan-2012  dholland Add QUOTACTL_CURSORSKIPIDTYPE, QUOTACTL_CURSORATEND, QUOTACTL_CURSORREWIND.

This change requires a kernel version bump.
 1.97 29-Jan-2012  dholland Don't pass the idtype to QUOTACTL_GETALL. Instead, iterate both users
and groups.

This change requires a kernel version bump.
 1.96 29-Jan-2012  dholland Call QUOTACTL_GETALL in a loop to get results 8 at a time. Make
the QUOTACTL_GETALL interface less abusive.

Note: this change requires a kernel version bump.
 1.95 29-Jan-2012  dholland Hack QUOTACTL_GETALL to return results without using proplib.

(this interface is abusive and is going to be cleaned up in the
immediate future)

Note: this change requires a kernel version bump.
 1.94 29-Jan-2012  dholland Pass the cursor to QUOTACTL_GETALL. Don't pass unused proplib items.

Note: this change requires a kernel version bump.
 1.93 29-Jan-2012  dholland Begin adding quota cursor/iteration interface to VFS_QUOTACTL.

Add struct quotakcursor.
Add QUOTACTL_CURSOROPEN and QUOTACTL_CURSORCLOSE operations.
Implement the plumbing for them.
Add trivial implementations of them for quota2.
(iteration is not supported on quota1 for the time being, just as
getall isn't)
Have the proplib interpreter open and close a cursor around doing
QUOTACTL_GETALL.

Note: this change requires a kernel version bump.
 1.92 29-Jan-2012  dholland Package up the args of QUOTACTL_DELETE as a struct quotakey.
 1.91 29-Jan-2012  dholland QUOTACTL_CLEAR -> QUOTACTL_DELETE to match intended API and user API.
 1.90 29-Jan-2012  dholland Improve the quota2 QUOTACTL_CLEAR code to allow clearing blocks and
files independently.

Note: this change requires a kernel version bump.
 1.89 29-Jan-2012  dholland The handling of QUOTACTL_CLEAR does not use the proplib data
dictionary, so don't pass it.

Note: this change requires a kernel version bump.
 1.88 29-Jan-2012  dholland Move toplevel proplib iteration of QUOTACTL_CLEAR to fs-independent code.

Note: this change requires a kernel version bump.
 1.87 29-Jan-2012  dholland Tidy up a bit.
 1.86 29-Jan-2012  dholland Whitespace.
 1.85 29-Jan-2012  dholland Rename QUOTACTL_SET to QUOTACTL_PUT, to match future intended API.
 1.84 29-Jan-2012  dholland Combine the miscellaneous QUOTACTL_SET args into a struct quotakey.

Note: this change requires a kernel version bump.
 1.83 29-Jan-2012  dholland Pass only one objtype and its quotaval to QUOTACTL_SET at one time.

(The backend code to handle this is a lot tidier than I expected given
that the proplib code doesn't allow setting blocks and files
independently; I was afraid there would turn out to be a reason for
that...)

Note: this change requires a kernel version bump.
 1.82 29-Jan-2012  dholland For QUOTACTL_SET in quota2, use the quotaval data instead of proplib.
 1.81 29-Jan-2012  dholland For QUOTACTL_SET in quota1, use the quotaval data instead of proplib.
 1.80 29-Jan-2012  dholland Move the top level iteration for QUOTACTL_SET from ufs to vfs_quotactl.

Note: this change requires a kernel version bump.
 1.79 29-Jan-2012  dholland Whitespace.
 1.78 29-Jan-2012  dholland Use struct quotakey with QUOTACTL_GET. Tidy up accordingly.

Step 5 of 5 for QUOTACTL_GET.

Note: this change requires a kernel version bump.
 1.77 29-Jan-2012  dholland Per the FS-independent schema, get one quotaval at a time from the
filesystem, instead of blocks and files together.

This results in fetching each FS-level quota entry twice and therefore
doing slightly more work, but (1) quota access isn't a critical path
and (2) after fetching the block values the file values will be hot in
the cache, so it won't add much total time.

Also move more of the FS-independent definitions from <quota.h> to
<sys/quota.h> so we can use them internally.

Step 4 of 5 for QUOTACTL_GET.

Note: this change requires a kernel version bump.
 1.76 29-Jan-2012  dholland Move what was second-layer proplib frobbing for QUOTACTL_GET to
FS-independent code. (Step 3 of probably 5 for QUOTACTL_GET.)

Note: this change requires a kernel version bump.
 1.75 29-Jan-2012  dholland Move second-layer proplib frobbing within ufs quota code up to the
first layer. (Step 2 of several for QUOTACTL_GET.)
 1.74 29-Jan-2012  dholland Move first-layer proplib frobbing for QUOTACTL_GET to FS-independent code.
(step 1 of several)

Note: this change requires a kernel version bump.
 1.73 29-Jan-2012  dholland Move proplib frobbing for QUOTACTL_GETVERSION to FS-independent code.

Note: this change requires a kernel version bump.
 1.72 29-Jan-2012  dholland Introduce struct vfs_quotactl_args. Use it.

This change uglifies vfs_quotactl some in order to make room for
moving operation-specific but FS-independent logic out of ufs_quota.c.

Note: this change requires a kernel version bump.
 1.71 29-Jan-2012  dholland Move the proplib-based quota command dispatching (that is, the code
that knows the magic string names for the allowed actions) out of
UFS-specific code and to fs-independent code.

This introduces QUOTACTL_* operation codes and changes the signature
of VFS_QUOTACTL() again for compile safety.

Note: this change requires a kernel version bump.
 1.70 24-Mar-2011  bouyer branches: 1.70.4; 1.70.8;
Add a new libquota library, which contains some blocks to build and/or
parse quota plists; as well as a getfsquota() function to retrieve quotas
for a single id from a single filesystem (whatever filesystem this is:
a local quota-enabled fs or NFS). This is built on functions getufsquota()
(for local filesystems with UFS-like quotas) and getnfsquota();
which are also available to userland programs.
move functions from quota2_subr.c to libquota or libprop as appropriate,
and adjust in-tree quota tools.
move some declarations from kernel headers to either sys/quota.h or
quota/quota.h as appropriate. ufs/ufs/quota.h still installed because
it's needed by other installed ufs headers.
ufs/ufs/quota1.h still installed as a quick&dirty way to get a code
using the old quotactl() to compile (just include ufs/ufs/quota1.h instead of
ufs/ufs/quota.h - old code won't compile without this change and this is
on purpose).
Discussed on tech-kern@ and tech-net@ (long thread, but not much about
libquota itself ...)
 1.69 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.68 19-Nov-2010  dholland branches: 1.68.2; 1.68.4;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.
 1.67 21-Jul-2010  hannken Make holding v_interlock mandatory for callers of vget().

Announced some time ago on tech-kern.
 1.66 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.65 15-Jan-2010  bouyer branches: 1.65.2; 1.65.4;
vclean() actually sets v_tag to VT_NON but doesn't touch v_type.
getcleanvnode() sets v_type to VNON after releasing v_interlock.
So the thread doing quotaon(), quotaoff() or qsync() could vget()
a vnode which is being recycled in getcleanvnode(), after it has
been cleaned and v_interlock released, but before v_type has been
reset, leading to KASSERT(vp->v_usecount == 1) firing in
getnewvnode(), or qsync() dereferencing a NULL pointer as in
PR kern/42205.
Fix by using the same tests as other ffs functions traversing the mount
list: also check for VTOI(vp) == NULL, and VI_XLOCK in addition
to VI_CLEAN.
 1.64 02-Aug-2009  bouyer Fix previous: mutex_destroy() the right mutex
 1.63 01-Aug-2009  bouyer Add missing mutex_destroy() before pool_cache_put(). Prevents a
"Mutex error: lockdebug_alloc: already initialized" panic.
 1.62 07-May-2009  elad Introduce several actions/requests for authorizing file-system related
operations, specifically quota and block allocation from reserved space.

Modify ufs_quotactl() to accommodate passing "mp" earlier by vfs_busy()ing
it a little bit higher.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/04/26/msg004936.html

Note that the umapfs request mentioned in this thread was NOT added as
there is still on-going discussion regarding the proper implementation.
 1.61 21-Dec-2008  ad branches: 1.61.2;
Print a warning message and return EOPNOTSUPP if the user tries to enable
quotas on a file system that is using logging.
 1.60 05-May-2008  ad branches: 1.60.8; 1.60.10;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.59 21-Mar-2008  ad branches: 1.59.2; 1.59.4;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.58 25-Jan-2008  ad branches: 1.58.6;
Ignore clean vnodes in a couple more places.
 1.57 24-Jan-2008  ad quotaoff: ignore clean vnodes. Fixes PR kern/37818.
 1.56 15-Jan-2008  christos fix locking botch; vunmark() needs the mountvnode lock, and the loop invariant
is to exit with the mountvnode lock held.
 1.55 03-Jan-2008  ad Use pool_cache.
 1.54 03-Jan-2008  pooka valloc -> vnalloc, vfree -> vnfree
Avoids collision with userland valloc(3).

no functional change
ad ok
 1.53 02-Jan-2008  ad Merge vmlocking2 to head.
 1.52 08-Dec-2007  pooka branches: 1.52.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.51 10-Oct-2007  ad branches: 1.51.4; 1.51.6;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.50 31-Jul-2007  hannken branches: 1.50.2; 1.50.4; 1.50.6; 1.50.8;
- Replace the freelist with a pool. Current code never freed its dquot's.
- Always call dqsync() with dq locked.
- Add some assertions to verify the lock held.
- Serialize quotaon()/quotaoff(), dqhashmtx becomes dqlock. From ad@

Reviewed by: Andrew Doran <ad@netbsd.org>
 1.49 19-Jul-2007  hannken Update and add locking to ufs_quota:

- Replace DQ_LOCK/DQ_WANT/sleep/wakeup with a mutex `dq_interlock'. Use this
mutex to protect all quota values and flags.
- Protect the hashtable with a mutex.
- Never update quotas for the quota files on the same file system. Prevents
a deadlock when dqsync() has to change the quota file's size (PR #13942).

Reviewed by: Andrew Doran <ad@netbsd.org>
Bill Stouder-Studenmund <wrstuden@netbsd.org>
 1.48 10-Jul-2007  hannken branches: 1.48.2;
Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criteria, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.47 30-Jun-2007  pooka Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.46 23-Jun-2007  hannken If a quota-enabled file system has 65536 active vnodes for one uid
the reference counter of the corresponding struct dquot will overflow.

Change the type of the reference counter from u_int16_t to u_int32_t and
add an assertion to check for overflow.

Observed and tested by Edgar Fuß.

Welcome to 4.99.21 (struct dquot and therefore struct inode changed layout)
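The overflow described above can be reproduced with a small standalone sketch. It uses the standard uint16_t and uint32_t types in place of the kernel's u_int16_t and u_int32_t, and the function names toy_ref16 and toy_ref32 are hypothetical; this is an illustration of the arithmetic, not the struct dquot code.

```c
#include <assert.h>
#include <stdint.h>

/* Take n references on a 16-bit counter; it wraps modulo 65536. */
static uint16_t
toy_ref16(uint32_t n)
{
	uint16_t cnt = 0;

	while (n-- > 0)
		cnt++;
	return cnt;
}

/* Same with a 32-bit counter, asserting before each increment. */
static uint32_t
toy_ref32(uint32_t n)
{
	uint32_t cnt = 0;

	while (n-- > 0) {
		assert(cnt < UINT32_MAX);	/* catch overflow instead of wrapping */
		cnt++;
	}
	return cnt;
}
```

With 65536 references the 16-bit counter is back at zero, so the structure looks unreferenced while it is still in use; the 32-bit counter holds the correct value.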
 1.45 07-Apr-2007  hannken Remove calls to now obsolete vn_start_write() and vn_finished_write().
 1.44 04-Mar-2007  christos branches: 1.44.2; 1.44.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.43 04-Jan-2007  elad branches: 1.43.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.42 20-Oct-2006  reinoud branches: 1.42.2; 1.42.4;
Replace the LIST structure mp->mnt_vnodelist with a TAILQ structure since all
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earliest last.

An extra benefit is the removal of the ugly hack from the Berkeley days on
LFS.

In the process, I've also replaced the various hand-written loop variations
with the TAILQ_FOREACH() macro.
 1.41 23-Jul-2006  ad branches: 1.41.4; 1.41.6;
Use the LWP cached credentials where sane.
 1.40 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.39 14-May-2006  elad branches: 1.39.2;
integrate kauth.
 1.38 01-Mar-2006  yamt branches: 1.38.2; 1.38.4; 1.38.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.37 27-Dec-2005  chs branches: 1.37.2; 1.37.4; 1.37.6;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.
 1.36 11-Dec-2005  christos merge ktrace-lwp.
 1.35 10-Jul-2005  thorpej branches: 1.35.6;
- Use ANSI function decls.
- Sprinkle some static.
 1.34 29-May-2005  christos branches: 1.34.2;
- sprinkle const
- avoid shadow variables.
 1.33 26-Feb-2005  perry branches: 1.33.2; 1.33.4; 1.33.6;
nuke trailing whitespace
 1.32 17-Sep-2004  skrll branches: 1.32.4; 1.32.6;
There's no need to pass a proc value when using UIO_SYSSPACE with
vn_rdwr(9) and uiomove(9).

OK'd by Jason Thorpe
 1.31 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.30 05-Nov-2003  hannken branches: 1.30.4;
Clean up the usage of vn_start_write(). At least one occurrence clobbered
previous error conditions.
If "(flags & (V_WAIT|V_PCATCH)) == V_WAIT" the return value is always zero.
Ignore the return value in these cases.

From Darrin B. Jewell.
 1.29 15-Oct-2003  hannken Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.28 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.27 29-Jun-2003  fvdl branches: 1.27.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.26 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.25 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.24 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.23 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.22 08-Nov-2001  lukem add RCSID
 1.21 15-Sep-2001  chs branches: 1.21.2;
add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.20 08-Nov-2000  ad branches: 1.20.2; 1.20.6; 1.20.8;
Update for hashinit() change.
 1.19 05-Jul-2000  jdolecek kern.maxvnodes (and hence desireddquot) depends more directly on NVNODE than
on NPROC, adjust the hint to tablefull() accordingly
 1.18 04-Jul-2000  mjacob Add missing second argument to tablefull call. I *think* the added
message makes sense- somebody might want to check it.
 1.17 27-May-2000  thorpej branches: 1.17.4;
sleep() -> tsleep()
 1.16 20-May-2000  thorpej In chkdq(), check for NOCRED. Should fix port-alpha/10147.
 1.15 30-Mar-2000  augustss Remove register declarations.
 1.14 16-Mar-2000  jdolecek Change ufs_init() to keep global count of how many times it was called.
Resources are initialized still just once (on first call).

Add ufs_done(), which takes care of freeing all resources allocated in
ufs_init(). The resources are freed only when last user of the code exits.
 1.13 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.12 09-Aug-1998  perry branches: 1.12.12; 1.12.14; 1.12.18;
bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.11 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.10 07-Feb-1998  chs add flags arg to hashinit(), to pass to malloc().
 1.9 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.8 09-Feb-1996  christos ufs prototype changes
 1.7 08-Mar-1995  cgd cast pointer to long, not int
 1.6 14-Dec-1994  mycroft Remove extra arg to vn_open().
 1.5 13-Dec-1994  mycroft Sync with CSRG.
 1.4 14-Nov-1994  christos added extra argument to vn_open
 1.3 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.12.18.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.12.14.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.12.12.2 22-Nov-2000  bouyer Sync with HEAD.
 1.12.12.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.17.4.1 05-Jul-2000  jdolecek pullup 1.18, 1.19 from trunk (requested by mjacob, approved by thorpej):
adapt to tablefull() change, hopefully the hint is helpful
 1.20.8.1 01-Oct-2001  fvdl Catch up with -current.
 1.20.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.20.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.20.2.1 21-Sep-2001  nathanw Catch up to -current.
 1.21.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.27.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.27.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.27.2.7 27-Oct-2004  skrll Remove the struct lwp * arguments from qsync and ufs_checkpath that are
no longer (read: were never) required.
 1.27.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.27.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.27.2.4 25-Aug-2004  skrll Sync with HEAD.
 1.27.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.27.2.2 03-Aug-2004  skrll Sync with HEAD
 1.27.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.30.4.1 11-Aug-2007  bouyer Pull up following revision(s) (requested by hannken in ticket #11334):
sys/ufs/ufs/ufs_quota.c: revision 1.46
sys/ufs/ufs/quota.h: revision 1.24
If a quota-enabled file system has 65536 active vnodes for one uid
the reference counter of the corresponding struct dquot will overflow.
Change the type of the reference counter from u_int16_t to u_int32_t and
add an assertion to check for overflow.
Observed and tested by Edgar Fuß.
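
The overflow described above can be reproduced in isolation. The sketch below is not the kernel code; bump16 and bump32 are hypothetical helpers that count n references in a 16-bit and a 32-bit counter, with the wider one guarded by an assertion in the spirit of the fix.

```c
#include <assert.h>
#include <stdint.h>

/* Count n references in a 16-bit counter, as the old field type would. */
uint16_t
bump16(uint32_t n)
{
	uint16_t cnt = 0;

	for (uint32_t i = 0; i < n; i++)
		cnt++;                      /* wraps modulo 65536 */
	return cnt;
}

/* Count n references in a 32-bit counter, asserting before each increment. */
uint32_t
bump32(uint32_t n)
{
	uint32_t cnt = 0;

	for (uint32_t i = 0; i < n; i++) {
		assert(cnt < UINT32_MAX);   /* would catch an overflow */
		cnt++;
	}
	return cnt;
}
```

With 65536 references the 16-bit counter reads 0 again, so the structure looks unreferenced, while the 32-bit counter still holds 65536.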
 1.32.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.32.4.1 29-Apr-2005  kent sync with -current
 1.33.6.1 28-Jun-2007  ghen Pull up following revision(s) (requested by hannken in ticket #1807):
sys/ufs/ufs/ufs_quota.c: revision 1.46
sys/ufs/ufs/quota.h: revision 1.24
sys/sys/param.h: patch
If a quota-enabled file system has 65536 active vnodes for one uid
the reference counter of the corresponding struct dquot will overflow.
Change the type of the reference counter from u_int16_t to u_int32_t and
add an assertion to check for overflow.
Bump kernel version as LKM's depending on UFS internals will have to be
recompiled after this change (discussed and approved on tech-kern).
 1.33.4.1 28-Jun-2007  ghen Pull up following revision(s) (requested by hannken in ticket #1807):
sys/ufs/ufs/ufs_quota.c: revision 1.46
sys/ufs/ufs/quota.h: revision 1.24
sys/sys/param.h: patch
If a quota-enabled file system has 65536 active vnodes for one uid
the reference counter of the corresponding struct dquot will overflow.
Change the type of the reference counter from u_int16_t to u_int32_t and
add an assertion to check for overflow.
Bump kernel version as LKM's depending on UFS internals will have to be
recompiled after this change (discussed and approved on tech-kern).
 1.33.2.1 28-Jun-2007  ghen Pull up following revision(s) (requested by hannken in ticket #1807):
sys/ufs/ufs/ufs_quota.c: revision 1.46
sys/ufs/ufs/quota.h: revision 1.24
sys/sys/param.h: patch
If a quota-enabled file system has 65536 active vnodes for one uid
the reference counter of the corresponding struct dquot will overflow.
Change the type of the reference counter from u_int16_t to u_int32_t and
add an assertion to check for overflow.
Bump kernel version as LKM's depending on UFS internals will have to be
recompiled after this change (discussed and approved on tech-kern).
 1.34.2.8 24-Mar-2008  yamt sync with head.
 1.34.2.7 04-Feb-2008  yamt sync with head.
 1.34.2.6 21-Jan-2008  yamt sync with head
 1.34.2.5 27-Oct-2007  yamt sync with head.
 1.34.2.4 03-Sep-2007  yamt sync with head.
 1.34.2.3 26-Feb-2007  yamt sync with head.
 1.34.2.2 30-Dec-2006  yamt sync with head.
 1.34.2.1 21-Jun-2006  yamt sync with head.
 1.35.6.2 18-Nov-2005  yamt - associate read-ahead context to vnode, rather than file.
- revert VOP_READ prototype.
 1.35.6.1 15-Nov-2005  yamt adapt ffs, lfs, nfs.
 1.37.6.3 01-Jun-2006  kardel Sync with head.
 1.37.6.2 22-Apr-2006  simonb Sync with head.
 1.37.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.37.4.1 09-Sep-2006  rpaulo sync with head
 1.37.2.1 31-Dec-2005  yamt adapt some random parts of kernel to uio_vmspace.
 1.38.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.38.4.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.38.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.38.2.3 11-Aug-2006  yamt sync with head
 1.38.2.2 26-Jun-2006  yamt sync with head.
 1.38.2.1 24-May-2006  yamt sync with head.
 1.39.2.1 19-Jun-2006  chap Sync with head.
 1.41.6.1 22-Oct-2006  yamt sync with head
 1.41.4.2 12-Jan-2007  ad Sync with head.
 1.41.4.1 18-Nov-2006  ad Sync with head.
 1.42.4.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.42.2.1 28-Jun-2007  ghen Pull up following revision(s) (requested by hannken in ticket #747):
sys/ufs/ufs/ufs_quota.c: revision 1.46
sys/ufs/ufs/quota.h: revision 1.24
sys/sys/param.h: patch
If a quota-enabled file system has 65536 active vnodes for one uid
the reference counter of the corresponding struct dquot will overflow.
Change the type of the reference counter from u_int16_t to u_int32_t and
add an assertion to check for overflow.
Bump kernel version as LKM's depending on UFS internals will have to be
recompiled after this change (discussed and approved on tech-kern).
 1.43.2.2 15-Apr-2007  yamt sync with head.
 1.43.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.44.4.1 11-Jul-2007  mjf Sync with head.
 1.44.2.7 28-Oct-2007  ad Fix up mnt_vnodelist handling.
 1.44.2.6 25-Oct-2007  ad Fix up mnt_vnodelist handling.
 1.44.2.5 20-Aug-2007  ad Sync with HEAD.
 1.44.2.4 15-Jul-2007  ad Sync with head.
 1.44.2.3 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.44.2.2 10-Apr-2007  ad Sync with head.
 1.44.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.48.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.50.8.2 31-Jul-2007  hannken - Replace the freelist with a pool. Current code never freed its dquot's.
- Always call dqsync() with dq locked.
- Add some assertions to verify the lock held.
- Serialize quotaon()/quotaoff(), dqhashmtx becomes dqlock. From ad@

Reviewed by: Andrew Doran <ad@netbsd.org>
 1.50.8.1 31-Jul-2007  hannken file ufs_quota.c was added on branch matt-mips64 on 2007-07-31 09:29:53 +0000
 1.50.6.1 14-Oct-2007  yamt sync with head.
 1.50.4.3 23-Mar-2008  matt sync with HEAD
 1.50.4.2 09-Jan-2008  matt sync with HEAD
 1.50.4.1 06-Nov-2007  matt sync with HEAD
 1.50.2.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.50.2.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.51.6.2 26-Dec-2007  ad Sync with head.
 1.51.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.51.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.52.4.3 19-Jan-2008  bouyer Sync with HEAD
 1.52.4.2 08-Jan-2008  bouyer Sync with HEAD
 1.52.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.58.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.58.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.58.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.59.4.6 11-Aug-2010  yamt sync with head.
 1.59.4.5 11-Mar-2010  yamt sync with head
 1.59.4.4 19-Aug-2009  yamt sync with head.
 1.59.4.3 16-May-2009  yamt sync with head
 1.59.4.2 04-May-2009  yamt sync with head.
 1.59.4.1 16-May-2008  yamt sync with head.
 1.59.2.1 18-May-2008  yamt sync with head.
 1.60.10.4 27-Jan-2010  sborrill Pull up the following revision(s) (requested by bouyer in ticket #1252):
sys/ufs/ufs/ufs_quota.c: revision 1.65

vclean() actually sets v_tag to VT_NON but doesn't touch v_type.
getcleanvnode() sets v_type to VNON after releasing v_interlock.
So the thread doing quotaon(), quotaoff() or qsync() could vget()
a vnode which is being recycled in getcleanvnode(), after it has
been cleaned and v_interlock released, but before v_type has been
reset, leading to KASSERT(vp->v_usecount == 1) firing in
getnewvnode(), or qsync() dereferencing a NULL pointer as in
PR kern/42205.
Fix by using the same tests as other ffs functions traversing the mount
list: also check for VTOI(vp) == NULL, and VI_XLOCK in addition
to VI_CLEAN.
 1.60.10.3 07-Aug-2009  snj Pull up following revision(s) (requested by bouyer in ticket #898):
sys/ufs/ufs/ufs_quota.c: revision 1.64
Fix previous: mutex_destroy() the right mutex
 1.60.10.2 07-Aug-2009  snj Pull up following revision(s) (requested by bouyer in ticket #898):
sys/ufs/ufs/ufs_quota.c: revision 1.63
Add missing mutex_destroy() before pool_cache_put(). Prevents a
"Mutex error: lockdebug_alloc: already initialized" panic.
 1.60.10.1 02-Feb-2009  snj branches: 1.60.10.1.2; 1.60.10.1.4;
Pull up following revision(s) (requested by ad in ticket #351):
sys/ufs/ufs/ufs_quota.c: revision 1.61
Print a warning message and return EOPNOTSUPP if the user tries to enable
quotas on a file system that is using logging.
 1.60.10.1.4.1 21-Apr-2010  matt sync to netbsd-5
 1.60.10.1.2.2 07-Aug-2009  snj Pull up following revision(s) (requested by bouyer in ticket #898):
sys/ufs/ufs/ufs_quota.c: revision 1.64
Fix previous: mutex_destroy() the right mutex
 1.60.10.1.2.1 07-Aug-2009  snj Pull up following revision(s) (requested by bouyer in ticket #898):
sys/ufs/ufs/ufs_quota.c: revision 1.63
Add missing mutex_destroy() before pool_cache_put(). Prevents a
"Mutex error: lockdebug_alloc: already initialized" panic.
 1.60.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.61.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.65.4.4 21-Apr-2011  rmind sync with head
 1.65.4.3 05-Mar-2011  rmind sync with head
 1.65.4.2 03-Jul-2010  rmind sync with head
 1.65.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.65.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.68.4.14 13-Feb-2011  bouyer Fix memory leak
 1.68.4.13 12-Feb-2011  bouyer Don't count snapshot files in inode quota too.
At umount time, chk?q may be called after quotas have been shut down,
as there is a final vflush pass after quota?_umount(); so skip quota
checks if the quota vnode is not there any more.
 1.68.4.12 12-Feb-2011  bouyer Do not update disk quotas for snapshot inodes, as this may require a
write to the same filesystem, which will trigger a copy on write,
which will trigger another update to the same block.
Set SF_SNAPSHOT just after truncating the snapshot inode, so that this
inode always accounts for 0 blocks in quotas.
 1.68.4.11 09-Feb-2011  bouyer Reimplement quotactl commands for quota1
 1.68.4.10 09-Feb-2011  bouyer Various build fixes
 1.68.4.9 08-Feb-2011  bouyer Minimal hacking to make 'options QUOTA' compile again.
 1.68.4.8 07-Feb-2011  bouyer Implement clear command (quota2 only), which either frees the specified
quota2 entry (if both disk and inode usage are 0) or reverts its limits to
the default quota entry.
 1.68.4.7 31-Jan-2011  bouyer On command with multiple data, make sure to reset 'defaultq' to 0.
 1.68.4.6 31-Jan-2011  bouyer Catch up with Q2V -> QL renaming
Enforce limits for quota2.
pass quota type (*QUOTA) and limit type (QL_*) to
KAUTH_REQ_SYSTEM_FS_QUOTA_NOLIMIT, to make it possible to skip
limit checks for some quota type only if a listener wants to.
 1.68.4.5 30-Jan-2011  bouyer Implement "get version" quotactl command, which returns the filesystem's
enabled quota version (1 for legacy, 2 for new).
For quota2, make quota and repquota print the user's allowed grace period
if -v is given and not overquota (if overquota, the remaining time is
printed instead, as usual).
 1.68.4.4 30-Jan-2011  bouyer Implement 'set' command for quota2.
 1.68.4.3 29-Jan-2011  bouyer Describe how the on-disk structures are protected from concurrent access,
and try to implement it.
 1.68.4.2 21-Jan-2011  bouyer Add support for quotactl("getall") command, and convert repquota to new
world.
 1.68.4.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more, on a unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota datas, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.
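
The layout described in this paragraph can be sketched as a header with hash buckets plus entries chained by file offset. Every name and field below (demo_header, demo_entry, DEMO_HASH_SIZE, demo_lookup) is a hypothetical simplification and does not reproduce the real on-disk structures; offset 0 is assumed to mean "end of chain".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HASH_SIZE 4            /* the real size depends on the block size */

struct demo_header {
	uint64_t free_off;              /* head of the free-entry list, 0 = empty */
	uint64_t hash[DEMO_HASH_SIZE];  /* heads of in-use chains, 0 = empty */
};

struct demo_entry {
	uint64_t next_off;              /* offset of the next entry, 0 = end */
	uint32_t id;                    /* uid or gid */
	uint32_t pad;
};

/* Return the file offset of the entry for id, or 0 if there is none. */
uint64_t
demo_lookup(const unsigned char *file, uint32_t id)
{
	struct demo_header h;
	struct demo_entry e;
	uint64_t off;

	memcpy(&h, file, sizeof(h));
	for (off = h.hash[id % DEMO_HASH_SIZE]; off != 0; off = e.next_off) {
		memcpy(&e, file + off, sizeof(e));
		if (e.id == id)
			return off;
	}
	return 0;
}
```

A lookup hashes the id to a bucket in the header and then follows next_off from entry to entry, which matches the "simple list" linkage the message describes.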

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) supports are completed (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.68.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.70.8.1 18-Feb-2012  mrg merge to -current.
 1.70.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.70.4.2 30-Oct-2012  yamt sync with head
 1.70.4.1 17-Apr-2012  yamt sync with head
 1.108.2.2 13-Sep-2012  riz Pull up following revision(s) (requested by manu in ticket #551):
sys/ufs/ufs/ufs_quota.c: revision 1.112
Temporary fix for quotactl authorization: it must use the effective UID
and not the real UID.
Further work is required to move the check to the kauth listener instead
of having it in UFS code.
 1.108.2.1 30-Jul-2012  martin branches: 1.108.2.1.2;
Pull up following revision(s) (requested by dholland in ticket #450):
sys/ufs/ufs/ufs_quota.c: revision 1.110
sys/ufs/ufs/ufs_quota.h: revision 1.21
sys/ufs/ufs/ufs_quota.c: revision 1.109
Eliminate a common in a header file (add a missing extern) and
declare it in the appropriate C file.
Restore accidentally lost initialization of quotatypes[].
Fixes (null) in the kernel message triggered when you go over quota, and
maybe other things. Reported by Matthew Mondor.
 1.108.2.1.2.1 01-Nov-2012  matt sync with netbsd-6-0-RELEASE.
 1.112.4.1 18-May-2014  rmind sync with head
 1.112.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.115.2.1 10-Aug-2014  tls Rebase.
 1.22 28-Jun-2014  dholland Revert the following changes:

src/sys/sys/quotactl.h 1.37
src/sys/compat/netbsd32/netbsd32.h 1.101
src/sys/compat/netbsd32/netbsd32_netbsd.c 1.188, 1.189
src/sys/kern/vfs_quotactl.c 1.39
src/sys/kern/vfs_syscalls.c 1.483
src/sys/ufs/lfs/ulfs_quota.c 1.11
src/sys/ufs/ufs/ufs_quota.c 1.116
src/lib/libquota/quota_kernel.c 1.5

and do them correctly.

If you're going to change the name of something, you need to change
the name of *all* the things with the same name, not just a handful,
and you should change it to something similar so it still matches the
rest of the system rather than just picking an arbitrarily different
name.

Hi, Joerg.

To wit, rename the quotactl "delete" operation to "del", because
"delete" is a reserved word in C++ and for some reason Joerg wants to
run internal interfaces used only by C code through his C++ compiler.
Do not rename it to "remove" instead, because this doesn't match
libquota or the rest of the usage throughout the system; and rename
all the related identifiers, not just the ones that blew the mind of
Joerg's C++ compiler.

Because this is not a user-facing API (the only userland consumer of
sys/quotactl.h is libquota) it is sort of ok to make arbitrary
source-incompatible changes; however, by the same token it's completely
unnecessary. If it *were* a user-facing API that someone might have a
semi-rational reason to want to run a C++ compiler on, it would be
incorrect to change it at this point.
 1.21 18-Feb-2012  matt branches: 1.21.2; 1.21.12;
Eliminate a common in a header file (add a missing extern) and
declare it in the appropriate C file.
 1.20 29-Jan-2012  dholland branches: 1.20.2;
Add QUOTACTL_CURSORSKIPIDTYPE, QUOTACTL_CURSORATEND, QUOTACTL_CURSORREWIND.

This change requires a kernel version bump.
 1.19 29-Jan-2012  dholland Don't pass the idtype to QUOTACTL_GETALL. Instead, iterate both users
and groups.

This change requires a kernel version bump.
 1.18 29-Jan-2012  dholland Call QUOTACTL_GETALL in a loop to get results 8 at a time. Make
the QUOTACTL_GETALL interface less abusive.

Note: this change requires a kernel version bump.
 1.17 29-Jan-2012  dholland Hack QUOTACTL_GETALL to return results without using proplib.

(this interface is abusive and is going to be cleaned up in the
immediate future)

Note: this change requires a kernel version bump.
 1.16 29-Jan-2012  dholland Pass the cursor to QUOTACTL_GETALL. Don't pass unused proplib items.

Note: this change requires a kernel version bump.
 1.15 29-Jan-2012  dholland Begin adding quota cursor/iteration interface to VFS_QUOTACTL.

Add struct quotakcursor.
Add QUOTACTL_CURSOROPEN and QUOTACTL_CURSORCLOSE operations.
Implement the plumbing for them.
Add trivial implementations of them for quota2.
(iteration is not supported on quota1 for the time being, just as
getall isn't)
Have the proplib interpreter open and close a cursor around doing
QUOTACTL_GETALL.

Note: this change requires a kernel version bump.
 1.14 29-Jan-2012  dholland Package up the args of QUOTACTL_DELETE as a struct quotakey.
 1.13 29-Jan-2012  dholland QUOTACTL_CLEAR -> QUOTACTL_DELETE to match intended API and user API.
 1.12 29-Jan-2012  dholland Improve the quota2 QUOTACTL_CLEAR code to allow clearing blocks and
files independently.

Note: this change requires a kernel version bump.
 1.11 29-Jan-2012  dholland The handling of QUOTACTL_CLEAR does not use the proplib data
dictionary, so don't pass it.

Note: this change requires a kernel version bump.
 1.10 29-Jan-2012  dholland Rename QUOTACTL_SET to QUOTACTL_PUT, to match future intended API.
 1.9 29-Jan-2012  dholland Combine the miscellaneous QUOTACTL_SET args into a struct quotakey.

Note: this change requires a kernel version bump.
 1.8 29-Jan-2012  dholland Pass only one objtype and its quotaval to QUOTACTL_SET at one time.

(The backend code to handle this is a lot tidier than I expected given
that the proplib code doesn't allow setting blocks and files
independently; I was afraid there would turn out to be a reason for
that...)

Note: this change requires a kernel version bump.
 1.7 29-Jan-2012  dholland For QUOTACTL_SET in quota2, use the quotaval data instead of proplib.
 1.6 29-Jan-2012  dholland For QUOTACTL_SET in quota1, use the quotaval data instead of proplib.
 1.5 29-Jan-2012  dholland Use struct quotakey with QUOTACTL_GET. Tidy up accordingly.

Step 5 of 5 for QUOTACTL_GET.

Note: this change requires a kernel version bump.
 1.4 29-Jan-2012  dholland Per the FS-independent schema, get one quotaval at a time from the
filesystem, instead of blocks and files together.

This results in fetching each FS-level quota entry twice and therefore
doing slightly more work, but (1) quota access isn't a critical path
and (2) after fetching the block values the file values will be hot in
the cache, so it won't add much total time.

Also move more of the FS-independent definitions from <quota.h> to
<sys/quota.h> so we can use them internally.

Step 4 of 5 for QUOTACTL_GET.

Note: this change requires a kernel version bump.
 1.3 29-Jan-2012  dholland Move second-layer proplib frobbing within ufs quota code up to the
first layer. (Step 2 of several for QUOTACTL_GET.)
 1.2 06-Mar-2011  bouyer branches: 1.2.2; 1.2.6; 1.2.8; 1.2.12;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.1 20-Jan-2011  bouyer branches: 1.1.2;
file ufs_quota.h was initially added on branch bouyer-quota2.
 1.1.2.9 09-Feb-2011  bouyer Reimplement quotactl commands for quota1
 1.1.2.8 09-Feb-2011  bouyer Various build fixes
 1.1.2.7 08-Feb-2011  bouyer Remove unused prototypes
 1.1.2.6 08-Feb-2011  bouyer Minimal hacking to make 'options QUOTA' compile again.
 1.1.2.5 07-Feb-2011  bouyer Implement clear command (quota2 only), which either frees the specified
quota2 entry (if both disk and inode usage are 0) or reverts its limits to
the default quota entry.
 1.1.2.4 31-Jan-2011  bouyer Catch up with Q2V -> QL renaming
Enforce limits for quota2.
pass quota type (*QUOTA) and limit type (QL_*) to
KAUTH_REQ_SYSTEM_FS_QUOTA_NOLIMIT, to make it possible to skip
limit checks for some quota type only if a listener wants to.
 1.1.2.3 30-Jan-2011  bouyer Implement 'set' command for quota2.
 1.1.2.2 21-Jan-2011  bouyer Add support for quotactl("getall") command, and convert repquota to new
world.
 1.1.2.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more, on a unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota datas, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")
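
The changed meaning of a zero limit can be sketched as a small check. This is an assumption-laden illustration: DEMO_NOLIMIT stands in for UQUAD_MAX, demo_over_limit is a hypothetical helper, and "over limit" is assumed to mean that usage plus the requested change would exceed the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NOLIMIT UINT64_MAX     /* stand-in for UQUAD_MAX: "no limit" */

/* Would adding change to usage exceed limit? A limit of 0 allows nothing. */
bool
demo_over_limit(uint64_t usage, uint64_t change, uint64_t limit)
{
	if (limit == DEMO_NOLIMIT)
		return false;               /* explicitly unlimited */
	if (usage > limit)
		return true;                /* already over */
	return change > limit - usage;  /* written to avoid overflow in usage + change */
}
```

Under these assumptions a limit of 0 rejects even a single block, whereas only the maximum value disables the check.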

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) supports are completed (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.2.12.1 18-Feb-2012  mrg merge to -current.
 1.2.8.1 17-Apr-2012  yamt sync with head
 1.2.6.2 06-Jun-2011  jruoho Sync with HEAD.
 1.2.6.1 06-Mar-2011  jruoho file ufs_quota.h was added on branch jruoho-x86intr on 2011-06-06 09:10:18 +0000
 1.2.2.2 21-Apr-2011  rmind sync with head
 1.2.2.1 06-Mar-2011  rmind file ufs_quota.h was added on branch rmind-uvmplock on 2011-04-21 01:42:21 +0000
 1.20.2.1 30-Jul-2012  martin Pull up following revision(s) (requested by dholland in ticket #450):
sys/ufs/ufs/ufs_quota.c: revision 1.110
sys/ufs/ufs/ufs_quota.h: revision 1.21
sys/ufs/ufs/ufs_quota.c: revision 1.109
Eliminate a common in a header file (add a missing extern) and
declare it in the appropriate C file.
Restore accidentally lost initialization of quotatypes[].
Fixes (null) in the kernel message triggered when you go over quota, and
maybe other things. Reported by Matthew Mondor.
 1.21.12.1 10-Aug-2014  tls Rebase.
 1.21.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.26 22-Feb-2023  riastradh ufs: Nix trailing whitespace and tidy up some other minor KNF.
 1.25 26-Apr-2022  hannken Keep flag "UFS_QUOTA" set until the last quota is closed.

Prevents a live lock when dqrele() finds a struct with "dq_cnt == 1"
and flag "DQ_MOD" and cannot sync as flag UFS_QUOTA is unset.
 1.24 29-Jun-2021  dholland Add containment for the cloning devices hack in vn_open.

Cloning devices (and also things like /dev/stderr) work by allocating
a struct file, stuffing it in the file table (which is a layer
violation), stuffing the file descriptor number for it in a magic
field of struct lwp (which is gross), and then "failing" with one of
two magic errnos, EDUPFD or EMOVEFD.

Before this commit, all callers of vn_open in the kernel (there are
quite a few) were expected to check for these errors and handle the
situation. Needless to say, none of them except for open() itself did,
resulting in internal negative errnos being returned to userspace.

This hack is fairly deeply rooted and cannot be eliminated all at
once. This commit adds logic to handle the magic errnos inside
vn_open; now on success vn_open returns either a vnode or an integer
file descriptor, along with a flag that says whether the underlying
code requested EDUPFD or EMOVEFD. Callers not prepared to cope with
file descriptors can pass NULL for the extra return values, in which
case if a file descriptor would be produced vn_open fails with
EOPNOTSUPP.

Since I'm rearranging vn_open's signature anyway, stop exposing struct
nameidata. Instead, take three arguments: an optional vnode to use as
the starting point (like openat()), the path, and additional namei
flags to use, restricted to NOCHROOT and TRYEMULROOT. (Other namei
behavior, e.g. NOFOLLOW, can be requested via the open flags.)

This change requires a kernel bump. Ride the one an hour ago.
(That was supposed to be coordinated; did not intend to let an hour
slip by. My fault.)
 1.23 25-Dec-2020  nia branches: 1.23.4;
Avoid potentially accessing an array with an index out of range.

Reported-by: syzbot+8832f540234b996bc5a9@syzkaller.appspotmail.com
Reported-by: syzbot+0b785dd10d987350ecb3@syzkaller.appspotmail.com
 1.22 20-Jun-2016  dholland branches: 1.22.10; 1.22.22; 1.22.30;
Widen before multiplying. Like -r1.21, but in the other similar case.
 1.21 25-Nov-2014  christos branches: 1.21.2;
CID 977076: Widen before multiply.
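The "widen before multiply" fixes in 1.21 and 1.22 address a common C pitfall: when both operands are 32-bit, the product is computed in 32 bits and wraps before it is assigned to a wider type. The sketch below is only an illustration under that assumption (32-bit int); the function and parameter names are hypothetical and are not taken from ufs_quota1.c.

```c
#include <stdint.h>

/*
 * Hypothetical illustration; not the code from ufs_quota1.c.
 * Both operands are 32-bit, so (where int is 32 bits) the product
 * is computed in 32 bits and wraps before conversion to 64 bits.
 */
uint64_t
offset_narrow(uint32_t id, uint32_t entry_size)
{
	return id * entry_size;		/* wraps modulo 2^32 */
}

/* Widening one operand first makes the whole multiplication 64-bit. */
uint64_t
offset_widened(uint32_t id, uint32_t entry_size)
{
	return (uint64_t)id * entry_size;
}
```

The cast has to be applied to an operand, not to the finished product; casting the result of a 32-bit multiplication would widen a value that has already wrapped.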
 1.20 24-May-2014  christos Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.
 1.19 17-Mar-2014  hannken branches: 1.19.2;
Change quota1_handle_cmd_quotaon() and q1sync() to use vfs_vnode_iterator.
 1.18 02-Feb-2012  matt branches: 1.18.6; 1.18.10;
Make this compile on vax (uninitialized use warning).
 1.17 01-Feb-2012  dholland Improve the names of some members of struct quotactl_args. These are
effectively function parameter names, but since they need to be
described with the same names in the man page the choices do matter.
Some.
 1.16 29-Jan-2012  dholland Remove references to <quota/quotaprop.h> in src/sys/ufs.
The remaining references in the kernel are in vfs_quotactl.c, the
compat_50 code for the old quotactl (to be fixed up), and the
code compiled from src/common/lib/libquota.
 1.15 29-Jan-2012  dholland Rename QUOTACTL_SET to QUOTACTL_PUT, to match future intended API.
 1.14 29-Jan-2012  dholland Combine the miscellaneous QUOTACTL_SET args into a struct quotakey.

Note: this change requires a kernel version bump.
 1.13 29-Jan-2012  dholland Pass only one objtype and its quotaval to QUOTACTL_SET at one time.

(The backend code to handle this is a lot tidier than I expected given
that the proplib code doesn't allow setting blocks and files
independently; I was afraid there would turn out to be a reason for
that...)

Note: this change requires a kernel version bump.
 1.12 29-Jan-2012  dholland For QUOTACTL_SET in quota1, use the quotaval data instead of proplib.
 1.11 29-Jan-2012  dholland Provide quota info to QUOTACTL_SET as two struct quotaval points
as well as via proplib.

Note: this change requires a kernel version bump.
 1.10 29-Jan-2012  dholland Use struct quotakey with QUOTACTL_GET. Tidy up accordingly.

Step 5 of 5 for QUOTACTL_GET.

Note: this change requires a kernel version bump.
 1.9 29-Jan-2012  dholland Per the FS-independent schema, get one quotaval at a time from the
filesystem, instead of blocks and files together.

This results in fetching each FS-level quota entry twice and therefore
doing slightly more work, but (1) quota access isn't a critical path
and (2) after fetching the block values the file values will be hot in
the cache, so it won't add much total time.

Also move more of the FS-independent definitions from <quota.h> to
<sys/quota.h> so we can use them internally.

Step 4 of 5 for QUOTACTL_GET.

Note: this change requires a kernel version bump.
 1.8 29-Jan-2012  dholland Move second-layer proplib frobbing within ufs quota code up to the
first layer. (Step 2 of several for QUOTACTL_GET.)
 1.7 29-Jan-2012  dholland Change dqblk_to_quotaval() from quota1_subr.c to dqblk_to_quotavals(),
and pass in two single quotaval structs (for blocks and inodes)
instead of an array of (implicitly) QUOTA_NLIMITS quotaval structs
indexed by constants from quotaprop.h.

Note: because this code is used by COMPAT_50 as well as ufs, this
change requires a kernel version bump. (The code is also used by
edquota, but via .PATH so it's not ABI-sensitive there.)
 1.6 25-Nov-2011  dholland branches: 1.6.2;
Rename struct ufs_quota_entry -> struct quotaval.
 1.5 07-Oct-2011  hannken branches: 1.5.2;
As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.
 1.4 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.3 24-Mar-2011  bouyer branches: 1.3.2; 1.3.4; 1.3.6;
Add a new libquota library, which contains some blocks to build and/or
parse quota plists; as well as a getfsquota() function to retrieve quotas
for a single id from a single filesystem (whatever filesystem this is:
a local quota-enabled fs or NFS). This is built on functions getufsquota()
(for local filesystems with UFS-like quotas) and getnfsquota();
which are also available to userland programs.
move functions from quota2_subr.c to libquota or libprop as appropriate,
and adjust in-tree quota tools.
move some declarations from kernel headers to either sys/quota.h or
quota/quota.h as appropriate. ufs/ufs/quota.h still installed because
it's needed by other installed ufs headers.
ufs/ufs/quota1.h still installed as a quick&dirty way to get code
using the old quotactl() to compile (just include ufs/ufs/quota1.h instead of
ufs/ufs/quota.h - old code won't compile without this change and this is
on purpose).
Discussed on tech-kern@ and tech-net@ (long thread, but not much about
libquota itself ...)
 1.2 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.1 20-Jan-2011  bouyer branches: 1.1.2;
file ufs_quota1.c was initially added on branch bouyer-quota2.
 1.1.2.6 09-Feb-2011  bouyer Make it actually work.
 1.1.2.5 09-Feb-2011  bouyer Reimplement quotactl commands for quota1
 1.1.2.4 09-Feb-2011  bouyer Fix typo
 1.1.2.3 08-Feb-2011  bouyer Minimal hacking to make 'options QUOTA' compile again.
 1.1.2.2 31-Jan-2011  bouyer Catch up with Q2V -> QL renaming
Enforce limits for quota2.
pass quota type (*QUOTA) and limit type (QL_*) to
KAUTH_REQ_SYSTEM_FS_QUOTA_NOLIMIT, to make it possible to skip
limit checks for some quota type only if a listener wants to.
 1.1.2.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.3.6.2 06-Jun-2011  jruoho Sync with HEAD.
 1.3.6.1 24-Mar-2011  jruoho file ufs_quota1.c was added on branch jruoho-x86intr on 2011-06-06 09:10:19 +0000
 1.3.4.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.3.2.2 21-Apr-2011  rmind sync with head
 1.3.2.1 24-Mar-2011  rmind file ufs_quota1.c was added on branch rmind-uvmplock on 2011-04-21 01:42:21 +0000
 1.5.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.2.1 17-Apr-2012  yamt sync with head
 1.6.2.1 18-Feb-2012  mrg merge to -current.
 1.18.10.1 18-May-2014  rmind sync with head
 1.18.6.2 03-Dec-2017  jdolecek update from HEAD
 1.18.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.19.2.1 10-Aug-2014  tls Rebase.
 1.21.2.1 09-Jul-2016  skrll Sync with HEAD
 1.22.30.1 03-Jan-2021  thorpej Sync w/ HEAD.
 1.22.22.2 27-Apr-2022  martin Pull up following revision(s) (requested by hannken in ticket #1440):

usr.sbin/quotaon/quotaon.c: revision 1.31
lib/libquota/quota_oldfiles.c: revision 1.10
sys/ufs/ufs/ufs_quota1.c: revision 1.25

Fix default quota file names, both user and group quota used the
same default name "QUOTAFILENAME, names[USRQUOTA])" resulting in
diagnostic assertion and possibly corrupted quota data.

Keep flag "UFS_QUOTA" set until the last quota is closed.
Prevents a live lock when dqrele() finds a struct with "dq_cnt == 1"
and flag "DQ_MOD" and cannot sync as flag UFS_QUOTA is unset.

As the quota type comes from the kernel and is only valid when
quota is on get the type before quota_off and after quota_on.
 1.22.22.1 01-Jan-2021  martin Pull up following revision(s) (requested by nia in ticket #1176):

sys/ufs/ufs/ufs_quota1.c: revision 1.23

Avoid potentially accessing an array with an index out of range.
 1.22.10.2 27-Apr-2022  martin Pull up following revision(s) (requested by hannken in ticket #1739):

usr.sbin/quotaon/quotaon.c: revision 1.31
lib/libquota/quota_oldfiles.c: revision 1.10
sys/ufs/ufs/ufs_quota1.c: revision 1.25

Fix default quota file names, both user and group quota used the
same default name "QUOTAFILENAME, names[USRQUOTA])" resulting in
diagnostic assertion and possibly corrupted quota data.

Keep flag "UFS_QUOTA" set until the last quota is closed.
Prevents a live lock when dqrele() finds a struct with "dq_cnt == 1"
and flag "DQ_MOD" and cannot sync as flag UFS_QUOTA is unset.

As the quota type comes from the kernel and is only valid when
quota is on get the type before quota_off and after quota_on.
 1.22.10.1 01-Jan-2021  martin Pull up following revision(s) (requested by nia in ticket #1645):

sys/ufs/ufs/ufs_quota1.c: revision 1.23

Avoid potentially accessing an array with an index out of range.
 1.23.4.1 01-Aug-2021  thorpej Sync with HEAD.
 1.46 22-Feb-2023  riastradh ufs: Nix trailing whitespace and tidy up some other minor KNF.
 1.45 28-May-2022  andvar s/grabing/grabbing/ in comments.
 1.44 15-Oct-2021  andvar fix typos in comments.
 1.43 05-Dec-2020  thorpej Remove unnecessary inclusion of <sys/timevar.h>.
 1.42 01-Mar-2017  hannken branches: 1.42.26;
Remove now redundant calls to fstrans_start()/fstrans_done().
 1.41 20-Nov-2016  riastradh branches: 1.41.2;
KASSERT(mutex_owner(...)) ---> KASSERT(mutex_owned(...))

Fixes part of PR kern/47114. Tested by code inspection.
 1.40 28-Mar-2015  maxv branches: 1.40.2;
Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.39 28-Jun-2014  dholland branches: 1.39.4;
Revert the following changes:

src/sys/sys/quotactl.h 1.37
src/sys/compat/netbsd32/netbsd32.h 1.101
src/sys/compat/netbsd32/netbsd32_netbsd.c 1.188, 1.189
src/sys/kern/vfs_quotactl.c 1.39
src/sys/kern/vfs_syscalls.c 1.483
src/sys/ufs/lfs/ulfs_quota.c 1.11
src/sys/ufs/ufs/ufs_quota.c 1.116
src/lib/libquota/quota_kernel.c 1.5

and do them correctly.

If you're going to change the name of something, you need to change
the name of *all* the things with the same name, not just a handful,
and you should change it to something similar so it still matches the
rest of the system rather than just picking an arbitrarily different
name.

Hi, Joerg.

To wit, rename the quotactl "delete" operation to "del", because
"delete" is a reserved word in C++ and for some reason Joerg wants to
run internal interfaces used only by C code through his C++ compiler.
Do not rename it to "remove" instead, because this doesn't match
libquota or the rest of the usage throughout the system; and rename
all the related identifiers, not just the ones that blew the mind of
Joerg's C++ compiler.

Because this is not a user-facing API (the only userland consumer of
sys/quotactl.h is libquota) it is sort of ok to make arbitrary
source-incompatible changes; however, by the same token it's completely
unnecessary. If it *were* a user-facing API that someone might have a
semi-rational reason to want to run a C++ compiler on, it would be
incorrect to change it at this point.
 1.38 16-Mar-2014  uwe branches: 1.38.2;
Shut up -Wuninitialized on sh3 with gcc 4.8
 1.37 29-Jan-2014  bouyer Patch from Edgar Fuß on tech-kern:
set grace time if lowering the limit causes the user/group to now be over quota.
 1.36 20-Oct-2013  htodd Defining needswap where needed.
 1.35 27-Sep-2012  bouyer branches: 1.35.2;
Fix quota2 list corruption issue when defaultquotas are 0 (deny any file
and block allocation).

When quota2_check() is called with an uid not yet in the list,
getinoquota2() will call quota2_q2ealloc() to allocate a new entry for this
uid. quota2_q2ealloc() will remove an entry from the free list and
put it at the head of the corresponding hash list, and flush the block
containing the header if it's not the one also containing the allocated entry.
quota2_q2ealloc() then returns the allocated entry and corresponding block
to caller (getinoquota2() here), which returns it to quota2_check().
quota2_check() then checks if the allocation can succeed, and returns an
error if not and calls brelse() on the buffer (because from its point of view no
change was made to the entry), effectively discarding changes
to the entry that may have been made by quota2_q2ealloc().
Fix by always bwrite()ing the entry in quota2_q2ealloc(), and re-reading
the entry in caller.
 1.34 13-Feb-2012  dholland branches: 1.34.2; 1.34.4; 1.34.6;
Fix another problem with quota cursor iteration. ok riz
 1.33 05-Feb-2012  dholland Migrate one last leftover bit (used only by the kernel now) to
sys/ufs/ufs and remove the old quota headers and no-longer-used shared
code. Ok by releng.
 1.32 01-Feb-2012  dholland Improve the names of some members of struct quotactl_args. These are
effectively function parameter names, but since they need to be
described with the same names in the man page the choices do matter.
Some.
 1.31 01-Feb-2012  dholland Fix problems in cursor iteration that came to light when iterating one
value at a time, instead of in bulk. Yeah, repquota should do bulk get,
but it doesn't yet.
 1.30 29-Jan-2012  dholland Clean up quota2 cursoring, as promised earlier.
 1.29 29-Jan-2012  dholland quota2_check_limit() is used in only one place, so don't stuff it in a
header file.
 1.28 29-Jan-2012  dholland Remove #if 0'd proplib-related code.
 1.27 29-Jan-2012  dholland Remove references to <quota/quotaprop.h> in src/sys/ufs.
The remaining references in the kernel are in vfs_quotactl.c, the
compat_50 code for the old quotactl (to be fixed up), and the
code compiled from src/common/lib/libquota.
 1.26 29-Jan-2012  dholland Add QUOTACTL_CURSORSKIPIDTYPE, QUOTACTL_CURSORATEND, QUOTACTL_CURSORREWIND.

This change requires a kernel version bump.
 1.25 29-Jan-2012  dholland Don't pass the idtype to QUOTACTL_GETALL. Instead, iterate both users
and groups.

This change requires a kernel version bump.
 1.24 29-Jan-2012  dholland Fix a preexisting array overrun and a preexisting free twice exposed
by cleanup and testing.
 1.23 29-Jan-2012  dholland Call QUOTACTL_GETALL in a loop to get results 8 at a time. Make
the QUOTACTL_GETALL interface less abusive.

Note: this change requires a kernel version bump.
 1.22 29-Jan-2012  dholland Stop treating the default values specially in QUOTACTL_GETALL.

Note: this change requires a kernel version bump.
 1.21 29-Jan-2012  dholland Teach quota2 QUOTACTL_GETALL to accept a limit on how much it sends back.
Pass in a dummy limit for now.

Note: this change requires a kernel version bump.
 1.20 29-Jan-2012  dholland Teach quota2 QUOTACTL_GETALL to start in the middle, step 2.
 1.19 29-Jan-2012  dholland Teach quota2 QUOTACTL_GETALL to start in the middle, step 1.
 1.18 29-Jan-2012  dholland Hack QUOTACTL_GETALL to return results without using proplib.

(this interface is abusive and is going to be cleaned up in the
immediate future)

Note: this change requires a kernel version bump.
 1.17 29-Jan-2012  dholland Pass the cursor to QUOTACTL_GETALL. Don't pass unused proplib items.

Note: this change requires a kernel version bump.
 1.16 29-Jan-2012  dholland Begin adding quota cursor/iteration interface to VFS_QUOTACTL.

Add struct quotakcursor.
Add QUOTACTL_CURSOROPEN and QUOTACTL_CURSORCLOSE operations.
Implement the plumbing for them.
Add trivial implementations of them for quota2.
(iteration is not supported on quota1 for the time being, just as
getall isn't)
Have the proplib interpreter open and close a cursor around doing
QUOTACTL_GETALL.

Note: this change requires a kernel version bump.
 1.15 29-Jan-2012  dholland Package up the args of QUOTACTL_DELETE as a struct quotakey.
 1.14 29-Jan-2012  dholland QUOTACTL_CLEAR -> QUOTACTL_DELETE to match intended API and user API.
 1.13 29-Jan-2012  dholland Improve the quota2 QUOTACTL_CLEAR code to allow clearing blocks and
files independently.

Note: this change requires a kernel version bump.
 1.12 29-Jan-2012  dholland The handling of QUOTACTL_CLEAR does not use the proplib data
dictionary, so don't pass it.

Note: this change requires a kernel version bump.
 1.11 29-Jan-2012  dholland Rename QUOTACTL_SET to QUOTACTL_PUT, to match future intended API.
 1.10 29-Jan-2012  dholland Combine the miscellaneous QUOTACTL_SET args into a struct quotakey.

Note: this change requires a kernel version bump.
 1.9 29-Jan-2012  dholland Pass only one objtype and its quotaval to QUOTACTL_SET at one time.

(The backend code to handle this is a lot tidier than I expected given
that the proplib code doesn't allow setting blocks and files
independently; I was afraid there would turn out to be a reason for
that...)

Note: this change requires a kernel version bump.
 1.8 29-Jan-2012  dholland For QUOTACTL_SET in quota2, use the quotaval data instead of proplib.
 1.7 29-Jan-2012  dholland Use struct quotakey with QUOTACTL_GET. Tidy up accordingly.

Step 5 of 5 for QUOTACTL_GET.

Note: this change requires a kernel version bump.
 1.6 29-Jan-2012  dholland Per the FS-independent schema, get one quotaval at a time from the
filesystem, instead of blocks and files together.

This results in fetching each FS-level quota entry twice and therefore
doing slightly more work, but (1) quota access isn't a critical path
and (2) after fetching the block values the file values will be hot in
the cache, so it won't add much total time.

Also move more of the FS-independent definitions from <quota.h> to
<sys/quota.h> so we can use them internally.

Step 4 of 5 for QUOTACTL_GET.

Note: this change requires a kernel version bump.
 1.5 29-Jan-2012  dholland Move second-layer proplib frobbing within ufs quota code up to the
first layer. (Step 2 of several for QUOTACTL_GET.)
 1.4 07-Jun-2011  bouyer branches: 1.4.2; 1.4.6;
Fix bad cut'n'paste in copyright. Pointed out by dyoung@
 1.3 24-Mar-2011  bouyer branches: 1.3.2; 1.3.4; 1.3.6;
Add a new libquota library, which contains some blocks to build and/or
parse quota plists; as well as a getfsquota() function to retrieve quotas
for a single id from a single filesystem (whatever filesystem this is:
a local quota-enabled fs or NFS). This is built on functions getufsquota()
(for local filesystems with UFS-like quotas) and getnfsquota();
which are also available to userland programs.
move functions from quota2_subr.c to libquota or libprop as appropriate,
and adjust in-tree quota tools.
move some declarations from kernel headers to either sys/quota.h or
quota/quota.h as appropriate. ufs/ufs/quota.h still installed because
it's needed by other installed ufs headers.
ufs/ufs/quota1.h still installed as a quick&dirty way to get code
using the old quotactl() to compile (just include ufs/ufs/quota1.h instead of
ufs/ufs/quota.h - old code won't compile without this change and this is
on purpose).
Discussed on tech-kern@ and tech-net@ (long thread, but not much about
libquota itself ...)
 1.2 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.1 20-Jan-2011  bouyer branches: 1.1.2;
file ufs_quota2.c was initially added on branch bouyer-quota2.
 1.1.2.16 18-Feb-2011  bouyer Only use B_MODIFY when needed, avoid unnecessary copy on write when using
snapshots.
 1.1.2.15 12-Feb-2011  bouyer Don't count snapshot files in inode quota too.
At umount time, chk?q may be called after quota have been shutdown,
as there is a final vflush pass after quota?_umount(); so skip quota
checks if the quota vnode is not there any more.
 1.1.2.14 11-Feb-2011  bouyer walk_list: make sure we don't try to read the same quota disk block again
without releasing it first
 1.1.2.13 10-Feb-2011  bouyer Unless the filesystem is mounted MNT_SYNCHRONOUS, use bdwrite()
to write back updated quota entries.
 1.1.2.12 09-Feb-2011  bouyer Make sure to not try to close the quota vnodes twice.
 1.1.2.11 09-Feb-2011  bouyer Various build fixes
 1.1.2.10 08-Feb-2011  bouyer Minimal hacking to make 'options QUOTA' compile again.
 1.1.2.9 07-Feb-2011  bouyer Implement clear command (quota2 only), which either frees the specified
quota2 entry (if both disk and inode usage are 0) or reverts its limits to
the default quota entry.
 1.1.2.8 07-Feb-2011  bouyer Create a WAPBL transaction when setting quotas.
 1.1.2.7 03-Feb-2011  bouyer factor out code to check a quota against its limits.
 1.1.2.6 31-Jan-2011  bouyer Catch up with Q2V -> QL renaming
Enforce limits for quota2.
pass quota type (*QUOTA) and limit type (QL_*) to
KAUTH_REQ_SYSTEM_FS_QUOTA_NOLIMIT, to make it possible to skip
limit checks for some quota type only if a listener wants to.
 1.1.2.5 30-Jan-2011  bouyer Implement 'set' command for quota2.
 1.1.2.4 29-Jan-2011  bouyer Describe how the on-disk structures are protected from concurrent access,
and try to implement it.
 1.1.2.3 28-Jan-2011  bouyer Introduce quota2_ufs_rwq2v() and quota2_ufs_rwq2e() functions, which
byteswap a quota2_val or quota2_entry if needed.
Use this to get quota2_entry in host order before calling q2etoprop().

quota2_walk_list() will byteswap the offset if needed to leave
it in FS byte order in callers.
 1.1.2.2 21-Jan-2011  bouyer Add support for quotactl("getall") command, and convert repquota to new
world.
 1.1.2.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.
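
The file layout described above (a header holding a free-list pointer and a hash table of chains, with entries linked by file offset) can be sketched roughly as follows. This is only an illustration: the struct names, field names, field sizes and the fixed HASH_SIZE are hypothetical and do not reproduce the real quota2 on-disk format, and byte order is ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; names and sizes are illustrative only. */
#define HASH_SIZE 4

struct qentry {
	uint64_t next;	/* file offset of next entry in chain, 0 = end */
	uint32_t id;	/* uid or gid this entry describes */
	uint32_t pad;
	uint64_t limit;	/* stand-in for the real limit/usage fields */
};

struct qheader {
	uint64_t free_list;		/* offset of first free entry */
	uint64_t hash[HASH_SIZE];	/* offsets of in-use chains */
};

/*
 * Look up `id` in an in-memory image of such a quota file by walking
 * the hash chain through file offsets.
 * Returns the entry's offset in the file, or 0 if it is not present.
 */
uint64_t
quota_lookup(const uint8_t *file, size_t len, uint32_t id)
{
	struct qheader h;
	struct qentry e;
	uint64_t off;

	if (len < sizeof(h))
		return 0;
	memcpy(&h, file, sizeof(h));
	for (off = h.hash[id % HASH_SIZE]; off != 0; off = e.next) {
		if (off > len - sizeof(e))
			return 0;	/* offset points outside the image */
		memcpy(&e, file + off, sizeof(e));
		if (e.id == id)
			return off;
	}
	return 0;
}
```

Using offsets rather than pointers is what lets the same structure live on disk and in the buffer cache; offset 0 can serve as the end-of-chain marker in this sketch because the header occupies the start of the file.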

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.3.6.2 06-Jun-2011  jruoho Sync with HEAD.
 1.3.6.1 24-Mar-2011  jruoho file ufs_quota2.c was added on branch jruoho-x86intr on 2011-06-06 09:10:19 +0000
 1.3.4.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.3.2.3 12-Jun-2011  rmind sync with head
 1.3.2.2 21-Apr-2011  rmind sync with head
 1.3.2.1 24-Mar-2011  rmind file ufs_quota2.c was added on branch rmind-uvmplock on 2011-04-21 01:42:21 +0000
 1.4.6.1 18-Feb-2012  mrg merge to -current.
 1.4.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.2.2 30-Oct-2012  yamt sync with head
 1.4.2.1 17-Apr-2012  yamt sync with head
 1.34.6.3 03-Dec-2017  jdolecek update from HEAD
 1.34.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.34.6.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.34.4.1 01-Nov-2012  matt sync with netbsd-6-0-RELEASE.
 1.34.2.2 18-Mar-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1027):
sys/ufs/ufs/ufs_quota2.c: revision 1.37
Patch from Edgar Fuss on tech-kern:
set grace time if lowering the limit causes the user/group to now be over quota.
 1.34.2.1 01-Oct-2012  riz Pull up following revision(s) (requested by bouyer in ticket #580):
tests/fs/ffs/h_quota2_tests.c: revision 1.4
tests/fs/ffs/t_miscquota.sh: revision 1.7
sys/ufs/ufs/ufs_quota2.c: revision 1.35
Fix quota2 list corruption issue when defaultquotas are 0 (deny any file
and block allocation).
When quota2_check() is called with an uid not yet in the list,
getinoquota2() will call quota2_q2ealloc() to allocate a new entry for this
uid. quota2_q2ealloc() will remove an entry from the free list and
put it at the head of the corresponding hash list, and flush the block
containing the header if it's not the one also containing the allocated entry.
quota2_q2ealloc() then returns the allocated entry and corresponding block
to caller (getinoquota2() here), which returns it to quota2_check().
quota2_check() then checks if the allocation can succeed, and returns an
error if not and calls brelse() on the buffer (because from its point of view no
change was made to the entry), effectively discarding changes
to the entry that may have been made by quota2_q2ealloc().
Fix by always bwrite()ing the entry in quota2_q2ealloc(), and re-reading
the entry in caller.
Add test cases for the bug fixed in sys/ufs/ufs/ufs_quota2.c 1.35:
when an on-disk block/inode allocation triggers allocating a new
quota entry, the new quota entry is not in the quota2 header block,
and the allocation will later be denied, the changes to the quota block would
not be flushed to disk, leading to list corruption (detected by fsck).
 1.35.2.1 18-May-2014  rmind sync with head
 1.38.2.1 10-Aug-2014  tls Rebase.
 1.39.4.3 28-Aug-2017  skrll Sync with HEAD
 1.39.4.2 05-Dec-2016  skrll Sync with HEAD
 1.39.4.1 06-Apr-2015  skrll Sync with HEAD
 1.40.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.40.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.41.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.42.26.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.129 19-Oct-2024  jakllsch ufs: base amount of data to sync on MAXPHYS instead of constant

No functional change except on sun2 and sun3, as ilog2(MAXPHYS) is the
same as the previous constant (16) on all other ports. On sun[23] this
changes amount written from 56KiB/8KiB to 2x 32KiB.
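The arithmetic in this message can be checked with a small sketch. It assumes MAXPHYS is 64 KiB on most ports and 56 KiB on sun2/sun3, which the message implies but does not state; ilog2_u32() and sync_chunk_bytes() are hypothetical stand-ins, not the kernel's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an integer log2: floor(log2(x)) for x > 0. */
static unsigned
ilog2_u32(uint32_t x)
{
	unsigned n = 0;

	assert(x != 0);
	while (x >>= 1)
		n++;
	return n;
}

/*
 * Hypothetical helper: the chunk size obtained when the shift count is
 * derived from maxphys, i.e. maxphys rounded down to a power of two.
 */
static uint32_t
sync_chunk_bytes(uint32_t maxphys)
{
	return UINT32_C(1) << ilog2_u32(maxphys);
}
```

With a 64 KiB argument the shift count is 16, matching the previous constant. With a 56 KiB argument it is 15, so one chunk is 32 KiB, which is consistent with the "2x 32KiB" figure quoted for sun2 and sun3.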
 1.128 21-Feb-2022  hannken branches: 1.128.10;
Fix wrong assertion, the negation of "a && b" is "!a || !b" so we
need "DIP(ip, blocks) != 0" here.

Should fix PR kern/56725 (Panic when ls directory with device nodes
on an older ffs)
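The identity cited in revision 1.128 is De Morgan's law. The sketch below only illustrates that identity; it does not reproduce the actual assertion in the UFS code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* True when !(a && b) and (!a || !b) agree for the given inputs. */
static bool
demorgan_holds(bool a, bool b)
{
	return (!(a && b)) == (!a || !b);
}

/* Check the identity over all four input combinations. */
static bool
demorgan_holds_everywhere(void)
{
	for (int a = 0; a <= 1; a++)
		for (int b = 0; b <= 1; b++)
			if (!demorgan_holds(a != 0, b != 0))
				return false;
	return true;
}
```

The practical consequence is that negating a compound condition flips each comparison as well as the operator: negating "x == 0 && y == 0" gives "x != 0 || y != 0", not "x == 0 || y == 0".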
 1.127 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.,
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), using a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.126 23-Apr-2020  ad PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)

- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
somewhere. Use it to decide whether to do direct-mapped copy, rather than
poking around directly in the vnode in ubc_uiomove(), which is ugly and
doesn't work for tmpfs. It would be nicer to contain all this in UVM but
the filesystem provides the needed locking here (VV_MAPPED) and to
reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS(). Pass in UBC_ISMAPPED where
appropriate.
 1.125 23-Feb-2020  ad branches: 1.125.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.124 20-Jun-2019  christos branches: 1.124.4;
unifdef -ULFS_READWRITE ufs_readwrite.c
 1.123 10-Dec-2018  jdolecek put back UFS_WAPBL_JUNLOCK_ASSERT(), the underlying rw_write_held() check
doesn't actually have a race since it checks if the rwlock is held by
current lwp
 1.122 10-Dec-2018  jdolecek make UFS_WAPBL_JLOCK_ASSERT() #ifdef DIAGNOSTIC, same as the underlying
function KASSERT(), so that it actually does something; fix code using
it to actually pass correct params, so that it compiles

remove UFS_WAPBL_JUNLOCK_ASSERT(), as that is inherently racy (it's
okay on those places if the rwlock is held by other lwp); depend
on the RW_ASSERT()/LOCKDEBUG inside rw_enter() to catch the case
with wapbl rwlock held by current lwp
 1.121 01-Mar-2017  hannken branches: 1.121.14;
Remove now redundant calls to fstrans_start()/fstrans_done().
 1.120 12-Apr-2015  riastradh branches: 1.120.2; 1.120.4;
Omit now-unused variable. rump build didn't catch this...
 1.119 12-Apr-2015  riastradh Don't putpages in ufs buffercached writes: kassert there are none.
 1.118 31-Mar-2015  riastradh Amplify that even if we fixed it now the tentacles are still stuck.
 1.117 31-Mar-2015  riastradh Write an essay explaining why ffs_write is one huge WAPBL transaction.
 1.116 29-Mar-2015  riastradh WAPBL tx is always locked by ufs_bufrd caller, so never unlock it.
 1.115 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.114 28-Mar-2015  riastradh Let I/O errors override inode update errors in UFS.

Fixes tests/fs/vfs/t_io:read_fault for UFS.
 1.113 28-Mar-2015  maxv Remove the 'cred' argument from breadn(), and update the man page
accordingly.

ok hannken@
 1.112 28-Mar-2015  riastradh Factor out post-read/write inode updates in UFS.
 1.111 28-Mar-2015  riastradh VOP_WRITE never has IO_JOURNALLOCKED.
 1.110 28-Mar-2015  riastradh Turn some `#if DIAGNOSTIC' into KASSERT.
 1.109 27-Mar-2015  riastradh Tighten some kasserts in ufs_bufio code paths.
 1.108 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.107 23-Jun-2013  dholland branches: 1.107.10;
Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.106 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.105 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.104 29-Apr-2012  chs branches: 1.104.2;
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
 1.103 17-Apr-2012  christos it is not an error if the kernel needs to clear the setuid/
setgid bit on write/chown/chgrp
 1.102 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.101 02-Jan-2012  perseant branches: 1.101.2;

* Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.100 18-Nov-2011  christos branches: 1.100.4;
Obey MNT_RELATIME, the only addition is that mkdir in ufs sets IN_ACCESS too.
 1.99 11-Jul-2011  hannken branches: 1.99.2;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.98 19-Jun-2011  rmind - Fix a silly bug: remove umap from uobj in ubc_release() UBC_UNMAP case.
- Use UBC_WANT_UNMAP() consistently.

ARM (PMAP_CACHE_VIVT case) works again.
 1.97 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.96 06-Mar-2011  bouyer branches: 1.96.2;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.95 23-Apr-2010  pooka branches: 1.95.2; 1.95.4;
Enforce RLIMIT_FSIZE before VOP_WRITE. This adds support to file
system drivers where it was missing and fixes one buggy
implementation. The arguably weird semantics of the check are
maintained (v_size vs. va_bytes, overwrite).
 1.94 22-Feb-2009  ad branches: 1.94.2; 1.94.4;
PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.93 08-Dec-2008  pooka branches: 1.93.2;
Decode write access advice and pass to uvm (not that it's handled
there, but ...).
 1.92 19-Oct-2008  hannken branches: 1.92.2;
Make genfs_directio() IO_JOURNALLOCKED aware. DirectIO no longer triggers
"locking against myself" panic in wapbl_begin().

Observed and tested by: Frank Kardel <kardel@netbsd.org>
 1.91 22-Aug-2008  hannken Add snapshot support for logging ffs file systems.

- Add UFS_WAPBL_BEGIN() / UFS_WAPBL_END() where needed.

- Expunge WAPBL log inodes from snapshots.

- Ffs_copyonwrite() and ffs_snapblkfree() must run inside a WAPBL transaction.

- Add ffs_gop_write() as a wrapper around genfs_gop_write() that makes sure
genfs_gop_write() gets always called inside a WAPBL transaction.

- Add VOP_PUTPAGES() flag PGO_JOURNALLOCKED to tag calls to VOP_PUTPAGES()
inside a WAPBL transaction.

Reviewed by: Simon Burge <simonb@netbsd.org>, Greg Oster <oster@netbsd.org>

PGO_JOURNALLOCKED / ffs_gop_write() part presented on tech-kern@.
 1.90 12-Aug-2008  hannken Deny read/write access to snapshot vnodes. We use fss(4) to read from
snapshots. With this policy in place:

- Separate the snapshot vnode lock from the snapshot common lock.
Snapshots no longer need recursive vnode locks.

- Use a mutex (si_snaplock) to serialize creation, deletion, reading and
writing of snapshots.

- Move ffs_read() for snapshots into ffs_snapshot.c.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>

While here change ffs_copyonwrite() to fail requests from pagedaemon that need
to copy-on-write.
 1.89 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi Systems' WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.88 16-May-2008  hannken branches: 1.88.2; 1.88.4;
Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.87 24-Apr-2008  ad branches: 1.87.2; 1.87.4;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.86 02-Jan-2008  ad branches: 1.86.6; 1.86.8;
Merge vmlocking2 to head.
 1.85 08-Dec-2007  pooka branches: 1.85.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.84 10-Oct-2007  ad branches: 1.84.4; 1.84.6;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.83 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.82 24-Sep-2007  pooka BLKSIZE is always the same as blksize these days, so get rid of it.
 1.81 27-Jul-2007  yamt branches: 1.81.4; 1.81.6; 1.81.8; 1.81.10;
use ubc_uiomove for read as well.
 1.80 27-Jul-2007  yamt ubc_uiomove: add an "advice" argument rather than using UVM_ADV_RANDOM blindly.
 1.79 05-Jun-2007  yamt branches: 1.79.2;
improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.78 17-May-2007  hannken Fstrans_start() always returns zero, so change its type to void.
 1.77 19-Apr-2007  yamt hold proclist_mutex when calling psignal().
 1.76 22-Feb-2007  thorpej branches: 1.76.4; 1.76.6;
TRUE -> true, FALSE -> false
 1.75 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.74 29-Jan-2007  hannken branches: 1.74.2;
Change fstrans enum types to upper case.
No functional change.

From Antti Kantee <pooka@netbsd.org>
 1.73 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.72 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.71 14-Oct-2006  yamt don't use g_glock directly.
 1.70 05-Oct-2006  chs add support for O_DIRECT (I/O directly to application memory,
bypassing any kernel caching for file data).
 1.69 03-Oct-2006  christos Coverity CID 3156: async = TRUE when LFS_READWRITE is defined, leading to
dead code. Ifdef the dead code appropriately (from Arnaud Lacombe)
 1.68 14-May-2006  elad branches: 1.68.8; 1.68.10;
integrate kauth.
 1.67 01-Mar-2006  yamt branches: 1.67.2; 1.67.4; 1.67.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.66 11-Dec-2005  christos branches: 1.66.2; 1.66.4; 1.66.6;
merge ktrace-lwp.
 1.65 29-Nov-2005  yamt merge yamt-readahead branch.
 1.64 02-Nov-2005  yamt branches: 1.64.2;
merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.63 19-Apr-2005  perseant branches: 1.63.2; 1.63.4;
Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.62 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.61 26-Feb-2005  perseant branches: 1.61.2;
Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
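The ufs_bmaparray(9) item in revision 1.61 mentions converting explicitly to and from int32_t. One way this kind of mangling can arise is a negative sentinel stored in a 32-bit on-disk field being widened without sign extension. The sketch below illustrates that mechanism only; the sentinel value -2, the EX_UNWRITTEN name, and both helpers are assumptions for illustration, not taken from the LFS sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical negative sentinel marking a not-yet-written block. */
#define EX_UNWRITTEN	((int64_t)-2)

/*
 * Widen a raw 32-bit on-disk block pointer to a 64-bit in-core value.
 * Going through int32_t first sign-extends, so a stored -2 stays -2
 * instead of becoming 0xFFFFFFFE. (The unsigned-to-signed conversion
 * is implementation-defined before C23; common compilers wrap.)
 */
static int64_t
ex_widen_daddr(uint32_t ondisk)
{
	return (int64_t)(int32_t)ondisk;
}

/* Narrow an in-core value back to its raw 32-bit on-disk form. */
static uint32_t
ex_narrow_daddr(int64_t incore)
{
	assert(incore >= INT32_MIN && incore <= INT32_MAX);
	return (uint32_t)(int32_t)incore;
}
```

Widening the raw value directly, without the int32_t step, yields a large positive number, which is the sort of value a range assertion on block addresses would reject.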
 1.60 09-Jan-2005  chs branches: 1.60.2; 1.60.4;
adjust the UBC mapping code to support non-vnode uvm_objects.
this means we can no longer look at the vnode size to determine how many
pages to request in a fault, which is good since for NFS the size can change
out from under us on the server anyway. there's also a new flag UBC_UNMAP
for ubc_release(), so that the file system code can make the decision about
whether to cache mappings for files being used as executables.
 1.59 10-Sep-2004  yamt g/c no longer used definition of fs_maxfilesize.
 1.58 15-Aug-2004  mycroft Fix some formatting glitches.
 1.57 15-Aug-2004  mycroft Minor simplification to some arithmetic.
 1.56 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.55 07-Aug-2003  agc branches: 1.55.4;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.54 29-Jun-2003  fvdl branches: 1.54.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.53 28-Jun-2003  darrenr Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.52 15-May-2003  kristerw The C language does not permit statements of the form
(X ? Y : Z) = 0;
even though gcc handles this by a stupid extension.

Transform these to correct C.

Approved by fvdl.
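The transformation described in revision 1.52 can be sketched as follows. The struct and function names are hypothetical; the point is that a conditional expression is not an lvalue in ISO C, so the assignment has to be expressed either as an if/else or through a conditionally selected pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-field record used only for this illustration. */
struct pair {
	int first;
	int second;
};

/*
 * Not valid ISO C (accepted by older gcc as an extension):
 *	(use_first ? p->first : p->second) = 0;
 *
 * Portable form 1: an explicit if/else.
 */
static void
clear_selected(struct pair *p, int use_first)
{
	assert(p != NULL);
	if (use_first)
		p->first = 0;
	else
		p->second = 0;
}

/* Portable form 2: select an address, then assign through it. */
static void
clear_selected_ptr(struct pair *p, int use_first)
{
	assert(p != NULL);
	*(use_first ? &p->first : &p->second) = 0;
}
```

Both forms have the same effect; which one the commit used in each place is not stated in the message.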
 1.51 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.50 15-Mar-2003  perseant Add write-behind to lfs_write().
 1.49 08-Mar-2003  perseant Take away "#ifdef LFS_UBC".
 1.48 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.47 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.46 28-Dec-2002  yamt - in lfs_reserve, vref vnodes that we're locking so that cleaner doesn't
try to reclaim them.
(workaround for deadlock noted in the comment in lfs_reserveavail)
- in lfs_rename, mark vnodes which are being moved as well as directory vnodes.
 1.45 26-Dec-2002  yamt - in lfs_reserve, reserve locked buffer count as well.
- don't wait for locking buf in lfs_bwrite_ext to avoid deadlocks.
- skip lfs_reserve when we're doing dirop.
reserve more (for lfs_truncate) in set_dirop instead.

this mostly solves PR 18972. (and hopefully PR 19196)
 1.44 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.43 18-Oct-2002  yamt make sure to update the vnode's size even if uiomove failed.
otherwise, softdep states can't be flushed later.

ok'ed by Chuck Silvers. fix PR/16670.
 1.42 25-Mar-2002  chs branches: 1.42.4;
if the size argument to write(2) is 0, do not modify the file in any way,
including updating timestamps. required for standards conformance.
 1.41 22-Mar-2002  chs in lfs_write(), flush and invalidate any page cache pages in the range
that we're about to modify. this weak attempt at coherency is enough
to make some applications (eg. "tail -f") happy, so it's worth having.
 1.40 30-Nov-2001  chs VOP_PUTPAGES() requires page-aligned offsets, so be sure to provide such.
fixes PR 14759.

(while I'm here, call VOP_PUTPAGES() directly instead of indirecting through
the UVM pager op vector.)
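The page-alignment requirement noted in revision 1.40 amounts to widening a byte range outward to page boundaries before flushing it. A minimal sketch, assuming a 4096-byte page and using hypothetical helper names rather than the kernel's own macros:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for illustration; real systems vary by port. */
#define EX_PAGE_SIZE	UINT64_C(4096)
#define EX_PAGE_MASK	(EX_PAGE_SIZE - 1)

/* Round a byte offset down to the start of its page. */
static uint64_t
ex_trunc_page(uint64_t off)
{
	return off & ~EX_PAGE_MASK;
}

/* Round a byte offset up to the next page boundary. */
static uint64_t
ex_round_page(uint64_t off)
{
	assert(off <= UINT64_MAX - EX_PAGE_MASK);
	return (off + EX_PAGE_MASK) & ~EX_PAGE_MASK;
}
```

For a write covering bytes 5000 to 9000, the aligned range under these assumptions is 4096 to 12288, so every page touched by the write is included.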
 1.39 17-Nov-2001  simonb Set `flags' before being used in the WRITE() function.
 1.38 08-Nov-2001  chs in both paths that can cause fragments to be expanded (write and truncate-up),
deal with the fragment expansion separately before the rest of the operation.
this allows us to simplify ufs_balloc_range() by not worrying about implicit
fragment expansion.

call VOP_PUTPAGES() directly for vnodes instead of
going through the UVM pager "put" vector.
 1.37 08-Nov-2001  lukem add RCSID. (note; this file gets #included)
 1.36 03-Oct-2001  chs branches: 1.36.2;
don't do any flush-behind for async mounts.
this matches the traditional behaviour.
 1.35 30-Sep-2001  chs process one block at a time even when we're using the write fast path
that avoids zeroing pages. this avoids a mess when we get ENOSPC and
softdeps are enabled.
 1.34 16-Sep-2001  chs make LFS work again.
 1.33 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.32 13-Jul-2001  perseant branches: 1.32.2;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length; it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.31 26-Mar-2001  chs branches: 1.31.2; 1.31.4;
work around a problem with sync writes vs. softdeps.
 1.30 27-Feb-2001  chs branches: 1.30.2;
min() -> MIN(), max() -> MAX().
fixes more problems with file offsets > 4GB.
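
A sketch of why a function-style minimum can misbehave beyond 4GB, assuming the helper takes 32-bit unsigned arguments. `ex_min32`, `EX_MIN` and the two wrappers are hypothetical names for illustration; the macro compares in the operands' own 64-bit type, while the function truncates first.

```c
#include <stdint.h>

/* Function-style minimum with 32-bit parameters (assumed shape). */
static uint32_t
ex_min32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Macro minimum: the comparison happens in the type of the operands. */
#define EX_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Bytes to transfer, computed through the narrow helper. */
static int64_t
ex_chunk_narrow(int64_t remaining, int64_t want)
{
	/* Both values are reduced modulo 2^32 before being compared. */
	return (int64_t)ex_min32((uint32_t)remaining, (uint32_t)want);
}

/* Bytes to transfer, computed with the type-preserving macro. */
static int64_t
ex_chunk_wide(int64_t remaining, int64_t want)
{
	return EX_MIN(remaining, want);
}
```

With 4GB + 100 bytes remaining and an 8192-byte request, the narrow version sees only the low 32 bits of the remaining count and returns 100, while the wide version returns 8192.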
 1.29 26-Feb-2001  lukem some KNF
 1.28 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.27 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.26 27-May-2000  perseant branches: 1.26.4;
Prevent dirops from getting around lfs_check and wedging the buffer cache.
All the dirop vnops now mark the inodes with a new flag, IN_ADIROP, which
is removed as soon as the dirop is done (as opposed to VDIROP which stays
until the file is written). To address one issue raised in PR#9357.
 1.25 13-May-2000  perseant Change the semantics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be waited for, or not, accordingly.

Closes PR#8996.
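
A rough sketch of the flag-based semantics described in this entry. The flag names come from the commit message, but the numeric values, the `EX_` prefix, and the `sync_dirops` policy parameter are assumptions made for illustration, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical values; only the flag names appear in the log. */
#define EX_UPDATE_WAIT	0x01	/* caller would like the update to be synchronous */
#define EX_UPDATE_DIROP	0x02	/* hint: the call comes from a directory operation */

/*
 * Decide whether an fs-specific update routine should block.
 * sync_dirops models a per-filesystem policy: true for a filesystem
 * that wants synchronous directory operations, false for one (such as
 * a log-structured or soft-dependency design) that does not.
 */
static bool
ex_update_should_wait(int flags, bool sync_dirops)
{
	if (flags & EX_UPDATE_DIROP)
		return sync_dirops;
	return (flags & EX_UPDATE_WAIT) != 0;
}
```

The point of the design is that the caller no longer needs a filesystem-specific test around each call; the hint travels in the flags and the filesystem decides.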
 1.24 30-Mar-2000  augustss Remove register declarations.
 1.23 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.22 24-Mar-1999  mrg branches: 1.22.4; 1.22.8; 1.22.10; 1.22.14;
completely remove Mach VM support. all that is left is all the
header files as UVM still uses (most of) these.
 1.21 10-Mar-1999  perseant Added flags to lfs_check call
 1.20 05-Mar-1999  mycroft Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.19 10-Feb-1999  bouyer Make sure a buffer obtained from bread() is always brelse()'d in case of
error. Closes PR kern/1448 from Wolfgang Solfrank.
 1.18 02-Aug-1998  kleink branches: 1.18.2;
Implement support for IEEE Std 1003.1b-1993 synchronous I/O:
* in the read vnode operator, check for IO_SYNC being set in the ioflag and
synchronously update the file's meta-data if appropriate.
* in the write vnode operator, update the appropriate checks for IO_SYNC being
set in the ioflag to reflect that IO_DSYNC is now inclusive-or'ed into
IO_SYNC, and require all IO_SYNC bits to be set for operations defined by
synchronized I/O file integrity completion but not by synchronized I/O data
integrity completion.
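
A sketch of the bit test this entry describes, where the data-integrity flag is included in the file-integrity flag. The numeric values and the `EX_` names are assumptions for illustration; only the inclusive-or relationship is taken from the log.

```c
#include <stdbool.h>

/* Hypothetical bit values; EX_IO_SYNC includes the EX_IO_DSYNC bit. */
#define EX_IO_DSYNC	0x080
#define EX_IO_SYNC	(0x040 | EX_IO_DSYNC)

/* Data integrity completion: any synchronous request qualifies. */
static bool
ex_wants_data_integrity(int ioflag)
{
	return (ioflag & EX_IO_DSYNC) != 0;
}

/* File integrity completion: every bit of EX_IO_SYNC must be set. */
static bool
ex_wants_file_integrity(int ioflag)
{
	return (ioflag & EX_IO_SYNC) == EX_IO_SYNC;
}
```

A plain `ioflag & EX_IO_SYNC` test would also be true for a data-only request, which is why the stricter equality check is needed for work that only file integrity completion requires.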
 1.17 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.16 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.15 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.14 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.13 04-Jul-1997  drochner Don't cast 64bit (off_t) file sizes to vm_offset_t (32bit on many
architectures), truncate them intelligently instead.
The truncation is done centralized in vnode_pager.c.
This prevents wrap-over effects when parts of large (>2^32 byte) files
are mmapped.
Don't allow mmap above the numerical range of vm_offset_t.
This is considered a temporary solution until the vm system handles the
object sizes/offsets more cleanly.
 1.12 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added an "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now an union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the time of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.11 04-Apr-1997  kleink Return immediately upon zero byte reads, as updating st_atime in this case
violates POSIX.1 read() semantics.
 1.10 30-Jan-1997  tls add support for noatime mount flag
 1.9 11-May-1996  mycroft Change VOP_UPDATE() semantics:
* Make 2nd and 3rd args timespecs, not timevals.
* Consistently pass a Boolean as the 4th arg (except in LFS).
Also, fix ffs_update() and lfs_update() to actually change the nsec fields.
 1.8 09-Feb-1996  christos ufs prototype changes
 1.7 24-Jul-1995  cgd avoid unnecessary aging of buffers. This used to make sense, when buffer
caches were much smaller, but makes little sense now, and will become more
useless as RAM (and buffer cache) sizes grow. Suggested by Bob Baron.
 1.6 24-Mar-1995  cgd explicitly cast &time to (struct timeval *) when passing it to VOP_UPDATE.
new prototypes and picky compilers make a volatile mess.
 1.5 14-Dec-1994  mycroft Sync with CSRG.
 1.4 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.3 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.2 14-Jun-1994  mycroft Fix compatibility with old fastlinks.
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.18.2.5 30-May-1999  chs track ffs_balloc() interface change.
 1.18.2.4 30-Apr-1999  chs change ubc_alloc()'s length arg to be a pointer instead of the value.
the pointed-to value is the total desired length on input,
and is updated to the length that will fit in the returned window.
this allows callers of ubc_alloc() to be ignorant of the window size.
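
A sketch of the in/out length convention described in this entry. This is not ubc_alloc() itself: `ex_win_alloc`, `ex_count_windows` and the 8192-byte `EX_WIN_SIZE` are hypothetical stand-ins that only model how a pointer argument lets the callee clip the request to what fits in one window.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical window size; callers never need to know it. */
#define EX_WIN_SIZE 8192u

/*
 * On input *lenp is the total length the caller wants; on return it is
 * the length that fits in the window containing offset. The return
 * value is the offset within that window.
 */
static size_t
ex_win_alloc(uint64_t offset, size_t *lenp)
{
	size_t winoff = (size_t)(offset % EX_WIN_SIZE);
	size_t avail = EX_WIN_SIZE - winoff;

	if (*lenp > avail)
		*lenp = avail;
	return winoff;
}

/* Walk a byte range window by window and count the windows used. */
static unsigned
ex_count_windows(uint64_t offset, size_t total)
{
	unsigned n = 0;

	while (total > 0) {
		size_t len = total;	/* ask for everything that is left */

		(void)ex_win_alloc(offset, &len);
		offset += len;		/* advance by what actually fit */
		total -= len;
		n++;
	}
	return n;
}
```

The loop simply asks for the whole remainder each time and advances by whatever length comes back, so the window size stays an internal detail of the allocator.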
 1.18.2.3 29-Apr-1999  chs move updating of i_ffs_size to ffs_balloc_range().
 1.18.2.2 25-Feb-1999  chs move ffs_mballoc() (now ffs_balloc_range()) to ffs_balloc.c.
 1.18.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.22.14.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.22.10.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.22.8.4 27-Mar-2001  bouyer Sync with HEAD.
 1.22.8.3 12-Mar-2001  bouyer Sync with HEAD.
 1.22.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.22.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.22.4.7 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
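
A sketch of the sentinel scheme in the last item above, assuming a signed 64-bit address type. `ex_daddr_t`, `EX_UNASSIGNED` and `EX_UNWRITTEN` are hypothetical names; the values -1 and -2 and the switch from an equality test to a sign test come from the entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical signed disk-address type. */
typedef int64_t ex_daddr_t;

#define EX_UNASSIGNED	((ex_daddr_t)-1)	/* no block accounted for */
#define EX_UNWRITTEN	((ex_daddr_t)-2)	/* accounted for, no fixed location yet */

/*
 * True only for real disk addresses. Testing the sign instead of
 * comparing against -1 treats every negative magic value the same way.
 */
static bool
ex_has_disk_location(ex_daddr_t addr)
{
	return addr >= 0;
}
```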
 1.22.4.6 31-Jul-1999  chs in WRITE(), VOP_BALLOC() no longer does what we want, so switch to using
ffs_balloc1().
 1.22.4.5 11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.22.4.4 04-Jul-1999  chs use VOP_BALLOC(). remove some clutter.
 1.22.4.3 02-Jul-1999  thorpej Take two at making a non-converted LFS work in a UBC kernel.
 1.22.4.2 22-Jun-1999  thorpej Put back the calls to cluster_*(), but #if 0 them out. This is mostly
for use as a reference point.
 1.22.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.26.4.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.30.2.9 29-Dec-2002  thorpej Sync with HEAD.
 1.30.2.8 11-Nov-2002  nathanw Catch up to -current
 1.30.2.7 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.30.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.30.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.30.2.4 08-Oct-2001  nathanw Catch up to -current.
 1.30.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.30.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.30.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.31.4.7 26-Sep-2002  jdolecek WRITE: make sure to set 'extended' in the part of code which handles regular
files, too
 1.31.4.6 25-Sep-2002  jdolecek need to initialize 'extended' so that the NOTE_EXTEND flag would
be set correctly in non-extend case
(FreeBSD ufs/ffs/ffs_vnops.c rev 1.95 has this bug too)
 1.31.4.5 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.31.4.4 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.31.4.3 08-Sep-2001  thorpej Centralize the definition of VN_KNOTE().
 1.31.4.2 03-Aug-2001  lukem update to -current
 1.31.4.1 10-Jul-2001  lukem * implement ufs_kqfilter(), filt_ufs*()
* add KNOTE(9) calls as appropriate
 1.31.2.1 13-Jul-2001  perseant Lingering change from the block addressing changeover: reserve the correct
number of bytes according to the number of fs blocks we may need to write.
 1.32.2.2 11-Oct-2001  fvdl Catch up with -current. Fix some bogons in the sparc64 kbd/ms
attach code. cd18xx conversion provided by mrg.
 1.32.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.36.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.42.4.1 21-Oct-2002  lukem Pull up revision 1.43 (requested by yamt in ticket #926):
make sure to update the vnode's size even if uiomove failed.
otherwise, softdep states can't be flushed later.
ok'ed by Chuck Silvers. fix PR/16670.
 1.54.2.10 11-Dec-2005  christos Sync with head.
 1.54.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.54.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.54.2.7 17-Jan-2005  skrll Sync with HEAD.
 1.54.2.6 30-Oct-2004  skrll Re-arrange code slightly.
 1.54.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.54.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.54.2.3 25-Aug-2004  skrll Sync with HEAD.
 1.54.2.2 03-Aug-2004  skrll Sync with HEAD
 1.54.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.55.4.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
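
The clamping of the free-block count described in the kernel list above can be sketched as follows. `ex_clamp_bfree` is a hypothetical helper, not the code in lfs_statfs(); it only shows why a signed count must be bounded before it is reported, since a negative value reinterpreted as unsigned would look like an enormous amount of free space.

```c
#include <stdint.h>

/*
 * Bound a signed free-block count to the range [0, dsize] before it is
 * reported to userland.
 */
static int64_t
ex_clamp_bfree(int64_t bfree, int64_t dsize)
{
	if (bfree < 0)
		return 0;
	if (bfree > dsize)
		return dsize;
	return bfree;
}
```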

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.60.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.60.2.1 29-Apr-2005  kent sync with -current
 1.61.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.63.4.1 20-Oct-2005  yamt adapt ufs.
 1.63.2.6 21-Jan-2008  yamt sync with head
 1.63.2.5 27-Oct-2007  yamt sync with head.
 1.63.2.4 03-Sep-2007  yamt sync with head.
 1.63.2.3 26-Feb-2007  yamt sync with head.
 1.63.2.2 30-Dec-2006  yamt sync with head.
 1.63.2.1 21-Jun-2006  yamt sync with head.
 1.64.2.3 19-Nov-2005  yamt - as read-ahead context is per-vnode now,
there are fewer reasons to make VOP_READ call uvm_ra_request explicitly.
move it to pager (uvn_get) so that it can handle accesses via mmap as well.
- pass advice to pager via ubc.
- tweak DPRINTF.

XXX can be disturbed by PGO_LOCKED.

XXX it's controversial where it should be done.
(uvm_fault, uvn_get or genfs_getpages.)
 1.64.2.2 18-Nov-2005  yamt - associate read-ahead context to vnode, rather than file.
- revert VOP_READ prototype.
 1.64.2.1 15-Nov-2005  yamt adapt ffs, lfs, nfs.
 1.66.6.2 01-Jun-2006  kardel Sync with head.
 1.66.6.1 22-Apr-2006  simonb Sync with head.
 1.66.4.1 09-Sep-2006  rpaulo sync with head
 1.66.2.1 31-Dec-2005  yamt adapt some random parts of kernel to uio_vmspace.
 1.67.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.67.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.67.2.1 24-May-2006  yamt sync with head.
 1.68.10.1 22-Oct-2006  yamt sync with head
 1.68.8.3 01-Feb-2007  ad Sync with head.
 1.68.8.2 12-Jan-2007  ad Sync with head.
 1.68.8.1 18-Nov-2006  ad Sync with head.
 1.74.2.3 17-May-2007  yamt sync with head.
 1.74.2.2 07-May-2007  yamt sync with head.
 1.74.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.76.6.1 11-Jul-2007  mjf Sync with head.
 1.76.4.8 09-Oct-2007  ad Sync with head.
 1.76.4.7 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.76.4.6 20-Aug-2007  ad Sync with HEAD.
 1.76.4.5 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.76.4.4 09-Jun-2007  ad Sync with head.
 1.76.4.3 08-Jun-2007  ad Sync with head.
 1.76.4.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.76.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.79.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.81.10.2 27-Jul-2007  yamt use ubc_uiomove for read as well.
 1.81.10.1 27-Jul-2007  yamt file ufs_readwrite.c was added on branch matt-mips64 on 2007-07-27 10:00:43 +0000
 1.81.8.2 14-Oct-2007  yamt sync with head.
 1.81.8.1 06-Oct-2007  yamt sync with head.
 1.81.6.2 09-Jan-2008  matt sync with HEAD
 1.81.6.1 06-Nov-2007  matt sync with HEAD
 1.81.4.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.81.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.81.4.1 02-Oct-2007  joerg Sync with HEAD.
 1.84.6.2 26-Dec-2007  ad Sync with head.
 1.84.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.84.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.85.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.86.8.1 18-May-2008  yamt sync with head.
 1.86.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.86.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.86.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.87.4.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.87.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.87.2.2 11-Aug-2010  yamt sync with head.
 1.87.2.1 04-May-2009  yamt sync with head.
 1.88.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.88.4.1 19-Oct-2008  haad Sync with HEAD.
 1.88.2.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.92.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.92.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.93.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.94.4.3 21-Apr-2011  rmind sync with head
 1.94.4.2 30-May-2010  rmind sync with head
 1.94.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.94.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.95.4.1 18-Feb-2011  bouyer Add a new inode flag, SF_SNAPINVAL, to be set on SF_SNAPSHOT inodes when
the snapshot is invalid.
Set SF_SNAPSHOT | SF_SNAPINVAL early when initializing a snapshot inode,
so that quotas are bypassed for allocations on this inode.
Set SF_SNAPSHOT | SF_SNAPINVAL (instead of clearing SF_SNAPSHOT) when
expunge()ing a snapshot inode, so that userland tools working on the
snapshot (e.g. fsck or dump) can properly handle this inode.

The main point at this time is to have fsck_ffs -X properly compute quotas;
as a bonus persistent snapshot files won't show up in a dump(8) from a
snapshot.

This may also help speed up taking snapshots, by bypassing expunge()
for snapshot inodes completely (but this needs more thought).


Briefly discussed with hannken@ in private mail.
 1.95.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.96.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.99.2.6 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.99.2.5 23-Jan-2013  yamt sync with head
 1.99.2.4 23-May-2012  yamt sync with head.
 1.99.2.3 17-Apr-2012  yamt sync with head
 1.99.2.2 25-Jan-2012  yamt uvm_loanobj: take an access pattern hint.
 1.99.2.1 04-Jan-2012  yamt enable O->A loaning read for a few filesystems.
 1.100.4.4 02-Jun-2012  mrg sync to latest -current.
 1.100.4.3 29-Apr-2012  mrg sync to latest -current.
 1.100.4.2 05-Apr-2012  mrg sync to latest -current.
 1.100.4.1 18-Feb-2012  mrg merge to -current.
 1.101.2.1 07-May-2012  riz Pull up following revision(s) (requested by chs in ticket #204):
sys/fs/sysvbfs/sysvbfs_vnops.c: revision 1.44
sys/ufs/ffs/ffs_vfsops.c: revision 1.277
sys/fs/v7fs/v7fs_vnops.c: revision 1.11
sys/ufs/chfs/chfs_vnops.c: revision 1.7
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.61
sys/miscfs/genfs/genfs_io.c: revision 1.54
sys/kern/vfs_wapbl.c: revision 1.52
sys/uvm/uvm_pager.h: revision 1.43
sys/ufs/ffs/ffs_vnops.c: revision 1.121
sys/kern/vfs_subr.c: revision 1.434
sys/fs/msdosfs/msdosfs_vnops.c: revision 1.83
sys/fs/ntfs/ntfs_vnops.c: revision 1.51
sys/fs/udf/udf_subr.c: revision 1.119
sys/miscfs/specfs/spec_vnops.c: revision 1.135
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.103
sys/fs/udf/udf_vnops.c: revision 1.71
sys/ufs/ufs/ufs_readwrite.c: revision 1.104
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
mark all wapbl I/O as BPRIO_TIMECRITICAL.
this is the second part of addressing PR 46325.
 1.104.2.6 03-Dec-2017  jdolecek update from HEAD
 1.104.2.5 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.104.2.4 23-Jun-2013  tls resync from head
 1.104.2.3 25-Feb-2013  tls resync with head
 1.104.2.2 10-Feb-2013  tls Add an accessor -- ufs_maxphys() -- to check the maximum transfer size
for a given UFS mountpoint, and move the code from mount that finds
the underlying disk and resets the mountpoint max transfer size into a
utility function, ufs_update_maxphys().

Add a global serial number that counts disk property changes to which
filesystems are meant to accommodate themselves. Make ufs_maxphys()
check it. This is a sort of flag-polling interface that avoids callbacks
into the filesystem code, but will require freezing filesystems and
draining in-flight transactions before a decrease in size that is
mandatory (like attaching a disk with a smaller maximum transfer size
as a spare in a RAIDframe set), rather than "advisory", like finding
out set geometry from a RAID controller long after boot and deciding
a smaller transfer size would be optimal, can be signalled. Still, the
"advisory" case is the common one so this is progress.

Make a bit of an example of RAIDframe by making it bump this new
serial number when disks are added to the subsystem. I will attack
one of the hardware RAID drivers (probably arcmsr) next.
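
The flag-polling idea described above can be sketched in a few lines of C.
This is only an illustration under stated assumptions: the names
disk_props_serial, device_maxphys, struct mount_sketch, mount_attach() and
mount_maxphys() are hypothetical and are not the branch's actual code, and
locking is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global serial number, bumped on any disk property change. */
unsigned int disk_props_serial;

/* Hypothetical stand-in for the underlying device's current limit. */
size_t device_maxphys = 64 * 1024;

/* Hypothetical per-mount state: a cached limit plus the serial it was read at. */
struct mount_sketch {
    unsigned int seen_serial;
    size_t maxphys;
};

/* Called by a driver (for example a RAID layer) when its properties change. */
void
disk_props_changed(void)
{
    disk_props_serial++;
}

/* Record the device limit and the serial number it corresponds to. */
void
mount_attach(struct mount_sketch *mp)
{
    assert(mp != NULL);
    mp->maxphys = device_maxphys;
    mp->seen_serial = disk_props_serial;
}

/* Accessor: re-read the device limit only if the serial number moved. */
size_t
mount_maxphys(struct mount_sketch *mp)
{
    if (mp->seen_serial != disk_props_serial) {
        mp->maxphys = device_maxphys;
        mp->seen_serial = disk_props_serial;
    }
    return mp->maxphys;
}
```

The accessor never receives a callback; it only notices a change the next
time it is called, which is why a mandatory decrease would still need the
filesystem to be frozen and drained first.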
 1.104.2.1 14-Oct-2012  tls In the FFS writebehind code (ufs_readwrite.c:WRITE()), use division
and multiplication instead of shifts, to accommodate devices with
MAXPHYS that is a multiple of the page size, but not a power of 2.

MegaRAID neatly writes out 192k chunks now.

An open question: is it really a good idea to always writebehind
at the largest size supported by the device? Likely not, as this
could have a major impact on I/O fairness. OS X and Solaris both
seem to limit transfers to 128k likely for this reason (the same
problem exists with the readahead code but since it is adaptive,
it will not *always* do huge transfers).

However, simply imposing a smallish limit like 128k here seems
like a bad idea because then we cannot accommodate greedy devices
like RAID, for which you want something like 128k * number of
data components. Hmmmmmm.
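
Why shifts fail for a 192k chunk can be shown with a small sketch. The
function names chunk_round_down() and chunk_round_down_shift() are
hypothetical and are not taken from ufs_readwrite.c; the sketch only
contrasts the two rounding methods.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a byte offset down to a multiple of chunk using division and
 * multiplication. This works for any chunk > 0, including sizes such as
 * 192k that are a multiple of the page size but not a power of two.
 */
uint64_t
chunk_round_down(uint64_t off, uint64_t chunk)
{
    assert(chunk > 0);
    return (off / chunk) * chunk;
}

/*
 * Shift-based variant. It can only express chunk sizes of the form
 * 1 << shift, so a 192k chunk has no correct shift value.
 */
uint64_t
chunk_round_down_shift(uint64_t off, unsigned int shift)
{
    return (off >> shift) << shift;
}
```

For an offset of 200000 and a 192k (196608 byte) chunk, the division form
yields 196608, while the nearest smaller shift (17, i.e. 128k) yields
131072, which is not a 192k boundary.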
 1.107.10.3 28-Aug-2017  skrll Sync with HEAD
 1.107.10.2 06-Jun-2015  skrll Sync with HEAD
 1.107.10.1 06-Apr-2015  skrll Sync with HEAD
 1.120.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.120.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.121.14.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.121.14.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.121.14.1 10-Jun-2019  christos Sync with HEAD
 1.124.4.1 29-Feb-2020  ad Sync with head.
 1.125.4.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.128.10.1 02-Aug-2025  perseant Sync with HEAD
 1.14 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.,
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(); it uses a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.13 28-Oct-2016  jdolecek reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
succeed; change wapbl_register_deallocation() to return EAGAIN
rather than panic when code hits the limit

callers changed to either loop calling ffs_truncate() using new
utility ufs_truncate_retry() if their semantics requires it, or
just ignore the failure; remove ufs_wapbl_truncate()

this fixes possible user-triggerable panic during truncate, and
resolves WAPBL performance issue with truncates of large files

PR kern/47146 and kern/49175
 1.12 27-Mar-2015  riastradh branches: 1.12.2;
Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.11 25-May-2014  hannken branches: 1.11.4;
ufs_gro_genealogy: use vcache_get() to lookup DOTDOT.
 1.10 06-Feb-2014  hannken branches: 1.10.2;
Move fstrans_start()/fstrans_done() into genfs_insane_rename() to protect
the complete rename operation like we do for all other vnode operations.
 1.9 04-Nov-2013  christos Add 2 XXX: gcc initializations
 1.8 19-Jun-2013  dholland branches: 1.8.2;
Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.7 09-Jun-2013  dholland Stick UFS_ in front of these symbols:
DIRBLKSIZ
DIRECTSIZ
DIRSIZ
OLDDIRFMT
NEWDIRFMT

Part of PR 47909.
 1.6 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.5 04-Jun-2012  riastradh branches: 1.5.2;
Kill the IN_RENAME in-core inode flag in ufs and ext2fs.

Now that rename works we need not wave this sort of voodoo at it.

ok dholland
 1.4 04-Jun-2012  riastradh Fix typo in comment: bp->b_bcount, not bp->b_count.
 1.3 04-Jun-2012  riastradh Kill scary message about cross-block directories and fix its cause.

Add a bunch of kasserts to check more stringently that ufs_direnter
did not compact across directory blocks. Don't bother fetching
subsequent I/O blocks from the directory: ufs_lookup guarantees that
it's not necessary, and the kasserts check this to be sure.

The message fired when we were looking at the start of an I/O block,
not when we crossed from the end of one to the start of another. I
believe it fired only when tulr->ulr_offset was a multiple of the I/O
block size (fs_bsize), which can happen if ufs_lookup either finds an
entry or finds free space at the start of an I/O block.

If ufs_lookup found an entry, none of this ulr recalculation logic
should kick in -- if tvp != NULL, then tulr->ulr_count is garbage, so
it's not merely unnecessary but wrong (although I suspect harmless in
the end) to read it in ufs_rename_overlap_p in consideration of
whether to recalculate fulr.

Discussed with chuq and dholland.

ok dholland
 1.2 10-May-2012  riastradh branches: 1.2.2; 1.2.4;
Disable scary but probably harmless printf.

Still need to find why this harmless-but-shouldn't-happen case is
happening, but in the meantime, we can stop scaring people with it.
 1.1 09-May-2012  riastradh Adapt ffs, lfs, and ext2fs to use genfs_rename.

ok dholland, rmind
 1.2.4.2 02-Jun-2012  mrg sync to latest -current.
 1.2.4.1 10-May-2012  mrg file ufs_rename.c was added on branch jmcneill-usbmp on 2012-06-02 11:09:41 +0000
 1.2.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.2.4 23-Jan-2013  yamt sync with head
 1.2.2.3 30-Oct-2012  yamt sync with head
 1.2.2.2 23-May-2012  yamt sync with head.
 1.2.2.1 10-May-2012  yamt file ufs_rename.c was added on branch yamt-pagecache on 2012-05-23 10:08:20 +0000
 1.5.2.4 03-Dec-2017  jdolecek update from HEAD
 1.5.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.2.2 23-Jun-2013  tls resync from head
 1.5.2.1 25-Feb-2013  tls resync with head
 1.8.2.1 18-May-2014  rmind sync with head
 1.10.2.1 10-Aug-2014  tls Rebase.
 1.11.4.2 05-Dec-2016  skrll Sync with HEAD
 1.11.4.1 06-Apr-2015  skrll Sync with HEAD
 1.12.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.1 30-Mar-2007  mjf branches: 1.1.2;
file ufs_trans.c was initially added on branch mjf-ufs-trans.
 1.1.2.1 30-Mar-2007  mjf Add initial implementation of transaction API.
 1.1 30-Mar-2007  mjf branches: 1.1.2;
file ufs_trans.h was initially added on branch mjf-ufs-trans.
 1.1.2.1 30-Mar-2007  mjf Add initial implementation of transaction API.
 1.61 22-Feb-2023  riastradh ufs: Nix trailing whitespace and tidy up some other minor KNF.
 1.60 01-May-2020  hannken There is no difference between a zero-sized and not yet
reclaimed directory vnode and a non-existent vnode.

Teach ufs_fhtovp() to treat zero-sized directories as stale.

PR kern/55211 (fs/vfs/t_vnops:nfs_dir_rmdirdotdot test fails)
 1.59 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.58 22-Dec-2019  ad branches: 1.58.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.
 1.57 20-Jun-2019  pgoyette Split the ufs code out of the ffs module and into its own module.

Adapt chfs and ext2fs modules accordingly.
 1.56 10-Dec-2018  maxv Remove unused mbuf.h includes.
 1.55 17-Apr-2017  hannken branches: 1.55.10; 1.55.12;
Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.54 17-Mar-2015  hannken branches: 1.54.2; 1.54.4;
Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.53 08-May-2014  hannken branches: 1.53.4;
Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41
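
The get-or-load behaviour described above can be illustrated with a toy
cache. Everything here is a hypothetical sketch: struct node_sketch,
cache_get(), cache_remove() and load_node() are invented names, the table
is a fixed array rather than a hash, and the exclusivity guarantee of the
real interface is not modelled because the sketch is single-threaded.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 8

/* Hypothetical cached node: a key, a reference count and an in-use flag. */
struct node_sketch {
    unsigned long key;
    int refcnt;
    int used;
};

struct node_sketch cache[CACHE_SLOTS];
int loads;      /* how many times the loader ran (cache misses) */

/* Stand-in for the per-filesystem "load" operation run on a miss. */
void
load_node(struct node_sketch *n, unsigned long key)
{
    n->key = key;
    n->refcnt = 0;
    n->used = 1;
    loads++;
}

/* Return a referenced node for key, loading it on a cache miss. */
struct node_sketch *
cache_get(unsigned long key)
{
    struct node_sketch *free_slot = NULL;
    size_t i;

    for (i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].used && cache[i].key == key) {
            cache[i].refcnt++;      /* hit: just take a reference */
            return &cache[i];
        }
        if (!cache[i].used && free_slot == NULL)
            free_slot = &cache[i];
    }
    if (free_slot == NULL)
        return NULL;                /* table full */
    load_node(free_slot, key);      /* miss: initialise exactly once */
    free_slot->refcnt++;
    return free_slot;
}

/* Drop the node from the cache so a later get reloads it. */
void
cache_remove(struct node_sketch *n)
{
    assert(n != NULL);
    n->used = 0;
}
```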
 1.52 22-Jan-2013  dholland branches: 1.52.2; 1.52.10;
Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.51 04-Apr-2012  tron branches: 1.51.2;
Assert that we get a valid inode when looking up a file handle.
 1.50 01-Feb-2012  dholland Change the syscall API for quotas over to the new non-proplib one.

- struct vfs_quotactl_args -> struct quotactl_args
- add sys/stdint.h to sys/quotactl.h for clean userland build
- install sys/quotactl.h in /usr/include
- update set lists for same
- add new marshalling code in libquota
- add new unmarshalling code in vfs_syscalls.c
- discard proplib interpreter code in vfs_quotactl.c
- add dispatching code for the 14 quotactl ops in vfs_quotactl.c
- mark the proplib quotactl syscall obsolete
- add a new syscall number for the new quotactl syscall
- change the name of the syscall to __quotactl()
- remove the decl of the old quotactl from quota/quotaprop.h
- add a decl of the new quotactl to sys/quotactl.h
- update the libc build
- update ktruss
- remove proplib marshalling code from libquota
- update copy of syscall table in gdb ppc sources
- hack rumphijack to accommodate new quotactl name (as I recall,
pooka wanted such a name change to simplify something, but I
don't really see what/how)

This change appears to require a kernel version bump for rumpish
reasons.
 1.49 29-Jan-2012  tsutsui Fix errors in !defined(QUOTA) && !defined(QUOTA2) case.
 1.48 29-Jan-2012  dholland Remove references to <quota/quotaprop.h> in src/sys/ufs.
The remaining references in the kernel are in vfs_quotactl.c, the
compat_50 code for the old quotactl (to be fixed up), and the
code compiled from src/common/lib/libquota.
 1.47 29-Jan-2012  dholland Remove the extra op argument to VFS_QUOTACTL() - the op is now stored
purely in the args structure.

This change requires a kernel version bump.
 1.46 29-Jan-2012  dholland Introduce struct vfs_quotactl_args. Use it.

This change uglifies vfs_quotactl some in order to make room for
moving operation-specific but FS-independent logic out of ufs_quota.c.

Note: this change requires a kernel version bump.
 1.45 29-Jan-2012  dholland Move the proplib-based quota command dispatching (that is, the code
that knows the magic string names for the allowed actions) out of
UFS-specific code and to fs-independent code.

This introduces QUOTACTL_* operation codes and changes the signature
of VFS_QUOTACTL() again for compile safety.

Note: this change requires a kernel version bump.
 1.44 29-Jan-2012  dholland Move the code for iterating over the multiple RPC calls in a quota
proplib XML packet to vfs_quotactl.c out of sys/ufs/ufs.

Add a dummy extra arg to VFS_QUOTACTL for compile safety.

Note: this change requires a kernel version bump.
 1.43 27-Jan-2012  para converting readdir in ffs ext2fs from malloc(9) to kmem(9)
while there allocate ufs mount structs from kmem(9) too
preceding kmem-vmem-pool-patch

releng@ acknowledged
 1.42 24-Mar-2011  bouyer branches: 1.42.4; 1.42.8;
Add a new libquota library, which contains some blocks to build and/or
parse quota plists; as well as a getfsquota() function to retrieve quotas
for a single id from a single filesystem (whatever filesystem this is:
a local quota-enabled fs or NFS). This is built on functions getufsquota()
(for local filesystems with UFS-like quotas) and getnfsquota();
which are also available to userland programs.
move functions from quota2_subr.c to libquota or libprop as appropriate,
and adjust in-tree quota tools.
move some declarations from kernel headers to either sys/quota.h or
quota/quota.h as appropriate. ufs/ufs/quota.h still installed because
it's needed by other installed ufs headers.
ufs/ufs/quota1.h still installed as a quick&dirty way to get a code
using the old quotactl() to compile (just include ufs/ufs/quota1.h instead of
ufs/ufs/quota.h - old code won't compile without this change and this is
on purpose).
Discussed on tech-kern@ and tech-net@ (long thread, but not much about
libquota itself ...)
 1.41 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.40 07-May-2009  elad branches: 1.40.4; 1.40.6; 1.40.8;
Introduce several actions/requests for authorizing file-system related
operations, specifically quota and block allocation from reserved space.

Modify ufs_quotactl() to accommodate passing "mp" earlier by vfs_busy()ing
it a little bit higher.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/04/26/msg004936.html

Note that the umapfs request mentioned in this thread was NOT added as
there is still on-going discussion regarding the proper implementation.
 1.39 06-May-2008  ad branches: 1.39.14;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.38 30-Apr-2008  ad PR kern/38135 vfs_busy/vfs_trybusy confusion

The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
 1.37 30-Jan-2008  ad branches: 1.37.6; 1.37.8; 1.37.10;
PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
 1.36 03-Jan-2008  ad Use pool_cache.
 1.35 26-Nov-2007  pooka branches: 1.35.6;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.34 30-Jun-2007  pooka branches: 1.34.6; 1.34.8; 1.34.14;
Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.33 12-Mar-2007  ad branches: 1.33.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.32 04-Jan-2007  elad branches: 1.32.2; 1.32.6;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.31 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.30 12-Oct-2006  thorpej ufs_quotactl(): consume the arguments even if QUOTAS is not defined.
 1.29 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.28 23-Jul-2006  ad branches: 1.28.4; 1.28.6;
Use the LWP cached credentials where sane.
 1.27 14-May-2006  elad integrate kauth.
 1.26 11-Dec-2005  christos branches: 1.26.4; 1.26.6; 1.26.8; 1.26.10; 1.26.12;
merge ktrace-lwp.
 1.25 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.24 10-Jul-2005  thorpej Defflag UFS_DIRHASH.
 1.23 10-Jul-2005  thorpej - Use ANSI function decls.
- Sprinkle some static.
 1.22 23-Jan-2005  rumble branches: 1.22.8;
Bring in Ian Dowse's Dirhash from FreeBSD. Hash tables of
directories are created on the fly and used to increase
performance by circumventing ufs_lookup's linear search.

Dirhash is enabled by the UFS_DIRHASH option, but not
by default.
 1.21 20-Dec-2004  dbj branches: 1.21.2;
use #if defined(_KERNEL_OPT) around opt includes
fix arg to pool_init() when _LKM is defined
 1.20 20-Jun-2004  hannken Use a pool for struct direct instead of kernel stack.
Reduces the kernel stack usage by 264 bytes.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.19 27-Apr-2004  jrf First pass for some caddr_t removal and changes to get rid of it where we
no longer use and/or need it

- removed casts from unionfs, deadfs and fdesc
(there are more to hunt down still)
- changed vfs_quotactl args argument from caddr_t to void *
- changed vfs_quotactl structures/callers to reflect the api change

Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
 1.18 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.17 29-Jun-2003  fvdl branches: 1.17.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.16 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.15 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.14 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.13 08-Nov-2001  lukem branches: 1.13.12;
add RCSID
 1.12 15-Sep-2001  chs branches: 1.12.2;
add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.11 30-Mar-2000  augustss branches: 1.11.6; 1.11.10; 1.11.12;
Remove register declarations.
 1.10 16-Mar-2000  jdolecek Change ufs_init() to keep global count of how many times it was called.
Resources are initialized still just once (on first call).

Add ufs_done(), which takes care of freeing all resources allocated in
ufs_init(). The resources are freed only when last user of the code exits.
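
The counted init/done pairing described above follows a common pattern,
sketched here under assumptions: fs_init(), fs_done(), fs_init_count and
fs_table are hypothetical names, and a single heap allocation stands in for
whatever resources the real ufs_init() sets up.

```c
#include <assert.h>
#include <stdlib.h>

int fs_init_count;      /* how many users have called fs_init() */
void *fs_table;         /* stands in for the shared resources */

/* Initialise shared resources on the first call only. Returns 0 or -1. */
int
fs_init(void)
{
    if (fs_init_count > 0) {
        fs_init_count++;    /* already set up, just count the user */
        return 0;
    }
    fs_table = calloc(64, sizeof(void *));
    if (fs_table == NULL)
        return -1;
    fs_init_count = 1;
    return 0;
}

/* Release shared resources only when the last user is done. */
void
fs_done(void)
{
    assert(fs_init_count > 0);
    if (--fs_init_count > 0)
        return;
    free(fs_table);
    fs_table = NULL;
}
```

With two users, the first fs_done() leaves the table in place and only the
second one frees it.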
 1.9 26-Feb-1999  wrstuden branches: 1.9.8;
Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.8 09-Aug-1998  mrg minor KNF nit ..
 1.7 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.6 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.5 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
    union {
        struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
        struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
    } i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.4 09-Feb-1996  christos ufs prototype changes
 1.3 10-May-1995  cgd from Mike Karels:
allow Q_SYNC regardless of "target" uid, we allow it with -1;
fix bug that caused all ops to refer to user quotas, not group.
[finally had a chance to check this!]
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.9.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.12.1 01-Oct-2001  fvdl Catch up with -current.
 1.11.10.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.11.6.2 14-Nov-2001  nathanw Catch up to -current.
 1.11.6.1 21-Sep-2001  nathanw Catch up to -current.
 1.12.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.13.12.1 18-Dec-2002  gmcgarry Merge pcred and ucred, and poolify. TBD: check backward compatibility
and factor-out some higher-level functionality.
 1.17.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.17.2.8 24-Jan-2005  skrll Sync with HEAD.
 1.17.2.7 17-Jan-2005  skrll Sync with HEAD.
 1.17.2.6 27-Oct-2004  skrll Remove the struct lwp * arguments from qsync and ufs_checkpath that are
no longer (read: were never) required.
 1.17.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.17.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.17.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.17.2.2 03-Aug-2004  skrll Sync with HEAD
 1.17.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.21.2.1 29-Apr-2005  kent sync with -current
 1.22.8.7 04-Feb-2008  yamt sync with head.
 1.22.8.6 21-Jan-2008  yamt sync with head
 1.22.8.5 07-Dec-2007  yamt sync with head
 1.22.8.4 03-Sep-2007  yamt sync with head.
 1.22.8.3 26-Feb-2007  yamt sync with head.
 1.22.8.2 30-Dec-2006  yamt sync with head.
 1.22.8.1 21-Jun-2006  yamt sync with head.
 1.26.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.26.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.26.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.26.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.26.8.2 11-Aug-2006  yamt sync with head
 1.26.8.1 24-May-2006  yamt sync with head.
 1.26.6.1 01-Jun-2006  kardel Sync with head.
 1.26.4.1 09-Sep-2006  rpaulo sync with head
 1.28.6.2 10-Dec-2006  yamt sync with head.
 1.28.6.1 22-Oct-2006  yamt sync with head
 1.28.4.2 12-Jan-2007  ad Sync with head.
 1.28.4.1 18-Nov-2006  ad Sync with head.
 1.32.6.2 15-Jul-2007  ad Sync with head.
 1.32.6.1 13-Mar-2007  ad Sync with head.
 1.32.2.1 24-Mar-2007  yamt sync with head.
 1.33.2.1 11-Jul-2007  mjf Sync with head.
 1.34.14.2 18-Feb-2008  mjf Sync with HEAD.
 1.34.14.1 08-Dec-2007  mjf Sync with HEAD.
 1.34.8.2 23-Mar-2008  matt sync with HEAD
 1.34.8.1 09-Jan-2008  matt sync with HEAD
 1.34.6.1 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.35.6.1 08-Jan-2008  bouyer Sync with HEAD
 1.37.10.2 16-May-2009  yamt sync with head
 1.37.10.1 16-May-2008  yamt sync with head.
 1.37.8.1 18-May-2008  yamt sync with head.
 1.37.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.39.14.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.40.8.2 09-Feb-2011  bouyer Fix build without quotas
 1.40.8.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameters a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on-disk storage format (all 64 bits wide), integrated with the metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list; each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and its size is a multiple of the
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.
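
The layout described above can be sketched as follows. This is not the
actual quota2 on-disk format: the struct names, the field set, HASH_SIZE,
and all q_* / quota_* helpers are hypothetical, and the "file" is just a byte
buffer. It only illustrates a header holding a free list and a hash table,
with entries chained by file offsets (offset 0, which the header occupies,
doubles as the end-of-list marker).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_SIZE 4	/* hypothetical; the log ties it to the block size */

struct q_header {
	uint64_t free_head;		/* offset of first free entry, 0 = none */
	uint64_t hash[HASH_SIZE];	/* offset of first in-use entry per bucket */
};

struct q_entry {
	uint64_t next;	/* offset of next entry in the same list, 0 = end */
	uint32_t id;	/* uid or gid */
	uint64_t limit;
};

static void q_read(const uint8_t *f, uint64_t off, void *dst, size_t n)
{ memcpy(dst, f + off, n); }
static void q_write(uint8_t *f, uint64_t off, const void *src, size_t n)
{ memcpy(f + off, src, n); }

/* Lay out a header and thread every entry slot onto the free list. */
static void quota_format(uint8_t *f, size_t len)
{
	struct q_header h;
	struct q_entry e;
	assert(len >= sizeof(h));
	memset(&h, 0, sizeof(h));
	memset(f, 0, len);
	for (uint64_t off = sizeof(h); off + sizeof(e) <= len; off += sizeof(e)) {
		memset(&e, 0, sizeof(e));
		e.next = h.free_head;
		q_write(f, off, &e, sizeof(e));
		h.free_head = off;
	}
	q_write(f, 0, &h, sizeof(h));
}

/* Move one entry from the free list to the head of its hash chain. */
static uint64_t quota_insert(uint8_t *f, uint32_t id, uint64_t limit)
{
	struct q_header h;
	struct q_entry e;
	q_read(f, 0, &h, sizeof(h));
	uint64_t off = h.free_head;
	if (off == 0)
		return 0;	/* the described format would grow by one block */
	q_read(f, off, &e, sizeof(e));
	h.free_head = e.next;
	e.id = id;
	e.limit = limit;
	e.next = h.hash[id % HASH_SIZE];
	h.hash[id % HASH_SIZE] = off;
	q_write(f, off, &e, sizeof(e));
	q_write(f, 0, &h, sizeof(h));
	return off;
}

/* Walk the hash chain for id; returns the entry offset, or 0 if absent. */
static uint64_t quota_lookup(const uint8_t *f, uint32_t id, struct q_entry *out)
{
	struct q_header h;
	q_read(f, 0, &h, sizeof(h));
	for (uint64_t off = h.hash[id % HASH_SIZE]; off != 0; off = out->next) {
		q_read(f, off, out, sizeof(*out));
		if (out->id == id)
			return off;
	}
	return 0;
}
```

Using offsets rather than pointers keeps the structure valid on disk and lets
an in-memory cache refer to an entry by block number plus offset.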

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by the kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because the superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.40.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.40.4.1 21-Apr-2011  rmind sync with head
 1.42.8.2 05-Apr-2012  mrg sync to latest -current.
 1.42.8.1 18-Feb-2012  mrg merge to -current.
 1.42.4.3 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.42.4.2 23-Jan-2013  yamt sync with head
 1.42.4.1 17-Apr-2012  yamt sync with head
 1.51.2.5 03-Dec-2017  jdolecek update from HEAD
 1.51.2.4 24-Oct-2017  jdolecek remove rebase merge artifacts
 1.51.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.51.2.2 25-Feb-2013  tls resync with head
 1.51.2.1 10-Feb-2013  tls Add an accessor -- ufs_maxphys() -- to check the maximum transfer size
for a given UFS mountpoint, and move the code from mount that finds
the underlying disk and resets the mountpoint max transfer size into a
utility function, ufs_update_maxphys().

Add a global serial number that counts disk property changes to which
filesystems are meant to accommodate themselves. Make ufs_maxphys()
check it. This is a sort of flag-polling interface that avoids callbacks
into the filesystem code, but will require freezing filesystems and
draining in-flight transactions before a decrease in size that is
mandatory (like attaching a disk with a smaller maximum transfer size
as a spare in a RAIDframe set), rather than "advisory", like finding
out set geometry from a RAID controller long after boot and deciding
a smaller transfer size would be optimal, can be signalled. Still, the
"advisory" case is the common one so this is progress.

Make a bit of an example of RAIDframe by making it bump this new
serial number when disks are added to the subsystem. I will attack
one of the hardware RAID drivers (probably arcmsr) next.
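
The flag-polling idea in this commit can be sketched as below. Every name
here (disk_props_serial, struct disk_sketch, struct mount_sketch,
maxphys_sketch) is hypothetical and stands in for the real ufs_maxphys()
machinery, which is not shown in the log. The sketch only demonstrates a
consumer comparing a cached serial number against a global one and
refreshing its cached value when they differ, with no callback into the
file system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global counter, bumped whenever a disk property changes. */
static unsigned disk_props_serial;

struct disk_sketch {
	unsigned maxphys;	/* current maximum transfer size */
};

struct mount_sketch {
	const struct disk_sketch *disk;
	unsigned seen_serial;	/* serial at the time maxphys was cached */
	unsigned maxphys;	/* cached copy of disk->maxphys */
};

/* Called by the disk side (for example when a component is added). */
static void
disk_props_changed(void)
{
	disk_props_serial++;
}

/* Accessor: refresh the cached value only if the serial has moved. */
static unsigned
maxphys_sketch(struct mount_sketch *mp)
{
	assert(mp != NULL && mp->disk != NULL);
	if (mp->seen_serial != disk_props_serial) {
		mp->maxphys = mp->disk->maxphys;
		mp->seen_serial = disk_props_serial;
	}
	return mp->maxphys;
}
```

The polling side stays cheap in the common case (one integer comparison),
at the cost that a change is only noticed on the next call to the accessor.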
 1.52.10.1 10-Aug-2014  tls Rebase.
 1.52.2.1 18-May-2014  rmind sync with head
 1.53.4.2 28-Aug-2017  skrll Sync with HEAD
 1.53.4.1 06-Apr-2015  skrll Sync with HEAD
 1.54.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.54.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.55.12.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.55.12.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.55.12.1 10-Jun-2019  christos Sync with HEAD
 1.55.10.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.58.2.1 17-Jan-2020  ad Sync with head.
 1.262 27-Mar-2022  christos add a kauth vnode check for creating links
 1.261 26-Nov-2021  christos use MNT_NFS4ACLS instead of MNT_ACLS (which was changed before to mean
MNT_POSIX1EACLS)
 1.260 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted, for example).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE() to use a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.259 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.258 05-Sep-2020  riastradh Revert "ufs: Prevent mkdir from choking on deleted directories."

This change made no sense and should not have been committed.
 1.257 05-Sep-2020  riastradh ufs: Prevent mkdir from choking on deleted directories.

Fix some missing uvm_vnp_setsize in screw cases while here.
 1.256 20-Aug-2020  christos Don't cache id's for vnodes that have ACLs. ok chs@
 1.255 18-May-2020  hannken Assert ufs_strategy() always gets used while current thread
holds a fstrans lock.
 1.254 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.253 12-May-2020  ad cache_enter_id(): give it a boolean parameter to indicate whether the cached
identity is valid.
 1.252 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.251 13-Apr-2020  ad Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.
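A userland sketch of the accessor idea, using C11 atomics in place of the
kernel's atomic_load_relaxed(). The struct and function names are
hypothetical; the real vrefcnt() operates on struct vnode.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the reference count field of a vnode. */
struct vnode_sketch {
	atomic_uint v_usecount;
};

/*
 * Callers read the count through this function instead of touching the
 * field, so the representation and memory ordering can change later.
 */
static unsigned
vrefcnt_sketch(struct vnode_sketch *vp)
{
	assert(vp != NULL);
	return atomic_load_explicit(&vp->v_usecount, memory_order_relaxed);
}
```

A relaxed load gives an untorn snapshot of the counter but no ordering with
respect to other memory accesses.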
 1.250 04-Apr-2020  ad branches: 1.250.2;
Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.249 26-Feb-2020  maxv Zero out the padding in 'd_namlen', to prevent info leaks. Same logic as
ufs_makedirentry().

Found by kMSan: the unzeroed bytes of the pool_cache were getting copied
to the disk via a DMA write operation, and there kMSan was noticing
uninitialized memory leaving the system.

Reported-by: syzbot+382c9dffc06a9683abb5@syzkaller.appspotmail.com
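
One reading of this class of fix, sketched in userland C: when a name is
copied into a slot that is rounded up to a 4-byte boundary, the terminating
NUL and the padding bytes are cleared explicitly so that stale memory is not
written out. NAME_SLOT and dirent_set_name are hypothetical names, and the
4-byte rounding is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size of a name slot: name + NUL, rounded up to a multiple of 4. */
#define NAME_SLOT(namlen)	(((namlen) + 1 + 3) & ~(size_t)3)

/*
 * Copy name into slot and zero everything after it up to the padded
 * size. Returns the number of bytes of slot that are now initialized.
 * The caller must provide at least NAME_SLOT(strlen(name)) bytes.
 */
static size_t
dirent_set_name(char *slot, const char *name)
{
	assert(slot != NULL && name != NULL);
	size_t namlen = strlen(name);
	size_t padded = NAME_SLOT(namlen);

	memcpy(slot, name, namlen);
	memset(slot + namlen, 0, padded - namlen);	/* NUL plus padding */
	return padded;
}
```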
 1.248 18-Sep-2019  christos branches: 1.248.2;
Add newly created vnodes to the namei cache. The rest of the filesystems
already did that (or they don't support writing). Discussed in tech-kern.
 1.247 01-Jul-2019  dholland branches: 1.247.2;
Lay down some comments related to the previous few revisions of ufs_vnops.c.
 1.246 25-Feb-2019  dholland Revert -r1.244-245 of ufs_vnops.c; they are wrong.
Fix the mistake in -r1.243 that made them look like reasonable changes.

(this does not affect whether the -r1.243 change works with the union
mount path in libc, but fixes an immediate hazard)
 1.245 25-Feb-2019  christos drop unused
 1.244 25-Feb-2019  christos remove junk assignment.
 1.243 24-Feb-2019  mlelstv Reading a directory may trigger a panic when the buffer is too small.
Adjust necessary checks.

While here, also check for arithmetic overflow.

Reported-by: syzbot+88ecace8bff24169058f@syzkaller.appspotmail.com
 1.242 01-Jan-2019  hannken Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
 1.241 10-Dec-2018  jdolecek put back UFS_WAPBL_JUNLOCK_ASSERT(), the underlying rw_write_held() check
doesn't actually have a race since it checks if the rwlock is held by
current lwp
 1.240 10-Dec-2018  jdolecek make UFS_WAPBL_JLOCK_ASSERT() #ifdef DIAGNOSTIC, same as the underlying
function KASSERT(), so that it actually does something; fix code using
it to actually pass correct params, so that it compiles

remove UFS_WAPBL_JUNLOCK_ASSERT(), as that is inherently racy (it's
okay on those places if the rwlock is held by other lwp); depend
on the RW_ASSERT()/LOCKDEBUG inside rw_enter() to catch the case
with wapbl rwlock held by current lwp
 1.239 28-Oct-2017  pgoyette branches: 1.239.2; 1.239.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
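
The format-string convention described in this commit, shown with plain
snprintf(3) rather than the kernhist(9) macros. format_hist is a
hypothetical helper; the casts and the "%#jx" / "%ju" specifiers are the
ones the log message describes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Every argument is widened to uintmax_t and printed with the 'j'
 * length modifier. A pointer goes through uintptr_t first, which avoids
 * the "pointer to integer of a different size" warning.
 */
static int
format_hist(char *buf, size_t len, const void *ptr, uint64_t count)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "ptr=%#jx count=%ju",
	    (uintmax_t)(uintptr_t)ptr, (uintmax_t)count);
}
```

Because every argument has the same width, a consumer can decode the
argument list without knowing the producer's pointer or long size.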
 1.238 07-Aug-2017  dholland Tidy up ufs_readdir. First step only; there's plenty more that could be
done to improve this code.
 1.237 26-Apr-2017  riastradh branches: 1.237.4;
Change VOP_REMOVE and VOP_RMDIR to preserve lock/ref on dvp.

No change to vp -- the plan is to replace the node by the
componentname in the vop parameters, and let all directory vops do
lookups internally.

Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2017/04/17/msg021825.html
 1.236 18-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT
 1.235 01-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().
 1.234 09-Nov-2016  dholland branches: 1.234.2;
ufs_makeinode is declared file-static at the top of the file; mark it
at its definition too, for consistency and to avoid misleading casual
passersby.
 1.233 28-Oct-2016  jdolecek reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
succeed; change wapbl_register_deallocation() to return EAGAIN
rather than panic when code hits the limit

callers changed to either loop calling ffs_truncate() using new
utility ufs_truncate_retry() if their semantics requires it, or
just ignore the failure; remove ufs_wapbl_truncate()

this fixes possible user-triggerable panic during truncate, and
resolves WAPBL performance issue with truncates of large files

PR kern/47146 and kern/49175
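
A sketch of the retry pattern named in this commit: a truncate that may
stop early with EAGAIN, and a wrapper that loops until it completes. The
types and functions are hypothetical stand-ins, not ffs_truncate() or
ufs_truncate_retry(); the per-call budget models whatever limit makes the
real operation give up early, which is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct file_sketch {
	unsigned blocks;	/* blocks currently allocated */
};

/* Free at most budget blocks; report EAGAIN if more work remains. */
static int
truncate_partial(struct file_sketch *f, unsigned target, unsigned budget)
{
	unsigned excess = f->blocks > target ? f->blocks - target : 0;

	if (excess > budget) {
		f->blocks -= budget;
		return EAGAIN;
	}
	f->blocks -= excess;
	return 0;
}

/* Keep calling until the truncate succeeds or fails for another reason. */
static int
truncate_retry(struct file_sketch *f, unsigned target, unsigned budget,
    unsigned *calls)
{
	int error;

	assert(f != NULL && calls != NULL);
	assert(budget > 0);	/* otherwise no call could make progress */
	do {
		error = truncate_partial(f, target, budget);
		(*calls)++;
	} while (error == EAGAIN);
	return error;
}
```

Callers that can tolerate a partial truncate may call the partial version
once and ignore EAGAIN, as the commit message notes.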
 1.232 19-May-2016  riastradh branches: 1.232.2;
Get rid of UFS_WAPBL_BEGIN1/END1

ufs makeinode no longer releases dvp, so incrementing the
usecount for wapbl is unnecessary.

From coypu.
 1.231 01-Sep-2015  dholland Propagate fix from lfs:
For non-devices, have getattr (and thus stat) produce NODEV in the
rdev field, instead of leaking the address of the first direct block.
 1.230 20-Apr-2015  riastradh Make VOP_LINK return directory still locked and referenced.

Ride 7.99.10 bump.
 1.229 20-Apr-2015  riastradh Fix more dvp->v_mount after vput(dvp).
 1.228 01-Apr-2015  riastradh Don't use dvp after vput(dvp).

Still don't understand why the fstrans_done must happen after the
vput, and that will cause trouble once we move responsibility for the
vrele and unlock outside the vop as it seems obvious we ought to do
-- it's the caller's reference, not the vop's.
 1.227 27-Mar-2015  riastradh Tighten some kasserts in ufs_bufio code paths.
 1.226 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.225 17-Mar-2015  hannken Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.224 29-Oct-2014  christos branches: 1.224.2;
simplify and correct.
 1.223 21-Oct-2014  slp Move and unify indirect block truncate algorithm into a separate function.

Reviewed by joerg.
 1.222 18-Oct-2014  snj src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.221 25-May-2014  hannken branches: 1.221.2;
ufs_mknod: use vcache_get() to reload the new node.
 1.220 23-Jan-2014  hannken branches: 1.220.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30
 1.219 17-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29
 1.218 15-Sep-2013  martin Remove unused variables
 1.217 11-Aug-2013  dholland Kill off uo_unmark_vnode/UFS_UNMARK_VNODE as it's now a leftover.
 1.216 16-Jun-2013  dholland branches: 1.216.2;
Add a comment about a matched pair of off-by-one tests that make the
maximum size of short symlinks one byte less than one would think it
should be. Caution against changing it; that would break compatibility
with existing disk images. Behavior noticed by qjsgkem on freenode.

If my analysis is wrong, please correct...
 1.215 09-Jun-2013  christos the speed limit is 80
 1.214 09-Jun-2013  dholland Stick UFS_ in front of these symbols:
DIRBLKSIZ
DIRECTSIZ
DIRSIZ
OLDDIRFMT
NEWDIRFMT

Part of PR 47909.
 1.213 08-Jun-2013  kardel fix clearing of system-flags (schg, sappnd). clearing system flags is possible again
at securelevel < 1.
reviewed by christos@
 1.212 18-Mar-2013  plunky C99 section 6.7.2.3 (Tags) Note 3 states that:

A type specifier of the form

enum identifier

without an enumerator list shall only appear after the type it
specifies is complete.

which means that we cannot pass an "enum vtype" argument to
kauth_access_action() without fully specifying the type first.
Unfortunately there is a complicated include file loop which
makes that difficult, so convert this minimal function into a
macro (and capitalize it).

(ok elad@)
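
Why a macro sidesteps the problem, as a sketch: a macro body names no types,
so it can be defined in a header that never sees the full enum definition,
as long as the enum is complete at the point of use. All identifiers below
are hypothetical and do not reproduce the real kauth macro.

```c
#include <assert.h>

/*
 * Defined before enum vtype_sketch exists. A function prototype taking
 * "enum vtype_sketch" at this point would violate the rule quoted above.
 */
#define ACCESS_ACTION_SKETCH(is_write, vtype)			\
	((vtype) == VDIR_SKETCH					\
	    ? ((is_write) ? ACT_ADD_ENTRY : ACT_LIST)		\
	    : ((is_write) ? ACT_WRITE_DATA : ACT_READ_DATA))

/* The types become complete later, e.g. in another header. */
enum action_sketch { ACT_READ_DATA, ACT_WRITE_DATA, ACT_LIST, ACT_ADD_ENTRY };
enum vtype_sketch { VREG_SKETCH, VDIR_SKETCH };

/* By the time the macro is expanded, both enums are complete. */
static enum action_sketch
action_for(int is_write, enum vtype_sketch vt)
{
	assert(vt == VREG_SKETCH || vt == VDIR_SKETCH);
	return ACCESS_ACTION_SKETCH(is_write, vt);
}
```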
 1.211 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.210 04-Jun-2012  riastradh branches: 1.210.2;
Kill the IN_RENAME in-core inode flag in ufs and ext2fs.

Now that rename works we need not wave this sort of voodoo at it.

ok dholland
 1.209 09-May-2012  riastradh Adapt ffs, lfs, and ext2fs to use genfs_rename.

ok dholland, rmind
 1.208 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.207 27-Jan-2012  para converting readdir in ffs ext2fs from malloc(9) to kmem(9)
while there allocate ufs mount structs from kmem(9) too
preceding kmem-vmem-pool-patch

releng@ acknowledged
 1.206 18-Nov-2011  christos branches: 1.206.4;
Obey MNT_RELATIME, the only addition is that mkdir in ufs sets IN_ACCESS too.
 1.205 27-Sep-2011  christos branches: 1.205.2;
include the proper headers to make {LFS,EXT2FS}_MAXNAMLEN visible
 1.204 27-Sep-2011  christos it is __CTASSERT()
 1.203 27-Sep-2011  christos use FFS_MAXNAMLEN instead of NAME_MAX, making sure that it matches with
EXT2FS_MAXNAMLEN and LFS_MAXNAMLEN.
 1.202 03-Aug-2011  hannken Make whiteouts work on journaling ffs file system by adding the missing
UFS_WAPBL_BEGIN() / UFS_WAPBL_END() around CREATE and DELETE ops.

Fixes PR #44377 (union whiteouts don't work on ffs -o log)
 1.201 29-Jul-2011  riastradh In ufs_rename, declare oldparent and newparent ino_t, not int.

XXX There should be an automatic test for this somewhere.

ok dholland
 1.200 18-Jul-2011  dholland Eliminate the old ufs_rename. The only reason the WAPBL one was
different is that in order to avoid issues with the WAPBL journal lock
the wrong locking had to be changed to different wrong locking. This
is now moot.

I have not hand-validated that the current two copies of rename are
equivalent, or that the locking fixes merged with the old rename
produce code that is textually identical (modulo WAPBL calls that do
nothing when WAPBL is turned off) to the WAPBL rename... but I did
this check when preparing my previous round of rename patches last
year and all updates since have been applied to both.
 1.199 18-Jul-2011  dholland Move ufs_wapbl_rename to ufs_vnops.c next to the old ufs_rename.
 1.198 18-Jul-2011  dholland ok, it is clear that at least vput(vp) needs to go before fstrans_done().
I'm hoping vput(dvp) doesn't, because if it does that will vastly complicate
future vfs locking cleanup.
 1.197 18-Jul-2011  dholland Add the long essay on rename locking from my earlier patch set as a
big comment, and expand it some for clarity.
 1.196 18-Jul-2011  dholland Fix error path for UFS_WAPBL_BEGIN failure to avoid leaking a vnode
(in memory) and inode (on disk). Caution: untested; I have no idea how
to provoke such a failure.
 1.195 17-Jul-2011  dholland At the end of ufs_rmdir, don't use a dangling vnode pointer to call
fstrans_done. Ok hannken@
 1.194 17-Jul-2011  dholland Fix typo in ufs_rmdir that causes locking botches. This code should be
unreachable because the FS-independent code contains the same test... but
I'm not sure if that applies if nfsd is involved.
 1.193 14-Jul-2011  dholland Clean up handling of ufs_lookup_results in rename.
 1.192 12-Jul-2011  dholland Pass the ufs_lookup_results pointer around instead of fetching it from
the inode in the guts of ufs. Now, in VOPs where i_crap is used it is
used (directly) only immediately on entry to the VOP call and then
passed around by reference.

Except for rename, which needs explicit sorting out. The code in
ufs_wapbl_rename is unchanged in behavior but I'm increasingly
inclined to think it's wrong.
 1.191 12-Jul-2011  dholland Currently, ufs_lookup produces five auxiliary results that are left in
the vnode when lookup returns and fished out again later.

1. Create struct ufs_lookup_results to hold these.

2. Call the ufs_lookup_results instance in struct inode "i_crap" to be
clear about exactly what's going on, and to distinguish the lookup
results from respectable members of struct inode.

3. Update references to these members in the directory access
subroutines.

4. Include preliminary infrastructure for checking that the i_crap
being used is still valid when it's used. This doesn't actually do
anything yet.

5. Update the way ufs_wapbl_rename manipulates these elements to use
the new data structures. I have not changed the manipulation; it may
or may not be correct but I continue to suspect that it is not.

The word of the day is "stigmergy".
 1.190 11-Jul-2011  hannken Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.189 30-Apr-2011  hannken ufs_mknod: change vnode type to VNON before it gets unlocked. Closes a small
window where the vnode could have type VCHR but op vector ffs_vnodeop_p.
 1.188 24-Apr-2011  rmind sys_link: prevent hard links on directories (cross-mount operations are
already prevented). File systems are no longer responsible to check this.
Clean up and add asserts (note that dvp == vp cannot happen in vop_link).

OK dholland@
 1.187 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.186 02-Jan-2011  dholland branches: 1.186.2; 1.186.4;
Remove the special refcount behavior (adding an extra reference to the
parent dir) associated with SAVESTART in relookup().

Check all call sites to make sure that SAVESTART wasn't set while
calling relookup(); if it was, adjust the refcount behavior. Remove
related references to SAVESTART.

The only code that was reaching the extra ref was msdosfs_rename,
where the refcount behavior was already fairly broken and/or gross;
repair it.

Add a dummy 4th argument to relookup to make sure code that hasn't
been inspected won't compile. (This will go away next time the
relookup semantics change, which they will.)
 1.185 30-Nov-2010  dholland Abolish the SAVENAME and HASBUF flags. There is now always a buffer,
so the path in a struct componentname is now always valid during VOP
calls.
 1.184 30-Nov-2010  dholland Abolish struct componentname's cn_pnbuf. Use the path buffer in the
pathbuf object passed to namei as work space instead. (For now a pnbuf
pointer appears in struct nameidata, to support certain unclean things
that haven't been fixed yet, but it will be going away in the future.)

This removes the need for the SAVENAME and HASBUF namei flags.
 1.183 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.182 13-Apr-2010  hannken Add fstrans transactions to ufs_close(), ufs_getattr(), ufs_chmod()
and ufs_chown(). These functions change file system state.
 1.181 29-Mar-2010  pooka Stop exposing fifofs internals and leave only fifo_vnodeop_p visible.
 1.180 14-Oct-2009  hannken branches: 1.180.2; 1.180.4;
ufs_rmdir(): move fstrans_done() after vput(). No more unlinked and
zero-sized directory inodes in snapshots.
 1.179 03-Jul-2009  elad Where possible, extract the file-system's access() routine to two internal
functions: the first checking if the operation is possible (regardless of
permissions), the second checking file-system permissions, ACLs, etc.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005311.html
 1.178 23-Jun-2009  elad Move the implementation of vaccess() to genfs_can_access(), in line with
the other routines of the same spirit.

Adjust file-system code to use it.

Keep vaccess() for KPI compatibility and to keep element of least
surprise. A "diagnostic" message warning that vaccess() is deprecated will
be printed when it's used (obviously, only in DIAGNOSTIC kernels).

No objections on tech-kern@:

http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005310.html
 1.177 08-May-2009  rmind ufs_setattr: fix previous - return in error path does not finish the
transaction (hi elad).
 1.176 07-May-2009  elad Extract the open-coded authorization logic for chtimes() from various
file-systems and put it in a single function, genfs_can_chtimes().

This also makes UDF follow the same policy as all other file-systems.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/04/27/msg004951.html
 1.175 22-Apr-2009  elad Per discussion on tech-kern@:

- Replace use of label/goto with returns

- Rename, change prototype of, and move functions from vfs_subr.c to
genfs_vnops.c
 1.174 20-Apr-2009  elad Refactor some duplicated file-system code.

Proposed and received no objections on tech-kern@:

http://mail-index.netbsd.org/tech-kern/2009/04/18/msg004843.html
 1.173 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.172 11-Jan-2009  christos branches: 1.172.2;
merge christos-time_t
 1.171 13-Nov-2008  ad Remove #ifdef LFS from the ufs code.
 1.170 11-Nov-2008  joerg Move WAPBL replay handling from bread() into ufs_strategy.
This changes the order of hook processing as the copy-on-write handlers
are called after the journal processing. This makes more sense as the
journal overwrite is logically part of the disk IO.
 1.169 14-Aug-2008  matt branches: 1.169.2; 1.169.4; 1.169.6; 1.169.10;
Implement the following constants and add support for them to the UFS family of file
systems:
_PC_2_SYMLINKS
_PC_SYMLINK_MAX

From andy dot shevchenko at gmail dot com.
 1.168 12-Aug-2008  hannken Deny read/write access to snapshot vnodes. We use fss(4) to read from
snapshots. With this policy in place:

- Separate the snapshot vnode lock from the snapshot common lock.
Snapshots no longer need recursive vnode locks.

- Use a mutex (si_snaplock) to serialize creation, deletion, reading and
writing of snapshots.

- Move ffs_read() for snapshots into ffs_snapshot.c.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>

While here change ffs_copyonwrite() to fail requests from pagedaemon that need
to copy-on-write.
 1.167 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.166 02-Jun-2008  ad branches: 1.166.2; 1.166.4;
Don't needlessly acquire v_interlock.
 1.165 31-May-2008  ad XXX softdep:

If the number of deletes in progress is getting too high, newdirrem()
requests the syncer to flush faster, and in some cases will block to
prevent deletes accumulating faster than the disk can service them.

The syncer will try to lock vnodes that the remover holds locked, leading
to the syncer and remover proceeding in lockstep and making very little
overall forward progress.

Put a hook into ufs_rmdir() and ufs_remove() so that the softdep code
can pace itself without holding vnode locks if the number of deletes is
running out of control.
 1.164 30-Jan-2008  ad branches: 1.164.6; 1.164.8; 1.164.10; 1.164.12;
Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
 1.163 24-Jan-2008  ad specfs changes for PR kern/37717 (raidclose() is no longer called on
shutdown). There are still problems with device access and a PR will be
filed.

- Kill checkalias(). Allow multiple vnodes to reference a single device.

- Don't play dangerous tricks with block vnodes to ensure that only one
vnode can describe a block device. Instead, prohibit concurrent opens of
block devices. As a bonus remove the unreliable code that prevents
multiple file system mounts on the same device. It's no longer needed.

- Track opens by vnode and by device. Issue cdev_close() when the last open
goes away, instead of abusing vnode::v_usecount to tell if the device is
open.
 1.162 03-Jan-2008  ad Use pool_cache.
 1.161 02-Jan-2008  ad Merge vmlocking2 to head.
 1.160 08-Dec-2007  pooka branches: 1.160.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.159 26-Nov-2007  pooka branches: 1.159.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.158 23-Nov-2007  pooka Update comments: ufs_{rename,mkdir,rmdir} haven't been system calls
since 4.3BSD.
 1.157 10-Oct-2007  ad branches: 1.157.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.156 09-Aug-2007  hannken branches: 1.156.2; 1.156.4;
Move the fstrans-aware lock vnops from ufs to ffs. Other ufs file systems
do not need them.

Ride on 4.99.28
 1.155 29-Jul-2007  ad branches: 1.155.4; 1.155.6;
It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.154 10-Jul-2007  hannken branches: 1.154.2;
Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criteria, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.153 17-May-2007  hannken Fstrans_start() always returns zero, so change its type to void.
 1.152 04-Mar-2007  christos branches: 1.152.2; 1.152.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.151 20-Feb-2007  pooka tyop in comment, fix it
 1.150 20-Feb-2007  pooka In readdir, in case cookies was already allocated but is later free'd
due to an error, reset value of cookies to NULL to avoid confusing
callers.

should fix kern/35728
 1.149 29-Jan-2007  hannken branches: 1.149.2;
Change fstrans enum types to upper case.
No functional change.

From Antti Kantee <pooka@netbsd.org>
 1.148 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.147 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.146 02-Jan-2007  elad Add KAUTH_SYSTEM_CHSYSFLAGS so we can get rid of the last three
securelevel references (ufs, ext2fs, tmpfs).

Intentionally undocumented.
 1.145 26-Dec-2006  yamt ufs_readdir: start from offsets known to be valid,
rather than assuming users feed us valid offsets.
 1.144 09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.143 03-Oct-2006  christos branches: 1.143.2;
Coverity CID 3690: Add KASSERT to check for reverse INULL.
 1.142 23-Jul-2006  ad branches: 1.142.4; 1.142.6;
Use the LWP cached credentials where sane.
 1.141 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.140 14-May-2006  elad branches: 1.140.2;
integrate kauth.
 1.139 01-Mar-2006  yamt branches: 1.139.2; 1.139.4; 1.139.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.138 11-Dec-2005  christos branches: 1.138.2; 1.138.4; 1.138.6;
merge ktrace-lwp.
 1.137 11-Nov-2005  yamt branches: 1.137.2;
- ignore truncation for VCHR/VBLK/VFIFO as it used to be
before yamt-vop merge. PR/32049 from Atsushi Onoe.
- reject setattr which attempts to change size of VLNK/VSOCK.
 1.136 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.135 27-Sep-2005  yamt branches: 1.135.2;
introduce "ufs_ops" and use it for ITIMES.
 1.134 12-Sep-2005  christos - access the ffs and ext2fs itimes functions through a pointer, so that
if the filesystem is not compiled in the kernel still links. Probably
a better solution is to use weak symbols.
- move the filesystem-specific itime macros to the filesystem header files.
 1.133 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.132 23-Aug-2005  yamt ufs_readdir: don't leak kernel garbage to userland.
 1.131 23-Aug-2005  yamt ufs_readdir: when computing the maximum number of entries,
use _DIRENT_RECLEN(cdp, 1) instead of "4".
 1.130 19-Aug-2005  christos 64 bit inode changes.
 1.129 23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.128 10-Jul-2005  thorpej Defflag UFS_DIRHASH.
 1.127 23-Mar-2005  perseant branches: 1.127.2;
Make LFS dirops get their vnode first, before incrementing the dirop count,
to prevent a deadlock trying to call VOP_PUTPAGES() on a VDIROP vnode.
This can happen when a stacked filesystem is mounted on top of an LFS: an
LFS dirop needs to get a vnode, which is available from the upper layer.
The corresponding lower layer vnode, however, is VDIROP, so the upper layer
can't be cleaned out since its VOP_PUTPAGES() is passed through to the lower
layer, which waits for dirops to drain before it can proceed. Deadlock.

Tweak ufs_makeinode() and ufs_mkdir() to pass the a_vpp argument through
to VOP_VALLOC().

Partially addresses PR # 26043, though it probably does not completely fix
the problem described there.
 1.126 26-Feb-2005  perry branches: 1.126.2;
nuke trailing whitespace
 1.125 24-Jan-2005  dbj branches: 1.125.2;
check _KERNEL_OPT instead of !_LKM to conditionalize opt includes
 1.124 23-Jan-2005  rumble Bring in Ian Dowse's Dirhash from FreeBSD. Hash tables of
directories are created on the fly and used to increase
performance by circumventing ufs_lookup's linear search.

Dirhash is enabled by the UFS_DIRHASH option, but not
by default.
 1.123 21-Sep-2004  thorpej branches: 1.123.4;
Add a new VNODE_LOCKDEBUG option, which enables checks in the VOP_*()
calls to ensure that the vnode lock state is as expected when the VOP
call is made. Modify vnode_if.src to set the expected state according
to the documenting lock table for each VOP. Modify vnode_if.sh to emit
the checks.

Notes:
- The checks are only performed if the vnode has the VLOCKSWORK bit
set. Some file systems (e.g. specfs) don't even bother with vnode
locks, so of course the checks will fail.
- We can't actually run with VNODE_LOCKDEBUG because there are so many
vnode locking problems, not the least of which is the "use SHARED for
VOP_READ()" issue, which screws things up for the entire call chain.

Inspired by similar changes in OpenBSD, but implemented differently.
 1.122 17-Sep-2004  skrll There's no need to pass a proc value when using UIO_SYSSPACE with
vn_rdwr(9) and uiomove(9).

OK'd by Jason Thorpe
 1.121 15-Aug-2004  mycroft Another piece of FFS_EI flotsam.
 1.120 15-Aug-2004  mycroft Repair some FFS_EI code for ufsmount changes.
 1.119 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.118 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.117 24-Jul-2004  dbj remove incorrect casts that limit some uses of daddr_t to 31 bits
this fixes problems using ffs2 with more than 2^31 sectors (~1tb)
 1.116 20-Jun-2004  hannken Use a pool for struct direct instead of kernel stack.
Reduces the kernel stack usage by 264 bytes.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.115 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.114 22-May-2004  kleink POSIX: Permit a process without the appropriate privilege to change a
file's group ID to its effective gid, in addition to the presently
permitted set of supplementary gids.

From Mark Davies in PR standards/25401.
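
The group-change rule described above can be sketched in user-space C. This is an illustration only, not the kernel's code: demo_may_chgrp() and its parameters are hypothetical names, and the separate check that the caller owns the file is assumed to happen elsewhere.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from the NetBSD tree): decide whether an
 * unprivileged process may set a file's group ID to new_gid.  The
 * target must be the effective gid or one of the supplementary gids.
 */
bool
demo_may_chgrp(gid_t new_gid, gid_t egid, const gid_t *groups, size_t ngroups)
{
	size_t i;

	/* The case added by this change: the effective gid is acceptable. */
	if (new_gid == egid)
		return true;

	/* The previously permitted set: supplementary group membership. */
	for (i = 0; i < ngroups; i++) {
		if (groups[i] == new_gid)
			return true;
	}
	return false;
}
```

With supplementary groups {10, 20} and an effective gid of 100, the sketch accepts 100 and 20 and rejects 30.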
 1.113 26-Jan-2004  hannken branches: 1.113.2;
Fix xxx_strategy() to use the vnode arg instead of bp->b_vp.
 1.112 26-Jan-2004  hannken Fix mfs_strategy() to use the vp argument.
From YAMAMOTO Takashi <yamt@netbsd.org>.
 1.111 26-Jan-2004  itojun avoid panic on mount_mfs. Greg Oster
 1.110 25-Jan-2004  hannken Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.109 08-Nov-2003  dbj only let i_ffs_effnlink diverge from i_nlink if DOINGSOFTDEP
 1.108 11-Sep-2003  christos PR/15397: Jason Thorpe: directory operations on pathnames that refer to
directories and have trailing slashes should succeed. Ok'd by kjk.
Fix provided by enami.
 1.107 16-Aug-2003  dsl gcc for sparc seems to barf at 'int_var * 0x100000000ull' so do
'(uint64_t)(uint)int_var << 32' even though it generates twice as many
instructions on i386!
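
For reference, the two forms mentioned above can be compared in a small standalone sketch. The function names are hypothetical and the code is not taken from the tree; it only shows that both expressions place the low 32 bits of an int in the upper half of a 64-bit word, assuming a 32-bit int.

```c
#include <stdint.h>

/* Hypothetical name: the multiply form that the sparc compiler rejected. */
uint64_t
demo_high_by_multiply(int v)
{
	/* v is converted to unsigned long long before the multiply. */
	return v * 0x100000000ull;
}

/* Hypothetical name: the cast-and-shift form the commit switched to. */
uint64_t
demo_high_by_shift(int v)
{
	/* Going through unsigned int first avoids sign extension. */
	return (uint64_t)(unsigned int)v << 32;
}
```

Because unsigned arithmetic wraps modulo 2^64, both functions return the same value for every input, including negative ones.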
 1.106 10-Aug-2003  dsl Remove only (last?) use of SETHIGH and SETLOW before gcc starts warning
about the odd construct. Also fixes kern/6525.
 1.105 09-Aug-2003  dsl Stop panic if 'mknod xxx b 0 0' done on a full filesystem.
panics in ffs_full_fsync because v_specmountpoint requires that the NULL
v_specinfo be followed.
Tidy up in the same order in all error paths so compiler can merge the
code sequences.

Fixes PR kern/22419
 1.104 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.103 05-Aug-2003  pk Pass the inode flags to set as an argument to ufs_dirrewrite().
Use it to restore the behaviour of not updating the modified time of a
directory that moves to a new parent.
 1.102 29-Jun-2003  fvdl branches: 1.102.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.101 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.100 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.99 15-May-2003  kristerw The C language does not permit statements of the form
(X ? Y : Z) = 0;
even though gcc handles this by a stupid extension.

Transform these to correct C.

Approved by fvdl.
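
A minimal sketch of this kind of transformation, using a hypothetical structure rather than the real inode code: the conditional expression is not an lvalue in standard C, so the assignment is moved into each branch of an if/else.

```c
/* Hypothetical stand-in for an object with two format-specific fields. */
struct demo_inode {
	int		is_v2;		/* nonzero selects the 64-bit field */
	int		size_v1;
	long long	size_v2;
};

/*
 * Not valid standard C (accepted by gcc only as an extension):
 *	(ip->is_v2 ? ip->size_v2 : ip->size_v1) = 0;
 * A standard equivalent assigns inside each branch instead.
 */
void
demo_clear_size(struct demo_inode *ip)
{
	if (ip->is_v2)
		ip->size_v2 = 0;
	else
		ip->size_v1 = 0;
}
```

Only the selected field is written; the other one keeps its previous value.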
 1.98 29-Apr-2003  yamt constify 'mastertemplate'.
 1.97 25-Apr-2003  fvdl Assign the right linkcount when renaming a directory.
 1.96 11-Apr-2003  fvdl Remove diagnostic ufs_vinit check, this isn't quite the right place for it;
it'll be reinstated elsewhere.
 1.95 11-Apr-2003  fvdl Adjust diagnostic check for bad mode field; only the VNON case should
matter. From Enami.
 1.94 10-Apr-2003  fvdl Add diagnostic check to ufs_vinit in order to catch bad mode fields in
inodes early.
 1.93 04-Apr-2003  drochner adapt to struct inode change (in UVMHIST code)
 1.92 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.91 15-Mar-2003  perseant Make LFS LKM versions of ufs_makeinode and ufs_mkdir fail correctly.

Note dependency of lfs_vnops.o on ufs_readwrite.c.
 1.90 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.89 31-Dec-2002  yamt don't set vnode type to VNON in error case of ufs_makeinode.
(backout rev.1.74)

it seems that there's no need to do it (anymore?) and LFS has trouble with it.
(VNON vnodes marked VDIROP will never be reclaimed)

ok'ed by Frank van der Linden.
 1.88 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.87 28-Sep-2002  dbj Add support for the Apple UFS variation on ffs
This is the bulk of PR #17345

The general approach is to use a run time determinable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
 1.86 14-May-2002  mycroft In ufs_mkdir(), write the data block *before* updating the inode with the
block pointer, to prevent "DIRECTORY CORRUPTED" errors from fsck(8).
Note: The behavior in the softdep case is unchanged, but needs to be fixed.
 1.85 23-Dec-2001  fvdl As pointed out by mycroft and reflected in the comment, update the
directory inode before creating the new entry (not the freshly alloced
directory which isn't linked anywhere yet).
 1.84 23-Dec-2001  fvdl Fix botch in my original softdep code merge: remove redundant (and
synchronous to boot) VOP_UPDATE call.
 1.83 08-Nov-2001  lukem add RCSID
 1.82 23-Sep-2001  chs branches: 1.82.2;
when creating a symlink, set the vnode's copy of the size also.
 1.81 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.80 24-Aug-2001  wiz branches: 1.80.2;
heirarchy -> hierarchy
 1.79 24-Jul-2001  assar change vop_symlink and vop_mknod to return vpp (the created node)
refed, so that the caller can actually use it. update callers and
file systems that implement these vnode operations
 1.78 28-May-2001  chs branches: 1.78.4;
add a genfs_mmap() and change all of the disk-based filesystems
to implement VOP_MMAP() with the genfs version, in preparation for
actually using this VOP.
 1.77 23-Mar-2001  fvdl Do an explicit VOP_UNLOCK in ufs_vinit before setting v_op to
spec_vnode_ops_p. Workaround for a lock leak. Problem tracked
down by der Mouse.
 1.76 26-Feb-2001  lukem branches: 1.76.2;
convert to ANSI KNF
 1.75 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.74 19-Oct-2000  pk In ufs_makeinode(), set the new vnode type to VNON before calling vput().
 1.73 03-Aug-2000  thorpej Convert namei pathname buffer allocation to use the pool allocator.
 1.72 22-Jul-2000  jdolecek change the lf_advlock() arguments from

int lf_advlock __P((struct lockf **,
off_t, caddr_t, int, struct flock *, int));
to

int lf_advlock __P((struct vop_advlock_args *, struct lockf **, off_t));

This matches common usage and is also compatible with similar change
in FreeBSD (though they use u_quad_t as last arg).
 1.71 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero at fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.70 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.69 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.68 30-May-2000  mycroft branches: 1.68.2;
Back out previous kluge.
 1.67 30-May-2000  fvdl Mark an inode as changed after a rename. It wasn't before in the softdep
case, which created inodes with dependencies, but no IN_* flag set,
so the dependencies were never flushed (after the waitfor check in
ffs_update was removed).
 1.66 13-May-2000  perseant branches: 1.66.2;
Change the semantics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be waited for, or not, accordingly.

Closes PR#8996.
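A minimal sketch of the flag-based interface described in revision 1.66 might look like the following. Only the names UPDATE_WAIT and UPDATE_DIROP come from the log message; the numeric values, `struct fs_policy`, and `update_should_wait()` are hypothetical stand-ins for illustration and are not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names from the commit message; the numeric values are assumed. */
#define UPDATE_WAIT  0x01	/* caller asks for a synchronous update */
#define UPDATE_DIROP 0x02	/* hint: call comes from a directory operation */

/* Hypothetical per-file-system policy: are dirops done synchronously? */
struct fs_policy {
	bool sync_dirops;
};

/*
 * Decide whether a (hypothetical) fs-specific update routine should block.
 * A dirop caller is only waited for if this file system wants synchronous
 * directory operations; any other UPDATE_WAIT request is always honoured.
 */
static bool
update_should_wait(const struct fs_policy *fs, int flags)
{
	assert(fs != NULL);
	if ((flags & UPDATE_WAIT) == 0)
		return false;
	if ((flags & UPDATE_DIROP) != 0)
		return fs->sync_dirops;
	return true;
}
```

The point of the sketch is that the decision moves out of the callers and into the per-file-system routine, so callers no longer need a wrapper such as "if(!DOINGSOFTDEP(vp))".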
 1.65 05-May-2000  perseant Change the way LFS does block accounting, from trying to infer from the
buffer cache flags, to marking the inode and/or indirect blocks with a
special disk address UNWRITTEN==-2 when a block is accounted for. (This
address is never written to disk, but only used in-core. This is essentially
the same method of block accounting as on the UBC branch, where the buffer
headers don't exist.) Make sure that truncation is handled properly,
especially in the case of holey files.

Fixes PR#9994.
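The sentinel-address idea in revision 1.65 can be sketched as below. The value UNWRITTEN == -2 is taken from the log message; the 32-bit pointer width and the `count_unwritten()` helper are assumptions made for this sketch and are not taken from the LFS sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value from the commit message: an in-core-only marker, never written to disk. */
#define UNWRITTEN ((int32_t)-2)

/*
 * Hypothetical helper: count the entries in an in-core block-pointer array
 * that have been accounted for but do not yet have a real disk address.
 */
static size_t
count_unwritten(const int32_t *blkptrs, size_t n)
{
	size_t i, count = 0;

	assert(blkptrs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (blkptrs[i] == UNWRITTEN)
			count++;
	}
	return count;
}
```

Because the marker lives in the block pointers themselves, the accounting does not depend on buffer cache flags, which is the property the log message highlights.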
 1.64 30-Mar-2000  augustss Remove register declarations.
 1.63 30-Mar-2000  simonb Delete redundant decls of fifo_vnodeop_p - it's in <miscfs/fifofs/fifo.h>.
Don't need <sys/conf.h> here.
 1.62 14-Feb-2000  fvdl Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
 1.61 13-Dec-1999  wrstuden Modify ufs_rename() to a) be more careful about reference counts (we no longer
depend on the initial lookups being done with SAVESTART), and b) check
return values for errors.

Should fix PR 8491 for ufs - two simultaneous identical renames will now
work correctly. One will succeed, one will fail.
 1.60 16-Nov-1999  lukem fix lp64 lossage
 1.59 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.58 08-Jul-1999  wrstuden branches: 1.58.2; 1.58.4; 1.58.8;
Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
 1.57 24-Mar-1999  mrg branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.
 1.56 22-Mar-1999  kleink Add _PC_FILESIZEBITS to pathconf vnop.
 1.55 05-Mar-1999  mycroft Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.54 26-Feb-1999  mrg pull across patches from warner losh <imp@freebsd.org> (freebsd ufs_vnops.c
versions 1.109&1.110), adjusted for our ext2fs support, and also committed
there also. this avoids overflowing the link count.
 1.53 01-Dec-1998  kenh Update device special file modification times if NODEVMTIME isn't set.
 1.52 06-Nov-1998  cgd cast arg to dbtob to u_quad_t; consistent, and fixed size unnecessary
 1.51 08-Sep-1998  fvdl Fix some maxsymlinklen comparisons for old filesystems that were
wrong after the byteswap changes.
 1.50 08-Sep-1998  fvdl Correct maxsymlink comparison for old filesystems that was clobbered in
byteswap changes.
 1.49 04-Sep-1998  kenh Add a NODEVMTIME compile-time option. This will inhibit the updating
of modification times on device special files. Probably only useful
for low-power systems.
 1.48 30-Aug-1998  rvb Remove v_type != DIR check. First, vn_readdir already does
this check, before calling VOP_READDIR. Second, vn_readdir
returns a different error even. Finally, some FS's might
want to write their directories into files that look like
BSD directories and then have ufs_readdir parse them.
 1.47 10-Aug-1998  matthias create miscfs/genfs/genfs_vnops.c:genfs_enoioctl and make all the other
filesystems use it instead of a private version.
 1.46 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.45 03-Aug-1998  kleink Recognize _PC_SYNC_IO.
 1.44 28-Jul-1998  thorpej Don't cast the null residual pointer passed to vn_rdwr().
 1.43 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.42 22-Jun-1998  sommerfe defopt for options FIFO
 1.41 13-Jun-1998  kleink KNF, mostly of FFS_EI changes.
 1.40 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.39 08-May-1998  kleink Fix some arithmetics lossage on typeless pointers.
 1.38 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time, other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.37 10-Mar-1998  kleink Move the permission check in vfs_syscalls.c::change_owner() back to
ufs_chown() again - the facility required in this context would be a
filesystem-specific super-user determination, which is not available yet.
 1.36 02-Mar-1998  fvdl A cookie should point to the *next* entry. Grrr.
 1.35 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.34 14-Feb-1998  kleink Move some permission-checking code for file owner/group changes up to
vfs_syscalls.c::change_owner(). Also, always update the inode's change time
if the operation succeeds.
 1.33 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.32 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.31 11-Oct-1997  enami branches: 1.31.2;
Backout last change for ufs_readlink. The permission check is now done
in vfs layer.
 1.30 10-Oct-1997  fvdl Last arg to VOP_READDIR became off_t.
 1.29 03-Oct-1997  enami In the function ufs_readlink(), check read permission before
reading link.
 1.28 30-Jun-1997  fvdl branches: 1.28.4;
Return EPERM for an attempt to remove a directory with VOP_REMOVE, not EISDIR.
 1.27 26-Jun-1997  christos Avoid panic triggered by rename("foo/", "bar/..") (From Mycroft)
 1.26 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
  For now only ext2fs uses it.
- i_din is now a union:
    union {
        struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
        struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
    } i_din
  Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
  macro, depending on the type of the inode. ITIMES is used where necessary,
  FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.25 08-May-1997  mycroft Pass the vnode type to vaccess(), and use it when checking VEXEC. Make sure
that the mode bits passed to vaccess() and returned by foo_getattr() contain
only permission bits.
 1.24 23-Apr-1997  mikel return EPERM from ufs_setattr() if an attempt is made by non-superuser
to change superuser-only file flags; fixes PR kern/3491.
 1.23 27-Mar-1997  mikel POSIX.1 specifies that a failed link() to a directory must return EPERM,
and EMLINK was not documented; from Klaus Klein in PR standards/3397.
Also documented EOPNOTSUPP for filesystems that don't support hard links.
 1.22 30-Jan-1997  tls add support for noatime mount flag
 1.21 12-Oct-1996  christos revert previous kprintf changes
 1.20 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.19 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.18 11-May-1996  mycroft Change VOP_UPDATE() semantics:
* Make 2nd and 3rd args timespecs, not timevals.
* Consistently pass a Boolean as the 4th arg (except in LFS).
Also, fix ffs_update() and lfs_update() to actually change the nsec fields.
 1.17 11-Feb-1996  christos put back traditional symlink change that somehow got lost.
 1.16 09-Feb-1996  christos cross that t and dot that i. Typo in last commit.
 1.15 09-Feb-1996  christos ufs prototype changes
 1.14 09-Feb-1996  mycroft Fix vop_link, vop_symlink, and vop_remove semantics in several ways:
* Change the argument names to vop_link so they actually make sense.
* Implement vop_link and vop_symlink for all file systems, so they do proper
cleanup.
* Require the file system to decide whether or not linking and unlinking of
directories is allowed, and disable it for all current file systems.
 1.13 07-Feb-1996  jtc Revert to sane symlink semantics. This is something we should have done
long ago. Fixes many PRs.
 1.12 01-Feb-1996  jtc Rename struct timespec fields to conform to POSIX.1b
 1.11 09-Oct-1995  mycroft Correct a comment regarding cookies, from Greg Hudson.
 1.10 15-Jun-1995  cgd compensate for timeval/timespec/stat structure changes.
 1.9 03-Jan-1995  cgd fix pr 568
 1.8 27-Dec-1994  mycroft Clear IN_RENAME on failed rename of directory.
 1.7 24-Dec-1994  ws Implement and use a common access checking routine
 1.6 14-Dec-1994  mycroft Sync with CSRG.
 1.5 30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.4 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.3 29-Jun-1994  cgd branches: 1.3.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.2 14-Jun-1994  mycroft Fix compatibility with old fastlinks.
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.3 01-Mar-1998  fvdl Import some files that were changed after Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.3.2.3 23-Nov-1994  cgd from mycroft, for patch_05
 1.3.2.2 19-Oct-1994  cgd fix that sanity check.
 1.3.2.1 19-Oct-1994  cgd temporary sanity checks, as suggested by charles.
 1.28.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.31.2.1 06-Nov-1998  cgd Show correct number of blocks used for files larger than 2GB.
Fixed in trunk as part of Lite-2 merging. (cgd)
 1.57.4.3 02-Aug-1999  thorpej Update from trunk.
 1.57.4.2 11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.57.4.1 04-Jul-1999  chs check for bcount being 0 in ufs_strategy().
some casting tidyness.
 1.57.2.1 16-Dec-1999  he Pull up revision 1.61 (requested by wrstuden):
Fix PR#8491: two simultaneous and identical renames would cause
a kernel panic.
 1.58.8.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.58.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.58.4.2 27-Oct-1999  fvdl Fix a case where ffs_update should not be called to do its work
synchronously when soft dependencies are active.
 1.58.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.58.2.4 27-Mar-2001  bouyer Sync with HEAD.
 1.58.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.58.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.58.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.66.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.68.2.5 26-Feb-2002  he Pull up revisions 1.84-1.85 (requested by fvdl):
Correct a mistake made in the original merge-in of the softdep
code, and fix a problem which caused ffs_fsync to do unneeded
sync writes.
 1.68.2.4 06-Apr-2001  he Pull up revision 1.77 (requested by wrstuden):
Explicitly VOP_UNLOCK before setting v_op to spec_vnode_ops_p.
Works around a lock leak and eventual kernel panic.
 1.68.2.3 19-Oct-2000  he Pull up revision 1.74 (requested by pk):
In ufs_makeinode(), set the new vnode type to VNON before
calling vput().

Prevents kernel panic when running out of disk space in the
middle of allocating a new inode on an FFS or LFS file system.
 1.68.2.2 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
separately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.68.2.1 30-Jul-2000  jdolecek Pullup from trunk (approved by thorpej):
Change lf_advlock() to:
int lf_advlock (struct vop_advlock_args *, struct lockf **, off_t)

This matches its usage. Change inspired by FreeBSD, though we use
off_t instead of u_quad_t as the last argument.

sys/lockf.h rev. 1.9
msdosfs/msdosfs_vnops.c rev. 1.99
kern/vfs_lockf.c rev. 1.17
miscfs/specfs/spec_vnops.c rev. 1.49
nfs/nfs_vnops.c rev. 1.115
ufs/ext2fs/ext2fs_vnops.c rev. 1.28
ufs/ufs/ufs_vnops.c rev. 1.72
 1.76.2.11 03-Jan-2003  thorpej Sync with HEAD.
 1.76.2.10 11-Nov-2002  nathanw Catch up to -current
 1.76.2.9 18-Oct-2002  nathanw Catch up to -current.
 1.76.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.76.2.7 08-Jan-2002  nathanw Catch up to -current.
 1.76.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.76.2.5 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.76.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.76.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.76.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.76.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.78.4.9 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.78.4.8 25-Sep-2002  jdolecek switch over to genfs_kqfilter(), g/c the ufs_kqfilter() code
 1.78.4.7 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.78.4.6 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.78.4.5 08-Sep-2001  thorpej Centralize the definition of VN_KNOTE().
 1.78.4.4 07-Sep-2001  thorpej More const.
 1.78.4.3 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.78.4.2 03-Aug-2001  lukem update to -current
 1.78.4.1 10-Jul-2001  lukem * implement ufs_kqfilter(), filt_ufs*()
* add KNOTE(9) calls as appropriate
 1.80.2.3 01-Oct-2001  fvdl Catch up with -current.
 1.80.2.2 18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.80.2.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.82.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.102.2.15 11-Dec-2005  christos Sync with head.
 1.102.2.14 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.102.2.13 01-Apr-2005  skrll Sync with HEAD.
 1.102.2.12 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.102.2.11 04-Feb-2005  skrll Sync with HEAD.
 1.102.2.10 24-Jan-2005  skrll Sync with HEAD.
 1.102.2.9 27-Oct-2004  skrll Fix various comments that describe the argument structures
 1.102.2.8 27-Oct-2004  skrll Remove the struct lwp * arguments from qsync and ufs_checkpath that are
no longer (read: were never) required.
 1.102.2.7 24-Sep-2004  skrll Sync with HEAD.
 1.102.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.102.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.102.2.4 25-Aug-2004  skrll Sync with HEAD.
 1.102.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.102.2.2 03-Aug-2004  skrll Sync with HEAD
 1.102.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.113.2.2 28-Jul-2004  tron Pull up revision 1.117 (requested by dbj in ticket #722):
remove incorrect casts that limit some uses of daddr_t to 31 bits
this fixes problems using ffs2 with more than 2^31 sectors (~1tb)
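The "~1tb" figure in revision 1.113.2.2 follows from 2^31 sectors of 512 bytes each, which is 2^40 bytes. A minimal sketch of why a cast to a signed 32-bit type is a problem is given below; `blkno_t` is a stand-in for a 64-bit daddr_t, and the 512-byte sector size and both helper functions are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t blkno_t;	/* stand-in for a 64-bit daddr_t (assumed width) */

#define SECTOR_SIZE 512		/* assumed sector size behind the "~1tb" figure */

/* True if the block number would survive a cast to a signed 32-bit int. */
static bool
fits_in_31_bits(blkno_t blkno)
{
	assert(blkno >= 0);
	return blkno <= INT32_MAX;
}

/* Byte offset of a sector, computed in 64 bits throughout. */
static int64_t
sector_to_bytes(blkno_t blkno)
{
	assert(blkno >= 0);
	return blkno * SECTOR_SIZE;
}
```

Any sector number at or beyond 2^31 fails the first check, so code that casts such a value to a 32-bit int cannot address the part of the disk past the 1 TiB mark.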
 1.113.2.1 23-May-2004  grant Pull up revision 1.114 (requested by kleink in ticket #379):

POSIX: Permit a process without the appropriate privilege to change a
file's group ID to its effective gid, in addition to the presently
permitted set of supplementary gids.
 1.123.4.1 29-Apr-2005  kent sync with -current
 1.125.2.2 26-Mar-2005  yamt sync with head.
 1.125.2.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.126.2.2 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in a different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed out by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed out by Chuck Silvers.
 1.126.2.1 30-Mar-2005  tron Pull up revision 1.127 (requested by perseant in ticket #74):
Make LFS dirops get their vnode first, before incrementing the dirop
count, to prevent a deadlock trying to call VOP_PUTPAGES() on a VDIROP
vnode. This can happen when a stacked filesystem is mounted on top of an
LFS: an LFS dirop needs to get a vnode, which is available from the upper
layer. The corresponding lower layer vnode, however, is VDIROP, so the
upper layer can't be cleaned out since its VOP_PUTPAGES() is passed
through to the lower layer, which waits for dirops to drain before it can
proceed. Deadlock.
Tweak ufs_makeinode() and ufs_mkdir() to pass the a_vpp argument through
to VOP_VALLOC().
Partially addresses PR # 26043, though it probably does not completely fix
the problem described there.
 1.127.2.8 04-Feb-2008  yamt sync with head.
 1.127.2.7 21-Jan-2008  yamt sync with head
 1.127.2.6 07-Dec-2007  yamt sync with head
 1.127.2.5 27-Oct-2007  yamt sync with head.
 1.127.2.4 03-Sep-2007  yamt sync with head.
 1.127.2.3 26-Feb-2007  yamt sync with head.
 1.127.2.2 30-Dec-2006  yamt sync with head.
 1.127.2.1 21-Jun-2006  yamt sync with head.
 1.135.2.1 20-Oct-2005  yamt adapt ufs.
 1.137.2.2 18-Nov-2005  yamt - associate read-ahead context to vnode, rather than file.
- revert VOP_READ prototype.
 1.137.2.1 15-Nov-2005  yamt adapt ffs, lfs, nfs.
 1.138.6.3 01-Jun-2006  kardel Sync with head.
 1.138.6.2 22-Apr-2006  simonb Sync with head.
 1.138.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.138.4.1 09-Sep-2006  rpaulo sync with head
 1.138.2.1 31-Dec-2005  yamt adapt some random parts of kernel to uio_vmspace.
 1.139.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.139.4.5 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.139.4.4 11-Mar-2006  elad When calling kauth_cred_ismember_gid(), don't return the error code if
there is one, just treat it as if the check failed.

Pointed out by thorpej@.
 1.139.4.3 11-Mar-2006  elad kauth_cred_groupmember() -> kauth_cred_ismember_gid(), as requested by
thorpej@ to conform to the Darwin KPI.
 1.139.4.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.139.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.139.2.3 11-Aug-2006  yamt sync with head
 1.139.2.2 26-Jun-2006  yamt sync with head.
 1.139.2.1 24-May-2006  yamt sync with head.
 1.140.2.1 19-Jun-2006  chap Sync with head.
 1.142.6.2 10-Dec-2006  yamt sync with head.
 1.142.6.1 22-Oct-2006  yamt sync with head
 1.142.4.3 01-Feb-2007  ad Sync with head.
 1.142.4.2 12-Jan-2007  ad Sync with head.
 1.142.4.1 18-Nov-2006  ad Sync with head.
 1.143.2.3 10-Mar-2007  bouyer Pull up following revision(s) (requested by chs in ticket #508):
sys/ufs/ufs/ufs_vnops.c: revision 1.150
In readdir, in case cookies was already allocated but is later free'd
due to an error, reset value of cookies to NULL to avoid confusing
callers.
should fix kern/35728
 1.143.2.2 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.143.2.1 17-Jan-2007  tron Pull up following revision(s) (requested by yamt in ticket #357):
sys/ufs/ufs/ufs_vnops.c: revision 1.145
ufs_readdir: start from offsets known to be valid,
rather than assuming users feed us valid offsets.
 1.149.2.3 17-May-2007  yamt sync with head.
 1.149.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.149.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.152.4.1 11-Jul-2007  mjf Sync with head.
 1.152.2.9 23-Oct-2007  ad Sync with head.
 1.152.2.8 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress test.
 1.152.2.7 20-Aug-2007  ad Sync with HEAD.
 1.152.2.6 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.152.2.5 15-Jul-2007  ad Sync with head.
 1.152.2.4 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.152.2.3 08-Jun-2007  ad Sync with head.
 1.152.2.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.152.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.154.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.155.6.2 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.155.6.1 29-Jul-2007  ad file ufs_vnops.c was added on branch matt-mips64 on 2007-07-29 13:31:17 +0000
 1.155.4.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.155.4.3 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.155.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.155.4.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.156.4.1 14-Oct-2007  yamt sync with head.
 1.156.2.3 23-Mar-2008  matt sync with HEAD
 1.156.2.2 09-Jan-2008  matt sync with HEAD
 1.156.2.1 06-Nov-2007  matt sync with HEAD
 1.157.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.157.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.157.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.159.2.2 26-Dec-2007  ad Sync with head.
 1.159.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.160.4.2 08-Jan-2008  bouyer Sync with HEAD
 1.160.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.164.12.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.164.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.164.10.5 11-Aug-2010  yamt sync with head.
 1.164.10.4 11-Mar-2010  yamt sync with head
 1.164.10.3 18-Jul-2009  yamt sync with head.
 1.164.10.2 16-May-2009  yamt sync with head
 1.164.10.1 04-May-2009  yamt sync with head.
 1.164.8.1 04-Jun-2008  yamt sync with head
 1.164.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.164.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.164.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.164.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.166.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.166.4.1 19-Oct-2008  haad Sync with HEAD.
 1.166.2.2 12-Jun-2008  martin License police
 1.166.2.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.169.10.1 21-Apr-2010  matt sync to netbsd-5
 1.169.6.3 20-Nov-2008  christos merge with head.
 1.169.6.2 09-Nov-2008  christos fix printf formats for new dev_t
 1.169.6.1 14-Aug-2008  christos file ufs_vnops.c was added on branch christos-time_t on 2008-11-09 23:29:16 +0000
 1.169.4.3 22-May-2012  riz Pull up following patch (requested by buhrow in ticket #1763):
sys/ufs/ufs/ufs_vnops.c: patch

Make sure we return EXDEV on cross-device links rather than panicking
the system. This corrects a pasting error from the merged patches
in ticket 1759.

Thanks to hannken@ for figuring out the error.
Fixes pr kern/46472
Tested by buhrow@
 1.169.4.2 19-May-2012  riz Apply patch (requested by buhrow in ticket #1759):


sys/ufs/lfs/lfs_vnops.c patch
sys/ufs/ufs/inode.h patch
sys/ufs/ufs/ufs_extern.h patch
sys/ufs/ufs/ufs_lookup.c patch
sys/ufs/ufs/ufs_vnops.c patch
sys/ufs/ufs/ufs_wapbl.c patch

Port dholland's ufs_rename locking changes to netbsd-5.
[buhrow, ticket #1759]

Hello. More testing has revealed a minor misunderstanding between the
vnode API in -current and 5.x. The below patch, against NetBSD-5.1
sources, rolls all the accumulated patches into one patch set. With this
patch, I believe you can now run with WAPBL, softdep or traditional ufs
semantics with heavy file loads and avoid panics due to resource exhaustion
and/or tstile deadlocks. Testing has been done on I386, both uniprocessor
and multiprocessor, and on Sparc machines in uniprocessor mode, though I
think multiprocessor Sparc would be fine as well. Since these changes are
machine independent, I don't anticipate any issues on any platform. It is
my hope that modulo any final issues that come up in the final round of
testing I'm currently performing, these patches will be ready to be pulled
up into the NetBSD-5 branch.
Finally, I'd like to thank mouse@ and hannken@ for their help and
patience in helping me track down and test the final versions of these
patches. With their assistance, I'm confident these patches make NetBSD-5
a much more stable and robust operating environment in a variety of
settings.
 1.169.4.1 28-Mar-2010  snj Pull up following revision(s) (requested by hannken in ticket #1345):
sys/ufs/ufs/ufs_vnops.c: revision 1.180 via patch
ufs_rmdir(): move fstrans_done() after vput(). No more unlinked and
zero-sized directory inodes in snapshots.
 1.169.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.169.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.169.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.172.2.2 23-Jul-2009  jym Sync with HEAD.
 1.172.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.180.4.5 31-May-2011  rmind sync with head
 1.180.4.4 21-Apr-2011  rmind sync with head
 1.180.4.3 05-Mar-2011  rmind sync with head
 1.180.4.2 03-Jul-2010  rmind sync with head
 1.180.4.1 30-May-2010  rmind sync with head
 1.180.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.180.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.186.4.2 18-Feb-2011  bouyer Add a new inode flag, SF_SNAPINVAL, to be set on SF_SNAPSHOT inodes when
the snapshot is invalid.
Set SF_SNAPSHOT | SF_SNAPINVAL early when initializing a snapshot inode,
so that quotas are bypassed for allocations on this inode.
Set SF_SNAPSHOT | SF_SNAPINVAL (instead of clearing SF_SNAPSHOT) when
expunge()ing a snapshot inode, so that userland tools working on the
snapshot (e.g. fsck or dump) can properly handle this inode.

The main point at this time is to have fsck_ffs -X properly compute quotas;
as a bonus persistent snapshots files won't show up in a dump(8) from a
snapshot.

This may also help speed up taking snapshots, by bypassing expunge()
for snapshot inodes completely (but this needs more thought).


Briefly discussed with hannken@ in private mail.
 1.186.4.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on-disk format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")
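The limit semantics described above (0 means nothing is allowed, the maximum 64-bit value means no limit) can be sketched as follows. This is only an illustration: the names demo_quota_allows and DEMO_NO_LIMIT are hypothetical, UINT64_MAX stands in for UQUAD_MAX, and the code is not taken from the kernel (which, per the Todo below, did not enforce limits yet at this point).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for UQUAD_MAX: the "no limit" marker. */
#define DEMO_NO_LIMIT UINT64_MAX

/*
 * Return true if growing the current usage by delta stays within limit.
 * A limit of 0 is a real limit (nothing may be allocated); only
 * DEMO_NO_LIMIT disables the check.
 */
static bool
demo_quota_allows(uint64_t usage, uint64_t delta, uint64_t limit)
{
	if (limit == DEMO_NO_LIMIT)
		return true;
	if (usage > limit)
		return false;
	/* Written as a subtraction so usage + delta cannot overflow. */
	return delta <= limit - usage;
}

/* Small self-check showing the intended behaviour. */
static void
demo_quota_selfcheck(void)
{
	assert(!demo_quota_allows(0, 1, 0));            /* 0 = nothing allowed */
	assert(demo_quota_allows(0, 1, DEMO_NO_LIMIT)); /* explicit "unlimited" */
	assert(demo_quota_allows(10, 5, 15));           /* exactly at the limit */
	assert(!demo_quota_allows(10, 6, 15));          /* one past the limit */
}
```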

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.186.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.205.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.205.2.4 23-Jan-2013  yamt sync with head
 1.205.2.3 30-Oct-2012  yamt sync with head
 1.205.2.2 23-May-2012  yamt sync with head.
 1.205.2.1 17-Apr-2012  yamt sync with head
 1.206.4.3 02-Jun-2012  mrg sync to latest -current.
 1.206.4.2 05-Apr-2012  mrg sync to latest -current.
 1.206.4.1 18-Feb-2012  mrg merge to -current.
 1.210.2.4 03-Dec-2017  jdolecek update from HEAD
 1.210.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.210.2.2 23-Jun-2013  tls resync from head
 1.210.2.1 25-Feb-2013  tls resync with head
 1.216.2.2 18-May-2014  rmind sync with head
 1.216.2.1 28-Aug-2013  rmind sync with head
 1.220.2.1 10-Aug-2014  tls Rebase.
 1.221.2.1 28-Jan-2015  martin Pull up following revision(s) (requested by christos in ticket #425):
sys/ufs/ufs/ufs_inode.c: revision 1.91-1.92
sys/ufs/ufs/ufs_vnops.c: revision 1.223-1.224
sys/ufs/ufs/ufs_extern.h: revision 1.76-1.77
sys/ufs/ffs/ffs_vfsops.c: revision 1.303-1.305
Add debugging for mount...
Merge some error returns
Check more errors
Restore apple ufs error handling.
Move and unify indirect block truncate algorithm into a separate function.
PR/39371: Tobias Nygren: Don't fail mounting root if WAPBL log is corrupt.
Patch from Sergio L. Pascual.
 1.224.2.6 28-Aug-2017  skrll Sync with HEAD
 1.224.2.5 05-Dec-2016  skrll Sync with HEAD
 1.224.2.4 29-May-2016  skrll Sync with HEAD
 1.224.2.3 22-Sep-2015  skrll Sync with HEAD
 1.224.2.2 06-Jun-2015  skrll Sync with HEAD
 1.224.2.1 06-Apr-2015  skrll Sync with HEAD
 1.232.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.232.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.232.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.234.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.237.4.2 27-Feb-2020  martin Pull up following revision(s) (requested by maxv in ticket #1511):

sys/ufs/ufs/ufs_vnops.c: revision 1.249

Zero out the padding in 'd_namlen', to prevent info leaks. Same logic as
ufs_makedirentry().

Found by kMSan: the unzeroed bytes of the pool_cache were getting copied
to the disk via a DMA write operation, and there kMSan was noticing
uninitialized memory leaving the system.
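The kind of fix described above can be illustrated with a minimal sketch. The struct and helper names below (demo_dirent, demo_set_name, demo_padded_len) are hypothetical and the layout is simplified; this is not the actual ufs code, only an example of zeroing the padding that follows a name so that stale bytes from a recycled buffer never reach the disk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAXNAMLEN 255

/* Simplified, hypothetical directory entry: length byte plus name buffer. */
struct demo_dirent {
	unsigned char d_namlen;
	char d_name[DEMO_MAXNAMLEN + 1];
};

/* Size of the name field once NUL-terminated and rounded up to 4 bytes. */
static size_t
demo_padded_len(size_t namlen)
{
	return (namlen + 1 + 3) & ~(size_t)3;
}

/*
 * Copy the name in and explicitly zero the terminator and the padding
 * bytes, instead of leaving whatever the allocator handed back.
 */
static void
demo_set_name(struct demo_dirent *de, const char *name)
{
	size_t namlen = strlen(name);

	assert(namlen <= DEMO_MAXNAMLEN);
	de->d_namlen = (unsigned char)namlen;
	memcpy(de->d_name, name, namlen);
	memset(de->d_name + namlen, 0, demo_padded_len(namlen) - namlen);
}
```

Bytes beyond the padded length are deliberately left alone in this sketch, since only the padded record would be written out.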
 1.237.4.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
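The format-string convention described in this message can be sketched in plain userland C. The helper names below are hypothetical and this is not the kernhist(9) macro API itself; it only shows the 'j' length modifier and the double cast of a pointer through uintptr_t to uintmax_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a pointer argument as "%#jx".
 * The pointer is first cast to uintptr_t (an integer of pointer width)
 * and only then widened to uintmax_t, which avoids the "cast from
 * pointer to integer of different size" warning.
 */
static int
demo_format_ptr_arg(char *buf, size_t len, const void *p)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%#jx", (uintmax_t)(uintptr_t)p);
}

/*
 * Hypothetical helper: format a numeric argument as "%ju".
 * Every argument is widened to uintmax_t, so the format string and the
 * argument size always agree regardless of the original type.
 */
static int
demo_format_num_arg(char *buf, size_t len, unsigned long v)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%ju", (uintmax_t)v);
}
```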
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.239.4.4 21-Apr-2020  martin Sync with HEAD
 1.239.4.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.239.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.239.4.1 10-Jun-2019  christos Sync with HEAD
 1.239.2.1 18-Jan-2019  pgoyette Synch with HEAD
 1.247.2.1 27-Feb-2020  martin Pull up following revision(s) (requested by maxv in ticket #739):

sys/ufs/ufs/ufs_vnops.c: revision 1.249

Zero out the padding in 'd_namlen', to prevent info leaks. Same logic as
ufs_makedirentry().

Found by kMSan: the unzeroed bytes of the pool_cache were getting copied
to the disk via a DMA write operation, and there kMSan was noticing
uninitialized memory leaving the system.
 1.248.2.3 29-Feb-2020  ad Sync with head.
 1.248.2.2 24-Jan-2020  ad - Put all the namecache stuff back into vnode_impl_t.
- Tidy vfs_cache.c up, finish the comments.
- Finalise how ID information is entered to the cache.
- Handle very small/old systems.
 1.248.2.1 17-Jan-2020  ad vfs_lookup:

- Do the easy component name lookups directly in the namecache without
taking vnode locks nor vnode references (between the start and the leaf /
parent), which seems to largely solve the lock contention problem with
namei(). It needs support from the file system, which has to tell the
name cache about directory permissions (only ffs and tmpfs tried so far),
and I'm not sure how or if it can work with layered file systems yet.
Work in progress.

vfs_cache:

- Make the rbtree operations more efficient: inline the lookup, and key on a
64-bit hash value (32 bits plus 16 bits length) rather than names.

- Take namecache stuff out of vnode_impl, and take the rwlocks, and put them
all together in an nchnode struct which is mapped 1:1 with vnodes. Saves
memory and nicer cache profile.

- Add a routine to help vfs_lookup do its easy component name lookups.

- Report some more stats.

- Tidy up the file a bit.
 1.250.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.26 11-Apr-2020  jdolecek remove noncompilable WAPBL_DEBUG_INODES

PR kern/49554 by Thomas Klausner
 1.25 22-Dec-2019  ad branches: 1.25.6;
Make mntvnode_lock per-mount, and address false sharing of struct mount.
 1.24 01-Mar-2017  hannken branches: 1.24.14;
Remove now redundant calls to fstrans_start()/fstrans_done().
 1.23 27-Jan-2012  para branches: 1.23.6; 1.23.24; 1.23.28; 1.23.32;
converting readdir in ffs ext2fs from malloc(9) to kmem(9)
while there allocate ufs mount structs from kmem(9) too
preceding kmem-vmem-pool-patch

releng@ acknowledged
 1.22 18-Jul-2011  dholland branches: 1.22.2; 1.22.6;
Remove some unneeded rename-related static const data.
(Why didn't gcc warn that this was unused?)
 1.21 18-Jul-2011  dholland Move ufs_wapbl_rename to ufs_vnops.c next to the old ufs_rename.
 1.20 18-Jul-2011  dholland More rename tidying.
 1.19 18-Jul-2011  dholland In ufs_wapbl_rename, remove #if 0 blocks and remove code for
now-impossible cases.
 1.18 17-Jul-2011  dholland minor amendment to previous
 1.17 17-Jul-2011  dholland Provide correct locking for ufs_wapbl_rename. Note that this does not
fix the non-wapbl rename; that will be coming soon. This patch also
leaves a lot of the older locking-related code around in #if 0 blocks,
and there's a lot of leftover redundant logic. All that will be going
away later.

Relates to at least these PRs:

PR kern/24887
PR kern/41417
PR kern/42093
PR kern/43626

and possibly others.
 1.16 14-Jul-2011  dholland Clean up handling of ufs_lookup_results in rename.
 1.15 12-Jul-2011  dholland Pass the ufs_lookup_results pointer around instead of fetching it from
the inode in the guts of ufs. Now, in VOPs where i_crap is used it is
used (directly) only immediately on entry to the VOP call and then
passed around by reference.

Except for rename, which needs explicit sorting out. The code in
ufs_wapbl_rename is unchanged in behavior but I'm increasingly
inclined to think it's wrong.
 1.14 12-Jul-2011  dholland Currently, ufs_lookup produces five auxiliary results that are left in
the vnode when lookup returns and fished out again later.

1. Create struct ufs_lookup_results to hold these.

2. Call the ufs_lookup_results instance in struct inode "i_crap" to be
clear about exactly what's going on, and to distinguish the lookup
results from respectable members of struct inode.

3. Update references to these members in the directory access
subroutines.

4. Include preliminary infrastructure for checking that the i_crap
being used is still valid when it's used. This doesn't actually do
anything yet.

5. Update the way ufs_wapbl_rename manipulates these elements to use
the new data structures. I have not changed the manipulation; it may
or may not be correct but I continue to suspect that it is not.

The word of the day is "stigmergy".
 1.13 23-May-2011  rmind ufs_wapbl_verify_inodes: update to reality (if somebody decides to use this).
 1.12 02-Jan-2011  dholland branches: 1.12.2;
Remove the special refcount behavior (adding an extra reference to the
parent dir) associated with SAVESTART in relookup().

Check all call sites to make sure that SAVESTART wasn't set while
calling relookup(); if it was, adjust the refcount behavior. Remove
related references to SAVESTART.

The only code that was reaching the extra ref was msdosfs_rename,
where the refcount behavior was already fairly broken and/or gross;
repair it.

Add a dummy 4th argument to relookup to make sure code that hasn't
been inspected won't compile. (This will go away next time the
relookup semantics change, which they will.)
 1.11 30-Nov-2010  dholland Abolish the SAVENAME and HASBUF flags. There is now always a buffer,
so the path in a struct componentname is now always valid during VOP
calls.
 1.10 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.9 25-May-2010  pooka Add a comment describing an observed boom-crash-burn problem in
the code. Fixing it will require a full tank of gas, half a pack
of cigarettes, sunglasses, darkness, and most importantly:
someone else.
 1.8 02-Mar-2010  pooka branches: 1.8.2;
scorch ufs_vnops.c cargo cult headers
 1.7 06-Apr-2009  pooka branches: 1.7.2; 1.7.4;
Fix reference leak in fix for PR kern/40948.
Pointed out by David Holland.
 1.6 02-Apr-2009  pooka Release tdvp in an appropriate VOP_RENAME error branch to avoid
panic described in PR kern/40948.

As usual, all the error branches in rename live based on an unholy
amalgamation of prayer and the blood of cute, furry and tasty
quadrupeds, so I won't even attempt to audit the rest.

And this wapbl rename really really needs to be merged with the
standard rename. That should be a fun PhD thesis topic ....
 1.5 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.4 13-Dec-2008  dholland branches: 1.4.2;
Don't deadlock on rename("foo/foo", "foo") in the case where foo/foo is a
directory. This doesn't affect non-wapbl renames; it affects wapbl because
one of the lock acquisitions was moved up past where this case otherwise
fails.

PR 40163 from Lloyd Parkes.
 1.3 08-Dec-2008  pooka Don't even try to pretend WAPBL_DEBUG_INODES works here, just #error.
 1.2 31-Jul-2008  simonb branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.1 10-Jun-2008  simonb branches: 1.1.2; 1.1.4;
file ufs_wapbl.c was initially added on branch simonb-wapbl.
 1.1.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.1.4.1 19-Oct-2008  haad Sync with HEAD.
 1.1.2.5 27-Jun-2008  simonb In wapbl_ufs_rename() make sure we read in the next filesystem
block if we pass the end of the current block. Fixes all sorts
of nasty directory lossage when the stars are aligned correctly.
 1.1.2.4 27-Jun-2008  simonb Reset fdp->i_count to 0 if the current directory offset is at the start
of a directory block, otherwise ufs_dirremove() will try to compact the
current entry onto the end of the last entry of the previous directory
block past the end of that block.

Thanks to Greg Oster for help debugging and fixing this.
 1.1.2.3 12-Jun-2008  martin License police
 1.1.2.2 11-Jun-2008  simonb Fix some whitespace and long line niggles.
 1.1.2.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.2.8.2 19-May-2012  riz Apply patch (requested by buhrow in ticket #1759):


sys/ufs/lfs/lfs_vnops.c patch
sys/ufs/ufs/inode.h patch
sys/ufs/ufs/ufs_extern.h patch
sys/ufs/ufs/ufs_lookup.c patch
sys/ufs/ufs/ufs_vnops.c patch
sys/ufs/ufs/ufs_wapbl.c patch

Port dholland's ufs_rename locking changes to netbsd-5.
[buhrow, ticket #1759]

Hello. More testing has revealed a minor misunderstanding between the
vnode API in -current and 5.x. The below patch, against NetBSD-5.1
sources, rolls all the accumulated patches into one patch set. With this
patch, I believe you can now run with WAPBL, softdep or traditional ufs
semantics with heavy file loads and avoid panics due to resource exhaustion
and/or tstile deadlocks. Testing has been done on I386, both uniprocessor
and multiprocessor, and on Sparc machines in uniprocessor mode, though I
think multiprocessor Sparc would be fine as well. Since these changes are
machine independent, I don't anticipate any issues on any platform. It is
my hope that modulo any final issues that come up in the final round of
testing I'm currently performing, these patches will be ready to be pulled
up into the NetBSD-5 branch.
Finally, I'd like to thank mouse@ and hannken@ for their help and
patience in helping me track down and test the final versions of these
patches. With their assistance, I'm confident these patches make NetBSD-5
a much more stable and robust operating environment in a variety of
settings.
 1.2.8.1 14-Dec-2008  bouyer Pull up following revision(s) (requested by dholland in ticket #187):
sys/ufs/ufs/ufs_wapbl.c: revision 1.4
Don't deadlock on rename("foo/foo", "foo") in the case where foo/foo is a
directory. This doesn't affect non-wapbl renames; it affects wapbl because
one of the lock acquisitions was moved up past where this case otherwise
fails.
PR 40163 from Lloyd Parkes.
 1.2.6.3 28-Apr-2009  skrll Sync with HEAD.
 1.2.6.2 03-Mar-2009  skrll Sync with HEAD.
 1.2.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.2.4.3 17-Jan-2009  mjf Sync with HEAD.
 1.2.4.2 28-Sep-2008  mjf Sync with HEAD.
 1.2.4.1 31-Jul-2008  mjf file ufs_wapbl.c was added on branch mjf-devfs2 on 2008-09-28 10:41:06 +0000
 1.2.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.2.2.1 31-Jul-2008  wrstuden file ufs_wapbl.c was added on branch wrstuden-revivesa on 2008-09-18 04:37:06 +0000
 1.4.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.7.4.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.7.4.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.7.2.4 11-Aug-2010  yamt sync with head.
 1.7.2.3 11-Mar-2010  yamt sync with head
 1.7.2.2 04-May-2009  yamt sync with head.
 1.7.2.1 06-Apr-2009  yamt file ufs_wapbl.c was added on branch yamt-nfs-mp on 2009-05-04 08:14:39 +0000
 1.8.2.4 31-May-2011  rmind sync with head
 1.8.2.3 05-Mar-2011  rmind sync with head
 1.8.2.2 03-Jul-2010  rmind sync with head
 1.8.2.1 30-May-2010  rmind sync with head
 1.12.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.22.6.1 18-Feb-2012  mrg merge to -current.
 1.22.2.1 17-Apr-2012  yamt sync with head
 1.23.32.1 21-Apr-2017  bouyer Sync with HEAD
 1.23.28.1 20-Mar-2017  pgoyette Sync with HEAD
 1.23.24.1 28-Aug-2017  skrll Sync with HEAD
 1.23.6.1 03-Dec-2017  jdolecek update from HEAD
 1.24.14.2 21-Apr-2020  martin Sync with HEAD
 1.24.14.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.25.6.1 20-Apr-2020  bouyer Sync with HEAD
 1.19 11-Apr-2020  jdolecek remove noncompilable WAPBL_DEBUG_INODES

PR kern/49554 by Thomas Klausner
 1.18 05-Mar-2020  riastradh branches: 1.18.2;
Revert "Include opt_diagnostic.h for DIAGNOSTIC."

This did not do what I thought it did. opt_diagnostic.h is only for
the unused _DIAGNOSTIC, which seems like an abortive attempt to
incrementally convert DIAGNOSTIC to an opt_*.h option rather than a
command-line option.
 1.17 05-Mar-2020  riastradh Include opt_diagnostic.h for DIAGNOSTIC.

...at least, in header files, which may not have already included
libkern.h.
 1.16 10-Dec-2018  jdolecek put back UFS_WAPBL_JUNLOCK_ASSERT(), the underlying rw_write_held() check
doesn't actually have a race since it checks if the rwlock is held by
current lwp
 1.15 10-Dec-2018  jdolecek make UFS_WAPBL_JLOCK_ASSERT() #ifdef DIAGNOSTIC, same as the underlying
function KASSERT(), so that it actually does something; fix code using
it to actually pass correct params, so that it compiles

remove UFS_WAPBL_JUNLOCK_ASSERT(), as that is inherently racy (it's
okay on those places if the rwlock is held by other lwp); depend
on the RW_ASSERT()/LOCKDEBUG inside rw_enter() to catch the case
with wapbl rwlock held by current lwp
 1.14 11-Nov-2016  jdolecek branches: 1.14.14; 1.14.16;
fix !WAPBL variant of UFS_WAPBL_REGISTER_DEALLOCATION()
 1.13 10-Nov-2016  jdolecek during truncate with wapbl, register deallocation for upper indirect block
before recursing into lower blocks, to make sure that it will be removed after
all its referenced blocks are removed

fixes 'ffs_blkfree_common: freeing free block' panic triggered by
ufs_truncate_retry() when just the upper indirect block registration failed,
code tried to free the lower blocks again after wapbl flush

problem found by hannken@, thank you
 1.12 28-Oct-2016  jdolecek reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
succeed; change wapbl_register_deallocation() to return EAGAIN
rather than panic when code hits the limit

callers changed to either loop calling ffs_truncate() using new
utility ufs_truncate_retry() if their semantics requires it, or
just ignore the failure; remove ufs_wapbl_truncate()

this fixes possible user-triggerable panic during truncate, and
resolves WAPBL performance issue with truncates of large files

PR kern/47146 and kern/49175
 1.11 19-May-2016  riastradh branches: 1.11.2;
While here, replace GCC __FUNCTION__ by C99 __func__

From coypu.
 1.10 19-May-2016  riastradh Simplify ufs_wapbl_begin2/end2, drop 2 suffix

We are no longer calling UFS_WAPBL_BEGIN/END with vnodes (we are giving
NULL as a parameter in all cases), so we can get rid of this input
parameter and the relevant check.

From coypu.
 1.9 19-May-2016  riastradh Get rid of UFS_WAPBL_BEGIN1/END1

ufs makeinode no longer releases dvp, so incrementing the
usecount for wapbl is unnecessary.

From coypu.
 1.8 10-Nov-2013  christos branches: 1.8.6;
__USE a variable for the non-wapbl case
 1.7 19-Sep-2011  gdt branches: 1.7.2; 1.7.12; 1.7.16;
Remove prototype for the departed wapbl_ufs_rename.

ok dholland@
 1.6 18-Nov-2009  yamt use NULL instead of 0 for pointers
 1.5 08-Oct-2008  pooka branches: 1.5.12;
#error if WAPBL_DEBUG_INODES is defined. That code has bitrotted
more than casu marzu cheese.
 1.4 06-Aug-2008  oster branches: 1.4.2; 1.4.4;
Define UFS_WAPBL_UNREGISTER_INODE() and UFS_WAPBL_REGISTER_INODE()
to something that pacifies the compiler in the non-WAPBL case.

Fix suggested by Martin Husemann. Fixes PR#39302.
 1.3 31-Jul-2008  simonb Be consistent with #define<tab>.
 1.2 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.1 10-Jun-2008  simonb branches: 1.1.2; 1.1.4;
file ufs_wapbl.h was initially added on branch simonb-wapbl.
 1.1.4.1 19-Oct-2008  haad Sync with HEAD.
 1.1.2.4 28-Jul-2008  simonb Add support for creating a WAPBL log in the filesystem. Will
create an in-filesystem log on first "mount -o log" if one doesn't
exist, and will then continue to use same log in the future. See
(soon to be added) wapbl(4) for more info.

Adds a new B_CONTIG low-level allocation flag that uses hints in
"struct ffs_inode_ext" to lay out an ffs file's data contiguously.

Thanks to Greg Oster for helping with the design of this and to
Antti Kantee for code review and suggestions.
 1.1.2.3 03-Jul-2008  simonb Store the location of the journal in the superblock. Currently
nothing really uses this, other than replay checking that what is
in the superblock matches what it expects.
 1.1.2.2 12-Jun-2008  martin License police
 1.1.2.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.4.4.3 17-Jan-2009  mjf Sync with HEAD.
 1.4.4.2 28-Sep-2008  mjf Sync with HEAD.
 1.4.4.1 06-Aug-2008  mjf file ufs_wapbl.h was added on branch mjf-devfs2 on 2008-09-28 10:41:06 +0000
 1.4.2.3 10-Oct-2008  skrll Sync with HEAD.
 1.4.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.4.2.1 06-Aug-2008  wrstuden file ufs_wapbl.h was added on branch wrstuden-revivesa on 2008-09-18 04:37:06 +0000
 1.5.12.3 11-Mar-2010  yamt sync with head
 1.5.12.2 04-May-2009  yamt sync with head.
 1.5.12.1 08-Oct-2008  yamt file ufs_wapbl.h was added on branch yamt-nfs-mp on 2009-05-04 08:14:39 +0000
 1.7.16.1 18-May-2014  rmind sync with head
 1.7.12.2 03-Dec-2017  jdolecek update from HEAD
 1.7.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.6.2 05-Dec-2016  skrll Sync with HEAD
 1.8.6.1 29-May-2016  skrll Sync with HEAD
 1.11.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.11.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.14.16.2 21-Apr-2020  martin Sync with HEAD
 1.14.16.1 10-Jun-2019  christos Sync with HEAD
 1.14.14.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.18.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.44 17-Nov-2022  chs Restore backward compatibility of UFS2 with previous NetBSD releases by
disabling support in UFS2 for extended attributes (including ACLs).
Add a new variant of UFS2 called "UFS2ea" that does support extended attributes.
Add new fsck_ffs operations "-c ea" and "-c no-ea" to convert file systems
from UFS2 to UFS2ea and vice-versa (both of which delete all existing extended
attributes in the process).
 1.43 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.42 17-Mar-2015  hannken Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.41 11-Aug-2013  dholland branches: 1.41.6;
Kill off uo_unmark_vnode/UFS_UNMARK_VNODE as it's now a leftover.
 1.40 16-Jun-2013  hannken branches: 1.40.2;
Add an UFS_SNAPGONE() ufs op replacing the calls
to ffs_snapgone() in ufs_lookup.c.

Ok: David Holland <dholland@netbsd.org>

Welcome to 6.99.22
 1.39 19-Oct-2012  drochner Implement experimental support to pass notifications that a file
was deleted from the filesystem to the disk driver, commonly
known as "discard" or "trim".
fs/driver support is in ffs and ata wd for now.
This is what was posted here:
http://mail-index.netbsd.org/tech-kern/2012/02/28/msg012813.html
with minor cleanup, and the global switch replaced by a mount option.
 1.38 09-May-2012  riastradh branches: 1.38.2;
Adapt ffs, lfs, and ext2fs to use genfs_rename.

ok dholland, rmind
 1.37 24-Nov-2011  ahoka branches: 1.37.2;
Import CHFS, which was formerly known as ChewieFS.

CHFS is a file system for flash devices developed by the
Software Engineering Department at University of Szeged, Hungary.

http://chewiefs.sed.hu/

Thanks for all who made it possible.
 1.36 06-Mar-2011  bouyer branches: 1.36.4;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.35 13-Nov-2008  ad branches: 1.35.8; 1.35.10; 1.35.12;
Remove #ifdef LFS from the ufs code.
 1.34 17-Apr-2008  hannken branches: 1.34.4; 1.34.10; 1.34.12;
Replace get/setspecific with a void pointer in struct ufsmount. Use explicit
initialization/finalization of snapshot private data on creation/deletion
of struct ufsmount.
Snapshot mounts no longer may fail silently because kmem_alloc() fails.

Welcome to 4.99.60

Ok: Andrew Doran <ad@netbsd.org>
 1.33 08-Dec-2007  pooka branches: 1.33.12;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.32 10-Sep-2007  pooka branches: 1.32.8;
include sys/mount.h for export_args30
 1.31 09-Aug-2007  hannken branches: 1.31.2;
Move snapshot per-mount data from struct ufsmount to mount specific data.
No functional changes.

Welcome to 4.99.28 (struct ufsmount changed size)
 1.30 16-Jul-2007  pooka branches: 1.30.2; 1.30.6;
include quota.h to score definitions used by this header
 1.29 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.28 04-Mar-2007  christos branches: 1.28.2; 1.28.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.27 14-May-2006  elad branches: 1.27.14;
integrate kauth.
 1.26 14-Jan-2006  yamt branches: 1.26.2; 1.26.4; 1.26.6; 1.26.8; 1.26.10;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.25 11-Dec-2005  christos branches: 1.25.2;
merge ktrace-lwp.
 1.24 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.23 27-Sep-2005  yamt branches: 1.23.2;
introduce "ufs_ops" and use it for ITIMES.
 1.22 27-Sep-2005  yamt change um_maxfilesize to unsigned as its on-disk counterpart is.
 1.21 25-Sep-2005  jmmv Follow compat naming tradition: rename compat_export_args to export_args30.
 1.20 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.19 28-Aug-2005  thorpej Experimental support for extended attributes on UFS1 file systems, using a
backing file per attribute type indexed by inode number to hold the extended
attributes.

This is working pretty well on my test systems, except for the "autostart"
feature. I need someone with a better handle on the VFS locking protocol
to go over that.

This is a work-in-progress. There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.

Adapted from FreeBSD.
 1.18 22-May-2005  hannken branches: 1.18.2;
ffs/ffs_alloc.c:
- Add a missing ACTIVECG_CLR().

ffs/ffs_snapshot.c:
- Use async/delayed writes for snapshot creation and sync/uncache these buffers
on end. Reduces the time the file system must be suspended.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
- Byte swap the list of preallocated blocks on read/write instead of access.
- Always keep this list on ip->i_snapblklist so it may be rolled back when the
newest snapshot gets removed. Fixes a rare snapshot corruption when using
more than one snapshot on a file system.

ufs/ufsmount.h:
- Make TAILQ_LAST() possible on member um_snapshots.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
 1.17 15-Aug-2004  mycroft branches: 1.17.10;
Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.16 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.15 09-Jan-2004  dbj never upgrade the superblock or set FS_FLAGS_UPDATED in fs_old_flags
add compatibility for filesystems created before FFSv2 integration
these patches are from pr port-macppc/23926 and should also fix
problems discussed in pr kern/21404 and pr kern/21283
 1.14 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.13 18-May-2003  yamt branches: 1.13.2;
make is_sequential a callback in order to achieve better lfs write clustering.

since lfs always rewrites blocks into the new segment,
the current on-disk place of the block doesn't affect write clustering.

ok'ed by Konrad Schroder.
 1.12 05-Apr-2003  fvdl * Use the old and new time fields in the superblock as well as a few others
to determine if this filesystem was mounted by an older kernel after
having been mounted by a newer one, to avoid some summary mismatches.
* Reinstate support for 4.2 cylinder groups (read-only, as it was before).
 1.11 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.10 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.9 28-Sep-2002  dbj Add support for the Apple UFS variation on ffs
This is the bulk of PR #17345

The general approach is to use a run time determinable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
 1.8 27-Nov-2000  chs branches: 1.8.2; 1.8.6;
Initial integration of the Unified Buffer Cache project.
 1.7 18-Mar-1998  bouyer branches: 1.7.14;
Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time; other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
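
Editorial sketch (not part of the commit): the "byteswap when used" idea from
revision 1.7 can be pictured as a per-mount flag that decides whether a
32-bit on-disk value is swapped on access. The names fake_mount,
FAKE_UM_SWAPPED, swap32 and fake_fs_rw32 are made up for illustration; the
log only states that a "um_flags" field was added to struct ufsmount, and
this is not the code in ffs_bswap.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_UM_SWAPPED 0x1	/* hypothetical flag: fs is in opposite byte order */

struct fake_mount {
	int um_flags;		/* stands in for the per-mount flags field */
};

/* Reverse the four bytes of a 32-bit value. */
static uint32_t
swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	    ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Convert a 32-bit field between on-disk and host order.  The same
 * routine serves for reading and writing because swapping is its own
 * inverse.
 */
static uint32_t
fake_fs_rw32(const struct fake_mount *um, uint32_t v)
{
	assert(um != NULL);
	return (um->um_flags & FAKE_UM_SWAPPED) ? swap32(v) : v;
}
```

Keeping the decision in one accessor means callers never need to know
whether the mounted file system is in native byte order.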
 1.6 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.5 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
    union {
        struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
        struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
    } i_din;
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.4 21-Dec-1994  mycroft Add RCS ids where missing.
 1.3 13-Dec-1994  mycroft Sync with CSRG.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.7.14.1 08-Dec-2000  bouyer Sync with HEAD.
 1.8.6.1 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototill work
 1.8.2.2 11-Dec-2002  thorpej Sync with HEAD.
 1.8.2.1 18-Oct-2002  nathanw Catch up to -current.
 1.13.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.13.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.13.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.13.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.13.2.1 03-Aug-2004  skrll Sync with HEAD
 1.17.10.1 28-May-2005  tron Pull up revision 1.18 (requested by hannken in ticket #334):
ffs/ffs_alloc.c:
- Add a missing ACTIVECG_CLR().
ffs/ffs_snapshot.c:
- Use async/delayed writes for snapshot creation and sync/uncache these buffers
on end. Reduces the time the file system must be suspended.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
- Byte swap the list of preallocated blocks on read/write instead of access.
- Always keep this list on ip->i_snapblklist so it may be rolled back when the
newest snapshot gets removed. Fixes a rare snapshot corruption when using
more than one snapshot on a file system.
ufs/ufsmount.h:
- Make TAILQ_LAST() possible on member um_snapshots.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
 1.18.2.4 21-Jan-2008  yamt sync with head
 1.18.2.3 27-Oct-2007  yamt sync with head.
 1.18.2.2 03-Sep-2007  yamt sync with head.
 1.18.2.1 21-Jun-2006  yamt sync with head.
 1.23.2.1 20-Oct-2005  yamt adapt ufs.
 1.25.2.1 15-Jan-2006  yamt sync with head.
 1.26.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.26.8.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.26.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.26.6.1 24-May-2006  yamt sync with head.
 1.26.4.1 01-Jun-2006  kardel Sync with head.
 1.26.2.1 09-Sep-2006  rpaulo sync with head
 1.27.14.1 12-Mar-2007  rmind Sync with HEAD.
 1.28.4.2 11-Jul-2007  mjf Sync with head.
 1.28.4.1 30-Mar-2007  mjf Add initial implementation of transaction API.
 1.28.2.3 09-Oct-2007  ad Sync with head.
 1.28.2.2 20-Aug-2007  ad Sync with HEAD.
 1.28.2.1 13-Apr-2007  ad Put a per-mount lock around ffs shared data structures, excluding softdep
and quotas. Strategy lifted from FreeBSD.
 1.30.6.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.30.6.2 02-Oct-2007  joerg Sync with HEAD.
 1.30.6.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.30.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.31.2.2 09-Jan-2008  matt sync with HEAD
 1.31.2.1 06-Nov-2007  matt sync with HEAD
 1.32.8.1 26-Dec-2007  ad Sync with head.
 1.33.12.2 17-Jan-2009  mjf Sync with HEAD.
 1.33.12.1 02-Jun-2008  mjf Sync with HEAD.
 1.34.12.1 19-Jan-2009  skrll Sync with HEAD.
 1.34.10.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.34.4.1 04-May-2009  yamt sync with head.
 1.35.12.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on-disk format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more, on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
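
Editorial sketch (not part of the commit): the quota file layout described in
revision 1.35.12.1 (a header holding default limits, a free-entry list and a
hash table, with entries chained by file offsets) could be modelled in memory
along these lines. All structure names, field names and sizes below are
assumptions made for illustration and do not reflect the real on-disk
format; grace times, block boundaries and file growth are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NHASH	4		/* assumed hash table size */
#define NENT	8		/* entries available in this toy "file" */

struct fake_entry {
	uint64_t next;		/* file offset of next entry, 0 = end of chain */
	uint32_t id;		/* uid or gid */
	uint64_t limit;		/* block limit */
};

struct fake_header {
	uint64_t free_head;		/* offset of first free entry */
	uint64_t hash[NHASH];		/* offsets of in-use chains */
	struct fake_entry defaults;	/* default limits for new entries */
};

struct fake_file {
	struct fake_header hdr;		/* offset 0, so 0 can mean "no entry" */
	struct fake_entry ent[NENT];
};

/* Turn a file offset into a pointer to the entry stored there. */
static struct fake_entry *
at(struct fake_file *f, uint64_t off)
{
	assert(off >= offsetof(struct fake_file, ent) && off < sizeof(*f));
	return (struct fake_entry *)((char *)f + off);
}

/* Zero the file and chain every entry onto the free list. */
static void
fake_init(struct fake_file *f, uint64_t default_limit)
{
	memset(f, 0, sizeof(*f));
	f->hdr.defaults.limit = default_limit;
	for (int i = NENT - 1; i >= 0; i--) {
		f->ent[i].next = f->hdr.free_head;
		f->hdr.free_head = (uint64_t)((char *)&f->ent[i] - (char *)f);
	}
}

/* Walk the hash chain for `id`; NULL if it has no entry yet. */
static struct fake_entry *
fake_lookup(struct fake_file *f, uint32_t id)
{
	uint64_t off;

	for (off = f->hdr.hash[id % NHASH]; off != 0; off = at(f, off)->next)
		if (at(f, off)->id == id)
			return at(f, off);
	return NULL;
}

/* Take an entry off the free list, give it the defaults, and hash it. */
static struct fake_entry *
fake_alloc(struct fake_file *f, uint32_t id)
{
	uint64_t off = f->hdr.free_head;
	struct fake_entry *e;

	if (off == 0)
		return NULL;	/* per the log, a new fs block would be allocated here */
	e = at(f, off);
	f->hdr.free_head = e->next;
	e->id = id;
	e->limit = f->hdr.defaults.limit;
	e->next = f->hdr.hash[id % NHASH];
	f->hdr.hash[id % NHASH] = off;
	return e;
}
```

Using offsets rather than pointers is what lets the same links be followed
whether the data is read from disk or cached in memory.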
 1.35.10.1 06-Jun-2011  jruoho Sync with HEAD.
 1.35.8.1 21-Apr-2011  rmind sync with head
 1.36.4.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.36.4.3 30-Oct-2012  yamt sync with head
 1.36.4.2 23-May-2012  yamt sync with head.
 1.36.4.1 17-Apr-2012  yamt sync with head
 1.37.2.1 02-Jun-2012  mrg sync to latest -current.
 1.38.2.4 03-Dec-2017  jdolecek update from HEAD
 1.38.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.38.2.2 23-Jun-2013  tls resync from head
 1.38.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.40.2.1 28-Aug-2013  rmind sync with head
 1.41.6.1 06-Apr-2015  skrll Sync with HEAD
