History log of /src/sys/ufs/ffs/ffs_vfsops.c
Revision  Date  Author  Comments
 1.384  30-Dec-2024  hannken Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fs_fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusive, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
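The test-and-clear pattern described in rev 1.384 can be illustrated with a small user-space analogue. This is only a sketch under stated assumptions: a pthread mutex stands in for um_lock, an int stands in for fs->fs_fmod, and every name ending in _sketch (plus the helper functions) is hypothetical. It is not the kernel code from this revision.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mount structure. */
struct mnt_sketch {
	pthread_mutex_t lock;	/* plays the role of um_lock */
	int fmod;		/* plays the role of fs->fs_fmod */
};

static void
mnt_sketch_init(struct mnt_sketch *m)
{
	pthread_mutex_init(&m->lock, NULL);
	m->fmod = 0;
}

/* Mark the superblock as modified, holding the lock. */
static void
set_fmod(struct mnt_sketch *m)
{
	pthread_mutex_lock(&m->lock);
	m->fmod = 1;
	pthread_mutex_unlock(&m->lock);
}

/*
 * Test and clear the flag as one locked step, so a concurrent
 * set_fmod() cannot slip in between the test and the clear.
 * Returns true when the caller should write the superblock.
 */
static bool
test_and_clear_fmod(struct mnt_sketch *m)
{
	bool was_set;

	pthread_mutex_lock(&m->lock);
	was_set = (m->fmod != 0);
	m->fmod = 0;
	pthread_mutex_unlock(&m->lock);
	return was_set;
}
```

Without the lock, a writer setting the flag between the test and the clear could have its update lost, which is the kind of race the PR title points at.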
 1.383  30-Dec-2024  hannken Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.382  08-Sep-2023  riastradh ffs_sync: Avoid unlocked access to v_numoutput/v_dirtyblkhd.

Found by lockdoc.

PR kern/57606
 1.381  15-Jun-2023  hannken Undo unlock/relock for VOP_IOCTL().

PR kern/57450 (unplugging hung USB disk triggers panic via _vstate_assert)
 1.380  05-Jun-2023  rin Make DEBUG_FFS_MOUNT compile again (with 64-bit ino_t).
 1.379  21-Dec-2022  chs ffs: fail mounts requesting ACLs for non-ea UFS2 file systems

For non-ea UFS2 file system, fail mounts that request ACLs rather than
letting the mount succeed only to reject all ACL operations later.

Also fix the messages about the on-disk fs flags conflicting with
the mount options for which type of ACLs to use, and about requesting
both types of ACLs.
 1.378  17-Nov-2022  chs branches: 1.378.2;
Restore backward compatibility of UFS2 with previous NetBSD releases by
disabling support in UFS2 for extended attributes (including ACLs).
Add a new variant of UFS2 called "UFS2ea" that does support extended attributes.
Add new fsck_ffs operations "-c ea" and "-c no-ea" to convert file systems
from UFS2 to UFS2ea and vice-versa (both of which delete all existing extended
attributes in the process).
 1.377  10-Nov-2022  hannken Some changes to "fs->fs_fmod" and "fs->fs_clean":
- clear "fs->fs_fmod" after reading the super block.
- assert we don't write a super block when mounted read-only.
- make sure "fs->fs_clean" is one of FS_ISCLEAN or FS_WASCLEAN.
- print "file system not clean" on every mount.

Should fix PR kern/57010: ffs: mounting unclean non-root fs read-only
causes spurious write to superblock
 1.376  16-Apr-2022  hannken Unlock vnode for VOP_IOCTL() and wapbl_flush().
 1.375  19-Mar-2022  hannken Remove now unused VV_LOCKSWORK, all file systems support locking.

Remove unused predicates vn_locked() and vn_anylocked().

Welcome to 9.99.95
 1.374  12-Mar-2022  riastradh ffs: Fix 64-bit inode integer truncation.

Reported-by: syzbot+1ae93e092d532582b809@syzkaller.appspotmail.com
 1.373  18-Sep-2021  christos Change the default for ACLs to be posix1e instead of nfsv4 to match FreeBSD.
Requested by chuq.
 1.372  20-Aug-2020  christos Don't cache id's for vnodes that have ACLs. ok chs@
 1.371  05-Jul-2020  christos simplify the acl setup, and fix reversed mask in the fs_flags code.
 1.370  18-May-2020  hannken Assert ufs_strategy() always gets used while current thread
holds a fstrans lock.
 1.369  16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.368  12-May-2020  ad cache_enter_id(): give it a boolean parameter to indicate whether the cached
identity is valid.
 1.367  04-Apr-2020  ad Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.366  16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.365  27-Feb-2020  ad Tighten up the locking around vp->v_iflag a little more after the recent
split of vmobjlock & v_interlock.
 1.364  23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.363  17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.362  20-Jun-2019  pgoyette branches: 1.362.2; 1.362.4;
Split the ufs code out of the ffs module and into its own module.

Adapt chfs and ext2fs modules accordingly.
 1.361  01-Jan-2019  hannken Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
 1.360  10-Dec-2018  jdolecek make UFS_WAPBL_JLOCK_ASSERT() #ifdef DIAGNOSTIC, same as the underlying
function KASSERT(), so that it actually does something; fix code using
it to actually pass correct params, so that it compiles

remove UFS_WAPBL_JUNLOCK_ASSERT(), as that is inherently racy (it's
okay on those places if the rwlock is held by other lwp); depend
on the RW_ASSERT()/LOCKDEBUG inside rw_enter() to catch the case
with wapbl rwlock held by current lwp
 1.359  10-Dec-2018  maxv Remove unused mbuf.h includes.
 1.358  18-Jul-2018  uwe ffs_superblock_validate - check fs_old_size too.

Now I can mount OpenWindows Version 3 CD from 1991.
 1.357  28-May-2018  chs branches: 1.357.2;
add a genfs method to allow a file system to limit the range of pages
that are given to a single GOP_WRITE() call. needed by ZFS.
 1.356  28-Jan-2018  hannken branches: 1.356.2;
Prevent use-after-free where genfs_node_destroy() would destroy
a lock residing in the just freed inode data.
 1.355  15-Nov-2017  christos PR/52728: Izumi Tsutsui: "mount -u /dev/ /" triggers kernel panic
Simplify the control flow of the mount code and make sure that the
mountfrom argument can be converted to a block device in the update
case.
XXX: pullup-8
 1.354  20-Aug-2017  maya print mode as octal for readability
 1.353  17-Apr-2017  hannken branches: 1.353.2; 1.353.4;
Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.352  17-Apr-2017  hannken Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
 1.351  01-Apr-2017  riastradh KASSERT(mutex_owned(vp->v_interlock)) in vnode iterator selector.
 1.350  10-Mar-2017  jdolecek slightly rearrange the code for IMNT_WANTRDONLY + MNT_UPDATE case for
better readability, no functional change
 1.349  06-Mar-2017  hannken Adapt the test "enable WAPBL on rw mounts only" to the recent change of
the protocol to update a mounted file.

Should fix PR kern/52031 (FFS mount update doesn't play nice with WAPBL)
 1.348  01-Mar-2017  hannken Bring back read-write to read-only mount update for ffs.
 1.347  01-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().
 1.346  22-Feb-2017  hannken Enable fstrans on all file systems.

Welcome to 7.99.61
 1.345  17-Feb-2017  hannken Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.344  17-Feb-2017  hannken Untangle VFS_SYNC() from VFS_SUSPENDCTL().
 1.343  17-Feb-2017  hannken Flush the log to disk when ffs_sync() gets called with MNT_WAIT.
 1.342  27-Dec-2016  hannken branches: 1.342.2;
Fix a bug introduced with Rev. 1.294: use LK_NOWAIT when called with MNT_LAZY.
 1.341  20-Oct-2016  jdolecek add assertion to ensure ffs_cgupdate() is always called from
within a WAPBL transaction (if logging is on)
 1.340  28-Jul-2016  martin From Michael Plass:

The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.

Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.339  19-Jun-2016  christos branches: 1.339.2;
Relax the dup alloc tests to not include the on-disk data for ffsv2, since
nothing checks that the lazy-initialized inodes are correct and if they happen
to get corrupted, there is no way to fix them.
 1.338  23-Dec-2015  christos We need to check if the inode is initialized for ffsv2 when we translate a
filehandle to a vnode. This can come from nfs and it could be out of range.
In that case we read garbage from the disk, end up trying to free bogus data
when we put the vnode back and we crash.
XXX: pullup-7
 1.337  15-Nov-2015  pgoyette If file system ffs is built with WAPBL defined, make sure that the
module depends on the wapbl module.

No impact to users of built-in ffs file system code, as the WAPBL
#define will cause inclusion of the code in the kernel.

A standard build of the modular ffs file system code will #define
WAPBL, so the module will only work on a kernel which was also
built with WAPBL defined (or, once I commit it, with a dynamically-
loaded wapbl module).
 1.336  22-Oct-2015  maxv Fix PR 50070. From hannken@.
 1.335  24-Jul-2015  maxv Unused inits (harmless).

Found by Brainy.
 1.334  23-May-2015  maxv Add a missing goto.

(was here before my changes)

ok christos@
 1.333  19-May-2015  martin Cosmetics: fix netbsd.org spelling
 1.332  18-May-2015  martin Print all sizes as size_t
 1.331  18-May-2015  martin Make the recently added fs_cgsize test less strict, as it prevents existing
installs from booting.
Catch the common case and warn about it, pointing to a web page describing
the issue - but allow mounting. In all other cases, print more details about
the inconsistency and fail the mount.
 1.330  26-Apr-2015  maxv ffs_superblock_validate(): check the size of cylinder groups.
 1.329  22-Apr-2015  maxv Instead of duplicating code, create ffs_is_appleufs(): returns 1 if the
device is an AppleUFS FS, 0 otherwise.

This changes the behavior a bit: if the kernel cannot determine whether the
disk is an AppleUFS one or not, it now considers it as a normal UFS rather
than returning an error and not mounting/reloading it.

No particular comment on tech-kern@
 1.328  04-Apr-2015  maxv ffs_superblock_validate(): ensure fs_ncg!=0 and fs_maxbpg!=0 to prevent
several divisions by zero.
 1.327  28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.326  27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.325  17-Mar-2015  hannken Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.324  15-Mar-2015  maxv ffs_reload(): fix a bug that prevents Big Endian FSes from being reloaded.
'newfs' should be tagged as FS_SWAPPED, not 'fs'.

Was here before my changes.

While here, also KNF a bit.
 1.323  14-Mar-2015  maxv ffs_superblock_validate(): ensure fs_ipg and fs_fpg are != 0. Otherwise
division by zero in several places.
 1.322  10-Mar-2015  maxv ffs_superblock_validate(): check the number of inodes per block. Otherwise
a malformed value could panic the system.
 1.321  03-Mar-2015  maxv ffs_reload(): release 'bp' earlier
 1.320  03-Mar-2015  maxv ffs_reload(): the current implementation blindly guesses critical fields
of the superblock didn't change. Add checks to ensure they didn't change
for real. This prevents several memory corruptions.
 1.319  23-Feb-2015  maxv Small changes:
- instead of always calling DPRINTF with __func__, put __func__ directly
in the macro
- ffs_mountfs(): rename fsblockloc -> fs_sblockloc, initialize fs_sbsize
to zero
No real functional change
 1.318  22-Feb-2015  maxv ffs_superblock_validate(): sanitize fs_fragshift, fs_bmask and fs_fmask.
 1.317  20-Feb-2015  maxv Style, and fix a DPRINTF

No functional change
 1.316  14-Feb-2015  maxv ffs_superblock_validate(): when checking the number of frag blocks, also
make sure it matches fs->fs_frag. This also prevents an infinite loop if
fs->fs_frag=0.
 1.315  14-Feb-2015  maxv ffs_superblock_validate(): compute fs_bshift and fs_fshift, and ensure
they are consistent with what is indicated in the superblock. This allows
us to safely use some ffs_ macros.
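The consistency check described in rev 1.315 can be sketched in plain C as follows. This is an illustration only: shift_of() and shift_consistent() are hypothetical helpers, not functions from ffs_vfsops.c, and the real validation may compute the shift differently.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return log2(size) when size is a power of two that fits in 32 bits,
 * or -1 otherwise.
 */
static int
shift_of(uint32_t size)
{
	for (int shift = 0; shift < 32; shift++) {
		if ((UINT32_C(1) << shift) == size)
			return shift;
	}
	return -1;
}

/*
 * A stored shift (such as fs_bshift) is accepted only when it matches
 * the shift recomputed from the corresponding size (such as fs_bsize).
 */
static bool
shift_consistent(uint32_t size, int stored_shift)
{
	int computed = shift_of(size);

	return computed >= 0 && computed == stored_shift;
}
```

For example, a block size of 8192 is consistent only with a stored shift of 13; any other stored value would make shift-based macros compute wrong offsets.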
 1.314  14-Feb-2015  maxv In fact, we need to sanitize the superblock *after* swapping it. Therefore,
move the swap code inside the loop.

'fs->fs_sbsize' is swapped twice: the first time in order to get the
correct superblock size, and later when swapping the whole superblock
structure. As a result, we need to check 'fs->fs_sbsize' twice.

This:
- fixes my previous changes for swapped FSes
- allows the kernel to look for other superblock locations if the
current superblock is not validated

And now:
- ffs_superblock_validate() takes only one argument: the fs structure
- 'fs_bsize' is unused, so delete it

Add some comments to explain a bit what we are doing.
 1.313  14-Feb-2015  maxv ffs_superblock_validate(): sanitize the number of frag blocks.
 1.312  14-Feb-2015  maxv Currently, in ffs_reload(), we don't handle the possibility that the
superblock location may have changed. But that implies that we don't
handle the possibility that its size may have changed either.

Therefore: add a check to ensure the size hasn't changed. Otherwise the
mismatch leads to a memory corruption with kmem.
 1.311  14-Feb-2015  maxv Style. No functional change.
 1.310  14-Feb-2015  maxv ffs_reload(): call ffs_superblock_validate() with the new superblock.
 1.309  13-Feb-2015  maxv ffs_superblock_validate(): ensure fs->fs_cssize!=0, otherwise the kernel
panics with kmem_alloc(0).
 1.308  13-Feb-2015  maxv Add some checks in ffs_superblock_validate():
- fs_bsize < MINBSIZE
- !powerof2(fs_bsize)
- !powerof2(fs->fs_fsize)
- fs_bsize < fs->fs_fsize

Based on makefs/ffs.
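The four size checks listed for rev 1.308 can be sketched as a standalone function. The sketch makes several assumptions: SKETCH_MINBSIZE is set to 4096 as an assumed minimum block size, sb_sizes_ok() and is_pow2() are hypothetical names, and is_pow2() rejects zero, which the kernel's powerof2() macro may not.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MINBSIZE 4096	/* assumed minimum block size */

/* True for 1, 2, 4, ...; false for zero and negative values. */
static bool
is_pow2(int32_t x)
{
	return x > 0 && (x & (x - 1)) == 0;
}

/*
 * Reject a superblock whose block size (bsize) or fragment size (fsize)
 * fails any of the four checks named in the commit message.
 */
static bool
sb_sizes_ok(int32_t bsize, int32_t fsize)
{
	if (bsize < SKETCH_MINBSIZE)
		return false;		/* fs_bsize < MINBSIZE */
	if (!is_pow2(bsize))
		return false;		/* !powerof2(fs_bsize) */
	if (!is_pow2(fsize))
		return false;		/* !powerof2(fs_fsize) */
	if (bsize < fsize)
		return false;		/* fs_bsize < fs_fsize */
	return true;
}
```

Running the cheap checks first means later code can divide and shift by these values without re-validating them.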
 1.307  13-Feb-2015  maxv Add a new function: ffs_superblock_validate(). And add a new check to
ensure fs_size!=0; otherwise the kernel panics with a division by zero.
 1.306  13-Feb-2015  maxv Make this a bit more readable. No functional change.
 1.305  16-Jan-2015  christos PR/39371: Tobias Nygren: Don't fail mounting root if WAPBL log is corrupt.
Patch from Sergio L. Pascual.
XXX: pullup-7
 1.304  14-Dec-2014  christos Restore apple ufs error handling.
 1.303  14-Dec-2014  christos - Add debugging for mount...
- Merge some error returns
- Check more errors
 1.302  14-Nov-2014  manu branches: 1.302.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.
 1.301  30-Oct-2014  maxv Limit the superblock size to SBLOCKSIZE, not MAXBSIZE. Otherwise memcpy
will read beyond the allocated buffer.

Discussed a bit on tech-kern@.
 1.300  24-Oct-2014  njoly One semicolon is enough.
 1.299  24-May-2014  christos branches: 1.299.2;
Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.
 1.298  08-May-2014  hannken Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41
 1.297  16-Apr-2014  maxv An (un)privileged user can easily make the kernel dereference a NULL
pointer.

The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).

ok christos@
 1.296  01-Apr-2014  christos branches: 1.296.2;
Check for bread errors before we do the size check. Otherwise we de-reference
NULL...
 1.295  23-Mar-2014  hannken Change all vfsops to use C99 designated initializers.

No functional changes intended.
 1.294  17-Mar-2014  hannken Change ffs_sync() to use vfs_vnode_iterator.
 1.293  05-Mar-2014  hannken Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
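The calling convention of the iterator introduced in rev 1.293 (init, repeated next until false, destroy) can be mimicked in user space. The sketch below is a mock over a plain array, with every type and function name hypothetical; it performs none of the vnode referencing or locking that the kernel interface does and only shows the shape of the loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct vnode and a per-mount vnode list. */
struct node_sketch { int id; };
struct list_sketch { struct node_sketch *nodes; size_t count; };

/* Hypothetical iterator state, playing the role of the marker. */
struct iter_sketch { const struct list_sketch *list; size_t pos; };

static void
iter_init(const struct list_sketch *l, struct iter_sketch **itp)
{
	struct iter_sketch *it = malloc(sizeof(*it));

	if (it != NULL) {
		it->list = l;
		it->pos = 0;
	}
	*itp = it;
}

static void
iter_destroy(struct iter_sketch *it)
{
	free(it);
}

/* Returns false with *npp == NULL when done, true with the next node. */
static bool
iter_next(struct iter_sketch *it, struct node_sketch **npp)
{
	if (it == NULL || it->pos >= it->list->count) {
		*npp = NULL;
		return false;
	}
	*npp = &it->list->nodes[it->pos++];
	return true;
}

/* Example caller: visit every node without touching list internals. */
static int
sum_ids(const struct list_sketch *l)
{
	struct iter_sketch *it;
	struct node_sketch *n;
	int sum = 0;

	iter_init(l, &it);
	while (iter_next(it, &n))
		sum += n->id;
	iter_destroy(it);
	return sum;
}
```

The point of the interface is visible in sum_ids(): the caller never inspects list linkage or intermediate states, which is what the commit message says callers previously had to handle themselves.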
 1.292  25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.291  23-Nov-2013  christos change the mountlist CIRCLEQ into a TAILQ
 1.290  29-Oct-2013  hannken Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25
 1.289  30-Sep-2013  hannken Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>
 1.288  16-Sep-2013  hannken Function ffs_reload() works on a read-only mount, so remove the call
to ffs_snapshot_mount() as it would panic later with "already on list"
when remounting read-write.

Should fix PR kern/48211 (Unclean shutdown with active snapshot causes
panic during reboot)
 1.287  11-Aug-2013  dholland Kill off uo_unmark_vnode/UFS_UNMARK_VNODE as it's now a leftover.
 1.286  23-Jun-2013  dholland branches: 1.286.2;
Stick ffs_ in front of the following macros:
fragstoblks()
blkstofrags()
fragnum()
blknum()

to finish the job of distinguishing them from the lfs versions, which
Christos renamed the other day.

I believe this is the last of the overtly ambiguous exported symbols
from ffs... or at least, the last of the ones that conflicted with lfs.
ffs still pollutes the C namespace very broadly (as does ufs) and this
needs quite a bit more cleanup.

XXX: boo on macros with lowercase names. But I'm not tackling that just yet.
 1.285  23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.284  16-Jun-2013  hannken Add an UFS_SNAPGONE() ufs op replacing the calls
to ffs_snapgone() in ufs_lookup.c.

Ok: David Holland <dholland@netbsd.org>

Welcome to 6.99.22
 1.283  09-Jun-2013  dholland Stick UFS_ in front of these symbols:
DIRBLKSIZ
DIRECTSIZ
DIRSIZ
OLDDIRFMT
NEWDIRFMT

Part of PR 47909.
 1.282  22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.281  20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.280  26-Nov-2012  drochner allow enabling ffs "discard" via update mounts, make the flag visible
to userland
 1.279  19-Oct-2012  drochner Implement experimental support to pass notifications that a file
was deleted from the filesystem to the disk driver, commonly
known as "discard" or "trim".
fs/driver support is in ffs and ata wd for now.
This is what was posted here:
http://mail-index.netbsd.org/tech-kern/2012/02/28/msg012813.html
with minor cleanup, and the global switch replaced by a mount option.
 1.278  10-Sep-2012  manu branches: 1.278.2;
Stop extended attributes at the appropriate place so that unmount
does not fail with EBUSY on filesystem with extended attributes enabled.
 1.277  29-Apr-2012  chs change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
 1.276  13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.275  29-Jan-2012  nonaka branches: 1.275.2;
use FS_UFS[12]_MAGIC_SWAPPED instead of bswap32(FS_UFS[12]_MAGIC).
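The relationship between the magic numbers and their byte-swapped forms in rev 1.275 can be shown with a short sketch. The constant values below are assumptions believed to match the UFS1/UFS2 magic numbers and should be checked against fs.h; the SK_ names, sk_bswap32() and sk_classify() are hypothetical. One plausible benefit of precomputed constants, shown here, is that they can be used as case labels, which a function call cannot.

```c
#include <stdint.h>

/* Assumed values; verify against the real definitions in fs.h. */
#define SK_UFS1_MAGIC		UINT32_C(0x00011954)
#define SK_UFS2_MAGIC		UINT32_C(0x19540119)
#define SK_UFS1_MAGIC_SWAPPED	UINT32_C(0x54190100)
#define SK_UFS2_MAGIC_SWAPPED	UINT32_C(0x19015419)

enum sk_order { SK_NATIVE, SK_SWAPPED, SK_UNKNOWN };

/* Portable 32-bit byte swap, used here only to cross-check the constants. */
static uint32_t
sk_bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & UINT32_C(0x0000ff00)) |
	    ((v << 8) & UINT32_C(0x00ff0000)) | (v << 24);
}

/* Decide whether a magic number read from disk is native or byte-swapped. */
static enum sk_order
sk_classify(uint32_t magic)
{
	switch (magic) {
	case SK_UFS1_MAGIC:
	case SK_UFS2_MAGIC:
		return SK_NATIVE;
	case SK_UFS1_MAGIC_SWAPPED:
	case SK_UFS2_MAGIC_SWAPPED:
		return SK_SWAPPED;
	default:
		return SK_UNKNOWN;
	}
}
```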
 1.274  28-Jan-2012  rmind pool_page_alloc, pool_page_alloc_meta: avoid extra compare, use const.
ffs_mountfs,sys_swapctl: replace memset with kmem_zalloc.
sys_swapctl: move kmem_free outside the lock path.
uvm_init: fix comment, remove pointless numeration of steps.
uvm_map_enter: remove meflagval variable.
Fix some indentation.
 1.273  27-Jan-2012  para converting readdir in ffs ext2fs from malloc(9) to kmem(9)
while there allocate ufs mount structs from kmem(9) too
preceding kmem-vmem-pool-patch

releng@ acknowledged
 1.272  03-Jan-2012  pgoyette Display current mount point, rather than previous one, when printing
the "replaying log to disk" message.

OK dholland@

Fixes PR kern/39609
 1.271  14-Nov-2011  hannken branches: 1.271.4;
VOP_OPEN() needs a locked vnode. All these copy-and-pasted xxxfs_mount()
implementations need more review.
 1.270  13-Nov-2011  christos use getdiskinfo()
 1.269  07-Oct-2011  hannken branches: 1.269.2;
As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.
 1.268  17-Jun-2011  manu Add mount -o extattr option to enable extended attributes (currently only
for UFS1).
Remove kernel option for EA backing store autocreation and do it by
default. Add a sysctl so that autocreated attribute size can be modified.
 1.267  12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.266  27-Apr-2011  hannken branches: 1.266.2;
Cleanup ffs fsync and make devices on wapbl enabled file systems work here:

- Replace the ugly sync loop in ffs_full_fsync() and ffs_vfs_fsync() with
vflushbuf(). This loop is a relic of softdeps and not needed anymore.

- Add ffs_spec_fsync() for device nodes on ffs file systems that calls
spec_fsync() like all other file systems do and then updates the ctime.

Discussed on tech-kern.

Should fix PRs:
PR #41192 wapbl diagnostic panic during cgdconfig
PR #41977 kernel diagnostic assertion "rw_lock_held(&wl->wl_rwlock)" failed
PR #42149 wapbl locking panic if watching DVD
PR #42551 Lockdebug assert in wapbl when running zpool
 1.265  27-Mar-2011  mlelstv Don't abort when APPLE_UFS autodetection cannot read the apple ufs label
due to sector size or alignment problems. Autodetection is only a safety
measure, you should mark the filesystem type in the BSD disklabel.
 1.264  06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.263  27-Dec-2010  hannken branches: 1.263.2; 1.263.4;
Extend the range of fstrans transactions to a sequence of vnode operations
on a locked vnode. This leaves a suspended file system and therefore a
snapshot with either all or no operations of such a sequence done.
 1.262  09-Aug-2010  pooka add a linefeed to the previous
 1.261  09-Aug-2010  pooka Return error if we try to mount a file system with block size > MAXBSIZE.

Note: there are a billion ways to make the kernel panic by trying
to mount a garbage file system and I don't imagine we'll ever get
close to fixing even half of them. However, for this one failing
gracefully is a bonus since Xen DomU only does 32k MAXBSIZE and
the 64k MAXBSIZE file systems are out there (PR port-xen/43727).

Tested by compiling sys/rump with CPPFLAGS+=-DMAXPHYS=32768 (all
tests in tests/fs still pass). I don't know how we're going to
translate this into an easy regression test, though. Maybe with
a hacked newfs?
 1.260  21-Jul-2010  hannken Make holding v_interlock mandatory for callers of vget().

Announced some time ago on tech-kern.
 1.259  24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.258  11-Feb-2010  mlelstv branches: 1.258.2;
There is no code left that uses disk size data, so don't query it.
This also failed when querying the simulated block device from mfs.
Fixes PR kern/42782.
 1.257  05-Feb-2010  mlelstv branches: 1.257.2;
Correct addressing of superblock updates.
 1.256  31-Jan-2010  mlelstv Fix block shift to work with different device block sizes.

Unlike other filesystems this has some side issues because
the shift values are stored in the superblock and because
userland utilities share the same fsbtodb macros.

-> the kernel now ignores the value stored in the superblock.
-> the macro adaption is only done for defined(_KERNEL) code.
 1.255  31-Jan-2010  mlelstv Replace individual queries for partition information with
new helper function.
 1.254  08-Jan-2010  pooka The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.253  04-Nov-2009  hannken Now that softdep has left the tree the only place needing the ffs_lock()
hack is ffs_sync().

- Use the generic lock operations for ffs.
- Change ffs_sync() to omit the vnode lock while suspending.

Reviewed by: Antti Kantee <pooka@netbsd.org>
 1.252  13-Sep-2009  bouyer If the WAPBL journal can't be read (ffs_wapbl_replay_start() fails),
mount the filesystem anyway if MNT_FORCE is present.
This still allows a system with a corrupted WAPBL on / to boot
single-user, and so get a chance to run fsck to fix it.
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
 1.251  13-Sep-2009  tsutsui Move declaration of ufs_hashlock into <ufs/ufs_extern.h> from each c source.
 1.250  31-Jul-2009  pooka Don't free extattr resources until it is certain that unmount
succeeds. Also, "unmount system call" -> "unmount vfs operation"
in comment just so that our comments aren't 15+ years outdated.
 1.249  23-Jul-2009  pooka Restore error behaviour bulldozed in rev 1.246.

might fix PR kern/41769
 1.248  06-Jul-2009  christos Fix bug introduced in revision 1.174 where a NULL fspec with an MNT_UPDATE
command would always return EINVAL. This broke fsck on root, where fsck'ing
a dirty root would always return an error causing rc to resort to a reboot.
 1.247  29-Jun-2009  dholland Convert 67 namei call sites to use namei_simple, in these functions:

check_console, veriexecclose, veriexec_delete, veriexec_file_add,
emul_find_root, coff_load_shlib (sh3 version), coff_load_shlib,
compat_20_sys_statfs, compat_20_netbsd32_statfs,
ELFNAME2(netbsd32,probe_noteless), darwin_sys_statfs,
ibcs2_sys_statfs, ibcs2_sys_statvfs, linux_sys_uselib,
osf1_sys_statfs, sunos_sys_statfs, sunos32_sys_statfs,
ultrix_sys_statfs, do_sys_mount, fss_create_files (3 of 4),
adosfs_mount, cd9660_mount, coda_ioctl, coda_mount, ext2fs_mount,
ffs_mount, filecore_mount, hfs_mount, lfs_mount, msdosfs_mount,
ntfs_mount, sysvbfs_mount, udf_mount, union_mount, sys_chflags,
sys_lchflags, sys_chmod, sys_lchmod, sys_chown, sys_lchown,
sys___posix_chown, sys___posix_lchown, sys_link, do_sys_pstatvfs,
sys_quotactl, sys_revoke, sys_truncate, do_sys_utimes, sys_extattrctl,
sys_extattr_set_file, sys_extattr_set_link, sys_extattr_get_file,
sys_extattr_get_link, sys_extattr_delete_file,
sys_extattr_delete_link, sys_extattr_list_file, sys_extattr_list_link,
sys_setxattr, sys_lsetxattr, sys_getxattr, sys_lgetxattr,
sys_listxattr, sys_llistxattr, sys_removexattr, sys_lremovexattr

All have been scrutinized (several times, in fact) and compile-tested,
but not all have been explicitly tested in action.

XXX: While I haven't (intentionally) changed the use or nonuse of
XXX: TRYEMULROOT in any of these places, I'm not convinced all the
XXX: uses are correct; an audit might be desirable.
 1.246  25-Apr-2009  elad Add genfs_can_mount() and use it to prevent some more code duplication of
the security checks when mounting a device (VOP_ACCESS() + kauth(9) call).

Proposed with no objections on tech-kern@:

http://mail-index.netbsd.org/tech-kern/2009/04/20/msg004859.html

The vnode is always expected to be locked, so no locking is done outside
the file-system code.
 1.245  29-Mar-2009  ad fsync:

- atime updates were not being synced.

ffs_sync:

- In some cases the sync vnode was acting like the now-dead /usr/sbin/update.
It was examining vnodes that it should have ignored.

- It would find dirty inodes and try to flush them. Often ffs_fsync()
cheerfully ignored the flush request due to the fsync bug. Such inodes
remained dirty and were repeatedly re-examined by the syncer until
vnode reclaim or system shutdown.

- We were marking our place in the per-mount vnode list even though in
most cases there was no flush to perform. While not a bug, this wasted
CPU cycles because a TAILQ_NEXT would have sufficed.
 1.244  21-Mar-2009  ad ffs_sync: ensure that we *do* flush atime updates periodically.
ffs_update() was eating the flag.
 1.243  22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.242  22-Feb-2009  ad PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc

- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.

- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.

- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Previously they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.

- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.

- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.

- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
 1.241  13-Nov-2008  ad branches: 1.241.4;
Remove #ifdef LFS from the ufs code.
 1.240  10-Nov-2008  joerg Reduce internals of WAPBL exposed to the rest of the system.
 1.239  30-Oct-2008  joerg branches: 1.239.2;
Fix indentation.
 1.238  10-Oct-2008  hannken branches: 1.238.2;
Break a deadlock where one thread has a wapbl transaction, calls VOP_GETPAGES
and wants to busy a page while another thread calls VOP_PUTPAGES on the same
vnode, takes pages busy and wants to start a wapbl transaction.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.237  23-Sep-2008  pooka Remove some of my debugging code which was not meant to be committed
in the wapbl merge.
 1.236  21-Sep-2008  freza Revert previous, pooka@ points out it's wrong.
 1.235  21-Sep-2008  freza WAPBL: in '%s: replaying log to disk' message use the path we're
trying to mount on instead of the misleading last-mounted-on
path. Reported by jmcneill.
 1.234  22-Aug-2008  hannken Add snapshot support for logging ffs file systems.

- Add UFS_WAPBL_BEGIN() / UFS_WAPBL_END() where needed.

- Expunge WAPBL log inodes from snapshots.

- ffs_copyonwrite() and ffs_snapblkfree() must run inside a WAPBL transaction.

- Add ffs_gop_write() as a wrapper around genfs_gop_write() that makes sure
genfs_gop_write() always gets called inside a WAPBL transaction.

- Add VOP_PUTPAGES() flag PGO_JOURNALLOCKED to tag calls to VOP_PUTPAGES()
inside a WAPBL transaction.

Reviewed by: Simon Burge <simonb@netbsd.org>, Greg Oster <oster@netbsd.org>

PGO_JOURNALLOCKED / ffs_gop_write() part presented on tech-kern@.
 1.233  15-Aug-2008  hannken ffs_suspendctl: make sure everything is on disk and the on disk log is empty.
 1.232  31-Jul-2008  hannken Ffs snapshots don't work (yet) with WAPBL:
- no snapshot creation on logging file systems.
- refuse to mount logging file systems with persistent snapshots.

Ok: Simon Burge <simonb@netbsd.org>
 1.231  31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.230  28-Jun-2008  rumble branches: 1.230.2;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.229  03-Jun-2008  hannken branches: 1.229.2;
ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.228  16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.227  10-May-2008  rumble Convert file systems to dynamically attach with the new module interface.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.

As a consequence, most of the file systems can now be loaded as new style
modules.

Quick sanity check by ad@.
 1.226  06-May-2008  ad branches: 1.226.2;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.225  30-Apr-2008  ad PR kern/38135 vfs_busy/vfs_trybusy confusion

The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
 1.224  29-Apr-2008  ad PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.223  17-Apr-2008  hannken branches: 1.223.2; 1.223.4;
Replace get/setspecific with a void pointer in struct ufsmount. Use explicit
initialization/finalization of snapshot private data on creation/deletion
of struct ufsmount.
Snapshot mounts no longer may fail silently because kmem_alloc() fails.

Welcome to 4.99.60

Ok: Andrew Doran <ad@netbsd.org>
 1.222  30-Jan-2008  ad branches: 1.222.6;
PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
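
The reference chain described above (vnode -> mount -> vfsops) can be illustrated with a small userland sketch. This is not the kernel's code: `struct mount_sketch`, `struct vfsops_sketch` and the helper functions are hypothetical names invented for illustration, and the counters are plain integers, whereas a real kernel would need atomic operations or a lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the vfsops; only the reference count matters. */
struct vfsops_sketch {
	int refcnt;
};

/* Hypothetical stand-in for 'struct mount'. */
struct mount_sketch {
	int refcnt;			/* one per holder, e.g. per vnode */
	struct vfsops_sketch *ops;	/* the mount holds one reference on ops */
};

/* Create a mount with one reference for the creator and one on the ops. */
static struct mount_sketch *
mount_sketch_create(struct vfsops_sketch *ops)
{
	struct mount_sketch *mp = malloc(sizeof(*mp));

	if (mp == NULL)
		return NULL;
	mp->refcnt = 1;
	mp->ops = ops;
	ops->refcnt++;
	return mp;
}

/* Take an additional reference, as a vnode on the mount would. */
static void
mount_sketch_ref(struct mount_sketch *mp)
{
	mp->refcnt++;
}

/*
 * Drop a reference. When the last one goes away, release the reference
 * on the ops and free the mount. Returns true if the mount was freed.
 */
static bool
mount_sketch_rele(struct mount_sketch *mp)
{
	assert(mp->refcnt > 0);
	if (--mp->refcnt > 0)
		return false;
	mp->ops->refcnt--;
	free(mp);
	return true;
}
```

In this sketch a forced unmount can drop its own reference without freeing the structure while vnodes still point at it; the memory goes away only when the last holder releases.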
 1.221  28-Jan-2008  dholland Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.220  25-Jan-2008  pooka Destroy extattr lock when destroying extattrs associated with the
mountpoint. Make stopping extattrs always successful to facilitate
always being able to free resources.
 1.219  24-Jan-2008  ad specfs changes for PR kern/37717 (raidclose() is no longer called on
shutdown). There are still problems with device access and a PR will be
filed.

- Kill checkalias(). Allow multiple vnodes to reference a single device.

- Don't play dangerous tricks with block vnodes to ensure that only one
vnode can describe a block device. Instead, prohibit concurrent opens of
block devices. As a bonus remove the unreliable code that prevents
multiple file system mounts on the same device. It's no longer needed.

- Track opens by vnode and by device. Issue cdev_close() when the last open
goes away, instead of abusing vnode::v_usecount to tell if the device is
open.
 1.218  09-Jan-2008  ad Fix hangs on 'biolock' when creating a directory under / with softdep.
 1.217  07-Jan-2008  ad Fix 'panic: softdep_update_inodeblock: update failed'.
 1.216  03-Jan-2008  ad Use pool_cache.
 1.215  03-Jan-2008  pooka valloc -> vnalloc, vfree -> vnfree
Avoids collision with userland valloc(3).

no functional change
ad ok
 1.214  02-Jan-2008  ad Merge vmlocking2 to head.
 1.213  20-Dec-2007  dyoung Call genfs_node_init a little earlier to avoid vput()ing an
uninitialized node later, which leads to a kernel panic. Patch
by Antti Kantee.
 1.212  08-Dec-2007  pooka branches: 1.212.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.211  26-Nov-2007  pooka branches: 1.211.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.210  10-Oct-2007  ad branches: 1.210.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.209  08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.208  09-Aug-2007  hannken branches: 1.208.2; 1.208.4;
Move snapshot per-mount data from struct ufsmount to mount specific data.
No functional changes.

Welcome to 4.99.28 (struct ufsmount changed size)
 1.207  31-Jul-2007  pooka branches: 1.207.2; 1.207.4;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.206  20-Jul-2007  pooka In sync, skip over vnodes based on if they are clean rather than
if they have pages.
 1.205  17-Jul-2007  pooka branches: 1.205.2;
Make set_statvfs_info() take a parameter for the vfs name instead
of always retrieving it from mp->mnt_op->vfs_name

christos ok
 1.204  12-Jul-2007  dsl Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, and pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.203  10-Jul-2007  hannken Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criteria, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.202  30-Jun-2007  pooka Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.201  29-May-2007  tsutsui Fix inconsistent changes in rev 1.153 and 1.154:
Adjust fs->fs_maxfilesize instead of ump->um_maxfilesize
in ffs_oldfscompat_read() because the latter is overridden
by the former after ffs_oldfscompat_read() returns.

Fixes EFBIG errors on read(2) and "exec /sbin/init: error 8"
problem on mac68k after mountroot() on old 4.3BSD UFS created
by the Mkfs tool for MacOS (reported and confirmed on port-mac68k).
 1.200  28-May-2007  ad Fix lock order inversion between vnode locks and ufs_hashlock. Addresses
kern/36331 (MP deadlock between ufs_ihashget() and VOP_LOOKUP()) for ffs,
other file systems to follow. Reported by perseant@, debugged by Sverre
Froyen, patch posted/tested by Blair Sadewitz.
 1.199  17-May-2007  hannken fstrans_start() always returns zero, so change its type to void.
 1.198  07-Apr-2007  hannken Remove calls to now obsolete vn_start_write() and vn_finished_write().
 1.197  12-Mar-2007  ad branches: 1.197.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.196  16-Feb-2007  hannken branches: 1.196.2; 1.196.6;
Make fstrans(9) the default helper for file system suspension.
Replaces the now obsolete vn_start_write()/vn_finished_write().
 1.195  15-Feb-2007  ad Replace some uses of lockmgr() / simplelocks.
 1.194  29-Jan-2007  hannken Change fstrans enum types to upper case.
No functional change.

From Antti Kantee <pooka@netbsd.org>
 1.193  19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.192  07-Jan-2007  isaki Correct indent.
 1.191  04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.190  16-Nov-2006  christos branches: 1.190.2; 1.190.4;
__unused removal on arguments; approved by core.
 1.189  25-Oct-2006  reinoud Revisit mnt_vnodelist TAILQ patch. Remove all suspicious TAILQ_FOREACH()
loops where vnodes can get removed or added during the loops. This could
lead to panics on unmount since nodes are skipped or otherwise
TAILQ_NEXT(0xdeadbeef, ...) was dereferenced.
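
The hazard described above can be reproduced in userland with the `TAILQ_*` macros from `<sys/queue.h>`. The following is a minimal sketch, not the code from this commit: `struct node`, `remove_even()`, `fill()` and `count()` are hypothetical names, and the nodes merely stand in for vnodes on a per-mount list. The point is that the successor is read before the current element can be freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a vnode on a per-mount list. */
struct node {
	int id;
	TAILQ_ENTRY(node) entries;
};

TAILQ_HEAD(nodelist, node);

/*
 * Remove and free every node with an even id; return how many were removed.
 * The successor is fetched before the current node is freed, because a
 * plain TAILQ_FOREACH() would evaluate TAILQ_NEXT() on freed memory.
 */
static int
remove_even(struct nodelist *head)
{
	struct node *np, *next;
	int removed = 0;

	for (np = TAILQ_FIRST(head); np != NULL; np = next) {
		next = TAILQ_NEXT(np, entries);	/* saved while np is valid */
		if (np->id % 2 == 0) {
			TAILQ_REMOVE(head, np, entries);
			free(np);
			removed++;
		}
	}
	return removed;
}

/* Build a list holding ids 0..n-1 in insertion order. */
static void
fill(struct nodelist *head, int n)
{
	TAILQ_INIT(head);
	for (int i = 0; i < n; i++) {
		struct node *np = malloc(sizeof(*np));

		assert(np != NULL);
		np->id = i;
		TAILQ_INSERT_TAIL(head, np, entries);
	}
}

/* Count the nodes still on the list; read-only, so TAILQ_FOREACH is fine. */
static int
count(struct nodelist *head)
{
	struct node *np;
	int n = 0;

	TAILQ_FOREACH(np, head, entries)
		n++;
	return n;
}
```

Read-only walks such as `count()` can keep using `TAILQ_FOREACH()`; only loops that may remove or add elements need the saved-successor form. In a concurrent setting the saved successor could itself be removed by another thread, so this pattern alone is not sufficient there.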
 1.188  20-Oct-2006  reinoud Replace the LIST structure mp->mnt_vnodelist with a TAILQ structure since all
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earliest last.

An extra benefit is the removal of the ugly hack from the Berkeley days on
LFS.

In the process, I've also replaced the various hand-written loop variations
with the TAILQ_FOREACH() macro.
 1.187  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.186  21-Sep-2006  jld Change ffs_mount, in MNT_UPDATE case, to check dev_t's for equality
instead of just vnode pointers. Fixes erroneous "does not match mounted
device" errors from mount(8) in the presence of MFS /dev, init.root, &c.

No objections on tech-kern.
 1.185  30-Aug-2006  christos branches: 1.185.2; 1.185.4;
fix missing initializers
 1.184  23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.183  13-Jul-2006  martin Fix alignment problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.182  07-Jun-2006  kardel branches: 1.182.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.181  14-May-2006  elad branches: 1.181.2;
integrate kauth.
 1.180  21-Feb-2006  thorpej branches: 1.180.2; 1.180.4; 1.180.6;
Use device_class() instead of accessing dv_class directly.
 1.179  14-Jan-2006  yamt branches: 1.179.2; 1.179.4;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.178  23-Dec-2005  rpaulo branches: 1.178.2;
Convert UFS_EXTATTR to struct lwp.
 1.177  11-Dec-2005  christos merge ktrace-lwp.
 1.176  02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.175  27-Sep-2005  yamt branches: 1.175.2;
introduce "ufs_ops" and use it for ITIMES.
 1.174  23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.173  22-Sep-2005  rpaulo Fix bogus if-clause introduced in previous revision.
 1.172  22-Sep-2005  rpaulo In ffs_unmount(), detect EOPNOTSUPP errno returned from
ufs_extattr_stop().

From FreeBSD.
 1.171  12-Sep-2005  christos - access the ffs and ext2fs itimes functions through a pointer, so that
if the filesystem is not compiled in the kernel still links. Probably
a better solution is to use weak symbols.
- move the filesystem-specific itime macros to the filesystem header files.
 1.170  28-Aug-2005  thorpej Experimental support for extended attributes on UFS1 file systems, using a
backing file per attribute type indexed by inode number to hold the extended
attributes.

This is working pretty well on my test systems, except for the "autostart"
feature. I need someone with a better handle on the VFS locking protocol
to go over that.

This is a work-in-progress. There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.

Adapted from FreeBSD.
 1.169  23-Aug-2005  christos Don't overload MAXNAMLEN, use a separate constant for each filesystem type.
 1.168  25-Jul-2005  drochner fix crash in mount error handling: don't free storage which was not
malloc'd
 1.167  23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.166  15-Jul-2005  thorpej Use ANSI function decls.
 1.165  28-Jun-2005  yamt branches: 1.165.2;
- constify genfs_ops.
- use member designators.
 1.164  29-May-2005  christos - sprinkle const
- avoid shadow variables.
 1.163  29-Mar-2005  thorpej - Define a VFS_ATTACH() macro that places a reference to a vfsops structure
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
 1.162  04-Mar-2005  christos branches: 1.162.2;
PR/26823: Michael L. Hitch: Endianness flags were not preserved in the compat
superblock read routine.
 1.161  26-Feb-2005  perry nuke trailing whitespace
 1.160  11-Jan-2005  mycroft branches: 1.160.2; 1.160.4;
Rearrange some code slightly to avoid uninitialized variable warnings.
 1.159  09-Jan-2005  mycroft Rework the mountroot interface so that vfs_mountroot() opens the root device
and just passes it on to the file system functions. This avoids opening and
closing the device several times.

Mentioned on tech-kern some time ago, IIRC. I've been running this for a
long time.
 1.158  02-Jan-2005  thorpej Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
 1.157  26-Dec-2004  dbj remove opt_compat_netbsd.h, afaict it is no longer needed.
i think it was previously used to pull in COMPAT_09 for ffs_statfs
 1.156  21-Nov-2004  jdolecek allow changes of the sysctl values
 1.155  21-Sep-2004  thorpej Add a new VNODE_LOCKDEBUG option, which enables checks in the VOP_*()
calls to ensure that the vnode lock state is as expected when the VOP
call is made. Modify vnode_if.src to set the expected state according
to the documenting lock table for each VOP. Modify vnode_if.sh to emit
the checks.

Notes:
- The checks are only performed if the vnode has the VLOCKSWORK bit
set. Some file systems (e.g. specfs) don't even bother with vnode
locks, so of course the checks will fail.
- We can't actually run with VNODE_LOCKDEBUG because there are so many
vnode locking problems, not the least of which is the "use SHARED for
VOP_READ()" issue, which screws things up for the entire call chain.

Inspired by similar changes in OpenBSD, but implemented differently.
 1.154  19-Sep-2004  yamt um_maxfilesize should be set after
ffs_oldfscompat_read adjusted fs_maxfilesize.
 1.153  15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.152  14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.151  05-Jul-2004  pk Call inittodr() from main(). Let file system code set the recorded `last
update' time (if any) through the new function setrootfstime().
 1.150  27-May-2004  hannken Fixup last commit. fs->fs_active must be initialized.
 1.149  25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.148  25-May-2004  atatat Sysctl descriptions under vfs subtree
 1.147  20-May-2004  atatat Explicitly call pool_init() (and pool_destroy()) when being built as
an _LKM.

This adds pools to the list of things that lkms must do manually
because they're set up with link sets. Not that there's anything
wrong with link sets, but that we need to try harder to remember that
lkms are second class citizens. Of a sort.
 1.146  26-Apr-2004  simonb Unwrap a not-too-long line.
 1.145  25-Apr-2004  dbj remove botched superblock upgrade warnings.
there are now alternate non-kernel checks and fixes for this problem.
relevant PRs include:
bin/17910 kern/21283 kern/21404 port-macppc/23925 port-macppc/23926
install/25138
 1.144  25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.143  21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.142  18-Apr-2004  dbj remove code that attempts to correct superblock location. this
enforces an unnecessary restriction that the superblock be in the
particular expected locations. Also, the compatibility case is
handled in ffs_oldfscompat_read.
 1.141  18-Apr-2004  dbj when enabling ffs compatibility in ffs_reload, use
sblockloc that superblock was read from
also note XXX that ffs_reload doesn't handle superblock moving
 1.140  27-Mar-2004  dsl branches: 1.140.2;
Rework previous so that FS_FLAGS_UPDATED is only looked at for ffsv1
 1.139  24-Mar-2004  atatat Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.138  21-Mar-2004  dsl Rework superblock validation logic to make adding validity tests easier.
Ensure that we don't use the first alternate superblock of a ffsv1
filesystem with 64k blocks (it is in the same place as an ffsv2 sb).
Fixes part of PR kern/24809
 1.137  11-Mar-2004  dbj quiet tls. change botched superblock warning to use -b 16
 1.136  10-Mar-2004  keihan s/netbsd.org/NetBSD.org/g
 1.135  22-Feb-2004  jdolecek make sblock_try[] const
 1.134  12-Jan-2004  dbj change the updating note to say you may need fsck_ffs -b 32 -c 4
 1.133  12-Jan-2004  dbj add checks for a couple of botched superblock upgrade cases
and report a warning with repair references.
 1.132  10-Jan-2004  hannken Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue.

Cleanup ffs_sync() which did not synchronously wait when MNT_WAIT
was specified. Clear the work queue when MNT_WAIT is specified.

Result is a clean on-disk file system after ffs_sync(.., MNT_WAIT, ..)

From FreeBSD.
 1.131  09-Jan-2004  dbj never upgrade the superblock or set FS_FLAGS_UPDATED in fs_old_flags
add compatibility for filesystems created before FFSv2 integration
these patches are from pr port-macppc/23926 and should also fix
problems discussed in pr kern/21404 and pr kern/21283
 1.130  04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.129  01-Dec-2003  dbj in ffs_unmount, ignore error returned by VOP_CLOSE(devvp)
this fixes a problem where device close error would cause
unmount to fail but structures to be left partially deallocated
 1.128  08-Nov-2003  dbj fix minor memory leaks in error paths of ffs_mountfs
 1.127  05-Nov-2003  hannken Clean up the usage of vn_start_write(). At least one occurrence clobbered
previous error conditions.
If "(flags & (V_WAIT|V_PCATCH)) == V_WAIT" the return value is always zero.
Ignore the return value in these cases.

From Darrin B. Jewell.
 1.126  30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.125  15-Oct-2003  hannken Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.124  14-Oct-2003  dbj add mnt_iflag field to struct mount for internal flags
mv MNT_GONE, MNT_UNMOUNT and MNT_WANTRDWR to this field
additionally add mnt_writeopcountupper and mnt_writeopcountlower fields
in preparation for pending write suspension support work
bump kernel version to 1.6ZD
 1.123  25-Sep-2003  enami In ffs_sbupdate(), swap the sblock after ffs_oldfscompat_write() is
applied rather than the original.
 1.122  17-Sep-2003  enami Fix a recently introduced bug which prevents csum totals being copied
when an old ffs filesystem is first mounted (as a result, df reports disk
full on old ffs filesystem or mfs created by old binary). Problem first
noticed by onoe san.
 1.121  13-Sep-2003  bouyer make sure to not get flags which are for internal use only from the on-disk
superblock.
Proposed in http://mail-index.netbsd.org/tech-kern/2003/09/06/0005.html
 1.120  13-Sep-2003  bouyer Commit changes proposed in
http://mail-index.netbsd.org/tech-kern/2003/09/06/0001.html
http://mail-index.netbsd.org/tech-kern/2003/09/06/0006.html
to avoid compat problems with old ffsv1 by reuse of the old FS_SWAPPED
value for FS_FLAGS_UPDATED, and use of new, larger fields:
- Don't use FS_FLAGS_UPDATED to see if we need to update new fields from
old fields in ffsv1 case.
- when writing back the superblock, copy back the flags to the old location
if only old flags are set (FS_FLAGS_UPDATED won't be set in this case)
in ffsv1 case.
 1.119  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.118  29-Jun-2003  fvdl branches: 1.118.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.117  29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.116  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.115  12-Jun-2003  fvdl OS X still seems to use the old nrpos field in the superblock, and gets
unhappy after NetBSD wrote an Apple UFS filesystem. Just set it to 0
in this case.
 1.114  03-May-2003  christos make sure we update fs_fsmnt.
 1.113  16-Apr-2003  christos PR/1796: John Kohl: statfs misbehaves under chrooted environments.

- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
 1.112  12-Apr-2003  fvdl Don't cache buffers used when finding the superblock, it can lead to
seeing bogus data for the first cg with certain block/frag sizes.
From enami tsugutomo.
 1.111  05-Apr-2003  fvdl * Use the old and new time fields in the superblock as well as a few others
to determine if this filesystem was mounted by an older kernel after
having been mounted by a newer one, to avoid some summary mismatches.
* Reinstate support for 4.2 cylinder groups (read-only, as it was before).
 1.110  02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.109  31-Mar-2003  fvdl The modified flag must be cleared before the last sbupdate call in
unmount, because ffs_flushfiles or softdep_flushfiles may have
modified the filesystem (despite VFS_SYNC having been called first).
 1.108  21-Mar-2003  dsl Use 'void *' instead of 'caddr_t' in prototypes of VOP_IOCTL, VOP_FCNTL
and VOP_ADVLOCK, delete casts from callers (and some to copyin/out).
 1.107  17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.106  24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.105  01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.104  24-Nov-2002  scw Quell an uninitialised variable warning.
 1.103  28-Sep-2002  dbj Add support for the Apple UFS variation on ffs
This is the bulk of PR #17345

The general approach is to use a run time determinable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
 1.102  21-Sep-2002  christos MNT_GETARGS support
 1.101  06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as constant structures in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in the port dependent majors.<arch>
by using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- In compile time, device major numbers list is packed into the kernel and
the LKM framework will refer it to assign device major number dynamically.
 1.100  30-Jul-2002  soren Die, qaddr_t, die! - mnt_data in struct mount is already effectively
a void *, so stop pretending otherwise.
 1.99  09-Jun-2002  chs allow read-only mounts even if we can't read the last fragment of the fs.
this enables one to recover data from a failing disk (where the read failure
is a hardware problem) while avoiding corrupting the fs further (in the case
where the read failure is due to a misconfiguration).
 1.98  10-Apr-2002  mycroft branches: 1.98.2; 1.98.4;
Use blkstofrags() and fragstoblks(). Use &(NBBY-1) rather than %NBBY.
Switch off of fs_fragshift rather than fs_frag (generates better jump tables).
 1.97  01-Apr-2002  enami Hold an extra reference if updating and args.fspec == NULL.
 1.96  01-Apr-2002  christos Fixes from enami:

- If VOP_ACCESS fails when updating mount, we will vrele() twice.

- The check for update-only flags in mp->mnt_flag when not updating
case is bogus. If we really want to check, we need to see flags in
ufs_args, but I'm not sure if it is really necessary.

- The credential passed to ffs_reload was credential of when looking
up mount point, but now it is credential of when looking up device
node. Anyway, it may be current process's credential.
 1.95  31-Mar-2002  christos PR/16136: Chris Jepeway: Bogus entry in /etc/fstab can panic kernel.
 1.94  17-Mar-2002  chs when mounting a filesystem, read the last block in the filesystem
to verify that the device is at least as big as the superblock claims
the filesystem is supposed to be, and if it's not then fail the mount.
this should help reduce the type of confusion reported in PR 13228.
 1.93  08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.92  28-Feb-2002  pooka Don't add fs->fs_pendingblocks to f_bavail twice. It's already included
in f_bfree, which is added to f_bavail.

Fixes problem with statfs reporting too much free space for filesystems
which have files pending to be freed by softdeps.
 1.91  30-Dec-2001  fvdl XXXX temporary measure: in the case of a softdep 'unmount pending error',
do not mark the filesystem clean, as this will mean that one or more
files were likely not completely removed (will show up as unconnected
in fsck). Prevents filesystems from being marked clean while they're
not until this problem has been figured out.
 1.90  19-Dec-2001  fvdl ffs_reload may be called after an old fsck has run, and the pending*
fields may not be zero. Just reset them silently, it's not an error.
 1.89  18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.88  30-Oct-2001  lukem add __KERNEL_RCSID()
 1.87  15-Sep-2001  chs branches: 1.87.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.86  15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.85  06-Sep-2001  lukem branches: 1.85.2;
Incorporate the enhanced ffs_dirpref() by Grigoriy Orlov, as found in
FreeBSD (three commits; the initial work, man page updates, and a fix
to ffs_reload()), with the following differences:
- Be consistent between newfs(8) and tunefs(8) as to the options which
set and control the tuning parameters for this work (avgfilesize & avgfpdir)
- Use u_int16_t instead of u_int8_t to keep track of the number of
contiguous directories (suggested by Chuck Silvers)
- Work within our FFS_EI framework
- Ensure that fs->fs_maxclusters and fs->fs_contigdirs don't point to
the same area of memory

The new algorithm has a marked performance increase, especially when
performing tasks such as untarring pkgsrc.tar.gz, etc.

The original FreeBSD commit messages are attached:

=====
mckusick 2001/04/10 01:39:00 PDT
Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>.
His description of the problem and solution follow. My own tests show
speedups on typical filesystem intensive workloads of 5% to 12% which
is very impressive considering the small amount of code change involved.

------

One day I noticed that some file operations run much faster on
small file systems than on big ones. I've looked at the ffs
algorithms, thought about them, and redesigned the dirpref algorithm.

First I want to describe the results of my tests. These results are old
and I have improved the algorithm after these tests were done. Nevertheless
they show how big the performance speedup may be. I have done two file/directory
intensive tests on two OpenBSD systems with the old and new dirpref algorithms.
The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports".
The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release.
It contains 6596 directories and 13868 files. The test systems are:

1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for
test is at wd1. Size of test file system is 8 Gb, number of cg=991,
size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current
from Dec 2000 with BUFCACHEPERCENT=35

2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system
at wd0, file system for test is at wd1. Size of test file system is 40 Gb,
number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k
OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50

You can get more info about the test systems and methods at:
http://www.ptci.ru/gluk/dirpref/old/dirpref.html

Test Results

                 tar -xzf ports.tar.gz                    rm -rf ports
mode      old dirpref  new dirpref  speedup    old dirpref  new dirpref  speedup
First system
normal    667          472          1.41       477          331          1.44
async     285          144          1.98       130          14           9.29
sync      768          616          1.25       477          334          1.43
softdep   413          252          1.64       241          38           6.34
Second system
normal    329          81           4.06       263.5        93.5         2.81
async     302          25.7         11.75      112          2.26         49.56
sync      281          57.0         4.93       263          90.5         2.9
softdep   341          40.6         8.4        284          4.76         59.66

"old dirpref" and "new dirpref" columns give a test time in seconds.
speedup - the speed increase as a ratio, i.e. old dirpref / new dirpref.

------

Algorithm description

The old dirpref algorithm is described in comments:

/*
* Find a cylinder to place a directory.
*
* The policy implemented by this algorithm is to select from
* among those cylinder groups with above the average number of
* free inodes, the one with the smallest number of directories.
*/

A new directory is allocated in a different cylinder group than its
parent directory, resulting in a directory tree that is spread across
all the cylinder groups. This spreading out results in non-optimal
access to the directories and files. When we have a small filesystem
it is not a problem but when the filesystem is big then performance
degradation becomes very apparent.

What do I mean by a big file system?

1. A big filesystem is a filesystem which occupies 20-30 or more percent
of total drive space, i.e. first and last cylinder are physically
located relatively far from each other.
2. It has a relatively large number of cylinder groups, for example
more cylinder groups than 50% of the buffers in the buffer cache.

The first results in long access times, while the second results in
many buffers being used by metadata operations. Such operations use
cylinder group blocks and on-disk inode blocks. The cylinder group
block (fs->fs_cblkno) contains struct cg, inode and block bit maps.
It is 2k in size for the default filesystem parameters. If new and
parent directories are located in different cylinder groups then the
system performs more input/output operations and uses more buffers.
On filesystems with many cylinder groups, lots of cache buffers are
used for metadata operations.

My solution for this problem is very simple. I allocate many directories
in one cylinder group. I also do some things, so that the new allocation
method does not cause excessive fragmentation and all directory inodes
will not be located at a location far from its file's inodes and data.
The algorithm is:
/*
* Find a cylinder group to place a directory.
*
* The policy implemented by this algorithm is to allocate a
* directory inode in the same cylinder group as its parent
* directory, but also to reserve space for its files inodes
* and data. Restrict the number of directories which may be
* allocated one after another in the same cylinder group
* without intervening allocation of files.
*
* If we allocate a first level directory then force allocation
* in another cylinder group.
*/

My early versions of dirpref gave me good results for a wide range of
file operations and different filesystem capacities except one case:
those applications that create their entire directory structure first
and only later fill this structure with files.

My solution for such and similar cases is to limit the number of
directories which may be created one after another in the same cylinder
group without intervening file creations. For this purpose, I allocate
an array of counters at mount time. This array is linked to the superblock
fs->fs_contigdirs[cg]. Each time a directory is created the counter
increases and each time a file is created the counter decreases. A 60Gb
filesystem with 8mb/cg requires 10kb of memory for the counters array.
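
The counter scheme described in the paragraph above can be sketched
roughly as follows. This is an illustration only, not the kernel code:
the struct and the names contig_init(), note_dir_created(),
note_file_created() and may_place_dir() are hypothetical, and both the
one-byte counter width and the saturation at 0 and 255 are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-mount state: one counter per cylinder group. */
struct contig_counters {
	uint8_t *count;		/* directories created without intervening files */
	uint32_t ncg;		/* number of cylinder groups */
	uint8_t maxcontigdirs;	/* limit on consecutive directory creations */
};

/* Allocate zeroed counters at "mount" time; returns 0 on success. */
int
contig_init(struct contig_counters *cc, uint32_t ncg, uint8_t maxcontigdirs)
{
	cc->count = calloc(ncg, sizeof(cc->count[0]));
	if (cc->count == NULL)
		return -1;
	cc->ncg = ncg;
	cc->maxcontigdirs = maxcontigdirs;
	return 0;
}

/* A directory was created in cylinder group cg: bump its counter. */
void
note_dir_created(struct contig_counters *cc, uint32_t cg)
{
	assert(cg < cc->ncg);
	if (cc->count[cg] < UINT8_MAX)	/* assumed: saturate, never wrap */
		cc->count[cg]++;
}

/* A file was created in cylinder group cg: lower its counter. */
void
note_file_created(struct contig_counters *cc, uint32_t cg)
{
	assert(cg < cc->ncg);
	if (cc->count[cg] > 0)
		cc->count[cg]--;
}

/* Nonzero if another directory may still be placed in cg. */
int
may_place_dir(const struct contig_counters *cc, uint32_t cg)
{
	assert(cg < cc->ncg);
	return cc->count[cg] < cc->maxcontigdirs;
}
```

With a limit of 2, two back-to-back directory creations in one cylinder
group make may_place_dir() return 0 for that group until a file is
created there.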

The maxcontigdirs is a maximum number of directories which may be created
without an intervening file creation. I found in my tests that the best
performance occurs when I restrict the number of directories in one cylinder
group such that all its files may be located in the same cylinder group.
There may be some deterioration in performance if all the file inodes
are in the same cylinder group as its containing directory, but their
data partially resides in a different cylinder group. The maxcontigdirs
value is calculated to try to prevent this condition. Since there is
no way to know how many files and directories will be allocated later
I added two optimization parameters in superblock/tunefs. They are:

int32_t fs_avgfilesize; /* expected average file size */
int32_t fs_avgfpdir; /* expected # of files per directory */

These parameters have reasonable defaults but may be tweaked for special
uses of a filesystem. They are only necessary in rare cases like better
tuning a filesystem being used to store a squid cache.

I have been using this algorithm for about 3 months. I have done
a lot of testing on filesystems with different capacities, average
filesize, average number of files per directory, and so on. I think
this algorithm has no negative impact on filesystem performance. It
works better than the default one in all cases. The new dirpref
will greatly improve untarring/removing/copying of big directories,
decrease load on cvs servers and much more. The new dirpref doesn't
speed up a compilation process, but also doesn't slow it down.

Obtained from: Grigoriy Orlov <gluk@ptci.ru>
=====

=====
iedowse 2001/04/23 17:37:17 PDT
Pre-dirpref versions of fsck may zero out the new superblock fields
fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause
panics if these fields were zeroed while a filesystem was mounted
read-only, and then remounted read-write.

Add code to ffs_reload() which copies the fs_contigdirs pointer
from the previous superblock, and reinitialises fs_avgf* if necessary.

Reviewed by: mckusick
=====

=====
nik 2001/04/10 03:36:44 PDT
Add information about the new options to newfs and tunefs which set the
expected average file size and number of files per directory. Could do
with some fleshing out.
=====
 1.84  02-Sep-2001  lukem Incorporate fix by iedowse @ FreeBSD to allow disks with large numbers of
cylinder groups to work correctly, with minor modifications by me to work
with our FFS_EI code. From the FreeBSD commit message:

The ffs superblock includes a 128-byte region for use by temporary
in-core pointers to summary information. An array in this region
(fs_csp) could overflow on filesystems with a very large number of
cylinder groups (~16000 on i386 with 8k blocks). When this happens,
other fields in the superblock get corrupted, and fsck refuses to
check the filesystem.

Solve this problem by replacing the fs_csp array in 'struct fs'
with a single pointer, and add padding to keep the length of the
128-byte region fixed. Update the kernel and userland utilities
to use just this single pointer.

With this change, the kernel no longer makes use of the superblock
fields 'fs_csshift' and 'fs_csmask'. Add a comment to newfs/mkfs.c
to indicate that these fields must be calculated for compatibility
with older kernels.

Reviewed by: mckusick
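
The "~16000 on i386 with 8k blocks" figure quoted above can be
reproduced with some back-of-envelope arithmetic. The sketch below
rests on assumptions that the commit message does not state: that each
slot of the old pointer array referred to one file system block of
summary records, that one record is 16 bytes (four 32-bit counters),
and that 31 four-byte pointer slots were usable inside the 128-byte
region. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on the number of cylinder groups reachable through a
 * fixed array of pointers to summary blocks.
 *   nptrs     - usable pointer slots in the array (assumed value)
 *   bsize     - file system block size in bytes
 *   csum_size - size of one per-cylinder-group summary record in bytes
 */
uint32_t
max_cgs_via_ptr_array(uint32_t nptrs, uint32_t bsize, uint32_t csum_size)
{
	assert(csum_size != 0);
	/* Each slot covers one block's worth of summary records. */
	return nptrs * (bsize / csum_size);
}
```

Under those assumptions, max_cgs_via_ptr_array(31, 8192, 16) gives
31 * 512 = 15872, which is in line with the quoted limit.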
 1.83  17-Aug-2001  lukem remove third argument (`int ns') from ffs_sb_swap(), and let ffs_sb_swap()
determine the endianness of the `struct fs *o' superblock from o->fs_magic
and set needswap as necessary, rather than trusting the caller to get
it right. invariably, almost every caller of ffs_sb_swap() was calling it
with ns set to the wrong value for ns anyway!
ansi KNF ffs_bswap.c declarations whilst here.

this fixes all sorts of problems when trying to use other-endian file systems,
notably the kernel trying to access memory *way* off, possibly corrupting or
panicking, and userland programs SEGVing and/or corrupting things (e.g,
"fsck_ffs -B" to swap a file system endianness).

whilst the previous rev of ffs_bswap.c (1.10, 2000/12/23) made this problem
worse, i suspect that the problem was always there and previous versions
just happened not to trash things at the wrong time.

FFS_EI should now be a lot more stable.
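
The idea of deriving the byte order from the magic number, instead of
trusting a caller-supplied flag, can be sketched as below. This is a
minimal illustration and not the ffs_sb_swap() code: the function names
and the three-way return value are hypothetical, and only the FFS magic
value 0x011954 is taken as given.

```c
#include <stdint.h>

#define SKETCH_FS_MAGIC	0x011954u	/* FFS superblock magic number */

/* Reverse the byte order of a 32-bit value. */
uint32_t
byteswap32_sketch(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	    ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Classify a magic number as loaded from disk in host byte order:
 *   0  - superblock is in native byte order
 *   1  - superblock is other-endian and needs swapping
 *  -1  - not a recognised superblock
 */
int
sb_needswap_sketch(uint32_t magic_on_disk)
{
	if (magic_on_disk == SKETCH_FS_MAGIC)
		return 0;
	if (byteswap32_sketch(magic_on_disk) == SKETCH_FS_MAGIC)
		return 1;
	return -1;
}
```

Because the decision depends only on the superblock contents, a caller
cannot pass the wrong swap flag, which is the failure mode the commit
message describes.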
 1.82  26-Jul-2001  lukem if printing the value of fs_clean, say 'fs_clean' instead of 'fs_flags' ...
 1.81  30-May-2001  mrg branches: 1.81.4;
use _KERNEL_OPT
 1.80  07-Feb-2001  chs branches: 1.80.2;
remove debug code that was left in by accident.
 1.79  22-Jan-2001  jdolecek make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.78  10-Jan-2001  mycroft On a RW->RO transition, explicitly clear fs_fmod after the cgupdate/sbupdate,
to prevent spurious writebacks and whinging about the (correct!) clean flag.
(Why this isn't done in ffs_sbupdate(), I dunno...)
 1.77  10-Jan-2001  chs attach the softdep pagecache pseudo-buffers to the inode
so we can find them quickly in the softdep truncate path.
 1.76  09-Jan-2001  mycroft ffs_reload(): Copy fs_ronly into the new superblock, too, as it may have been
modified on disk (e.g. by fsck(8)). This flag should really be elsewhere.
 1.75  04-Dec-2000  chs in ffs_sync(), don't skip vnodes which have (potentially dirty) pages.
 1.74  03-Dec-2000  fvdl In addition to setting the softdep flag in the superblock when
mounting with softdeps, also explicitly clear it when we don't,
so that a leftover setting after a crash will be cleared.
 1.73  27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.72  13-Oct-2000  simonb There is no need to explicitly include <uvm/uvm_extern.h> for
<sys/sysctl.h> anymore.
 1.71  19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.

Implement range fsync for FFS. Note: not yet implemented for the
SOFTDEP case.
 1.70  28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.69  27-Jun-2000  fvdl Due to popular demand, change vinsheadfree to ungetnewvnode to make
the name clearer. No functional change.
 1.68  27-Jun-2000  fvdl In ffs_vget, do not hold ufs_haslock across the call to getnewvnode.
We may sleep in it, or even recurse, with softdeps. Instead, grab
the lock later, but check if no one else has beaten us to the VFS_VGET
operation, and if so, roll back getnewvnode using vinsheadfree, and
just return.
 1.67  16-Jun-2000  perseant branches: 1.67.2;
make it compile (fix typo)
 1.66  16-Jun-2000  matt ignore the softdep flags when mounting and there's no softdep in the kernel.
 1.65  15-Jun-2000  fvdl Allow MNT_SOFTDEP to be passed in via the mount(2) system call, do not
require it to be set via tunefs(8). Silently ignore it when doing
an update mount of a writeable filesystem, the FFS/softdep code isn't ready
for this yet.
 1.64  29-May-2000  mycroft Use LIST_{FIRST,NEXT,EMPTY}().
 1.63  29-May-2000  mycroft Add a new inode flag called IN_ACCESSED. This is used in place of IN_MODIFIED
to record that the atime was updated. In ffs_update(), we only do synchronous
writes if something *other* than the atime was changed.
 1.62  04-Apr-2000  jdolecek branches: 1.62.2;
Add a new sysctl variable vfs.ffs.log_changeopt - if this is true,
an optimization strategy change is logged into syslog. Default
is 0 (to not log). This replaces the recent not quite "right"
change to only log the change if kernel is compiled with DEBUG.
 1.61  30-Mar-2000  augustss Remove register declarations.
 1.60  30-Mar-2000  simonb Delete redundant decls of rootvp - it's in <sys/systm.h>.
Delete redundant decl of ffs_sbupdate() - it's in <ufs/ffs/ffs_extern.h>.
 1.59  16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading.

For each leaf filesystem, add appropriate vfs_done routine.

Also remember how many times ffs_init() was called and do
the appropriate initialization on first call only. In ffs_done(),
destroy the resources when called by the last user of ffs code.
Change mfs to call ffs_init()/ffs_done() appropriately.
 1.58  16-Mar-2000  fvdl Initialize the fs variable struct a little earlier to avoid referencing
a bad pointer in a printf. Problem reported by Krister Walfridsson.
 1.57  14-Feb-2000  fvdl Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
 1.56  10-Dec-1999  drochner Call ffs_oldfscompat() before all the consistency checks, to avoid the
use of uninitialized data in the checks if the filesystem is an old one.
 1.55  15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.54  20-Oct-1999  enami Check if the type of device node isn't VBAD before touching v_specinfo. If
the device vnode is revoked, the field is NULL and touching it causes null
pointer dereference.
 1.53  16-Oct-1999  wrstuden branches: 1.53.2; 1.53.4;
In spec_close(), if we're not doing a non-blocking close and VXLOCK is
not set, unlock the vnode before calling the device's close routine and
relock it after it returns. tty close routines will sleep waiting for
buffers to drain, which won't happen often times as the other side needs
to grab the vnode lock first.

Make all unmount routines lock the device vnode before calling VOP_CLOSE().
 1.52  03-Aug-1999  drochner branches: 1.52.2;
clean up inclusion of "opt_ffs.h" and use of "FFS_EI" a bit
 1.51  17-Jul-1999  wrstuden Adjust mountroot routines to vrele rootvp in case of mount error. Closes
PR 7977 by Neil Carson, <neil@brini.com>.
 1.50  08-Jul-1999  wrstuden Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
 1.49  05-Mar-1999  bouyer branches: 1.49.2; 1.49.4;
Don't check fs_bsize before the superblock has been swapped if needed.
Check value of sbsize before allocating memory with this value.
 1.48  26-Feb-1999  wrstuden Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.47  10-Feb-1999  bouyer Make sure a buffer obtained from bread() is always brelse()'d in case of
error. Closes PR kern/1448 from Wolfgang Solfrank.
 1.46  04-Dec-1998  bouyer Sanity check a few values in the superblock, to avoid mallocing huge
memory area if we try to mount a corrupted filesystem. Fixes kern/3933.
 1.45  12-Nov-1998  thorpej defopt FFS_EI
 1.44  23-Oct-1998  thorpej branches: 1.44.2;
Use DINODE_SIZE rather than pointer arithmetic.
 1.43  01-Sep-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for FFS inodes.

XXX MFS also comes in here for inodes, and used a different malloc type,
but the structure is the same, so we just use the FFS inode pool.
 1.42  09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.41  05-Jul-1998  jonathan * defopt COMPAT_{09,10,11,12,13} and COMPAT_NOMID.
TODO: revisit interaction between native compat and emul compat usage.
 1.40  24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.39  22-Jun-1998  sommerfe defopt for options FIFO
 1.38  13-Jun-1998  kleink KNF, mostly of FFS_EI changes.
 1.37  09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.36  08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.35  05-Jun-1998  kleink Convert fsync vnode operator implementations and usage from the old `waitfor'
argument and MNT_WAIT/MNT_NOWAIT to `flags' and FSYNC_WAIT.
 1.34  18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time; other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.33  01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.32  18-Feb-1998  thorpej Place a pointer to an array of our vnodeopv_desc *'s in our vfsops
structure, for use by vfs_attach().
 1.31  16-Oct-1997  mjacob In calculating the f_bavail field, don't take 32 bit quantities and
multiply them by 90 (to be divided by 100) and expect them to be sane
for very large values (I was getting a negative 'avail' count).
 1.30  22-Jul-1997  fvdl Fix messed up RCS Id.
 1.29  07-Jul-1997  fvdl Get locking around inode hashing right.
 1.28  07-Jul-1997  fvdl Oops, I messed up the lock. Reverting it until I have time to fix it,
to avoid people getting trouble after the supscan hits.
 1.27  06-Jul-1997  fvdl Put lock around inode hashing, because getnewvnode or MALLOC might block,
creating race conditions.
 1.26  12-Jun-1997  mrg remove swap configuration.
 1.25  11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.24  10-Mar-1997  mycroft Just increment the generation count. Using the time is bogus and defeats
fsirand(8).
 1.23  31-Jan-1997  thorpej branches: 1.23.4;
- Add ffs_mountroot to ffs_vfsops.
- Only attempt to mount a root FFS on a DV_DISK class device.
 1.22  22-Dec-1996  cgd branches: 1.22.2;
Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.21  12-Oct-1996  christos revert previous kprintf changes
 1.20  10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.19  09-Feb-1996  christos ffs prototypes
 1.18  19-Dec-1995  cgd Fix from Lite-2: when reloading the file system, save fs_maxcluster and
the old summary structure pointers, and recalculate cluster per cyl. grp.
information.
 1.17  11-Nov-1995  mycroft ffs -> ufs
 1.16  18-Jun-1995  cgd branches: 1.16.2;
don't assume the f_fsnamelen is nul-truncated or longer than MFSNAMELEN
 1.15  12-Apr-1995  mycroft Make use of the `fs_clean' field. If it was set when the file system was
mounted or upgraded to r-w, then clear it and set it again later when the
file system is unmounted or downgraded.
 1.14  09-Mar-1995  mycroft copy*str() should use size_t.
 1.13  08-Mar-1995  cgd size for copyinstr should be u_long
 1.12  18-Jan-1995  mycroft Clean up the code to frob mnt_stat a bit.
 1.11  18-Jan-1995  mycroft Turn mountlist into a CIRCLEQ, and handle setting and checking of MNT_ROOTFS
differently.
 1.10  15-Dec-1994  mycroft Call foo_statfs() from a common place when mounting.
 1.9  14-Dec-1994  mycroft Sync with CSRG.
 1.8  28-Oct-1994  mycroft This is not my day.
 1.7  28-Oct-1994  mycroft Fix typo.
 1.6  28-Oct-1994  mycroft For now, limit the maxfilesize to 2^31*bsize-1 in core. This is temporary.
 1.5  28-Oct-1994  mycroft Fix a couple of types in the compatibility code.
 1.4  29-Jun-1994  cgd branches: 1.4.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.3  28-Jun-1994  mycroft Reload mnt_maxsymlinklen, for `fsck -c2'.
 1.2  22-Jun-1994  mycroft Add a couple of missing casts.
 1.1  08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.4.2.1  23-Nov-1994  cgd from mycroft, for patch_05
 1.16.2.2  26-Dec-1995  mycroft Pull in ffs_reload() fix from trunk.
 1.16.2.1  01-Nov-1995  jtc complete ufs -> ffs change (From John Kohl; PR #1403)
 1.22.2.1  14-Jan-1997  thorpej Snapshot of work-in-progress, committed to private branch.

These changes implement machine-independent root device and file system
selection. Notable features:

- All ports behave in a consistent manner regarding root
device selection.
- No more "options GENERIC"; all kernels have the ability
to boot with RB_ASKNAME to select root device and file system
type.
- Root file system type can be wildcarded; a machine-independent
function will try all possible file systems for the selected
root device until one succeeds.
- If the root file system fails to mount, the operator will
be given the chance to select a new root device and file
system type, rather than having the machine simply panic.
- nfs_mountroot() no longer panics if any part of the NFS
mount process fails; it now returns an error, giving the
operator a chance to recover.
- New, more consistent, config(8) grammar. The constructs:

config netbsd swap generic
config netbsd root on nfs

have been replaced with:

config netbsd root on ? type ?
config netbsd root on ? type nfs

Additionally, the operator may select or wildcard root file
system type in the kernel configuration file:

config netbsd root on cd0a type cd9660

config(8) now requires that a "root" specification be
made. "root" may be wired down or wildcarded. "swap" and
"dump" specifications are optional, and follow previous
semantics.

- config(8) has a new "file-system" keyword, used to configure
file systems into the kernel. Eventually, this will be used
to generate the default vfssw[].

- "options NFSCLIENT" is obsolete, and is replaced by
"file-system NFS". "options NFSSERVER" still exists, since
NFS server support is independent of the NFS file system
client.

- sys/arch/<foo>/<foo>/swapgeneric.c is no longer used, and
will be removed; all information is now generated by config(8).

As of this commit, all ports except arm32 have been updated to use
the new setroot(). Only SPARC, i386, and Alpha ports have been
tested at this time. Port masters should test these changes on their
ports, and report any problems back to me.

More changes are on their way, including RB_ASKNAME support in
nfs_mountroot() (to prompt for server address and path) and, potentially,
the ability to select rarp/bootparam or bootp in nfs_mountroot().
 1.23.4.1  12-Mar-1997  is Merge in changes from Trunk
 1.44.2.1  30-May-1999  chs there's a new rule that all vnodes must call uvm_vnp_setsize()
before anyone can possibly access them, so do this in ffs_vget().
 1.49.4.3  02-Aug-1999  thorpej Update from trunk.
 1.49.4.2  04-Jul-1999  chs initialize new struct mount fields in ffs_mountfs().
 1.49.4.1  07-Jun-1999  chs merge everything from chs-ubc branch.
 1.49.2.2  20-Dec-1999  he Pull up revision 1.56 (via patch, requested by drochner):
Fix the use of an uninitialized variable. This could be triggered
if the file system to be mounted is a pre-BSD4.4 one (which can
result in the old file system being rejected).
 1.49.2.1  18-Oct-1999  cgd pull up rev 1.53 from trunk (requested by wrstuden):
In spec_close(), call the device's close routine with the vnode
unlocked if the call might block. Force a non-blocking close if
VXLOCK is set. This eliminates a potential deadlock situation, and
should eliminate the dirty buffers on reboot issue.
 1.52.2.2  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.52.2.1  21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.53.4.3  15-Nov-1999  fvdl Sync with -current
 1.53.4.2  03-Nov-1999  fvdl Give ufs_ihashget an extra argument: the flags passed to vget() for
locking. This way we can avoid locking against ourselves when
ufs_ihashget is called during the flushing of metadata. XXX

Also, comment out a VOP_FSYNC call that I think is now unneeded, and
put a diagnostic printf there to check if this still happens.
 1.53.4.1  19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.53.2.5  11-Feb-2001  bouyer Sync with HEAD.
 1.53.2.4  18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.53.2.3  08-Dec-2000  bouyer Sync with HEAD.
 1.53.2.2  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.53.2.1  20-Oct-1999  thorpej Sync /w trunk.
 1.62.2.1  22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.67.2.8  06-Oct-2003  itojun make sure to not get flags which are for internal use only from the on-disk
superblock.
Proposed in http://mail-index.netbsd.org/tech-kern/2003/09/06/0005.html
[ticket #80, bouyer]
 1.67.2.7  25-Nov-2001  he Pull up revision 1.85 (requested by lukem):
Pull in enhanced ffs_dirpref() algorithm, which provides a
substantial performance improvement through better locality
between parent/child directories and their files, and by easing
the pressure on the buffer cache for metadata operations.
 1.67.2.6  25-Nov-2001  he Pull up revision 1.84 (requested by lukem):
Change fs_csp[] from being a fixed size to being an array sized
as required. This allows file systems with more than about 15500
cylinder groups (on 32-bit systems) to be used.
 1.67.2.5  25-Nov-2001  he Pull up revision 1.83 (requested by lukem):
Call ffs_sb_swap() with the correct arguments. Fixes problems
with using other-endian file systems.
 1.67.2.4  25-Nov-2001  he Pull up revision 1.82 (requested by lukem):
Correctly refer to fs_clean in error message.
 1.67.2.3  25-Nov-2001  he Pull up revisions 1.76,1.78 (requested by lukem):
In ffs_reload(), copy fs_ronly to the new superblock too.
Clear fs_fmod on rw->ro transition.
 1.67.2.2  14-Dec-2000  he Pull up revision 1.71 (requested by fvdl):
Improve NFS performance, possibly with as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.67.2.1  03-Jul-2000  fvdl pullup from trunk:

Fix a "locking against myself" problem; holding ufs_hashlock
across getnewvnode() could cause a recursive lock if it resulted in
recycling a vnode that was using softdeps.
 1.80.2.15  11-Dec-2002  thorpej Sync with HEAD.
 1.80.2.14  18-Oct-2002  nathanw Catch up to -current.
 1.80.2.13  17-Sep-2002  nathanw Catch up to -current.
 1.80.2.12  01-Aug-2002  nathanw Catch up to -current.
 1.80.2.11  12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.80.2.10  24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.80.2.9  20-Jun-2002  nathanw Catch up to -current.
 1.80.2.8  17-Apr-2002  nathanw Catch up to -current.
 1.80.2.7  01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.80.2.6  08-Jan-2002  nathanw Catch up to -current.
 1.80.2.5  14-Nov-2001  nathanw Catch up to -current.
 1.80.2.4  21-Sep-2001  nathanw Catch up to -current.
 1.80.2.3  24-Aug-2001  nathanw Catch up with -current.
 1.80.2.2  21-Jun-2001  nathanw Catch up to -current.
 1.80.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.81.4.8  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.81.4.7  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.81.4.6  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.81.4.5  16-Mar-2002  jdolecek Catch up with -current.
 1.81.4.4  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.81.4.3  13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.81.4.2  25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.81.4.1  03-Aug-2001  lukem update to -current
 1.85.2.3  01-Oct-2001  fvdl Catch up with -current.
 1.85.2.2  26-Sep-2001  fvdl * add a VCLONED vnode flag that indicates a vnode representing a cloned
device.
* rename REVOKEALL to REVOKEALIAS, and add a REVOKECLONE flag, to pass
to VOP_REVOKE
* the revoke system call will revoke all aliases, as before, but not the
clones
* vdevgone is called when detaching a device, so make it use REVOKECLONE
to get rid of all clones as well
* clean up all uses of VOP_OPEN wrt. locking.
* add a few VOPS to spec_vnops that need to do something when it's a
clone vnode (access and getattr)
* add a copy of the vnode vattr structure of the original 'master' vnode
to the specinfo of a cloned vnode. could possibly redirect getattr to
the 'master' vnode, but this has issues with revoke
* add a vdev_reassignvp function that disassociates a vnode from its
original device, and reassociates it with the specified dev_t. to be
used by cloning devices only, in case a new minor is allocated.
* change all direct references in drivers to v_devcookie and v_rdev
to vdev_privdata(vp) and vdev_rdev(vp). for diagnostic purposes
when debugging race conditions that still exist wrt. locking and
revoking vnodes.
* make the locking state of a vnode consistent when passed to
d_open and d_close (unlocked). locked would be better, but has
some deadlock issues
 1.85.2.1  18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.87.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.98.4.2  24-Sep-2003  tron Pull up revision 1.121 via patch (requested by bouyer in ticket #1464):
make sure to not get flags which are for internal use only from the on-disk
superblock.
Proposed in http://mail-index.netbsd.org/tech-kern/2003/09/06/0005.html
 1.98.4.1  10-Jun-2002  tv Pull up revision 1.99 (requested by chs in ticket #227):
allow read-only mounts even if we can't read the last fragment of the fs.
this enables one to recover data from a failing disk (where the read failure
is a hardware problem) while avoiding corrupting the fs further (in the case
where the read failure is due to a misconfiguration).
 1.98.2.3  29-Aug-2002  gehenna catch up with -current.
 1.98.2.2  20-Jun-2002  gehenna catch up with -current.
 1.98.2.1  16-May-2002  gehenna Use devsw APIs for checking validity of major numbers.
 1.118.2.14  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.118.2.13  01-Apr-2005  skrll Sync with HEAD.
 1.118.2.12  08-Mar-2005  skrll Sync with HEAD.
 1.118.2.11  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.118.2.10  17-Jan-2005  skrll Sync with HEAD.
 1.118.2.9  29-Nov-2004  skrll Sync with HEAD.
 1.118.2.8  27-Oct-2004  skrll Remove the struct lwp * arguments from qsync and ufs_checkpath that are
no longer (read: were never) required.
 1.118.2.7  24-Sep-2004  skrll Sync with HEAD.
 1.118.2.6  21-Sep-2004  skrll Fix the sync with head I botched.
 1.118.2.5  18-Sep-2004  skrll Sync with HEAD.
 1.118.2.4  25-Aug-2004  skrll Sync with HEAD.
 1.118.2.3  24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.118.2.2  03-Aug-2004  skrll Sync with HEAD
 1.118.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.140.2.3  29-May-2004  tron Pull up revision 1.148 (requested by atatat in ticket #393):
Sysctl descriptions under vfs subtree
 1.140.2.2  28-Apr-2004  jmc Pullup rev 1.145 (requested by dbj in ticket #197)

Remove botched superblock upgrade warnings.
There are now alternate non-kernel checks and fixes for this problem.
PR#17910 PR#21283 PR#21404 PR#23925 PR#23926
PR#25138
 1.140.2.1  27-Apr-2004  jdc Pull up revisions 1.141-1.142 (requested by dbj in ticket #185)

Fix problems related to superblock upgrade issues which may be
experienced by -current users from 2003.
 1.160.4.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.160.2.1  29-Apr-2005  kent sync with -current
 1.162.2.3  30-May-2007  bouyer Pull up following revision(s) (requested by tsutsui in ticket #1798):
sys/ufs/ffs/ffs_vfsops.c: revision 1.201
Fix inconsistent changes in rev 1.153 and 1.154:
Adjust fs->fs_maxfilesize instead of ump->um_maxfilesize
in ffs_oldfscompat_read() because the latter is overridden
by the former after ffs_oldfscompat_read() returned.
Fixes EFBIG errors on read(2) and "exec /sbin/init: error 8"
problem on mac68k after mountroot() on old 4.3BSD UFS created
by the Mkfs tool for MacOS (reported and confirmed on port-mac68k).
 1.162.2.2  10-Mar-2006  tron Pull up following revision(s) (requested by drochner in ticket #1189):
sys/ufs/ffs/ffs_vfsops.c: revision 1.168
fix crash in mount error handling: don't free storage which was not
malloc'd
 1.162.2.1  24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in the different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed out by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed out by Chuck Silvers.
 1.165.2.8  04-Feb-2008  yamt sync with head.
 1.165.2.7  21-Jan-2008  yamt sync with head
 1.165.2.6  07-Dec-2007  yamt sync with head
 1.165.2.5  27-Oct-2007  yamt sync with head.
 1.165.2.4  03-Sep-2007  yamt sync with head.
 1.165.2.3  26-Feb-2007  yamt sync with head.
 1.165.2.2  30-Dec-2006  yamt sync with head.
 1.165.2.1  21-Jun-2006  yamt sync with head.
 1.175.2.2  29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.175.2.1  20-Oct-2005  yamt adapt ufs.
 1.178.2.2  01-Mar-2006  yamt sync with head.
 1.178.2.1  15-Jan-2006  yamt sync with head.
 1.179.4.3  01-Jun-2006  kardel Sync with head.
 1.179.4.2  22-Apr-2006  simonb Sync with head.
 1.179.4.1  04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.179.2.1  09-Sep-2006  rpaulo sync with head
 1.180.6.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.180.4.2  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.180.4.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.180.2.4  03-Sep-2006  yamt sync with head.
 1.180.2.3  11-Aug-2006  yamt sync with head
 1.180.2.2  26-Jun-2006  yamt sync with head.
 1.180.2.1  24-May-2006  yamt sync with head.
 1.181.2.1  19-Jun-2006  chap Sync with head.
 1.182.2.1  13-Jul-2006  gdamore Merge from HEAD.
 1.185.4.2  10-Dec-2006  yamt sync with head.
 1.185.4.1  22-Oct-2006  yamt sync with head
 1.185.2.3  01-Feb-2007  ad Sync with head.
 1.185.2.2  12-Jan-2007  ad Sync with head.
 1.185.2.1  18-Nov-2006  ad Sync with head.
 1.190.4.1  03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.190.2.1  04-Jun-2007  riz Pull up following revision(s) (requested by tsutsui in ticket #686):
sys/ufs/ffs/ffs_vfsops.c: revision 1.201
Fix inconsistent changes in rev 1.153 and 1.154:
Adjust fs->fs_maxfilesize instead of ump->um_maxfilesize
in ffs_oldfscompat_read() because the latter is overridden
by the former after ffs_oldfscompat_read() returned.
Fixes EFBIG errors on read(2) and "exec /sbin/init: error 8"
problem on mac68k after mountroot() on old 4.3BSD UFS created
by the Mkfs tool for MacOS (reported and confirmed on port-mac68k).
 1.196.6.22  11-Nov-2007  hannken Add fstrans_mount() to explicitly allocate fstrans_info.
Replace remaining malloc() to kmem_alloc() in vfs_trans.c.

Ok: Andrew Doran <ad@netbsd.org>
 1.196.6.21  25-Oct-2007  ad Fix up mnt_vnodelist handling.
 1.196.6.20  23-Oct-2007  ad Sync with head.
 1.196.6.19  08-Oct-2007  ad Call fstrans_unmount().
 1.196.6.18  16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress test.
 1.196.6.17  30-Aug-2007  ad - Mark ffs MPSAFE. There are still a few minor problems and I'm not yet
sure about the snapshot code, but by and large it's there.
- Grab ump->um_lock in a few more places.
 1.196.6.16  28-Aug-2007  ad Revert accidental change (mp->mnt_iflag |= IMNT_MPSAFE).
 1.196.6.15  24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.196.6.14  20-Aug-2007  ad Sync with HEAD.
 1.196.6.13  20-Aug-2007  ad softdep locking improvements. It hangs looping in flush_inodedep_deps(),
more work required.
 1.196.6.12  29-Jul-2007  ad Add vfs_destroy() to free mount structures. The specificdata_ref was being
leaked.
 1.196.6.11  15-Jul-2007  ad Sync with head.
 1.196.6.10  17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.196.6.9  09-Jun-2007  ad Sync with head.
 1.196.6.8  08-Jun-2007  ad Sync with head.
 1.196.6.7  27-May-2007  ad ffs_sync: vp->v_data can be NULL if the vnode is being recycled.
 1.196.6.6  13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.196.6.5  14-Apr-2007  ad ffs_sync: don't try to examine the inode without locking if the vnode is
being freed.
 1.196.6.4  13-Apr-2007  ad Put a per-mount lock around ffs shared data structures, excluding softdep
and quotas. Strategy lifted from FreeBSD.
 1.196.6.3  10-Apr-2007  ad Sync with head.
 1.196.6.2  13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.196.6.1  13-Mar-2007  ad Sync with head.
 1.196.2.3  17-May-2007  yamt sync with head.
 1.196.2.2  15-Apr-2007  yamt sync with head.
 1.196.2.1  24-Mar-2007  yamt sync with head.
 1.197.2.1  11-Jul-2007  mjf Sync with head.
 1.205.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.207.4.2  31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.207.4.1  31-Jul-2007  pooka file ffs_vfsops.c was added on branch matt-mips64 on 2007-07-31 21:14:21 +0000
 1.207.2.4  09-Dec-2007  jmcneill Sync with HEAD.
 1.207.2.3  27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.207.2.2  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.207.2.1  16-Aug-2007  jmcneill Sync with HEAD.
 1.208.4.1  14-Oct-2007  yamt sync with head.
 1.208.2.3  23-Mar-2008  matt sync with HEAD
 1.208.2.2  09-Jan-2008  matt sync with HEAD
 1.208.2.1  06-Nov-2007  matt sync with HEAD
 1.210.4.3  18-Feb-2008  mjf Sync with HEAD.
 1.210.4.2  27-Dec-2007  mjf Sync with HEAD.
 1.210.4.1  08-Dec-2007  mjf Sync with HEAD.
 1.211.2.2  26-Dec-2007  ad Sync with head.
 1.211.2.1  04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.212.4.3  10-Jan-2008  bouyer Sync with HEAD
 1.212.4.2  08-Jan-2008  bouyer Sync with HEAD
 1.212.4.1  02-Jan-2008  bouyer Sync with HEAD
 1.222.6.5  17-Jan-2009  mjf Sync with HEAD.
 1.222.6.4  28-Sep-2008  mjf Sync with HEAD.
 1.222.6.3  29-Jun-2008  mjf Sync with HEAD.
 1.222.6.2  05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.222.6.1  02-Jun-2008  mjf Sync with HEAD.
 1.223.4.7  11-Aug-2010  yamt sync with head.
 1.223.4.6  11-Mar-2010  yamt sync with head
 1.223.4.5  16-Sep-2009  yamt sync with head
 1.223.4.4  19-Aug-2009  yamt sync with head.
 1.223.4.3  18-Jul-2009  yamt sync with head.
 1.223.4.2  04-May-2009  yamt sync with head.
 1.223.4.1  16-May-2008  yamt sync with head.
 1.223.2.2  04-Jun-2008  yamt sync with head
 1.223.2.1  18-May-2008  yamt sync with head.
 1.226.2.3  10-Oct-2008  skrll Sync with HEAD.
 1.226.2.2  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.226.2.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.229.2.7  28-Jul-2008  simonb Add support for creating a WAPBL log in the filesystem. Will
create an in-filesystem log on first "mount -o log" if one doesn't
exist, and will then continue to use same log in the future. See
(soon to be added) wapbl(4) for more info.

Adds a new B_CONTIG low-level allocation flag that uses hints in
"struct ffs_inode_ext" to lay out an ffs file's data contiguously.

Thanks to Greg Oster for helping with the design of this and to
Antti Kantee for code review and suggestions.
 1.229.2.6  03-Jul-2008  simonb Sync with head.
 1.229.2.5  30-Jun-2008  simonb During mount, mark the filesystem as clean once we've replayed the
journal.

With much help from Greg Oster.
 1.229.2.4  12-Jun-2008  martin License police
 1.229.2.3  11-Jun-2008  simonb Fix some whitespace and long line niggles.
 1.229.2.2  11-Jun-2008  simonb Comment out the behaviour change that requires "mount -f ..." to mount
a dirty filesystem.
 1.229.2.1  10-Jun-2008  simonb Initial commit of Wasabi Systems' WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.230.2.2  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.230.2.1  19-Oct-2008  haad Sync with HEAD.
 1.238.2.3  28-Apr-2009  skrll Sync with HEAD.
 1.238.2.2  03-Mar-2009  skrll Sync with HEAD.
 1.238.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.239.2.5  25-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.239.2.4  03-Oct-2009  snj branches: 1.239.2.4.2; 1.239.2.4.6;
Pull up following revision(s) (requested by bouyer in ticket #1036):
sbin/fsck_ffs/extern.h: revision 1.25 via patch
sbin/fsck_ffs/setup.c: revision 1.88 via patch
sbin/fsck_ffs/wapbl.c: revision 1.4 via patch
sbin/tunefs/tunefs.c: revision 1.41 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.252 via patch
sys/ufs/ffs/ffs_wapbl.c: revision 1.13 via patch
Allow tunefs to clear any type of WAPBL log, not only in-filesystem
ones. Discussed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
--
Do some basic checks of the WAPBL journal, to abort the boot before the
kernel refuses to mount a filesystem read-write (booting a system
multiuser with critical filesystems read-only is bad):
Add a check_wapbl() which will check some WAPBL values in the superblock,
and try to read the journal via wapbl_replay_start() if there is one.
pfatal() if one of these fails (abort boot if in preen mode,
ask "CONTINUE" otherwise). In non-preen mode the bogus journal will
be cleared.
check_wapbl() is always called if the superblock supports WAPBL.
Even if FS_DOWAPBL is not there, there could be flags asking the
kernel to clear or create a log with bogus values which would cause the
kernel to refuse to mount the filesystem.
Discussed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
--
If the WAPBL journal can't be read (ffs_wapbl_replay_start() fails),
mount the filesystem anyway if MNT_FORCE is present.
This still allows booting a system with a corrupted WAPBL on /
single-user, giving a chance to run fsck to fix it.
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
 1.239.2.3  04-Apr-2009  snj branches: 1.239.2.3.4;
Pull up following revision(s) (requested by add in ticket #655):
sys/ufs/ffs/ffs_vfsops.c: revision 1.245 via patch
fsync:
- atime updates were not being synced.
ffs_sync:
- In some cases the sync vnode was acting like the now-dead /usr/sbin/update.
It was examining vnodes that it should have ignored.
- It would find dirty inodes and try to flush them. Often ffs_fsync()
cheerfully ignored the flush request due to the fsync bug. Such inodes
remained dirty and were repeatedly re-examined by the syncer until
vnode reclaim or system shutdown.
- We were marking our place in the per-mount vnode list even though in
most cases there was no flush to perform. While not a bug, this wasted
CPU cycles because a TAILQ_NEXT would have sufficed.
 1.239.2.2  27-Mar-2009  msaitoh Pull up following revision(s) (requested by ad in ticket #600):
sys/ufs/ffs/ffs_vfsops.c: revision 1.244
ffs_sync: ensure that we *do* flush atime updates periodically.
ffs_update() was eating the flag.
 1.239.2.1  24-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #490):
sys/kern/vfs_wapbl.c: revision 1.23
sys/miscfs/syncfs/sync_subr.c: revision 1.36
sys/miscfs/syncfs/sync_vnops.c: revision 1.26
sys/ufs/ffs/ffs_alloc.c: revision 1.121
sys/ufs/ffs/ffs_vfsops.c: revision 1.242
sys/ufs/ffs/ffs_vnops.c: revision 1.110
PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Presently they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
 1.239.2.4.6.1  28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.239.2.4.2.1  28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.239.2.3.4.1  21-Apr-2010  matt sync to netbsd-5
 1.241.4.2  23-Jul-2009  jym Sync with HEAD.
 1.241.4.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.257.2.14  19-Nov-2010  uebayasi - Check FFS fragment size to be page-aligned too.
- Hook the new cdev_mmap() method.
 1.257.2.13  25-Oct-2010  uebayasi Fragment size doesn't need to be page-aligned.

Return EINVAL if the read-only mount option is not set; other failures are
reported as ENXIO.
 1.257.2.12  21-Oct-2010  uebayasi Handle XIP mount error properly.
 1.257.2.11  21-Oct-2010  uebayasi After consideration, put back "xip" mount option.

The internal behavior is totally different between with and without
the option; automatic detection and/or fall-through are not user
friendly. mount(8) returning the "xip" flag is also informative.
 1.257.2.10  07-Oct-2010  uebayasi Check that the filesystem's bsize/fsize are aligned to PAGE_SIZE, or fail with
ENXIO.
 1.257.2.9  26-Sep-2010  uebayasi ffs_vget: Mark XIP only for VREG vnodes.
 1.257.2.8  17-Aug-2010  uebayasi Sync with HEAD.
 1.257.2.7  27-Jul-2010  uebayasi s/DIOCGPHYSADDR/DIOCGPHYSSEG/ now that it returns struct vm_physseg *,
not paddr_t.
 1.257.2.6  28-May-2010  uebayasi Remove the "xip" option from mount_ffs(8) for simplicity.
 1.257.2.5  30-Apr-2010  uebayasi Sync with HEAD.
 1.257.2.4  28-Apr-2010  uebayasi When mounting a block device as XIP, pass registered struct vm_physseg
* as a cookie from the block device to the caller (== mount code).
struct vm_physseg * will be passed to XIP vnode pager
(genfs_do_getpages_xip()), then converted back to paddr_t.

(My future plan is to pass struct vm_physseg * back to the fault handler,
and to pmap_enter() as is.)
 1.257.2.3  23-Mar-2010  uebayasi Put run-time XIP-specific per-mount data in struct specdev, not struct mount.
 1.257.2.2  23-Feb-2010  uebayasi Check XIP mount condition more nicely.
 1.257.2.1  11-Feb-2010  uebayasi XIP hook for ffs.
 1.258.2.6  31-May-2011  rmind sync with head
 1.258.2.5  19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.258.2.4  21-Apr-2011  rmind sync with head
 1.258.2.3  05-Mar-2011  rmind sync with head
 1.258.2.2  03-Jul-2010  rmind sync with head
 1.258.2.1  16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.263.4.3  09-Feb-2011  bouyer Support MNT_UPDATE for quota2 (especially r/o -> r/w transitions)
 1.263.4.2  08-Feb-2011  bouyer Minimal hacking to make 'options QUOTA' compile again.
 1.263.4.1  20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on-disk storage format (all 64-bit wide), integrated into metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.263.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.266.2.1  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.269.2.6  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.269.2.5  23-Jan-2013  yamt sync with head
 1.269.2.4  16-Jan-2013  yamt sync with (a bit old) head
 1.269.2.3  30-Oct-2012  yamt sync with head
 1.269.2.2  23-May-2012  yamt sync with head.
 1.269.2.1  17-Apr-2012  yamt sync with head
 1.271.4.3  02-Jun-2012  mrg sync to latest -current.
 1.271.4.2  05-Apr-2012  mrg sync to latest -current.
 1.271.4.1  18-Feb-2012  mrg merge to -current.
 1.275.2.5  27-Aug-2016  bouyer Pull up following revision(s) (requested by martin in ticket #1395):
sys/ufs/ffs/ffs_vfsops.c: revision 1.340
usr.sbin/quot/quot.c: revision 1.34
sbin/fsdb/fsdb.c: revision 1.49
From Michael Plass:
The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.
Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.275.2.4  04-Dec-2014  snj Pull up following revision(s) (requested by manu in ticket #1196):
sys/kern/vfs_mount.c: revision 1.31
sys/ufs/ffs/ffs_vfsops.c: revision 1.302
sys/ufs/ufs/ufs_extattr.c: revision 1.44
Fix use-after-free on failed unmount with extended attribute enabled
When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.
The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart
As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.
 1.275.2.3  21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.275.2.2  13-Sep-2012  riz branches: 1.275.2.2.2; 1.275.2.2.4;
Pull up following revision(s) (requested by manu in ticket #553):
sys/ufs/ffs/ffs_vfsops.c: revision 1.278
Stop extended attributes at the appropriate place so that unmount
does not fail with EBUSY on filesystem with extended attributes enabled.
 1.275.2.1  07-May-2012  riz branches: 1.275.2.1.2;
Pull up following revision(s) (requested by chs in ticket #204):
sys/fs/sysvbfs/sysvbfs_vnops.c: revision 1.44
sys/ufs/ffs/ffs_vfsops.c: revision 1.277
sys/fs/v7fs/v7fs_vnops.c: revision 1.11
sys/ufs/chfs/chfs_vnops.c: revision 1.7
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.61
sys/miscfs/genfs/genfs_io.c: revision 1.54
sys/kern/vfs_wapbl.c: revision 1.52
sys/uvm/uvm_pager.h: revision 1.43
sys/ufs/ffs/ffs_vnops.c: revision 1.121
sys/kern/vfs_subr.c: revision 1.434
sys/fs/msdosfs/msdosfs_vnops.c: revision 1.83
sys/fs/ntfs/ntfs_vnops.c: revision 1.51
sys/fs/udf/udf_subr.c: revision 1.119
sys/miscfs/specfs/spec_vnops.c: revision 1.135
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.103
sys/fs/udf/udf_vnops.c: revision 1.71
sys/ufs/ufs/ufs_readwrite.c: revision 1.104
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
mark all wapbl I/O as BPRIO_TIMECRITICAL.
this is the second part of addressing PR 46325.
 1.275.2.2.4.2  27-Aug-2016  bouyer Pull up following revision(s) (requested by martin in ticket #1395):
sys/ufs/ffs/ffs_vfsops.c: revision 1.340
usr.sbin/quot/quot.c: revision 1.34
sbin/fsdb/fsdb.c: revision 1.49
From Michael Plass:
The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.
Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.275.2.2.4.1  21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.275.2.2.2.2  27-Aug-2016  bouyer Pull up following revision(s) (requested by martin in ticket #1395):
sys/ufs/ffs/ffs_vfsops.c: revision 1.340
usr.sbin/quot/quot.c: revision 1.34
sbin/fsdb/fsdb.c: revision 1.49
From Michael Plass:
The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.
Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.275.2.2.2.1  21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
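
A minimal sketch of the kind of 'data_len' validation described above, not
the actual vfs_syscalls.c code. The names sketch_check_mount_data() and
SKETCH_MAX_MOUNT_DATA, and the 64 KiB bound, are assumptions made for
illustration; min_len stands in for a file system's vfs_min_mount_data.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical upper bound, chosen for illustration only. */
#define SKETCH_MAX_MOUNT_DATA ((size_t)64 * 1024)

/*
 * Validate a user-supplied mount data length before any allocation.
 * min_len models vfs_min_mount_data and may be zero.
 * Returns 0 if the request is acceptable, otherwise an errno value.
 */
int
sketch_check_mount_data(const void *data, size_t data_len, size_t min_len)
{
	/* The file system's own minimum must itself be sane. */
	assert(min_len <= SKETCH_MAX_MOUNT_DATA);

	/* NULL data is allowed; the file system must cope with it. */
	if (data == NULL)
		return 0;

	/* Reject zero-sized and huge requests instead of allocating. */
	if (data_len == 0 || data_len > SKETCH_MAX_MOUNT_DATA)
		return EINVAL;

	/* Reject buffers shorter than the file system needs. */
	if (data_len < min_len)
		return EINVAL;

	return 0;
}
```

A caller following this pattern would presumably also keep the allocated
size in a separate variable, so that the later free uses the original size
even if the file system modifies 'data_len'.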
 1.275.2.1.2.1  01-Nov-2012  matt sync with netbsd-6-0-RELEASE.
 1.278.2.7  03-Dec-2017  jdolecek update from HEAD
 1.278.2.6  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.278.2.5  23-Jun-2013  tls resync from head
 1.278.2.4  25-Feb-2013  tls resync with head
 1.278.2.3  10-Feb-2013  tls Add an accessor -- ufs_maxphys() -- to check the maximum transfer size
for a given UFS mountpoint, and move the code from mount that finds
the underlying disk and resets the mountpoint max transfer size into a
utility function, ufs_update_maxphys().

Add a global serial number that counts disk property changes to which
filesystems are meant to accommodate themselves. Make ufs_maxphys()
check it. This is a sort of flag-polling interface that avoids callbacks
into the filesystem code, but will require freezing filesystems and
draining in-flight transactions before a decrease in size that is
mandatory (like attaching a disk with a smaller maximum transfer size
as a spare in a RAIDframe set), rather than "advisory", like finding
out set geometry from a RAID controller long after boot and deciding
a smaller transfer size would be optimal, can be signalled. Still, the
"advisory" case is the common one so this is progress.

Make a bit of an example of RAIDframe by making it bump this new
serial number when disks are added to the subsystem. I will attack
one of the hardware RAID drivers (probably arcmsr) next.
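
The flag-polling idea above can be sketched roughly as follows. This is a
single-threaded illustration, not the branch's code: every name here
(sketch_disk_serial, sketch_mount, sketch_maxphys() and so on) is
hypothetical, and a real kernel version would need atomics or locking
around the shared counter.

```c
#include <assert.h>

/* Hypothetical global serial number, bumped when disk properties change. */
static unsigned sketch_disk_serial;

/* Hypothetical stand-in for the underlying disk's maximum transfer size. */
static unsigned sketch_disk_maxphys = 65536;

struct sketch_mount {
	unsigned seen_serial;	/* serial observed at the last refresh */
	unsigned maxphys;	/* cached maximum transfer size */
};

/* Record the current disk limit and serial at mount time. */
void
sketch_mount_init(struct sketch_mount *mp)
{
	mp->maxphys = sketch_disk_maxphys;
	mp->seen_serial = sketch_disk_serial;
}

/* Called by a disk driver when its properties change. */
void
sketch_disk_changed(unsigned new_maxphys)
{
	sketch_disk_maxphys = new_maxphys;
	sketch_disk_serial++;
}

/* Accessor: refresh the cached value only if the serial has moved. */
unsigned
sketch_maxphys(struct sketch_mount *mp)
{
	if (mp->seen_serial != sketch_disk_serial) {
		mp->maxphys = sketch_disk_maxphys;
		mp->seen_serial = sketch_disk_serial;
	}
	assert(mp->maxphys > 0);
	return mp->maxphys;
}
```

The file system never receives a callback; it only notices the change the
next time it asks for the limit, which is why a mandatory decrease would
still need the freeze-and-drain step mentioned above.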
 1.278.2.2  20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.278.2.1  12-Sep-2012  tls Initial snapshot of work to eliminate 64K MAXPHYS. Basically works for
physio (I/O to raw devices); needs more doing to get it going with the
filesystems, but it shouldn't damage data.

All work's been done on amd64 so far. Not hard to add support to other
ports. If others want to pitch in, one very helpful thing would be to
sort out when and how IDE disks can do 128K or larger transfers, and
adjust the various PCI IDE (or at least ahcisata) drivers and wd.c
accordingly -- it would make testing much easier. Another very helpful
thing would be to implement a smart minphys() for RAIDframe along the
lines detailed in the MAXPHYS-NOTES file.
 1.286.2.2  18-May-2014  rmind sync with head
 1.286.2.1  28-Aug-2013  rmind sync with head
 1.296.2.1  10-Aug-2014  tls Rebase.
 1.299.2.4  27-Aug-2016  bouyer Pull up following revision(s) (requested by martin in ticket #1210):
sys/ufs/ffs/ffs_vfsops.c: revision 1.340
usr.sbin/quot/quot.c: revision 1.34
sbin/fsdb/fsdb.c: revision 1.49
From Michael Plass:
The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.
Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.299.2.3  28-Jan-2015  martin branches: 1.299.2.3.2;
Pull up following revision(s) (requested by christos in ticket #425):
sys/ufs/ufs/ufs_inode.c: revision 1.91-1.92
sys/ufs/ufs/ufs_vnops.c: revision 1.223-1.224
sys/ufs/ufs/ufs_extern.h: revision 1.76-1.77
sys/ufs/ffs/ffs_vfsops.c: revision 1.303-1.305
Add debugging for mount...
Merge some error returns
Check more errors
Restore apple ufs error handling.
Move and unify indirect block truncate algorithm into a separate function.
PR/39371: Tobias Nygren: Don't fail mounting root if WAPBL log is corrupt.
Patch from Sergio L. Pascual.
 1.299.2.2  29-Dec-2014  martin Pull up following revision(s) (requested by maxv in ticket #352):
sys/ufs/ffs/ffs_vfsops.c: revision 1.301
Limit the superblock size to SBLOCKSIZE, not MAXBSIZE. Otherwise memcpy
will read beyond the allocated buffer.
Discussed a bit on tech-kern@.
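
A minimal sketch of the bound described above, assuming the superblock is
read into a buffer of SBLOCKSIZE bytes. SKETCH_SBLOCKSIZE (and its value)
and sketch_copy_sb() are illustrative names, not the code in
ffs_vfsops.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size of the buffer the superblock was read into; value assumed. */
#define SKETCH_SBLOCKSIZE 8192

/*
 * Copy a superblock whose on-disk header claims 'claimed' bytes out of
 * a SKETCH_SBLOCKSIZE-byte buffer. Returns the number of bytes copied,
 * or 0 if the claimed size cannot be trusted.
 */
size_t
sketch_copy_sb(void *dst, size_t dst_len, const void *buf, size_t claimed)
{
	assert(dst != NULL && buf != NULL);

	/* A claimed size larger than the source buffer would overread. */
	if (claimed == 0 || claimed > SKETCH_SBLOCKSIZE || claimed > dst_len)
		return 0;

	memcpy(dst, buf, claimed);
	return claimed;
}
```

Checking against the size of the buffer actually read, rather than a
larger constant, is what keeps the memcpy within the allocation.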
 1.299.2.1  18-Nov-2014  snj Pull up following revision(s) (requested by manu in ticket #246):
sys/kern/vfs_mount.c: revision 1.31
sys/ufs/ffs/ffs_vfsops.c: revision 1.302
sys/ufs/ufs/ufs_extattr.c: revision 1.44
Fix use-after-free on failed unmount with extended attributes enabled.
When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.
The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart
As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.
 1.299.2.3.2.1  27-Aug-2016  bouyer Pull up following revision(s) (requested by martin in ticket #1210):
sys/ufs/ffs/ffs_vfsops.c: revision 1.340
usr.sbin/quot/quot.c: revision 1.34
sbin/fsdb/fsdb.c: revision 1.49
From Michael Plass:
The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.
Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.302.2.9  28-Aug-2017  skrll Sync with HEAD
 1.302.2.8  05-Feb-2017  skrll Sync with HEAD
 1.302.2.7  05-Dec-2016  skrll Sync with HEAD
 1.302.2.6  05-Oct-2016  skrll Sync with HEAD
 1.302.2.5  09-Jul-2016  skrll Sync with HEAD
 1.302.2.4  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.302.2.3  22-Sep-2015  skrll Sync with HEAD
 1.302.2.2  06-Jun-2015  skrll Sync with HEAD
 1.302.2.1  06-Apr-2015  skrll Sync with HEAD
 1.339.2.7  26-Apr-2017  pgoyette Sync with HEAD
 1.339.2.6  20-Mar-2017  pgoyette Sync with HEAD
 1.339.2.5  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.339.2.4  04-Nov-2016  pgoyette Sync with HEAD
 1.339.2.3  06-Aug-2016  pgoyette Sync with HEAD
 1.339.2.2  21-Jul-2016  pgoyette Actually save the bdev value when it is retrieved, so we can use it
later in a call to bdevsw_release().
 1.339.2.1  20-Jul-2016  pgoyette Adapt machine-independent code to the new {b,c}devsw reference-counting
(using localcount(9)). All callers of {b,c}devsw_lookup() now call
{b,c}devsw_lookup_acquire() which retains a reference on the 'struct
{b,c}devsw'. This reference must be released by the caller once it is
finished with the structure's content (or other data that would disappear
if the 'struct {b,c}devsw' were to disappear).
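
The acquire/release discipline described above might look roughly like
this. It is a sketch using a plain integer counter; the branch itself uses
localcount(9), which works differently, and the sketch_devsw names below
are hypothetical stand-ins for {b,c}devsw_lookup_acquire() and
bdevsw_release().

```c
#include <assert.h>
#include <stddef.h>

struct sketch_devsw {
	int refcnt;		/* references currently held by callers */
	const char *name;	/* illustrative payload */
};

/*
 * Look up an entry by major number and take a reference on it.
 * Returns NULL if the major number is out of range.
 */
struct sketch_devsw *
sketch_devsw_lookup_acquire(struct sketch_devsw *table, size_t n, size_t major)
{
	if (major >= n)
		return NULL;
	table[major].refcnt++;
	return &table[major];
}

/* Drop a reference once the caller is finished with the structure. */
void
sketch_devsw_release(struct sketch_devsw *d)
{
	assert(d != NULL && d->refcnt > 0);
	d->refcnt--;
}
```

The caller has to keep the returned pointer (or the value used to obtain
it) around until the release call, since the release needs to name the
same entry that was acquired.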
 1.342.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.353.4.3  28-Nov-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1921):

sys/ufs/ffs/ffs_vfsops.c: revision 1.382

ffs_sync: Avoid unlocked access to v_numoutput/v_dirtyblkhd.

Found by lockdoc.

PR kern/57606
 1.353.4.2  11-Apr-2018  martin Pull up following revision(s) (requested by christos in ticket #738):

sys/ufs/ffs/ffs_vfsops.c: revision 1.355

PR/52728: Izumi Tsutsui: "mount -u /dev/ /" triggers kernel panic

Simplify the control flow of the mount code and make sure that the
mountfrom argument can be converted to a block device in the update
case.
 1.353.4.1  04-Feb-2018  martin Pull up following revision(s) (requested by christos in ticket #523):
sys/ufs/ffs/ffs_vfsops.c: revision 1.356
sys/ufs/ufs/ufs_inode.c: revision 1.103
Make sure inode blocks and size are zero when VOP_INACTIVE()
finalises a now unlinked inode.
Counterpart of the check in ffs_newvnode().
Prevent use-after-free where genfs_node_destroy() would destroy
a lock residing in the just freed inode data.
 1.353.2.1  27-Apr-2017  pgoyette Restore all work from the former pgoyette-localcount branch (which is
now abandoned due to a cvs merge botch).

The branch now builds, and installs via anita. There are still some
problems (cgd is non-functional and all atf tests time-out) but they
will get resolved soon.
 1.356.2.4  18-Jan-2019  pgoyette Sync with HEAD
 1.356.2.3  26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.356.2.2  28-Jul-2018  pgoyette Sync with HEAD
 1.356.2.1  25-Jun-2018  pgoyette Sync with HEAD
 1.357.2.3  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.357.2.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.357.2.1  10-Jun-2019  christos Sync with HEAD
 1.362.4.5  29-Feb-2020  ad Sync with head.
 1.362.4.4  24-Jan-2020  ad - Put all the namecache stuff back into vnode_impl_t.
- Tidy vfs_cache.c up, finish the comments.
- Finalise how ID information is entered to the cache.
- Handle very small/old systems.
 1.362.4.3  19-Jan-2020  ad Set IMNT_SHRLOOKUP and use it for the in-cache case. Need to check what
more can be done with tmpfs though, it can probably do the whole lookup.
 1.362.4.2  17-Jan-2020  ad vfs_lookup:

- Do the easy component name lookups directly in the namecache without
taking vnode locks nor vnode references (between the start and the leaf /
parent), which seems to largely solve the lock contention problem with
namei(). It needs support from the file system, which has to tell the
name cache about directory permissions (only ffs and tmpfs tried so far),
and I'm not sure how or if it can work with layered file systems yet.
Work in progress.

vfs_cache:

- Make the rbtree operations more efficient: inline the lookup, and key on a
64-bit hash value (32 bits plus 16 bits length) rather than names.

- Take namecache stuff out of vnode_impl, and take the rwlocks, and put them
all together in an nchnode struct which is mapped 1:1 with vnodes. Saves
memory and nicer cache profile.

- Add a routine to help vfs_lookup do its easy component name lookups.

- Report some more stats.

- Tidy up the file a bit.
 1.362.4.1  17-Jan-2020  ad Sync with head.
 1.362.2.2  07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1934):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383 (patch)
sys/ufs/ffs/ffs_vfsops.c: revision 1.384 (patch)

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fs_fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusive, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
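
The test/clear of fs_fmod under a lock, as described above, can be
sketched in user space as follows. A pthread mutex stands in for um_lock
here, and struct sketch_fs and sketch_test_and_clear_fmod() are
hypothetical names; this is not the code from ffs_vfsops.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct sketch_fs {
	pthread_mutex_t lock;	/* stands in for um_lock */
	int fmod;		/* nonzero if the superblock was modified */
	long time;		/* last superblock update time */
};

/*
 * Atomically test and clear the modified flag, updating the timestamp
 * when it was set. Returns true if the superblock needs writing.
 */
bool
sketch_test_and_clear_fmod(struct sketch_fs *fs, long now)
{
	bool was_modified;

	assert(fs != NULL);

	pthread_mutex_lock(&fs->lock);
	was_modified = (fs->fmod != 0);
	if (was_modified) {
		fs->fmod = 0;
		fs->time = now;
	}
	pthread_mutex_unlock(&fs->lock);

	return was_modified;
}
```

Holding the lock across both the test and the clear means a concurrent
setter cannot slip in between them and have its modification lost.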
 1.362.2.1  28-Nov-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1770):

sys/ufs/ffs/ffs_vfsops.c: revision 1.382

ffs_sync: Avoid unlocked access to v_numoutput/v_dirtyblkhd.

Found by lockdoc.

PR kern/57606
 1.378.2.4  07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1037):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_vfsops.c: revision 1.384

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fs_fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusive, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.378.2.3  18-Oct-2023  martin Pull up following revision(s) (requested by riastradh in ticket #424):

sys/ufs/ffs/ffs_vfsops.c: revision 1.382

ffs_sync: Avoid unlocked access to v_numoutput/v_dirtyblkhd.

Found by lockdoc.
PR kern/57606
 1.378.2.2  21-Jun-2023  martin Pull up following revision(s) (requested by hannken in ticket #197):

sys/ufs/ffs/ffs_vfsops.c: revision 1.381
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.412

Undo unlock/relock for VOP_IOCTL().
PR kern/57450 (unplugging hung USB disk triggers panic via _vstate_assert)
 1.378.2.1  21-Dec-2022  martin Pull up following revision(s) (requested by chs in ticket #17):

sys/ufs/ffs/ffs_vfsops.c: revision 1.379

ffs: fail mounts requesting ACLs for non-ea UFS2 file systems

For non-ea UFS2 file system, fail mounts that request ACLs rather than
letting the mount succeed only to reject all ACL operations later.

Also fix the messages about the on-disk fs flags conflicting with
the mount options for which type of ACLs to use, and about requesting
both types of ACLs.
