Home | History | Annotate | only in /src/sys/ufs/lfs
History log of /src/sys/ufs/lfs
RevisionDateAuthorComments
 1.7 07-Jan-2025  andvar s/remaing/remaining/ s/containg/containing/, mainly in comments.
 1.6 09-Feb-2024  andvar branches: 1.6.2;
fix spelling mistakes, mainly in comments and log messages.
 1.5 11-Dec-2005  christos merge ktrace-lwp.
 1.4 24-Feb-2004  wiz parameter with two es. From Peter Postma.
 1.3 05-Jul-2001  toshii branches: 1.3.22;
Fix typo. s/extention/extension/
 1.2 10-Apr-1999  perseant branches: 1.2.14;
Change the reference to "newlfs" in the CHANGES file to the correct "newfs_lfs"
 1.1 15-Mar-1999  perseant branches: 1.1.4;
New CHANGES files that describes briefly all nontrivial changes made to
the LFS since the 4.4lite2 code was merged into NetBSD.

TODO updated to remove everything marked DONE in 4.4, and add in a list
of more current things to do.

Get rid of comments about the cleaner syscall code and missing fragment
support from README.
 1.1.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.2.14.1 24-Aug-2001  nathanw Catch up with -current.
 1.3.22.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.22.2 18-Sep-2004  skrll Sync with HEAD.
 1.3.22.1 03-Aug-2004  skrll Sync with HEAD
 1.6.2.1 02-Aug-2025  perseant Sync with HEAD
 1.3 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.2 08-Jun-2013  dholland branches: 1.2.10;
Split the definitions suitable for userland out of ulfs_inode.h into
lfs_inode.h. Since fsck_lfs, newfs_lfs, and lfs_cleanerd want to reuse
the inode structure for their own internal use, and some of them share
parts of the kernel code as well, the best way forward is to provide a
relatively sanitized header that doesn't bring in stray material.

Shuffle a few other definitions around so that lfs_inode.h depends
only on lfs.h.

Install lfs_inode.h into /usr/include.
 1.1 12-Jun-1998  cgd branches: 1.1.188; 1.1.198;
Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be build before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.1.198.2 03-Dec-2017  jdolecek update from HEAD
 1.1.198.1 23-Jun-2013  tls resync from head
 1.1.188.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.10.1 22-Sep-2015  skrll Sync with HEAD
 1.3 15-Mar-1999  perseant New CHANGES files that describes briefly all nontrivial changes made to
the LFS since the 4.4lite2 code was merged into NetBSD.

TODO updated to remove everything marked DONE in 4.4, and add in a list
of more current things to do.

Get rid of comments about the cleaner syscall code and missing fragment
support from README.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.2 06-Jun-2013  dholland branches: 1.2.2; 1.2.10;
Update the line-count standings.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.2.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.10.1 06-Jun-2013  yamt file README.wc was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.2.2.2 23-Jun-2013  tls resync from head
 1.2.2.1 06-Jun-2013  tls file README.wc was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.10 11-Dec-2005  christos merge ktrace-lwp.
 1.9 01-Apr-2005  perseant branches: 1.9.2;
Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.8 26-Feb-2005  perseant branches: 1.8.2;
Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
 1.7 23-Feb-2003  perseant branches: 1.7.2; 1.7.8; 1.7.10; 1.7.12;
Fix a buffer overflow bug in the LFS_UBC case that manifested itself
either as a mysterious UVM error or as "panic: dirty bufs". Verify
maximum size in lfs_malloc.

Teach lfs_updatemeta and lfs_shellsort about oversized cluster blocks from
lfs_gop_write.

When unwiring pages in lfs_gop_write, deactivate them, under the theory
that the pagedaemon wanted to free them last we knew.
 1.6 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.5 13-Jul-2001  perseant Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.4 17-Nov-2000  perseant branches: 1.4.2; 1.4.4; 1.4.6;
Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.3 15-Mar-1999  perseant branches: 1.3.8;
New CHANGES files that describes briefly all nontrivial changes made to
the LFS since the 4.4lite2 code was merged into NetBSD.

TODO updated to remove everything marked DONE in 4.4, and add in a list
of more current things to do.

Get rid of comments about the cleaner syscall code and missing fragment
support from README.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.3.8.1 22-Nov-2000  bouyer Sync with HEAD.
 1.4.6.1 03-Aug-2001  lukem update to -current
 1.4.4.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.4.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.7.12.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.7.10.1 29-Apr-2005  kent sync with -current
 1.7.8.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.7.2.2 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.7.2.1 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.8.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.9.2.1 21-Jun-2006  yamt sync with head.
 1.211 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.210 17-Sep-2025  perseant Add routines to check freelist consistency if compiled with DEBUG and
conditional on a kernel variable manipulated via sysctl.
Add checks before and after each routine that modifies the free list.
#if 0 a section of lfs_vfree() that was intended to keep the free list ordered
but instead corrupted it.
 1.209 15-Sep-2025  perseant If setting the head (or tail) of the inode free list to LFS_UNUSED_INUM, also
set the tail (resp. head) to LFS_UNUSED_INUM, as the list is now empty.

Add a check to ensure that lfs_valloc_fixed will always terminate, even
if the free list should contain a loop. Extend the ifile at the end if it
is empty, to match the assumption of lfs_valloc() that the free list is
never empty.

Needed for roll-forward.
 1.208 28-Mar-2020  christos Comment out some of the CTASSERTS for lint until I fix lint.
 1.207 21-Mar-2020  riastradh Avoid misaligned access to lfs64 on-disk records in memory.

lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from

struct foo64 {
...
} __aligned(4) __packed;

union foo {
struct foo64 f64;
...
};

to

struct foo64 {
...
};

union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;

if we really want to take advantage of 64-bit memory accesses.

However, the __aligned(4) __packed must remain on the union
because:

2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.
 1.206 21-Mar-2020  riastradh CTASSERT lfs on-disk structure sizes.
 1.205 23-Feb-2020  riastradh Teach LFS_ORPHAN_NEXTFREE about lfs64.
 1.204 10-Jan-2019  martin branches: 1.204.4; 1.204.6;
Update comment (overlooked in r1.179).
From Jos� Luis Rodr�guez Garc�a in PR kern/53849.
 1.203 26-Jul-2017  maya branches: 1.203.2; 1.203.4;
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar

XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
 1.202 05-Jun-2017  maya Move definition of IN_ALLMOD near the flag it's a mask for.

Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
 1.201 01-Apr-2017  maya branches: 1.201.6;
switch lfs_dirops to condvar (from mtsleep)
 1.200 01-Apr-2017  maya switch lfs_sleepers to condvar (from mtsleep)
 1.199 20-Jun-2016  dholland branches: 1.199.2; 1.199.4;
Massedit u_int{8,16,32,64}_t to uint{8,16,32,64}_t. This effectively
merges ufs/dinode.h 1.25.
 1.198 19-Jun-2016  dholland we are actually synced with ufs/dinode.h 1.24 and ufs/dir.h 1.25.
 1.197 26-Nov-2015  dholland Update now-lying comment.
 1.196 15-Oct-2015  dholland For now bitflip the lfs64 magic number.

This will be unflipped when the format is finalized - right now I
still have pending changes to the superblock in mind (to reduce the
number of redundant fields) so anything created now is not future-
proof. However, the code's also nearing being ready for testing; so
I'm doing this before turning it on as a precaution.
 1.195 15-Oct-2015  dholland Move stuff from struct ulfsmount to struct lfs.
 1.194 03-Oct-2015  dholland Add lfs_checkword type for reading checksum data out of structures.
This is always uint32_t, but having a name for it both makes things
clearer and avoids confusion about whether it should be 32 or 64 bit.

Note: deployed in only one place (that was erroneously tagged
ondisk32) so far.
 1.193 03-Oct-2015  dholland Add an IINFO struct, which is like the FINFO struct but for the inode
blocks portion of the segment summary.

A segment summary block begins with a header (SEGSUM); the rest of the
block contains FINFO structures describing file blocks growing upward
from the bottom (after the header), and IINFO structures describing
inode blocks grown downward from the end of the block. (When they meet
the segment is full regardless of how many blocks might be left.)

IINFO contains just a block number, and until now this information was
handled by just using uint32_t*; switching to a structure will make
the code a lot easier to read, and also make it easier to have 32-bit
and 64-bit versions without making a mess.

This commit just adds the structures and accessors; they'll be
deployed into the code in subsequent commits.
 1.192 21-Sep-2015  dholland Oops, I forgot to make the atime in the 64-bit IFILE 64 bits.
Correct that. Incompatible change, but no LFS64 volumes can have been
created yet.
 1.191 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
 1.190 20-Sep-2015  dholland Clean up struct lfs_dirtemplate.
 1.189 15-Sep-2015  dholland Remove struct lfs_direct (no longer used) and update the big block
comment about directories.
 1.188 15-Sep-2015  dholland Add an accessor function for directory names.
 1.187 15-Sep-2015  dholland Move the header part of struct lfs_direct to its own structure.
(lfs_dirheader)

Take the opportunity to improve the directory generation code in
make_lfs.c. (Everything else was unaffected by virtue of using
accessor functions.)
 1.186 15-Sep-2015  dholland Add and use accessor functions for more of the directory entry fields.
 1.185 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.184 01-Sep-2015  dholland Comments on directories.

This includes a description of the struct direct byteswap horrors that
ought to be propagated to ufs/ufs.
 1.183 01-Sep-2015  dholland The ifile's inode number is constant. (it is always 1)

Therefore, storing the value in the superblock and reading it out
again is silly and offers the opportunity for it to become corrupted.
So, don't do that (most of the code already didn't) and use the
existing constant instead. Initialize new 32-bit superblocks with
the value for the sake of old userland programs, but don't keep the
value in the 64-bit superblock at all.

(approved by Margo Seltzer)
 1.182 01-Sep-2015  dholland Make the inode fields in the 64-bit superblock 64 bits wide.
Reasoning as before.

Note that I am not going through and checking for 64->32 truncations
in inode numbers; I'm sure there are quite a few, but that's a project
for later.
 1.181 01-Sep-2015  dholland Add byteswapping to the dinode accessors.

This prevents regressions in the ulfs code when switching to the new
accessors. Note that while adding byteswapping to the other accessors
is straightforward, I haven't done it yet; and that also is not enough
to make LFS_EI work, because there are places lying around that bypass
the accessors for one reason and another and all of them need to be
updated. That is going to have to wait for a later day as LFS_EI is
not on the critical path right now.
 1.180 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.179 12-Aug-2015  dholland Make the inode number in the 64-bit dinode 64 bits wide, like the
other lfs64 on-disk inode numbers; I've been doing that since this is
a new format and we may as well take the opportunity. This does assume
that more than 4 billion files on a single volume becomes desirable;
but for an average file size of 10K all that takes is a 40 TB volume,
and it's not that hard to make one of those these days if you want to
badly enough.
 1.178 12-Aug-2015  dholland Provide 32-bit and 64-bit versions of FINFO.

This also entailed sorting out part of struct segment, as that
contains a pointer into the current FINFO data.
 1.177 12-Aug-2015  dholland Make 32-bit and 64-bit versions of SEGSUM.
Also fix some of the FINFO handling as it's closely entangled.
 1.176 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.175 12-Aug-2015  dholland Make 32-bit and 64-bit versions of CLEANERINFO.

XXX: while this is written to disk, it seems like much of it would
XXX: be better set up as a commpage shared with the cleaner.
 1.174 12-Aug-2015  dholland Widen several of the fields of BLOCK_INFO to 64 bits.

Keep the old BLOCK_INFO as BLOCK_INFO_70, and version the fcntls that
use it.

Note that BLOCK_INFO_70 has 64-bit padding issues so that it's
different on 32-bit and 64-bit machines. This has been fixed. However,
BLOCK_INFO also contains a pointer, so compat32 stuff for 32-on-64 is
still needed and doesn't currently exist.
 1.173 12-Aug-2015  dholland Fix assorted 64->32 truncations related to BLOCK_INFO.

Also make note of a cleaner limitation: it seems that when it goes to
coalesce discontiguous files, it mallocs an array with one BLOCK_INFO
for every block in the file. Therefore, with 64-bit LFS, on a 32-bit
platform it will be possible to have files large enough to overflow
the cleaner's address space. Currently these will be skipped and cause
warnings via syslog.

At some point someone should rewrite the logic to coalesce files to
use chunks of some reasonable size, as discontinuity between such
chunks is immaterial and mallocing this much space is silly and
fragile. Also, the kernel only accepts up to 65536 blocks at a time
for bmapv and markv, so processing more than this at once probably
isn't useful and may not even work currently. I don't want to change
this around just now as it's not entirely trivial.
 1.172 02-Aug-2015  dholland Pass the fs object to LFS_MAX_DADDR so it can check lfs_is64.

Remove some hackish intentional 64->32 truncations next to the checks
using LFS_MAX_DADDR, and tackle the problem they handled in bmap
instead.

The problem: the magic block pointer value UNWRITTEN has magic value
-2, and if it's not handled specifically, uint32 -> uint64 promotion
turns it into 4294967294, which then causes consternation and
monkeyhouse downstream.

What's here is still kind of a hack, but it's a step forward.
 1.171 02-Aug-2015  dholland Add a (draft) 64-bit superblock. Make things build again.

Add pieces of support for using both superblock types where
convenient, and specifically to the superblock accessors, but don't
actually enable it anywhere.

First substantive step on PR 50000.
 1.170 02-Aug-2015  dholland lfs_cleanint[] in the in-memory superblock needs to have 64-bit entries.
 1.169 02-Aug-2015  dholland Second batch of 64 -> 32 truncations in lfs, along with more minor
tidyups and corrections in passing.
 1.168 02-Aug-2015  dholland Fix assorted 64 -> 32 truncations in lfs. Also, some minor tidyups and
corrections in passing.
 1.167 28-Jul-2015  dholland Move struct salfs back inside libsa now that lfs_accessors.h is separate.
 1.166 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.165 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.164 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.163 24-Jul-2015  dholland ulfs2_dinode, having never actually been used with lfs, doesn't have a
di_inumber field. Fix that. First preliminary step on PR 50000.
 1.162 31-May-2015  hannken Use VFS_PROTOS() for lfs.
Rename conflicting struct lfs field "lfs_start" to "lfs_s0addr".

No functional change.
 1.161 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.160 28-Jul-2013  dholland branches: 1.160.6;
Bring in a copy of ffs_quota2_mount() for reference.
Add stuff to struct lfs that it needs to initialize.
Clear these fields in mount as there's no on-disk support for quota2;
but this increases the chances of being able to add it (or something
like it) in the future.
 1.159 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.158 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.157 28-Jun-2013  matt branches: 1.157.2;
Remove duplicate define of LFS_MAXNAMLEN
 1.156 23-Jun-2013  dholland typo in comment
 1.155 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.154 18-Jun-2013  christos Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.153 18-Jun-2013  dholland Tuck away a bunch of symbols that don't need to be public.
 1.152 09-Jun-2013  dholland Move struct lfs_inode_ext to lfs_inode.h; it doesn't need to be public.
 1.151 08-Jun-2013  dholland Remove stale union and accessor macros.
 1.150 08-Jun-2013  dholland ulfs_dir.h has been emptied; remove it.
 1.149 08-Jun-2013  dholland Move a comment to lfs.h that belongs better there.
 1.148 08-Jun-2013  dholland As nearly all the content of ulfs_dir.h and ulfs_dinode.h has migrated
to lfs.h, propagate the copyright notices too.
 1.147 08-Jun-2013  dholland Move more symbols to lfs.h:
LFS_DIRBLKSIZ
LFS_DIRECTSIZ
LFS_DIRSIZ
LFS_OLDDIRFMT
LFS_NEWDIRFMT
LFS_IFTODT
LFS_DTTOIF
ULFS{,1,2}_MAXSYMLINKLEN
 1.146 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.145 08-Jun-2013  dholland Move stuff to lfs.h that's needed by userland:
LFS_DT_*
ULFS_ROOTINO
ULFS_WINO
struct lfs_direct
struct lfs_dirtemplate
struct lfs_odirtemplate
struct ulfs_args

Also fix FFS_MAXNAMLEN -> LFS_MAXNAMLEN in several places.
 1.144 08-Jun-2013  dholland Now move LFS_IFMT and friends from ulfs_dinode.h to lfs.h.
 1.143 08-Jun-2013  dholland Move the dinode (on-disk inode) structures to lfs.h, since they are
and will be obviously required by userland tools that need to read
the on-disk structures.

Also, DINODE{1,2}_SIZE -> LFS_DINODE{1,2}_SIZE.
 1.142 08-Jun-2013  dholland Split the definitions suitable for userland out of ulfs_inode.h into
lfs_inode.h. Since fsck_lfs, newfs_lfs, and lfs_cleanerd want to reuse
the inode structure for their own internal use, and some of them share
parts of the kernel code as well, the best way forward is to provide a
relatively sanitized header that doesn't bring in stray material.

Shuffle a few other definitions around so that lfs_inode.h depends
only on lfs.h.

Install lfs_inode.h into /usr/include.
 1.141 06-Jun-2013  dholland Fix some exposed symbols:
LOSTFOUNDINO -> LFS_LOSTFOUNDINO
struct ufid -> struct ulfs_ufid
 1.140 06-Jun-2013  dholland Cleanups to reduce symbol and header exposure:
- move struct ufid from ulfs_inode.h to lfs.h
- lfs.h needs sys/mount.h and sys/pool.h
- ulfs_quota2_subr.c needs lfs_inode.h
- remove ulfs_inode.h from lfs.h in favor of ulfs_dinode.h
- move ULFS_NDADDR, ULFS_NIADDR, ULFS_NXADDR from ulfs_dinode.h to lfs.h
- remove ulfs_dinode.h from lfs.h
- add lfs.h to ulfs_dinode.h
 1.139 06-Jun-2013  dholland Remove stray references to ext2fs, chfs, ffs, and mfs.
 1.138 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.137 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.136 16-Feb-2012  perseant branches: 1.136.2;
Pass t_renamerace and t_rmdirrace tests.

Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.

Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.

Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace crates a lot of inode blocks on a tiny empty disk).
 1.135 02-Jan-2012  perseant branches: 1.135.2;

* Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.134 11-Jul-2011  hannken branches: 1.134.2; 1.134.6;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Adresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.133 16-Feb-2010  mlelstv Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.132 05-Nov-2009  pooka branches: 1.132.2;
... actually, define compat only for the kernel. Userlandia should
see only one version of the interfaces.
 1.131 05-Nov-2009  pooka Include compat/sys/time_types.h instead of compat/sys/time.h.
Fixes lint drama with interface name collisions.
 1.130 05-Nov-2009  pooka Include compat code by default.
 1.129 29-Oct-2009  christos PR/42246: NAKAJIMA Yoshihiro: provide COMPAT_50 for LFS
 1.128 19-Jul-2009  dholland typo in comment
 1.127 16-May-2008  hannken branches: 1.127.12;
Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.126 28-Apr-2008  martin branches: 1.126.2;
Remove clause 3 and 4 from TNF licenses
 1.125 15-Feb-2008  ad branches: 1.125.6; 1.125.8; 1.125.10;
The buffer LOCKED flag need not be under the protection of bufcache_lock,
BUSY is enough.
 1.124 03-Jan-2008  ad Use pool_cache.
 1.123 02-Jan-2008  ad Merge vmlocking2 to head.
 1.122 10-Oct-2007  ad branches: 1.122.4; 1.122.6; 1.122.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.121 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.120 16-May-2007  perseant branches: 1.120.6; 1.120.8; 1.120.10;
Change references to SEGM_W_DIROPS to SEGM_CKP, and replace the logic that
formerly used SEGM_W_DIROPS in lfs_segwrite() appropriately. This prevents
a problem in which processes could get stuck in "buffers" sleep forever.
 1.119 17-Apr-2007  perseant Install a new sysctl, vfs.lfs.ignore_lazy_sync, which causes LFS to ignore
the "smooth" syncer, as if vfs.sync.*delay = 0, but only for LFS. The
default is "on", i.e., ignore lazy sync.

Reduce the amount of polling/busy-waiting done by lfs_putpages(). To
accomplish this, copied genfs_putpages() and modified it to indicate which
page it was that caused it to return with EDEADLK. fsync()/fdatasync()
should no longer ever fail with EAGAIN, and should not consume huge
quantities of cpu.

Also, try to make dirops less likely to be written as the result of a
VOP_PUTPAGES(), while ensuring that they are written regularly.
 1.118 15-Feb-2007  ad branches: 1.118.2; 1.118.6; 1.118.8;
Replace some uses of lockmgr() / simplelocks.
 1.117 28-Sep-2006  perseant Use lockstatus instead of a homebrewed locking system to control
LFCNWRAPSTOP and LFCNWRAPGO.

Be less verbose about the various looping checks: use log() rather than
printf(), and only log anything if we are really looping ("count = 2" is
not an error condition).

Allow dirops sleeping on available space to be interruptible.
 1.116 15-Sep-2006  perseant branches: 1.116.2;
Don't remark a locked inode with IN_MODIFIED after writing it to disk,
if we ourselves hold the lock. This prevents e.g. mknod from hanging
indefinitely.

Also, always use the return value from VOP_ISLOCKED to determine whether
we hold the lock or someone else does, rather than looking into the lock
structure ourselves.
 1.115 15-Sep-2006  yamt merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.114 01-Sep-2006  perseant branches: 1.114.2;
Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.113 06-Aug-2006  martin Fix size confusion with lfs_fhandle - and as it now turns out to be the same
as the lfs compat_30_fhandle, g/c the latter.
Add an alias for the LFCNIFILEFH fcntl, so that binaries compiled in the
meantime (with too large lfs_fhandle) continue to work.

This makes vfs_cleanerd work again after the kernel checks filehandle size
more strictly (problem reported by Kurt Schreiner on current-users).
 1.112 31-Jul-2006  martin Make filehandles opaque to userland
 1.111 20-Jul-2006  perseant Note partial segments that are written by the cleaner, to help out the
roll-forward agent.
 1.110 13-Jul-2006  martin Version the lfs_cleanerd internal fcntl() for filehandles too,
so old cleaners should work with newer kernels.
 1.109 13-Jul-2006  martin Fix alignement problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.108 24-Jun-2006  perseant Change LFCNWRAP{STOP,GO} to make them more suitable for snapshotting; in
particular, the caller can now choose whether to wait for the condition
to be met, and if the caller of LFCNWRAPSTOP dies or otherwise closes
the descriptor, the filesystem is started again. Updated the ckckp
regression test to use the new semantics.

dump_lfs(8) now uses the fcntls to implement LFS-style snapshotting through
the -X flag, addressing PR#33457 albeit not using fss(4). Fixed a couple
other problems with dump_lfs that manifested themselves during testing.
 1.107 14-May-2006  elad branches: 1.107.4;
integrate kauth.
 1.106 12-May-2006  perseant Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.

Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.105 04-May-2006  perseant Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.

Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.104 30-Apr-2006  perseant Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are also included as well.
 1.103 17-Apr-2006  perseant Introduce two fcntl calls that freeze the filesystem right at the point
where segment 0 is being considered for writing. This allows for automated
checkpoint vailidity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.

Include a regression test that does such scanning.

When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small). Discovered by using the aforementioned
regression test.
 1.102 13-Apr-2006  perseant Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.

Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.101 10-Apr-2006  perseant Optimize the free list search a little more; in particular use words
instead of bytes for the index, and never search below fs->lfs_freehd.

Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).

Free the bitmap when we unmount the filesystem.
 1.100 08-Apr-2006  perseant Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.99 08-Apr-2006  perseant Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.98 07-Apr-2006  perseant Make the segment lock aware of LWPs. Fixes a (somewhat confusing)
"lockmgr: pid 3997, not exclusive lockholder 3997, unlocking" panic I
encountered while running blogbench on an LFS.
 1.97 24-Mar-2006  perseant Improvements to LFS's paging mechanism, to wit:

* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.

* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.96 17-Mar-2006  tls From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.95 11-Dec-2005  christos branches: 1.95.4; 1.95.6; 1.95.8; 1.95.10; 1.95.12;
merge ktrace-lwp.
 1.94 13-Sep-2005  christos split out lfs_itimes(). It is used in fsck_lfs.
 1.93 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.92 23-Aug-2005  christos Don't overload MAXNAMLEN, use a separate constant for each filesystem type.
 1.91 22-Aug-2005  yamt whitespace.
 1.90 22-Aug-2005  christos change ino_t to u_int32_t for syscall compatibility.
 1.89 31-Jul-2005  christos Move extern kernel variable declarations, into a _KERNEL protected session
so that the don't pollute userland's namespace.
 1.88 29-May-2005  christos branches: 1.88.2;
- sprinkle const
- avoid shadow variables.
 1.87 20-May-2005  perseant Keep track of the number of segments reclaimed, since the cleaner doesn't
do this anymore (it hasn't for quite some time). Add a couple of conditional
debugging messages to indicate why segments are not cleaned, in the event
that lfs_segclean is used.

Make the LFCNSEGWAITALL fcntl work again.
 1.86 23-Apr-2005  perseant Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing. Tested in both
directions, and everything appears to work happily, but ymmv.
 1.85 19-Apr-2005  perseant Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.84 16-Apr-2005  perseant Make userland compile again.
 1.83 16-Apr-2005  perseant Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
 1.82 16-Apr-2005  perseant Use lfs_malloc() to manage the blkiov arrays that the cleaner functions use,
since the cleaner is likely to operate in a low-memory condition.
 1.81 14-Apr-2005  perseant Tabify leading whitespace
 1.80 14-Apr-2005  perseant Consolidate the hash table we use to maintain the integrity of lfs_avail
into a single, system-wide table, rather than having a separate hash table
per inode. Significantly reduces the "system" cpu usage of your average
file write.
 1.79 14-Apr-2005  perseant Keep track of the highest block held by an LFS inode, so that we can
be assured that the last byte of a file is always allocated. Previously
a file extension could cause the filesystem to be flushed, writing an
inconsistent inode to disk. Although this condition would be corrected
the next time blocks were written to disk, an intervening crash would leave
the filesystem in an inconsistent state, leaving fsck_lfs to complain
of an inode "partially truncated".
 1.78 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.77 08-Mar-2005  perseant branches: 1.77.2;
Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.76 26-Feb-2005  perry nuke trailing whitespace
 1.75 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
 1.74 14-Aug-2004  mycroft branches: 1.74.4; 1.74.6;
Push atime/mtime updates even further -- into the reclaim path, so they happen
rarely in the normal case. (Note: This happens at reboot/shutdown time because
all file systems are unmounted.)

Also, for IN_MODIFY, use IN_ACCESSED, not IN_MODIFIED; otherwise "ls -l" of
your device node or FIFO would cause the time stamps to get written too
quickly.
 1.73 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.72 09-Mar-2004  yamt branches: 1.72.4;
use correct segment size. this fixes memory corruption when using lfsv1.
 1.71 28-Jan-2004  yamt use bufmem instead of bufpages to make lfs a little less broken.
 1.70 07-Sep-2003  yamt - raise spl to bio in lfs_countlocked() rather than having callers to do so.
- buffer cache MP locks.
- assert B_CALL buffers are not on the free queue.
 1.69 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.68 30-Jul-2003  yamt using normal bufcache buffer for cluster buffer head.
 1.67 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.66 02-Jul-2003  yamt use queue.h macros.
 1.65 02-Jul-2003  yamt - add a new functions, lfs_writer_enter/leave, and use them instead of
duplicated code fragments.
- add an assertion.
 1.64 23-Apr-2003  perseant branches: 1.64.2;
Make LFS work better (though still not "well") as an NFS-exported
filesystem (and other things that needed to be fixed before the tests
would complete), to wit:

* Include the fs ident in the filehandle; improve stale filehandle checks.

* Change definition of blksize() to use the on-dinode size instead of
the inode's i_size, so that fsck_lfs will work properly again.

* Use b_interlock in lfs_vtruncbuf.

* Postpone dirop reclamation until after the seglock has been released,
so that lfs_truncate is not called with the segment lock held.

* Don't loop in lfs_fsync(), just write everything and wait.

* Be more careful about the interlock/uobjlock in lfs_putpages: when we
lose this lock, we have to resynchronize dirtiness of pages in each
block.

* Be sure to always write indirect blocks and update metadata in
lfs_putpages; fixes a bug that caused blocks to be accounted to the
wrong segment.
 1.63 09-Apr-2003  thorpej Use PAGE_SIZE rather than NBPG.
 1.62 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.61 28-Mar-2003  perseant Add a sleeper count, to prevent the cleaner from panicing the kernel
when the filesystem is unmounted, relocking the Ifile when its lock is
draining. (We can't use vfs_busy() since the process is sleeping for a
good long time.) Clean up / organize lfs.h, while I'm here.

In lfs_update_single, assert that disk addresses are either negative, or
are still positive when converted to int32_t, to prevent recurrence of a
negative/positive block problem.
 1.60 21-Mar-2003  perseant KNF (space after keywords).
 1.59 21-Mar-2003  perseant Use VONWORKLST as a heuristic for vnode emptiness, rather than exhaustively
checking the memq.

Take greater care not to dirty the Ifile vnode when unmounting the filesystem.
This should fix a "(vp->v_flag & VONWORKLST) == 0" assertion panic in vgonel
that could occur when unmounting.

Do not allow the Ifile to be mapped for writing.
 1.58 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.57 11-Mar-2003  perseant - Get rid of unused #ifdefs LFS_NO_PAGEMOVE and LFS_MALLOC_SUMMARY (both
always true) and accompanying dead code.

- When constructing write clusters in lfs_writeseg, if the block we are
about to add is itself a cluster from GOP_WRITE, don't put a cluster
in a cluster, just write the GOP_WRITE cluster on its own. This seems
to represent a slight performance gain on my test machine.

- Charge someone's rusage for writes on LFSes. It's difficult to tell
who the "right" process to charge is; just charge whoever triggered
the write.
 1.56 08-Mar-2003  perseant Take away "#ifdef LFS_UBC".
 1.55 08-Mar-2003  perseant Add an lfs_strategy() that checks to make sure we're not trying to read
where the cleaner is trying to write, instead of tying up the "live"
buffers (or pages).

Fix a bug in the LFS_UBC case where oversized buffers would not be
checksummed correctly, causing uncleanable segments.

Make sure that wakeup(fs->lfs_iocount) is done if fs->lfs_iocount is 1
as well as 0, since we wait in some places for it to drop to 1.

Activate all pages that make it into lfs_gop_write without the segment
lock held, since they must have been dirtied very recently, even if
PG_DELWRI is not set.
 1.54 02-Mar-2003  perseant Account SEGUSE_ACTIVE correctly so that the automatic segment cleaning
actually happens.

Add a new fcntl call that will write the minimum necessary to checkpoint
(i.e., for on-disk directory structure to be consistent, not including
updates to file data) so that the cleaner can clean segments more quickly
without sacrificing three-way commit for cleaning.
 1.53 27-Feb-2003  perseant Do roundup and offset arithmetic in 64 bits, to allow >=2G files.
 1.52 25-Feb-2003  perseant Make fs-specific fcntl macros take three arguments (approved wrstuden).
Let LFS use fcntl for cleaner functions.
 1.51 24-Feb-2003  perseant Add lfs_ioctl vnode op, with ioctls to take over cleaner system call
functionality (not including segment clean, since that is now done
automatically as checkpoints happen).
 1.50 23-Feb-2003  perseant Fix a buffer overflow bug in the LFS_UBC case that manifested itself
either as a mysterious UVM error or as "panic: dirty bufs". Verify
maximum size in lfs_malloc.

Teach lfs_updatemeta and lfs_shellsort about oversized cluster blocks from
lfs_gop_write.

When unwiring pages in lfs_gop_write, deactivate them, under the theory
that the pagedaemon wanted to free them last we knew.
 1.49 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.48 19-Feb-2003  yamt workaround for "another flush is..." infinity loop in writerd.
if we're writerd, sleep in lfs_flush until another writer goes away
instead of busy loop in writed.
 1.47 18-Feb-2003  soren Make libsa compile again.
 1.46 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.45 29-Jan-2003  yamt don't use daddr_t for segment summary since it's an on-disk structure.
 1.44 27-Jan-2003  yamt make these compilable with lfs debug options.
(follow daddr_t change)

XXX maybe segment number should be 64bit.
 1.43 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.42 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.41 06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.40 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.39 14-May-2002  perseant branches: 1.39.2; 1.39.4;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.38 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.37 13-Jul-2001  perseant Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.36 20-Dec-2000  cgd branches: 1.36.2; 1.36.4; 1.36.6;
replace \<space(s)><newline> (wrong!) with \<newline>
 1.35 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.34 13-Nov-2000  perseant Remove debugging code that accidentally went in with yesterday's commit.
 1.33 12-Nov-2000  perseant Do not needlessly dirty segment table blocks during lfs_segwrite,
preventing needless disk activity when the filesystem is idle. (PR #10979.)
 1.32 13-Sep-2000  perseant Cast back to int32_t in LFS_EST_BFREE and LFS_EST_RSVD macros, for
consistency with their arguments.

Change the debugging printf in lfs_reserve to match, and enclose it in
#ifdef DEBUG.

Tested on alpha, arm32, sparc.
 1.31 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.30 09-Sep-2000  perseant Change dlfs_dmeta and dlfs_avail to signed quantities, to prevent
underflow errors, visible in userland as impossibly high values
returned from df(1).
 1.29 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero as fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.28 04-Jul-2000  perseant Fix errors observed while trying to fill the filesystem with yesterday's
fixes:

- Write copies of bfree and avail in the CLEANERINFO block, so the
cleaner doesn't have to guess which superblock has the current
information (if indeed any do).

- Tighten up accounting of lfs_avail (more needs to be done).

- When cleansing indirect blocks of UNWRITTEN, make sure not to mark
them clean, since they'll need to be rewritten later.
 1.27 03-Jul-2000  perseant Allow the number of free segments reserved for the cleaner to be
parametrized in the filesystem, defaulting to MIN_FREE_SEGS = 2 but set
to something more reasonable at newfs_lfs time.

Note the number of blocks that have been scheduled for writing but which
are not yet on disk in an inode extension, i_lfs_effnblks. Move
i_ffs_effnlink out of the ffs extension and onto the main inode, since
it's used all over the shared code and the lfs extension would clobber
it.

At inode write time, indirect blocks and inode-held blocks of inodes
that have i_lfs_effnblks != i_ffs_blocks are cleansed of UNWRITTEN disk
addresses, so that these never make it to disk.
 1.26 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.25 06-Jun-2000  perseant branches: 1.25.2;
Protect inode free list with seglock, instead of separate lock, so that
the head of the inode free list (on the superblock) always matches the
rest of the free list (in the ifile).

Protect lfs_fragextend with seglock, to prevent the segment byte count
fudging from making its way to disk.

Don't try to inactivate dirop vnodes that are still in the middle of
their dirop (may address PR#10285).
 1.24 31-May-2000  perseant update for IN_ACCESSED changes
 1.23 27-May-2000  perseant branches: 1.23.2;
Prevent dirops from getting around lfs_check and wedging the buffer cache.
All the dirop vnops now mark the inodes with a new flag, IN_ADIROP, which
is removed as soon as the dirop is done (as opposed to VDIROP which stays
until the file is written). To address one issue raised in PR#9357.
 1.22 13-May-2000  perseant Change the sementics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be wait for, or not, accordingly.

Closes PR#8996.
 1.21 05-May-2000  perseant Change the way LFS does block accounting, from trying to infer from the
buffer cache flags, to marking the inode and/or indirect blocks with a
special disk address UNWRITTEN==-2 when a block is accounted for. (This
address is never written to disk, but only used in-core. This is essentially
the same method of block accounting as on the UBC branch, where the buffer
headers don't exist.) Make sure that truncation is handled properly,
especially in the case of holey files.

Fixes PR#9994.
 1.20 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.19 15-Dec-1999  perseant In lfs_bwrite, don't mark buffers dirty if lfs is mounted read-only.
(Previously buffers could be marked dirty by the cleaner, and possibly by
other means.)

Also check for softdep mount in vfs_shutdown before trying to bawrite
buffers, since other filesystems don't need it and lfs doesn't bawrite.
(This fragment reviewed by fvdl.)

Partially addresses PR#8964.
 1.18 08-Dec-1999  simonb Use an explicitly sized type (u_int32_t) for inode numbers in the super
block instead of ino_t. Reviewed by Konrad Schroder.
 1.17 06-Nov-1999  perseant branches: 1.17.2;
Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.16 15-Jun-1999  perseant branches: 1.16.2; 1.16.4; 1.16.6;
Minor changes to the segment live bytes calculation. In particular, fixed
a bug in fragment extension that could run the count negative. Also, don't
overcount for inodes, and don't count segment summaries. Thus, for empty
segments the live bytes count should now be exactly zero.
 1.15 01-Jun-1999  perseant Fixed lfs_update (and related functions) so that calls from lfs_fsync
will DTRT with vnodes marked VDIROP. In particular, the message
"flushing VDIROP" will no longer appear, and the filesystem will remain
stable in the event of a crash.

This was particularly a problem with NFS-exported LFSes, since fsync
was called on every file close.
 1.14 25-Mar-1999  perseant branches: 1.14.2; 1.14.4; 1.14.6;
clean up unused/required #ifdefs
 1.13 17-Mar-1999  perseant Move dlfs_pad to the end of struct dlfs (after the pad), for upward
compatibility.
 1.12 17-Mar-1999  perseant Fix pad on lfs.h so it is really 512 bytes, as advertized
 1.11 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.10 11-Sep-1998  pk PR#6032: define fixed sized on-disk superblock structure.
 1.9 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.8 05-Dec-1996  is Make the struct lfs 512 bytes long on 32bit machines whose compiler doesn't
align 32bit integers. Use explicit sized typing at some other places.

XXX This still won't fix lfs for 64bit machines, as we have some
assumptions about sizeof(pointer)=sizeof(u_int32_t) in here, and (if I
looked right) a misaligned u_int64_t. The right fix (to cite cgd) will
be to seperate on-disk-representation from in-core, but I don't have
the time (at the moment) to do this.
 1.7 09-Feb-1996  christos lfs prototypes
 1.6 21-Dec-1994  mycroft Add RCS ids where missing.
 1.5 14-Dec-1994  mycroft Sync with CSRG.
 1.4 17-Nov-1994  mycroft Round struct lfs to 512 bytes.
 1.3 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.14.6.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
referenre purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.14.4.2 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.14.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.14.2.4 20-Jan-2000  he Pull up revision 1.20 (requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.14.2.3 17-Dec-1999  he Pull up revision 1.17 (requested by perseant):
Address locking protocol error for inode hash, and make the
maximum number of active dirops a global quantity.
 1.14.2.2 17-Dec-1999  he Pull up revision 1.15 (requested by perseant):
Avoid flushing vnodes involved in a dirop, making lfs' promise
of "no fsck needed, even in the event of a crash" closer to
reality.
 1.14.2.1 25-Jun-1999  perry pullup 1.15->1.16 (perseant)
 1.16.6.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.16.4.1 15-Nov-1999  fvdl Sync with -current
 1.16.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.16.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.16.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.17.2.2 06-Nov-1999  perseant Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.17.2.1 06-Nov-1999  perseant file lfs.h was added on branch comdex-fall-1999 on 1999-11-06 20:33:06 +0000
 1.23.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.25.2.2 03-Feb-2001  he Pull up revisions 1.33-1.35 (requested by perseant):
o Don't write anything if the filesystem is idle (PR#10979).
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
 1.25.2.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
seperately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.36.6.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.36.6.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.36.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.36.6.1 03-Aug-2001  lukem update to -current
 1.36.4.4 13-Jul-2001  perseant Be more careful about when we update ctime/mtime. In particular, if we
are only writing indirect blocks, that doesn't count for mtime; and when
we first create a vnode, that certainly *does not* count for ctime
(a bug that's been there from the beginning).

This does not change the fact that mtime might still be set after write(2)
is "completed", but it does make the atime-in-the-ifile code have some
effect (noticeable less degradation of read time after an intervening
large write).
 1.36.4.3 10-Jul-2001  perseant Turn the free list into a tailq, with both head and tail kept on the ifile.

Update access times on the inode even if it does not get marked IN_ACCESS.
 1.36.4.2 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.36.4.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.36.2.5 11-Dec-2002  thorpej Sync with HEAD.
 1.36.2.4 01-Aug-2002  nathanw Catch up to -current.
 1.36.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.36.2.2 08-Jan-2002  nathanw Catch up to -current.
 1.36.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.39.4.1 20-Jun-2002  lukem Pull up revision 1.40 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.39.2.2 15-Jul-2002  gehenna catch up with -current.
 1.39.2.1 20-Jun-2002  gehenna catch up with -current.
 1.64.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.64.2.6 08-Mar-2005  skrll Sync with HEAD.
 1.64.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.64.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.64.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.64.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.64.2.1 03-Aug-2004  skrll Sync with HEAD
 1.72.4.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.74.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.74.4.1 29-Apr-2005  kent sync with -current
 1.77.2.18 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.77.2.17 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.93
sys/ufs/lfs/lfs.h: revision 1.106
sys/ufs/lfs/lfs_vfsops.c: revision 1.209
sys/ufs/lfs/lfs_vnops.c: revision 1.175
sys/ufs/lfs/lfs_segment.c: revision 1.178
Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.
Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.77.2.16 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.92
sys/ufs/lfs/lfs.h: revision 1.105
sys/ufs/lfs/lfs_vfsops.c: revision 1.207
sys/ufs/lfs/lfs_subr.c: revision 1.59
sys/ufs/lfs/lfs_vnops.c: revision 1.173
sys/ufs/lfs/lfs_bio.c: revision 1.92
Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.
Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.77.2.15 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.104
sys/ufs/lfs/lfs_vfsops.c: revision 1.206
sys/ufs/lfs/lfs_vnops.c: revision 1.170
sys/ufs/lfs/lfs_extern.h: revision 1.80
sys/ufs/lfs/lfs_segment.c: revision 1.176
sys/ufs/lfs/lfs_inode.c: revision 1.103 via patch
sys/ufs/lfs/lfs_alloc.c: revision 1.90
Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().
A couple of locking fixes are also included as well.
 1.77.2.14 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.103
sys/ufs/lfs/lfs_segment.c: revision 1.174
sys/ufs/lfs/lfs_vnops.c: revision 1.168
Introduce two fcntl calls that freeze the filesystem right at the point
where segment 0 is being considered for writing. This allows for automated
checkpoint vailidity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.
Include a regression test that does such scanning.
When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small). Discovered by using the aforementioned
regression test.
 1.77.2.13 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.102
sys/ufs/lfs/lfs_segment.c: revision 1.173
sys/ufs/lfs/lfs_vnops.c: revision 1.167 via patch
sys/ufs/lfs/lfs_bio.c: revision 1.91
Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.
Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.77.2.12 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.101
sys/ufs/lfs/lfs_vfsops.c: revision 1.202
sys/ufs/lfs/lfs_alloc.c: revision 1.88
Optimize the free list search a little more; in particular use words
instead of bytes for the index, and never search below fs->lfs_freehd.
Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).
Free the bitmap when we unmount the filesystem.
 1.77.2.11 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.200
sys/ufs/lfs/lfs_vnops.c: revision 1.164
sys/ufs/lfs/lfs_inode.c: revision 1.101
sys/ufs/lfs/lfs_extern.h: revision 1.78
sys/ufs/lfs/lfs.h: revision 1.100
Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.77.2.10 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.87
sys/ufs/lfs/lfs.h: revision 1.99
sys/ufs/lfs/lfs_vfsops.c: revision 1.199
sys/ufs/lfs/lfs_extern.h: revision 1.77 via patch
Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.77.2.9 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_subr.c: revision 1.58
sys/ufs/lfs/lfs.h: revision 1.98
Make the segment lock aware of LWPs. Fixes a (somewhat confusing)
"lockmgr: pid 3997, not exclusive lockholder 3997, unlocking" panic I
encountered while running blogbench on an LFS.
 1.77.2.8 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.158
sys/ufs/lfs/lfs_subr.c: revision 1.57
sys/ufs/lfs/lfs_segment.c: revision 1.171
sys/ufs/lfs/lfs.h: revision 1.97
sys/ufs/lfs/lfs_vfsops.c: revision 1.195
sys/ufs/lfs/lfs_extern.h: revision 1.76
Improvements to LFS's paging mechanism, to wit:
* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.
* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.77.2.7 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_segment.c: revision 1.170
sys/ufs/lfs/lfs.h: revision 1.96
sys/ufs/lfs/lfs_vfsops.c: revision 1.194
sys/ufs/lfs/lfs_syscalls.c: revision 1.109
From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.77.2.6 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.91
whitespace.
 1.77.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.90
change ino_t to u_int32_t for syscall compatibility.
 1.77.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.89
Move extern kernel variable declarations, into a _KERNEL protected session
so that the don't pollute userland's namespace.
 1.77.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.77.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.180
sys/ufs/lfs/lfs_syscalls.c: revision 1.106
sys/ufs/lfs/lfs.h: revision 1.87
Keep track of the number of segments reclaimed, since the cleaner doesn't
do this anymore (it hasn't for quite some time). Add a couple of conditional
debugging messages to indicate why segments are not cleaned, in the event
that lfs_segclean is used.
Make the LFCNSEGWAITALL fcntl work again.
 1.77.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.88.2.6 27-Feb-2008  yamt sync with head.
 1.88.2.5 21-Jan-2008  yamt sync with head
 1.88.2.4 27-Oct-2007  yamt sync with head.
 1.88.2.3 26-Feb-2007  yamt sync with head.
 1.88.2.2 30-Dec-2006  yamt sync with head.
 1.88.2.1 21-Jun-2006  yamt sync with head.
 1.95.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.95.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.95.10.3 11-May-2006  elad sync with head
 1.95.10.2 19-Apr-2006  elad sync with head.
 1.95.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.95.8.7 03-Sep-2006  yamt sync with head.
 1.95.8.6 11-Aug-2006  yamt sync with head
 1.95.8.5 26-Jun-2006  yamt sync with head.
 1.95.8.4 24-May-2006  yamt sync with head.
 1.95.8.3 11-Apr-2006  yamt sync with head
 1.95.8.2 01-Apr-2006  yamt sync with head.
 1.95.8.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.95.6.2 01-Jun-2006  kardel Sync with head.
 1.95.6.1 22-Apr-2006  simonb Sync with head.
 1.95.4.1 09-Sep-2006  rpaulo sync with head
 1.107.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.114.2.1 18-Nov-2006  ad Sync with head.
 1.116.2.1 22-Oct-2006  yamt sync with head
 1.118.8.1 11-Jul-2007  mjf Sync with head.
 1.118.6.4 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and needs to be verified as a whole.
 1.118.6.3 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.118.6.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.118.6.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.118.2.2 17-May-2007  yamt sync with head.
 1.118.2.1 07-May-2007  yamt sync with head.
 1.120.10.1 14-Oct-2007  yamt sync with head.
 1.120.8.3 23-Mar-2008  matt sync with HEAD
 1.120.8.2 09-Jan-2008  matt sync with HEAD
 1.120.8.1 06-Nov-2007  matt sync with HEAD
 1.120.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.122.10.2 08-Jan-2008  bouyer Sync with HEAD
 1.122.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.122.6.2 19-Dec-2007  ad Use a global lfs_lock.
 1.122.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.122.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.125.10.4 11-Mar-2010  yamt sync with head
 1.125.10.3 19-Aug-2009  yamt sync with head.
 1.125.10.2 04-May-2009  yamt sync with head.
 1.125.10.1 16-May-2008  yamt sync with head.
 1.125.8.1 18-May-2008  yamt sync with head.
 1.125.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.126.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.127.12.1 23-Jul-2009  jym Sync with HEAD.
 1.132.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.134.6.1 18-Feb-2012  mrg merge to -current.
 1.134.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.134.2.3 23-Jan-2013  yamt sync with head
 1.134.2.2 17-Apr-2012  yamt sync with head
 1.134.2.1 30-Nov-2011  yamt make lfs another pager specific flag so that it won't be affected by
an nfs hack in genfs.
 1.135.2.1 17-Mar-2012  bouyer Pull up following revision(s) (requested by perseant in ticket #116):
sys/ufs/lfs/lfs_alloc.c: revision 1.112
tests/fs/vfs/t_rmdirrace.c: revision 1.9
tests/fs/vfs/t_renamerace.c: revision 1.25
sys/ufs/lfs/lfs_vnops.c: revision 1.240
sys/ufs/lfs/lfs_segment.c: revision 1.224
sys/ufs/lfs/lfs_bio.c: revision 1.122
sys/ufs/lfs/lfs_vfsops.c: revision 1.294
sbin/newfs_lfs/make_lfs.c: revision 1.19
sys/ufs/lfs/lfs.h: revision 1.136
Pass t_renamerace and t_rmdirrace tests.
Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.
Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.
Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace crates a lot of inode blocks on a tiny empty disk).
 1.136.2.4 03-Dec-2017  jdolecek update from HEAD
 1.136.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.136.2.2 23-Jun-2013  tls resync from head
 1.136.2.1 25-Feb-2013  tls resync with head
 1.157.2.1 28-Aug-2013  rmind sync with head
 1.160.6.6 28-Aug-2017  skrll Sync with HEAD
 1.160.6.5 09-Jul-2016  skrll Sync with HEAD
 1.160.6.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.160.6.3 22-Sep-2015  skrll Sync with HEAD
 1.160.6.2 06-Jun-2015  skrll Sync with HEAD
 1.160.6.1 06-Apr-2015  skrll Sync with HEAD
 1.199.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.199.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.201.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.203.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.203.4.1 10-Jun-2019  christos Sync with HEAD
 1.203.2.1 18-Jan-2019  pgoyette Synch with HEAD
 1.204.6.1 29-Feb-2020  ad Sync with head.
 1.204.4.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.53 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.52 15-Sep-2025  perseant If setting the head (or tail) of the inode free list to LFS_UNUSED_INUM, also
set the tail (resp. head) to LFS_UNUSED_INUM, as the list is now empty.

Add a check to ensure that lfs_valloc_fixed will always terminate, even
if the free list should contain a loop. Extend the ifile at the end if it
is empty, to match the assumption of lfs_valloc() that the free list is
never empty.

Needed for roll-forward.
 1.51 24-Apr-2022  rillig lfs: fix lint warning about empty declaration
 1.50 07-Sep-2020  riastradh Suppress -Waddress-of-packed-member just for lfs_accessors.h.

We can remove -Wno-error=address-of-packed-member from various
makefiles now.
 1.49 21-Mar-2020  riastradh Avoid misaligned access to lfs64 on-disk records in memory.

lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from

struct foo64 {
...
} __aligned(4) __packed;

union foo {
struct foo64 f64;
...
};

to

struct foo64 {
...
};

union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;

if we really want to take advantage of 64-bit memory accesses.

However, the __aligned(4) __packed must remain on the union
because:

2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.
 1.48 10-Jun-2017  maya branches: 1.48.4; 1.48.8; 1.48.12;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.47 12-Jan-2017  christos branches: 1.47.8;
fix sign confusion
 1.46 20-Jun-2016  dholland branches: 1.46.2;
Massedit u_int{8,16,32,64}_t to uint{8,16,32,64}_t. This effectively
merges ufs/dinode.h 1.25.
 1.45 19-Jun-2016  dholland we are actually synced with ufs/dinode.h 1.24 and ufs/dir.h 1.25.
 1.44 19-Feb-2016  riastradh Explicitly cast between char and unsigned char here.
 1.43 19-Feb-2016  riastradh Various housekeeping.

- Include <ufs/lfs/lfs.h> for union lfs_dinode &c.
- Include <string.h> or <sys/systm.h> for memcpy.
- Avoid signedness mismatch in lfs dino accessor for `rdev'.
- Avoid shadowing global `index'.
 1.42 10-Jan-2016  christos there is no reason to use __unused here.
 1.41 10-Jan-2016  dholland Fix two functions that were accidentally "static __unused" instead of
"static __unused inline". Oops; but probably not actually harmful.
 1.40 19-Oct-2015  dholland fix stupid typo in the 64-bit branch of the d_namlen accessor
 1.39 19-Oct-2015  dholland improve some panic messages
 1.38 15-Oct-2015  dholland Remove stray #define of lfs_magic
(the last of the fake superblock structure field macros)
 1.37 10-Oct-2015  dholland Add byteswapping to the inode block-pointer accessors.
 1.36 03-Oct-2015  dholland Drop an explicit sign-extension in fsck that shouldn't be needed any
more.
 1.35 03-Oct-2015  dholland Use IINFO in lfs_writeinode().
(both the kernel and the userland copies)
 1.34 03-Oct-2015  dholland Add an IINFO struct, which is like the FINFO struct but for the inode
blocks portion of the segment summary.

A segment summary block begins with a header (SEGSUM); the rest of the
block contains FINFO structures describing file blocks growing upward
from the bottom (after the header), and IINFO structures describing
inode blocks grown downward from the end of the block. (When they meet
the segment is full regardless of how many blocks might be left.)

IINFO contains just a block number, and until now this information was
handled by just using uint32_t*; switching to a structure will make
the code a lot easier to read, and also make it easier to have 32-bit
and 64-bit versions without making a mess.

This commit just adds the structures and accessors; they'll be
deployed into the code in subsequent commits.
 1.33 21-Sep-2015  dholland branches: 1.33.2;
Fix some assorted 32-bit assumptions not yet otherwise handled.

Also apply patch to fix the overt problem in PR 50246: newfs was
calculating ifpb wrong for volumes with non-default block sizes.
 1.32 21-Sep-2015  dholland Oops, I forgot to make the atime in the 64-bit IFILE 64 bits.
Correct that. Incompatible change, but no LFS64 volumes can have been
created yet.
 1.31 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
 1.30 21-Sep-2015  dholland Oops; LFS_DIRECTSIZ() is going to need the fs as an argument.

Also, it turns out that dirhash needs a compile-time-constant version
of LFS_DIRECTSIZ(LFS_MAXNAMLEN+1), independent of 64-vs-32, so create
LFS_MAXDIRENTRYSIZE for this. Sigh.
 1.29 20-Sep-2015  dholland Clean up struct lfs_dirtemplate.
 1.28 20-Sep-2015  dholland Fix glaringly stupid overflow/sizing bug in -r1.25. The part I don't
get is how it passed testing...
 1.27 15-Sep-2015  dholland Pass around struct lfs_dirheader instead of struct lfs_direct.
 1.26 15-Sep-2015  dholland Add an accessor function for directory names.
 1.25 15-Sep-2015  dholland Add a function lfs_copydirname() to copy directory names in place; use
it in place of (variously) memcpy and strlcpy. (The latter isn't even
correct; was probably changed blindly from strncpy at some point.)

The new function zeroes the padding in the directory entry instead of
leaving trash behind.
 1.24 15-Sep-2015  dholland Move the header part of struct lfs_direct to its own structure.
(lfs_dirheader)

Take the opportunity to improve the directory generation code in
make_lfs.c. (Everything else was unaffected by virtue of using
accessor functions.)
 1.23 15-Sep-2015  dholland Add and use accessor functions for more of the directory entry fields.
 1.22 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.21 01-Sep-2015  dholland Fix up indirect block handling in truncate to be 32/64 clean.
 1.20 01-Sep-2015  dholland Tidy the MAXSYMLINKLEN macros.
 1.19 01-Sep-2015  dholland The ifile's inode number is constant. (it is always 1)

Therefore, storing the value in the superblock and reading it out
again is silly and offers the opportunity for it to become corrupted.
So, don't do that (most of the code already didn't) and use the
existing constant instead. Initialize new 32-bit superblocks with
the value for the sake of old userland programs, but don't keep the
value in the 64-bit superblock at all.

(approved by Margo Seltzer)
 1.18 01-Sep-2015  dholland Make the inode fields in the 64-bit superblock 64 bits wide.
Reasoning as before.

Note that I am not going through and checking for 64->32 truncations
in inode numbers; I'm sure there are quite a few, but that's a project
for later.
 1.17 01-Sep-2015  dholland Add byteswapping to the dinode accessors.

This prevents regressions in the ulfs code when switching to the new
accessors. Note that while adding byteswapping to the other accessors
is straightforward, I haven't done it yet; and that also is not enough
to make LFS_EI work, because there are places lying around that bypass
the accessors for one reason and another and all of them need to be
updated. That is going to have to wait for a later day as LFS_EI is
not on the critical path right now.
 1.16 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.15 29-Aug-2015  mlelstv Fix IFILE pointer calculation when scanning freelist.
 1.14 19-Aug-2015  dholland Part two of dinodes; use the same union everywhere.
(previously the ufs-derived code had things set up slightly different)

Remove a bunch of associated mess.
 1.13 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.12 12-Aug-2015  dholland Provide 32-bit and 64-bit versions of FINFO.

This also entailed sorting out part of struct segment, as that
contains a pointer into the current FINFO data.
 1.11 12-Aug-2015  dholland Make 32-bit and 64-bit versions of SEGSUM.
Also fix some of the FINFO handling as it's closely entangled.
 1.10 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.9 12-Aug-2015  dholland Make 32-bit and 64-bit versions of CLEANERINFO.

XXX: while this is written to disk, it seems like much of it would
XXX: be better set up as a commpage shared with the cleaner.
 1.8 02-Aug-2015  dholland Pass the fs object to LFS_MAX_DADDR so it can check lfs_is64.

Remove some hackish intentional 64->32 truncations next to the checks
using LFS_MAX_DADDR, and tackle the problem they handled in bmap
instead.

The problem: the magic block pointer value UNWRITTEN has magic value
-2, and if it's not handled specifically, uint32 -> uint64 promotion
turns it into 4294967294, which then causes consternation and
monkeyhouse downstream.

What's here is still kind of a hack, but it's a step forward.
 1.7 02-Aug-2015  dholland Add a (draft) 64-bit superblock. Make things build again.

Add pieces of support for using both superblock types where
convenient, and specifically to the superblock accessors, but don't
actually enable it anywhere.

First substantive step on PR 50000.
 1.6 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.5 02-Aug-2015  dholland Second batch of 64 -> 32 truncations in lfs, along with more minor
tidyups and corrections in passing.
 1.4 02-Aug-2015  dholland Fix assorted 64 -> 32 truncations in lfs. Also, some minor tidyups and
corrections in passing.
 1.3 02-Aug-2015  dholland Allow superblock accessors that widen 32-bit disk fields to 64-bit
memory values.
 1.2 28-Jul-2015  dholland Use lfs_accessors.h in conjunction with the cleaner's struct clfs.
Remove previous hacks.
 1.1 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.33.2.7 28-Aug-2017  skrll Sync with HEAD
 1.33.2.6 05-Feb-2017  skrll Sync with HEAD
 1.33.2.5 09-Jul-2016  skrll Sync with HEAD
 1.33.2.4 19-Mar-2016  skrll Sync with HEAD
 1.33.2.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.33.2.2 22-Sep-2015  skrll Sync with HEAD
 1.33.2.1 21-Sep-2015  skrll file lfs_accessors.h was added on branch nick-nhusb on 2015-09-22 12:06:17 +0000
 1.46.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.47.8.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.48.12.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.48.8.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.48.4.2 03-Dec-2017  jdolecek update from HEAD
 1.48.4.1 10-Jun-2017  jdolecek file lfs_accessors.h was added on branch tls-maxphys on 2017-12-03 11:39:22 +0000
 1.145 21-Sep-2025  christos lfs_freelist_prev is unused
 1.144 17-Sep-2025  perseant Add routines to check freelist consistency if compiled with DEBUG and
conditional on a kernel variable manipulated via sysctl.
Add checks before and after each routine that modifies the free list.
#if 0 a section of lfs_vfree() that was intended to keep the free list ordered
but instead corrupted it.
 1.143 15-Sep-2025  perseant Initialize nextfree, to placate gcc.
 1.142 15-Sep-2025  perseant If setting the head (or tail) of the inode free list to LFS_UNUSED_INUM, also
set the tail (resp. head) to LFS_UNUSED_INUM, as the list is now empty.

Add a check to ensure that lfs_valloc_fixed will always terminate, even
if the free list should contain a loop. Extend the ifile at the end if it
is empty, to match the assumption of lfs_valloc() that the free list is
never empty.

Needed for roll-forward.
 1.141 23-Feb-2020  riastradh Dust off the orphan detection code and try to make it work.
 1.140 23-Feb-2020  riastradh Teach LFS_ORPHAN_NEXTFREE about lfs64.
 1.139 22-Feb-2020  kamil Avoid undefined behavior in *_BITMAP_FREE() macros

left shift of 1 by 31 places cannot be represented in type 'int'
 1.138 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.137 19-Aug-2017  maya branches: 1.137.4; 1.137.8; 1.137.10;
Consistently use {,UN}MARK_VNODE macros rather than function calls.
 1.136 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.135 13-Mar-2017  maya branches: 1.135.6;
Fill in some XXXs with the exact action described in them. match
lfs_valloc behaviour.
 1.134 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT

Replace some #if DEBUG by this too. DEBUG is only for expensive
assertions; these are not.
 1.133 07-Aug-2016  dholland branches: 1.133.2;
Remove unused <sys/tree.h>
 1.132 07-Aug-2016  dholland Comments
 1.131 10-Oct-2015  dholland branches: 1.131.2;
Fix minor bitrot in #if 0 or otherwise disabled code.
 1.130 13-Sep-2015  dholland Fix wrong code in lfs_valloc_fixed(). It was overwriting the inode
number it was supposed to be allocating with the head of the inode
freelist, then applying the wrong test to that result. Net result:
unless the freelist was empty (in which case it would always fail),
it would in general drop a bunch of entries from the freelist.

This code seems to have been broken when the first version of lfsv2
was imported onto the perseant-lfsv2 branch in -r1.47.2.1, and
remained broken since, in spite of having been moved to lfs_rfw.c and
back and rearranged quite a bit in the meantime.

Sigh.

Found by Coverity in a rather confusing way as CID 1316545.
 1.129 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.128 29-Aug-2015  mlelstv Fix IFILE pointer calculation when scanning freelist.
 1.127 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.126 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.125 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.124 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.123 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.122 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.121 16-Jul-2015  dholland Don't cast the return value of malloc.
 1.120 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.119 28-Jul-2013  dholland branches: 1.119.6;
Add more of the bits for supporting quotas.
 1.118 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.117 18-Jun-2013  christos branches: 1.117.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.116 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.115 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.114 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.113 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.112 16-Feb-2012  perseant branches: 1.112.2;
Pass t_renamerace and t_rmdirrace tests.

Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.

Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.

Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace crates a lot of inode blocks on a tiny empty disk).
 1.111 12-Jun-2011  rmind branches: 1.111.2; 1.111.6; 1.111.8;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.110 24-Jun-2010  hannken branches: 1.110.6;
Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.109 08-Jan-2010  pooka branches: 1.109.2; 1.109.4;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has creeped
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.108 13-Sep-2009  tsutsui Move declaration of ufs_hashlock into <ufs/ufs_extern.h> from each c source.
 1.107 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.106 30-Jan-2008  ad branches: 1.106.6; 1.106.8; 1.106.10;
Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
 1.105 02-Jan-2008  ad Merge vmlocking2 to head.
 1.104 12-Dec-2007  he Fix a use of lfs_truncate() inside an #ifdef notyet (so no resulting change);
lfs_truncate() has lost its lwp argument.
 1.103 10-Oct-2007  ad branches: 1.103.4; 1.103.6; 1.103.8; 1.103.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.102 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.101 10-Jul-2007  hannken branches: 1.101.6; 1.101.8; 1.101.10;
Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criterions, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitely calls getinoquota().

No objections on tech-kern@
 1.100 15-Feb-2007  ad branches: 1.100.6; 1.100.8;
Replace some uses of lockmgr() / simplelocks.
 1.99 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.98 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.97 01-Sep-2006  perseant branches: 1.97.2; 1.97.4;
Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.96 20-Jul-2006  perseant Separate the (non-working) LFS kernel roll-forward code into its own file,
lfs_rfw.c.
 1.95 06-Jul-2006  perseant Protect lfs_order_freelist() with the segment lock.
 1.94 14-May-2006  elad branches: 1.94.4;
integrate kauth.
 1.93 12-May-2006  perseant Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.

Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.92 04-May-2006  perseant Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.

Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.91 30-Apr-2006  perseant Add an explicit list initialization that was missing from my last commit.
 1.90 30-Apr-2006  perseant Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are also included as well.
 1.89 22-Apr-2006  perseant Fix a fencepost error in the bitmap handling in extend_ifile(), and another
in lfs_freelist_prev().
 1.88 10-Apr-2006  perseant Optimize the free list search a little more; in particular use words
instead of bytes for the index, and never search below fs->lfs_freehd.

Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).

Free the bitmap when we unmount the filesystem.
 1.87 08-Apr-2006  perseant Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.86 11-Dec-2005  christos branches: 1.86.4; 1.86.6; 1.86.8; 1.86.10; 1.86.12;
merge ktrace-lwp.
 1.85 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.84 19-Aug-2005  christos branches: 1.84.2;
64 bit inode changes.
 1.83 29-May-2005  christos branches: 1.83.2;
- sprinkle const
- avoid shadow variables.
 1.82 19-Apr-2005  perseant Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.81 16-Apr-2005  perseant Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
 1.80 14-Apr-2005  perseant Consolidate the hash table we use to maintain the integrity of lfs_avail
into a single, system-wide table, rather than having a separate hash table
per inode. Significantly reduces the "system" cpu usage of your average
file write.
 1.79 14-Apr-2005  perseant Keep track of the highest block held by an LFS inode, so that we can
be assured that the last byte of a file is always allocated. Previously
a file extension could cause the filesystem to be flushed, writing an
inconsistent inode to disk. Although this condition would be corrected
the next time blocks were written to disk, an intervening crash would leave
the filesystem in an inconsistent state, leaving fsck_lfs to complain
of an inode "partially truncated".
 1.78 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.77 23-Mar-2005  perseant Make LFS dirops get their vnode first, before incrementing the dirop count,
to prevent a deadlock trying to call VOP_PUTPAGES() on a VDIROP vnode.
This can happen when a stacked filesystem is mounted on top of an LFS: an
LFS dirop needs to get a vnode, which is available from the upper layer.
The corresponding lower layer vnode, however, is VDIROP, so the upper layer
can't be cleaned out since its VOP_PUTPAGES() is passed through to the lower
layer, which waits for dirops to drain before it can proceed. Deadlock.

Tweak ufs_makeinode() and ufs_mkdir() to pass the a_vpp argument through
to VOP_VALLOC().

Partially addresses PR # 26043, though it probably does not completely fix
the problem described there.
 1.76 08-Mar-2005  perseant branches: 1.76.2;
Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.75 26-Feb-2005  perry nuke trailing whitespace
 1.74 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
 1.73 14-Aug-2004  mycroft branches: 1.73.4; 1.73.6;
Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.72 23-Sep-2003  yamt branches: 1.72.4;
cleanup IN_ADIROP/VDIROP handling a little.
 1.71 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.70 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.69 29-Jun-2003  fvdl branches: 1.69.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.
 1.68 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.67 28-Jun-2003  darrenr Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.66 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.65 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.64 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.63 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.62 27-Jan-2003  yamt make these compilable with lfs debug options.
(follow daddr_t change)

XXX maybe segment number should be 64bit.
 1.61 25-Jan-2003  tron Use PRId64 instead of hard coding "%lld" to fix build problems under
LP64 ports.
 1.60 25-Jan-2003  tron Fix printf() format strings problems caused by "daddr_t" change.
 1.59 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.58 08-Jan-2003  yamt use lfs_unmark_vnode instead of duplicated code fragments.
 1.57 24-Nov-2002  yamt make sure i_lfs_fragsize is initialized.
fix panic "lfs_writefile: more than one fragment!"
PR 18974.
 1.56 14-May-2002  perseant Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.55 04-Feb-2002  perseant Correct free list tail pointer, when adding blocks of new inodes to v2
filesystems. Should fix PR #14408.
 1.54 18-Dec-2001  chs use the new compatibility routines to allow mmap() to work
(in the same non-coherent fashion that it worked pre-UBC)
until someone has time to do it the right way.
 1.53 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.52 08-Nov-2001  lukem add RCSID
 1.51 14-Oct-2001  chs branches: 1.51.2;
initialize the vnode's copy of the size in lfs_ialloc().
 1.50 28-Sep-2001  chs don't depend on other headers to include sys/proc.h for us.
 1.49 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.48 13-Jul-2001  perseant branches: 1.48.2;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.47 30-May-2001  mrg branches: 1.47.2; 1.47.4;
use _KERNEL_OPT
 1.46 03-Dec-2000  perseant branches: 1.46.2;
Get rid of some old unnecessary code that cleared B_NEEDCOMMIT from buffers in
lfs_writeseg (possibly after they had been freed).

If MALLOCLOG is defined, make lfs_newbuf and lfs_freebuf pass along the
caller's file and line to _malloc and _free.
 1.45 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.44 27-Nov-2000  perseant If LFS_DO_ROLLFORWARD is defined, roll forward from the older checkpoint
on mount, through the newer checkpoint and on through any newer
partial-segments that may have been written but not checkpointed because
of an intervening crash.

LFS_DO_ROLLFORWARD is not defined by default.
 1.43 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.42 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero as fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.41 03-Jul-2000  perseant i_lfs_effnblks fixes. Put debugging printfs under #ifdef DEBUG_LFS.
 1.40 30-Jun-2000  fvdl Rearrange code around getnewvnode as was already done for ffs, to avoid
locking against oneself because getnewvnode recycles a softdep-using vnode.
 1.39 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.38 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.37 22-Jun-2000  perseant fix my own typo, grr....
 1.36 22-Jun-2000  perseant Read i_ffs_gen from the version number in the Ifile during lfs_valloc,
instead of keeping it always == 1. (The ifile version number is
increased on vfree.) May address PR #7213, but I haven't been able to
test thoroughly enough to say for sure.
 1.35 22-Jun-2000  perseant Update lfs_vunref for the fact that now a vnode can be locked with no
references (locked for VOP_INACTIVE at the end of vrele) and it's okay.
Check the return value of lfs_vref where appropriate.
Fixes PR #s 10285 and 10352.
 1.34 06-Jun-2000  perseant branches: 1.34.2;
Protect inode free list with seglock, instead of separate lock, so that
the head of the inode free list (on the superblock) always matches the
rest of the free list (in the ifile).

Protect lfs_fragextend with seglock, to prevent the segment byte count
fudging from making its way to disk.

Don't try to inactivate dirop vnodes that are still in the middle of
their dirop (may address PR#10285).
 1.33 31-May-2000  perseant update for IN_ACCESSED changes
 1.32 27-May-2000  perseant branches: 1.32.2;
Prevent dirops from getting around lfs_check and wedging the buffer cache.
All the dirop vnops now mark the inodes with a new flag, IN_ADIROP, which
is removed as soon as the dirop is done (as opposed to VDIROP which stays
until the file is written). To address one issue raised in PR#9357.
 1.31 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.30 15-Dec-1999  perseant Fix error returns on lfs vnops so that locks and reference counts are
preserved. Handle dirop accounting in lfs_vfree for this case as well.
May address PR#8823.
 1.29 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that is has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.28 12-Nov-1999  perseant Back out my patch of the 8th (to address unreferenced inode problem).
Apparently this needs more thought.
 1.27 09-Nov-1999  perseant If ifile blocks were written before dirops were complete, and then the
system crashed, inodes could be allocated that were not referenced. (Though
not a serious problem, it evidences itself in phase 4 of fsck_lfs.) Fix
this by marking if_daddr with UNASSIGNED before the inodes are actually
written; at mount time the ifile is checked for UNASSIGNED entries and
any that are found are linked back into the free list. (The latter
functionality should move into the roll-forward agent when it materializes.)
 1.26 06-Nov-1999  perseant branches: 1.26.2;
Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.25 03-Sep-1999  perseant branches: 1.25.2; 1.25.4; 1.25.6;
Make changes that will allow an LFS filesystem to be used as the root
filesystem. In particular,

- Fix mknod deadlock, described in PR 8172.
- Enable lfs_mountroot.
- Make lfs_writevnodes treat filesystems mounted on lfs device nodes properly,
by flushing that device rather than trying to add blocks to the device inode.

This, in combination with lfs boot blocks, will allow operation of an all-lfs
system.
 1.24 08-Jul-1999  wrstuden Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parrent.
 1.23 17-Jun-1999  tls squash some compiler warnings on debug printfs by casting to int
 1.22 15-Jun-1999  perseant Minor changes to the segment live bytes calculation. In particular, fixed
a bug in fragment extension that could run the count negative. Also, don't
overcount for inodes, and don't count segment summaries. Thus, for empty
segments the live bytes count should now be exactly zero.
 1.21 16-Apr-1999  perseant Other half of the ufs_hashlock locking fix (oops)
 1.20 16-Apr-1999  perseant Fix locking panic on ufs_hashlock
 1.19 11-Apr-1999  perseant Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.
 1.18 24-Mar-1999  mrg branches: 1.18.2;
completely remove Mach VM support. all that is left is the all the
header files as UVM still uses (most of) these.
 1.17 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.16 23-Oct-1998  thorpej Use DINODE_SIZE rather than sizeof(struct dinode).
 1.15 01-Sep-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for LFS inodes.
 1.14 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.13 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.12 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.11 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.10 07-Feb-1998  chs add UVM stuff.
 1.9 04-Jul-1997  drochner Don't cast 64bit (off_t) file sizes to vm_offset_t (32bit on many
architectures), truncate them intelligently instead.
The truncation is done centralized in vnode_pager.c.
This prevents from wrap-over effects when parts of large (>2^32 byte) files
are mmapped.
Don't allow to mmap above the numerical range of vm_offset_t.
This is considered a temporary solution until the vm system handles the
object sizes/offsets more cleanly.
 1.8 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added an "union inode_ext" to struct inode, for the per-fs extentions.
For now only ext2fs uses it.
- i_din is now an union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the rigth
macro, depending on the time of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.7 10-Mar-1997  mycroft Just increment the generation count. Using the time is bogus and defeats
fsirand(8).
 1.6 12-Oct-1996  christos branches: 1.6.6;
revert previous kprintf changes
 1.5 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.4 25-Mar-1996  pk Appease gcc: unused variables if !QUOTA
 1.3 09-Feb-1996  christos lfs prototypes
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.6.6.1 12-Mar-1997  is Merge in changes from Trunk
 1.18.2.8 20-Jan-2000  he Pull up revision 1.31 (via patch, requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.18.2.7 15-Jan-2000  he Pull up revision 1.30 (requested by perseant):
Fix error returns on lfs vnops so that locks and reference counts
are preserved. Handle dirop accounting in lfs_vfree for this
case as well. Addresses PR#8823.
 1.18.2.6 15-Jan-2000  he Pull up revision 1.25 (requested by perseant):
Address problems related to using an LFS filesystem as the root
filesystem, including mknod hangs. Fixes PR#8172 and PR#9072.
 1.18.2.5 17-Dec-1999  he Pull up revision 1.26 (requested by perseant):
Address locking protocol error for inode hash, and make the
maximum number of active dirops a global quantity.
 1.18.2.4 03-Sep-1999  he Pull up revision 1.23:
Fix a printf format bug that gives compiler warnings/errors on
64-bit platforms, fixing PR#8241. (perseant)
 1.18.2.3 25-Jun-1999  perry pullup 1.21->1.22 (perseant)
 1.18.2.2 16-Apr-1999  perseant branches: 1.18.2.2.2; 1.18.2.2.4;
Pull up src/sys/ufs/lfs: lfs_alloc.c 1.19->1.21.

This fixes another locking problem, this time a lock on ufs_hashlock in
lfs_vfree. The lock could be held by a process calling getnewvnode, and
then attempted again by lfs_vfree. This works around that, not attempting
to get the lock if curproc already holds it.
 1.18.2.1 13-Apr-1999  perseant Pull-up of changes made to the trunk on Sunday [1.18->1.19], to wit:

Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.

Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).

Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed, then
immediately reloaded, their dinodes were located in an inode block which
was not on disk at the advertized location, nor in the cache (although it
would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.

Better checking for held inode locks in lfs_fastvget, for a number of
error conditions. Also change the default setting of lfs_clean_vnhead to
0, which seems to make the locking problems go away (although this is
difficult to test as I can't reliably reproduce them).

Make sure that the wakeup occurs for vnodes that lfs_update might be
sleeping on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which
have dirty buffers), by marking them with the appropriate flag if
dirtybuffers were added while the write was in progress.

Fix block counting during file truncation, if not truncating to zero.

Disallow threshold-initiated cache flush when dirops are active. Also,
make SET_ENDOP use lfs_check instead of inlining most of it.

Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).

Check the superblock version field, and refuse to mount the filesystem if
the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and
boundaries, etc., which would not affect existing fields in the
superblock, but which would drastically affect the filesystem, to be
smoothly integrated at a later date.
 1.18.2.2.4.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
referenre purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.18.2.2.2.4 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.18.2.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.18.2.2.2.2 11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.18.2.2.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.25.6.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.25.4.2 15-Nov-1999  fvdl Sync with -current
 1.25.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.25.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.25.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.26.2.2 06-Nov-1999  perseant Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.26.2.1 06-Nov-1999  perseant file lfs_alloc.c was added on branch comdex-fall-1999 on 1999-11-06 20:33:06 +0000
 1.32.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.34.2.4 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
seperately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.34.2.3 03-Jul-2000  fvdl pullup the fixes from the trunk to not hold ufs_hashlock across
getnewvnode()
 1.34.2.2 28-Jun-2000  perseant pull up i_ffs_gen patch from trunk
 1.34.2.1 22-Jun-2000  perseant Pull up lfs_vunref fix from the trunk.
 1.46.2.11 08-Jan-2003  thorpej Sync with HEAD.
 1.46.2.10 11-Dec-2002  thorpej Sync with HEAD.
 1.46.2.9 20-Jun-2002  nathanw Catch up to -current.
 1.46.2.8 28-Feb-2002  nathanw Catch up to -current.
 1.46.2.7 08-Jan-2002  nathanw Catch up to -current.
 1.46.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.46.2.5 22-Oct-2001  nathanw Catch up to -current.
 1.46.2.4 08-Oct-2001  nathanw Catch up to -current.
 1.46.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.46.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.46.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.47.4.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.47.4.3 11-Feb-2002  jdolecek Sync w/ -current.
 1.47.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.47.4.1 03-Aug-2001  lukem update to -current
 1.47.2.6 13-Jul-2001  perseant Be more careful about when we update ctime/mtime. In particular, if we
are only writing indirect blocks, that doesn't count for mtime; and when
we first create a vnode, that certainly *does not* count for ctime
(a bug that's been there from the beginning).

This does not change the fact that mtime might still be set after write(2)
is "completed", but it does make the atime-in-the-ifile code have some
effect (noticeable less degradation of read time after an intervening
large write).
 1.47.2.5 10-Jul-2001  perseant Turn the free list into a tailq, with both head and tail kept on the ifile.

Update access times on the inode even if it does not get marked IN_ACCESS.
 1.47.2.4 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.47.2.3 29-Jun-2001  perseant Update the Ifile copy of the free list head in lfs_vfree, so inode numbers
actually get reused.
 1.47.2.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.47.2.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.48.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.51.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.69.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.69.2.9 01-Apr-2005  skrll Sync with HEAD.
 1.69.2.8 08-Mar-2005  skrll Sync with HEAD.
 1.69.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.69.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.69.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.69.2.4 25-Aug-2004  skrll Sync with HEAD.
 1.69.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.69.2.2 03-Aug-2004  skrll Sync with HEAD
 1.69.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.72.4.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.73.6.2 26-Mar-2005  yamt sync with head.
 1.73.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.73.4.1 29-Apr-2005  kent sync with -current
 1.76.2.12 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.76.2.11 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.93
sys/ufs/lfs/lfs.h: revision 1.106
sys/ufs/lfs/lfs_vfsops.c: revision 1.209
sys/ufs/lfs/lfs_vnops.c: revision 1.175
sys/ufs/lfs/lfs_segment.c: revision 1.178
Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.
Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.76.2.10 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.92
sys/ufs/lfs/lfs.h: revision 1.105
sys/ufs/lfs/lfs_vfsops.c: revision 1.207
sys/ufs/lfs/lfs_subr.c: revision 1.59
sys/ufs/lfs/lfs_vnops.c: revision 1.173
sys/ufs/lfs/lfs_bio.c: revision 1.92
Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.
Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.76.2.9 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.91
Add an explicit list initialization that was missing from my last commit.
 1.76.2.8 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.104
sys/ufs/lfs/lfs_vfsops.c: revision 1.206
sys/ufs/lfs/lfs_vnops.c: revision 1.170
sys/ufs/lfs/lfs_extern.h: revision 1.80
sys/ufs/lfs/lfs_segment.c: revision 1.176
sys/ufs/lfs/lfs_inode.c: revision 1.103 via patch
sys/ufs/lfs/lfs_alloc.c: revision 1.90
Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().
A couple of locking fixes are also included as well.
 1.76.2.7 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.89
Fix a fencepost error in the bitmap handling in extend_ifile(), and another
in lfs_freelist_prev().
 1.76.2.6 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.101
sys/ufs/lfs/lfs_vfsops.c: revision 1.202
sys/ufs/lfs/lfs_alloc.c: revision 1.88
Optimize the free list search a little more; in particular use words
instead of bytes for the index, and never search below fs->lfs_freehd.
Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).
Free the bitmap when we unmount the filesystem.
 1.76.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.87
sys/ufs/lfs/lfs.h: revision 1.99
sys/ufs/lfs/lfs_vfsops.c: revision 1.199
sys/ufs/lfs/lfs_extern.h: revision 1.77 via patch
Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.76.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.153
sys/ufs/lfs/lfs_debug.c: revision 1.32
sys/ufs/lfs/lfs_alloc.c: revision 1.84
sys/ufs/lfs/lfs_vfsops.c: revision 1.185
sys/ufs/lfs/lfs_segment.c: revision 1.165
64 bit inode changes.
 1.76.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.76.2.2 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.76.2.1 30-Mar-2005  tron Pull up revision 1.77 (requested by perseant in ticket #74):
Make LFS dirops get their vnode first, before incrementing the dirop
count, to prevent a deadlock trying to call VOP_PUTPAGES() on a VDIROP
vnode. This can happen when a stacked filesystem is mounted on top of an
LFS: an LFS dirop needs to get a vnode, which is available from the upper
layer. The corresponding lower layer vnode, however, is VDIROP, so the
upper layer can't be cleaned out since its VOP_PUTPAGES() is passed
through to the lower layer, which waits for dirops to drain before it can
proceed. Deadlock.
Tweak ufs_makeinode() and ufs_mkdir() to pass the a_vpp argument through
to VOP_VALLOC().
Partially addresses PR # 26043, though it probably does not completely fix
the problem described there.
 1.83.2.7 04-Feb-2008  yamt sync with head.
 1.83.2.6 21-Jan-2008  yamt sync with head
 1.83.2.5 27-Oct-2007  yamt sync with head.
 1.83.2.4 03-Sep-2007  yamt sync with head.
 1.83.2.3 26-Feb-2007  yamt sync with head.
 1.83.2.2 30-Dec-2006  yamt sync with head.
 1.83.2.1 21-Jun-2006  yamt sync with head.
 1.84.2.2 29-Oct-2005  yamt use lfs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.84.2.1 20-Oct-2005  yamt adapt ufs.
 1.86.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.86.10.4 11-May-2006  elad sync with head
 1.86.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.86.10.2 19-Apr-2006  elad sync with head.
 1.86.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.86.8.4 03-Sep-2006  yamt sync with head.
 1.86.8.3 11-Aug-2006  yamt sync with head
 1.86.8.2 24-May-2006  yamt sync with head.
 1.86.8.1 11-Apr-2006  yamt sync with head
 1.86.6.2 01-Jun-2006  kardel Sync with head.
 1.86.6.1 22-Apr-2006  simonb Sync with head.
 1.86.4.1 09-Sep-2006  rpaulo sync with head
 1.94.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.97.4.2 10-Dec-2006  yamt sync with head.
 1.97.4.1 22-Oct-2006  yamt sync with head
 1.97.2.1 18-Nov-2006  ad Sync with head.
 1.100.8.1 11-Jul-2007  mjf Sync with head.
 1.100.6.4 15-Jul-2007  ad Sync with head.
 1.100.6.3 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.100.6.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.100.6.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.101.10.1 14-Oct-2007  yamt sync with head.
 1.101.8.3 23-Mar-2008  matt sync with HEAD
 1.101.8.2 09-Jan-2008  matt sync with HEAD
 1.101.8.1 06-Nov-2007  matt sync with HEAD
 1.101.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.103.10.2 02-Jan-2008  bouyer Sync with HEAD
 1.103.10.1 13-Dec-2007  bouyer Sync with HEAD
 1.103.8.1 13-Dec-2007  yamt sync with head.
 1.103.6.4 26-Dec-2007  ad Sync with head.
 1.103.6.3 19-Dec-2007  ad Use a global lfs_lock.
 1.103.6.2 19-Dec-2007  ad Get lfs mostly working.
 1.103.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.103.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.106.10.4 11-Aug-2010  yamt sync with head.
 1.106.10.3 11-Mar-2010  yamt sync with head
 1.106.10.2 16-Sep-2009  yamt sync with head
 1.106.10.1 16-May-2008  yamt sync with head.
 1.106.8.1 18-May-2008  yamt sync with head.
 1.106.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.109.4.2 03-Jul-2010  rmind sync with head
 1.109.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.109.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.110.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.111.8.1 17-Mar-2012  bouyer Pull up following revision(s) (requested by perseant in ticket #116):
sys/ufs/lfs/lfs_alloc.c: revision 1.112
tests/fs/vfs/t_rmdirrace.c: revision 1.9
tests/fs/vfs/t_renamerace.c: revision 1.25
sys/ufs/lfs/lfs_vnops.c: revision 1.240
sys/ufs/lfs/lfs_segment.c: revision 1.224
sys/ufs/lfs/lfs_bio.c: revision 1.122
sys/ufs/lfs/lfs_vfsops.c: revision 1.294
sbin/newfs_lfs/make_lfs.c: revision 1.19
sys/ufs/lfs/lfs.h: revision 1.136
Pass t_renamerace and t_rmdirrace tests.
Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.
Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.
Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace crates a lot of inode blocks on a tiny empty disk).
 1.111.6.1 18-Feb-2012  mrg merge to -current.
 1.111.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.111.2.2 23-Jan-2013  yamt sync with head
 1.111.2.1 17-Apr-2012  yamt sync with head
 1.112.2.4 03-Dec-2017  jdolecek update from HEAD
 1.112.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.112.2.2 23-Jun-2013  tls resync from head
 1.112.2.1 25-Feb-2013  tls resync with head
 1.117.2.1 28-Aug-2013  rmind sync with head
 1.119.6.5 28-Aug-2017  skrll Sync with HEAD
 1.119.6.4 05-Oct-2016  skrll Sync with HEAD
 1.119.6.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.119.6.2 22-Sep-2015  skrll Sync with HEAD
 1.119.6.1 06-Jun-2015  skrll Sync with HEAD
 1.131.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.133.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.135.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.137.10.2 29-Feb-2020  ad Sync with head.
 1.137.10.1 17-Jan-2020  ad Sync with head.
 1.137.8.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.137.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.97 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.96 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.95 23-Feb-2020  riastradh Serialize access to the splay tree with lfs_lock.
 1.94 10-Jun-2017  maya branches: 1.94.6; 1.94.10; 1.94.12;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.93 08-Jun-2017  chs move some buffer cache internals declarations from buf.h to vfs_bio.c.
this is needed to avoid name conflicts with ZFS and also
makes it clearer that other code shouldn't be messing with these.
remove the LFS debug code that poked around in bufqueues and
remove the BQ_EMPTY bufqueue since nothing uses it anymore.
provide a function to let LFS and wapbl read the value of nbuf for now.
 1.92 06-Apr-2017  maya branches: 1.92.6;
Provide a LFS_ENTER_LOG (__nothing) in the !DEBUG case.
so I can drop lots of #ifdef DEBUG around this macro. NFCI
 1.91 07-Aug-2016  dholland branches: 1.91.2;
Fix stupid thinko.
 1.90 07-Aug-2016  dholland comments
 1.89 07-Aug-2016  dholland use static properly
 1.88 10-Oct-2015  dholland branches: 1.88.2;
Use accessors for some more indirect block manipulations.
 1.87 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.86 02-Aug-2015  dholland Pass the fs object to LFS_MAX_DADDR so it can check lfs_is64.

Remove some hackish intentional 64->32 truncations next to the checks
using LFS_MAX_DADDR, and tackle the problem they handled in bmap
instead.

The problem: the magic block pointer value UNWRITTEN has magic value
-2, and if it's not handled specifically, uint32 -> uint64 promotion
turns it into 4294967294, which then causes consternation and
monkeyhouse downstream.

What's here is still kind of a hack, but it's a step forward.
 1.85 02-Aug-2015  dholland Fix assorted 64 -> 32 truncations in lfs. Also, some minor tidyups and
corrections in passing.
 1.84 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.83 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.82 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.81 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.80 28-Jul-2013  dholland branches: 1.80.6;
Add more of the bits for supporting quotas.
 1.79 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.78 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.77 18-Jun-2013  christos branches: 1.77.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.76 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.75 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.74 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.73 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.72 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.71 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.70 11-Jul-2011  hannken branches: 1.70.2; 1.70.12;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Adresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.69 16-Feb-2010  mlelstv Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.68 18-Mar-2009  cegger branches: 1.68.2;
bzero -> memset
 1.67 16-May-2008  hannken branches: 1.67.6; 1.67.12;
Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.66 28-Apr-2008  martin branches: 1.66.2;
Remove clause 3 and 4 from TNF licenses
 1.65 15-Feb-2008  ad branches: 1.65.6; 1.65.8; 1.65.10;
The buffer LOCKED flag need not be under the protection of bufcache_lock,
BUSY is enough.
 1.64 02-Jan-2008  ad Merge vmlocking2 to head.
 1.63 08-Oct-2007  ad branches: 1.63.4; 1.63.6; 1.63.10;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.62 15-Feb-2007  ad branches: 1.62.6; 1.62.18; 1.62.20; 1.62.22;
Replace some uses of lockmgr() / simplelocks.
 1.61 14-May-2006  elad integrate kauth.
 1.60 07-Apr-2006  perseant Several minor bug fixes:

* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.59 24-Dec-2005  perry branches: 1.59.4; 1.59.6; 1.59.8; 1.59.10; 1.59.12;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.58 11-Dec-2005  christos merge ktrace-lwp.
 1.57 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.56 19-Apr-2005  perseant branches: 1.56.2; 1.56.4;
Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.55 16-Apr-2005  perseant Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
 1.54 14-Apr-2005  perseant Tabify leading whitespace
 1.53 14-Apr-2005  perseant Consolidate the hash table we use to maintain the integrity of lfs_avail
into a single, system-wide table, rather than having a separate hash table
per inode. Significantly reduces the "system" cpu usage of your average
file write.
 1.52 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.51 02-Mar-2005  perseant branches: 1.51.2;
Put the ISSPACE() check where it belongs. This allows rewriting a file
on a full filesystem while still returning ENOSPC on an attempt to allocate
new blocks.
 1.50 26-Feb-2005  perry nuke trailing whitespace
 1.49 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
 1.48 25-Jan-2004  hannken branches: 1.48.6; 1.48.8; 1.48.10;
Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.47 30-Dec-2003  pk Replace the traditional buffer memory management -- based on fixed per buffer
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.

This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.

On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
 1.46 29-Oct-2003  mycroft Adjust to remove bogus initializer.
 1.45 25-Oct-2003  christos Fix uninitialized variable warnings.
 1.44 04-Sep-2003  yamt don't call LFS_DEBUG_COUNTLOCKED after bread().
lfs_countlocked doesn't count buffers that isn't on the freelist.
 1.43 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.42 18-May-2003  yamt branches: 1.42.2;
make is_sequential a callback in order to achieve better lfs write clustering.

since lfs always rewrite blocks into the new segment,
current on-disk place of the block doesn't affect to write clustering.

ok'ed by Konrad Schroder.
 1.41 29-Apr-2003  yamt add an assertion.
 1.40 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.39 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.38 28-Feb-2003  perseant Fix a clrbuf() on an uninitialized pointer.
 1.37 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.36 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.35 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.34 11-Dec-2002  yamt take care of B_CLRBUF in lfs_balloc.
otherwise you'll see uninitialized blocks.
 1.33 06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.32 14-May-2002  perseant branches: 1.32.2;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.31 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.30 08-Nov-2001  lukem add RCSID
 1.29 13-Jul-2001  perseant branches: 1.29.4;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.28 30-May-2001  mrg branches: 1.28.2; 1.28.4;
use _KERNEL_OPT
 1.27 21-Nov-2000  perseant branches: 1.27.2;
More locked_queue_* and lfs_avail accounting fixes from Jesse Off
<joff@gci-net.com>. Remove a specious btodb() in lfs_fragextend, and
count blocks shrunk or removed by VOP_TRUNCATE in lfs_avail.
 1.26 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.25 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.24 04-Jul-2000  perseant Fix errors observed while trying to fill the filesystem with yesterday's
fixes:

- Write copies of bfree and avail in the CLEANERINFO block, so the
cleaner doesn't have to guess which superblock has the current
information (if indeed any do).

- Tighten up accounting of lfs_avail (more needs to be done).

- When cleansing indirect blocks of UNWRITTEN, make sure not to mark
them clean, since they'll need to be rewritten later.
 1.23 03-Jul-2000  perseant Fix i_ffs_blocks in fragment extension case where fragment has not yet
been written to disk.
 1.22 03-Jul-2000  perseant i_lfs_effnblks fixes. Put debugging printfs under #ifdef DEBUG_LFS.
 1.21 03-Jul-2000  perseant Allow the number of free segments reserved for the cleaner to be
parametrized in the filesystem, defaulting to MIN_FREE_SEGS = 2 but set
to something more reasonable at newfs_lfs time.

Note the number of blocks that have been scheduled for writing but which
are not yet on disk in an inode extension, i_lfs_effnblks. Move
i_ffs_effnlink out of the ffs extension and onto the main inode, since
it's used all over the shared code and the lfs extension would clobber
it.

At inode write time, indirect blocks and inode-held blocks of inodes
that have i_lfs_effnblks != i_ffs_blocks are cleansed of UNWRITTEN disk
addresses, so that these never make it to disk.
 1.20 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.19 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.18 06-Jun-2000  perseant branches: 1.18.2;
Protect inode free list with seglock, instead of separate lock, so that
the head of the inode free list (on the superblock) always matches the
rest of the free list (in the ifile).

Protect lfs_fragextend with seglock, to prevent the segment byte count
fudging from making its way to disk.

Don't try to inactivate dirop vnodes that are still in the middle of
their dirop (may address PR#10285).
 1.17 30-May-2000  perseant Don't try to "correct" accounting for fragments being extended but which
have never been written to disk.
 1.16 05-May-2000  perseant branches: 1.16.2;
Change the way LFS does block accounting, from trying to infer from the
buffer cache flags, to marking the inode and/or indirect blocks with a
special disk address UNWRITTEN==-2 when a block is accounted for. (This
address is never written to disk, but only used in-core. This is essentially
the same method of block accounting as on the UBC branch, where the buffer
headers don't exist.) Make sure that truncation is handled properly,
especially in the case of holey files.

Fixes PR#9994.
 1.15 23-Apr-2000  perseant Fix problems outlined in PR#9926:
- lfs_truncate extends the file if called with length > i_ffs_size;
- lfs_truncate errors out if called with length < 0;
- lfs_balloc block accounting corrected for the case of blocks read
into the cache before they exist on disk;
- mp->mnt_stat.f_iosize is initialized in lfs_mountfs.
 1.14 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that is has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.13 15-Jun-1999  perseant branches: 1.13.2; 1.13.4; 1.13.8;
Minor changes to the segment live bytes calculation. In particular, fixed
a bug in fragment extension that could run the count negative. Also, don't
overcount for inodes, and don't count segment summaries. Thus, for empty
segments the live bytes count should now be exactly zero.
 1.12 24-Mar-1999  mrg branches: 1.12.2; 1.12.4; 1.12.6;
completely remove Mach VM support. all that is left is the all the
header files as UVM still uses (most of) these.
 1.11 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.10 09-Nov-1998  mycroft GC the B_CACHE bit.
 1.9 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.8 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.7 03-Mar-1998  drochner Don't cast the quad_t file size to u_long, this can cause overflows.
 1.6 03-Mar-1998  fvdl Make this compile again with UVM
 1.5 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.4 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added an "union inode_ext" to struct inode, for the per-fs extentions.
For now only ext2fs uses it.
- i_din is now an union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the rigth
macro, depending on the time of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.3 09-Feb-1996  christos lfs prototypes
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.12.6.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
referenre purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.12.4.3 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.12.4.2 04-Jul-1999  chs a couple steps towards supporting UBC.
 1.12.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.12.2.1 25-Jun-1999  perry pullup 1.12->1.13 (perseant)
 1.13.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.13.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.13.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.13.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.16.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.18.2.2 03-Feb-2001  he Pull up revisions 1.26-1.27 (requested by perseant):
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
 1.18.2.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
seperately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.27.2.8 11-Dec-2002  thorpej Sync with HEAD.
 1.27.2.7 01-Aug-2002  nathanw Catch up to -current.
 1.27.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.27.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.27.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.27.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.27.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.27.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.28.4.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.28.4.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.28.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.28.4.1 03-Aug-2001  lukem update to -current
 1.28.2.2 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.28.2.1 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.29.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.32.2.1 15-Jul-2002  gehenna catch up with -current.
 1.42.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.42.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.42.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.42.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.42.2.1 03-Aug-2004  skrll Sync with HEAD
 1.48.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.48.8.1 29-Apr-2005  kent sync with -current
 1.48.6.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.51.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_balloc.c: revision 1.60
sys/ufs/lfs/lfs_syscalls.c: revision 1.111
sys/ufs/lfs/lfs_segment.c: revision 1.172
sys/ufs/lfs/lfs_vnops.c: revision 1.163
Several minor bug fixes:
* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.51.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.56.4.1 20-Oct-2005  yamt adapt ufs.
 1.56.2.5 27-Feb-2008  yamt sync with head.
 1.56.2.4 21-Jan-2008  yamt sync with head
 1.56.2.3 27-Oct-2007  yamt sync with head.
 1.56.2.2 26-Feb-2007  yamt sync with head.
 1.56.2.1 21-Jun-2006  yamt sync with head.
 1.59.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.59.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.59.10.2 19-Apr-2006  elad sync with head.
 1.59.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.59.8.2 24-May-2006  yamt sync with head.
 1.59.8.1 11-Apr-2006  yamt sync with head
 1.59.6.2 01-Jun-2006  kardel Sync with head.
 1.59.6.1 22-Apr-2006  simonb Sync with head.
 1.59.4.1 09-Sep-2006  rpaulo sync with head
 1.62.22.1 14-Oct-2007  yamt sync with head.
 1.62.20.3 23-Mar-2008  matt sync with HEAD
 1.62.20.2 09-Jan-2008  matt sync with HEAD
 1.62.20.1 06-Nov-2007  matt sync with HEAD
 1.62.18.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.62.6.3 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and needs to be verified as a whole.
 1.62.6.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.62.6.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.63.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.63.6.2 19-Dec-2007  ad Use a global lfs_lock.
 1.63.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.63.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.65.10.3 11-Mar-2010  yamt sync with head
 1.65.10.2 04-May-2009  yamt sync with head.
 1.65.10.1 16-May-2008  yamt sync with head.
 1.65.8.1 18-May-2008  yamt sync with head.
 1.65.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.66.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.67.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.67.6.1 28-Apr-2009  skrll Sync with HEAD.
 1.68.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.70.12.4 03-Dec-2017  jdolecek update from HEAD
 1.70.12.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.70.12.2 23-Jun-2013  tls resync from head
 1.70.12.1 25-Feb-2013  tls resync with head
 1.70.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.70.2.1 23-Jan-2013  yamt sync with head
 1.77.2.1 28-Aug-2013  rmind sync with head
 1.80.6.5 28-Aug-2017  skrll Sync with HEAD
 1.80.6.4 05-Oct-2016  skrll Sync with HEAD
 1.80.6.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.80.6.2 22-Sep-2015  skrll Sync with HEAD
 1.80.6.1 06-Apr-2015  skrll Sync with HEAD
 1.88.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.91.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.92.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.94.12.1 29-Feb-2020  ad Sync with head.
 1.94.10.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.94.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.151 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.150 15-Sep-2025  perseant If we don't have enough space, flush with checkpoint: the Ifile might be
clogging up the buffer cache.

Rewrite the logic in lfs_flush() so that the requested filesystem is always
flushed, regardless of whether only_onefs is set.

Use LFS_WAIT_BYTES and LFS_WAIT_BUFS as the thresholds when determining
whether to wait for resources, rather than their _MAX_ counterparts.
 1.149 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.148 11-Jun-2020  ad uvm_availmem(): give it a boolean argument to specify whether a recent
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
 1.147 14-Mar-2020  ad OR into bp->b_cflags; don't overwrite.
 1.146 23-Feb-2020  riastradh Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because

(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.
 1.145 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.144 31-Dec-2019  ad branches: 1.144.2;
Rename uvm_free() -> uvm_availmem().
 1.143 21-Dec-2019  ad uvmexp.free -> uvm_free()
 1.142 09-Jun-2018  zafer branches: 1.142.2; 1.142.6;
Add missing b_cflags and b_oflags.
Ok dholland@
Addresses PR kern/42342 by Yoshihiro Nakajima
 1.141 10-Jun-2017  maya branches: 1.141.4;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.140 08-Jun-2017  chs move some buffer cache internals declarations from buf.h to vfs_bio.c.
this is needed to avoid name conflicts with ZFS and also
makes it clearer that other code shouldn't be messing with these.
remove the LFS debug code that poked around in bufqueues and
remove the BQ_EMPTY bufqueue since nothing uses it anymore.
provide a function to let LFS and wapbl read the value of nbuf for now.
 1.139 17-Apr-2017  hannken branches: 1.139.4;
Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.138 13-Apr-2017  hannken Switch lfs_flush() and lfs_writerd() to mountlist iterator.
 1.137 01-Apr-2017  maya Switch lfs_writer_daemon to use condvar instead of mtsleep.
track thread existence with struct lwp instead of pid + lid,
it's more useful from ddb.
 1.136 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT

Replace some #if DEBUG by this too. DEBUG is only for expensive
assertions; these are not.
 1.135 03-Oct-2015  hannken branches: 1.135.2; 1.135.4;
Remove dubious vhold()/holdrele() from lfs_reserve().
The vnodes are always referenced on entry.

If we changed ulfs_remove() and ulfs_rmdir() to return the locked dvp
the vnodes were always locked on entry.

Remove an outdated comment from lfs_reserveavail(), unlocking/relocking
the vnode was removed in rev 1.49.
 1.134 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.133 02-Aug-2015  dholland Fix assorted 64 -> 32 truncations in lfs. Also, some minor tidyups and
corrections in passing.
 1.132 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.131 25-Jul-2015  martin Use accessors in DEBUG and DIAGNOSTIC code as well
 1.130 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.129 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.128 27-Nov-2013  christos branches: 1.128.6;
Change the queue.3 *_END(&head) macros to NULL. Since we don't have CIRCLEQ
anymore, all the macros expand to NULL anyway, so this improves readability.
Requested by rmind@
 1.127 23-Nov-2013  christos change the mountlist CIRCLEQ into a TAILQ
 1.126 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.125 18-Jun-2013  christos branches: 1.125.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.124 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.123 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.122 16-Feb-2012  perseant branches: 1.122.2;
Pass t_renamerace and t_rmdirrace tests.

Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.

Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.

Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace crates a lot of inode blocks on a tiny empty disk).
 1.121 02-Jan-2012  perseant branches: 1.121.2;

* Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.120 11-Jul-2011  hannken branches: 1.120.2; 1.120.6;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Adresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.119 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.118 24-Jun-2010  hannken branches: 1.118.6;
Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.117 16-Feb-2010  mlelstv branches: 1.117.2;
Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.116 08-Jan-2010  pooka branches: 1.116.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has creeped
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.115 07-Dec-2009  eeh Fix some more hangs and deadlocks.
 1.114 06-May-2008  ad branches: 1.114.18;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.113 30-Apr-2008  ad PR kern/38135 vfs_busy/vfs_trybusy confusion

The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
 1.112 29-Apr-2008  ad kern/38135 vfs_busy/vfs_trybusy confusion

The symptom was that sometimes file systems would occasionally not appear
in output from 'df' or 'mount' if the system was busy. Resolution:

- Make mount locks work somewhat like vm_map locks.
- vfs_trybusy() now only fails if the mount is gone, or if someone is
unmounting the file system. Simple contention on mnt_lock doesn't
cause it to fail.
- vfs_busy() will wait even if the file system is being unmounted.
 1.111 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.110 20-Feb-2008  matt branches: 1.110.6; 1.110.8; 1.110.10;
Merge all the *different* definitions of bufqueues into one common one.
 1.109 15-Feb-2008  ad The buffer LOCKED flag need not be under the protection of bufcache_lock,
BUSY is enough.
 1.108 30-Jan-2008  ad PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
 1.107 02-Jan-2008  ad Merge vmlocking2 to head.
 1.106 11-Oct-2007  ad branches: 1.106.4; 1.106.6; 1.106.10;
Remove LOCK_ASSERT(!simple_lock_held(&foo));
 1.105 10-Oct-2007  ad Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.104 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.103 29-Jul-2007  ad branches: 1.103.4; 1.103.6; 1.103.8; 1.103.10;
It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.102 17-Jul-2007  christos branches: 1.102.2;
eliminate MFSNAMELEN
 1.101 16-May-2007  perseant Change references to SEGM_W_DIROPS to SEGM_CKP, and replace the logic that
formerly used SEGM_W_DIROPS in lfs_segwrite() appropriately. This prevents
a problem in which processes could get stuck in "buffers" sleep forever.
 1.100 18-Apr-2007  perseant Add/change a couple of comments about locking restrictions.
 1.99 17-Apr-2007  perseant Install a new sysctl, vfs.lfs.ignore_lazy_sync, which causes LFS to ignore
the "smooth" syncer, as if vfs.sync.*delay = 0, but only for LFS. The
default is "on", i.e., ignore lazy sync.

Reduce the amount of polling/busy-waiting done by lfs_putpages(). To
accomplish this, copied genfs_putpages() and modified it to indicate which
page it was that caused it to return with EDEADLK. fsync()/fdatasync()
should no longer ever fail with EAGAIN, and should not consume huge
quantities of cpu.

Also, try to make dirops less likely to be written as the result of a
VOP_PUTPAGES(), while ensuring that they are written regularly.
 1.98 16-Nov-2006  christos branches: 1.98.2; 1.98.4; 1.98.8; 1.98.10; 1.98.16;
__unused removal on arguments; approved by core.
 1.97 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.96 04-Oct-2006  christos fix empty if
 1.95 15-Sep-2006  yamt branches: 1.95.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.94 29-Jun-2006  perseant branches: 1.94.4;
Don't wake up the cleaner if the filesystem is unwrappable, and fix the
compatibility fcntls.

Also includes one-line fixes for an MP locking bug and a zero-length FINFO
problem that manifested during testing.
 1.93 14-May-2006  elad branches: 1.93.4;
integrate kauth.
 1.92 04-May-2006  perseant Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.

Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.91 13-Apr-2006  perseant Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.

Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.90 05-Mar-2006  christos branches: 1.90.2; 1.90.4;
cleanup more SET/CLR/ISSET lossage
 1.89 06-Jan-2006  yamt branches: 1.89.2; 1.89.4; 1.89.6;
initialize necessary members of struct buf. PR/32462 from Reinoud Zandijk.
 1.88 04-Jan-2006  yamt - add simple functions to allocate/free a buffer for i/o.
- make bufpool static.
 1.87 11-Dec-2005  christos branches: 1.87.2;
merge ktrace-lwp.
 1.86 29-May-2005  christos branches: 1.86.2;
- sprinkle const
- avoid shadow variables.
 1.85 23-Apr-2005  perseant Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing. Tested in both
directions, and everything appears to work happily, but ymmv.
 1.84 19-Apr-2005  perseant Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.83 06-Apr-2005  perseant Fix some locking issues that appeared with the simple_lock work.
Address a "pager_map" deadlock in lfs_putpages().
 1.82 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.81 09-Mar-2005  perseant branches: 1.81.2;
Be more careful about handling of flags to lfs_flush, to ensure that
the lfs_writing mutex is respected.
 1.80 08-Mar-2005  perseant Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.79 26-Feb-2005  perry nuke trailing whitespace
 1.78 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
 1.77 28-Jan-2004  yamt branches: 1.77.6; 1.77.8; 1.77.10;
use bufmem instead of bufpages to make lfs a little less broken.
 1.76 04-Dec-2003  yamt use b_private rather than b_saveaddr.
XXX LFS_USE_B_INVAL
 1.75 03-Oct-2003  yamt assertions.
 1.74 23-Sep-2003  yamt remove unnecessary externs of lfs_do_flush.
 1.73 07-Sep-2003  yamt - raise spl to bio in lfs_countlocked() rather than having callers to do so.
- buffer cache MP locks.
- assert B_CALL buffers are not on the free queue.
 1.72 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.71 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.70 02-Jul-2003  yamt a comment.
 1.69 02-Jul-2003  yamt use queue.h macros.
 1.68 02-Jul-2003  yamt use VFSTOUFS macro.
 1.67 02-Jul-2003  yamt - add a new functions, lfs_writer_enter/leave, and use them instead of
duplicated code fragments.
- add an assertion.
 1.66 27-Apr-2003  perseant branches: 1.66.2;
Don't change update time on block write; lets e.g. "tar xp" work properly.
 1.65 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.64 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.63 02-Mar-2003  perseant Account SEGUSE_ACTIVE correctly so that the automatic segment cleaning
actually happens.

Add a new fcntl call that will write the minimum necessary to checkpoint
(i.e., for on-disk directory structure to be consistent, not including
updates to file data) so that the cleaner can clean segments more quickly
without sacrificing three-way commit for cleaning.
 1.62 25-Feb-2003  thorpej Add a new BUF_INIT() macro which initializes b_dep and b_interlock, and
use it. This fixes a few places where either b_dep or b_interlock were
not properly initialized.
 1.61 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.60 19-Feb-2003  yamt workaround for "another flush is..." infinity loop in writerd.
if we're writerd, sleep in lfs_flush until another writer goes away
instead of busy loop in writed.
 1.59 19-Feb-2003  yamt init b_interlock.
 1.58 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.57 05-Feb-2003  pk Make the buffer cache code MP-safe.
 1.56 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.55 30-Dec-2002  yamt comment and assertions
 1.54 30-Dec-2002  yamt move check of lfs_unlockvp from lfs_reserveavail to lfs_reserve
because lfs_reservebuf needs same check as well.
 1.53 29-Dec-2002  yamt fix vref/vunref mismatch.
 1.52 28-Dec-2002  yamt - in lfs_reserve, vref vnodes that we're locking so that cleaner doesn't
try to reclaim them.
(workaround for deadlock noted in the comment in lfs_reserveavail)
- in lfs_rename, mark vnodes which are being moved as well as directry vnodes.
 1.51 26-Dec-2002  yamt - in lfs_reserve, reserve locked buffer count as well.
- don't wait for locking buf in lfs_bwrite_ext to avoid deadlocks.
- skip lfs_reserve when we're doing dirop.
reserve more (for lfs_truncate) in set_dirop instead.

this mostly solves PR 18972. (and hopefully PR 19196)
 1.50 22-Dec-2002  yamt add a XXX comment. (description of possible deadlock)
 1.49 17-Dec-2002  yamt #if 0 out vnode unlock/lock in lfs_reserve for now and add a comment about it.
deadlock is better than corruption (or panic), IMO.
 1.48 14-Dec-2002  yamt - in lfs_bwrite_ext, if we're cleaner,
mark inode IN_CLEANING rather then IN_MODIFIED.
otherwise cleaned (indirect) blocks belongs to the inode isn't written
until next sync.
- add assertions.
 1.47 27-Nov-2002  yamt more XXX comment.
 1.46 24-Nov-2002  yamt add a XXX comment to lfs_reserve.
* it isn't safe to unlock vp here
* because we're passing data using inode from namei.
* (eg. i_offset)
 1.45 24-Nov-2002  yamt lfs_reserve shouldn't block for lfs_unlockvp.
otherwise cleaner deadlocks.
PR 19134.
 1.44 20-Jun-2002  perseant Fix miscalculation in lfs_fits found by Trevin Beattie <trevin@xmission.com>.
Change some of the variable names from "nb", "db" to "fsb" to reflect their
calling conventions.
 1.43 14-May-2002  perseant branches: 1.43.2;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.42 12-May-2002  matt Eliminate commons.
 1.41 11-Feb-2002  perseant Include the space taken by inodes in the count made by lfs_check();
make VOP_SETATTR call lfs_check. This prevents large numbers of inode
changes (say, at the end of tar(1)) from filling the buffer cache.
 1.40 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.39 08-Nov-2001  lukem add RCSID
 1.38 06-Nov-2001  simonb Remove some variables that are set but never used.
 1.37 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.36 13-Jul-2001  perseant branches: 1.36.4;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.35 03-Dec-2000  perseant branches: 1.35.2; 1.35.4; 1.35.6;
Fix typo in 'malloc' for non-MALLOCLOG case
 1.34 03-Dec-2000  perseant Get rid of some old unnecessary code that cleared B_NEEDCOMMIT from buffers in
lfs_writeseg (possibly after they had been freed).

If MALLOCLOG is defined, make lfs_newbuf and lfs_freebuf pass along the
caller's file and line to _malloc and _free.
 1.33 27-Nov-2000  perseant If LFS_DO_ROLLFORWARD is defined, roll forward from the older checkpoint
on mount, through the newer checkpoint and on through any newer
partial-segments that may have been written but not checkpointed because
of an intervening crash.

LFS_DO_ROLLFORWARD is not defined by default.
 1.32 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.31 12-Nov-2000  perseant Do not needlessly dirty segment table blocks during lfs_segwrite,
preventing needless disk activity when the filesystem is idle. (PR #10979.)
 1.30 13-Sep-2000  perseant Cast back to int32_t in LFS_EST_BFREE and LFS_EST_RSVD macros, for
consistency with their arguments.

Change the debugging printf in lfs_reserve to match, and enclose it in
#ifdef DEBUG.

Tested on alpha, arm32, sparc.
 1.29 12-Sep-2000  perseant Make this file compile on the alpha as well (use %ld and cast to long,
instead of %qd with no cast).
 1.28 10-Sep-2000  augustss Make this file compile again.
 1.27 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.26 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero as fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.25 03-Jul-2000  perseant Allow the number of free segments reserved for the cleaner to be
parametrized in the filesystem, defaulting to MIN_FREE_SEGS = 2 but set
to something more reasonable at newfs_lfs time.

Note the number of blocks that have been scheduled for writing but which
are not yet on disk in an inode extension, i_lfs_effnblks. Move
i_ffs_effnlink out of the ffs extension and onto the main inode, since
it's used all over the shared code and the lfs extension would clobber
it.

At inode write time, indirect blocks and inode-held blocks of inodes
that have i_lfs_effnblks != i_ffs_blocks are cleansed of UNWRITTEN disk
addresses, so that these never make it to disk.
 1.24 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.23 06-Jun-2000  perseant branches: 1.23.2;
Protect inode free list with seglock, instead of separate lock, so that
the head of the inode free list (on the superblock) always matches the
rest of the free list (in the ifile).

Protect lfs_fragextend with seglock, to prevent the segment byte count
fudging from making its way to disk.

Don't try to inactivate dirop vnodes that are still in the middle of
their dirop (may address PR#10285).
 1.22 31-May-2000  fredb Make this build. (Balance parenthesis.
 1.21 31-May-2000  perseant update for IN_ACCESSED changes
 1.20 27-May-2000  perseant branches: 1.20.2;
Prevent dirops from getting around lfs_check and wedging the buffer cache.
All the dirop vnops now mark the inodes with a new flag, IN_ADIROP, which
is removed as soon as the dirop is done (as opposed to VDIROP which stays
until the file is written). To address one issue raised in PR#9357.
 1.19 19-May-2000  thorpej NULL != 0
 1.18 05-May-2000  perseant Change the way LFS does block accounting, from trying to infer from the
buffer cache flags, to marking the inode and/or indirect blocks with a
special disk address UNWRITTEN==-2 when a block is accounted for. (This
address is never written to disk, but only used in-core. This is essentially
the same method of block accounting as on the UBC branch, where the buffer
headers don't exist.) Make sure that truncation is handled properly,
especially in the case of holey files.

Fixes PR#9994.
 1.17 30-Mar-2000  augustss Remove register declarations.
 1.16 15-Dec-1999  perseant In lfs_bwrite, don't mark buffers dirty if lfs is mounted read-only.
(Previously buffers could be marked dirty by the cleaner, and possibly by
other means.)

Also check for softdep mount in vfs_shutdown before trying to bawrite
buffers, since other filesystems don't need it and lfs doesn't bawrite.
(This fragment reviewed by fvdl.)

Partially addresses PR#8964.
 1.15 04-Dec-1999  ragge CL* discarding.
 1.14 23-Nov-1999  fvdl Be more careful to block bio interrupts for some data structures. There
were at least a few missed cases where vp->v_{clean,dirty}blkhd were
unprotected since the softdep/trickle sync merge.
 1.13 06-Nov-1999  perseant branches: 1.13.2;
Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.12 21-Oct-1999  perseant Under degenerate access patterns (e.g. `bonnie' benchmark) lfs_check could
fail, because the particular block being requested was always in the cache
(although other routines that cannot afford to call lfs_check have in the
meantime stuffed the cache full of dirty blocks). Partially addresses PR 8383.
 1.11 01-Jun-1999  perseant branches: 1.11.2; 1.11.4; 1.11.6;
Fixed lfs_update (and related functions) so that calls from lfs_fsync
will DTRT with vnodes marked VDIROP. In particular, the message
"flushing VDIROP" will no longer appear, and the filesystem will remain
stable in the event of a crash.

This was particularly a problem with NFS-exported LFSes, since fsync
was called on every file close.
 1.10 12-Apr-1999  perseant Disallow threshold-initiated cache flush when dirops are active. Also, make
SET_ENDOP use lfs_check instead of inlining most of it.
 1.9 25-Mar-1999  perseant branches: 1.9.2;
Fixes to make dirops and lfs_vflush play together well. In particular,
if we are short on vnodes, lfs_vflush from another process can grab a
vnode that lfs_markv has already processed but not yet written; but
lfs_markv holds the seglock. When lfs_vflush gets around to writing it,
the context for copyin is gone. So, now lfs_markv calls copyin itself,
rather than having lfs_writeseg do it.
 1.8 25-Mar-1999  perseant clean up unused/required #ifdefs
 1.7 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.6 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.5 09-Feb-1996  christos lfs prototypes
 1.4 18-Jun-1995  cgd don't assume the f_fsnamelen is nul-truncated or longer than MFSNAMELEN
 1.3 18-Jan-1995  mycroft Turn mountlist into a CIRCLEQ, and handle setting and checking of MNT_ROOTFS
differently.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.9.2.4 17-Dec-1999  he Pull up revision 1.13 (requested by perseant):
Address locking protocol error for inode hash, and make the
maximum number of active dirops a global quantity.
 1.9.2.3 17-Dec-1999  he Pull up revision 1.11 (requested by perseant):
Avoid flushing vnodes involved in a dirop, making lfs' promise
of "no fsck needed, even in the event of a crash" closer to
reality.
 1.9.2.2 26-Oct-1999  he Pull up revision 1.12 (requested by perseant):
Fix LFS buffer starvation under degenerate access patterns.
 1.9.2.1 13-Apr-1999  perseant branches: 1.9.2.1.2;
Pull-up of changes made to the trunk on Sunday [1.9->1.10], to wit:

Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.

Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).

Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed, then
immediately reloaded, their dinodes were located in an inode block which
was not on disk at the advertized location, nor in the cache (although it
would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.

Better checking for held inode locks in lfs_fastvget, for a number of
error conditions. Also change the default setting of lfs_clean_vnhead to
0, which seems to make the locking problems go away (although this is
difficult to test as I can't reliably reproduce them).

Make sure that the wakeup occurs for vnodes that lfs_update might be
sleeping on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which
have dirty buffers), by marking them with the appropriate flag if
dirtybuffers were added while the write was in progress.

Fix block counting during file truncation, if not truncating to zero.

Disallow threshold-initiated cache flush when dirops are active. Also,
make SET_ENDOP use lfs_check instead of inlining most of it.

Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).

Check the superblock version field, and refuse to mount the filesystem if
the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and
boundaries, etc., which would not affect existing fields in the
superblock, but which would drastically affect the filesystem, to be
smoothly integrated at a later date.
 1.9.2.1.2.2 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.9.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.11.6.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.11.6.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.11.4.1 15-Nov-1999  fvdl Sync with -current
 1.11.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.11.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.11.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.13.2.2 06-Nov-1999  perseant Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.13.2.1 06-Nov-1999  perseant file lfs_bio.c was added on branch comdex-fall-1999 on 1999-11-06 20:33:06 +0000
 1.20.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.23.2.2 03-Feb-2001  he Pull up revisions 1.31-1.32 (requested by perseant):
o Don't write anything if the filesystem is idle (PR#10979).
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
 1.23.2.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
seperately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.35.6.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.35.6.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.35.6.3 16-Mar-2002  jdolecek Catch up with -current.
 1.35.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.35.6.1 03-Aug-2001  lukem update to -current
 1.35.4.4 13-Jul-2001  perseant Be more careful about when we update ctime/mtime. In particular, if we
are only writing indirect blocks, that doesn't count for mtime; and when
we first create a vnode, that certainly *does not* count for ctime
(a bug that's been there from the beginning).

This does not change the fact that mtime might still be set after write(2)
is "completed", but it does make the atime-in-the-ifile code have some
effect (noticeable less degradation of read time after an intervening
large write).
 1.35.4.3 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.35.4.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.35.4.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.35.2.13 03-Jan-2003  thorpej Sync with HEAD.
 1.35.2.12 29-Dec-2002  thorpej Sync with HEAD.
 1.35.2.11 19-Dec-2002  thorpej Sync with HEAD.
 1.35.2.10 11-Dec-2002  thorpej Sync with HEAD.
 1.35.2.9 01-Aug-2002  nathanw Catch up to -current.
 1.35.2.8 15-Jul-2002  nathanw Whitespace.
 1.35.2.7 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc) : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.35.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.35.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.35.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.35.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.35.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.35.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.36.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.43.2.1 15-Jul-2002  gehenna catch up with -current.
 1.66.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.66.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.66.2.5 08-Mar-2005  skrll Sync with HEAD.
 1.66.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.66.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.66.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.66.2.1 03-Aug-2004  skrll Sync with HEAD
 1.77.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.77.8.1 29-Apr-2005  kent sync with -current
 1.77.6.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.81.2.5 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.81.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.92
sys/ufs/lfs/lfs.h: revision 1.105
sys/ufs/lfs/lfs_vfsops.c: revision 1.207
sys/ufs/lfs/lfs_subr.c: revision 1.59
sys/ufs/lfs/lfs_vnops.c: revision 1.173
sys/ufs/lfs/lfs_bio.c: revision 1.92
Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.
Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.81.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.102
sys/ufs/lfs/lfs_segment.c: revision 1.173
sys/ufs/lfs/lfs_vnops.c: revision 1.167 via patch
sys/ufs/lfs/lfs_bio.c: revision 1.91
Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.
Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.81.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.81.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.86.2.7 27-Feb-2008  yamt sync with head.
 1.86.2.6 04-Feb-2008  yamt sync with head.
 1.86.2.5 21-Jan-2008  yamt sync with head
 1.86.2.4 27-Oct-2007  yamt sync with head.
 1.86.2.3 03-Sep-2007  yamt sync with head.
 1.86.2.2 30-Dec-2006  yamt sync with head.
 1.86.2.1 21-Jun-2006  yamt sync with head.
 1.87.2.1 15-Jan-2006  yamt sync with head.
 1.89.6.4 11-Aug-2006  yamt sync with head
 1.89.6.3 24-May-2006  yamt sync with head.
 1.89.6.2 13-Mar-2006  yamt sync with head.
 1.89.6.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.89.4.2 01-Jun-2006  kardel Sync with head.
 1.89.4.1 22-Apr-2006  simonb Sync with head.
 1.89.2.1 09-Sep-2006  rpaulo sync with head
 1.90.4.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.90.2.3 11-May-2006  elad sync with head
 1.90.2.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.90.2.1 19-Apr-2006  elad sync with head.
 1.93.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.94.4.1 18-Nov-2006  ad Sync with head.
 1.95.2.2 10-Dec-2006  yamt sync with head.
 1.95.2.1 22-Oct-2006  yamt sync with head
 1.98.16.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.98.10.1 11-Jul-2007  mjf Sync with head.
 1.98.8.7 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and needs to be verified as a whole.
 1.98.8.6 20-Aug-2007  ad Sync with HEAD.
 1.98.8.5 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.98.8.4 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.98.8.3 08-Jun-2007  ad Sync with head.
 1.98.8.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.98.8.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.98.4.2 17-May-2007  yamt sync with head.
 1.98.4.1 07-May-2007  yamt sync with head.
 1.98.2.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.102.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.103.10.2 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.103.10.1 29-Jul-2007  ad file lfs_bio.c was added on branch matt-mips64 on 2007-07-29 13:31:15 +0000
 1.103.8.1 14-Oct-2007  yamt sync with head.
 1.103.6.3 23-Mar-2008  matt sync with HEAD
 1.103.6.2 09-Jan-2008  matt sync with HEAD
 1.103.6.1 06-Nov-2007  matt sync with HEAD
 1.103.4.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.106.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.106.6.5 19-Dec-2007  ad Use a global lfs_lock.
 1.106.6.4 19-Dec-2007  ad Fix some more problems w/lfs on this branch.
 1.106.6.3 19-Dec-2007  ad Get lfs mostly working.
 1.106.6.2 08-Dec-2007  ad Minor locking fixes.
 1.106.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.106.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.110.10.3 11-Aug-2010  yamt sync with head.
 1.110.10.2 11-Mar-2010  yamt sync with head
 1.110.10.1 16-May-2008  yamt sync with head.
 1.110.8.1 18-May-2008  yamt sync with head.
 1.110.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.114.18.1 19-Dec-2013  matt Adapt to new uvm_estimatepageable arguments
 1.116.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.116.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.117.2.2 03-Jul-2010  rmind sync with head
 1.117.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.118.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.120.6.1 18-Feb-2012  mrg merge to -current.
 1.120.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.120.2.1 17-Apr-2012  yamt sync with head
 1.121.2.1 17-Mar-2012  bouyer Pull up following revision(s) (requested by perseant in ticket #116):
sys/ufs/lfs/lfs_alloc.c: revision 1.112
tests/fs/vfs/t_rmdirrace.c: revision 1.9
tests/fs/vfs/t_renamerace.c: revision 1.25
sys/ufs/lfs/lfs_vnops.c: revision 1.240
sys/ufs/lfs/lfs_segment.c: revision 1.224
sys/ufs/lfs/lfs_bio.c: revision 1.122
sys/ufs/lfs/lfs_vfsops.c: revision 1.294
sbin/newfs_lfs/make_lfs.c: revision 1.19
sys/ufs/lfs/lfs.h: revision 1.136
Pass t_renamerace and t_rmdirrace tests.
Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.
Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.
Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace crates a lot of inode blocks on a tiny empty disk).
 1.122.2.3 03-Dec-2017  jdolecek update from HEAD
 1.122.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.122.2.1 23-Jun-2013  tls resync from head
 1.125.2.2 18-May-2014  rmind sync with head
 1.125.2.1 28-Aug-2013  rmind sync with head
 1.128.6.3 28-Aug-2017  skrll Sync with HEAD
 1.128.6.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.128.6.1 22-Sep-2015  skrll Sync with HEAD
 1.135.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.135.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.135.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.139.4.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.141.4.1 25-Jun-2018  pgoyette Sync with HEAD
 1.142.6.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.142.2.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.144.2.1 29-Feb-2020  ad Sync with head.
 1.30 02-Aug-2015  dholland Add a (draft) 64-bit superblock. Make things build again.

Add pieces of support for using both superblock types where
convenient, and specifically to the superblock accessors, but don't
actually enable it anywhere.

First substantive step on PR 50000.
 1.29 06-Jun-2013  dholland branches: 1.29.10;
Cleanups and hacks to make lfs userland stuff build:
- lfs_cksum.c doesn't actually need ulfs_inode.h any more.
- neither does lfs_itimes.c.
- add hacks to fsck_lfs to make it compile.
- add hacks to newfs_lfs to make it compile.
- fix warning in ulfs_quota.c when quotas are fully disabled
(as I guess is happening with the rumpity version)

XXX: This commit adds -I${NETBSDSRCDIR}/sys to the Makefiles for
XXX: fsck_lfs, newfs_lfs, and lfs_cleanerd. This needs to be cleaned
XXX: up ASAP; but I consider this less problematic in the short term
XXX: than spewing ulfs_*.h into /usr/include.
 1.28 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.27 28-Apr-2008  martin branches: 1.27.34; 1.27.44;
Remove clause 3 and 4 from TNF licenses
 1.26 11-Dec-2005  christos branches: 1.26.70; 1.26.72; 1.26.74;
merge ktrace-lwp.
 1.25 26-Feb-2005  perry nuke trailing whitespace
 1.24 09-Mar-2004  yamt branches: 1.24.8; 1.24.10;
calculate data checksum inline.
 1.23 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.22 20-Feb-2003  perseant branches: 1.22.2;
Tabify, and fix some comment alignment problems.
 1.21 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.20 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.19 15-Nov-2001  lukem branches: 1.19.8; 1.19.10;
don't need <sys/types.h> when including <sys/param.h>
 1.18 08-Nov-2001  lukem add RCSID
 1.17 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.16 13-Jul-2001  perseant branches: 1.16.4;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.15 04-Feb-2001  christos branches: 1.15.2; 1.15.4; 1.15.6;
don't include lfs_extern.h; ufs/inode.h does too.
 1.14 25-Nov-2000  perseant Use u_int32_t instead of u_long to compute LFS checksums, since the
checksum is stored in a u_int32_t.
 1.13 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.12 30-Mar-2000  augustss branches: 1.12.4;
Remove register declarations.
 1.11 25-Mar-1999  perseant branches: 1.11.8;
Change lfs_sb_cksum to use offsetof() instead of an inlined version.

Fix lfs_vref/lfs_vunredf to ignore VXLOCKed vnodes that are also being
flushed.

Improve the debugging messages somewhat.
 1.10 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.9 11-Sep-1998  pk PR#6032: define fixed sized on-disk superblock structure.
 1.8 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.7 15-Sep-1997  lukem prototype lfs_cksum ifndef KERNEL
 1.6 16-Feb-1996  christos branches: 1.6.12;
Protect include in lfs_cksum.c so that it can be used by userland programs.
 1.5 09-Feb-1996  christos lfs prototypes
 1.4 14-Dec-1994  mycroft Sync with CSRG.
 1.3 20-Sep-1994  cgd c syntax
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.6.12.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.11.8.3 11-Feb-2001  bouyer Sync with HEAD.
 1.11.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.11.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.4.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
seperately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.15.6.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.15.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.15.6.1 03-Aug-2001  lukem update to -current
 1.15.4.1 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.15.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.15.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.15.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.15.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.16.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.19.10.1 20-Jun-2002  lukem Pull up revision 1.20 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.19.8.1 20-Jun-2002  gehenna catch up with -current.
 1.22.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.22.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.22.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.22.2.1 03-Aug-2004  skrll Sync with HEAD
 1.24.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.24.8.1 29-Apr-2005  kent sync with -current
 1.26.74.1 16-May-2008  yamt sync with head.
 1.26.72.1 18-May-2008  yamt sync with head.
 1.26.70.1 02-Jun-2008  mjf Sync with HEAD.
 1.27.44.2 03-Dec-2017  jdolecek update from HEAD
 1.27.44.1 23-Jun-2013  tls resync from head
 1.27.34.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.29.10.1 22-Sep-2015  skrll Sync with HEAD
 1.55 23-Feb-2020  riastradh Just use VOP_BWRITE for lfs_bwrite_log.

Hope this doesn't cause trouble with vfs_suspend.
 1.54 01-Sep-2015  dholland branches: 1.54.18; 1.54.22; 1.54.24;
The ifile's inode number is constant. (it is always 1)

Therefore, storing the value in the superblock and reading it out
again is silly and offers the opportunity for it to become corrupted.
So, don't do that (most of the code already didn't) and use the
existing constant instead. Initialize new 32-bit superblocks with
the value for the sake of old userland programs, but don't keep the
value in the 64-bit superblock at all.

(approved by Margo Seltzer)
 1.53 01-Sep-2015  dholland Make the inode fields in the 64-bit superblock 64 bits wide.
Reasoning as before.

Note that I am not going through and checking for 64->32 truncations
in inode numbers; I'm sure there are quite a few, but that's a project
for later.
 1.52 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.51 12-Aug-2015  dholland Provide 32-bit and 64-bit versions of FINFO.

This also entailed sorting out part of struct segment, as that
contains a pointer into the current FINFO data.
 1.50 02-Aug-2015  dholland Add a (draft) 64-bit superblock. Make things build again.

Add pieces of support for using both superblock types where
convenient, and specifically to the superblock accessors, but don't
actually enable it anywhere.

First substantive step on PR 50000.
 1.49 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.48 02-Aug-2015  dholland Second batch of 64 -> 32 truncations in lfs, along with more minor
tidyups and corrections in passing.
 1.47 02-Aug-2015  dholland Fix assorted 64 -> 32 truncations in lfs. Also, some minor tidyups and
corrections in passing.
 1.46 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.45 25-Jul-2015  hannken Another lfs superblock accessor (inside #ifdef 0).
 1.44 25-Jul-2015  martin Use accessors in DEBUG and DIAGNOSTIC code as well
 1.43 18-Jun-2013  christos branches: 1.43.10;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.42 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.41 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.40 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.39 17-Jul-2011  joerg branches: 1.39.2; 1.39.12;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.38 19-Jul-2009  dholland minor knf
 1.37 28-Apr-2008  martin branches: 1.37.14;
Remove clause 3 and 4 from TNF licenses
 1.36 02-Jan-2008  ad branches: 1.36.6; 1.36.8; 1.36.10;
Merge vmlocking2 to head.
 1.35 12-Dec-2007  lukem Move __KERNEL_RCSID() so that it's always available if this file is
compiled, even if DEBUG isn't defined.
(This matches the behaviour of various other source files that
provide functions only if DEBUG is enabled.)
 1.34 22-Jul-2007  christos branches: 1.34.6; 1.34.12; 1.34.14; 1.34.16; 1.34.18; 1.34.22;
make this compile again
 1.33 11-Dec-2005  christos branches: 1.33.30; 1.33.40;
merge ktrace-lwp.
 1.32 19-Aug-2005  christos 64 bit inode changes.
 1.31 29-May-2005  christos branches: 1.31.2;
- sprinkle const
- avoid shadow variables.
 1.30 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.29 26-Mar-2005  christos make this compile again :-(
 1.28 26-Mar-2005  christos Use vlog(9). Open-coding vlog here breaks lkm's because including
<sys/kprintf.h> includes opt_multiprocessor.h. One could argue
that the lock stuff should just move to subr_prf.c since nothing
else uses it.
 1.27 08-Mar-2005  simonb branches: 1.27.2;
Tab Police.
 1.26 08-Mar-2005  perseant Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.25 26-Feb-2005  perry nuke trailing whitespace
 1.24 30-Oct-2003  simonb branches: 1.24.8; 1.24.10;
Remove some assigned-to but otherwise unused variables.
 1.23 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.22 02-Apr-2003  fvdl branches: 1.22.2;
Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.21 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.20 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.19 29-Jan-2003  yamt don't use daddr_t for segment summary since it's an on-disk structure.
 1.18 25-Jan-2003  kleink Fix further printf format warnings for DEBUG, in the wake of daddr_t
having changed.
 1.17 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.16 14-May-2002  perseant Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.15 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.14 08-Nov-2001  lukem add RCSID
 1.13 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.12 13-Jul-2001  perseant branches: 1.12.4;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.11 09-Sep-2000  perseant branches: 1.11.2; 1.11.4; 1.11.6;
Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.10 23-Apr-2000  perseant branches: 1.10.4;
Fix problems outlined in PR#9926:
- lfs_truncate extends the file if called with length > i_ffs_size;
- lfs_truncate errors out if called with length < 0;
- lfs_balloc block accounting corrected for the case of blocks read
into the cache before they exist on disk;
- mp->mnt_stat.f_iosize is initialized in lfs_mountfs.
 1.9 10-Mar-1999  perseant branches: 1.9.8; 1.9.14;
New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.8 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.7 15-Nov-1996  cgd cast int64_t-sized types to "long long" before printing them with %qd.
gcc thinks that the 'q' modifier describes a "long long", and so -Wformat
whines when printing with 'q' on the alpha, since int64_t-sized types are
done with variations on "long" rather than "long long".
 1.6 12-Oct-1996  christos revert previous kprintf changes
 1.5 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.4 17-Mar-1996  christos Fix printf format strings
 1.3 12-Feb-1996  christos di_size is a quad and needs %qu not %lu
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.9.14.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.9.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.4.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
seperately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.11.6.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.11.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.11.6.1 03-Aug-2001  lukem update to -current
 1.11.4.3 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.11.4.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.11.4.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.11.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.11.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.11.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.11.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.12.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.22.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.22.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.22.2.5 08-Mar-2005  skrll Sync with HEAD.
 1.22.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.22.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.22.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.22.2.1 03-Aug-2004  skrll Sync with HEAD
 1.24.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.24.8.1 29-Apr-2005  kent sync with -current
 1.27.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.153
sys/ufs/lfs/lfs_debug.c: revision 1.32
sys/ufs/lfs/lfs_alloc.c: revision 1.84
sys/ufs/lfs/lfs_vfsops.c: revision 1.185
sys/ufs/lfs/lfs_segment.c: revision 1.165
64 bit inode changes.
 1.27.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.27.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.31.2.3 21-Jan-2008  yamt sync with head
 1.31.2.2 03-Sep-2007  yamt sync with head.
 1.31.2.1 21-Jun-2006  yamt sync with head.
 1.33.40.1 15-Aug-2007  skrll Sync with HEAD.
 1.33.30.4 28-Aug-2007  yamt make this compile with DEBUG.
 1.33.30.3 20-Aug-2007  ad Sync with HEAD.
 1.33.30.2 15-Jul-2007  ad Sync with head.
 1.33.30.1 05-Apr-2007  ad Compile fixes.
 1.34.22.2 22-Jul-2007  christos make this compile again
 1.34.22.1 22-Jul-2007  christos file lfs_debug.c was added on branch matt-mips64 on 2007-07-22 03:41:00 +0000
 1.34.18.2 02-Jan-2008  bouyer Sync with HEAD
 1.34.18.1 13-Dec-2007  bouyer Sync with HEAD
 1.34.16.1 13-Dec-2007  yamt sync with head.
 1.34.14.2 26-Dec-2007  ad Sync with head.
 1.34.14.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.34.12.1 18-Feb-2008  mjf Sync with HEAD.
 1.34.6.1 09-Jan-2008  matt sync with HEAD
 1.36.10.2 19-Aug-2009  yamt sync with head.
 1.36.10.1 16-May-2008  yamt sync with head.
 1.36.8.1 18-May-2008  yamt sync with head.
 1.36.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.37.14.1 23-Jul-2009  jym Sync with HEAD.
 1.39.12.3 03-Dec-2017  jdolecek update from HEAD
 1.39.12.2 23-Jun-2013  tls resync from head
 1.39.12.1 25-Feb-2013  tls resync with head
 1.39.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.39.2.1 23-Jan-2013  yamt sync with head
 1.43.10.1 22-Sep-2015  skrll Sync with HEAD
 1.54.24.1 29-Feb-2020  ad Sync with head.
 1.54.22.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.54.18.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.123 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.122 17-Sep-2025  perseant Use a workqueue to handle the superblock callback.
 1.121 17-Sep-2025  perseant Add routines to check freelist consistency if compiled with DEBUG and
conditional on a kernel variable manipulated via sysctl.
Add checks before and after each routine that modifies the free list.
#if 0 a section of lfs_vfree() that was intended to keep the free list ordered
but instead corrupted it.
 1.120 04-Sep-2025  perseant Copy the flags from a full partial segment to its continuation, if
a continuation is necessary, so that partial-segment collections marked
with SS_DIROP|SS_CONT are properly completed wiht a partial-segment marked
SS_DIROP (without SS_CONT). Necessary for roll-forward.
 1.119 02-Sep-2025  perseant Use a workqueue to handle cluster iodone, rather than doing it in interrupt context.
 1.118 23-Feb-2020  riastradh Dust off the orphan detection code and try to make it work.
 1.117 23-Feb-2020  riastradh lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.
 1.116 23-Feb-2020  riastradh Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):

(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.

(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:

(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.

(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
 1.115 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.114 22-Aug-2018  msaitoh branches: 1.114.4; 1.114.6;
- Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.113 26-Jul-2017  maya branches: 1.113.2; 1.113.4;
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar

XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
 1.112 08-Jun-2017  chs move some buffer cache internals declarations from buf.h to vfs_bio.c.
this is needed to avoid name conflicts with ZFS and also
makes it clearer that other code shouldn't be messing with these.
remove the LFS debug code that poked around in bufqueues and
remove the BQ_EMPTY bufqueue since nothing uses it anymore.
provide a function to let LFS and wapbl read the value of nbuf for now.
 1.111 20-Jun-2016  dholland branches: 1.111.10;
u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.110 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.109 12-Aug-2015  dholland Move the security checks for lfs_bmapv/lfs_markv into those functions.
(instead of the system call entry points)

Avoids duplication.

While touching these, pass the lwp around instead of the proc -- the
latter was there for no other reason than because once upon a time
struct proc was the first argument of all syscalls.

(For that matter, why not just use curlwp instead of passing it around
all over the place? The cost of passing it to every syscall probably
exceeds the cost of loading it from curcpu, even on machines where
it's not just kept in a register all the time.)
 1.108 12-Aug-2015  dholland Fix assorted 64->32 truncations related to BLOCK_INFO.

Also make note of a cleaner limitation: it seems that when it goes to
coalesce discontiguous files, it mallocs an array with one BLOCK_INFO
for every block in the file. Therefore, with 64-bit LFS, on a 32-bit
platform it will be possible to have files large enough to overflow
the cleaner's address space. Currently these will be skipped and cause
warnings via syslog.

At some point someone should rewrite the logic to coalesce files to
use chunks of some reasonable size, as discontinuity between such
chunks is immaterial and mallocing this much space is silly and
fragile. Also, the kernel only accepts up to 65536 blocks at a time
for bmapv and markv, so processing more than this at once probably
isn't useful and may not even work currently. I don't want to change
this around just now as it's not entirely trivial.
 1.107 02-Aug-2015  dholland Add a (draft) 64-bit superblock. Make things build again.

Add pieces of support for using both superblock types where
convenient, and specifically to the superblock accessors, but don't
actually enable it anywhere.

First substantive step on PR 50000.
 1.106 02-Aug-2015  dholland Second batch of 64 -> 32 truncations in lfs, along with more minor
tidyups and corrections in passing.
 1.105 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.104 31-May-2015  hannken Make lfs_fastvget() private to lfs_syscalls.c, change it to take the
BLOCK_INFO and vnode lock type instead of the inode disk address and
return the vnode locked.

Change lfs_markv() and lfs_bmapv() to work on locked vnodes.
 1.103 31-May-2015  hannken Use VFS_PROTOS() for lfs.
Rename conflicting struct lfs field "lfs_start" to "lfs_s0addr".

No functional change.
 1.102 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.101 18-Mar-2014  riastradh branches: 1.101.6;
Merge riastradh-drm2 to HEAD.
 1.100 20-Jul-2013  dholland Collect the pieces of lfs rename into lfs_rename.c, and sprinkle static.
 1.99 06-Jun-2013  dholland branches: 1.99.2; 1.99.4;
Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.98 23-Feb-2012  joerg branches: 1.98.2;
Make sure that __BEGIN_DECLS and __END_DECLS are paired.
 1.97 02-Jan-2012  perseant * Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.96 28-Jun-2008  rumble branches: 1.96.30; 1.96.34;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.95 28-Apr-2008  martin branches: 1.95.2; 1.95.4;
Remove clause 3 and 4 from TNF licenses
 1.94 02-Jan-2008  ad branches: 1.94.6; 1.94.8; 1.94.10;
Merge vmlocking2 to head.
 1.93 08-Dec-2007  pooka branches: 1.93.4;
Remove cn_lwp from struct componentname. curlwp should be used
from on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.92 26-Nov-2007  pooka branches: 1.92.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.91 31-Jul-2007  pooka branches: 1.91.2; 1.91.4; 1.91.10; 1.91.12;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.90 12-Jul-2007  dsl branches: 1.90.2;
Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass though the length of the buffer as well.
Since the length of the userspace buffer isn'it (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.89 17-Apr-2007  perseant Install a new sysctl, vfs.lfs.ignore_lazy_sync, which causes LFS to ignore
the "smooth" syncer, as if vfs.sync.*delay = 0, but only for LFS. The
default is "on", i.e., ignore lazy sync.

Reduce the amount of polling/busy-waiting done by lfs_putpages(). To
accomplish this, copied genfs_putpages() and modified it to indicate which
page it was that caused it to return with EDEADLK. fsync()/fdatasync()
should no longer ever fail with EAGAIN, and should not consume huge
quantities of cpu.

Also, try to make dirops less likely to be written as the result of a
VOP_PUTPAGES(), while ensuring that they are written regularly.
 1.88 04-Mar-2007  christos branches: 1.88.2; 1.88.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.87 01-Sep-2006  perseant branches: 1.87.6; 1.87.8; 1.87.12;
Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.86 20-Jul-2006  perseant Separate the (non-working) LFS kernel roll-forward code into its own file,
lfs_rfw.c.
 1.85 13-Jul-2006  martin Fix alignement problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.84 29-Jun-2006  perseant Don't wake up the cleaner if the filesystem is unwrappable, and fix the
compatibility fcntls.

Also includes one-line fixes for an MP locking bug and a zero-length FINFO
problem that manifested during testing.
 1.83 18-May-2006  perseant branches: 1.83.4;
Break out the finfo array manipulation code into two new functions,
lfs_acquire_finfo() and lfs_release_finfo(). Add a debugging check
for zero-length finfo arrays in the segment summary to avoid future
regressions.
 1.82 14-May-2006  elad integrate kauth.
 1.81 01-May-2006  perseant Don't ever partially write dirops, even if we need the cleaner to run.
This increases the chances of the "no clean segments" panic slightly,
but allows us to run the ckckp regression test successfully to completion.
 1.80 30-Apr-2006  perseant Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are also included as well.
 1.79 23-Apr-2006  yamt remove unused FFS_NAMES and LFS_NAMES.
 1.78 08-Apr-2006  perseant Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.77 08-Apr-2006  perseant Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.76 24-Mar-2006  perseant Improvements to LFS's paging mechanism, to wit:

* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.

* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.75 14-Jan-2006  yamt branches: 1.75.2; 1.75.4; 1.75.6; 1.75.8; 1.75.10;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.74 06-Jan-2006  yamt remove an obsolete prototype.
 1.73 11-Dec-2005  christos branches: 1.73.2;
merge ktrace-lwp.
 1.72 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.71 13-Sep-2005  christos branches: 1.71.2;
split out lfs_itimes(). It is used in fsck_lfs.
 1.70 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.69 28-Jun-2005  yamt branches: 1.69.2;
- constify genfs_ops.
- use member designators.
 1.68 29-May-2005  christos - sprinkle const
- avoid shadow variables.
 1.67 23-Apr-2005  perseant Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing. Tested in both
directions, and everything appears to work happily, but ymmv.
 1.66 16-Apr-2005  perseant Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
 1.65 14-Apr-2005  perseant Consolidate the hash table we use to maintain the integrity of lfs_avail
into a single, system-wide table, rather than having a separate hash table
per inode. Significantly reduces the "system" cpu usage of your average
file write.
 1.64 08-Mar-2005  perseant branches: 1.64.2;
Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.63 26-Feb-2005  perry nuke trailing whitespace
 1.62 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
 1.61 20-May-2004  atatat branches: 1.61.4; 1.61.6;
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.

This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.

linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.60 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.59 09-Mar-2004  yamt branches: 1.59.2;
calculate data checksum inline.
 1.58 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.57 07-Nov-2003  yamt - tweak lfs_update_single()'s prototype so that it can be used by
roll-forward code.
- reduce code duplication using the above in update_meta()
this also fixes fragment accounting.
 1.56 07-Nov-2003  yamt fix spec vnode aliasing.
 1.55 29-Sep-2003  yamt remove redundant prototypes.
 1.54 23-Sep-2003  yamt cleanup IN_ADIROP/VDIROP handling a little.
 1.53 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.52 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.51 02-Jul-2003  yamt - add a new functions, lfs_writer_enter/leave, and use them instead of
duplicated code fragments.
- add an assertion.
 1.50 29-Jun-2003  fvdl branches: 1.50.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.
 1.49 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.48 28-Jun-2003  darrenr Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.47 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.46 20-Mar-2003  yamt fix "more than one fragment" panics;
direct and indirect block pointers are not valid in the case of shortlinks.
while i'm here, move duplicated code in lfs_vget/fastvget into a new
function, lfs_vinit.
 1.45 08-Mar-2003  perseant Add an lfs_strategy() that checks to make sure we're not trying to read
where the cleaner is trying to write, instead of tying up the "live"
buffers (or pages).

Fix a bug in the LFS_UBC case where oversized buffers would not be
checksummed correctly, causing uncleanable segments.

Make sure that wakeup(fs->lfs_iocount) is done if fs->lfs_iocount is 1
as well as 0, since we wait in some places for it to drop to 1.

Activate all pages that make it into lfs_gop_write without the segment
lock held, since they must have been dirtied very recently, even if
PG_DELWRI is not set.
 1.44 25-Feb-2003  perseant Make fs-specific fcntl macros take three arguments (approved wrstuden).
Let LFS use fcntl for cleaner functions.
 1.43 24-Feb-2003  perseant Add lfs_ioctl vnode op, with ioctls to take over cleaner system call
functionality (not including segment clean, since that is now done
automatically as checkpoints happen).
 1.42 23-Feb-2003  perseant Fix a buffer overflow bug in the LFS_UBC case that manifested itself
either as a mysterious UVM error or as "panic: dirty bufs". Verify
maximum size in lfs_malloc.

Teach lfs_updatemeta and lfs_shellsort about oversized cluster blocks from
lfs_gop_write.

When unwiring pages in lfs_gop_write, deactivate them, under the theory
that the pagedaemon wanted to free them last we knew.
 1.41 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.40 18-Feb-2003  perseant Make it compile again, grr....
 1.39 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.38 01-Feb-2003  tron Only use MALLOC_DECLARE() in kernel namespace.
 1.37 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.36 30-Jan-2003  yamt there's no need to treat VOP_WHITEOUT as dirop
because it modifies only one inode.
 1.35 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.34 28-Dec-2002  yamt - in lfs_reserve, vref vnodes that we're locking so that cleaner doesn't
try to reclaim them.
(workaround for deadlock noted in the comment in lfs_reserveavail)
- in lfs_rename, mark vnodes which are being moved as well as directry vnodes.
 1.33 17-Dec-2002  yamt no need for cleaner to hold vnode locks.
cleaner and normal vnode operations are synchronized enough by
seglock/fraglock and buf's B_BUSY-ness.
 1.32 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.31 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.30 14-May-2002  perseant branches: 1.30.2; 1.30.4;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.29 12-May-2002  matt Eliminate commons.
 1.28 11-Feb-2002  perseant Include the space taken by inodes in the count made by lfs_check();
make VOP_SETATTR call lfs_check. This prevents large numbers of inode
changes (say, at the end of tar(1)) from filling the buffer cache.
 1.27 18-Dec-2001  chs use the new compatibility routines to allow mmap() to work
(in the same non-coherent fashion that it worked pre-UBC)
until someone has time to do it the right way.
 1.26 15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.25 13-Jul-2001  perseant branches: 1.25.2;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.24 03-Dec-2000  perseant branches: 1.24.2; 1.24.4; 1.24.6;
Get rid of some old unnecessary code that cleared B_NEEDCOMMIT from buffers in
lfs_writeseg (possibly after they had been freed).

If MALLOCLOG is defined, make lfs_newbuf and lfs_freebuf pass along the
caller's file and line to _malloc and _free.
 1.23 25-Nov-2000  perseant Use u_int32_t instead of u_long to compute LFS checksums, since the
checksum is stored in a u_int32_t.
 1.22 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.21 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.20 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero as fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.19 30-Jun-2000  fvdl Rearrange code around getnewvnode as was already done for ffs, to avoid
locking against oneself because getnewvnode recycles a softdep-using vnode.
 1.18 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.17 16-Mar-2000  jdolecek branches: 1.17.4;
Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
 1.16 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.15 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that is has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.14 01-Jun-1999  perseant branches: 1.14.2; 1.14.4; 1.14.8;
Fixed lfs_update (and related functions) so that calls from lfs_fsync
will DTRT with vnodes marked VDIROP. In particular, the message
"flushing VDIROP" will no longer appear, and the filesystem will remain
stable in the event of a crash.

This was particularly a problem with NFS-exported LFSes, since fsync
was called on every file close.
 1.13 10-Mar-1999  perseant branches: 1.13.2; 1.13.4;
New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.12 26-Feb-1999  wrstuden Modify vfsops to seperate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.11 11-Sep-1998  pk PR#6032: define fixed sized on-disk superblock structure.
 1.10 01-Sep-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for LFS inodes.
 1.9 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.8 22-Jun-1998  sommerfe defopt for options FIFO
 1.7 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.6 22-Dec-1996  cgd Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.5 12-Feb-1996  christos Add fwd declaration for struct ucred
 1.4 09-Feb-1996  christos lfs prototypes
 1.3 14-Dec-1994  mycroft Sync with CSRG.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.13.4.3 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.13.4.2 11-Jul-1999  chs add placeholders for getpages/putpages.
 1.13.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.13.2.2 20-Jan-2000  he Pull up revision 1.16 (requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.13.2.1 17-Dec-1999  he Pull up revision 1.14 (requested by perseant):
Avoid flushing vnodes involved in a dirop, making lfs' promise
of "no fsck needed, even in the event of a crash" closer to
reality.
 1.14.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.14.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.14.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.14.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.14.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.17.4.3 03-Feb-2001  he Pull up revision 1.22 (requested by perseant):
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
 1.17.4.2 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
seperately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.17.4.1 03-Jul-2000  fvdl pullup the fixes from the trunk to not hold ufs_hashlock across
getnewvnode()
 1.24.6.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.24.6.3 16-Mar-2002  jdolecek Catch up with -current.
 1.24.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.24.6.1 03-Aug-2001  lukem update to -current
 1.24.4.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.24.4.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.24.2.8 29-Dec-2002  thorpej Sync with HEAD.
 1.24.2.7 19-Dec-2002  thorpej Sync with HEAD.
 1.24.2.6 11-Dec-2002  thorpej Sync with HEAD.
 1.24.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.24.2.4 28-Feb-2002  nathanw Catch up to -current.
 1.24.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.24.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.24.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.25.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.30.4.1 20-Jun-2002  lukem Pull up revision 1.31 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.30.2.1 20-Jun-2002  gehenna catch up with -current.
 1.50.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.50.2.8 08-Mar-2005  skrll Sync with HEAD.
 1.50.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.50.2.6 30-Oct-2004  skrll Oops, forgot this as part of the

"Reduced diff to HEAD by restoring the struct proc * argument to lfs_bmapv"

change
 1.50.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.50.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.50.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.50.2.2 03-Aug-2004  skrll Sync with HEAD
 1.50.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.59.2.1 23-May-2004  tron branches: 1.59.2.1.2;
Pull up revision 1.61 (requested by atatat in ticket #374):
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.59.2.1.2.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.61.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.61.4.1 29-Apr-2005  kent sync with -current
 1.64.2.9 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.64.2.8 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.171
sys/ufs/lfs/lfs_extern.h: revision 1.81
sys/ufs/lfs/lfs_segment.c: revision 1.177
Don't ever partially write dirops, even if we need the cleaner to run.
This increases the chances of the "no clean segments" panic slightly,
but allows us to run the ckckp regression test successfully to completion.
 1.64.2.7 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.104
sys/ufs/lfs/lfs_vfsops.c: revision 1.206
sys/ufs/lfs/lfs_vnops.c: revision 1.170
sys/ufs/lfs/lfs_extern.h: revision 1.80
sys/ufs/lfs/lfs_segment.c: revision 1.176
sys/ufs/lfs/lfs_inode.c: revision 1.103 via patch
sys/ufs/lfs/lfs_alloc.c: revision 1.90
Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().
A couple of locking fixes are also included as well.
 1.64.2.6 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.200
sys/ufs/lfs/lfs_vnops.c: revision 1.164
sys/ufs/lfs/lfs_inode.c: revision 1.101
sys/ufs/lfs/lfs_extern.h: revision 1.78
sys/ufs/lfs/lfs.h: revision 1.100
Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.64.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.87
sys/ufs/lfs/lfs.h: revision 1.99
sys/ufs/lfs/lfs_vfsops.c: revision 1.199
sys/ufs/lfs/lfs_extern.h: revision 1.77 via patch
Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.64.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.158
sys/ufs/lfs/lfs_subr.c: revision 1.57
sys/ufs/lfs/lfs_segment.c: revision 1.171
sys/ufs/lfs/lfs.h: revision 1.97
sys/ufs/lfs/lfs_vfsops.c: revision 1.195
sys/ufs/lfs/lfs_extern.h: revision 1.76
Improvements to LFS's paging mechanism, to wit:
* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.
* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.64.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.64.2.2 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work que
ue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in the different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed by Chuck Silvers.
 1.64.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.69.2.5 21-Jan-2008  yamt sync with head
 1.69.2.4 07-Dec-2007  yamt sync with head
 1.69.2.3 03-Sep-2007  yamt sync with head.
 1.69.2.2 30-Dec-2006  yamt sync with head.
 1.69.2.1 21-Jun-2006  yamt sync with head.
 1.71.2.1 20-Oct-2005  yamt adapt ufs.
 1.73.2.1 15-Jan-2006  yamt sync with head.
 1.75.10.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.75.10.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.75.8.5 11-May-2006  elad sync with head
 1.75.8.4 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.75.8.3 03-May-2006  yamt wrap some decls with #ifdef _KERNEL. ok'ed by elad@.
 1.75.8.2 19-Apr-2006  elad sync with head.
 1.75.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.75.6.5 03-Sep-2006  yamt sync with head.
 1.75.6.4 11-Aug-2006  yamt sync with head
 1.75.6.3 24-May-2006  yamt sync with head.
 1.75.6.2 11-Apr-2006  yamt sync with head
 1.75.6.1 01-Apr-2006  yamt sync with head.
 1.75.4.2 01-Jun-2006  kardel Sync with head.
 1.75.4.1 22-Apr-2006  simonb Sync with head.
 1.75.2.1 09-Sep-2006  rpaulo sync with head
 1.83.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.87.12.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.87.8.2 07-May-2007  yamt sync with head.
 1.87.8.1 12-Mar-2007  rmind Sync with HEAD.
 1.87.6.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.88.4.1 11-Jul-2007  mjf Sync with head.
 1.88.2.4 20-Aug-2007  ad Sync with HEAD.
 1.88.2.3 15-Jul-2007  ad Sync with head.
 1.88.2.2 08-Jun-2007  ad Sync with head.
 1.88.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.90.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.91.12.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.91.12.1 31-Jul-2007  pooka file lfs_extern.h was added on branch matt-mips64 on 2007-07-31 21:14:21 +0000
 1.91.10.3 18-Feb-2008  mjf Sync with HEAD.
 1.91.10.2 27-Dec-2007  mjf Sync with HEAD.
 1.91.10.1 08-Dec-2007  mjf Sync with HEAD.
 1.91.4.1 09-Jan-2008  matt sync with HEAD
 1.91.2.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.91.2.1 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.92.2.4 26-Dec-2007  ad Sync with head.
 1.92.2.3 19-Dec-2007  ad Use a global lfs_lock.
 1.92.2.2 19-Dec-2007  ad Get lfs mostly working.
 1.92.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.93.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.94.10.2 04-May-2009  yamt sync with head.
 1.94.10.1 16-May-2008  yamt sync with head.
 1.94.8.1 18-May-2008  yamt sync with head.
 1.94.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.94.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.95.4.1 03-Jul-2008  simonb Sync with head.
 1.95.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.96.34.2 24-Feb-2012  mrg sync to -current.
 1.96.34.1 18-Feb-2012  mrg merge to -current.
 1.96.30.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.96.30.1 17-Apr-2012  yamt sync with head
 1.98.2.3 03-Dec-2017  jdolecek update from HEAD
 1.98.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.98.2.1 23-Jun-2013  tls resync from head
 1.99.4.1 23-Jul-2013  riastradh sync with HEAD
 1.99.2.1 28-Aug-2013  rmind sync with head
 1.101.6.5 28-Aug-2017  skrll Sync with HEAD
 1.101.6.4 09-Jul-2016  skrll Sync with HEAD
 1.101.6.3 22-Sep-2015  skrll Sync with HEAD
 1.101.6.2 06-Jun-2015  skrll Sync with HEAD
 1.101.6.1 06-Apr-2015  skrll Sync with HEAD
 1.111.10.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.113.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.113.4.1 10-Jun-2019  christos Sync with HEAD
 1.113.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.114.6.1 29-Feb-2020  ad Sync with head.
 1.114.4.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.160 23-Apr-2020  ad PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)

- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
somewhere. Use it to decide whether to do direct-mapped copy, rather than
poking around directly in the vnode in ubc_uiomove(), which is ugly and
doesn't work for tmpfs. It would be nicer to contain all this in UVM but
the filesystem provides the needed locking here (VV_MAPPED) and to
reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS(). Pass in UBC_ISMAPPED where
appropriate.
 1.159 23-Feb-2020  ad branches: 1.159.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.158 23-Feb-2020  riastradh In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do

lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter

which is the reverse of the lfs_writer -> lfs_seglock ordering.
 1.157 10-Jun-2017  maya branches: 1.157.6; 1.157.10; 1.157.12;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.156 05-Jun-2017  maya Correct confusion between i_flag and i_flags
These will have to be renamed.

Spotted by Riastradh, thanks!
 1.155 01-Apr-2017  maya branches: 1.155.6;
Simplify locking
 1.154 31-Mar-2017  maya stopgap fix- move lfs_lock to include calls to lfs_dino_{set,get}block

blocks new users that need seglock (need to take lfs_lock) that
setblock before the assert (truncate to 0 but 31 blks/31 effblks)

not proper, but lets me run firefox on lfs
 1.153 21-Mar-2017  maya Update mtime even if oip->i_size == length

PR kern/51762, LFS version.
 1.152 19-Mar-2017  riastradh Fix inadvertently reversed sense of comparisons.
 1.151 18-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT
 1.150 16-Mar-2017  maya actually cast to unsigned long long and use %llu. certainly not use hex (oops)
suggested by dh
 1.149 15-Mar-2017  maya print inode number in an assert I keep hitting and the adjacent one.
use PRIx64 for printing inode number elsewhere.
 1.148 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERTMSG
 1.147 01-Sep-2015  dholland branches: 1.147.2; 1.147.4;
Fix up indirect block handling in truncate to be 32/64 clean.
 1.146 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.145 19-Aug-2015  dholland Part two of dinodes; use the same union everywhere.
(previously the ufs-derived code had things set up slightly different)

Remove a bunch of associated mess.
 1.144 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.143 02-Aug-2015  dholland whoops, fix 32-bit build
 1.142 02-Aug-2015  dholland Make i_eff_nblks in the in-memory inode 64 bits wide.
 1.141 02-Aug-2015  dholland Fix assorted 64 -> 32 truncations in lfs. Also, some minor tidyups and
corrections in passing.
 1.140 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.139 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.138 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.137 16-Jul-2015  dholland Don't cast the return value of malloc.
 1.136 17-Oct-2013  christos branches: 1.136.6;
- remove unused variables
- add debug ifdefs for debugging variables
- __USE() where appropriate.
 1.135 28-Jul-2013  dholland Add more of the bits for supporting quotas.
 1.134 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.133 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.132 18-Jun-2013  christos branches: 1.132.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.131 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.130 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.129 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.128 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.127 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.126 23-Nov-2011  bouyer branches: 1.126.8;
If ufs_balloc_range() fails, make sure to call ?fs_truncate() to
reset v_writesize to the right value.
If v_writesize is left larger than the allocated blocks, we may have
the same issue as the one described in
http://mail-index.netbsd.org/tech-kern/2010/02/02/msg007156.html
 1.125 11-Jul-2011  hannken branches: 1.125.2;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Adresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.124 16-Jun-2011  hannken Rename uvm_vnp_zerorange(struct vnode *, off_t, size_t) to
ubc_zerorange(struct uvm_object *, off_t, size_t, int) changing
the first argument to an uvm_object and adding a flags argument.

Modify tmpfs_reg_resize() to zero the backing store (aobj) instead
of the vnode. Ubc_purge() no longer panics when unmounting tmpfs.

Keep uvm_vnp_zerorange() until the next kernel version bump.
 1.123 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.122 16-Feb-2010  mlelstv branches: 1.122.2; 1.122.8;
Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.121 07-Feb-2010  bouyer branches: 1.121.2;
- ufs_balloc_range(): on error, only PG_RELEASED the pages that were
allocated to extend the file to the new size. Releasing all pages
may release pages that contains previously-written data not yet flushed
to disk. Should fix PR kern/35704
- {ffs,lfs,ext2fs}_truncate(): Even if the inode's size is the same as
the new length, call uvm_vnp_setsize(). *_truncate() may have been
called by *_write() in the error path (e.g. block allocation failure
because of quota of file system full), and at this point v_writesize
has been set to the desired size of the file and not reverted to the
old size. Not adjusting v_writesize to the real size cause
genfs_do_io() to write to disk past the real end of the file.
 1.120 28-Apr-2008  martin branches: 1.120.10; 1.120.18;
Remove clause 3 and 4 from TNF licenses
 1.119 27-Mar-2008  ad branches: 1.119.2; 1.119.4;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.118 15-Feb-2008  ad branches: 1.118.6;
Give bbusy() an interlock argument. If the we need to wait for the buffer,
the interlock is dropped and reacquired when awoken. This allows for
busying buffers attached to a list that is not locked by bufcache_lock.
 1.117 15-Feb-2008  ad The buffer LOCKED flag need not be under the protection of bufcache_lock,
BUSY is enough.
 1.116 02-Jan-2008  ad Merge vmlocking2 to head.
 1.115 08-Dec-2007  pooka branches: 1.115.4;
Remove cn_lwp from struct componentname. curlwp should be used
from on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.114 10-Oct-2007  ad branches: 1.114.4; 1.114.6;
Fix DEBUG builds.
 1.113 10-Oct-2007  ad Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.112 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.111 10-Jul-2007  hannken branches: 1.111.6; 1.111.8; 1.111.10;
Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criterions, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitely calls getinoquota().

No objections on tech-kern@
 1.110 05-Jun-2007  yamt improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.109 16-May-2007  perseant Change references to SEGM_W_DIROPS to SEGM_CKP, and replace the logic that
formerly used SEGM_W_DIROPS in lfs_segwrite() appropriately. This prevents
a problem in which processes could get stuck in "buffers" sleep forever.
 1.108 18-Apr-2007  perseant Remember to write dirops when the vnode we are trying to flush is a dirop.
 1.107 04-Mar-2007  christos branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.106 14-Oct-2006  yamt branches: 1.106.4;
don't use g_glock directly.
 1.105 14-May-2006  elad branches: 1.105.8; 1.105.10;
integrate kauth.
 1.104 14-May-2006  christos Correct a bogus expression gcc4 found.
 1.103 30-Apr-2006  perseant Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are also included as well.
 1.102 19-Apr-2006  perseant Avoid a possible sign overflow condition in lfs_truncate, which would result
in a buffer overflow (underflow). Coverity CID 1521.
 1.101 08-Apr-2006  perseant Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.100 11-Dec-2005  christos branches: 1.100.4; 1.100.6; 1.100.8; 1.100.10; 1.100.12;
merge ktrace-lwp.
 1.99 11-Nov-2005  yamt - ignore truncation for VCHR/VBLK/VFIFO as it used to be
before yamt-vop merge. PR/32049 from Atsushi Onoe.
- reject setattr which attempts to change size of VLNK/VSOCK.
 1.98 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.97 12-Sep-2005  christos branches: 1.97.2;
Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.96 29-May-2005  christos branches: 1.96.2;
- sprinkle const
- avoid shadow variables.
 1.95 23-Apr-2005  perseant Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing. Tested in both
directions, and everything appears to work happily, but ymmv.
 1.94 16-Apr-2005  perseant Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
 1.93 16-Apr-2005  perseant Use lfs_malloc() to manage the blkiov arrays that the cleaner functions use,
since the cleaner is likely to operate in a low-memory condition.
 1.92 14-Apr-2005  perseant Keep track of the highest block held by an LFS inode, so that we can
be assured that the last byte of a file is always allocated. Previously
a file extension could cause the filesystem to be flushed, writing an
inconsistent inode to disk. Although this condition would be corrected
the next time blocks were written to disk, an intervening crash would leave
the filesystem in an inconsistent state, leaving fsck_lfs to complain
of an inode "partially truncated".
 1.91 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.90 08-Mar-2005  perseant branches: 1.90.2;
Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.89 26-Feb-2005  perry nuke trailing whitespace
 1.88 15-Aug-2004  mycroft branches: 1.88.4; 1.88.6;
Don't write out the extra zero pages with PGO_SYNCIO. We start an asynchronous
write anyway, and they will not be freed until that write is finished.
 1.87 15-Aug-2004  mycroft Copy the current partial-truncate logic from FFS. In the process, fix a
potential overrun when truncating a fragment.
 1.86 15-Aug-2004  mycroft Minor simplification to some arithmetic.
 1.85 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file systems returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.84 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.83 30-Mar-2004  oster If we bail out due to an error, we need 'unreserve' the space that
we'd reserved earlier.

Approved by: yamt
 1.82 25-Jan-2004  hannken branches: 1.82.2;
Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.81 30-Dec-2003  pk Replace the traditional buffer memory management -- based on fixed per buffer
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.

This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.

On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
 1.80 07-Nov-2003  yamt more assertion about file truncation to zero.
 1.79 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.78 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.77 29-Jun-2003  fvdl branches: 1.77.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.
 1.76 28-Jun-2003  darrenr Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.75 27-Apr-2003  yamt fix b_interlock lock/unlock mismatches.
 1.74 23-Apr-2003  perseant Make LFS work better (though still not "well") as an NFS-exported
filesystem (and other things that needed to be fixed before the tests
would complete), to wit:

* Include the fs ident in the filehandle; improve stale filehandle checks.

* Change definition of blksize() to use the on-dinode size instead of
the inode's i_size, so that fsck_lfs will work properly again.

* Use b_interlock in lfs_vtruncbuf.

* Postpone dirop reclamation until after the seglock has been released,
so that lfs_truncate is not called with the segment lock held.

* Don't loop in lfs_fsync(), just write everything and wait.

* Be more careful about the interlock/uobjlock in lfs_putpages: when we
lose this lock, we have to resynchronize dirtiness of pages in each
block.

* Be sure to always write indirect blocks and update metadata in
lfs_putpages; fixes a bug that caused blocks to be accounted to the
wrong segment.
 1.73 10-Apr-2003  simonb '#if 0' out a variable that is currently only used in other '#if 0'd out
code.
 1.72 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.71 20-Mar-2003  perseant Hold the segment lock during truncation to prevent indirect blocks from
being written by lfs_updatemeta while lfs_truncate is also writing them,
a bug pointed out by YAMAMOTO Takashi <yamt@netbsd.org>.
 1.70 08-Mar-2003  perseant Take away "#ifdef LFS_UBC".
 1.69 04-Mar-2003  perseant Don't force all truncations to be synchronous
 1.68 01-Mar-2003  perseant Be careful to always zero pages on truncation/fragment extension,
in the case where the filesystem block size is larger than PAGE_SIZE.
 1.67 28-Feb-2003  perseant Make lfs_truncate handle file extension correctly, in the LFS_UBC case.
 1.66 28-Feb-2003  perseant Quell a hasty panic in lfs_truncate: on-inode disk addresses can be
different between the beginning and end of the call.
 1.65 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.64 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.63 25-Jan-2003  fvdl The oldblks and newblks arrays are used to store direct copies of
on-disk block pointers, so they should be int32_t. Error found
by Izumi Tsutsui.
 1.62 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.61 28-Dec-2002  yamt - in lfs_reserve, vref vnodes that we're locking so that cleaner doesn't
try to reclaim them.
(workaround for deadlock noted in the comment in lfs_reserveavail)
- in lfs_rename, mark vnodes which are being moved as well as directry vnodes.
 1.60 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.59 06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.58 02-Jul-2002  yamt fix printf format for DEBUG_LFS.
 1.57 14-May-2002  perseant branches: 1.57.2;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.56 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.55 08-Nov-2001  lukem add RCSID
 1.54 06-Nov-2001  simonb Remove some variables that are set but never used.
 1.53 15-Sep-2001  chs branches: 1.53.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.52 13-Jul-2001  perseant branches: 1.52.2;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.51 30-May-2001  mrg branches: 1.51.2; 1.51.4;
use _KERNEL_OPT
 1.50 03-Dec-2000  perseant branches: 1.50.2;
Get rid of some old unnecessary code that cleared B_NEEDCOMMIT from buffers in
lfs_writeseg (possibly after they had been freed).

If MALLOCLOG is defined, make lfs_newbuf and lfs_freebuf pass along the
caller's file and line to _malloc and _free.
 1.49 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.48 27-Nov-2000  perseant If LFS_DO_ROLLFORWARD is defined, roll forward from the older checkpoint
on mount, through the newer checkpoint and on through any newer
partial-segments that may have been written but not checkpointed because
of an intervening crash.

LFS_DO_ROLLFORWARD is not defined by default.
 1.47 21-Nov-2000  perseant More locked_queue_* and lfs_avail accounting fixes from Jesse Off
<joff@gci-net.com>. Remove a specious btodb() in lfs_fragextend, and
count blocks shrunk or removed by VOP_TRUNCATE in lfs_avail.
 1.46 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.45 14-Oct-2000  perseant In lfs_truncate, don't overcount the real blocks removed from the inode,
when deallocating a fragment that has not made it to disk yet.

Also, during dirops, give the directory vnode an extra reference in
SET_DIROP, to ensure its continued existence during SET_ENDOP, preventing
a possible NULL-dereference there.

These two changes should close PR #11064.
 1.44 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.43 09-Sep-2000  perseant Make sure to unmark B_DELWRI on blocks freed due to truncation to a non-zero
file length. Should fix PR #s 10551 and 10831.
 1.42 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero as fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.41 04-Jul-2000  perseant Fix errors observed while trying to fill the filesystem with yesterday's
fixes:

- Write copies of bfree and avail in the CLEANERINFO block, so the
cleaner doesn't have to guess which superblock has the current
information (if indeed any do).

- Tighten up accounting of lfs_avail (more needs to be done).

- When cleansing indirect blocks of UNWRITTEN, make sure not to mark
them clean, since they'll need to be rewritten later.
 1.40 03-Jul-2000  perseant Allow the number of free segments reserved for the cleaner to be
parametrized in the filesystem, defaulting to MIN_FREE_SEGS = 2 but set
to something more reasonable at newfs_lfs time.

Note the number of blocks that have been scheduled for writing but which
are not yet on disk in an inode extension, i_lfs_effnblks. Move
i_ffs_effnlink out of the ffs extension and onto the main inode, since
it's used all over the shared code and the lfs extension would clobber
it.

At inode write time, indirect blocks and inode-held blocks of inodes
that have i_lfs_effnblks != i_ffs_blocks are cleansed of UNWRITTEN disk
addresses, so that these never make it to disk.
 1.39 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.38 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.37 31-May-2000  perseant branches: 1.37.2;
update for IN_ACCESSED changes
 1.36 13-May-2000  perseant branches: 1.36.2;
Change the sementics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be wait for, or not, accordingly.

Closes PR#8996.
 1.35 05-May-2000  perseant Change the way LFS does block accounting, from trying to infer from the
buffer cache flags, to marking the inode and/or indirect blocks with a
special disk address UNWRITTEN==-2 when a block is accounted for. (This
address is never written to disk, but only used in-core. This is essentially
the same method of block accounting as on the UBC branch, where the buffer
headers don't exist.) Make sure that truncation is handled properly,
especially in the case of holey files.

Fixes PR#9994.
 1.34 24-Apr-2000  perseant get rid of unused variable aflags
 1.33 23-Apr-2000  perseant Fix problems outlined in PR#9926:
- lfs_truncate extends the file if called with length > i_ffs_size;
- lfs_truncate errors out if called with length < 0;
- lfs_balloc block accounting corrected for the case of blocks read
into the cache before they exist on disk;
- mp->mnt_stat.f_iosize is initialized in lfs_mountfs.
 1.32 30-Mar-2000  augustss Remove register declarations.
 1.31 12-Mar-2000  bouyer lfs_truncate: handle synlinks with length > maxsymlink_len as regular files.
For symlinks > 60 chars we were bzero'ing part of (struct inode) past the
actual inode struct, corrupting memory following the current (struct inode)
resuling in a 'panic: pool_get(lfsinopl): free list modified' later.
This could also be the cause of random panics. With this fix LFS seems to be
useable for me now.
 1.30 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.29 16-Jan-2000  perseant Fix a problem in my changes of Dec 14th, that prevents removed vnodes
from being inactivated under some conditions. Removed vnodes are now
inactivated when the VDIROP flag is cleared, and to prevent block
accounting problems this clearing has been postponed until
lfs_segunlock.
 1.28 23-Nov-1999  fvdl Be more careful to block bio interrupts for some data structures. There
were at least a few missed cases where vp->v_{clean,dirty}blkhd were
unprotected since the softdep/trickle sync merge.
 1.27 03-Sep-1999  perseant branches: 1.27.2; 1.27.8;
Make changes that will allow an LFS filesystem to be used as the root
filesystem. In particular,

- Fix mknod deadlock, described in PR 8172.
- Enable lfs_mountroot.
- Make lfs_writevnodes treat filesystems mounted on lfs device nodes properly,
by flushing that device rather than trying to add blocks to the device inode.

This, in combination with lfs boot blocks, will allow operation of an all-lfs
system.
 1.26 15-Jun-1999  perseant Minor changes to the segment live bytes calculation. In particular, fixed
a bug in fragment extension that could run the count negative. Also, don't
overcount for inodes, and don't count segment summaries. Thus, for empty
segments the live bytes count should now be exactly zero.
 1.25 01-Jun-1999  perseant Fixed lfs_update (and related functions) so that calls from lfs_fsync
will DTRT with vnodes marked VDIROP. In particular, the message
"flushing VDIROP" will no longer appear, and the filesystem will remain
stable in the event of a crash.

This was particularly a problem with NFS-exported LFSes, since fsync
was called on every file close.
 1.24 12-Apr-1999  perseant Fix block counting during file truncation, if not truncating to zero.
 1.23 12-Apr-1999  perseant Make sure that the wakeup occurs for vnodes that lfs_update might be sleeping
on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which have dirty
buffers), by marking them with the appropriate flag if dirtybuffers were added
while the write was in progress.
 1.22 01-Apr-1999  perseant branches: 1.22.2;
Fix buffer handling problems in lfs_vinvalbuf
 1.21 29-Mar-1999  perseant lfs_truncate calls vinvalbuf to invalidate all currently-hald buffers, which
in turn forces a flush of the vnode, whether or not it is involved in a dirop.
(This can happen during a remove or rmdir, when the directory is shrunk.)
Because of the nature of dirops, however, flushing a vnode involved in a dirop
is disallowed (and was marked with a panic). This patch has lfs_truncate
call a specialized vinvalbuf that only invalidates buffers following the new
end-of-file, and thus does not require a flush. Also the panic is demoted,
in case I missed any other path to lfs_vflush.
 1.20 25-Mar-1999  perseant clean up unused/required #ifdefs
 1.19 24-Mar-1999  mrg completely remove Mach VM support. all that is left is the all the
header files as UVM still uses (most of) these.
 1.18 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.17 05-Mar-1999  mycroft Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.16 05-Mar-1999  mycroft Permit the access and modify time pointers passed to VOP_UPDATE to be null,
meaning the current time.
 1.15 10-Feb-1999  bouyer Make sure a buffer optained from bread() is always bresle()'d in case of
error. Closes PR kern/1448 from Wolfgang Solfrank.
 1.14 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.13 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.12 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.11 07-Feb-1998  chs add UVM stuff.
 1.10 04-Jul-1997  drochner Don't cast 64bit (off_t) file sizes to vm_offset_t (32bit on many
architectures), truncate them intelligently instead.
The truncation is done centralized in vnode_pager.c.
This prevents from wrap-over effects when parts of large (>2^32 byte) files
are mmapped.
Don't allow to mmap above the numerical range of vm_offset_t.
This is considered a temporary solution until the vm system handles the
object sizes/offsets more cleanly.
 1.9 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added an "union inode_ext" to struct inode, for the per-fs extentions.
For now only ext2fs uses it.
- i_din is now an union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the rigth
macro, depending on the time of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.8 12-Oct-1996  christos revert previous kprintf changes
 1.7 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.6 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.5 11-May-1996  mycroft Change VOP_UPDATE() semantics:
* Make 2nd and 3rd args timespecs, not timevals.
* Consistently pass a Boolean as the 4th arg (except in LFS).
Also, fix ffs_update() and lfs_update() to actually change the nsec fields.
 1.4 09-Feb-1996  christos lfs prototypes
 1.3 15-Jun-1995  cgd compensate for timeval/timespec/stat structure changes.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.22.2.5 20-Jan-2000  he Pull up revisions 1.29-1.30 (requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.22.2.4 15-Jan-2000  he Pull up revision 1.27 (requested by perseant):
Address problems related to using an LFS filesystem as the root
filesystem, including mknod hangs. Fixes PR#8172 and PR#9072.
 1.22.2.3 17-Dec-1999  he Pull up revision 1.25 (requested by perseant):
Avoid flushing vnodes involved in a dirop, making lfs' promise
of "no fsck needed, even in the event of a crash" closer to
reality.
 1.22.2.2 25-Jun-1999  perry pullup 1.25->1.26 (perseant)
 1.22.2.1 13-Apr-1999  perseant branches: 1.22.2.1.2; 1.22.2.1.4;
Pull-up of changes made to the trunk on Sunday [1.22->1.24], to wit:

Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.

Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).

Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed, then
immediately reloaded, their dinodes were located in an inode block which
was not on disk at the advertized location, nor in the cache (although it
would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.

Better checking for held inode locks in lfs_fastvget, for a number of
error conditions. Also change the default setting of lfs_clean_vnhead to
0, which seems to make the locking problems go away (although this is
difficult to test as I can't reliably reproduce them).

Make sure that the wakeup occurs for vnodes that lfs_update might be
sleeping on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which
have dirty buffers), by marking them with the appropriate flag if
dirtybuffers were added while the write was in progress.

Fix block counting during file truncation, if not truncating to zero.

Disallow threshold-initiated cache flush when dirops are active. Also,
make SET_ENDOP use lfs_check instead of inlining most of it.

Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).

Check the superblock version field, and refuse to mount the filesystem if
the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and
boundaries, etc., which would not affect existing fields in the
superblock, but which would drastically affect the filesystem, to be
smoothly integrated at a later date.
 1.22.2.1.4.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
referenre purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.22.2.1.2.3 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.22.2.1.2.2 11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.22.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.27.8.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.27.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.27.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.27.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.27.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.36.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.37.2.3 23-Mar-2001  he Pull up revisions 1.46-1.47 (via patch, requested by perseant):
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
This one got left out when the rest was pulled up. Sorry.
 1.37.2.2 01-Nov-2000  tv Pullup 1.45 [perseant, toshii]:
In lfs_truncate, don't overcount the real blocks removed from the inode,
when deallocating a fragment that has not made it to disk yet.

Also, during dirops, give the directory vnode an extra reference in
SET_DIROP, to ensure its continued existence during SET_ENDOP, preventing
a possible NULL-dereference there.

These two changes should close PR #11064.
 1.37.2.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
seperately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.50.2.9 29-Dec-2002  thorpej Sync with HEAD.
 1.50.2.8 18-Oct-2002  nathanw Catch up to -current.
 1.50.2.7 01-Aug-2002  nathanw Catch up to -current.
 1.50.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.50.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.50.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.50.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.50.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.50.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.51.4.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.51.4.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.51.4.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.51.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.51.4.1 03-Aug-2001  lukem update to -current
 1.51.2.3 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.51.2.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.51.2.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.52.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.53.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.57.2.1 15-Jul-2002  gehenna catch up with -current.
 1.77.2.9 11-Dec-2005  christos Sync with head.
 1.77.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.77.2.7 08-Mar-2005  skrll Sync with HEAD.
 1.77.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.77.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.77.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.77.2.3 25-Aug-2004  skrll Sync with HEAD.
 1.77.2.2 03-Aug-2004  skrll Sync with HEAD
 1.77.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.82.2.1 09-Apr-2004  jmc Pullup rev 1.83 (requested by oster in ticket #112)

If we bail out due to an error, we need 'unreserve' the space that
we'd reserved earlier.
 1.88.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.88.4.1 29-Apr-2005  kent sync with -current
 1.90.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.104
sys/ufs/lfs/lfs_vfsops.c: revision 1.206
sys/ufs/lfs/lfs_vnops.c: revision 1.170
sys/ufs/lfs/lfs_extern.h: revision 1.80
sys/ufs/lfs/lfs_segment.c: revision 1.176
sys/ufs/lfs/lfs_inode.c: revision 1.103 via patch
sys/ufs/lfs/lfs_alloc.c: revision 1.90
Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().
A couple of locking fixes are also included as well.
 1.90.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_inode.c: revision 1.102
Avoid a possible sign overflow condition in lfs_truncate, which would result
in a buffer overflow (underflow). Coverity CID 1521.
 1.90.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.200
sys/ufs/lfs/lfs_vnops.c: revision 1.164
sys/ufs/lfs/lfs_inode.c: revision 1.101
sys/ufs/lfs/lfs_extern.h: revision 1.78
sys/ufs/lfs/lfs.h: revision 1.100
Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.90.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.90.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.96.2.6 27-Feb-2008  yamt sync with head.
 1.96.2.5 21-Jan-2008  yamt sync with head
 1.96.2.4 27-Oct-2007  yamt sync with head.
 1.96.2.3 03-Sep-2007  yamt sync with head.
 1.96.2.2 30-Dec-2006  yamt sync with head.
 1.96.2.1 21-Jun-2006  yamt sync with head.
 1.97.2.2 29-Oct-2005  yamt use lfs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.97.2.1 20-Oct-2005  yamt adapt ufs.
 1.100.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.100.10.4 11-May-2006  elad sync with head
 1.100.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.100.10.2 19-Apr-2006  elad sync with head.
 1.100.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.100.8.2 24-May-2006  yamt sync with head.
 1.100.8.1 11-Apr-2006  yamt sync with head
 1.100.6.2 01-Jun-2006  kardel Sync with head.
 1.100.6.1 22-Apr-2006  simonb Sync with head.
 1.100.4.1 09-Sep-2006  rpaulo sync with head
 1.105.10.1 22-Oct-2006  yamt sync with head
 1.105.8.1 18-Nov-2006  ad Sync with head.
 1.106.4.3 17-May-2007  yamt sync with head.
 1.106.4.2 07-May-2007  yamt sync with head.
 1.106.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.107.4.1 11-Jul-2007  mjf Sync with head.
 1.107.2.7 28-Aug-2007  yamt make this compilable with DEBUG.
 1.107.2.6 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and needs to be verified as a whole.
 1.107.2.5 15-Jul-2007  ad Sync with head.
 1.107.2.4 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.107.2.3 09-Jun-2007  ad Sync with head.
 1.107.2.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.107.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.111.10.1 14-Oct-2007  yamt sync with head.
 1.111.8.3 23-Mar-2008  matt sync with HEAD
 1.111.8.2 09-Jan-2008  matt sync with HEAD
 1.111.8.1 06-Nov-2007  matt sync with HEAD
 1.111.6.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.111.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.114.6.5 26-Dec-2007  ad Sync with head.
 1.114.6.4 19-Dec-2007  ad Use a global lfs_lock.
 1.114.6.3 19-Dec-2007  ad Fix some more problems w/lfs on this branch.
 1.114.6.2 19-Dec-2007  ad Get lfs mostly working.
 1.114.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.114.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.115.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.118.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.118.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.119.4.2 11-Mar-2010  yamt sync with head
 1.119.4.1 16-May-2008  yamt sync with head.
 1.119.2.1 18-May-2008  yamt sync with head.
 1.120.18.1 21-Apr-2010  matt sync to netbsd-5
 1.120.10.2 25-Jan-2012  riz Pull up following revision(s) (requested by bouyer in ticket #1702):
sys/ufs/lfs/lfs_inode.c: revision 1.126
sys/ufs/ffs/ffs_inode.c: revision 1.108
If ufs_balloc_range() fails, make sure to call ?fs_truncate() to
reset v_writesize to the right value.
If v_writesize is left larger than the allocated blocks, we may have
the same issue as the one described in
http://mail-index.netbsd.org/tech-kern/2010/02/02/msg007156.html
 1.120.10.1 22-Feb-2010  snj Pull up following revision(s) (requested by bouyer in ticket #1302):
sys/ufs/ext2fs/ext2fs_inode.c: revision 1.71
sys/ufs/ffs/ffs_inode.c: revision 1.104
sys/ufs/lfs/lfs_inode.c: revision 1.121
sys/ufs/ufs/ufs_inode.c: revision 1.79
- ufs_balloc_range(): on error, only PG_RELEASED the pages that were
allocated to extend the file to the new size. Releasing all pages
may release pages that contains previously-written data not yet flushed
to disk. Should fix PR kern/35704
- {ffs,lfs,ext2fs}_truncate(): Even if the inode's size is the same as
the new length, call uvm_vnp_setsize(). *_truncate() may have been
called by *_write() in the error path (e.g. block allocation failure
because of quota of file system full), and at this point v_writesize
has been set to the desired size of the file and not reverted to the
old size. Not adjusting v_writesize to the real size cause
genfs_do_io() to write to disk past the real end of the file.
 1.121.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.122.8.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.122.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.125.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.125.2.2 23-Jan-2013  yamt sync with head
 1.125.2.1 17-Apr-2012  yamt sync with head
 1.126.8.4 03-Dec-2017  jdolecek update from HEAD
 1.126.8.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.126.8.2 23-Jun-2013  tls resync from head
 1.126.8.1 25-Feb-2013  tls resync with head
 1.132.2.2 18-May-2014  rmind sync with head
 1.132.2.1 28-Aug-2013  rmind sync with head
 1.136.6.2 28-Aug-2017  skrll Sync with HEAD
 1.136.6.1 22-Sep-2015  skrll Sync with HEAD
 1.147.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.147.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.147.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.155.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.157.12.1 29-Feb-2020  ad Sync with head.
 1.157.10.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.157.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.159.4.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.27 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.26 23-Mar-2022  andvar fix few typos for word "previous(ly)" in comments.
 1.25 23-Feb-2020  riastradh Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.
 1.24 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.23 10-Jun-2017  maya branches: 1.23.6; 1.23.10; 1.23.12;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.22 08-Jun-2017  chs move some buffer cache internals declarations from buf.h to vfs_bio.c.
this is needed to avoid name conflicts with ZFS and also
makes it clearer that other code shouldn't be messing with these.
remove the LFS debug code that poked around in bufqueues and
remove the BQ_EMPTY bufqueue since nothing uses it anymore.
provide a function to let LFS and wapbl read the value of nbuf for now.
 1.21 05-Jun-2017  maya Add an XXX about the missing flags so it's not buried in a commit
message.

now the XXX count for LFS is 260
 1.20 05-Jun-2017  maya Move definition of IN_ALLMOD near the flag it's a mask for.

Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
 1.19 06-Apr-2017  maya branches: 1.19.6;
don't guard lfs_sbactive or lfs_log with splbio, lfs_lock is plenty.
 1.18 06-Apr-2017  maya Provide a LFS_ENTER_LOG (__nothing) in the !DEBUG case.
so I can drop lots of #ifdef DEBUG around this macro. NFCI
 1.17 06-Apr-2017  maya Drop single use macro LFS_BCLEAN_LOG with an inlined implementation.

LFS_ENTER_LOG currently macro grabs lfs_lock, so I'd like to have just one
name for it.
 1.16 20-Jun-2016  dholland branches: 1.16.2; 1.16.4;
u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.15 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.14 20-Jun-2016  dholland ufs/inode.h -r1.67 is effectively merged into here too.
 1.13 20-Jun-2016  dholland Merge ufs/inode.h 1.66: remove i_hash from struct inode. This is the
hash table entry link from the old per-fs vnode cache and we don't
need it any more.
 1.12 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.11 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.10 19-Aug-2015  dholland Part two of dinodes; use the same union everywhere.
(previously the ufs-derived code had things set up slightly different)

Remove a bunch of associated mess.
 1.9 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.8 02-Aug-2015  dholland Make i_eff_nblks in the in-memory inode 64 bits wide.
 1.7 26-May-2014  ryoon branches: 1.7.4;
Close comments
 1.6 26-May-2014  dholland remove ffs-only IN_SPACECOUNTED
 1.5 18-Jun-2013  dholland branches: 1.5.2; 1.5.8; 1.5.10;
Tuck away a bunch of symbols that don't need to be public.
 1.4 09-Jun-2013  dholland Move struct lfs_inode_ext to lfs_inode.h; it doesn't need to be public.
 1.3 08-Jun-2013  dholland G/C another unneeded union
 1.2 08-Jun-2013  dholland Remove stale union and accessor macros.
 1.1 08-Jun-2013  dholland Split the definitions suitable for userland out of ulfs_inode.h into
lfs_inode.h. Since fsck_lfs, newfs_lfs, and lfs_cleanerd want to reuse
the inode structure for their own internal use, and some of them share
parts of the kernel code as well, the best way forward is to provide a
relatively sanitized header that doesn't bring in stray material.

Shuffle a few other definitions around so that lfs_inode.h depends
only on lfs.h.

Install lfs_inode.h into /usr/include.
 1.5.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.10.1 18-Jun-2013  yamt file lfs_inode.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.5.8.1 10-Aug-2014  tls Rebase.
 1.5.2.4 03-Dec-2017  jdolecek update from HEAD
 1.5.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.2.2 23-Jun-2013  tls resync from head
 1.5.2.1 18-Jun-2013  tls file lfs_inode.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.7.4.3 28-Aug-2017  skrll Sync with HEAD
 1.7.4.2 09-Jul-2016  skrll Sync with HEAD
 1.7.4.1 22-Sep-2015  skrll Sync with HEAD
 1.16.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.16.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.19.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.23.12.1 29-Feb-2020  ad Sync with head.
 1.23.10.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.23.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.20 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.19 01-Sep-2015  dholland branches: 1.19.10;
Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.18 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.17 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.16 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.15 08-Jun-2013  dholland branches: 1.15.10;
Tidy up the LFS userland build hacks.
Don't use -I${NETBSDSRCDIR}/sys; don't include files other than the
exported LFS headers, which are lfs.h, lfs_inode.h, and (for now)
lfs_extern.h.
 1.14 06-Jun-2013  dholland Cleanups and hacks to make lfs userland stuff build:
- lfs_cksum.c doesn't actually need ulfs_inode.h any more.
- neither does lfs_itimes.c.
- add hacks to fsck_lfs to make it compile.
- add hacks to newfs_lfs to make it compile.
- fix warning in ulfs_quota.c when quotas are fully disabled
(as I guess is happening with the rumpity version)

XXX: This commit adds -I${NETBSDSRCDIR}/sys to the Makefiles for
XXX: fsck_lfs, newfs_lfs, and lfs_cleanerd. This needs to be cleaned
XXX: up ASAP; but I consider this less problematic in the short term
XXX: than spewing ulfs_*.h into /usr/include.
 1.13 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.12 28-Apr-2008  martin branches: 1.12.34; 1.12.44;
Remove clause 3 and 4 from TNF licenses
 1.11 02-Jan-2008  ad branches: 1.11.6; 1.11.8; 1.11.10;
Merge vmlocking2 to head.
 1.10 23-Jun-2006  yamt branches: 1.10.14; 1.10.30; 1.10.36; 1.10.40; 1.10.44;
fix a simonb-timecounters regression.
the precision of getnanotime() is not suitable for file timestamps.
esp. when it's nfs-exported.

- introduce vfs_timestamp().
(the name is from freebsd. currently merely a wrapper of nanotime())
- for ufs-like filesystems, use it rather than getnanotime().

XXX check other filesystems.
 1.9 07-Jun-2006  kardel branches: 1.9.2; 1.9.4;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.8 15-May-2006  christos branches: 1.8.2;
we need <sys/kauth.h> for the kernel.
 1.7 15-May-2006  christos Don't include <sys/kauth.h>; breaks userland (newfs_lfs)
 1.6 14-May-2006  elad integrate kauth.
 1.5 19-Mar-2006  rtr don't bother checking of ts == NULL before assigning since we know that
it is.
solves coverity 2725 / run 6
 1.4 11-Dec-2005  christos branches: 1.4.4; 1.4.6; 1.4.8; 1.4.10; 1.4.12;
merge ktrace-lwp.
 1.3 30-Oct-2005  simonb branches: 1.3.2;
We don't need <sys/systm.h> here.
 1.2 13-Sep-2005  christos branches: 1.2.2;
redefine panic if we are a user program.
 1.1 13-Sep-2005  christos split out lfs_itimes(). It is used in fsck_lfs.
 1.2.2.1 02-Nov-2005  yamt sync with head.
 1.3.2.2 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.2.1 30-Oct-2005  skrll file lfs_itimes.c was added on branch ktrace-lwp on 2005-11-10 14:12:32 +0000
 1.4.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.4.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.4.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.4.10.1 19-Apr-2006  elad sync with head.
 1.4.8.3 26-Jun-2006  yamt sync with head.
 1.4.8.2 24-May-2006  yamt sync with head.
 1.4.8.1 01-Apr-2006  yamt sync with head.
 1.4.6.4 01-Jun-2006  kardel Sync with head.
 1.4.6.3 22-Apr-2006  simonb Sync with head.
 1.4.6.2 05-Feb-2006  simonb In the *itimes functions, just call getnanotime() at the start of
the function and use the result if needed, rather than the previous
conditional calls/assignments method. The code is clearer this way,
and benchmarks at about the same speed.
 1.4.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.4.4.1 09-Sep-2006  rpaulo sync with head
 1.8.2.1 19-Jun-2006  chap Sync with head.
 1.9.4.4 21-Jan-2008  yamt sync with head
 1.9.4.3 30-Dec-2006  yamt sync with head.
 1.9.4.2 21-Jun-2006  yamt sync with head.
 1.9.4.1 07-Jun-2006  yamt file lfs_itimes.c was added on branch yamt-lazymbuf on 2006-06-21 15:12:39 +0000
 1.9.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.10.44.1 02-Jan-2008  bouyer Sync with HEAD
 1.10.40.2 19-Dec-2007  ad Use a global lfs_lock.
 1.10.40.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.10.36.1 18-Feb-2008  mjf Sync with HEAD.
 1.10.30.1 09-Jan-2008  matt sync with HEAD
 1.10.14.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.11.10.1 16-May-2008  yamt sync with head.
 1.11.8.1 18-May-2008  yamt sync with head.
 1.11.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.12.44.2 03-Dec-2017  jdolecek update from HEAD
 1.12.44.1 23-Jun-2013  tls resync from head
 1.12.34.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.15.10.2 28-Aug-2017  skrll Sync with HEAD
 1.15.10.1 22-Sep-2015  skrll Sync with HEAD
 1.19.10.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.4 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.3 20-Jun-2016  dholland u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.2 12-Aug-2015  dholland Widen several of the fields of BLOCK_INFO to 64 bits.

Keep the old BLOCK_INFO as BLOCK_INFO_70, and version the fcntls that
use it.

Note that BLOCK_INFO_70 has 64-bit padding issues so that it's
different on 32-bit and 64-bit machines. This has been fixed. However,
BLOCK_INFO also contains a pointer, so compat32 stuff for 32-on-64 is
still needed and doesn't currently exist.
 1.1 28-Jul-2013  dholland branches: 1.1.2; 1.1.6; 1.1.10; 1.1.12;
Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.1.12.2 09-Jul-2016  skrll Sync with HEAD
 1.1.12.1 22-Sep-2015  skrll Sync with HEAD
 1.1.10.3 03-Dec-2017  jdolecek update from HEAD
 1.1.10.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.10.1 28-Jul-2013  tls file lfs_kernel.h was added on branch tls-maxphys on 2014-08-20 00:04:44 +0000
 1.1.6.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.6.1 28-Jul-2013  yamt file lfs_kernel.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.1.2.2 28-Aug-2013  rmind sync with head
 1.1.2.1 28-Jul-2013  rmind file lfs_kernel.h was added on branch rmind-smpnet on 2013-08-28 23:59:38 +0000
 1.27 11-Apr-2023  riastradh lfs: Assert page identity doesn't change.

Forgot what I was debugging when I inserted a relookup in my local
tree months or years ago, but whatever it was, if that solved a
problem, this KDASSERT will make the problem more obvious.
 1.26 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.25 17-Mar-2020  ad Tweak the March 14th change to make page waits interlocked by pg->interlock.
Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude
possible bugs.
 1.24 14-Mar-2020  ad Make uvm_pagemarkdirty() responsible for putting vnodes onto the syncer
work list. Proposed on tech-kern@.
 1.23 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.22 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.21 23-Feb-2020  riastradh Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.
 1.20 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.19 31-Dec-2019  ad branches: 1.19.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.18 20-Dec-2019  ad Fix lfs_putpages() for bsize < nbpg.
 1.17 15-Dec-2019  ad Merge from yamt-pagecache:

- do gang lookup of pages using radixtree.
- remove now unused uvm_object::uo_memq and vm_page::listq.queue.
 1.16 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.15 19-Aug-2017  maya branches: 1.15.4; 1.15.8;
Ask some question about the code in a XXX comment
 1.14 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.13 05-Jun-2017  maya Correct confusion between i_flag and i_flags
These will have to be renamed.

Spotted by Riastradh, thanks!
 1.12 04-Jun-2017  hannken Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than a hour ago.
 1.11 01-Apr-2017  maya branches: 1.11.6;
Switch lfs_writer_daemon to use condvar instead of mtsleep.
track thread existence with struct lwp instead of pid + lid,
it's more useful from ddb.
 1.10 30-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.9 04-Oct-2016  christos branches: 1.9.2;
Grr, the optimizer on mips64 can't handle this... Use MIN_PAGE_SIZE.
 1.8 21-Jul-2016  christos Don't do variable stack allocations for systems with non-const PAGE_SIZE;
instead assume that the smallest pagesize is 1024.
 1.7 12-Aug-2015  dholland branches: 1.7.2;
Make 32-bit and 64-bit versions of SEGSUM.
Also fix some of the FINFO handling as it's closely entangled.
 1.6 02-Aug-2015  dholland Second batch of 64 -> 32 truncations in lfs, along with more minor
tidyups and corrections in passing.
 1.5 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.4 25-Jul-2015  martin Use accessors in DEBUG and DIAGNOSTIC code as well
 1.3 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.2 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.1 16-May-2014  dholland branches: 1.1.2; 1.1.4; 1.1.8; 1.1.10;
Move lfs_getpages and lfs_putpages to their own file.
 1.1.10.4 28-Aug-2017  skrll Sync with HEAD
 1.1.10.3 05-Dec-2016  skrll Sync with HEAD
 1.1.10.2 05-Oct-2016  skrll Sync with HEAD
 1.1.10.1 22-Sep-2015  skrll Sync with HEAD
 1.1.8.3 03-Dec-2017  jdolecek update from HEAD
 1.1.8.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.8.1 16-May-2014  tls file lfs_pages.c was added on branch tls-maxphys on 2014-08-20 00:04:44 +0000
 1.1.4.2 10-Aug-2014  tls Rebase.
 1.1.4.1 16-May-2014  tls file lfs_pages.c was added on branch tls-earlyentropy on 2014-08-10 06:56:58 +0000
 1.1.2.2 18-May-2014  rmind sync with head
 1.1.2.1 16-May-2014  rmind file lfs_pages.c was added on branch rmind-smpnet on 2014-05-18 17:46:21 +0000
 1.7.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.7.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.7.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.9.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.11.6.2 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.11.6.1 04-Jun-2017  bouyer pullup the following revisions, requested by hannken in ticket #2:
src/share/man/man9/fstrans.9 1.25
src/sys/kern/vfs_mount.c 1.66
src/sys/kern/vfs_subr.c 1.468
src/sys/kern/vfs_trans.c 1.46
src/sys/kern/vfs_vnode.c 1.94, 1.95, 1.96
src/sys/kern/vnode_if.c 1.105, 1.106
src/sys/kern/vnode_if.sh 1.65, 1.66
src/sys/kern/vnode_if.src 1.76
src/sys/miscfs/genfs/genfs_io.c 1.69
src/sys/miscfs/genfs/genfs_vnops.c 1.196, 1.197
src/sys/miscfs/genfs/layer_extern.h 1.40
src/sys/miscfs/genfs/layer_vfsops.c 1.51
src/sys/miscfs/genfs/layer_vnops.c 1.67
src/sys/miscfs/nullfs/null_vnops.c 1.42
src/sys/miscfs/overlay/overlay_vnops.c 1.24
src/sys/miscfs/umapfs/umap_vnops.c 1.60
src/sys/rump/include/rump/rumpvnode_if.h 1.29, 1.30
src/sys/rump/librump/rumpkern/emul.c 1.182
src/sys/rump/librump/rumpvfs/rumpvnode_if.c 1.29, 1.30
src/sys/sys/fstrans.h 1.11
src/sys/sys/vnode.h 1.278
src/sys/sys/vnode_if.h 1.100, 1.101
src/sys/sys/vnode_impl.h 1.14, 1.15
src/sys/ufs/lfs/lfs_pages.c 1.12

Vnode state, lock and fstrans cleanup:
- Rename vnode state "VS_ACTIVE" to "VS_LOADED" and add synthetic
state "VS_ACTIVE" to assert a loaded vnode with usecount > 0.

- Redo FSTRANS in vnode_if.c and use it for VOP_LOCK and VOP_UNLOCK.

- Cleanup the genfs lock operations.

- Make "struct vnode_impl" member "vi_lock" a krwlock_t again.

- Remove the lock type argument from fstrans_start and
fstrans_start_nowait,
remove now unused FSTRANS state "FSTRANS_SUSPENDING".
 1.15.8.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.15.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.19.2.2 29-Feb-2020  ad Sync with head.
 1.19.2.1 17-Jan-2020  ad Sync with head.
 1.25 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted, for example).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), uses the a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.24 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.23 23-Feb-2020  riastradh Call lfs_orphan in lfs_rename while we're still in the dirop.
 1.22 10-Jun-2017  maya branches: 1.22.6; 1.22.10; 1.22.12;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.21 20-Jun-2016  dholland branches: 1.21.10;
One more batch of already-synced ufs changes:

ufs_extern.h 1.79 is equivalent to ulfs_extern.h 1.14
ufsmount.h 1.43 is (roughly) equivalent to lfs_extern.h 1.102
ufs_inode.c 1.94 does not apply to lfs
ufs_inode.c 1.95 does not apply to lfs either
ufs_readwrite.c 1.108 is equivalent to ulfs_readwrite.c 1.8
ufs_readwrite.c 1.109 is equivalent to ulfs_readwrite.c 1.9
ufs_readwrite.c 1.110 is equivalent to ulfs_readwrite.c 1.10
ufs_readwrite.c 1.111 does not apply to lfs
ufs_readwrite.c 1.112 is equivalent to ulfs_readwrite.c 1.11
ufs_readwrite.c 1.113 is equivalent to ulfs_readwrite.c 1.13
ufs_readwrite.c 1.114 is equivalent to ulfs_readwrite.c 1.14
ufs_readwrite.c 1.115 is equivalent to ulfs_readwrite.c 1.15
ufs_readwrite.c 1.116-1.118 does not apply to lfs
ufs_readwrite.c 1.119-1.120 are equivalent to ulfs_readwrite.c 1.16
ufs_rename.c 1.12 is equivalent to lfs_rename.c 1.8
ufs_vnops.c 1.226 is equivalent to ulfs_vnops.c 1.22 and lfs_vnops.c 1.270
ufs_vnops.c 1.227 is equivalent to ulfs_vnops.c 1.23
ufs_vnops.c 1.228-1.229 are equivalent to ulfs_vnops.c 1.24
ufs_vnops.c 1.230 is equivalent to ulfs_vnops.c 1.25 and lfs_vnops.c 1.271
ufs_vnops.c 1.231 originated in lfs
ufs_vnops.c 1.232 does not apply to lfs
 1.20 20-Jun-2016  dholland fix typo in previous
 1.19 20-Jun-2016  dholland Merge ufs_rename.c 1.11: ufs_gro_genealogy: use vcache_get() to lookup DOTDOT.
 1.18 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.17 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.16 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
 1.15 21-Sep-2015  dholland Oops; LFS_DIRECTSIZ() is going to need the fs as an argument.

Also, it turns out that dirhash needs a compile-time-constant version
of LFS_DIRECTSIZ(LFS_MAXNAMLEN+1), independent of 64-vs-32, so create
LFS_MAXDIRENTRYSIZE for this. Sigh.
 1.14 20-Sep-2015  dholland Clean up struct lfs_dirtemplate.
 1.13 15-Sep-2015  dholland Pass around struct lfs_dirheader instead of struct lfs_direct.
 1.12 15-Sep-2015  dholland Add an accessor function for directory names.
 1.11 15-Sep-2015  dholland Kill off ulfs_makedirentry; just pass the data to ulfs_direnter instead.
For now, move one copy of the code that allocates and fills in a
temporary struct lfs_direct to the top of ulfs_direnter; but it should
go away shortly.
 1.10 15-Sep-2015  dholland Add and use accessor functions for more of the directory entry fields.
 1.9 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.8 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.7 17-May-2014  dholland branches: 1.7.2; 1.7.6; 1.7.8;
Remove the DIROP macros. They are evil, especially the CREATE ones.

This results in some duplicate logic in the creation vnops (symlink,
mknod, create, mkdir) but we will probably be able to factor it out in
a more sensible way later.

Now the creation vnops call getnewvnode explicitly instead of under
multiple layers of obscure gunk. Then we explicitly do lfs_set_dirop,
and afterwards lfs_unset_dirop.
 1.6 06-Feb-2014  hannken branches: 1.6.2;
Move fstrans_start()/fstrans_done() into genfs_insane_rename() to protect
the complete rename operation like we do for all other vnode operations.
 1.5 28-Jan-2014  martin Quell a gcc 4.8 maybe-unitialized false positive
 1.4 28-Jul-2013  dholland branches: 1.4.2;
Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.3 28-Jul-2013  dholland Remove the now-pointless ulfs ops macros.
 1.2 20-Jul-2013  dholland branches: 1.2.2;
G/C unused pieces.
 1.1 20-Jul-2013  dholland Collect the pieces of lfs rename into lfs_rename.c, and sprinkle static.
 1.2.2.2 23-Jul-2013  riastradh sync with HEAD
 1.2.2.1 20-Jul-2013  riastradh file lfs_rename.c was added on branch riastradh-drm2 on 2013-07-23 21:07:38 +0000
 1.4.2.3 18-May-2014  rmind sync with head
 1.4.2.2 28-Aug-2013  rmind sync with head
 1.4.2.1 28-Jul-2013  rmind file lfs_rename.c was added on branch rmind-smpnet on 2013-08-28 23:59:38 +0000
 1.6.2.1 10-Aug-2014  tls Rebase.
 1.7.8.4 28-Aug-2017  skrll Sync with HEAD
 1.7.8.3 09-Jul-2016  skrll Sync with HEAD
 1.7.8.2 22-Sep-2015  skrll Sync with HEAD
 1.7.8.1 06-Apr-2015  skrll Sync with HEAD
 1.7.6.3 03-Dec-2017  jdolecek update from HEAD
 1.7.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.6.1 17-May-2014  tls file lfs_rename.c was added on branch tls-maxphys on 2014-08-20 00:04:44 +0000
 1.7.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.2.1 17-May-2014  yamt file lfs_rename.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.21.10.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.22.12.1 29-Feb-2020  ad Sync with head.
 1.22.10.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.22.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.40 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.39 14-Oct-2025  perseant Check the existing inode address against LFS_UNUSED_DADDR before checking
whether it is in the same segment, to prevent a byte undercount in segment 0
during roll forward. This was most often expressed in the fs/lfs/t_rfw rfw64
test case, though it affected both 32- and 64-bit LFSs equally.
 1.38 06-Oct-2025  perseant Don't stop recovery when we find a partial-segment with neither inodes nor
finfos. Under normal conditions, we should never be producing such a partial
segment. However, these do sometimes appear and they need not prevent us
from continuing.
 1.37 17-Sep-2025  perseant Add working in-kernel roll forward.
 1.36 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.35 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.34 01-Jan-2019  hannken branches: 1.34.6;
Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
 1.33 10-Dec-2018  maxv Remove unused mbuf.h includes.
 1.32 03-Oct-2015  dholland branches: 1.32.16; 1.32.18;
Use the new IINFO in the rfw code, eliminating hardwired 32-bit values.
 1.31 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.30 19-Aug-2015  dholland Part two of dinodes; use the same union everywhere.
(previously the ufs-derived code had things set up slightly different)

Remove a bunch of associated mess.
 1.29 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.28 12-Aug-2015  dholland Provide 32-bit and 64-bit versions of FINFO.

This also entailed sorting out part of struct segment, as that
contains a pointer into the current FINFO data.
 1.27 12-Aug-2015  dholland Make 32-bit and 64-bit versions of SEGSUM.
Also fix some of the FINFO handling as it's closely entangled.
 1.26 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.25 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.24 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.23 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.22 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.21 16-Jul-2015  dholland Don't cast the return value of malloc.
 1.20 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.19 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.18 28-Jul-2013  dholland branches: 1.18.6;
Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.17 18-Jun-2013  christos branches: 1.17.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.16 08-Jun-2013  dholland Stick LFS_ in front of IFMT, IFIFO, IFREG, etc. so as not to conflict
with the UFS copies of these symbols. (Which themselves ought to have
UFS_ stuck on.)
 1.15 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.14 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.13 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.12 22-Feb-2009  ad branches: 1.12.12; 1.12.22;
PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.11 16-May-2008  hannken branches: 1.11.6; 1.11.12;
Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.10 28-Apr-2008  martin branches: 1.10.2;
Remove clause 3 and 4 from TNF licenses
 1.9 02-Jan-2008  ad branches: 1.9.6; 1.9.8; 1.9.10;
Merge vmlocking2 to head.
 1.8 12-Dec-2007  he Make this build again, as part of sys/lkm/dev/vnd/:
- lfs_truncate() has lost its lwp argument.
- Cast from void* to char* before doing pointer arithmetic.
 1.7 12-Dec-2007  ad Fix a stray brelse() that got missed.
 1.6 12-Dec-2007  lukem defflag LFS_KERNEL_RFW (in opt_lfs.h).
Note: lfs_rfw.c doesn't compile if you define the option; locking API fallout?
 1.5 10-Oct-2007  ad branches: 1.5.4; 1.5.6; 1.5.8; 1.5.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.4 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.3 01-Sep-2006  perseant branches: 1.3.4; 1.3.10; 1.3.16; 1.3.30; 1.3.32; 1.3.34;
Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.2 20-Jul-2006  perseant branches: 1.2.4;
Oops, commit the correct version of lfs_rfw.c. The roll-forward functionality
is known not to work in this version (as it did not previously) but it should
at least compile.
 1.1 20-Jul-2006  perseant Separate the (non-working) LFS kernel roll-forward code into its own file,
lfs_rfw.c.
 1.2.4.3 03-Sep-2006  yamt sync with head.
 1.2.4.2 11-Aug-2006  yamt sync with head
 1.2.4.1 20-Jul-2006  yamt file lfs_rfw.c was added on branch yamt-pdpolicy on 2006-08-11 15:47:37 +0000
 1.3.34.1 14-Oct-2007  yamt sync with head.
 1.3.32.2 09-Jan-2008  matt sync with HEAD
 1.3.32.1 06-Nov-2007  matt sync with HEAD
 1.3.30.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.3.16.3 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and needs to be verified as a whole.
 1.3.16.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.3.16.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.3.10.4 21-Jan-2008  yamt sync with head
 1.3.10.3 27-Oct-2007  yamt sync with head.
 1.3.10.2 30-Dec-2006  yamt sync with head.
 1.3.10.1 01-Sep-2006  yamt file lfs_rfw.c was added on branch yamt-lazymbuf on 2006-12-30 20:51:01 +0000
 1.3.4.2 09-Sep-2006  rpaulo sync with head
 1.3.4.1 01-Sep-2006  rpaulo file lfs_rfw.c was added on branch rpaulo-netinet-merge-pcb on 2006-09-09 03:00:00 +0000
 1.5.10.2 02-Jan-2008  bouyer Sync with HEAD
 1.5.10.1 13-Dec-2007  bouyer Sync with HEAD
 1.5.8.1 13-Dec-2007  yamt sync with head.
 1.5.6.5 28-Dec-2007  ad Make it compile.
 1.5.6.4 26-Dec-2007  ad Sync with head.
 1.5.6.3 19-Dec-2007  ad Use a global lfs_lock.
 1.5.6.2 19-Dec-2007  ad Fix some more problems w/lfs on this branch.
 1.5.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.5.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.9.10.2 04-May-2009  yamt sync with head.
 1.9.10.1 16-May-2008  yamt sync with head.
 1.9.8.1 18-May-2008  yamt sync with head.
 1.9.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.10.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.11.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.11.6.1 03-Mar-2009  skrll Sync with HEAD.
 1.12.22.4 03-Dec-2017  jdolecek update from HEAD
 1.12.22.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.12.22.2 23-Jun-2013  tls resync from head
 1.12.22.1 25-Feb-2013  tls resync with head
 1.12.12.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.12.12.1 23-Jan-2013  yamt sync with head
 1.17.2.1 28-Aug-2013  rmind sync with head
 1.18.6.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.18.6.3 22-Sep-2015  skrll Sync with HEAD
 1.18.6.2 06-Jun-2015  skrll Sync with HEAD
 1.18.6.1 06-Apr-2015  skrll Sync with HEAD
 1.32.18.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.32.18.1 10-Jun-2019  christos Sync with HEAD
 1.32.16.2 18-Jan-2019  pgoyette Synch with HEAD
 1.32.16.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.34.6.1 17-Jan-2020  ad Sync with head.
 1.295 29-Oct-2025  perseant Use IINFOSIZE and LFS_BLKPTRSIZE, rather than sizeof(int32_t), to represent
the size of inode numbers and logical block numbers, respectively, in the
segment summary header. Prevents an overrun in LFS64.
 1.294 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.293 17-Sep-2025  perseant Add working in-kernel roll forward.
 1.292 17-Sep-2025  perseant Use a workqueue to handle the superblock callback.
 1.291 17-Sep-2025  perseant Add routines to check freelist consistency if compiled with DEBUG and
conditional on a kernel variable manipulated via sysctl.
Add checks before and after each routine that modifies the free list.
#if 0 a section of lfs_vfree() that was intended to keep the free list ordered
but instead corrupted it.
 1.290 04-Sep-2025  perseant Copy the flags from a full partial segment to its continuation, if
a continuation is necessary, so that partial-segment collections marked
with SS_DIROP|SS_CONT are properly completed wiht a partial-segment marked
SS_DIROP (without SS_CONT). Necessary for roll-forward.
 1.289 02-Sep-2025  perseant Use a workqueue to handle cluster iodone, rather than doing it in interrupt context.
 1.288 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.287 13-Aug-2020  riastradh Skip unlinked inodes.

They no longer matter on disk so we don't need to write anything out
for them.
 1.286 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.285 23-Feb-2020  riastradh Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):

(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.

(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:

(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.

(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
 1.284 23-Feb-2020  riastradh Change some cheap KDASSERT into KASSERT.
 1.283 22-Feb-2020  ad Make LFS/rump play nice with aiodoned removal.

PR kern/55004 (Hundreds of file system tests now fail on real hardware)
 1.282 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.281 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.280 08-Dec-2019  ad branches: 1.280.2;
Revert previous. No performance gain worth the potential headaches
with buffers in these contexts.
 1.279 08-Dec-2019  ad Avoid thundering herd: cv_broadcast(&bp->b_busy) -> cv_signal(&bp->b_busy)
 1.278 03-Sep-2018  riastradh branches: 1.278.4;
Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
 1.277 09-Jun-2018  zafer branches: 1.277.2;
Add missing b_cflags and b_oflags.
Ok dholland@
Addresses PR kern/42342 by Yoshihiro Nakajima
 1.276 06-Jun-2018  maya Remove duplicate ;
 1.275 20-Aug-2017  maya branches: 1.275.2;
XXX question our double-flushing of dirops
 1.274 26-Jul-2017  maya change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar

XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
 1.273 26-Jul-2017  maya Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
 1.272 15-Jun-2017  maya It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.

lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.

Fixes a lot of LFS deadlocks. PR kern/52301

Many thanks to dholland for help analyzing coredumps
 1.271 12-Jun-2017  maya Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
 1.270 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.269 06-Apr-2017  maya branches: 1.269.6;
don't guard lfs_sbactive or lfs_log with splbio, lfs_lock is plenty.
 1.268 06-Apr-2017  maya remove deprecated comment (and move it below assert)
there's no spl dance for I/O here.
 1.267 06-Apr-2017  maya Provide a LFS_ENTER_LOG (__nothing) in the !DEBUG case.
so I can drop lots of #ifdef DEBUG around this macro. NFCI
 1.266 06-Apr-2017  maya Drop single use macro LFS_BCLEAN_LOG with an inlined implementation.

LFS_ENTER_LOG currently macro grabs lfs_lock, so I'd like to have just one
name for it.
 1.265 01-Apr-2017  riastradh KASSERT(mutex_owned(vp->v_interlock)) in vnode iterator selector.
 1.264 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT

Replace some #if DEBUG by this too. DEBUG is only for expensive
assertions; these are not.
 1.263 19-Oct-2015  dholland branches: 1.263.2; 1.263.4;
improve some panic messages
 1.262 10-Oct-2015  dholland Fix minor bitrot in #if 0 or otherwise disabled code.
 1.261 10-Oct-2015  dholland Use accessors for some more indirect block manipulations.
 1.260 03-Oct-2015  dholland Use IINFO in lfs_writeinode().
(both the kernel and the userland copies)
 1.259 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.258 21-Aug-2015  hannken lfs_writevnodes: replace mnt_vnodelist traversal with vfs_vnode_iterator.
 1.257 19-Aug-2015  dholland Part two of dinodes; use the same union everywhere.
(previously the ufs-derived code had things set up slightly different)

Remove a bunch of associated mess.
 1.256 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.255 12-Aug-2015  dholland Provide 32-bit and 64-bit versions of FINFO.

This also entailed sorting out part of struct segment, as that
contains a pointer into the current FINFO data.
 1.254 12-Aug-2015  dholland Make 32-bit and 64-bit versions of SEGSUM.
Also fix some of the FINFO handling as it's closely entangled.
 1.253 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.252 12-Aug-2015  dholland Make 32-bit and 64-bit versions of CLEANERINFO.

XXX: while this is written to disk, it seems like much of it would
XXX: be better set up as a commpage shared with the cleaner.
 1.251 02-Aug-2015  dholland Pass the fs object to LFS_MAX_DADDR so it can check lfs_is64.

Remove some hackish intentional 64->32 truncations next to the checks
using LFS_MAX_DADDR, and tackle the problem they handled in bmap
instead.

The problem: the magic block pointer value UNWRITTEN has magic value
-2, and if it's not handled specifically, uint32 -> uint64 promotion
turns it into 4294967294, which then causes consternation and
monkeyhouse downstream.

What's here is still kind of a hack, but it's a step forward.
 1.250 02-Aug-2015  dholland Add a (draft) 64-bit superblock. Make things build again.

Add pieces of support for using both superblock types where
convenient, and specifically to the superblock accessors, but don't
actually enable it anywhere.

First substantive step on PR 50000.
 1.249 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.248 02-Aug-2015  dholland Make i_eff_nblks in the in-memory inode 64 bits wide.
 1.247 02-Aug-2015  dholland Fix catastrophic bug in lfs_rewind() that changed segment numbers
(lfs_curseg/lfs_nextseg in the superblock) using the wrong units.
These fields are for whatever reason the start addresses of segments
(measured in frags) rather than the segment numbers 0..n.

This only apparently affects dumping from a mounted fs; however, it
trashes the fs.

I would really, really like to have a static analysis tool that can
keep track of the units things are measured in, since fs code is full
of conversion macros and the macros are named inscrutable things like
"sntod" whose letters don't necessarily even correspond to the units
they convert. It is surprising that more of these are not wrong.
 1.246 02-Aug-2015  dholland Second batch of 64 -> 32 truncations in lfs, along with more minor
tidyups and corrections in passing.
 1.245 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.244 25-Jul-2015  martin Use accessors in DEBUG and DIAGNOSTIC code as well
 1.243 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.242 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.241 07-Jun-2015  hannken Fix copy and paste errors from last commits.
- Kernel i386/ALL and amd64/ALL compile again.
- Resolves CID 1304138 (DEADCODE) and 1304139 (IDENTICAL_BRANCHES).
 1.240 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.239 31-May-2015  hannken Use VFS_PROTOS() for lfs.
Rename conflicting struct lfs field "lfs_start" to "lfs_s0addr".

No functional change.
 1.238 20-Apr-2015  riastradh Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.
 1.237 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.236 24-Mar-2014  hannken branches: 1.236.4; 1.236.6;
- Make VI_XLOCK, VI_CLEAN and VI_LOCKSHARE private to kern/vfs_*.c.
- Make vwait() static.
- Add vdead_check() to check a vnode for being or becoming dead.

Discussed on tech-kern.

Welcome to 6.99.38
 1.235 18-Mar-2014  hannken Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37
 1.234 17-Mar-2014  hannken Change vismarker() to VI_MARKER for lfs_writevnodes().
This operation has to be changed to vfs_vnode_iterator.
 1.233 29-Oct-2013  hannken Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25
 1.232 17-Oct-2013  christos - remove unused variables
- add debug ifdefs for debugging variables
- __USE() where appropriate.
 1.231 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.230 18-Jun-2013  christos branches: 1.230.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.229 08-Jun-2013  dholland ulfs_dir.h has been emptied; remove it.
 1.228 08-Jun-2013  dholland Stick LFS_ in front of IFMT, IFIFO, IFREG, etc. so as not to conflict
with the UFS copies of these symbols. (Which themselves ought to have
UFS_ stuck on.)
 1.227 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.226 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.225 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.224 16-Feb-2012  perseant branches: 1.224.2;
Pass t_renamerace and t_rmdirrace tests.

Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.

Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.

Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace crates a lot of inode blocks on a tiny empty disk).
 1.223 02-Jan-2012  perseant branches: 1.223.2;

* Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.222 11-Jul-2011  hannken branches: 1.222.2; 1.222.6;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Adresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.221 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.220 03-Apr-2011  rmind branches: 1.220.2;
- Use offsetof() in VOPARG_OFFSETOF() instead of re-implementing it.
- Remove VDESC_NOMAP_VPP and VDESC_VPP_WILLRELE.
- Remove VRELEL_NOINACTIVE and VRELEL_ONHEAD.
 1.219 02-Apr-2011  rmind Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.
 1.218 23-Mar-2011  rmind G/C count_lock_queue (unused for 12 years)
 1.217 21-Jul-2010  hannken branches: 1.217.2;
Make holding v_interlock mandatory for callers of vget().

Announced some time ago on tech-kern.
 1.216 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.215 16-Feb-2010  mlelstv branches: 1.215.2;
Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.214 07-Aug-2009  wiz branches: 1.214.2;
Add missing parenthesis in #ifdef LFS_USE_B_INVAL.
From Henning Petersen in PR 41841.
 1.213 02-Jun-2008  ad branches: 1.213.8; 1.213.18; 1.213.22;
Use atomics to maintain v_usecount.
 1.212 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.211 28-Apr-2008  martin branches: 1.211.2;
Remove clause 3 and 4 from TNF licenses
 1.210 27-Mar-2008  ad branches: 1.210.2; 1.210.4;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.209 15-Feb-2008  ad branches: 1.209.6;
The buffer LOCKED flag need not be under the protection of bufcache_lock,
BUSY is enough.
 1.208 27-Jan-2008  pooka Replace vrelel() 010101-mania with a flags parameter. However,
leave flags unimplemented for a while (no change in functionality).
 1.207 02-Jan-2008  ad Merge vmlocking2 to head.
 1.206 10-Oct-2007  ad branches: 1.206.4; 1.206.6; 1.206.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.205 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.204 09-Aug-2007  pooka branches: 1.204.2; 1.204.4;
Instead of having lfs muck directly about with vnode free lists,
introduce vrele2(), which allows to release vnodes the way lfs
sometimes wants it:
+ without calling inactive
+ inserting the vnode at the head of the freelist (this is a very
questionable optimization that isn't even enabled by default,
but I went along with the same semantics for now)
 1.203 29-Jul-2007  ad branches: 1.203.4; 1.203.6;
It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.202 12-Jul-2007  rmind branches: 1.202.2;
Implementation of per-CPU work-queues support for workqueue(9) interface.
WQ_PERCPU flag for workqueue and additional argument for workqueue_enqueue()
to assign a CPU might be used. Notes:
- For now, the list is used for workqueue_queue, which is non-optimal,
and will be changed with array, where index would be CPU ID.
- The data structures should be changed to be cache-friendly.

Reviewed by: <yamt>, <tech-kern>
 1.201 30-Jun-2007  pooka Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.200 16-May-2007  perseant Change references to SEGM_W_DIROPS to SEGM_CKP, and replace the logic that
formerly used SEGM_W_DIROPS in lfs_segwrite() appropriately. This prevents
a problem in which processes could get stuck in "buffers" sleep forever.
 1.199 17-Apr-2007  perseant Install a new sysctl, vfs.lfs.ignore_lazy_sync, which causes LFS to ignore
the "smooth" syncer, as if vfs.sync.*delay = 0, but only for LFS. The
default is "on", i.e., ignore lazy sync.

Reduce the amount of polling/busy-waiting done by lfs_putpages(). To
accomplish this, copied genfs_putpages() and modified it to indicate which
page it was that caused it to return with EDEADLK. fsync()/fdatasync()
should no longer ever fail with EAGAIN, and should not consume huge
quantities of cpu.

Also, try to make dirops less likely to be written as the result of a
VOP_PUTPAGES(), while ensuring that they are written regularly.
 1.198 04-Mar-2007  christos branches: 1.198.2; 1.198.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.197 23-Feb-2007  perseant Reverse the order of searching the vnode list in lfs_writevnodes(). This
should speed up e.g. "chown -R" on LFS filesystems; e.g. it shows a 100%
increase in the 'seq_stat' column of bonnie++.
 1.196 21-Dec-2006  yamt branches: 1.196.2;
merge yamt-splraiseipl branch.

- finish implementing splraiseipl (and makeiplcookie).
http://mail-index.NetBSD.org/tech-kern/2006/07/01/0000.html
- complete workqueue(9) and fix its ipl problem, which is reported
to cause audio skipping.
- fix netbt (at least compilation problems) for some ports.
- fix PR/33218.
 1.195 16-Nov-2006  christos branches: 1.195.2; 1.195.4;
__unused removal on arguments; approved by core.
 1.194 20-Oct-2006  reinoud Replace the LIST structure mp->mnt_vnodelist to a TAILQ structure since all
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earlierst last.

An extra benefit is the removal of the ugly hack from the Berkly days on
LFS.

In the proces, i've also replaced the various variations hand written loops
by the TAILQ_FOREACH() macro's.
 1.193 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.192 04-Oct-2006  christos fix empty if
 1.191 28-Sep-2006  perseant Use lockstatus instead of a homebrewed locking system to control
LFCNWRAPSTOP and LFCNWRAPGO.

Be less verbose about the various looping checks: use log() rather than
printf(), and only log anything if we are really looping ("count = 2" is
not an error condition).

Allow dirops sleeping on available space to be interruptible.
 1.190 02-Sep-2006  christos branches: 1.190.2; 1.190.4;
remove impossible test
 1.189 01-Sep-2006  perseant Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.188 20-Jul-2006  perseant Note partial segments that are written by the cleaner, to help out the
roll-forward agent.
 1.187 20-Jul-2006  perseant Loop on the check for lfs_nowrap, so we don't allow a process to squeeze by.
 1.186 20-Jul-2006  perseant Don't try to write all the vnodes, when the cleaner needs a vnode to be
recycled.
 1.185 29-Jun-2006  perseant Don't wake up the cleaner if the filesystem is unwrappable, and fix the
compatibility fcntls.

Also includes one-line fixes for an MP locking bug and a zero-length FINFO
problem that manifested during testing.
 1.184 24-Jun-2006  perseant Change LFCNWRAP{STOP,GO} to make them more suitable for snapshotting; in
particular, the caller can now choose whether to wait for the condition
to be met, and if the caller of LFCNWRAPSTOP dies or otherwise closes
the descriptor, the filesystem is started again. Updated the ckckp
regression test to use the new semantics.

dump_lfs(8) now uses the fcntls to implement LFS-style snapshotting through
the -X flag, addressing PR#33457 albeit not using fss(4). Fixed a couple
other problems with dump_lfs that manifested themselves during testing.
 1.183 23-Jun-2006  yamt fix a simonb-timecounters regression.
the precision of getnanotime() is not suitable for file timestamps.
esp. when it's nfs-exported.

- introduce vfs_timestamp().
(the name is from freebsd. currently merely a wrapper of nanotime())
- for ufs-like filesystems, use it rather than getnanotime().

XXX check other filesystems.
 1.182 07-Jun-2006  kardel branches: 1.182.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.181 20-May-2006  perseant Fix a bug in which FINFOs were written with a version number of zero.
Add assertions and add this to the DEBUG fip test in lfs_writeseg.
 1.180 18-May-2006  perseant branches: 1.180.2;
Break out the finfo array manipulation code into two new functions,
lfs_acquire_finfo() and lfs_release_finfo(). Add a debugging check
for zero-length finfo arrays in the segment summary to avoid future
regressions.
 1.179 14-May-2006  elad integrate kauth.
 1.178 12-May-2006  perseant Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.

Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.177 01-May-2006  perseant Don't ever partially write dirops, even if we need the cleaner to run.
This increases the chances of the "no clean segments" panic slightly,
but allows us to run the ckckp regression test successfully to completion.
 1.176 30-Apr-2006  perseant Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are also included as well.
 1.175 22-Apr-2006  perseant Regression test improvements:

Move the stop for LFCNWRAPSTOP to the point at which writing at segment 0
is really about to commence, since this is what the test expects (and
incidentally what a snapshotting utility wants as well).

More correctly reconstruct the on-disk state at every checkpoint, rather
than relying on the entire state at the point of wrapping to be accurate
(that is only true the first time we wrap). Add a "make abort" target to
make rerunning the test more convenient when it has failed and we're done
analyzing the failure.
 1.174 17-Apr-2006  perseant Introduce two fcntl calls that freeze the filesystem right at the point
where segment 0 is being considered for writing. This allows for automated
checkpoint vailidity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.

Include a regression test that does such scanning.

When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small). Discovered by using the aforementioned
regression test.
 1.173 13-Apr-2006  perseant Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.

Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.172 07-Apr-2006  perseant Several minor bug fixes:

* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.171 24-Mar-2006  perseant Improvements to LFS's paging mechanism, to wit:

* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.

* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.170 17-Mar-2006  tls From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.169 04-Jan-2006  yamt branches: 1.169.2; 1.169.4; 1.169.6; 1.169.8; 1.169.10;
- add simple functions to allocate/free a buffer for i/o.
- make bufpool static.
 1.168 11-Dec-2005  christos branches: 1.168.2;
merge ktrace-lwp.
 1.167 26-Sep-2005  yamt always use nanotime rather than time.
it's bad to mix nanotime and time because it sometimes
make timestamps go backwards.
 1.166 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.165 19-Aug-2005  christos 64 bit inode changes.
 1.164 29-May-2005  christos branches: 1.164.2;
- sprinkle const
- avoid shadow variables.
 1.163 23-Apr-2005  perseant Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing. Tested in both
directions, and everything appears to work happily, but ymmv.
 1.162 19-Apr-2005  perseant Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.161 18-Apr-2005  perseant Check the to-be-on-disk consistency of directories as well (correct a typo
in an earlier commit).
 1.160 14-Apr-2005  perseant Keep track of the highest block held by an LFS inode, so that we can
be assured that the last byte of a file is always allocated. Previously
a file extension could cause the filesystem to be flushed, writing an
inconsistent inode to disk. Although this condition would be corrected
the next time blocks were written to disk, an intervening crash would leave
the filesystem in an inconsistent state, leaving fsck_lfs to complain
of an inode "partially truncated".
 1.159 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.158 08-Mar-2005  perseant branches: 1.158.2;
Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.157 26-Feb-2005  perry nuke trailing whitespace
 1.156 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
 1.155 18-Sep-2004  yamt branches: 1.155.4; 1.155.6;
change some members of struct buf from long to int.
ride on 2.0H.
 1.154 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.153 19-May-2004  yamt lfs_cluster_aiodone: turn an invariant condition into an assertion.
 1.152 09-Mar-2004  yamt branches: 1.152.4;
calculate data checksum inline.
 1.151 09-Mar-2004  yamt use correct segment size. this fixes memory corruption when using lfsv1.
 1.150 29-Jan-2004  yamt lfs_update_single: add an assertion.
 1.149 28-Jan-2004  yamt eliminate tricky usages of VOP_STRATEGY which are (no longer?) necessary.
 1.148 25-Jan-2004  hannken Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.147 10-Jan-2004  yamt store a i/o priority hint in struct buf for buffer queue discipline.
 1.146 17-Dec-2003  yamt set VBWAIT when waiting v_numoutput to be drained.
 1.145 17-Dec-2003  yamt remove a redundant substitution.
 1.144 04-Dec-2003  yamt use b_private rather than b_saveaddr.
XXX LFS_USE_B_INVAL
 1.143 07-Nov-2003  yamt - tweak lfs_update_single()'s prototype so that it can be used by
roll-forward code.
- reduce code duplication using the above in update_meta()
this also fixes fragment accounting.
 1.142 25-Oct-2003  christos Fix uninitialized variable warnings.
 1.141 18-Oct-2003  yamt be more strict about sa->vp.
(make sure the last lfs_updatemata in lfs_putpages takes effect.)
 1.140 18-Oct-2003  simonb Remove assigned-to but otherwise unused variable.
 1.139 17-Oct-2003  yamt add comments and tweak code a little for readability.
(no behaviour changes)
 1.138 14-Oct-2003  yamt remove a redundant definition of LFS_MAX_ACTIVE.
 1.137 08-Oct-2003  yamt - a comment.
- bcopy -> memcpy
- increase 'p' only when needed.
 1.136 03-Oct-2003  yamt assertions.
 1.135 03-Oct-2003  yamt reassignbuf() when lfs_writeseg() takes away B_DELWRI.
 1.134 03-Oct-2003  yamt when inactivating segments, compare segment numbers correctly.
 1.133 29-Sep-2003  yamt remove redundant prototypes.
 1.132 07-Sep-2003  yamt - buffer cache MP locks.
- avoid changing buffer state on the free queue.
 1.131 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.130 30-Jul-2003  yamt using normal bufcache buffer for cluster buffer head.
 1.129 23-Jul-2003  yamt KNF.
 1.128 12-Jul-2003  yamt - wrap long lines.
- remove a mysterious blank line.
 1.127 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.126 02-Jul-2003  yamt use queue.h macros.
 1.125 02-Jul-2003  yamt - add a new functions, lfs_writer_enter/leave, and use them instead of
duplicated code fragments.
- add an assertion.
 1.124 29-Jun-2003  fvdl branches: 1.124.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.
 1.123 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.122 28-Jun-2003  darrenr Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.121 18-May-2003  yamt make is_sequential a callback in order to achieve better lfs write clustering.

since lfs always rewrite blocks into the new segment,
current on-disk place of the block doesn't affect to write clustering.

ok'ed by Konrad Schroder.
 1.120 23-Apr-2003  perseant Make LFS work better (though still not "well") as an NFS-exported
filesystem (and other things that needed to be fixed before the tests
would complete), to wit:

* Include the fs ident in the filehandle; improve stale filehandle checks.

* Change definition of blksize() to use the on-dinode size instead of
the inode's i_size, so that fsck_lfs will work properly again.

* Use b_interlock in lfs_vtruncbuf.

* Postpone dirop reclamation until after the seglock has been released,
so that lfs_truncate is not called with the segment lock held.

* Don't loop in lfs_fsync(), just write everything and wait.

* Be more careful about the interlock/uobjlock in lfs_putpages: when we
lose this lock, we have to resynchronize dirtiness of pages in each
block.

* Be sure to always write indirect blocks and update metadata in
lfs_putpages; fixes a bug that caused blocks to be accounted to the
wrong segment.
 1.119 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.118 01-Apr-2003  yamt add assertions and a debug check.
 1.117 28-Mar-2003  fvdl The checkpoint loop always used (multiples of) lfs_sepb as the number
of segments to mark. However, this may be much more than lfs_nseg.

Originally this wasn't a big problem, since only the structures in the
diskblock were changed, but nowadays there's a mirror of the segflags
in the in-core superblock. This problem caused the code to walk
way past the end of that allocated area, causing memory corruption
in other kernel structures. So, use lfs_nseg as the maximum, as it should be.

While here, simplify the loop; it had become an obfuscated piece of
code overtime.
 1.116 28-Mar-2003  perseant Add a sleeper count, to prevent the cleaner from panicing the kernel
when the filesystem is unmounted, relocking the Ifile when its lock is
draining. (We can't use vfs_busy() since the process is sleeping for a
good long time.) Clean up / organize lfs.h, while I'm here.

In lfs_update_single, assert that disk addresses are either negative, or
are still positive when converted to int32_t, to prevent recurrence of a
negative/positive block problem.
 1.115 21-Mar-2003  perseant KNF (space after keywords).
 1.114 21-Mar-2003  perseant Use VONWORKLST as a heuristic for vnode emptiness, rather than exhaustively
checking the memq.

Take greater care not to dirty the Ifile vnode when unmounting the filesystem.
This should fix a "(vp->v_flag & VONWORKLST) == 0" assertion panic in vgonel
that could occur when unmounting.

Do not allow the Ifile to be mapped for writing.
 1.113 20-Mar-2003  yamt lfs_writevnodes:
in the case of "starting over", kick lfs_writeseg
in order to avoid deadlock in check_dirty.
 1.112 20-Mar-2003  perseant Don't break out of Ifile-writing loop in lfs_segwrite until nothing is left.
Note however that blocks can be added to the Ifile even when the segment
block is held because of inodes' atime. Do not panic with "dirty blocks"
if these blocks are present.
 1.111 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.110 15-Mar-2003  kristerw SO C requires a statement after a label.
 1.109 11-Mar-2003  perseant - Get rid of unused #ifdefs LFS_NO_PAGEMOVE and LFS_MALLOC_SUMMARY (both
always true) and accompanying dead code.

- When constructing write clusters in lfs_writeseg, if the block we are
about to add is itself a cluster from GOP_WRITE, don't put a cluster
in a cluster, just write the GOP_WRITE cluster on its own. This seems
to represent a slight performance gain on my test machine.

- Charge someone's rusage for writes on LFSes. It's difficult to tell
who the "right" process to charge is; just charge whoever triggered
the write.
 1.108 08-Mar-2003  perseant Take away "#ifdef LFS_UBC".
 1.107 08-Mar-2003  perseant Add an lfs_strategy() that checks to make sure we're not trying to read
where the cleaner is trying to write, instead of tying up the "live"
buffers (or pages).

Fix a bug in the LFS_UBC case where oversized buffers would not be
checksummed correctly, causing uncleanable segments.

Make sure that wakeup(fs->lfs_iocount) is done if fs->lfs_iocount is 1
as well as 0, since we wait in some places for it to drop to 1.

Activate all pages that make it into lfs_gop_write without the segment
lock held, since they must have been dirtied very recently, even if
PG_DELWRI is not set.
 1.106 04-Mar-2003  perseant Make sure we hold the uobjlock when checking for dirty pages, in lfs_vflush.
Note that pages can become dirty without our knowing it, anyway; don't
panic if that happens.
 1.105 02-Mar-2003  perseant Account SEGUSE_ACTIVE correctly so that the automatic segment cleaning
actually happens.

Add a new fcntl call that will write the minimum necessary to checkpoint
(i.e., for on-disk directory structure to be consistent, not including
updates to file data) so that the cleaner can clean segments more quickly
without sacrificing three-way commit for cleaning.
 1.104 23-Feb-2003  perseant Fix a buffer overflow bug in the LFS_UBC case that manifested itself
either as a mysterious UVM error or as "panic: dirty bufs". Verify
maximum size in lfs_malloc.

Teach lfs_updatemeta and lfs_shellsort about oversized cluster blocks from
lfs_gop_write.

When unwiring pages in lfs_gop_write, deactivate them, under the theory
that the pagedaemon wanted to free them last we knew.
 1.103 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.102 19-Feb-2003  yamt acquire v_interlock before calling VOP_PUTPAGES.
 1.101 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.100 05-Feb-2003  pk Make the buffer cache code MP-safe.
 1.99 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.98 29-Jan-2003  yamt don't use daddr_t for segment summary since it's an on-disk structure.
 1.97 29-Jan-2003  simonb Remove variable that is only assigned to but not referenced.
 1.96 27-Jan-2003  yamt make these compilable with lfs debug options.
(follow daddr_t change)

XXX maybe segment number should be 64bit.
 1.95 27-Jan-2003  kleink Further printf format fixes in the wake of daddr_t.

Note that PRI?64 and long long int arguments aren't made for each other,
nor are %lld and int64_t arguments.
 1.94 25-Jan-2003  kleink Fix further printf format warnings for DEBUG, in the wake of daddr_t
having changed.
 1.93 25-Jan-2003  tron Use PRId64 instead of hard coding "%lld" to fix build problems under
LP64 ports.
 1.92 25-Jan-2003  tron Fix printf() format strings problems caused by "daddr_t" change.
 1.91 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.90 08-Jan-2003  yamt backout wrong assertions that i added.
 1.89 08-Jan-2003  yamt add assertions.
 1.88 31-Dec-2002  yamt write ifile only when it has dirty buffers.
 1.87 17-Dec-2002  yamt no need for cleaner to hold vnode locks.
cleaner and normal vnode operations are synchronized enough by
seglock/fraglock and buf's B_BUSY-ness.
 1.86 17-Dec-2002  yamt use ufs_daddr_t instead of int where appropriate.
 1.85 14-Dec-2002  yamt in lfs_writefile, check v_type==VNON earlier.
to avoid null dereference with DEBUG_LFS_VERBOSE.
 1.84 13-Dec-2002  yamt save a segment write when doing checkpoint.
 1.83 12-Dec-2002  yamt correct DIAGNOSTIC code for duplicated inodes in a segment and su_nbytes.
 1.82 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.81 22-Sep-2002  jdolecek don't need <sys/conf.h> here
 1.80 06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.79 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.78 24-May-2002  perseant Fix a couple of instances where reassignbuf() was not done at splbio.

Tested on i386.
 1.77 23-May-2002  perseant Back out rev 1.174 of vfs_subr.c, because the splbio() wasn't protecting
enough to be useful, and broadening it so that it did would have meant
that operations possibly requiring synchronous disk activity would have
to be done in splbio(). This clearly was not going to work.

Worked around this in the LFS case by having lfs_cluster_callback put an
extra hold on the vnode before calling biodone(), and taking the hold
off without HOLDRELE's problematic list swapping. lfs_vunref() will take
care of that---in thread context---on the next write if need be.

Also, ensure that the list walking in lfs_{writevnodes,segunlock,gather}
takes into account the possibility that the list may change
underneath it (possibly because it itself deleted an element).

Tested on i386, test-compiled on alpha.
 1.76 20-May-2002  perseant branches: 1.76.2;
Protect v_freelist with splbio(), since HOLDRELE can be called in
interrupt context (through brelvp). (LFS may be the only subsystem
affected by this problem.)

Tested on i386.
 1.75 17-May-2002  perseant use macros from <sys/queue.h>
 1.74 14-May-2002  perseant branches: 1.74.2;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.73 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.72 08-Nov-2001  lukem add RCSID
 1.71 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.70 26-Jul-2001  jdolecek branches: 1.70.2; 1.70.4;
lfs_writeseg(): make el_size a size_t (cosmetic only, no functional change)
 1.69 13-Jul-2001  perseant Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.68 30-May-2001  mrg branches: 1.68.2; 1.68.4;
use _KERNEL_OPT
 1.67 09-Jan-2001  joff branches: 1.67.2;
If DIAGNOSTIC and the segment writer gets a badly sized buffer, panic()
instead of silently corrupting the filesystem.
 1.66 03-Dec-2000  perseant Get rid of some old unnecessary code that cleared B_NEEDCOMMIT from buffers in
lfs_writeseg (possibly after they had been freed).

If MALLOCLOG is defined, make lfs_newbuf and lfs_freebuf pass along the
caller's file and line to _malloc and _free.
 1.65 30-Nov-2000  jdolecek only include opt_ddb.h for !LKM
 1.64 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.63 27-Nov-2000  perseant If LFS_DO_ROLLFORWARD is defined, roll forward from the older checkpoint
on mount, through the newer checkpoint and on through any newer
partial-segments that may have been written but not checkpointed because
of an intervening crash.

LFS_DO_ROLLFORWARD is not defined by default.
 1.62 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.61 12-Nov-2000  perseant Do not needlessly dirty segment table blocks during lfs_segwrite,
preventing needless disk activity when the filesystem is idle. (PR #10979.)
 1.60 12-Nov-2000  toshii Fix obsolete comments in lfs_writeinode since rev. 1.27.
New comments are mostly from perseant, with my additions.
 1.59 09-Sep-2000  perseant oops
 1.58 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.57 09-Sep-2000  perseant Fix a buffer-cache corrupting bug in lfs_writeseg, where brelse could
be improperly used on an already-queued buffer.
 1.56 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero as fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.55 04-Jul-2000  perseant Fix errors observed while trying to fill the filesystem with yesterday's
fixes:

- Write copies of bfree and avail in the CLEANERINFO block, so the
cleaner doesn't have to guess which superblock has the current
information (if indeed any do).

- Tighten up accounting of lfs_avail (more needs to be done).

- When cleansing indirect blocks of UNWRITTEN, make sure not to mark
them clean, since they'll need to be rewritten later.
 1.54 03-Jul-2000  perseant i_lfs_effnblks fixes. Put debugging printfs under #ifdef DEBUG_LFS.
 1.53 03-Jul-2000  perseant Allow the number of free segments reserved for the cleaner to be
parametrized in the filesystem, defaulting to MIN_FREE_SEGS = 2 but set
to something more reasonable at newfs_lfs time.

Note the number of blocks that have been scheduled for writing but which
are not yet on disk in an inode extension, i_lfs_effnblks. Move
i_ffs_effnlink out of the ffs extension and onto the main inode, since
it's used all over the shared code and the lfs extension would clobber
it.

At inode write time, indirect blocks and inode-held blocks of inodes
that have i_lfs_effnblks != i_ffs_blocks are cleansed of UNWRITTEN disk
addresses, so that these never make it to disk.
 1.52 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.51 27-Jun-2000  perseant From John Evans <jevans@cray.com>: use datosn() to convert to segment
number, when remarking the current segment ACTIVE. See PR #10463.
 1.50 22-Jun-2000  perseant Update lfs_vunref for the fact that now a vnode can be locked with no
references (locked for VOP_INACTIVE at the end of vrele) and it's okay.
Check the return value of lfs_vref where appropriate.
Fixes PR #s 10285 and 10352.
 1.49 06-Jun-2000  perseant branches: 1.49.2;
Protect inode free list with seglock, instead of separate lock, so that
the head of the inode free list (on the superblock) always matches the
rest of the free list (in the ifile).

Protect lfs_fragextend with seglock, to prevent the segment byte count
fudging from making its way to disk.

Don't try to inactivate dirop vnodes that are still in the middle of
their dirop (may address PR#10285).
 1.48 31-May-2000  fredb Make this build. (Balance parenthesis.
 1.47 31-May-2000  perseant update for IN_ACCESSED changes
 1.46 27-May-2000  perseant branches: 1.46.2;
Prevent dirops from getting around lfs_check and wedging the buffer cache.
All the dirop vnops now mark the inodes with a new flag, IN_ADIROP, which
is removed as soon as the dirop is done (as opposed to VDIROP which stays
until the file is written). To address one issue raised in PR#9357.
 1.45 19-May-2000  thorpej NULL != 0
 1.44 10-May-2000  perseant stop vnode reference leak introduced in patch to PR#9994
 1.43 05-May-2000  perseant Change the way LFS does block accounting, from trying to infer from the
buffer cache flags, to marking the inode and/or indirect blocks with a
special disk address UNWRITTEN==-2 when a block is accounted for. (This
address is never written to disk, but only used in-core. This is essentially
the same method of block accounting as on the UBC branch, where the buffer
headers don't exist.) Make sure that truncation is handled properly,
especially in the case of holey files.

Fixes PR#9994.
 1.42 30-Mar-2000  augustss Remove register declarations.
 1.41 13-Mar-2000  soren Fix doubled 'the's in comments.
 1.40 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.39 16-Jan-2000  perseant Fix a problem in my changes of Dec 14th, that prevents removed vnodes
from being inactivated under some conditions. Removed vnodes are now
inactivated when the VDIROP flag is cleared, and to prevent block
accounting problems this clearing has been postponed until
lfs_segunlock.
 1.38 14-Jan-2000  perseant Better handling of various combinations of cleaning, vnode flushing, and
dirop writing. In particular, lfs_writevnodes now writes all buffers from
a flushed vnode whether cleaning or not, and the same with the Ifile; and
lfs_segwrite does not attempt to write data from other non-cleaning vnodes,
even if a vnode is being flushed.
 1.37 03-Dec-1999  perseant Handle the case of a vnode flush while dirops are active correctly in
lfs_segwrite. Also, make sure a flush is called in SET_DIROP before sleeping
on its results. Addresses PR #8863.
 1.36 17-Nov-1999  perseant Fix spllevel problem with superblock exclusion and with segment write throttle.
May address PR#8383.
 1.35 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that is has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.34 12-Nov-1999  perseant Back out my patch of the 8th (to address unreferenced inode problem).
Apparently this needs more thought.
 1.33 09-Nov-1999  perseant If ifile blocks were written before dirops were complete, and then the
system crashed, inodes could be allocated that were not referenced. (Though
not a serious problem, it evidences itself in phase 4 of fsck_lfs.) Fix
this by marking if_daddr with UNASSIGNED before the inodes are actually
written; at mount time the ifile is checked for UNASSIGNED entries and
any that are found are linked back into the free list. (The latter
functionality should move into the roll-forward agent when it materializes.)
 1.32 06-Nov-1999  perseant branches: 1.32.2;
Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.31 01-Oct-1999  mycroft branches: 1.31.2; 1.31.4; 1.31.6;
Fix printf() formats.
 1.30 03-Sep-1999  perseant Make changes that will allow an LFS filesystem to be used as the root
filesystem. In particular,

- Fix mknod deadlock, described in PR 8172.
- Enable lfs_mountroot.
- Make lfs_writevnodes treat filesystems mounted on lfs device nodes properly,
by flushing that device rather than trying to add blocks to the device inode.

This, in combination with lfs boot blocks, will allow operation of an all-lfs
system.
 1.29 08-Jul-1999  wrstuden Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parrent.
 1.28 17-Jun-1999  tls squash some compiler warnings on debug printfs by casting to int
 1.27 15-Jun-1999  perseant Minor changes to the segment live bytes calculation. In particular, fixed
a bug in fragment extension that could run the count negative. Also, don't
overcount for inodes, and don't count segment summaries. Thus, for empty
segments the live bytes count should now be exactly zero.
 1.26 12-Apr-1999  perseant Make sure that the wakeup occurs for vnodes that lfs_update might be sleeping
on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which have dirty
buffers), by marking them with the appropriate flag if dirtybuffers were added
while the write was in progress.
 1.25 12-Apr-1999  perseant Better checking for held inode locks in lfs_fastvget, for a number of error
conditions. Also change the default setting of lfs_clean_vnhead to 0, which
seems to make the locking problems go away (although this is difficult to
test as I can't reliably reproduce them).
 1.24 12-Apr-1999  perseant Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed,
then immediately reloaded, their dinodes were located in an inode block
which was not on disk at the advertized location, nor in the cache (although
it would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.
 1.23 30-Mar-1999  perseant branches: 1.23.2;
Add initialization to quell compiler warning (only on some platforms?)
 1.22 30-Mar-1999  perseant Move variable initialization to the top of lfs_vflush
 1.21 29-Mar-1999  perseant lfs_truncate calls vinvalbuf to invalidate all currently-hald buffers, which
in turn forces a flush of the vnode, whether or not it is involved in a dirop.
(This can happen during a remove or rmdir, when the directory is shrunk.)
Because of the nature of dirops, however, flushing a vnode involved in a dirop
is disallowed (and was marked with a panic). This patch has lfs_truncate
call a specialized vinvalbuf that only invalidates buffers following the new
end-of-file, and thus does not require a flush. Also the panic is demoted,
in case I missed any other path to lfs_vflush.
 1.20 25-Mar-1999  perseant Make sysctl variable lfs_clean_vnhead do what it was supposed to do,
namely, toggle whether vnodes loaded only for cleaning (as opposed to
normal filesystem use) are freed to the *head* of the vnode free list,
rather than the tail. This should avoid a possible cache flushing
effect, if the cleaner cleans a segment containing a large number of
live inodes.
 1.19 25-Mar-1999  perseant Fixes to make dirops and lfs_vflush play together well. In particular,
if we are short on vnodes, lfs_vflush from another process can grab a
vnode that lfs_markv has already processed but not yet written; but
lfs_markv holds the seglock. When lfs_vflush gets around to writing it,
the context for copyin is gone. So, now lfs_markv calls copyin itself,
rather than having lfs_writeseg do it.
 1.18 25-Mar-1999  perseant Lock buffers with B_BUSY between data checksum calculation and write, so
some other process doesn't change the data after it was checksummed.
 1.17 25-Mar-1999  perseant Change lfs_sb_cksum to use offsetof() instead of an inlined version.

Fix lfs_vref/lfs_vunredf to ignore VXLOCKed vnodes that are also being
flushed.

Improve the debugging messages somewhat.
 1.16 25-Mar-1999  perseant clean up unused/required #ifdefs
 1.15 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.14 09-Nov-1998  mycroft GC the B_CACHE bit.
 1.13 23-Oct-1998  thorpej Use DINODE_SIZE rather than sizeof(struct dinode).
 1.12 11-Sep-1998  pk PR#6032: define fixed sized on-disk superblock structure.
 1.11 08-May-1998  kleink Fix some arithmetics lossage on typeless pointers.
 1.10 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.9 13-Jun-1997  pk TIMESPEC_TO_TIMEVAL => TIMEVAL_TO_TIMESPEC
 1.8 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added an "union inode_ext" to struct inode, for the per-fs extentions.
For now only ext2fs uses it.
- i_din is now an union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the rigth
macro, depending on the time of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.7 12-Oct-1996  christos revert previous kprintf changes
 1.6 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.5 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.4 09-Feb-1996  christos lfs prototypes
 1.3 21-Aug-1994  cgd C syntax fix, and syscall args style (For later.)
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.23.2.10 20-Jan-2000  he Pull up revision 1.39 (requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.23.2.9 15-Jan-2000  he Pull up revision 1.38 (requested by perseant):
Handle flushing a vnode during cleaning, and cleaning the Ifile,
more correctly, avoiding possible disk corruption in some cases.
 1.23.2.8 15-Jan-2000  he Pull up revision 1.30 (requested by perseant):
Address problems related to using an LFS filesystem as the root
filesystem, including mknod hangs. Fixes PR#8172 and PR#9072.
 1.23.2.7 18-Dec-1999  he Pull up revision 1.37 (requested by perseant):
Handle the case of a vnode flush while dirops are active correctly
in lfs_segwrite. Also, make sure a flush is called in SET_DIROP
before sleeping on its results. Addresses PR#8863.
 1.23.2.6 17-Dec-1999  he Pull up revision 1.32 (requested by perseant):
Address locking protocol error for inode hash, and make the
maximum number of active dirops a global quantity.
 1.23.2.5 16-Dec-1999  he Pull up revision 1.36 (requested by perseant):
Fix spllevel problem with superblock exclusion and with write
throttle. Addressess PR#8383.
 1.23.2.4 10-Oct-1999  cgd pull up rev 1.31 from trunk (requested by mycroft):
Fix potential overflow of v_usecount and v_writecount (and panics
resulting from this) by widening them to `long'. Mostly affects
systems where maxvnodes>=32768.
 1.23.2.3 03-Sep-1999  he Pull up revision 1.28:
Fix a printf format bug that gives compiler warnings/errors on
64-bit platforms, fixing PR#8241. (perseant)
 1.23.2.2 25-Jun-1999  perry pullup 1.26->1.27 (perseant)
 1.23.2.1 13-Apr-1999  perseant branches: 1.23.2.1.2; 1.23.2.1.4;
Pull-up of changes made to the trunk on Sunday [1.23->1.26], to wit:

Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.

Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).

Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed, then
immediately reloaded, their dinodes were located in an inode block which
was not on disk at the advertized location, nor in the cache (although it
would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.

Better checking for held inode locks in lfs_fastvget, for a number of
error conditions. Also change the default setting of lfs_clean_vnhead to
0, which seems to make the locking problems go away (although this is
difficult to test as I can't reliably reproduce them).

Make sure that the wakeup occurs for vnodes that lfs_update might be
sleeping on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which
have dirty buffers), by marking them with the appropriate flag if
dirtybuffers were added while the write was in progress.

Fix block counting during file truncation, if not truncating to zero.

Disallow threshold-initiated cache flush when dirops are active. Also,
make SET_ENDOP use lfs_check instead of inlining most of it.

Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).

Check the superblock version field, and refuse to mount the filesystem if
the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and
boundaries, etc., which would not affect existing fields in the
superblock, but which would drastically affect the filesystem, to be
smoothly integrated at a later date.
 1.23.2.1.4.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
referenre purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.23.2.1.2.4 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.23.2.1.2.3 02-Aug-1999  thorpej Update from trunk.
 1.23.2.1.2.2 21-Jun-1999  thorpej Correct a printf format now that vnode flags are an int (in the uvm_vnode
structure).
 1.23.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.31.6.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.31.6.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.31.4.2 15-Nov-1999  fvdl Sync with -current
 1.31.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.31.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.31.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.31.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.31.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.32.2.2 06-Nov-1999  perseant Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.32.2.1 06-Nov-1999  perseant file lfs_segment.c was added on branch comdex-fall-1999 on 1999-11-06 20:33:06 +0000
 1.46.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.49.2.4 03-Feb-2001  he Pull up revisions 1.60-1.62 (requested by perseant):
o Don't write anything if the filesystem is idle (PR#10979).
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
 1.49.2.3 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
seperately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.49.2.2 28-Jun-2000  perseant pull up active current segment patch from trunk
 1.49.2.1 22-Jun-2000  perseant Pull up lfs_vunref fix from the trunk.
 1.67.2.13 08-Jan-2003  thorpej Oh my aching HEAD.
 1.67.2.12 08-Jan-2003  thorpej Sync with HEAD.
 1.67.2.11 03-Jan-2003  thorpej Sync with HEAD.
 1.67.2.10 19-Dec-2002  thorpej Sync with HEAD.
 1.67.2.9 18-Oct-2002  nathanw Catch up to -current.
 1.67.2.8 01-Aug-2002  nathanw Catch up to -current.
 1.67.2.7 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc) : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.67.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.67.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.67.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.67.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.67.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.67.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.68.4.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.68.4.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.68.4.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.68.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.68.4.1 03-Aug-2001  lukem update to -current
 1.68.2.3 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.68.2.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.68.2.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.70.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.70.2.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecurally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.74.2.3 15-Jul-2002  gehenna catch up with -current.
 1.74.2.2 20-Jun-2002  gehenna catch up with -current.
 1.74.2.1 30-May-2002  gehenna Catch up with -current.
 1.76.2.3 20-Jun-2002  lukem Pull up revision 1.79 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.76.2.2 02-Jun-2002  tv Pull up revision 1.78 (requested by perseant in ticket #135):
Fix a couple of instances where reassignbuf() was not done at splbio.
Tested on i386.
 1.76.2.1 02-Jun-2002  tv Pull up revision 1.77 (requested by perseant in ticket #132):
Back out rev 1.174 of vfs_subr.c, because the splbio() wasn't protecting
enough to be useful, and broadening it so that it did would have meant
that operations possibly requiring synchronous disk activity would have
to be done in splbio(). This clearly was not going to work.
Worked around this in the LFS case by having lfs_cluster_callback put an
extra hold on the vnode before calling biodone(), and taking the hold
off without HOLDRELE's problematic list swapping. lfs_vunref() will take
care of that---in thread context---on the next write if need be.
Also, ensure that the list walking in lfs_{writevnodes,segunlock,gather}
takes into account the possibility that the list may change
underneath it (possibly because it itself deleted an element).
Tested on i386, test-compiled on alpha.
 1.124.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.124.2.9 08-Mar-2005  skrll Sync with HEAD.
 1.124.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.124.2.7 24-Sep-2004  skrll Sync with HEAD.
 1.124.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.124.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.124.2.4 25-Aug-2004  skrll Sync with HEAD.
 1.124.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.124.2.2 03-Aug-2004  skrll Sync with HEAD
 1.124.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.152.4.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.155.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.155.4.1 29-Apr-2005  kent sync with -current
 1.158.2.13 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.158.2.12 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.93
sys/ufs/lfs/lfs.h: revision 1.106
sys/ufs/lfs/lfs_vfsops.c: revision 1.209
sys/ufs/lfs/lfs_vnops.c: revision 1.175
sys/ufs/lfs/lfs_segment.c: revision 1.178
Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.
Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.158.2.11 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.171
sys/ufs/lfs/lfs_extern.h: revision 1.81
sys/ufs/lfs/lfs_segment.c: revision 1.177
Don't ever partially write dirops, even if we need the cleaner to run.
This increases the chances of the "no clean segments" panic slightly,
but allows us to run the ckckp regression test successfully to completion.
 1.158.2.10 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.104
sys/ufs/lfs/lfs_vfsops.c: revision 1.206
sys/ufs/lfs/lfs_vnops.c: revision 1.170
sys/ufs/lfs/lfs_extern.h: revision 1.80
sys/ufs/lfs/lfs_segment.c: revision 1.176
sys/ufs/lfs/lfs_inode.c: revision 1.103 via patch
sys/ufs/lfs/lfs_alloc.c: revision 1.90
Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().
A couple of locking fixes are also included as well.
 1.158.2.9 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_segment.c: revision 1.175
Regression test improvements:
Move the stop for LFCNWRAPSTOP to the point at which writing at segment 0
is really about to commence, since this is what the test expects (and
incidentally what a snapshotting utility wants as well).
More correctly reconstruct the on-disk state at every checkpoint, rather
than relying on the entire state at the point of wrapping to be accurate
(that is only true the first time we wrap). Add a "make abort" target to
make rerunning the test more convenient when it has failed and we're done
analyzing the failure.
 1.158.2.8 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.103
sys/ufs/lfs/lfs_segment.c: revision 1.174
sys/ufs/lfs/lfs_vnops.c: revision 1.168
Introduce two fcntl calls that freeze the filesystem right at the point
where segment 0 is being considered for writing. This allows for automated
checkpoint vailidity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.
Include a regression test that does such scanning.
When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small). Discovered by using the aforementioned
regression test.
 1.158.2.7 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.102
sys/ufs/lfs/lfs_segment.c: revision 1.173
sys/ufs/lfs/lfs_vnops.c: revision 1.167 via patch
sys/ufs/lfs/lfs_bio.c: revision 1.91
Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.
Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.158.2.6 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_balloc.c: revision 1.60
sys/ufs/lfs/lfs_syscalls.c: revision 1.111
sys/ufs/lfs/lfs_segment.c: revision 1.172
sys/ufs/lfs/lfs_vnops.c: revision 1.163
Several minor bug fixes:
* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.158.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.158
sys/ufs/lfs/lfs_subr.c: revision 1.57
sys/ufs/lfs/lfs_segment.c: revision 1.171
sys/ufs/lfs/lfs.h: revision 1.97
sys/ufs/lfs/lfs_vfsops.c: revision 1.195
sys/ufs/lfs/lfs_extern.h: revision 1.76
Improvements to LFS's paging mechanism, to wit:
* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.
* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.158.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_segment.c: revision 1.170
sys/ufs/lfs/lfs.h: revision 1.96
sys/ufs/lfs/lfs_vfsops.c: revision 1.194
sys/ufs/lfs/lfs_syscalls.c: revision 1.109
From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.158.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.153
sys/ufs/lfs/lfs_debug.c: revision 1.32
sys/ufs/lfs/lfs_alloc.c: revision 1.84
sys/ufs/lfs/lfs_vfsops.c: revision 1.185
sys/ufs/lfs/lfs_segment.c: revision 1.165
64 bit inode changes.
 1.158.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.158.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.164.2.8 27-Feb-2008  yamt sync with head.
 1.164.2.7 04-Feb-2008  yamt sync with head.
 1.164.2.6 21-Jan-2008  yamt sync with head
 1.164.2.5 27-Oct-2007  yamt sync with head.
 1.164.2.4 03-Sep-2007  yamt sync with head.
 1.164.2.3 26-Feb-2007  yamt sync with head.
 1.164.2.2 30-Dec-2006  yamt sync with head.
 1.164.2.1 21-Jun-2006  yamt sync with head.
 1.168.2.1 15-Jan-2006  yamt sync with head.
 1.169.10.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.169.10.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.169.8.3 11-May-2006  elad sync with head
 1.169.8.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.169.8.1 19-Apr-2006  elad sync with head.
 1.169.6.6 03-Sep-2006  yamt sync with head.
 1.169.6.5 11-Aug-2006  yamt sync with head
 1.169.6.4 26-Jun-2006  yamt sync with head.
 1.169.6.3 24-May-2006  yamt sync with head.
 1.169.6.2 11-Apr-2006  yamt sync with head
 1.169.6.1 01-Apr-2006  yamt sync with head.
 1.169.4.3 01-Jun-2006  kardel Sync with head.
 1.169.4.2 22-Apr-2006  simonb Sync with head.
 1.169.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.169.2.1 09-Sep-2006  rpaulo sync with head
 1.180.2.1 19-Jun-2006  chap Sync with head.
 1.182.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.190.4.3 10-Dec-2006  yamt sync with head.
 1.190.4.2 22-Oct-2006  yamt use workqueue for aiodoned.
 1.190.4.1 22-Oct-2006  yamt sync with head
 1.190.2.2 12-Jan-2007  ad Sync with head.
 1.190.2.1 18-Nov-2006  ad Sync with head.
 1.195.4.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.195.2.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.196.2.4 17-May-2007  yamt sync with head.
 1.196.2.3 07-May-2007  yamt sync with head.
 1.196.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.196.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.198.4.1 11-Jul-2007  mjf Sync with head.
 1.198.2.13 01-Oct-2007  ad Make it compile (XXX not correct).
 1.198.2.12 28-Aug-2007  yamt - mark aiodone workqueue MPSAFE.
- make lfs callbacks acquire kernel_lock by themselves.

ok'ed by Andrew Doran.
 1.198.2.11 28-Aug-2007  yamt make this compilable with DEBUG.
 1.198.2.10 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and needs to be verified as a whole.
 1.198.2.9 20-Aug-2007  ad Sync with HEAD.
 1.198.2.8 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.198.2.7 15-Jul-2007  ad Sync with head.
 1.198.2.6 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.198.2.5 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.198.2.4 08-Jun-2007  ad Sync with head.
 1.198.2.3 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.198.2.2 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.198.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.202.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.203.6.2 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.203.6.1 29-Jul-2007  ad file lfs_segment.c was added on branch matt-mips64 on 2007-07-29 13:31:15 +0000
 1.203.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.203.4.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.204.4.1 14-Oct-2007  yamt sync with head.
 1.204.2.3 23-Mar-2008  matt sync with HEAD
 1.204.2.2 09-Jan-2008  matt sync with HEAD
 1.204.2.1 06-Nov-2007  matt sync with HEAD
 1.206.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.206.6.4 19-Dec-2007  ad Use a global lfs_lock.
 1.206.6.3 19-Dec-2007  ad Fix some more problems w/lfs on this branch.
 1.206.6.2 19-Dec-2007  ad Get lfs mostly working.
 1.206.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.206.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.209.6.3 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.209.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.209.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.210.4.5 11-Aug-2010  yamt sync with head.
 1.210.4.4 11-Mar-2010  yamt sync with head
 1.210.4.3 19-Aug-2009  yamt sync with head.
 1.210.4.2 04-May-2009  yamt sync with head.
 1.210.4.1 16-May-2008  yamt sync with head.
 1.210.2.2 04-Jun-2008  yamt sync with head
 1.210.2.1 18-May-2008  yamt sync with head.
 1.211.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.213.22.2 09-Nov-2015  snj Fix ticket #1974 fallout.
 1.213.22.1 07-Nov-2015  snj Pull up following revision(s) (requested by dholland in ticket #1974):
sys/ufs/lfs/lfs_segment.c: revision 1.247 via patch
Fix catastrophic bug in lfs_rewind() that changed segment numbers
(lfs_curseg/lfs_nextseg in the superblock) using the wrong units.
These fields are for whatever reason the start addresses of segments
(measured in frags) rather than the segment numbers 0..n.
This only apparently affects dumping from a mounted fs; however, it
trashes the fs.
I would really, really like to have a static analysis tool that can
keep track of the units things are measured in, since fs code is full
of conversion macros and the macros are named inscrutable things like
"sntod" whose letters don't necessarily even correspond to the units
they convert. It is surprising that more of these are not wrong.
 1.213.18.2 09-Nov-2015  snj Fix ticket #1974 fallout.
 1.213.18.1 07-Nov-2015  snj Pull up following revision(s) (requested by dholland in ticket #1974):
sys/ufs/lfs/lfs_segment.c: revision 1.247 via patch
Fix catastrophic bug in lfs_rewind() that changed segment numbers
(lfs_curseg/lfs_nextseg in the superblock) using the wrong units.
These fields are for whatever reason the start addresses of segments
(measured in frags) rather than the segment numbers 0..n.
This only apparently affects dumping from a mounted fs; however, it
trashes the fs.
I would really, really like to have a static analysis tool that can
keep track of the units things are measured in, since fs code is full
of conversion macros and the macros are named inscrutable things like
"sntod" whose letters don't necessarily even correspond to the units
they convert. It is surprising that more of these are not wrong.
 1.213.8.2 09-Nov-2015  sborrill Fix breakage from ticket #1974
 1.213.8.1 07-Nov-2015  snj Pull up following revision(s) (requested by dholland in ticket #1974):
sys/ufs/lfs/lfs_segment.c: revision 1.247 via patch
Fix catastrophic bug in lfs_rewind() that changed segment numbers
(lfs_curseg/lfs_nextseg in the superblock) using the wrong units.
These fields are for whatever reason the start addresses of segments
(measured in frags) rather than the segment numbers 0..n.
This only apparently affects dumping from a mounted fs; however, it
trashes the fs.
I would really, really like to have a static analysis tool that can
keep track of the units things are measured in, since fs code is full
of conversion macros and the macros are named inscrutable things like
"sntod" whose letters don't necessarily even correspond to the units
they convert. It is surprising that more of these are not wrong.
 1.214.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.214.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.215.2.4 21-Apr-2011  rmind sync with head
 1.215.2.3 05-Mar-2011  rmind sync with head
 1.215.2.2 03-Jul-2010  rmind sync with head
 1.215.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.217.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.220.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.222.6.1 18-Feb-2012  mrg merge to -current.
 1.222.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.222.2.3 23-Jan-2013  yamt sync with head
 1.222.2.2 17-Apr-2012  yamt sync with head
 1.222.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.223.2.2 15-Nov-2015  bouyer Pull up following revision(s) (requested by dholland in ticket #1319):
sys/ufs/lfs/lfs_segment.c: revision 1.247 via patch
Fix catastrophic bug in lfs_rewind() that changed segment numbers
(lfs_curseg/lfs_nextseg in the superblock) using the wrong units.
These fields are for whatever reason the start addresses of segments
(measured in frags) rather than the segment numbers 0..n.
This only apparently affects dumping from a mounted fs; however, it
trashes the fs.
I would really, really like to have a static analysis tool that can
keep track of the units things are measured in, since fs code is full
of conversion macros and the macros are named inscrutable things like
"sntod" whose letters don't necessarily even correspond to the units
they convert. It is surprising that more of these are not wrong.
 1.223.2.1 17-Mar-2012  bouyer Pull up following revision(s) (requested by perseant in ticket #116):
sys/ufs/lfs/lfs_alloc.c: revision 1.112
tests/fs/vfs/t_rmdirrace.c: revision 1.9
tests/fs/vfs/t_renamerace.c: revision 1.25
sys/ufs/lfs/lfs_vnops.c: revision 1.240
sys/ufs/lfs/lfs_segment.c: revision 1.224
sys/ufs/lfs/lfs_bio.c: revision 1.122
sys/ufs/lfs/lfs_vfsops.c: revision 1.294
sbin/newfs_lfs/make_lfs.c: revision 1.19
sys/ufs/lfs/lfs.h: revision 1.136
Pass t_renamerace and t_rmdirrace tests.
Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.
Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.
Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace crates a lot of inode blocks on a tiny empty disk).
 1.224.2.4 03-Dec-2017  jdolecek update from HEAD
 1.224.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.224.2.2 23-Jun-2013  tls resync from head
 1.224.2.1 25-Feb-2013  tls resync with head
 1.230.2.2 18-May-2014  rmind sync with head
 1.230.2.1 28-Aug-2013  rmind sync with head
 1.236.6.5 28-Aug-2017  skrll Sync with HEAD
 1.236.6.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.236.6.3 22-Sep-2015  skrll Sync with HEAD
 1.236.6.2 06-Jun-2015  skrll Sync with HEAD
 1.236.6.1 06-Apr-2015  skrll Sync with HEAD
 1.236.4.1 04-Aug-2015  snj Pull up following revision(s) (requested by dholland in ticket #932):
sys/ufs/lfs/lfs_segment.c: revision 1.247 via patch
Fix catastrophic bug in lfs_rewind() that changed segment numbers
(lfs_curseg/lfs_nextseg in the superblock) using the wrong units.
These fields are for whatever reason the start addresses of segments
(measured in frags) rather than the segment numbers 0..n.
This only apparently affects dumping from a mounted fs; however, it
trashes the fs.
I would really, really like to have a static analysis tool that can
keep track of the units things are measured in, since fs code is full
of conversion macros and the macros are named inscrutable things like
"sntod" whose letters don't necessarily even correspond to the units
they convert. It is surprising that more of these are not wrong.
 1.263.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.263.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.263.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.269.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.275.2.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.275.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.277.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.277.2.1 10-Jun-2019  christos Sync with HEAD
 1.278.4.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.280.2.2 29-Feb-2020  ad Sync with head.
 1.280.2.1 17-Jan-2020  ad Sync with head.
 1.105 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.104 04-Sep-2025  perseant Copy the flags from a full partial segment to its continuation, if
a continuation is necessary, so that partial-segment collections marked
with SS_DIROP|SS_CONT are properly completed wiht a partial-segment marked
SS_DIROP (without SS_CONT). Necessary for roll-forward.
 1.103 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.102 23-Feb-2020  riastradh Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.
 1.101 23-Feb-2020  ad Fix !DIAGNOSTIC compile
 1.100 23-Feb-2020  riastradh lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.
 1.99 23-Feb-2020  riastradh Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):

(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.

(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:

(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.

(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
 1.98 23-Feb-2020  riastradh Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.
 1.97 26-Jul-2017  maya branches: 1.97.4; 1.97.8; 1.97.10;
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar

XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
 1.96 26-Jul-2017  maya Deduplicate sanity check that seglock is held on segunlock
 1.95 19-Jun-2017  maya Ifdef out KDASSERT which fires on my machine.
 1.94 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.93 08-Jun-2017  chs move some buffer cache internals declarations from buf.h to vfs_bio.c.
this is needed to avoid name conflicts with ZFS and also
makes it clearer that other code shouldn't be messing with these.
remove the LFS debug code that poked around in bufqueues and
remove the BQ_EMPTY bufqueue since nothing uses it anymore.
provide a function to let LFS and wapbl read the value of nbuf for now.
 1.92 06-Apr-2017  maya branches: 1.92.6;
don't guard lfs_sbactive or lfs_log with splbio, lfs_lock is plenty.
 1.91 06-Apr-2017  maya don't guard lfs_reshash with splbio, lfs_lock is plenty
 1.90 06-Apr-2017  maya if DEBUG panic => KDASSERT. reduces ifdefs. NFC
 1.89 06-Apr-2017  maya Provide a LFS_ENTER_LOG (__nothing) in the !DEBUG case.
so I can drop lots of #ifdef DEBUG around this macro. NFCI
 1.88 01-Apr-2017  maya Keep on holding lfs_lock when calling cv_broadcast

pointed out by skrll, thanks.
 1.87 01-Apr-2017  maya switch lfs_dirops to condvar (from mtsleep)
 1.86 03-Oct-2015  dholland branches: 1.86.2; 1.86.4;
Use IINFO in lfs_writeinode().
(both the kernel and the userland copies)
 1.85 12-Aug-2015  dholland Make 32-bit and 64-bit versions of CLEANERINFO.

XXX: while this is written to disk, it seems like much of it would
XXX: be better set up as a commpage shared with the cleaner.
 1.84 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.83 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.82 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.81 16-Jul-2015  dholland Don't cast the return value of malloc.
 1.80 28-Jul-2013  dholland branches: 1.80.6;
Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.79 18-Jun-2013  christos branches: 1.79.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.78 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.77 02-Jan-2012  perseant branches: 1.77.6;

* Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.76 25-Jun-2010  hannken branches: 1.76.8; 1.76.12;
Undo last commit and don't try to lock vnodes in lfs_unmark_dirop()
as we may deadlock trying to write the superblock.

Should fix PR #43503 Can't create device nodes on LFS.
 1.75 24-Jun-2010  hannken Clean up vnode lock operations:

- VOP_LOCK(vp, flags): Limit the set of allowed flags to LK_EXCLUSIVE,
LK_SHARED and LK_NOWAIT. LK_INTERLOCK is no longer allowed as it
makes no sense here.

- VOP_ISLOCKED(vp): Remove the for some time unused return value
LK_EXCLOTHER. Mark this operation as "diagnostic only".
Making a lock decision based on this operation is no longer allowed.

Discussed on tech-kern.
 1.74 16-Feb-2010  mlelstv branches: 1.74.2;
Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.73 28-Apr-2008  martin branches: 1.73.20;
Remove clause 3 and 4 from TNF licenses
 1.72 02-Jan-2008  ad branches: 1.72.6; 1.72.8; 1.72.10;
Merge vmlocking2 to head.
 1.71 10-Oct-2007  ad branches: 1.71.4; 1.71.6; 1.71.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.70 15-May-2007  tnn branches: 1.70.6; 1.70.8; 1.70.10;
Add missing underscore to wchan name.
 1.69 18-Apr-2007  perseant Add/change a couple of comments about locking restrictions.
 1.68 12-Mar-2007  ad branches: 1.68.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.67 21-Feb-2007  thorpej branches: 1.67.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.66 15-Feb-2007  ad branches: 1.66.2;
Replace some uses of lockmgr() / simplelocks.
 1.65 16-Nov-2006  christos branches: 1.65.2; 1.65.4;
__unused removal on arguments; approved by core.
 1.64 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.63 04-Oct-2006  christos fix empty if
 1.62 15-Sep-2006  perseant branches: 1.62.2;
Don't remark a locked inode with IN_MODIFIED after writing it to disk,
if we ourselves hold the lock. This prevents e.g. mknod from hanging
indefinitely.

Also, always use the return value from VOP_ISLOCKED to determine whether
we hold the lock or someone else does, rather than looking into the lock
structure ourselves.
 1.61 01-Sep-2006  perseant branches: 1.61.2;
Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.60 29-Jun-2006  perseant Don't wake up the cleaner if the filesystem is unwrappable, and fix the
compatibility fcntls.

Also includes one-line fixes for an MP locking bug and a zero-length FINFO
problem that manifested during testing.
 1.59 04-May-2006  perseant branches: 1.59.4;
Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.

Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.58 07-Apr-2006  perseant Make the segment lock aware of LWPs. Fixes a (somewhat confusing)
"lockmgr: pid 3997, not exclusive lockholder 3997, unlocking" panic I
encountered while running blogbench on an LFS.
 1.57 24-Mar-2006  perseant Improvements to LFS's paging mechanism, to wit:

* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.

* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.56 14-Jan-2006  yamt branches: 1.56.2; 1.56.4; 1.56.6; 1.56.8; 1.56.10;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.55 11-Dec-2005  christos branches: 1.55.2;
merge ktrace-lwp.
 1.54 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.53 29-May-2005  christos branches: 1.53.2; 1.53.4;
- sprinkle const
- avoid shadow variables.
 1.52 16-Apr-2005  perseant Use lfs_malloc() to manage the blkiov arrays that the cleaner functions use,
since the cleaner is likely to operate in a low-memory condition.
 1.51 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.50 08-Mar-2005  perseant branches: 1.50.2;
Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.49 26-Feb-2005  perry nuke trailing whitespace
 1.48 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
 1.47 09-Mar-2004  yamt branches: 1.47.6; 1.47.8; 1.47.10;
use correct segment size. this fixes memory corruption when using lfsv1.
 1.46 21-Dec-2003  simonb Fix usage of fifth argument to pool_init().
 1.45 14-Oct-2003  dbj add mnt_iflag field to struct mount for internal flags
mv MNT_GONE, MNT_UNMOUNT and MNT_WANTRDWR to this field
additonally add mnt_writeopcountupper and mnt_writeopcountlower fields
in preparation for pending write suspension support work
bump kernel version to 1.6ZD
 1.44 07-Sep-2003  yamt use LFS_DEBUG_COUNTLOCKED macro.
 1.43 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.42 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.41 02-Jul-2003  yamt - add a new functions, lfs_writer_enter/leave, and use them instead of
duplicated code fragments.
- add an assertion.
 1.40 23-Apr-2003  perseant branches: 1.40.2;
Make LFS work better (though still not "well") as an NFS-exported
filesystem (and other things that needed to be fixed before the tests
would complete), to wit:

* Include the fs ident in the filehandle; improve stale filehandle checks.

* Change definition of blksize() to use the on-dinode size instead of
the inode's i_size, so that fsck_lfs will work properly again.

* Use b_interlock in lfs_vtruncbuf.

* Postpone dirop reclamation until after the seglock has been released,
so that lfs_truncate is not called with the segment lock held.

* Don't loop in lfs_fsync(), just write everything and wait.

* Be more careful about the interlock/uobjlock in lfs_putpages: when we
lose this lock, we have to resynchronize dirtiness of pages in each
block.

* Be sure to always write indirect blocks and update metadata in
lfs_putpages; fixes a bug that caused blocks to be accounted to the
wrong segment.
 1.39 21-Mar-2003  perseant KNF (space after keywords).
 1.38 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.37 11-Mar-2003  perseant - Get rid of unused #ifdefs LFS_NO_PAGEMOVE and LFS_MALLOC_SUMMARY (both
always true) and accompanying dead code.

- When constructing write clusters in lfs_writeseg, if the block we are
about to add is itself a cluster from GOP_WRITE, don't put a cluster
in a cluster, just write the GOP_WRITE cluster on its own. This seems
to represent a slight performance gain on my test machine.

- Charge someone's rusage for writes on LFSes. It's difficult to tell
who the "right" process to charge is; just charge whoever triggered
the write.
 1.36 08-Mar-2003  perseant Add an lfs_strategy() that checks to make sure we're not trying to read
where the cleaner is trying to write, instead of tying up the "live"
buffers (or pages).

Fix a bug in the LFS_UBC case where oversized buffers would not be
checksummed correctly, causing uncleanable segments.

Make sure that wakeup(fs->lfs_iocount) is done if fs->lfs_iocount is 1
as well as 0, since we wait in some places for it to drop to 1.

Activate all pages that make it into lfs_gop_write without the segment
lock held, since they must have been dirtied very recently, even if
PG_DELWRI is not set.
 1.35 04-Mar-2003  perseant Don't add dirty blocks to the ifile in lfs_segunlock, if we're trying to
unmount the filesystem. This avoids a "dirty blocks" panic.
 1.34 23-Feb-2003  perseant Fix a buffer overflow bug in the LFS_UBC case that manifested itself
either as a mysterious UVM error or as "panic: dirty bufs". Verify
maximum size in lfs_malloc.

Teach lfs_updatemeta and lfs_shellsort about oversized cluster blocks from
lfs_gop_write.

When unwiring pages in lfs_gop_write, deactivate them, under the theory
that the pagedaemon wanted to free them last we knew.
 1.33 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.32 19-Feb-2003  yamt add debug code to lfs_free.
 1.31 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.30 29-Jan-2003  yamt don't use daddr_t for segment summary since it's an on-disk structure.
 1.29 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.28 11-Jul-2002  perseant Remove lying comment on SEGM_PROT seglock.
 1.27 06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.26 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.25 24-May-2002  perseant Fix a couple of instances where reassignbuf() was not done at splbio.

Tested on i386.
 1.24 23-May-2002  perseant Back out rev 1.174 of vfs_subr.c, because the splbio() wasn't protecting
enough to be useful, and broadening it so that it did would have meant
that operations possibly requiring synchronous disk activity would have
to be done in splbio(). This clearly was not going to work.

Worked around this in the LFS case by having lfs_cluster_callback put an
extra hold on the vnode before calling biodone(), and taking the hold
off without HOLDRELE's problematic list swapping. lfs_vunref() will take
care of that---in thread context---on the next write if need be.

Also, ensure that the list walking in lfs_{writevnodes,segunlock,gather}
takes into account the possibility that the list may change
underneath it (possibly because it itself deleted an element).

Tested on i386, test-compiled on alpha.
 1.23 17-May-2002  perseant branches: 1.23.2;
use macros from <sys/queue.h>
 1.22 14-May-2002  perseant branches: 1.22.2;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.21 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.20 08-Nov-2001  lukem add RCSID
 1.19 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.18 13-Jul-2001  perseant branches: 1.18.4;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.17 09-Sep-2000  perseant branches: 1.17.2; 1.17.4; 1.17.6;
Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.16 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.15 06-Jun-2000  perseant branches: 1.15.2;
Don't try to inactivate dirop vnodes that are still in the middle of
their dirop.
 1.14 05-May-2000  perseant branches: 1.14.2;
Change the way LFS does block accounting, from trying to infer from the
buffer cache flags, to marking the inode and/or indirect blocks with a
special disk address UNWRITTEN==-2 when a block is accounted for. (This
address is never written to disk, but only used in-core. This is essentially
the same method of block accounting as on the UBC branch, where the buffer
headers don't exist.) Make sure that truncation is handled properly,
especially in the case of holey files.

Fixes PR#9994.
 1.13 30-Mar-2000  augustss Remove register declarations.
 1.12 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.11 16-Jan-2000  perseant Make sure that vnodes are locked when inactivated (e.g. by the cleaner)
 1.10 16-Jan-2000  perseant Fix a problem in my changes of Dec 14th, that prevents removed vnodes
from being inactivated under some conditions. Removed vnodes are now
inactivated when the VDIROP flag is cleared, and to prevent block
accounting problems this clearing has been postponed until
lfs_segunlock.
 1.9 25-Mar-1999  perseant branches: 1.9.2; 1.9.8; 1.9.14;
clean up unused/required #ifdefs
 1.8 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.7 25-Aug-1998  thorpej Add some braces to make egcs happy.
 1.6 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.5 12-Oct-1996  christos revert previous kprintf changes
 1.4 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.3 09-Feb-1996  christos lfs prototypes
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.9.14.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.9.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.9.2.1 20-Jan-2000  he Pull up revision 1.10 (requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.14.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.15.2.1 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
seperately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.17.6.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.17.6.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.17.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.17.6.1 03-Aug-2001  lukem update to -current
 1.17.4.3 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.17.4.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.17.4.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.17.2.7 01-Aug-2002  nathanw Catch up to -current.
 1.17.2.6 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc) : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.17.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.17.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.17.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.17.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.17.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.18.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.22.2.3 15-Jul-2002  gehenna catch up with -current.
 1.22.2.2 20-Jun-2002  gehenna catch up with -current.
 1.22.2.1 30-May-2002  gehenna Catch up with -current.
 1.23.2.3 20-Jun-2002  lukem Pull up revision 1.26 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.23.2.2 02-Jun-2002  tv Pull up revision 1.25 (requested by perseant in ticket #135):
Fix a couple of instances where reassignbuf() was not done at splbio.
Tested on i386.
 1.23.2.1 02-Jun-2002  tv Pull up revision 1.24 (requested by perseant in ticket #132):
Back out rev 1.174 of vfs_subr.c, because the splbio() wasn't protecting
enough to be useful, and broadening it so that it did would have meant
that operations possibly requiring synchronous disk activity would have
to be done in splbio(). This clearly was not going to work.
Worked around this in the LFS case by having lfs_cluster_callback put an
extra hold on the vnode before calling biodone(), and taking the hold
off without HOLDRELE's problematic list swapping. lfs_vunref() will take
care of that---in thread context---on the next write if need be.
Also, ensure that the list walking in lfs_{writevnodes,segunlock,gather}
takes into account the possibility that the list may change
underneath it (possibly because it itself deleted an element).
Tested on i386, test-compiled on alpha.
 1.40.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.40.2.5 08-Mar-2005  skrll Sync with HEAD.
 1.40.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.40.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.40.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.40.2.1 03-Aug-2004  skrll Sync with HEAD
 1.47.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.47.8.1 29-Apr-2005  kent sync with -current
 1.47.6.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.50.2.6 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.50.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.92
sys/ufs/lfs/lfs.h: revision 1.105
sys/ufs/lfs/lfs_vfsops.c: revision 1.207
sys/ufs/lfs/lfs_subr.c: revision 1.59
sys/ufs/lfs/lfs_vnops.c: revision 1.173
sys/ufs/lfs/lfs_bio.c: revision 1.92
Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.
Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.50.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_subr.c: revision 1.58
sys/ufs/lfs/lfs.h: revision 1.98
Make the segment lock aware of LWPs. Fixes a (somewhat confusing)
"lockmgr: pid 3997, not exclusive lockholder 3997, unlocking" panic I
encountered while running blogbench on an LFS.
 1.50.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.158
sys/ufs/lfs/lfs_subr.c: revision 1.57
sys/ufs/lfs/lfs_segment.c: revision 1.171
sys/ufs/lfs/lfs.h: revision 1.97
sys/ufs/lfs/lfs_vfsops.c: revision 1.195
sys/ufs/lfs/lfs_extern.h: revision 1.76
Improvements to LFS's paging mechanism, to wit:
* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.
* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.50.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.50.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.53.4.1 20-Oct-2005  yamt adapt ufs.
 1.53.2.6 21-Jan-2008  yamt sync with head
 1.53.2.5 27-Oct-2007  yamt sync with head.
 1.53.2.4 03-Sep-2007  yamt sync with head.
 1.53.2.3 26-Feb-2007  yamt sync with head.
 1.53.2.2 30-Dec-2006  yamt sync with head.
 1.53.2.1 21-Jun-2006  yamt sync with head.
 1.55.2.1 15-Jan-2006  yamt sync with head.
 1.56.10.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.56.10.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.56.8.2 11-May-2006  elad sync with head
 1.56.8.1 19-Apr-2006  elad sync with head.
 1.56.6.5 03-Sep-2006  yamt sync with head.
 1.56.6.4 11-Aug-2006  yamt sync with head
 1.56.6.3 24-May-2006  yamt sync with head.
 1.56.6.2 11-Apr-2006  yamt sync with head
 1.56.6.1 01-Apr-2006  yamt sync with head.
 1.56.4.2 01-Jun-2006  kardel Sync with head.
 1.56.4.1 22-Apr-2006  simonb Sync with head.
 1.56.2.1 09-Sep-2006  rpaulo sync with head
 1.59.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.61.2.1 18-Nov-2006  ad Sync with head.
 1.62.2.2 10-Dec-2006  yamt sync with head.
 1.62.2.1 22-Oct-2006  yamt sync with head
 1.65.4.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.65.2.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.66.2.4 17-May-2007  yamt sync with head.
 1.66.2.3 07-May-2007  yamt sync with head.
 1.66.2.2 24-Mar-2007  yamt sync with head.
 1.66.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.67.4.5 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.67.4.4 08-Jun-2007  ad Sync with head.
 1.67.4.3 21-Mar-2007  ad GC the simplelock/spinlock debugging stuff.
 1.67.4.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.67.4.1 13-Mar-2007  ad Sync with head.
 1.68.2.1 11-Jul-2007  mjf Sync with head.
 1.70.10.1 14-Oct-2007  yamt sync with head.
 1.70.8.2 09-Jan-2008  matt sync with HEAD
 1.70.8.1 06-Nov-2007  matt sync with HEAD
 1.70.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.71.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.71.6.2 19-Dec-2007  ad Use a global lfs_lock.
 1.71.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.71.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.72.10.3 11-Aug-2010  yamt sync with head.
 1.72.10.2 11-Mar-2010  yamt sync with head
 1.72.10.1 16-May-2008  yamt sync with head.
 1.72.8.1 18-May-2008  yamt sync with head.
 1.72.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.73.20.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.73.20.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.74.2.1 03-Jul-2010  rmind sync with head
 1.76.12.1 18-Feb-2012  mrg merge to -current.
 1.76.8.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.76.8.1 17-Apr-2012  yamt sync with head
 1.77.6.3 03-Dec-2017  jdolecek update from HEAD
 1.77.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.77.6.1 23-Jun-2013  tls resync from head
 1.79.2.1 28-Aug-2013  rmind sync with head
 1.80.6.3 28-Aug-2017  skrll Sync with HEAD
 1.80.6.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.80.6.1 22-Sep-2015  skrll Sync with HEAD
 1.86.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.86.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.92.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.97.10.1 29-Feb-2020  ad Sync with head.
 1.97.8.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.97.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.177 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.176 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.175 26-Jul-2017  maya branches: 1.175.4; 1.175.10;
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar

XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
 1.174 17-Apr-2017  hannken branches: 1.174.4;
Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.173 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT

Replace some #if DEBUG by this too. DEBUG is only for expensive
assertions; these are not.
 1.172 15-Oct-2015  dholland branches: 1.172.2; 1.172.4;
Move stuff from struct ulfsmount to struct lfs.
 1.171 10-Oct-2015  dholland Fix minor bitrot in #if 0 or otherwise disabled code.
 1.170 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.169 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.168 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.167 12-Aug-2015  dholland Make 32-bit and 64-bit versions of CLEANERINFO.

XXX: while this is written to disk, it seems like much of it would
XXX: be better set up as a commpage shared with the cleaner.
 1.166 12-Aug-2015  dholland Move the security checks for lfs_bmapv/lfs_markv into those functions.
(instead of the system call entry points)

Avoids duplication.

While touching these, pass the lwp around instead of the proc -- the
latter was there for no other reason than because once upon a time
struct proc was the first argument of all syscalls.

(For that matter, why not just use curlwp instead of passing it around
all over the place? The cost of passing it to every syscall probably
exceeds the cost of loading it from curcpu, even on machines where
it's not just kept in a register all the time.)
 1.165 12-Aug-2015  dholland Fix assorted 64->32 truncations related to BLOCK_INFO.

Also make note of a cleaner limitation: it seems that when it goes to
coalesce discontiguous files, it mallocs an array with one BLOCK_INFO
for every block in the file. Therefore, with 64-bit LFS, on a 32-bit
platform it will be possible to have files large enough to overflow
the cleaner's address space. Currently these will be skipped and cause
warnings via syslog.

At some point someone should rewrite the logic to coalesce files to
use chunks of some reasonable size, as discontinuity between such
chunks is immaterial and mallocing this much space is silly and
fragile. Also, the kernel only accepts up to 65536 blocks at a time
for bmapv and markv, so processing more than this at once probably
isn't useful and may not even work currently. I don't want to change
this around just now as it's not entirely trivial.
 1.164 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.163 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.162 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.161 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.160 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.159 31-May-2015  hannken Make lfs_fastvget() private to lfs_syscalls.c, change it to take the
BLOCK_INFO and vnode lock type instead of the inode disk address and
return the vnode locked.

Change lfs_markv() and lfs_bmapv() to work on locked vnodes.
 1.158 31-May-2015  hannken Use VFS_PROTOS() for lfs.
Rename conflicting struct lfs field "lfs_start" to "lfs_s0addr".

No functional change.
 1.157 20-Apr-2015  riastradh Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.
 1.156 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.155 17-Apr-2014  pgoyette branches: 1.155.4;
s/null/NULL/ to fix build break

Hello, xtos!
 1.154 17-Apr-2014  christos CID/1203190: Fix NULL deref
 1.153 09-Apr-2014  riastradh Take vp->v_interlock before vdead_check in lfs_bmapv.

XXX This code is a pile of bodge that needs a serious rototill anyway.
 1.152 24-Mar-2014  hannken branches: 1.152.2;
- Make VI_XLOCK, VI_CLEAN and VI_LOCKSHARE private to kern/vfs_*.c.
- Make vwait() static.
- Add vdead_check() to check a vnode for being or becoming dead.

Discussed on tech-kern.

Welcome to 6.99.38
 1.151 05-Mar-2014  hannken Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
 1.150 29-Oct-2013  hannken Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25
 1.149 07-Oct-2013  dholland Remove stray KERNEL_UNLOCK_ONE() in error path of lfs_markv().
From Wolfgang Stukenbrock in PR 44370.

This error path is only reachable if lfs_markv is handed an out of
range inode number, so it's unlikely that it gets tickled very often.

It isn't clear to me that we need the kernel lock in here at all, as
the path to lfs_markv that's actually used at this point (via fcntl)
doesn't take it. But, one thing at a time.
 1.148 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.147 18-Jun-2013  christos branches: 1.147.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.146 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.145 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.144 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.143 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.142 13-Mar-2012  elad branches: 1.142.2;
Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.141 15-Jan-2012  perseant Corrections to part of rev 1.140. lfs_bmapv, not lfs_markv, marks vnodes
LFSI_BMAP and recycles them. This greatly reduces the writing leakage
occurring when the filesystem has no space available for non-cleaning
writes.
 1.140 02-Jan-2012  perseant * Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.139 12-Jun-2011  rmind branches: 1.139.2; 1.139.6;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.138 01-Jul-2010  hannken branches: 1.138.6;
Remove vlockmgr(). Generic vnode lock operations now use a rwlock located
in the vnode. All LK_* flags move from sys/lock.h to sys/vnode.h. Calls
to vlockmgr() in file systems get replaced with VOP_LOCK() or VOP_UNLOCK().

Welcome to 5.99.34.

Discussed on tech-kern.
 1.137 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.136 16-Feb-2010  mlelstv branches: 1.136.2;
Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.135 13-Sep-2009  tsutsui branches: 1.135.2;
Move declaration of ufs_hashlock into <ufs/ufs_extern.h> from each c source.
 1.134 11-Jan-2009  christos merge christos-time_t
 1.133 16-May-2008  hannken branches: 1.133.6;
Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.132 06-May-2008  ad branches: 1.132.2;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.131 30-Apr-2008  ad PR kern/38135 vfs_busy/vfs_trybusy confusion

The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
 1.130 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.129 21-Apr-2008  ad branches: 1.129.2;
Acquire kernel_lock directly in LFS syscalls.
 1.128 30-Jan-2008  ad branches: 1.128.6; 1.128.8; 1.128.10;
PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
 1.127 30-Jan-2008  ad Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
 1.126 02-Jan-2008  ad Merge vmlocking2 to head.
 1.125 20-Dec-2007  dsl Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
 1.124 10-Oct-2007  ad branches: 1.124.4; 1.124.6; 1.124.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.123 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.122 04-Mar-2007  christos branches: 1.122.2; 1.122.14; 1.122.16; 1.122.18;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.121 15-Feb-2007  ad branches: 1.121.2;
Replace some uses of lockmgr() / simplelocks.
 1.120 09-Feb-2007  ad Merge newlock2 to head.
 1.119 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.118 16-Nov-2006  christos branches: 1.118.2; 1.118.4;
__unused removal on arguments; approved by core.
 1.117 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.116 01-Sep-2006  perseant branches: 1.116.2; 1.116.4;
Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.115 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.114 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.113 14-May-2006  elad branches: 1.113.2;
integrate kauth.
 1.112 18-Apr-2006  perseant Get rid of the LFS_FORCE_WRITE case. We never really used it, and it could
panic the kernel if cleaner daemon passed the right combination of arguments.
Coverity CID 2741.
 1.111 07-Apr-2006  perseant Several minor bug fixes:

* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.110 19-Mar-2006  rtr init struct vnode *vp = NULL
coverity 2724 / run 6
XXX in future runs coverity may complain about deref NULL now but comment
on line 382 indicates this should not be possible
 1.109 17-Mar-2006  tls From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.108 11-Dec-2005  christos branches: 1.108.4; 1.108.6; 1.108.8; 1.108.10; 1.108.12;
merge ktrace-lwp.
 1.107 25-May-2005  perseant branches: 1.107.2;
Don't update lfs_stats.segs_reclaimed if we're not keeping statistics.
Patch from Juan RP.
 1.106 20-May-2005  perseant Keep track of the number of segments reclaimed, since the cleaner doesn't
do this anymore (it hasn't for quite some time). Add a couple of conditional
debugging messages to indicate why segments are not cleaned, in the event
that lfs_segclean is used.

Make the LFCNSEGWAITALL fcntl work again.
 1.105 16-Apr-2005  perseant Use lfs_malloc() to manage the blkiov arrays that the cleaner functions use,
since the cleaner is likely to operate in a low-memory condition.
 1.104 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.103 08-Mar-2005  perseant branches: 1.103.2;
Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.102 26-Feb-2005  perry nuke trailing whitespace
 1.101 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
 1.100 04-Dec-2003  yamt branches: 1.100.6; 1.100.8; 1.100.10;
use b_private rather than b_saveaddr.
XXX LFS_USE_B_INVAL
 1.99 07-Nov-2003  yamt fix spec vnode aliasing.
 1.98 10-Sep-2003  yamt g/c CHECK_COPYIN.
 1.97 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.96 30-Jul-2003  yamt - check EROFS earlier in lfs_markv.
- remove wrong error recovery code (fake buffers are never on bufqueue)
and put a comment instead.
 1.95 30-Jul-2003  yamt remove an unused definition of LFS_VREF_THRESHOLD.
 1.94 02-Jul-2003  yamt use queue.h macros.
 1.93 29-Jun-2003  fvdl branches: 1.93.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.
 1.92 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.91 28-Jun-2003  darrenr Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.90 17-May-2003  nakayama Avoid comparison is always false warning in gcc 3.3 w/ 64-bit size_t.
 1.89 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.88 20-Mar-2003  yamt fix "more than one fragment" panics;
direct and indirect block pointers are not valid in the case of shortlinks.
while i'm here, move duplicated code in lfs_vget/fastvget into a new
function, lfs_vinit.
 1.87 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.86 08-Mar-2003  perseant Only #define LFS if not already defined.
 1.85 08-Mar-2003  perseant Add an lfs_strategy() that checks to make sure we're not trying to read
where the cleaner is trying to write, instead of tying up the "live"
buffers (or pages).

Fix a bug in the LFS_UBC case where oversized buffers would not be
checksummed correctly, causing uncleanable segments.

Make sure that wakeup(fs->lfs_iocount) is done if fs->lfs_iocount is 1
as well as 0, since we wait in some places for it to drop to 1.

Activate all pages that make it into lfs_gop_write without the segment
lock held, since they must have been dirtied very recently, even if
PG_DELWRI is not set.
 1.84 24-Feb-2003  perseant Add lfs_ioctl vnode op, with ioctls to take over cleaner system call
functionality (not including segment clean, since that is now done
automatically as checkpoints happen).
 1.83 23-Feb-2003  simonb Remove assigned-to but not used variable.
 1.82 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.81 18-Feb-2003  perseant Make it compile again, grr....
 1.80 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.79 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.78 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.77 26-Dec-2002  yamt don't try to write all blocks passed to lfs_markv at once
since it likely causes buf starvation.
 1.76 21-Dec-2002  yamt add a XXX comment
 1.75 18-Dec-2002  yamt correct/add assertion.
 1.74 17-Dec-2002  yamt no need for cleaner to hold vnode locks.
cleaner and normal vnode operations are synchronized enough by
seglock/fraglock and buf's B_BUSY-ness.
 1.73 24-Nov-2002  yamt in lfs_fakebuf, make corresponding buffer busy to avoid
reading blocks that isn't written yet.
it's needed because we'll update metadatas in lfs_updatemeta
before data pointed by them is actually written to disk.

XXX should be solved with fake inode/indirect blocks instead?
 1.72 24-Nov-2002  yamt blksize() macro shouldn't used for indirect blocks.
this fixes "getblk: block size invariant failed" panic.
PR 18977.
 1.71 03-Aug-2002  itojun correct range check, have overflow check, fix type mismatches,
for cmap args and some other calls. from openbsd
 1.70 07-Jul-2002  briggs Fix a printf format warning.
 1.69 06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.68 20-Jun-2002  perseant Don't bomb out of lfs_bmapv if the caller is requesting blocks that
live in the current segment. There's nothing wrong with this, and
it is necessary for the correct operation of the coaleascer.
 1.67 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.66 06-Jun-2002  perseant Let lfs_bmapv fill in the bi_size member of the BLOCK_INFO structure,
as well as bi_daddr. This lets the cleaner have an idea of what the size
of this block was at the time it was written without having to refer to
a segment header (e.g., in the file coalescing case).

Tested on i386.
 1.65 14-May-2002  perseant branches: 1.65.2; 1.65.4;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.64 12-May-2002  matt Eliminate commons.
 1.63 18-Dec-2001  chs use the new compatibility routines to allow mmap() to work
(in the same non-coherent fashion that it worked pre-UBC)
until someone has time to do it the right way.
 1.62 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.61 08-Nov-2001  lukem add RCSID
 1.60 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.59 15-Sep-2001  chs branches: 1.59.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.58 03-Aug-2001  jdolecek branches: 1.58.2;
Constraint 'blkcnt' of lfs_markv() syscall by 64KB. Reviewed by
Konrad Schroder <perseant@NetBSD.org>.
 1.57 13-Jul-2001  perseant Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.56 03-Dec-2000  perseant branches: 1.56.2; 1.56.4; 1.56.6;
Call uvm_vmp_setsize() in lfs_{fast,}vget to set initial vnode size.
 1.55 30-Nov-2000  jdolecek no need to include fs_lfs.h, define LFS directly
 1.54 27-Nov-2000  perseant If LFS_DO_ROLLFORWARD is defined, roll forward from the older checkpoint
on mount, through the newer checkpoint and on through any newer
partial-segments that may have been written but not checkpointed because
of an intervening crash.

LFS_DO_ROLLFORWARD is not defined by default.
 1.53 22-Nov-2000  perseant Protect lfs_{bmapv,markv} with vfs_{un,}busy. Fix a reference/lock leak
in an error case in lfs_markv. Change the vfs_getvfs() error to return
ENOENT, for consistency with failure of vfs_busy().

99% of this patch was from Jesse Off <joff@gci-net.com> (PR #11547).
 1.52 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.51 21-Oct-2000  toshii In lfs_fastvget(), initialize i_lfs_effnblks correctly.
 1.50 20-Oct-2000  perseant Do not increment the clean segment counter, if a segment that the cleaner
is trying to clean is already clean (e.g., if two lfs_cleanerds are running
at once.)
 1.49 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.48 13-Jul-2000  thorpej XXX Use of hzto() return value needs to be double-checked here.
 1.47 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero as fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.46 04-Jul-2000  perseant Fix errors observed while trying to fill the filesystem with yesterday's
fixes:

- Write copies of bfree and avail in the CLEANERINFO block, so the
cleaner doesn't have to guess which superblock has the current
information (if indeed any do).

- Tighten up accounting of lfs_avail (more needs to be done).

- When cleansing indirect blocks of UNWRITTEN, make sure not to mark
them clean, since they'll need to be rewritten later.
 1.45 03-Jul-2000  fvdl Correct typo in previous.
 1.44 30-Jun-2000  fvdl Rearrange code around getnewvnode as was already done for ffs, to avoid
locking against oneself because getnewvnode recycles a softdep-using vnode.
 1.43 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.42 22-Jun-2000  perseant Update lfs_vunref for the fact that now a vnode can be locked with no
references (locked for VOP_INACTIVE at the end of vrele) and it's okay.
Check the return value of lfs_vref where appropriate.
Fixes PR #s 10285 and 10352.
 1.41 30-Mar-2000  augustss branches: 1.41.4;
Remove register declarations.
 1.40 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.39 16-Jan-2000  perseant correct typo (reference uninitialized variable)
 1.38 14-Jan-2000  perseant Expand the category of "metadata" in lfs_markv to include Ifile data blocks.
This prevents a rare condition in which Ifile "ifile" blocks, that is, the
blocks of the ifile which point VOP_VGET at the inode block containing the
requested inode, from being "unwritten" when cleaning during intense disk
activity.
 1.37 23-Nov-1999  fvdl Be more careful to block bio interrupts for some data structures. There
were at least a few missed cases where vp->v_{clean,dirty}blkhd were
unprotected since the softdep/trickle sync merge.
 1.36 21-Nov-1999  perseant Initialize i_ffs_effnlink, so every file doesn't look like it's already been
deleted for the purpose of dirops (particularly create and mkdir). Addresses
PR#8815.
 1.35 12-Nov-1999  perseant Back out my patch of the 8th (to address unreferenced inode problem).
Apparently this needs more thought.
 1.34 09-Nov-1999  perseant If ifile blocks were written before dirops were complete, and then the
system crashed, inodes could be allocated that were not referenced. (Though
not a serious problem, it evidences itself in phase 4 of fsck_lfs.) Fix
this by marking if_daddr with UNASSIGNED before the inodes are actually
written; at mount time the ifile is checked for UNASSIGNED entries and
any that are found are linked back into the free list. (The latter
functionality should move into the roll-forward agent when it materializes.)
 1.33 08-Jul-1999  wrstuden branches: 1.33.2; 1.33.4; 1.33.8;
Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parrent.
 1.32 09-Jun-1999  drochner complete the previous
reindent syscall args
 1.31 09-Jun-1999  christos prefix the lfs syscalls with sys_
 1.30 14-Apr-1999  perseant Fix lost lock in lfs_markv -- a typo-class bug, obvious when you look at it.
 1.29 12-Apr-1999  perseant Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).
 1.28 12-Apr-1999  perseant Better checking for held inode locks in lfs_fastvget, for a number of error
conditions. Also change the default setting of lfs_clean_vnhead to 0, which
seems to make the locking problems go away (although this is difficult to
test as I can't reliably reproduce them).
 1.27 11-Apr-1999  perseant Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.
 1.26 29-Mar-1999  perseant branches: 1.26.2;
Fix unit mismatch in debugging code in lfs_segclean; also put it properly
within `#ifdef DEBUG_LFS'.
 1.25 25-Mar-1999  perseant Fixes to make dirops and lfs_vflush play together well. In particular,
if we are short on vnodes, lfs_vflush from another process can grab a
vnode that lfs_markv has already processed but not yet written; but
lfs_markv holds the seglock. When lfs_vflush gets around to writing it,
the context for copyin is gone. So, now lfs_markv calls copyin itself,
rather than having lfs_writeseg do it.
 1.24 25-Mar-1999  perseant Change lfs_sb_cksum to use offsetof() instead of an inlined version.

Fix lfs_vref/lfs_vunredf to ignore VXLOCKed vnodes that are also being
flushed.

Improve the debugging messages somewhat.
 1.23 25-Mar-1999  perseant clean up unused/required #ifdefs
 1.22 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.21 09-Nov-1998  mycroft GC the B_CACHE bit.
 1.20 23-Oct-1998  thorpej Use DINODE_SIZE rather than sizeof(struct dinode).
 1.19 15-Sep-1998  pk Apply patch from PR#5542: buffer overflow in lfs_markv().
 1.18 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.17 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.16 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.15 19-Feb-1998  thorpej Include the LFS option header.
 1.14 13-Jan-1998  thorpej Nuke spurious semicolon, from Konrad Schroder <perseant@hhhh.org>.
 1.13 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added an "union inode_ext" to struct inode, for the per-fs extentions.
For now only ext2fs uses it.
- i_din is now an union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the rigth
macro, depending on the time of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.12 12-Oct-1996  christos revert previous kprintf changes
 1.11 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.10 09-Feb-1996  christos lfs prototypes
 1.9 21-Sep-1995  thorpej Make system calls conform to a standard prototype and bring those
prototypes into scope.
 1.8 21-Mar-1995  mycroft Update to use timer{add,sub}().
 1.7 14-Dec-1994  mycroft Sync with CSRG.
 1.6 11-Dec-1994  mycroft Use __timeradd(), not timevaladd().
 1.5 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.4 21-Aug-1994  cgd C syntax fix, and syscall args style (For later.)
 1.3 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.2 16-Jun-1994  mycroft This i_flags should be i_flag.
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.26.2.4 20-Jan-2000  he Pull up revision 1.40 (requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.26.2.3 15-Jan-2000  he Pull up revision 1.38 (requested by perseant):
Expand the category of "metadata" in lfs_markv to include Ifile
data blocks. This prevents a rare condition in which certain
Ifile blocks are "unwritten" when cleaning during intense disk
activity.
 1.26.2.2 15-Apr-1999  perseant branches: 1.26.2.2.2;
Pull up 1.29->1.30; trivial fix for the forgotten lock problem.
 1.26.2.1 13-Apr-1999  perseant Pull-up of changes made to the trunk on Sunday [1.27-1.28], to wit:

Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.

Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).

Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed, then
immediately reloaded, their dinodes were located in an inode block which
was not on disk at the advertized location, nor in the cache (although it
would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.

Better checking for held inode locks in lfs_fastvget, for a number of
error conditions. Also change the default setting of lfs_clean_vnhead to
0, which seems to make the locking problems go away (although this is
difficult to test as I can't reliably reproduce them).

Make sure that the wakeup occurs for vnodes that lfs_update might be
sleeping on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which
have dirty buffers), by marking them with the appropriate flag if
dirtybuffers were added while the write was in progress.

Fix block counting during file truncation, if not truncating to zero.

Disallow threshold-initiated cache flush when dirops are active. Also,
make SET_ENDOP use lfs_check instead of inlining most of it.

Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).

Check the superblock version field, and refuse to mount the filesystem if
the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and
boundaries, etc., which would not affect existing fields in the
superblock, but which would drastically affect the filesystem, to be
smoothly integrated at a later date.
 1.26.2.2.2.3 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.26.2.2.2.2 02-Aug-1999  thorpej Update from trunk.
 1.26.2.2.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.33.8.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.33.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.33.4.1 15-Nov-1999  fvdl Sync with -current
 1.33.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.33.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.33.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.41.4.7 16-Aug-2001  tv Pullup [jdolecek]:

sys/ufs/lfs/lfs_syscalls.c 1.58 via patch

Constrain "blkcnt" of lfs_markv() syscall by 64KB.
 1.41.4.6 03-Feb-2001  he Pull up revisions 1.50,1.52-1.53 (requested by perseant):
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
o Don't clean the same segment twice.
o Fix locking and reference leaks in lfs_markv, on error (PR#11547).
 1.41.4.5 01-Nov-2000  tv Pullup 1.51 [toshii]:
In lfs_fastvget(), initialize i_lfs_effnblks correctly.
 1.41.4.4 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
seperately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.41.4.3 13-Jul-2000  thorpej Pull up rev. 1.48:
XXX Use of hzto() return value needs to be double-checked here.
 1.41.4.2 03-Jul-2000  fvdl pullup the fixes from the trunk to not hold ufs_hashlock across
getnewvnode()
 1.41.4.1 22-Jun-2000  perseant Pull up lfs_vunref fix from the trunk.
 1.56.6.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.56.6.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.56.6.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.56.6.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.56.6.1 03-Aug-2001  lukem update to -current
 1.56.4.3 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.56.4.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.56.4.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.56.2.14 29-Dec-2002  thorpej Sync with HEAD.
 1.56.2.13 19-Dec-2002  thorpej Sync with HEAD.
 1.56.2.12 11-Dec-2002  thorpej Sync with HEAD.
 1.56.2.11 13-Aug-2002  nathanw Catch up to -current.
 1.56.2.10 01-Aug-2002  nathanw Catch up to -current.
 1.56.2.9 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.56.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.56.2.7 29-May-2002  nathanw #include <sys/sa.h> before <sys/syscallargs.h>, to provide sa_upcall_t
now that <sys/param.h> doesn't include <sys/sa.h>.

(Behold the Power of Ed)
 1.56.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.56.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.56.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.56.2.3 24-Aug-2001  nathanw A few files and lwp/proc conversions I missed in the last big update.
GENERIC runs again.
 1.56.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.56.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.58.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.59.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.65.4.2 07-Aug-2002  lukem Pull up revision 1.71 (requested by itojun in ticket #616):
correct range check, have overflow check, fix type mismatches,
for cmap args and some other calls. from openbsd
 1.65.4.1 20-Jun-2002  lukem Pull up revision 1.67 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.65.2.3 29-Aug-2002  gehenna catch up with -current.
 1.65.2.2 15-Jul-2002  gehenna catch up with -current.
 1.65.2.1 20-Jun-2002  gehenna catch up with -current.
 1.93.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.93.2.8 08-Mar-2005  skrll Sync with HEAD.
 1.93.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.93.2.6 30-Oct-2004  skrll Reduced diff to HEAD by restoring the struct proc * argument to lfs_bmapv
 1.93.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.93.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.93.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.93.2.2 03-Aug-2004  skrll Sync with HEAD
 1.93.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.100.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.100.8.1 29-Apr-2005  kent sync with -current
 1.100.6.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.103.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_syscalls.c: revision 1.112
Get rid of the LFS_FORCE_WRITE case. We never really used it, and it could
panic the kernel if cleaner daemon passed the right combination of arguments.
Coverity CID 2741.
 1.103.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_balloc.c: revision 1.60
sys/ufs/lfs/lfs_syscalls.c: revision 1.111
sys/ufs/lfs/lfs_segment.c: revision 1.172
sys/ufs/lfs/lfs_vnops.c: revision 1.163
Several minor bug fixes:
* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.103.2.3 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_segment.c: revision 1.170
sys/ufs/lfs/lfs.h: revision 1.96
sys/ufs/lfs/lfs_vfsops.c: revision 1.194
sys/ufs/lfs/lfs_syscalls.c: revision 1.109
From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.103.2.2 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.180
sys/ufs/lfs/lfs_syscalls.c: revision 1.106
sys/ufs/lfs/lfs.h: revision 1.87
Keep track of the number of segments reclaimed, since the cleaner doesn't
do this anymore (it hasn't for quite some time). Add a couple of conditional
debugging messages to indicate why segments are not cleaned, in the event
that lfs_segclean is used.
Make the LFCNSEGWAITALL fcntl work again.

sys/ufs/lfs/lfs_syscalls.c: revision 1.107
Don't update lfs_stats.segs_reclaimed if we're not keeping statistics.
Patch from Juan RP.
 1.103.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.107.2.7 04-Feb-2008  yamt sync with head.
 1.107.2.6 21-Jan-2008  yamt sync with head
 1.107.2.5 27-Oct-2007  yamt sync with head.
 1.107.2.4 03-Sep-2007  yamt sync with head.
 1.107.2.3 26-Feb-2007  yamt sync with head.
 1.107.2.2 30-Dec-2006  yamt sync with head.
 1.107.2.1 21-Jun-2006  yamt sync with head.
 1.108.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.108.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.108.10.5 11-May-2006  elad sync with head
 1.108.10.4 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.108.10.3 19-Apr-2006  elad sync with head.
 1.108.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.108.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.108.8.6 03-Sep-2006  yamt sync with head.
 1.108.8.5 11-Aug-2006  yamt sync with head
 1.108.8.4 26-Jun-2006  yamt sync with head.
 1.108.8.3 24-May-2006  yamt sync with head.
 1.108.8.2 11-Apr-2006  yamt sync with head
 1.108.8.1 01-Apr-2006  yamt sync with head.
 1.108.6.3 01-Jun-2006  kardel Sync with head.
 1.108.6.2 22-Apr-2006  simonb Sync with head.
 1.108.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.108.4.1 09-Sep-2006  rpaulo sync with head
 1.113.2.1 19-Jun-2006  chap Sync with head.
 1.116.4.2 10-Dec-2006  yamt sync with head.
 1.116.4.1 22-Oct-2006  yamt sync with head
 1.116.2.3 30-Jan-2007  ad Remove support for SA. Ok core@.
 1.116.2.2 12-Jan-2007  ad Sync with head.
 1.116.2.1 18-Nov-2006  ad Sync with head.
 1.118.4.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.118.2.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.121.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.122.18.1 14-Oct-2007  yamt sync with head.
 1.122.16.3 23-Mar-2008  matt sync with HEAD
 1.122.16.2 09-Jan-2008  matt sync with HEAD
 1.122.16.1 06-Nov-2007  matt sync with HEAD
 1.122.14.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.122.2.4 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and needs to be verified as a whole.
 1.122.2.3 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.122.2.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.122.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.124.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.124.6.5 26-Dec-2007  ad Sync with head.
 1.124.6.4 19-Dec-2007  ad Use a global lfs_lock.
 1.124.6.3 19-Dec-2007  ad Fix some more problems w/lfs on this branch.
 1.124.6.2 19-Dec-2007  ad Get lfs mostly working.
 1.124.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.124.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.128.10.1 18-May-2008  yamt sync with head.
 1.128.8.2 01-Nov-2008  christos Sync with head.
 1.128.8.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.128.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.128.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.129.2.5 11-Aug-2010  yamt sync with head.
 1.129.2.4 11-Mar-2010  yamt sync with head
 1.129.2.3 16-Sep-2009  yamt sync with head
 1.129.2.2 04-May-2009  yamt sync with head.
 1.129.2.1 16-May-2008  yamt sync with head.
 1.132.2.3 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.132.2.2 14-May-2008  wrstuden Per discussion with ad, remove most of the #include <sys/sa.h> lines
as they were including sa.h just for the type(s) needed for syscallargs.h.

Instead, create a new file, sys/satypes.h, which contains just the
types needed for syscallargs.h. Yes, there's only one now, but that
may change and it's probably more likely to change if it'd be difficult
to handle. :-)

Per discussion with matt at n dot o, add an include of satypes.h to
sigtypes.h. Upcall handlers are kinda signal handlers, and signalling
is the header file that's already included for syscallargs.h that
closest matches SA.

This shaves about 3000 lines off of the diff of the branch relative
to the base. That also represents about 18% of the total before this
checkin.

I think this reduction is very good thing.
 1.132.2.1 10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accomidate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.133.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.135.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.135.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.136.2.3 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.136.2.2 03-Jul-2010  rmind sync with head
 1.136.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.138.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.139.6.2 05-Apr-2012  mrg sync to latest -current.
 1.139.6.1 18-Feb-2012  mrg merge to -current.
 1.139.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.139.2.2 23-Jan-2013  yamt sync with head
 1.139.2.1 17-Apr-2012  yamt sync with head
 1.142.2.4 03-Dec-2017  jdolecek update from HEAD
 1.142.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.142.2.2 23-Jun-2013  tls resync from head
 1.142.2.1 25-Feb-2013  tls resync with head
 1.147.2.2 18-May-2014  rmind sync with head
 1.147.2.1 28-Aug-2013  rmind sync with head
 1.152.2.1 10-Aug-2014  tls Rebase.
 1.155.4.5 28-Aug-2017  skrll Sync with HEAD
 1.155.4.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.155.4.3 22-Sep-2015  skrll Sync with HEAD
 1.155.4.2 06-Jun-2015  skrll Sync with HEAD
 1.155.4.1 06-Apr-2015  skrll Sync with HEAD
 1.172.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.172.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.172.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.174.4.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.175.10.1 29-Feb-2020  ad Sync with head.
 1.175.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.391 20-Oct-2025  perseant Correct handling of B_MODIFY in lfs_resize_fs to avoid leaving some old
file-entry data in the segment table.
 1.390 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.389 29-Sep-2025  perseant Use the symbolic name MNT_WAIT when calling VFS_SYNC. No functional change.
 1.388 19-Sep-2025  perseant Interpret the "waitfor" argument to lfs_sync to match the passed values.
Not every sync needs to be a synchronous checkpoint.
 1.387 17-Sep-2025  perseant Add working in-kernel roll forward.
 1.386 17-Sep-2025  perseant Use a workqueue to handle the superblock callback.
 1.385 17-Sep-2025  perseant Add routines to check freelist consistency if compiled with DEBUG and
conditional on a kernel variable manipulated via sysctl.
Add checks before and after each routine that modifies the free list.
#if 0 a section of lfs_vfree() that was intended to keep the free list ordered
but instead corrupted it.
 1.384 02-Sep-2025  perseant Use a workqueue to handle cluster iodone, rather than doing it in interrupt context.
 1.383 30-Dec-2024  hannken emove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.382 19-Mar-2022  hannken branches: 1.382.4; 1.382.10;
Remove now unused VV_LOCKSWORK, all file systems support locking.

Remove unused predicates vn_locked() and vn_anylocked().

Welcome to 9.99.95
 1.381 31-Jul-2021  andvar s/threshhold/threshold
 1.380 05-Sep-2020  riastradh branches: 1.380.6;
Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.379 04-Aug-2020  riastradh Mark lfs vnodes with VV_LOCKSWORK, same as ffs.
 1.378 04-Apr-2020  ad Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.377 16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.376 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.375 14-Mar-2020  ad OR into bp->b_cflags; don't overwrite.
 1.374 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.373 23-Feb-2020  riastradh Dust off the orphan detection code and try to make it work.
 1.372 23-Feb-2020  riastradh Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.
 1.371 23-Feb-2020  riastradh Teach lfs to transition ro<->rw.
 1.370 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.369 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.368 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.367 31-Dec-2019  ad branches: 1.367.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.366 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.365 28-May-2019  msaitoh branches: 1.365.2;
s/recieve/receive/
 1.364 01-Jan-2019  hannken Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
 1.363 10-Dec-2018  maxv Remove unused mbuf.h includes.
 1.362 28-May-2018  chs branches: 1.362.2;
add a genfs method to allow a file system to limit the range of pages
that are given to a single GOP_WRITE() call. needed by ZFS.
 1.361 28-Oct-2017  pgoyette branches: 1.361.2;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
 1.360 26-Jul-2017  maya change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar

XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
 1.359 17-Apr-2017  hannken branches: 1.359.2; 1.359.4;
Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.358 17-Apr-2017  hannken Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
 1.357 13-Apr-2017  hannken Switch lfs_flush() and lfs_writerd() to mountlist iterator.
 1.356 01-Apr-2017  maya Switch lfs_writer_daemon to use condvar instead of mtsleep.
track thread existence with struct lwp instead of pid + lid,
it's more useful from ddb.
 1.355 01-Apr-2017  maya switch lfs_dirops to condvar (from mtsleep)
 1.354 01-Apr-2017  maya switch lfs_sleepers to condvar (from mtsleep)
 1.353 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT

Replace some #if DEBUG by this too. DEBUG is only for expensive
assertions; these are not.
 1.352 17-Feb-2017  hannken Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.351 07-Jul-2016  msaitoh branches: 1.351.2; 1.351.4;
KNF. Remove extra spaces. No functional change.
 1.350 20-Jun-2016  dholland Merge -r1.44 of ufs_extattr.c and related change -r1.302 of ffs_vfops.c:
fix use-after-free on failed unmount with extended attributes enabled.
 1.349 19-Oct-2015  dholland Set the legacy ulfs fstype field to ULFS2 when mounting lfs64. Oops.
 1.348 15-Oct-2015  dholland Enable mounting lfs64 volumes.
 1.347 15-Oct-2015  dholland Move stuff from struct ulfsmount to struct lfs.
 1.346 10-Oct-2015  dholland Remove no longer needed explicit 32->64 sign extension.

This is the last 32-bit-on-disk item among those that were either
already tagged or readily discoverable.
 1.345 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.344 01-Sep-2015  dholland Make the inode fields in the 64-bit superblock 64 bits wide.
Reasoning as before.

Note that I am not going through and checking for 64->32 truncations
in inode numbers; I'm sure there are quite a few, but that's a project
for later.
 1.343 01-Sep-2015  dholland Add byteswapping to the dinode accessors.

This prevents regressions in the ulfs code when switching to the new
accessors. Note that while adding byteswapping to the other accessors
is straightforward, I haven't done it yet; and that also is not enough
to make LFS_EI work, because there are places lying around that bypass
the accessors for one reason and another and all of them need to be
updated. That is going to have to wait for a later day as LFS_EI is
not on the critical path right now.
 1.342 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.341 19-Aug-2015  dholland Part two of dinodes; use the same union everywhere.
(previously the ufs-derived code had things set up slightly different)

Remove a bunch of associated mess.
 1.340 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.339 12-Aug-2015  dholland Provide 32-bit and 64-bit versions of FINFO.

This also entailed sorting out part of struct segment, as that
contains a pointer into the current FINFO data.
 1.338 12-Aug-2015  dholland Make 32-bit and 64-bit versions of SEGSUM.
Also fix some of the FINFO handling as it's closely entangled.
 1.337 12-Aug-2015  dholland Add IFILE32 and IFILE64 structures for the on-disk ifile entries.
Add and use accessors. There are also a bunch of places that cast and
I hope I've found them all...
 1.336 12-Aug-2015  dholland Make 32-bit and 64-bit versions of CLEANERINFO.

XXX: while this is written to disk, it seems like much of it would
XXX: be better set up as a commpage shared with the cleaner.
 1.335 12-Aug-2015  dholland Fix botched syscall_package. HI CHRISTOS
 1.334 02-Aug-2015  dholland Pass the fs object to LFS_MAX_DADDR so it can check lfs_is64.

Remove some hackish intentional 64->32 truncations next to the checks
using LFS_MAX_DADDR, and tackle the problem they handled in bmap
instead.

The problem: the magic block pointer value UNWRITTEN has magic value
-2, and if it's not handled specifically, uint32 -> uint64 promotion
turns it into 4294967294, which then causes consternation and
monkeyhouse downstream.

What's here is still kind of a hack, but it's a step forward.
 1.333 02-Aug-2015  dholland Add a (draft) 64-bit superblock. Make things build again.

Add pieces of support for using both superblock types where
convenient, and specifically to the superblock accessors, but don't
actually enable it anywhere.

First substantive step on PR 50000.
 1.332 02-Aug-2015  dholland Use accessor functions for the version field of the lfs superblock.
I thought at first maybe the cases that test the version should be
rolled into the accessors, but on the whole I think the conclusion on
that is no.
 1.331 02-Aug-2015  dholland Second batch of 64 -> 32 truncations in lfs, along with more minor
tidyups and corrections in passing.
 1.330 02-Aug-2015  dholland Fix assorted 64 -> 32 truncations in lfs. Also, some minor tidyups and
corrections in passing.
 1.329 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.328 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.327 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.326 16-Jul-2015  dholland Don't cast the return value of malloc.
 1.325 07-Jun-2015  hannken Fix copy and paste errors from last commits.
- Kernel i386/ALL and amd64/ALL compile again.
- Resolves CID 1304138 (DEADCODE) and 1304139 (IDENTICAL_BRANCHES).
 1.324 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.323 31-May-2015  hannken Use VFS_PROTOS() for lfs.
Rename conflicting struct lfs field "lfs_start" to "lfs_s0addr".

No functional change.
 1.322 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.321 16-Apr-2014  maxv branches: 1.321.4;
An (un)privileged user can easily make the kernel dereference a NULL
pointer.

The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).

ok christos@
 1.320 24-Mar-2014  hannken branches: 1.320.2;
- Make VI_XLOCK, VI_CLEAN and VI_LOCKSHARE private to kern/vfs_*.c.
- Make vwait() static.
- Add vdead_check() to check a vnode for being or becoming dead.

Discussed on tech-kern.

Welcome to 6.99.38
 1.319 23-Mar-2014  hannken Change all vfsops to use C99 designated initializers.

No functional changes intended.
 1.318 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.317 27-Nov-2013  christos Change the queue.3 *_END(&head) macros to NULL. Since we don't have CIRCLEQ
anymore, all the macros expand to NULL anyway, so this improves readability.
Requested by rmind@
 1.316 23-Nov-2013  christos change the mountlist CIRCLEQ into a TAILQ
 1.315 17-Oct-2013  christos - remove unused variables
- add debug ifdefs for debugging variables
- __USE() where appropriate.
 1.314 30-Sep-2013  hannken Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>
 1.313 28-Jul-2013  dholland Merge in some of the stuff for supporting the extended attributes code.
 1.312 28-Jul-2013  dholland Add more of the bits for supporting quotas.
 1.311 28-Jul-2013  dholland Bring in a copy of ffs_quota2_mount() for reference.
Add stuff to struct lfs that it needs to initialize.
Clear these fields in mount as there's no on-disk support for quota2;
but this increases the chances of being able to add it (or something
like it) in the future.
 1.310 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.309 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.308 28-Jul-2013  dholland Get rid of the ulfs_ops table as we only have one fs in here now.
 1.307 18-Jun-2013  christos branches: 1.307.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.306 17-Jun-2013  christos LFS module does not depend on FFS anymore. (NAKAJIMA Yoshihiro)
 1.305 10-Jun-2013  hannken Make DEBUG kernel compile: di_u.inumber -> di_inumber
 1.304 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.303 08-Jun-2013  dholland Stick LFS_ in front of IFMT, IFIFO, IFREG, etc. so as not to conflict
with the UFS copies of these symbols. (Which themselves ought to have
UFS_ stuck on.)
 1.302 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.301 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.300 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.299 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.298 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.297 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.296 30-Apr-2012  rmind branches: 1.296.2;
- Replace some malloc(9) uses with kmem(9).
- G/C M_IPMOPTS, M_IPMADDR and M_BWMETER.
 1.295 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.294 16-Feb-2012  perseant Pass t_renamerace and t_rmdirrace tests.

Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.

Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.

Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace crates a lot of inode blocks on a tiny empty disk).
 1.293 04-Jan-2012  perseant branches: 1.293.2;
lfs_writerd thread exits when no more LFSs are mounted.
 1.292 02-Jan-2012  perseant * Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.291 14-Nov-2011  hannken branches: 1.291.4;
VOP_OPEN() needs a locked vnode. All these copy-and-pasted xxxfs_mount()
implementations need more review.
 1.290 11-Jul-2011  hannken branches: 1.290.2;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Adresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.289 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.288 06-Mar-2011  bouyer branches: 1.288.2;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.287 24-Jun-2010  hannken branches: 1.287.2; 1.287.4;
Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.286 02-Mar-2010  pooka branches: 1.286.2;
load lfs syscalls in modload
 1.285 02-Mar-2010  pooka /*
* XXX: Get extra reference to LFS vfsops. This prevents unload,
* but also prevents kernel panic due to text being unloaded
* from below lfs_writerd. When lfs_writerd can exit, remove
* this!!!
*/
 1.284 18-Feb-2010  eeh Fix root filesystem support.
 1.283 16-Feb-2010  mlelstv Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.282 08-Jan-2010  pooka branches: 1.282.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has creeped
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.281 07-Dec-2009  eeh Fix some more hangs and deadlocks.
 1.280 17-Nov-2009  pooka Create unwind log in global variable instead of automatic variable.

memory leak spotted by njoly's valgrind run
 1.279 29-Oct-2009  eeh Fix up numoutput accounting.
 1.278 13-Sep-2009  tsutsui Move declaration of ufs_hashlock into <ufs/ufs_extern.h> from each c source.
 1.277 05-Aug-2009  pooka Compensate v_numoutput & nestbuf for lfs's rather peculiar I/O habits.
 1.276 05-Aug-2009  pooka remember to nestiobuf_done() too
 1.275 05-Aug-2009  pooka Use nestiobuf instead of homerolled equivalent.
 1.274 29-Jun-2009  dholland Convert 67 namei call sites to use namei_simple, in these functions:

check_console, veriexecclose, veriexec_delete, veriexec_file_add,
emul_find_root, coff_load_shlib (sh3 version), coff_load_shlib,
compat_20_sys_statfs, compat_20_netbsd32_statfs,
ELFNAME2(netbsd32,probe_noteless), darwin_sys_statfs,
ibcs2_sys_statfs, ibcs2_sys_statvfs, linux_sys_uselib,
osf1_sys_statfs, sunos_sys_statfs, sunos32_sys_statfs,
ultrix_sys_statfs, do_sys_mount, fss_create_files (3 of 4),
adosfs_mount, cd9660_mount, coda_ioctl, coda_mount, ext2fs_mount,
ffs_mount, filecore_mount, hfs_mount, lfs_mount, msdosfs_mount,
ntfs_mount, sysvbfs_mount, udf_mount, union_mount, sys_chflags,
sys_lchflags, sys_chmod, sys_lchmod, sys_chown, sys_lchown,
sys___posix_chown, sys___posix_lchown, sys_link, do_sys_pstatvfs,
sys_quotactl, sys_revoke, sys_truncate, do_sys_utimes, sys_extattrctl,
sys_extattr_set_file, sys_extattr_set_link, sys_extattr_get_file,
sys_extattr_get_link, sys_extattr_delete_file,
sys_extattr_delete_link, sys_extattr_list_file, sys_extattr_list_link,
sys_setxattr, sys_lsetxattr, sys_getxattr, sys_lgetxattr,
sys_listxattr, sys_llistxattr, sys_removexattr, sys_lremovexattr

All have been scrutinized (several times, in fact) and compile-tested,
but not all have been explicitly tested in action.

XXX: While I haven't (intentionally) changed the use or nonuse of
XXX: TRYEMULROOT in any of these places, I'm not convinced all the
XXX: uses are correct; an audit might be desirable.
 1.273 07-May-2009  elad Use genfs_can_mount().
 1.272 04-Apr-2009  ad Turn up the volume on the warning message a bit.
 1.271 15-Mar-2009  cegger ansify function definitions
 1.270 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.269 13-Nov-2008  ad branches: 1.269.4;
These depend on ffs.
 1.268 13-Nov-2008  ad Remove #ifdef LFS from the ufs code.
 1.267 28-Jun-2008  rumble branches: 1.267.2; 1.267.4; 1.267.6;
Fix lkm fallout from previous sysctl changes. This largely duplicates
sysctl creation code, but lkms are going away soon(ish) anyway.

Spotted by Chris Gilbert.
 1.266 28-Jun-2008  rumble Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.265 24-May-2008  nakayama branches: 1.265.2;
s/log file system/log-structured file system/
 1.264 20-May-2008  ad Don't moan about LFS unless the mount succeeds.
 1.263 18-May-2008  ad Until these get fixed or replaced:

WARNING: the foo file system is experimental and may be unstable
 1.262 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.261 10-May-2008  rumble Convert file systems to dynamically attach with the new module interface.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.

As a consequence, most of the file systems can now be loaded as new style
modules.

Quick sanity check by ad@.
 1.260 06-May-2008  ad branches: 1.260.2;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.259 30-Apr-2008  ad PR kern/38135 vfs_busy/vfs_trybusy confusion

The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
 1.258 29-Apr-2008  ad kern/38135 vfs_busy/vfs_trybusy confusion

The symptom was that sometimes file systems would occasionally not appear
in output from 'df' or 'mount' if the system was busy. Resolution:

- Make mount locks work somewhat like vm_map locks.
- vfs_trybusy() now only fails if the mount is gone, or if someone is
unmounting the file system. Simple contention on mnt_lock doesn't
cause it to fail.
- vfs_busy() will wait even if the file system is being unmounted.
 1.257 29-Apr-2008  ad PR kern/38057 ffs makes assuptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.256 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.255 30-Jan-2008  ad branches: 1.255.6; 1.255.8; 1.255.10;
PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
 1.254 28-Jan-2008  dholland Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.253 24-Jan-2008  ad specfs changes for PR kern/37717 (raidclose() is no longer called on
shutdown). There are still problems with device access and a PR will be
filed.

- Kill checkalias(). Allow multiple vnodes to reference a single device.

- Don't play dangerous tricks with block vnodes to ensure that only one
vnode can describe a block device. Instead, prohibit concurrent opens of
block devices. As a bonus remove the unreliable code that prevents
multiple file system mounts on the same device. It's no longer needed.

- Track opens by vnode and by device. Issue cdev_close() when the last open
goes away, instead of abusing vnode::v_usecount to tell if the device is
open.
 1.252 02-Jan-2008  ad Merge vmlocking2 to head.
 1.251 12-Dec-2007  lukem defflag LFS_KERNEL_RFW (in opt_lfs.h).
Note: lfs_rfw.c doesn't compile if you define the option; locking API fallout?
 1.250 08-Dec-2007  pooka branches: 1.250.2; 1.250.4;
Remove cn_lwp from struct componentname. curlwp should be used
from on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.249 26-Nov-2007  pooka branches: 1.249.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.248 22-Nov-2007  yamt lfs_mountroot: use vfs_destroy.
 1.247 10-Nov-2007  rmind Use PRI_BIO for kthreads instead of PINOD. Fixes a missed case of priority
inversion, which caused LFS to fire some assertions.

Reported by Kurt Schreiner on <current-users>.
 1.246 10-Oct-2007  ad branches: 1.246.2; 1.246.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.245 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.244 31-Jul-2007  pooka branches: 1.244.2; 1.244.4; 1.244.6; 1.244.8;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.243 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.242 26-Jul-2007  pooka Use eopnotsupp() instead of vfs_stdsuspendctl() and retire the latter.
 1.241 23-Jul-2007  ad Workaround the ufs_haslock/ufs_ihash_lock deadlock. From a patch
posted by Blair Sadewitz.
 1.240 17-Jul-2007  christos branches: 1.240.2;
Eliminate MFSNAMELEN
 1.239 17-Jul-2007  pooka Make set_statvfs_info() take a parameter for the vfs name instead
of always retrieving it from mp->mnt_op->vfs_name

christos ok
 1.238 12-Jul-2007  dsl Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass though the length of the buffer as well.
Since the length of the userspace buffer isn'it (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.237 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.236 30-Jun-2007  pooka Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.235 16-May-2007  perseant Change references to SEGM_W_DIROPS to SEGM_CKP, and replace the logic that
formerly used SEGM_W_DIROPS in lfs_segwrite() appropriately. This prevents
a problem in which processes could get stuck in "buffers" sleep forever.
 1.234 17-Apr-2007  perseant Install a new sysctl, vfs.lfs.ignore_lazy_sync, which causes LFS to ignore
the "smooth" syncer, as if vfs.sync.*delay = 0, but only for LFS. The
default is "on", i.e., ignore lazy sync.

Reduce the amount of polling/busy-waiting done by lfs_putpages(). To
accomplish this, copied genfs_putpages() and modified it to indicate which
page it was that caused it to return with EDEADLK. fsync()/fdatasync()
should no longer ever fail with EAGAIN, and should not consume huge
quantities of cpu.

Also, try to make dirops less likely to be written as the result of a
VOP_PUTPAGES(), while ensuring that they are written regularly.
 1.233 13-Mar-2007  ad Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.232 12-Mar-2007  ad branches: 1.232.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.231 22-Feb-2007  thorpej branches: 1.231.4;
TRUE -> true, FALSE -> false
 1.230 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.229 18-Feb-2007  ad Release ufs_hashlock before calling ungetnewvnode().
 1.228 15-Feb-2007  ad branches: 1.228.2;
Destroy the fraglock on unmount.
 1.227 15-Feb-2007  ad Replace some uses of lockmgr() / simplelocks.
 1.226 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.225 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.224 16-Nov-2006  christos branches: 1.224.2; 1.224.4;
__unused removal on arguments; approved by core.
 1.223 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.222 04-Oct-2006  christos fix empty if
 1.221 28-Sep-2006  perseant Use lockstatus instead of a homebrewed locking system to control
LFCNWRAPSTOP and LFCNWRAPGO.

Be less verbose about the various looping checks: use log() rather than
printf(), and only log anything if we are really looping ("count = 2" is
not an error condition).

Allow dirops sleeping on available space to be interruptible.
 1.220 02-Sep-2006  christos branches: 1.220.2; 1.220.4;
- add missing initializers
- comment out impossible code
 1.219 01-Sep-2006  perseant Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.218 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.217 20-Jul-2006  perseant Separate the (non-working) LFS kernel roll-forward code into its own file,
lfs_rfw.c.
 1.216 13-Jul-2006  martin Fix alignement problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.215 06-Jul-2006  perseant Fix a typo that caused a "multiple free" panic on unmounting a resized lfs.
 1.214 29-Jun-2006  perseant Don't wake up the cleaner if the filesystem is unwrappable, and fix the
compatibility fcntls.

Also includes one-line fixes for an MP locking bug and a zero-length FINFO
problem that manifested during testing.
 1.213 24-May-2006  perseant branches: 1.213.2;
Read the inode version number fro a more reliable source, quelling a
diagnostic assertion panic.
 1.212 18-May-2006  perseant branches: 1.212.2;
Break out the finfo array manipulation code into two new functions,
lfs_acquire_finfo() and lfs_release_finfo(). Add a debugging check
for zero-length finfo arrays in the segment summary to avoid future
regressions.
 1.211 18-May-2006  perseant Don't duplicate the LFS_STARVED_FOR_SEGS check (an oversight that came
in with rev 1.210).
 1.210 14-May-2006  elad integrate kauth.
 1.209 12-May-2006  perseant Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.

Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.208 10-May-2006  mrg quell GCC 4.1 uninitialised variable warnings.

XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree..
 1.207 04-May-2006  perseant Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.

Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.206 30-Apr-2006  perseant Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are also included as well.
 1.205 18-Apr-2006  perseant Don't roll forward if we aren't given a process context. Coverity CID 1076.
 1.204 15-Apr-2006  christos Coverity CID 2499: Fix uninitialize variable use.
 1.203 10-Apr-2006  perseant Remove mostly useless BUFPAGES warning message from lfs_{un,}mount.
 1.202 10-Apr-2006  perseant Optimize the free list search a little more; in particular use words
instead of bytes for the index, and never search below fs->lfs_freehd.

Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).

Free the bitmap when we unmount the filesystem.
 1.201 10-Apr-2006  perseant Correct a locking bug in the recent pager optimization.
 1.200 08-Apr-2006  perseant Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.199 08-Apr-2006  perseant Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.198 31-Mar-2006  perseant Handle the "filesystem is clean" flag correctly when upgrading from
read-only to read-write mount. This makes "root on lfs" work for me,
although it looks like a different traceback from PR#32667.
 1.197 30-Mar-2006  yamt some cleanups after the introduction of GOP_SIZE_MEM flag.
- remove GOP_SIZE_READ/GOP_SIZE_WRITE flags.
they have not been used since the change.
- ufs_balloc_range: remove code which has been no-op since the change.
thanks Konrad Schroder for explaining the original intention of the code.
- ffs_gop_size: don't extend past eof, in the case of GOP_SIZE_MEM.
otherwise genfs_getpages end up to allocate pages past eof unnecessarily.
 1.196 28-Mar-2006  perseant Double-checkpoint on unmount. This ensures that vnodes belonging to removed
files are really freed, preventing occasional spurious EBUSY returns from
vflush().
 1.195 24-Mar-2006  perseant Improvements to LFS's paging mechanism, to wit:

* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.

* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.194 17-Mar-2006  tls From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.193 21-Feb-2006  thorpej branches: 1.193.2; 1.193.4; 1.193.6;
Use device_class() instead of accessing dv_class directly.
 1.192 14-Jan-2006  yamt branches: 1.192.2; 1.192.4;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.191 04-Jan-2006  yamt - add simple functions to allocate/free a buffer for i/o.
- make bufpool static.
 1.190 11-Dec-2005  christos branches: 1.190.2;
merge ktrace-lwp.
 1.189 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.188 27-Sep-2005  yamt branches: 1.188.2;
introduce "ufs_ops" and use it for ITIMES.
 1.187 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.186 23-Aug-2005  christos Don't overload MAXNAMLEN, use a separate constant for each filesystem type.
 1.185 19-Aug-2005  christos 64 bit inode changes.
 1.184 23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.183 28-Jun-2005  yamt branches: 1.183.2;
- constify genfs_ops.
- use member designators.
 1.182 09-Jun-2005  atatat Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.
 1.181 29-May-2005  christos - sprinkle const
- avoid shadow variables.
 1.180 20-May-2005  perseant Keep track of the number of segments reclaimed, since the cleaner doesn't
do this anymore (it hasn't for quite some time). Add a couple of conditional
debugging messages to indicate why segments are not cleaned, in the event
that lfs_segclean is used.

Make the LFCNSEGWAITALL fcntl work again.
 1.179 20-May-2005  perseant Fill in the lfs_fsmnt field in the superblock when we mount the filesystem,
so fsck(8) can tell where it was last mounted.
 1.178 04-May-2005  perseant Don't let the pager_map deadlock avoidance code in lfs_putpages() write
segments containing zero-block FINFO records. These records cause segments
to become uncleanable, which would eventually result in a "no clean segments"
panic.
 1.177 23-Apr-2005  perseant Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing. Tested in both
directions, and everything appears to work happily, but ymmv.
 1.176 19-Apr-2005  perseant Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.175 16-Apr-2005  perseant Remove left-over reference to "lfs_blist", for _LKM case.
 1.174 16-Apr-2005  perseant Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
 1.173 14-Apr-2005  perseant Consolidate the hash table we use to maintain the integrity of lfs_avail
into a single, system-wide table, rather than having a separate hash table
per inode. Significantly reduces the "system" cpu usage of your average
file write.
 1.172 14-Apr-2005  perseant Keep track of the highest block held by an LFS inode, so that we can
be assured that the last byte of a file is always allocated. Previously
a file extension could cause the filesystem to be flushed, writing an
inconsistent inode to disk. Although this condition would be corrected
the next time blocks were written to disk, an intervening crash would leave
the filesystem in an inconsistent state, leaving fsck_lfs to complain
of an inode "partially truncated".
 1.171 08-Apr-2005  perseant Clean up the handling of the pager_map deadlock in lfs_putpages, after
realizing that it is safe to sleep the second time through the loop.
 1.170 06-Apr-2005  perseant Fix some locking issues that appeared with the simple_lock work.
Address a "pager_map" deadlock in lfs_putpages().
 1.169 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.168 29-Mar-2005  thorpej - Define a VFS_ATTACH() macro that places a reference to a vfsops structure
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
 1.167 08-Mar-2005  simonb branches: 1.167.2;
Tab Police.
 1.166 08-Mar-2005  perseant Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.165 04-Mar-2005  perseant Move "ifile is too large for your NBUFS/BUFPAGES" messages into a function.
Use log(9) to warn the user instead of printf(9). Since the theory is that
the Ifile is "always in cache", but the greater performance risk is
when the inode entries can't be held in cache, note these two cases
separately, at different log levels (notice and warning, respectively).
 1.164 26-Feb-2005  perry nuke trailing whitespace
 1.163 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
 1.162 11-Jan-2005  mycroft branches: 1.162.2; 1.162.4;
Rearrange some code slightly to avoid uninitialized variable warnings.
 1.161 09-Jan-2005  mycroft Whoops -- move the location of the VOP_OPEN()/VOP_CLOSE(), et al, from
foo_mountfs() to foo_mount(), to match the new mountroot API.
Also, for ext2fs and lfs, copy some restructuring from ffs to allow changing
file system parameters without specifying the device name.
(ntfs could use some more work.)
 1.160 09-Jan-2005  mycroft Rework the mountroot interface so that vfs_mountroot() opens the root device
and just passes it on to the file system functions. This avoids opening and
closing the device several times.

Mentioned on tech-kern some time ago, IIRC. I've been running this for a
long time.
 1.159 02-Jan-2005  thorpej Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
 1.158 16-Aug-2004  mycroft Make sure to set IMNT_DTYPE here...
 1.157 15-Aug-2004  mycroft Need to set um_dirblksiz here...
 1.156 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file systems returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.155 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.154 05-Jul-2004  pk Call inittodr() from main(). Let file system code set the recorded `last
update' time (if any) through the new function setrootfstime().
 1.153 30-May-2004  yamt lfs_gop_write: assert that ifile never come here.
 1.152 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.151 25-May-2004  atatat Sysctl descriptions under vfs subtree
 1.150 20-May-2004  atatat Explicitly call pool_init() (and pool_destroy()) when being built as
an _LKM.

This adds pools to the list of things that lkms must do manually
because they're set up with link sets. Not that there's anything
wrong with link sets, but that we need to try harder to remember that
lkms are second class citizens. Of a sort.
 1.149 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that either in arch-specific
code, or aren't initialiased during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.148 22-Apr-2004  yamt lfs_statvfs: report f_frsize correctly.
 1.147 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.146 27-Mar-2004  atatat branches: 1.146.2;
Manually attach malloc types when being built as an lkm.
 1.145 24-Mar-2004  atatat Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.144 26-Feb-2004  oster Add a missing:

pool_destroy(&lfs_dinode_pool);

to lfs_done().

Approved-by: yamt
 1.143 28-Jan-2004  he Let the cast to (long long) for using the result as a printf argument
apply to the whole expression, not just the first factor.
 1.142 28-Jan-2004  yamt use bufmem instead of bufpages to make lfs a little less broken.
 1.141 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.140 07-Nov-2003  yamt - tweak lfs_update_single()'s prototype so that it can be used by
roll-forward code.
- reduce code duplication using the above in update_meta()
this also fixes fragment accounting.
 1.139 07-Nov-2003  yamt fix spec vnode aliasing.
 1.138 07-Nov-2003  yamt - tell filesize changes to vm when roll-forwarding data blocks.
- handle fragment extension better during roll-forward.
- related assertions.
 1.137 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.136 25-Oct-2003  christos Fix uninitialized variable warnings.
 1.135 14-Oct-2003  dbj add mnt_iflag field to struct mount for internal flags
mv MNT_GONE, MNT_UNMOUNT and MNT_WANTRDWR to this field
additonally add mnt_writeopcountupper and mnt_writeopcountlower fields
in preparation for pending write suspension support work
bump kernel version to 1.6ZD
 1.134 14-Oct-2003  yamt add a prototype of check_segsum().
 1.133 14-Oct-2003  yamt when roll-forwarding, check segment serial numbers correctly.
 1.132 14-Oct-2003  yamt add a missing fsbtodb() to read a correct block for roll-forwarding.
 1.131 07-Sep-2003  yamt comments on lfs_issequential_hole.
 1.130 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.129 23-Jul-2003  yamt add parenthesis missed in rev.1.127.
 1.128 23-Jul-2003  yamt whitespace
 1.127 23-Jul-2003  yamt add KASSERTs in lfs_issequential_hole.
 1.126 12-Jul-2003  yamt more MP locks.
 1.125 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.124 02-Jul-2003  yamt use queue.h macros.
 1.123 02-Jul-2003  yamt use VFSTOUFS macro.
 1.122 02-Jul-2003  yamt - add a new functions, lfs_writer_enter/leave, and use them instead of
duplicated code fragments.
- add an assertion.
 1.121 29-Jun-2003  fvdl branches: 1.121.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.
 1.120 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.119 28-Jun-2003  bouyer Adapt for struct proc* -> struct lwp* changes.
 1.118 28-Jun-2003  darrenr Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.117 18-May-2003  yamt make is_sequential a callback in order to achieve better lfs write clustering.

since lfs always rewrite blocks into the new segment,
current on-disk place of the block doesn't affect to write clustering.

ok'ed by Konrad Schroder.
 1.116 29-Apr-2003  perseant Restrict the run of cluster blocks to on-disk contiguous blocks (back out
part of rev 1.115), to avoid writing over holes. This is the lesser of two
evils, to be replaced soon.
 1.115 23-Apr-2003  perseant Make LFS work better (though still not "well") as an NFS-exported
filesystem (and other things that needed to be fixed before the tests
would complete), to wit:

* Include the fs ident in the filehandle; improve stale filehandle checks.

* Change definition of blksize() to use the on-dinode size instead of
the inode's i_size, so that fsck_lfs will work properly again.

* Use b_interlock in lfs_vtruncbuf.

* Postpone dirop reclamation until after the seglock has been released,
so that lfs_truncate is not called with the segment lock held.

* Don't loop in lfs_fsync(), just write everything and wait.

* Be more careful about the interlock/uobjlock in lfs_putpages: when we
lose this lock, we have to resynchronize dirtiness of pages in each
block.

* Be sure to always write indirect blocks and update metadata in
lfs_putpages; fixes a bug that caused blocks to be accounted to the
wrong segment.
 1.114 16-Apr-2003  christos PR/1796: John Kohl: statfs misbehaves under chrooted environments.

- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While was there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
 1.113 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.112 28-Mar-2003  perseant Add a sleeper count, to prevent the cleaner from panicing the kernel
when the filesystem is unmounted, relocking the Ifile when its lock is
draining. (We can't use vfs_busy() since the process is sleeping for a
good long time.) Clean up / organize lfs.h, while I'm here.

In lfs_update_single, assert that disk addresses are either negative, or
are still positive when converted to int32_t, to prevent recurrence of a
negative/positive block problem.
 1.111 21-Mar-2003  dsl Use 'void *' instead of 'caddr_t' in prototypes of VOP_IOCTL, VOP_FCNTL
and VOP_ADVLOCK, delete casts from callers (and some to copyin/out).
 1.110 21-Mar-2003  perseant KNF (space after keywords).
 1.109 21-Mar-2003  perseant Use VONWORKLST as a heuristic for vnode emptiness, rather than exhaustively
checking the memq.

Take greater care not to dirty the Ifile vnode when unmounting the filesystem.
This should fix a "(vp->v_flag & VONWORKLST) == 0" assertion panic in vgonel
that could occur when unmounting.

Do not allow the Ifile to be mapped for writing.
 1.108 21-Mar-2003  yamt make this compilable with DIAGNOSTIC and without DEBUG.
fix PR 20827 from FUKAUMI Naoki.
 1.107 20-Mar-2003  yamt fix "more than one fragment" panics;
direct and indirect block pointers are not valid in the case of shortlinks.
while i'm here, move duplicated code in lfs_vget/fastvget into a new
function, lfs_vinit.
 1.106 18-Mar-2003  perseant Remember to destroy lfs_inoext_pool when closing up the LFS subsystem.
 1.105 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.104 08-Mar-2003  perseant Take away "#ifdef LFS_UBC".
 1.103 08-Mar-2003  perseant Add an lfs_strategy() that checks to make sure we're not trying to read
where the cleaner is trying to write, instead of tying up the "live"
buffers (or pages).

Fix a bug in the LFS_UBC case where oversized buffers would not be
checksummed correctly, causing uncleanable segments.

Make sure that wakeup(fs->lfs_iocount) is done if fs->lfs_iocount is 1
as well as 0, since we wait in some places for it to drop to 1.

Activate all pages that make it into lfs_gop_write without the segment
lock held, since they must have been dirtied very recently, even if
PG_DELWRI is not set.
 1.102 02-Mar-2003  perseant Account SEGUSE_ACTIVE correctly so that the automatic segment cleaning
actually happens.

Add a new fcntl call that will write the minimum necessary to checkpoint
(i.e., for on-disk directory structure to be consistent, not including
updates to file data) so that the cleaner can clean segments more quickly
without sacrificing three-way commit for cleaning.
 1.101 01-Mar-2003  yamt use pid_t for pid.
 1.100 01-Mar-2003  perseant Be careful to always zero pages on truncation/fragment extension,
in the case where the filesystem block size is larger than PAGE_SIZE.
 1.99 25-Feb-2003  thorpej Add a new BUF_INIT() macro which initializes b_dep and b_interlock, and
use it. This fixes a few places where either b_dep or b_interlock were
not properly initialized.
 1.98 25-Feb-2003  yamt fix simplelocks
 1.97 23-Feb-2003  perseant Fix a buffer overflow bug in the LFS_UBC case that manifested itself
either as a mysterious UVM error or as "panic: dirty bufs". Verify
maximum size in lfs_malloc.

Teach lfs_updatemeta and lfs_shellsort about oversized cluster blocks from
lfs_gop_write.

When unwiring pages in lfs_gop_write, deactivate them, under the theory
that the pagedaemon wanted to free them last we knew.
 1.96 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.95 19-Feb-2003  yamt workaround for "another flush is..." infinity loop in writerd.
if we're writerd, sleep in lfs_flush until another writer goes away
instead of busy loop in writed.
 1.94 19-Feb-2003  yamt wire the pages instead of just dequeue'ing them.
advised by Chuck Silvers.
 1.93 19-Feb-2003  yamt init b_interlock.
 1.92 19-Feb-2003  yamt init b_interlock.
 1.91 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.90 29-Jan-2003  yamt don't use daddr_t for segment summary since it's an on-disk structure.
 1.89 27-Jan-2003  yamt make these compilable with lfs debug options.
(follow daddr_t change)

XXX maybe segment number should be 64bit.
 1.88 25-Jan-2003  kleink Fix further printf format warnings for DEBUG, in the wake of daddr_t
having changed.
 1.87 25-Jan-2003  tron Use PRId64 instead of hard coding "%lld" to fix build problems under
LP64 ports.
 1.86 25-Jan-2003  tron Fix printf() format strings problems caused by "daddr_t" change.
 1.85 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.84 12-Jan-2003  yamt - zerofill struct lfs when allocating it.
- use M_ZERO instead of memset after malloc.
 1.83 24-Nov-2002  yamt lfs_sync should wait at lfs_writer, not lfs_dirops.
PR 18973.
 1.82 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.81 21-Sep-2002  christos MNT_GETARGS support
 1.80 06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches is defined as a constant structure in device drivers.

- The new grammer ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed up in port dependent majors.<arch>
by using this grammer.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- In compile time, device major numbers list is packed into the kernel and
the LKM framework will refer it to assign device major number dynamically.
 1.79 30-Jul-2002  soren Die, qaddr_t, die! - mnt_data in struct mount is already effectively
a void *, so stop pretending otherwise.
 1.78 06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.77 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.76 17-May-2002  perseant branches: 1.76.2;
use macros from <sys/queue.h>
 1.75 16-May-2002  thorpej Fix LP64 printf format warning.
 1.74 14-May-2002  perseant branches: 1.74.2;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.73 12-May-2002  matt Eliminate commons.
 1.72 08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot callocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.71 18-Dec-2001  chs use the new compatibility routines to allow mmap() to work
(in the same non-coherent fashion that it worked pre-UBC)
until someone has time to do it the right way.
 1.70 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.69 08-Nov-2001  lukem add RCSID
 1.68 15-Sep-2001  chs branches: 1.68.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.67 15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.66 13-Jul-2001  perseant branches: 1.66.2;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.65 30-May-2001  mrg branches: 1.65.2; 1.65.4;
use _KERNEL_OPT
 1.64 26-Jan-2001  itohy branches: 1.64.2;
Call inittodr() from lfs_mountroot() so that the system time is set properly
when booted from LFS.
 1.63 22-Jan-2001  jdolecek make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.62 03-Dec-2000  perseant Call uvm_vmp_setsize() in lfs_{fast,}vget to set initial vnode size.
 1.61 03-Dec-2000  chs don't forget to set um_lognindir (now required by ufs_bmaparray()).
 1.60 27-Nov-2000  perseant If LFS_DO_ROLLFORWARD is defined, roll forward from the older checkpoint
on mount, through the newer checkpoint and on through any newer
partial-segments that may have been written but not checkpointed because
of an intervening crash.

LFS_DO_ROLLFORWARD is not defined by default.
 1.59 14-Nov-2000  perseant Initialize the cleaner information in the Ifile from the same info from
the superblock at fs mount time, enabling the previous patch to fsck_lfs.
Patch from Jesse Off <joff@gci-net.com> (Closes PR #11470).
 1.58 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.57 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero as fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.56 03-Jul-2000  perseant Allow the number of free segments reserved for the cleaner to be
parametrized in the filesystem, defaulting to MIN_FREE_SEGS = 2 but set
to something more reasonable at newfs_lfs time.

Note the number of blocks that have been scheduled for writing but which
are not yet on disk in an inode extension, i_lfs_effnblks. Move
i_ffs_effnlink out of the ffs extension and onto the main inode, since
it's used all over the shared code and the lfs extension would clobber
it.

At inode write time, indirect blocks and inode-held blocks of inodes
that have i_lfs_effnblks != i_ffs_blocks are cleansed of UNWRITTEN disk
addresses, so that these never make it to disk.
 1.55 30-Jun-2000  fvdl Rearrange code around getnewvnode as was already done for ffs, to avoid
locking against oneself because getnewvnode recycles a softdep-using vnode.
 1.54 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.53 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.52 27-May-2000  perseant branches: 1.52.4;
Prevent dirops from getting around lfs_check and wedging the buffer cache.
All the dirop vnops now mark the inodes with a new flag, IN_ADIROP, which
is removed as soon as the dirop is done (as opposed to VDIROP which stays
until the file is written). To address one issue raised in PR#9357.
 1.51 19-May-2000  thorpej NULL != 0
 1.50 29-Apr-2000  perseant Test whether the filesystem is an LFS before trying to read the alternate
superblock (whose disk address is stored in the primary superblock). Also,
refuse to mount a filesystem whose superblocks overlap or where the alt.
superblock has a lower disk address than the primary superblock.

Solves PR#10001.
 1.49 23-Apr-2000  perseant Fix problems outlined in PR#9926:
- lfs_truncate extends the file if called with length > i_ffs_size;
- lfs_truncate errors out if called with length < 0;
- lfs_balloc block accounting corrected for the case of blocks read
into the cache before they exist on disk;
- mp->mnt_stat.f_iosize is initialized in lfs_mountfs.
 1.48 30-Mar-2000  augustss Remove register declarations.
 1.47 16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
 1.46 19-Jan-2000  perseant Changes to stabilize LFS. The first two of these should also apply to the
1.4 branch.

* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.

* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.

* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.

* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
 1.45 21-Nov-1999  perseant Initialize i_ffs_effnlink, so every file doesn't look like it's already been
deleted for the purpose of dirops (particularly create and mkdir). Addresses
PR#8815.
 1.44 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that is has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.43 12-Nov-1999  perseant Back out my patch of the 8th (to address unreferenced inode problem).
Apparently this needs more thought.
 1.42 09-Nov-1999  perseant If ifile blocks were written before dirops were complete, and then the
system crashed, inodes could be allocated that were not referenced. (Though
not a serious problem, it evidences itself in phase 4 of fsck_lfs.) Fix
this by marking if_daddr with UNASSIGNED before the inodes are actually
written; at mount time the ifile is checked for UNASSIGNED entries and
any that are found are linked back into the free list. (The latter
functionality should move into the roll-forward agent when it materializes.)
 1.41 06-Nov-1999  perseant branches: 1.41.2;
Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.40 20-Oct-1999  enami Check if the type of device node isn't VBAD before touching v_specinfo. If
the device vnode is revoked, the field is NULL and touching it causes null
pointer derefercence.
 1.39 18-Oct-1999  wrstuden branches: 1.39.2; 1.39.4;
Catch a few cases missed earlier where we need to lock the vnode before
calling VOP_CLOSE().
 1.38 08-Sep-1999  augustss branches: 1.38.2;
Add #include <sys/device.h> so this file compiles again.
 1.37 08-Sep-1999  sommerfeld Avoid dereferencing NULL rootvp if booting diskless.
 1.36 03-Sep-1999  perseant Make changes that will allow an LFS filesystem to be used as the root
filesystem. In particular,

- Fix mknod deadlock, described in PR 8172.
- Enable lfs_mountroot.
- Make lfs_writevnodes treat filesystems mounted on lfs device nodes properly,
by flushing that device rather than trying to add blocks to the device inode.

This, in combination with lfs boot blocks, will allow operation of an all-lfs
system.
 1.35 17-Jul-1999  wrstuden Adjust mountroot routines to vrele rootvp in case of mount error. Closes
PR 7977 by Neil Carson, <neil@brini.com>.
 1.34 01-Jun-1999  perseant Fixed lfs_update (and related functions) so that calls from lfs_fsync
will DTRT with vnodes marked VDIROP. In particular, the message
"flushing VDIROP" will no longer appear, and the filesystem will remain
stable in the event of a crash.

This was particularly a problem with NFS-exported LFSes, since fsync
was called on every file close.
 1.33 04-May-1999  scottr Include opt_ddb.h so we will get the Debugger() prototype.
 1.32 12-Apr-1999  perseant Check the superblock version field, and refuse to mount the filesystem
if the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and boundaries,
etc., which would not affect existing fields in the superblock, but which
would drastically affect the filesystem, to be smoothly integrated at a
later date.
 1.31 11-Apr-1999  perseant Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).
 1.30 11-Apr-1999  perseant Mark the current segment with SEGUSE_ACTIVE at mount time, rather than waiting
for the first write. If this is not done, the cleaner may try to clean the
current segment out from under the writer if the filesystem is mounted after
a crash (or any other time that the dirty:clean segment ration is high enough).
 1.29 04-Apr-1999  mycroft Fix obvious bugs:
* The MNT_UPDATE case had a null pointer dereference. (This is a good example
of why blindly adding bogus initializiers is a FUNDAMENTALLY BAD IDEA!)
* Make sure the whole ufsmount is zeroed, as the export code relies on this.
* If we decided to use the second/alternate superblock, make sure to copy the
in-core version from the right buffer.
Also, reenable NFS exporting.
 1.28 25-Mar-1999  perseant branches: 1.28.2;
clean up unused/required #ifdefs
 1.27 24-Mar-1999  tron Don't include "opt_uvm.h" any more.
 1.26 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.25 26-Feb-1999  wrstuden Modify vfsops to seperate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.24 11-Sep-1998  pk PR#6032: define fixed sized on-disk superblock structure.
 1.23 01-Sep-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for LFS inodes.
 1.22 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.21 22-Jun-1998  sommerfe defopt for options FIFO
 1.20 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.19 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.18 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time, other metadatas are byteswapped
when used (as they are acceeded directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.17 01-Mar-1998  fvdl Remove accidentally enabled lfs_mountroot from vfsops struct.
 1.16 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.15 18-Feb-1998  thorpej Place a pointer to an array of our vnodeopv_desc *'s in our vfsops
structure, for use by vfs_attach().
 1.14 16-Oct-1997  mjacob In calculating the f_bavail field, don't take 32 bit quantities and
multiply them by 90 (to be divided by 100) and expect them to be sane
for very large values (I was getting a negative 'avail' count).
 1.13 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added an "union inode_ext" to struct inode, for the per-fs extentions.
For now only ext2fs uses it.
- i_din is now an union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the rigth
macro, depending on the time of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.12 22-Dec-1996  cgd Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.11 25-Mar-1996  pk Appease gcc: unused variables if !QUOTA
 1.10 09-Feb-1996  christos lfs prototypes
 1.9 18-Jun-1995  cgd don't assume the f_fsnamelen is nul-truncated or longer than MFSNAMELEN
 1.8 09-Mar-1995  mycroft copy*str() should use size_t.
 1.7 08-Mar-1995  cgd size for copyinstr should be u_long
 1.6 18-Jan-1995  mycroft Clean up the code to frob mnt_stat a bit.
 1.5 18-Jan-1995  mycroft Turn mountlist into a CIRCLEQ, and handle setting and checking of MNT_ROOTFS
differently.
 1.4 15-Dec-1994  mycroft Call foo_statfs() from a common place when mounting.
 1.3 14-Dec-1994  mycroft Sync with CSRG.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.28.2.9 05-May-2000  he Pull up revision 1.50 (requested by perseant):
Sanity check the superblock before trying to use it to find the
alt superblock; sanity check the disk address of the alt superblock
to avoid deadlocking when trying to read it with the primary
superblock buffer still busy. Fixes PR#10001.
 1.28.2.8 29-Mar-2000  he Pull up revision 1.38 (requested by simonb):
Prevent lfs_mountroot() from attempting to use a network device
as root.
(This revision is needed on some NetBSD platforms.)
 1.28.2.7 29-Mar-2000  he Pull up revision 1.37 (requested by pk):
Prevent lfs_mountroot() from attempting to use a network device
as root.
 1.28.2.6 20-Jan-2000  he Pull up revision 1.46 (via patch, requested by perseant):
Files removed (through unlink, rmdir) are now really removed, though the
removal is postponed until the dirop is complete to ensure validity of
the filesystem through a crash. Use a separate per-fs lock, instead of
ufs_hashlock, to protect the inode free list. Change calling semantics
of lfs_ifind, to give better error reporting: If fed a struct buf, it
can report the block number of the offending inode block as well as the
inode number.
 1.28.2.5 15-Jan-2000  he Pull up revision 1.36 (requested by perseant):
Address problems related to using an LFS filesystem as the root
filesystem, including mknod hangs. Fixes PR#8172 and PR#9072.
 1.28.2.4 17-Dec-1999  he Pull up revision 1.41 (requested by perseant):
Address locking protocol error for inode hash, and make the
maximum number of active dirops a global quantity.
 1.28.2.3 17-Dec-1999  he Pull up revision 1.34 (via patch, requested by perseant):
Avoid flushing vnodes involved in a dirop, making lfs' promise
of "no fsck needed, even in the event of a crash" closer to
reality.
 1.28.2.2 19-Oct-1999  he Pull up revision 1.39 (requested by wrstuden):
Catch a few cases missed earlier where we need to lock the vnode before
calling VOP_CLOSE().
 1.28.2.1 13-Apr-1999  perseant branches: 1.28.2.1.2;
Pull-up of changes made to the trunk on Sunday [1.30->1.32], to wit:

Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.

Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).

Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed, then
immediately reloaded, their dinodes were located in an inode block which
was not on disk at the advertized location, nor in the cache (although it
would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.

Better checking for held inode locks in lfs_fastvget, for a number of
error conditions. Also change the default setting of lfs_clean_vnhead to
0, which seems to make the locking problems go away (although this is
difficult to test as I can't reliably reproduce them).

Make sure that the wakeup occurs for vnodes that lfs_update might be
sleeping on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which
have dirty buffers), by marking them with the appropriate flag if
dirtybuffers were added while the write was in progress.

Fix block counting during file truncation, if not truncating to zero.

Disallow threshold-initiated cache flush when dirops are active. Also,
make SET_ENDOP use lfs_check instead of inlining most of it.

Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).

Check the superblock version field, and refuse to mount the filesystem if
the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and
boundaries, etc., which would not affect existing fields in the
superblock, but which would drastically affect the filesystem, to be
smoothly integrated at a later date.
 1.28.2.1.2.3 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.28.2.1.2.2 02-Aug-1999  thorpej Update from trunk.
 1.28.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.38.2.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.38.2.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.39.4.3 15-Nov-1999  fvdl Sync with -current
 1.39.4.2 03-Nov-1999  fvdl Give ufs_ihashget an extra argument: the flags passed to vget() for
locking. This way we can avoid locking against ourselves when
ufs_ihashget is called during the flushing of metadata. XXX

Also, comment out a VOP_FSYNC call that I think is now unneeded, and
put a diagnostic printf there to check if this still happens.
 1.39.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.39.2.5 11-Feb-2001  bouyer Sync with HEAD.
 1.39.2.4 08-Dec-2000  bouyer Sync with HEAD.
 1.39.2.3 22-Nov-2000  bouyer Sync with HEAD.
 1.39.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.39.2.1 20-Oct-1999  thorpej Sync w/ trunk.
 1.41.2.2 06-Nov-1999  perseant Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.41.2.1 06-Nov-1999  perseant file lfs_vfsops.c was added on branch comdex-fall-1999 on 1999-11-06 20:33:07 +0000
 1.52.4.3 03-Feb-2001  he Pull up revision 1.59 (requested by perseant):
o Initialize cleaner info from superblock, making fsck_lfs'
accounting of lfs_nclean work.
 1.52.4.2 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
seperately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.52.4.1 03-Jul-2000  fvdl pullup the fixes from the trunk to not hold ufs_hashlock across
getnewvnode()
 1.64.2.15 15-Jan-2003  thorpej Sync with HEAD.
 1.64.2.14 11-Dec-2002  thorpej Sync with HEAD.
 1.64.2.13 18-Oct-2002  nathanw Catch up to -current.
 1.64.2.12 17-Sep-2002  nathanw Catch up to -current.
 1.64.2.11 01-Aug-2002  nathanw Catch up to -current.
 1.64.2.10 15-Jul-2002  nathanw Whitespace.
 1.64.2.9 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc) : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.64.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.64.2.7 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.64.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.64.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.64.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.64.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.64.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.64.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.65.4.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.65.4.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.65.4.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.65.4.3 16-Mar-2002  jdolecek Catch up with -current.
 1.65.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.65.4.1 03-Aug-2001  lukem update to -current
 1.65.2.4 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.65.2.3 29-Jun-2001  perseant fix comment in light of roll_id
 1.65.2.2 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.65.2.1 27-Jun-2001  perseant Import of what I've been calling "LFSv2", that is, LFS with some features
added that require changes to the on-disk data structures. These include:

- 64-bit time in everything but inodes
- User-specified segment offset, and segment size no longer
restricted to PO2.
- Serial number on segment summaries in addition to timestamp, and
a new volume identifier, to make roll-forward feasible without
fear of finding old data and thinking it was new.

Although I think this version works at least as well as what's on the trunk,
we're not done yet; hence this commit is going in on a branch and not on
the trunk. Enhancements that are not here yet include fragment addressing,
like FFS does, instead of block addressing.
 1.66.2.3 01-Oct-2001  fvdl Catch up with -current.
 1.66.2.2 26-Sep-2001  fvdl * add a VCLONED vnode flag that indicates a vnode representing a cloned
device.
* rename REVOKEALL to REVOKEALIAS, and add a REVOKECLONE flag, to pass
to VOP_REVOKE
* the revoke system call will revoke all aliases, as before, but not the
clones
* vdevgone is called when detaching a device, so make it use REVOKECLONE
to get rid of all clones as well
* clean up all uses of VOP_OPEN wrt. locking.
* add a few VOPS to spec_vnops that need to do something when it's a
clone vnode (access and getattr)
* add a copy of the vnode vattr structure of the original 'master' vnode
to the specinfo of a cloned vnode. could possibly redirect getattr to
the 'master' vnode, but this has issues with revoke
* add a vdev_reassignvp function that disassociates a vnode from its
original device, and reassociates it with the specified dev_t. to be
used by cloning devices only, in case a new minor is allocated.
* change all direct references in drivers to v_devcookie and v_rdev
to vdev_privdata(vp) and vdev_rdev(vp). for diagnostic purposes
when debugging race conditions that still exist wrt. locking and
revoking vnodes.
* make the locking state of a vnode consistent when passed to
d_open and d_close (unlocked). locked would be better, but has
some deadlock issues
 1.66.2.1 18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather then direct device entry calls. Suggested by mycroft@

Light to moderate testing done an i386 system (arch doesn't matter
though, these are MI changes).
 1.68.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.74.2.5 29-Aug-2002  gehenna catch up with -current.
 1.74.2.4 15-Jul-2002  gehenna catch up with -current.
 1.74.2.3 20-Jun-2002  gehenna catch up with -current.
 1.74.2.2 30-May-2002  gehenna Catch up with -current.
 1.74.2.1 16-May-2002  gehenna Use devsw APIs for checking validity of major numbers.
 1.76.2.1 20-Jun-2002  lukem Pull up revision 1.77 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.121.2.13 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.121.2.12 01-Apr-2005  skrll Sync with HEAD.
 1.121.2.11 08-Mar-2005  skrll Sync with HEAD.
 1.121.2.10 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.121.2.9 17-Jan-2005  skrll Sync with HEAD.
 1.121.2.8 27-Oct-2004  skrll Remove the struct lwp * arguments from qsync and ufs_checkpath that are
no longer (read: were never) required.
 1.121.2.7 21-Sep-2004  skrll Fix the sync with head I botched.
 1.121.2.6 18-Sep-2004  skrll Sync with HEAD.
 1.121.2.5 25-Aug-2004  skrll Sync with HEAD.
 1.121.2.4 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.121.2.3 03-Aug-2004  skrll Sync with HEAD
 1.121.2.2 19-Aug-2003  skrll LWPify
 1.121.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.146.2.1 29-May-2004  tron branches: 1.146.2.1.2;
Pull up revision 1.151 (requested by atatat in ticket #393):
Sysctl descriptions under vfs subtree
 1.146.2.1.2.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.162.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.162.2.1 29-Apr-2005  kent sync with -current
 1.167.2.22 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.167.2.21 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.93
sys/ufs/lfs/lfs.h: revision 1.106
sys/ufs/lfs/lfs_vfsops.c: revision 1.209
sys/ufs/lfs/lfs_vnops.c: revision 1.175
sys/ufs/lfs/lfs_segment.c: revision 1.178
Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.
Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.167.2.20 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.92
sys/ufs/lfs/lfs.h: revision 1.105
sys/ufs/lfs/lfs_vfsops.c: revision 1.207
sys/ufs/lfs/lfs_subr.c: revision 1.59
sys/ufs/lfs/lfs_vnops.c: revision 1.173
sys/ufs/lfs/lfs_bio.c: revision 1.92
Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.
Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.167.2.19 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.104
sys/ufs/lfs/lfs_vfsops.c: revision 1.206
sys/ufs/lfs/lfs_vnops.c: revision 1.170
sys/ufs/lfs/lfs_extern.h: revision 1.80
sys/ufs/lfs/lfs_segment.c: revision 1.176
sys/ufs/lfs/lfs_inode.c: revision 1.103 via patch
sys/ufs/lfs/lfs_alloc.c: revision 1.90
Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().
A couple of locking fixes are also included as well.
 1.167.2.18 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.205
Don't roll forward if we aren't given a process context. Coverity CID 1076.
 1.167.2.17 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.204 via patch
Coverity CID 2499: Fix uninitialize variable use.
 1.167.2.16 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.203 via patch
Remove mostly useless BUFPAGES warning message from lfs_{un,}mount.
 1.167.2.15 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.101
sys/ufs/lfs/lfs_vfsops.c: revision 1.202
sys/ufs/lfs/lfs_alloc.c: revision 1.88
Optimize the free list search a little more; in particular use words
instead of bytes for the index, and never search below fs->lfs_freehd.
Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).
Free the bitmap when we unmount the filesystem.
 1.167.2.14 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.201
Correct a locking bug in the recent pager optimization.
 1.167.2.13 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.200
sys/ufs/lfs/lfs_vnops.c: revision 1.164
sys/ufs/lfs/lfs_inode.c: revision 1.101
sys/ufs/lfs/lfs_extern.h: revision 1.78
sys/ufs/lfs/lfs.h: revision 1.100
Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.167.2.12 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.87
sys/ufs/lfs/lfs.h: revision 1.99
sys/ufs/lfs/lfs_vfsops.c: revision 1.199
sys/ufs/lfs/lfs_extern.h: revision 1.77 via patch
Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
 1.167.2.11 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.198
sys/ufs/lfs/lfs_vnops.c: revision 1.161
Handle the "filesystem is clean" flag correctly when upgrading from
read-only to read-write mount. This makes "root on lfs" work for me,
although it looks like a different traceback from PR#32667.
 1.167.2.10 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.196
Double-checkpoint on unmount. This ensures that vnodes belonging to removed
files are really freed, preventing occasional spurious EBUSY returns from
vflush().
 1.167.2.9 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.158
sys/ufs/lfs/lfs_subr.c: revision 1.57
sys/ufs/lfs/lfs_segment.c: revision 1.171
sys/ufs/lfs/lfs.h: revision 1.97
sys/ufs/lfs/lfs_vfsops.c: revision 1.195
sys/ufs/lfs/lfs_extern.h: revision 1.76
Improvements to LFS's paging mechanism, to wit:
* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.
* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.167.2.8 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_segment.c: revision 1.170
sys/ufs/lfs/lfs.h: revision 1.96
sys/ufs/lfs/lfs_vfsops.c: revision 1.194
sys/ufs/lfs/lfs_syscalls.c: revision 1.109
From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space. Now we're not.
 1.167.2.7 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.153
sys/ufs/lfs/lfs_debug.c: revision 1.32
sys/ufs/lfs/lfs_alloc.c: revision 1.84
sys/ufs/lfs/lfs_vfsops.c: revision 1.185
sys/ufs/lfs/lfs_segment.c: revision 1.165
64 bit inode changes.
 1.167.2.6 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.167.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.180
sys/ufs/lfs/lfs_syscalls.c: revision 1.106
sys/ufs/lfs/lfs.h: revision 1.87
Keep track of the number of segments reclaimed, since the cleaner doesn't
do this anymore (it hasn't for quite some time). Add a couple of conditional
debugging messages to indicate why segments are not cleaned, in the event
that lfs_segclean is used.
Make the LFCNSEGWAITALL fcntl work again.
 1.167.2.4 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.179
Fill in the lfs_fsmnt field in the superblock when we mount the filesystem,
so fsck(8) can tell where it was last mounted.
 1.167.2.3 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work que
ue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in the different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed by Chuck Silvers.
 1.167.2.2 18-May-2005  snj Pull up revision 1.178 (requested by perseant in ticket #311):
Don't let the pager_map deadlock avoidance code in lfs_putpages() write
segments containing zero-block FINFO records. These records cause segments
to become uncleanable, which would eventually result in a "no clean segments"
panic.
 1.167.2.1 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.183.2.9 04-Feb-2008  yamt sync with head.
 1.183.2.8 21-Jan-2008  yamt sync with head
 1.183.2.7 07-Dec-2007  yamt sync with head
 1.183.2.6 15-Nov-2007  yamt sync with head.
 1.183.2.5 27-Oct-2007  yamt sync with head.
 1.183.2.4 03-Sep-2007  yamt sync with head.
 1.183.2.3 26-Feb-2007  yamt sync with head.
 1.183.2.2 30-Dec-2006  yamt sync with head.
 1.183.2.1 21-Jun-2006  yamt sync with head.
 1.188.2.2 29-Oct-2005  yamt use lfs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.188.2.1 20-Oct-2005  yamt adapt ufs.
 1.190.2.2 01-Mar-2006  yamt sync with head.
 1.190.2.1 15-Jan-2006  yamt sync with head.
 1.192.4.2 01-Jun-2006  kardel Sync with head.
 1.192.4.1 22-Apr-2006  simonb Sync with head.
 1.192.2.1 09-Sep-2006  rpaulo sync with head
 1.193.6.3 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.193.6.2 31-Mar-2006  tron Merge 2006-03-31 NetBSD-current into the "peter-altq" branch.
 1.193.6.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.193.4.4 11-May-2006  elad sync with head
 1.193.4.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.193.4.2 19-Apr-2006  elad sync with head.
 1.193.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.193.2.6 03-Sep-2006  yamt sync with head.
 1.193.2.5 11-Aug-2006  yamt sync with head
 1.193.2.4 26-Jun-2006  yamt sync with head.
 1.193.2.3 24-May-2006  yamt sync with head.
 1.193.2.2 11-Apr-2006  yamt sync with head
 1.193.2.1 01-Apr-2006  yamt sync with head.
 1.212.2.1 19-Jun-2006  chap Sync with head.
 1.213.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.220.4.2 10-Dec-2006  yamt sync with head.
 1.220.4.1 22-Oct-2006  yamt sync with head
 1.220.2.3 01-Feb-2007  ad Sync with head.
 1.220.2.2 12-Jan-2007  ad Sync with head.
 1.220.2.1 18-Nov-2006  ad Sync with head.
 1.224.4.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.224.2.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.228.2.4 17-May-2007  yamt sync with head.
 1.228.2.3 07-May-2007  yamt sync with head.
 1.228.2.2 24-Mar-2007  yamt sync with head.
 1.228.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.231.4.13 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it fail under stress test.
 1.231.4.12 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and needs to be verified as a whole.
 1.231.4.11 21-Aug-2007  yamt fix some races around pagedaemon and uvm_wait. ok'ed by Andrew Doran.
 1.231.4.10 20-Aug-2007  ad Sync with HEAD.
 1.231.4.9 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.231.4.8 15-Jul-2007  ad Sync with head.
 1.231.4.7 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.231.4.6 08-Jun-2007  ad Sync with head.
 1.231.4.5 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.231.4.4 10-Apr-2007  ad Nuke the deferred kthread creation stuff, as it's no longer needed.
Pointed out by thorpej@.
 1.231.4.3 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.231.4.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.231.4.1 13-Mar-2007  ad Sync with head.
 1.232.2.1 11-Jul-2007  mjf Sync with head.
 1.240.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.244.8.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.244.8.1 31-Jul-2007  pooka file lfs_vfsops.c was added on branch matt-mips64 on 2007-07-31 21:14:21 +0000
 1.244.6.1 14-Oct-2007  yamt sync with head.
 1.244.4.3 23-Mar-2008  matt sync with HEAD
 1.244.4.2 09-Jan-2008  matt sync with HEAD
 1.244.4.1 06-Nov-2007  matt sync with HEAD
 1.244.2.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.244.2.3 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.244.2.2 11-Nov-2007  joerg Sync with HEAD.
 1.244.2.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.246.4.4 18-Feb-2008  mjf Sync with HEAD.
 1.246.4.3 27-Dec-2007  mjf Sync with HEAD.
 1.246.4.2 08-Dec-2007  mjf Sync with HEAD.
 1.246.4.1 19-Nov-2007  mjf Sync with HEAD.
 1.246.2.2 22-Nov-2007  bouyer Sync with HEAD
 1.246.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.249.2.5 26-Dec-2007  ad Sync with head.
 1.249.2.4 19-Dec-2007  ad Use a global lfs_lock.
 1.249.2.3 19-Dec-2007  ad Fix some more problems w/lfs on this branch.
 1.249.2.2 19-Dec-2007  ad Get lfs mostly working.
 1.249.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.250.4.2 02-Jan-2008  bouyer Sync with HEAD
 1.250.4.1 13-Dec-2007  bouyer Sync with HEAD
 1.250.2.1 13-Dec-2007  yamt sync with head.
 1.255.10.8 11-Aug-2010  yamt sync with head.
 1.255.10.7 11-Mar-2010  yamt sync with head
 1.255.10.6 16-Sep-2009  yamt sync with head
 1.255.10.5 19-Aug-2009  yamt sync with head.
 1.255.10.4 18-Jul-2009  yamt sync with head.
 1.255.10.3 16-May-2009  yamt sync with head
 1.255.10.2 04-May-2009  yamt sync with head.
 1.255.10.1 16-May-2008  yamt sync with head.
 1.255.8.2 04-Jun-2008  yamt sync with head
 1.255.8.1 18-May-2008  yamt sync with head.
 1.255.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.255.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.255.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.260.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.260.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.265.2.1 03-Jul-2008  simonb Sync with head.
 1.267.6.2 25-Apr-2014  sborrill Pull up the following revisions(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.267.6.1 04-Apr-2009  snj branches: 1.267.6.1.4; 1.267.6.1.6; 1.267.6.1.10;
Pull up following revision(s) (requested by ad in ticket #662):
sys/ufs/lfs/lfs_vfsops.c: revision 1.272
Turn up the volume on the warning message a bit.
 1.267.6.1.10.1 28-Apr-2014  sborrill Pull up the following revisions(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.267.6.1.6.1 28-Apr-2014  sborrill Pull up the following revisions(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.267.6.1.4.1 09-Feb-2012  matt Change to use the updated uvm_pageout_* signature.
 1.267.4.3 28-Apr-2009  skrll Sync with HEAD.
 1.267.4.2 03-Mar-2009  skrll Sync with HEAD.
 1.267.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.267.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.269.4.2 23-Jul-2009  jym Sync with HEAD.
 1.269.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.282.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.282.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.286.2.4 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.286.2.3 21-Apr-2011  rmind sync with head
 1.286.2.2 03-Jul-2010  rmind sync with head
 1.286.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.287.4.1 09-Feb-2011  bouyer Various build fixes
 1.287.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.288.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.290.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.290.2.4 23-Jan-2013  yamt sync with head
 1.290.2.3 23-May-2012  yamt sync with head.
 1.290.2.2 17-Apr-2012  yamt sync with head
 1.290.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.291.4.3 02-Jun-2012  mrg sync to latest -current.
 1.291.4.2 05-Apr-2012  mrg sync to latest -current.
 1.291.4.1 18-Feb-2012  mrg merge to -current.
 1.293.2.2 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.293.2.1 17-Mar-2012  bouyer branches: 1.293.2.1.4; 1.293.2.1.6;
Pull up following revision(s) (requested by perseant in ticket #116):
sys/ufs/lfs/lfs_alloc.c: revision 1.112
tests/fs/vfs/t_rmdirrace.c: revision 1.9
tests/fs/vfs/t_renamerace.c: revision 1.25
sys/ufs/lfs/lfs_vnops.c: revision 1.240
sys/ufs/lfs/lfs_segment.c: revision 1.224
sys/ufs/lfs/lfs_bio.c: revision 1.122
sys/ufs/lfs/lfs_vfsops.c: revision 1.294
sbin/newfs_lfs/make_lfs.c: revision 1.19
sys/ufs/lfs/lfs.h: revision 1.136
Pass t_renamerace and t_rmdirrace tests.
Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.
Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.
Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace crates a lot of inode blocks on a tiny empty disk).
 1.293.2.1.6.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.293.2.1.4.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.296.2.4 03-Dec-2017  jdolecek update from HEAD
 1.296.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.296.2.2 23-Jun-2013  tls resync from head
 1.296.2.1 25-Feb-2013  tls resync with head
 1.307.2.2 18-May-2014  rmind sync with head
 1.307.2.1 28-Aug-2013  rmind sync with head
 1.320.2.1 10-Aug-2014  tls Rebase.
 1.321.4.6 28-Aug-2017  skrll Sync with HEAD
 1.321.4.5 09-Jul-2016  skrll Sync with HEAD
 1.321.4.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.321.4.3 22-Sep-2015  skrll Sync with HEAD
 1.321.4.2 06-Jun-2015  skrll Sync with HEAD
 1.321.4.1 06-Apr-2015  skrll Sync with HEAD
 1.351.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.351.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.351.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.351.2.1 20-Jul-2016  pgoyette Adapt machine-independant code to the new {b,c}devsw reference-counting
(using localcount(9)). All callers of {b,c}devsw_lookup() now call
{b,c}devsw_lookup_acquire() which retains a reference on the 'struct
{b,c}devsw'. This reference must be released by the caller once it is
finished with the structure's content (or other data that would disappear
if the 'struct {b,c}devsw' were to disappear).
 1.359.4.2 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3)
format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of
kernhist(9)"
but it is possible that I've missed some of them. I would be glad
to
update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value
appropriately.
--
Misssed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.359.4.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.359.2.1 27-Apr-2017  pgoyette Restore all work from the former pgoyette-localcount branch (which is
now abandoned doe to cvs merge botch).

The branch now builds, and installs via anita. There are still some
problems (cgd is non-functional and all atf tests time-out) but they
will get resolved soon.
 1.361.2.3 18-Jan-2019  pgoyette Synch with HEAD
 1.361.2.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.361.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.362.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.362.2.1 10-Jun-2019  christos Sync with HEAD
 1.365.2.2 07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1934):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383 (patch)
sys/ufs/ffs/ffs_vfsops.c: revision 1.384 (patch)

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusive, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.365.2.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.367.2.3 29-Feb-2020  ad Sync with head.
 1.367.2.2 19-Jan-2020  ad Set IMNT_SHRLOOKUP and use it for the in-cache case. Need to check what
more can be done with tmpfs though, it can probably do the whole lookup.
 1.367.2.1 17-Jan-2020  ad Sync with head.
 1.380.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.382.10.1 02-Aug-2025  perseant Sync with HEAD
 1.382.4.1 07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1037):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_vfsops.c: revision 1.384

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusive, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.345 20-Oct-2025  perseant * Generalize the partial-segment parser introduced for roll-forward,
using it to facilitate an in-kernel segment rewriter (cleaner), and a
mechanism to check whether a segment is in fact empty (only used
with DEBUG).

* Add these new fcntl calls:
- LFCNFILESTATS: For each inode given, report its number of direct
blocks, how many gaps (discontinuities) there are between direct
blocks, and how large the total gap distance is. This will be
useful for a coalescing agent.
- LFCNREWRITEFILE: For each inode given, rewrite its direct blocks,
effectively coalescing it into as compact a form as possible.
- LFCNSCRAMBLE: As above, except that it only rewrites every other
block. This causes the file to have many gaps that can be
measured with LFCNFILESTATS and addressed with LFCNREWRITEFILE,
for testing purposes.
- LFCNREWRITESEGS: Rewrite any live data in the given segments.
This is intended to simplify the cleaner API and facilitate an
in-kernel cleaner.
- LFCNCLEANERINFO: Get the most current CLEANERINFO data from the
kernel.
- LFCNSEGUSE: Retrieve segment usage data from the kernel.

* Vnodes marked IN_CLEANING now take a reference. Add a new "cleaner
lock", which must be taken by the cleaner before the segment lock,
and before marking nodes IN_CLEANING. This allows us to flush
vnodes, if necessary, before the cleaning segment is written, and
never to flush vnodes being cleaned. When the cleaner lock is
released, the vnodes are cleared of IN_CLEANING and the reference
dropped.

* Track a potential infinite loop in lfs_gatherblock.

* Pull "needs to flush" and "needs to wait for flush" into functions
instead of inlining their definitions.
 1.344 01-Oct-2025  perseant Align case labels with 8-character tab stops. No functional change.
 1.343 17-Sep-2025  perseant Add working in-kernel roll forward.
 1.342 06-Sep-2025  perseant Lock the vnode before calling lfs_set_dirop, to meet the conditions of
the assertion. Fixes a regression introduced in rev 1.341.
 1.341 05-Sep-2025  perseant Protect the changed link count of the linked vnode with {,UN}MARK_DIROP
in lfs_link(). Necessary for roll-forward.
 1.340 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted, for example).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), uses the a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.339 18-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.
 1.338 18-Jul-2021  dholland Use macros for the canned parts of device and fifo vnode op tables.

Add GENFS_SPECOP_ENTRIES and GENFS_FIFOOP_ENTRIES macros that contain
the portion of the vnode ops table declaration that is
(conservatively) the same in every fs. Use these in every fs that
supports devices and/or fifos with separate ops tables.

Note that ptyfs works differently (it has one type of vnode with
open-coded dispatch to the specfs code, which I haven't changed in
this commit) and rump/librump/rumpvfs/rumpfs.c has an indirect dynamic
dispatch that already does more or less the same thing, which I also
haven't changed.

Also note that this anticipates a few bits in the next changeset here
and there, and adds missing but unreachable calls in some cases (e.g.
most fses weren't defining whiteout on devices and fifos, but it isn't
reachable there), and it changes parsepath on devices and fifos to
genfs_badop from genfs_parsepath (but it's not reachable there
either).

It appears that devices in kernfs were missing kqfilter, so it's
possible that if you try to use kqueue on /kern/rootdev that it'll
explode.

And finally note that the ops declaration tables aren't
order-dependent. (Other than vop_default_desc has to come first.)
Otherwise this wouldn't work.
 1.337 29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
 1.336 05-Sep-2020  riastradh branches: 1.336.6;
Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.335 05-Sep-2020  riastradh Revert "ufs: Prevent mkdir from choking on deleted directories."

This change made no sense and should not have been committed.
 1.334 05-Sep-2020  riastradh ufs: Prevent mkdir from choking on deleted directories.

Fix some missing uvm_vnp_setsize in screw cases while here.
 1.333 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.332 13-Apr-2020  ad Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.
 1.331 23-Feb-2020  ad branches: 1.331.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.330 23-Feb-2020  riastradh Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.
 1.329 23-Feb-2020  riastradh Take a reference and fix assertions in lfs_flush_dirops.

Fixes panic:

KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.
 1.328 23-Feb-2020  riastradh Change some cheap KDASSERT into KASSERT.
 1.327 23-Feb-2020  riastradh Assert lfs_writer where I think we can now prove it.
 1.326 23-Feb-2020  riastradh Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.
 1.325 18-Sep-2019  christos branches: 1.325.2;
Add newly created vnodes to the namei cache. The rest of the filesystems
already did that (or they don't support writing). Discussed in tech-kern.
 1.324 20-Jun-2019  christos branches: 1.324.2;
unifdef -DLFS_READWRITE ulfs_readwrite.c
 1.323 01-Jan-2019  hannken Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
 1.322 11-Aug-2018  zafer In lfs_mkdir fix wrong return path in case of EMLINK which causes a panic. Also, check earlier before setting up dirop.
 1.321 20-Aug-2017  maya branches: 1.321.2; 1.321.4;
Fix typo in comment
 1.320 19-Aug-2017  maya Not much point doing anything after a panic call
 1.319 19-Aug-2017  maya Consistently use {,UN}MARK_VNODE macros rather than function calls.
 1.318 26-Jul-2017  maya change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar

XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
 1.317 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.316 05-Jun-2017  maya Correct confusion between i_flag and i_flags
These will have to be renamed.

Spotted by Riastradh, thanks!
 1.315 26-May-2017  riastradh branches: 1.315.2;
Make VOP_RECLAIM do the last unlock of the vnode.

VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.

We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
 1.314 26-Apr-2017  riastradh Change VOP_REMOVE and VOP_RMDIR to preserve lock/ref on dvp.

No change to vp -- the plan is to replace the node by the
componentname in the vop parameters, and let all directory vops do
lookups internally.

Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2017/04/17/msg021825.html
 1.313 11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.312 11-Apr-2017  riastradh Fix non-DIAGNOSTIC build by using vp outside KASSERT too.
 1.311 11-Apr-2017  riastradh Sprinkle lock ownership assertions.
 1.310 01-Apr-2017  maya Switch lfs_writer_daemon to use condvar instead of mtsleep.
track thread existence with struct lwp instead of pid + lid,
it's more useful from ddb.
 1.309 01-Apr-2017  maya switch lfs_dirops to condvar (from mtsleep)
 1.308 01-Apr-2017  maya switch lfs_sleepers to condvar (from mtsleep)
 1.307 30-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.306 16-Mar-2017  maya actually cast to unsigned long long and use %llu. certainly not use hex (oops)
suggested by dh
 1.305 15-Mar-2017  maya print inode number in an assert I keep hitting and the adjacent one.
use PRIx64 for printing inode number elsewhere.
 1.304 13-Jul-2016  maya branches: 1.304.2; 1.304.4;
Fix a deadlock

ok dholland@
 1.303 20-Jun-2016  dholland In lfs_mknod, don't release dvp until done with it. This was exposed a
while back when I removed a sketchy preprocessor macro scheme, but I'd
left it the way it was at the time and marked it for later. Now I
guess it's later.

Also don't randomly use both dvp and ap->a_dvp; they're the same, so
pick one and stick to it.
 1.302 20-Jun-2016  dholland One more batch of already-synced ufs changes:

ufs_extern.h 1.79 is equivalent to ulfs_extern.h 1.14
ufsmount.h 1.43 is (roughly) equivalent to lfs_extern.h 1.102
ufs_inode.c 1.94 does not apply to lfs
ufs_inode.c 1.95 does not apply to lfs either
ufs_readwrite.c 1.108 is equivalent to ulfs_readwrite.c 1.8
ufs_readwrite.c 1.109 is equivalent to ulfs_readwrite.c 1.9
ufs_readwrite.c 1.110 is equivalent to ulfs_readwrite.c 1.10
ufs_readwrite.c 1.111 does not apply to lfs
ufs_readwrite.c 1.112 is equivalent to ulfs_readwrite.c 1.11
ufs_readwrite.c 1.113 is equivalent to ulfs_readwrite.c 1.13
ufs_readwrite.c 1.114 is equivalent to ulfs_readwrite.c 1.14
ufs_readwrite.c 1.115 is equivalent to ulfs_readwrite.c 1.15
ufs_readwrite.c 1.116-1.118 does not apply to lfs
ufs_readwrite.c 1.119-1.120 are equivalent to ulfs_readwrite.c 1.16
ufs_rename.c 1.12 is equivalent to lfs_rename.c 1.8
ufs_vnops.c 1.226 is equivalent to ulfs_vnops.c 1.22 and lfs_vnops.c 1.270
ufs_vnops.c 1.227 is equivalent to ulfs_vnops.c 1.23
ufs_vnops.c 1.228-1.229 are equivalent to ulfs_vnops.c 1.24
ufs_vnops.c 1.230 is equivalent to ulfs_vnops.c 1.25 and lfs_vnops.c 1.271
ufs_vnops.c 1.231 originated in lfs
ufs_vnops.c 1.232 does not apply to lfs
 1.301 20-Jun-2016  dholland With the previous we seem to have the changes from -r1.225 of ufs_vnops.c.
(as that was stuff from moving ffs to the new vcache and lfs has also been
moved, this is not surprising)
 1.300 20-Jun-2016  dholland ulfs_makeinode -> lfs_makeinode
 1.299 20-Jun-2016  dholland Merge (effectively) -r1.78 of ufs_extern.h: shift ulfs_makeinode to
lfs_vnops.c and make it file-static there, as that's the only place
it's used.
 1.298 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.297 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.296 19-Jun-2016  dholland we already have ufs_lookup.c 1.125 and ufs_vnops.c 1.218.
 1.295 19-Jun-2016  dholland missed one
(probably this should be tracked in some way other than pasting rcsid
comments, but it's what we've got)
 1.294 19-Jun-2016  dholland Merge -r1.216 of ufs_vnops.c: comments about maxsymlinklen handling
 1.293 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
 1.292 21-Sep-2015  dholland Oops; LFS_DIRECTSIZ() is going to need the fs as an argument.

Also, it turns out that dirhash needs a compile-time-constant version
of LFS_DIRECTSIZ(LFS_MAXNAMLEN+1), independent of 64-vs-32, so create
LFS_MAXDIRENTRYSIZE for this. Sigh.
 1.291 20-Sep-2015  dholland Clean up struct lfs_dirtemplate.
 1.290 15-Sep-2015  dholland Kill off ulfs_makedirentry; just pass the data to ulfs_direnter instead.
For now, move one copy of the code that allocates and fills in a
temporary struct lfs_direct to the top of ulfs_direnter; but it should
go away shortly.
 1.289 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.288 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.287 19-Aug-2015  dholland Part two of dinodes; use the same union everywhere.
(previously the ufs-derived code had things set up slightly different)

Remove a bunch of associated mess.
 1.286 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.285 12-Aug-2015  dholland Make 32-bit and 64-bit versions of SEGSUM.
Also fix some of the FINFO handling as it's closely entangled.
 1.284 12-Aug-2015  dholland Make 32-bit and 64-bit versions of CLEANERINFO.

XXX: while this is written to disk, it seems like much of it would
XXX: be better set up as a commpage shared with the cleaner.
 1.283 12-Aug-2015  dholland Widen several of the fields of BLOCK_INFO to 64 bits.

Keep the old BLOCK_INFO as BLOCK_INFO_70, and version the fcntls that
use it.

Note that BLOCK_INFO_70 has 64-bit padding issues so that it's
different on 32-bit and 64-bit machines. This has been fixed. However,
BLOCK_INFO also contains a pointer, so compat32 stuff for 32-on-64 is
still needed and doesn't currently exist.
 1.282 12-Aug-2015  dholland Move the security checks for lfs_bmapv/lfs_markv into those functions.
(instead of the system call entry points)

Avoids duplication.

While touching these, pass the lwp around instead of the proc -- the
latter was there for no other reason than because once upon a time
struct proc was the first argument of all syscalls.

(For that matter, why not just use curlwp instead of passing it around
all over the place? The cost of passing it to every syscall probably
exceeds the cost of loading it from curcpu, even on machines where
it's not just kept in a register all the time.)
 1.281 03-Aug-2015  dholland Simplify some leftover code and remove some old assertions.

Last year when I killed off some evil dirop-related macros, I added
these assertions because if the things they asserted weren't true we'd
be leaking vnodes. Well, it seems that the code at the time did leak
vnodes, so certain failure cases (e.g. mkdir with disk full) would
assert. Nobody apparently tripped on this in the past fourteen months,
until I broke balloc so it always failed (unrelatedly) while working
on some LFS64 changes.

However, the vnode leak has since been removed by hannken@ as part of
the vnode cache changes, so the assertions are now superfluous;
instead, just make sure *vpp gets nulled on failure, and don't worry
about whether or not VU_DIROP is set as it shouldn't matter any more.

XXX: there's still a lot of gratuitous pointer aliasing in here that
should be tidied away.
 1.280 02-Aug-2015  dholland lfs_cleanint[] in the in-memory superblock needs to have 64-bit entries.
 1.279 02-Aug-2015  dholland Make i_eff_nblks in the in-memory inode 64 bits wide.
 1.278 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.277 26-Jul-2015  hannken lfs_flush_pchain: replace vget() with vcache_get().
 1.276 25-Jul-2015  martin Use accessors in DEBUG and DIAGNOSTIC code as well
 1.275 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.274 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.273 07-Jun-2015  hannken Fix copy and paste errors from last commits.
- Kernel i386/ALL and amd64/ALL compile again.
- Resolves CID 1304138 (DEADCODE) and 1304139 (IDENTICAL_BRANCHES).
 1.272 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.271 20-Apr-2015  riastradh Make VOP_LINK return directory still locked and referenced.

Ride 7.99.10 bump.
 1.270 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.269 25-Jul-2014  dholland branches: 1.269.2; 1.269.4;
Add VOP_FALLOCATE and VOP_FDISCARD to every vnode ops table I can
find.

The filesystem ones all call genfs_eopnotsupp - right now I am only
implementing the plumbing and we can implement fallocate and/or
fdiscard for files later.

The device ones call spec_fallocate (which is also genfs_eopnotsupp)
and spec_fdiscard, which dispatches to the device-level op.

The fifo ones all call vn_fifo_bypass, which also ends up being
EOPNOTSUPP.
 1.268 17-May-2014  dholland Merge ulfs_create into lfs_create.
 1.267 17-May-2014  dholland Merge ulfs_mkdir into lfs_mkdir.
 1.266 17-May-2014  dholland Merge ulfs_symlink into lfs_symlink.
 1.265 17-May-2014  dholland Move the ulfs-level (copy of ufs) vnops for symlink, create, and mkdir
into lfs_vnops.c preparatory to folding them into the lfs entry points.

(lfs_vnops.c now has four licenses. sigh.)
 1.264 17-May-2014  dholland Remove the DIROP macros. They are evil, especially the CREATE ones.

This results in some duplicate logic in the creation vnops (symlink,
mknod, create, mkdir) but we will probably be able to factor it out in
a more sensible way later.

Now the creation vnops call getnewvnode explicitly instead of under
multiple layers of obscure gunk. Then we explicitly do lfs_set_dirop,
and afterwards lfs_unset_dirop.
 1.263 16-May-2014  dholland Move lfs_getpages and lfs_putpages to their own file.
 1.262 24-Mar-2014  hannken branches: 1.262.2;
- Make VI_XLOCK, VI_CLEAN and VI_LOCKSHARE private to kern/vfs_*.c.
- Make vwait() static.
- Add vdead_check() to check a vnode for being or becoming dead.

Discussed on tech-kern.

Welcome to 6.99.38
 1.261 23-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30
 1.260 17-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29
 1.259 18-Oct-2013  christos use __USE() in the right place, instead of (void)var.
 1.258 17-Oct-2013  christos - remove unused variables
- add debug ifdefs for debugging variables
- __USE() where appropriate.
 1.257 29-Jul-2013  dholland Fix build both with and without options LFS_EI.
 1.256 29-Jul-2013  dholland Revert previous; it is wrong.
 1.255 28-Jul-2013  pgoyette Remove unused variable to fix the build.
 1.254 28-Jul-2013  dholland Merge the extattr VOPs from ffs.
As these do nothing besides dispatch to ulfs_extattr.c it wasn't
exactly hard.

This might just make extended attributes work on lfs...
 1.253 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.252 28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.251 21-Jul-2013  dholland Merge logic from ulfs_close(), ulfs_getattr(), and ulfs_strategy()
into the preexisting lfs_*() versions of these functions, and delete
the unused ulfs copies.
 1.250 20-Jul-2013  dholland Merge ulfs_mknod into lfs_mknod, which was missing some bits.
 1.249 20-Jul-2013  dholland Collect the pieces of lfs rename into lfs_rename.c, and sprinkle static.
 1.248 18-Jun-2013  christos branches: 1.248.2; 1.248.4;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.247 08-Jun-2013  dholland ulfs_dir.h has been emptied; remove it.
 1.246 08-Jun-2013  dholland Stick LFS_ in front of IFMT, IFIFO, IFREG, etc. so as not to conflict
with the UFS copies of these symbols. (Which themselves ought to have
UFS_ stuck on.)
 1.245 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.244 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.243 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.242 09-May-2012  riastradh branches: 1.242.2;
Adapt ffs, lfs, and ext2fs to use genfs_rename.

ok dholland, rmind
 1.241 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.240 16-Feb-2012  perseant Pass t_renamerace and t_rmdirrace tests.

Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.

Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.

Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace crates a lot of inode blocks on a tiny empty disk).
 1.239 02-Jan-2012  perseant branches: 1.239.2;

* Remove PGO_RECLAIM during lfs_putpages()' call to genfs_putpages(),
to avoid a live lock in the latter when reclaiming a vnode with
dirty pages.

* Add a new segment flag, SEGM_RECLAIM, to note when a segment is
being written for vnode reclamation, and record which inode is being
reclaimed, to aid in forensic debugging.

* Add a new segment flag, SEGM_SINGLE, so that opportunistic writes
can write a single segment's worth of blocks and then stop, rather
than writing all the way up to the cleaner's reserved number of
segments.

* Add assert statements to check mutex ownership is the way it ought
to be, mostly in lfs_putpages; fix problems uncovered by this.

* Don't clear VU_DIROP until the inode actually makes its way to disk,
avoiding a problem where dirop inodes could become separated
(uncovered by a modified version of the "ckckp" forensic regression
test).

* Move the vfs_getopsbyname() call into lfs_writerd. Prepare code to
make lfs_writerd notice when there are no more LFSs, and exit losing
the reference, so that, in theory, the module can be unloaded. This
code is not enabled, since it causes a crash on exit.

* Set IN_MODIFIED on inodes flushed by lfs_flush_dirops. Really we
only need to set IN_MODIFIED if we are going to write them again
(e.g., to write pages); need to think about this more.

Finally, several changes to help avoid "no clean segments" panics:

* In lfs_bmapv, note when a vnode is loaded only to discover whether
its blocks are live, so it can immediately be recycled. Since the
cleaner will try to choose ~empty segments over full ones, this
prevents the cleaner from (1) filling the vnode cache with junk, and
(2) squeezing any unwritten writes to disk and running the fs out of
segments.

* Overestimate by half the amount of metadata that will be required
to fill the clean segments. This will make the disk appear smaller,
but should help avoid a "no clean segments" panic.

* Rearrange lfs_writerd. In particular, lfs_writerd now pays
attention to the number of clean segments available, and holds off
writing until there is room.
 1.238 20-Sep-2011  chs branches: 1.238.2; 1.238.6;
strengthen the assertions about pages existing during block allocation,
which were incorrectly relaxed last year. add some comments so that
the intent of these is hopefully clearer.

in ufs_balloc_range(), don't free pages or mark them dirty if
allocating their backing store failed. this fixes PR 45369.
 1.237 12-Jul-2011  dholland Pass the ufs_lookup_results pointer around instead of fetching it from
the inode in the guts of ufs. Now, in VOPs where i_crap is used it is
used (directly) only immediately on entry to the VOP call and then
passed around by reference.

Except for rename, which needs explicit sorting out. The code in
ufs_wapbl_rename is unchanged in behavior but I'm increasingly
inclined to think it's wrong.
 1.236 11-Jul-2011  hannken Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Adresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.235 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.234 05-Jan-2011  martin branches: 1.234.6;
Avoid NULL deref inside a KASSERT, as discussed on tech-kern.
 1.233 02-Jan-2011  dholland Remove the special refcount behavior (adding an extra reference to the
parent dir) associated with SAVESTART in relookup().

Check all call sites to make sure that SAVESTART wasn't set while
calling relookup(); if it was, adjust the refcount behavior. Remove
related references to SAVESTART.

The only code that was reaching the extra ref was msdosfs_rename,
where the refcount behavior was already fairly broken and/or gross;
repair it.

Add a dummy 4th argument to relookup to make sure code that hasn't
been inspected won't compile. (This will go away next time the
relookup semantics change, which they will.)
 1.232 18-Dec-2010  eeh Byebye deadlock.
 1.231 04-Aug-2010  hannken Free the on disk inodes in the reclaim routine.
 1.230 29-Jul-2010  hannken Add vm page flag PG_MARKER and use it to tag dummy marker pages
in genfs_do_putpages() and uao_put().
Use 'v_uobj.uo_npages' to check for an empty memq.
Put some assertions where these marker pages may not appear.

Ok: YAMAMOTO Takashi <yamt@netbsd.org>
 1.229 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.228 24-Jun-2010  hannken Clean up vnode lock operations:

- VOP_LOCK(vp, flags): Limit the set of allowed flags to LK_EXCLUSIVE,
LK_SHARED and LK_NOWAIT. LK_INTERLOCK is no longer allowed as it
makes no sense here.

- VOP_ISLOCKED(vp): Remove the for some time unused return value
LK_EXCLOTHER. Mark this operation as "diagnostic only".
Making a lock decision based on this operation is no longer allowed.

Discussed on tech-kern.
 1.227 29-Mar-2010  pooka Stop exposing fifofs internals and leave only fifo_vnodeop_p visible.
 1.226 07-Dec-2009  eeh branches: 1.226.2; 1.226.4;
Fix some more hangs and deadlocks.
 1.225 17-Nov-2009  eeh This should fix a deadlock.
 1.224 05-Nov-2009  pooka Include compat code by default.
 1.223 30-Oct-2009  christos compile without COMPAT_50
 1.222 29-Oct-2009  christos PR/42246: NAKAJIMA Yoshihiro: provide COMPAT_50 for LFS
 1.221 07-May-2009  elad Replace KAUTH_GENERIC_ISSUSER with a better alternative.
 1.220 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.219 16-Jan-2009  yamt branches: 1.219.2;
one more change which i forgot to commit with
UVM_PAGE_HASH_PENALTY -> UVM_PAGE_TREE_PENALTY rename.
noticed by Andreas Wrede.
 1.218 24-Jun-2008  gmcgarry branches: 1.218.4; 1.218.6; 1.218.12;
fcntl(4) says the command is type int. lfs_fcntl() comment says u_long. The implementation says int. Synchronise comment with documentation and cast to int before comparison.
 1.217 04-Jun-2008  ad branches: 1.217.2;
vm_page: put TAILQ_ENTRY into a union with LIST_ENTRY, so we can use both.
 1.216 28-Apr-2008  martin branches: 1.216.2;
Remove clause 3 and 4 from TNF licenses
 1.215 25-Jan-2008  ad branches: 1.215.6; 1.215.8; 1.215.10;
Remove VOP_LEASE. Discussed on tech-kern.
 1.214 02-Jan-2008  ad Merge vmlocking2 to head.
 1.213 26-Nov-2007  pooka branches: 1.213.2; 1.213.6;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.212 10-Oct-2007  ad branches: 1.212.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.211 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.210 29-Jul-2007  ad branches: 1.210.4; 1.210.6; 1.210.8; 1.210.10;
It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.209 27-Jul-2007  pooka Change unused fflags parameter in VOP_MMAP to prot and pass in
desired vm protection.
 1.208 10-Jul-2007  perseant branches: 1.208.2;
Move the "vp = NULL" assignment after the code that requires vp != NULL.
Reported by Chris Ross on current-users.
 1.207 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.206 24-Apr-2007  perseant Get rid of our own private copy of genfs_putpages, having adapted the real
genfs_putpages to suit our purposes.
 1.205 17-Apr-2007  perseant Fix another locking protocol error in lfs_fsync().
 1.204 17-Apr-2007  perseant Fix MP locking protocol violations introduced in my previous commit.
 1.203 17-Apr-2007  perseant Install a new sysctl, vfs.lfs.ignore_lazy_sync, which causes LFS to ignore
the "smooth" syncer, as if vfs.sync.*delay = 0, but only for LFS. The
default is "on", i.e., ignore lazy sync.

Reduce the amount of polling/busy-waiting done by lfs_putpages(). To
accomplish this, copied genfs_putpages() and modified it to indicate which
page it was that caused it to return with EDEADLK. fsync()/fdatasync()
should no longer ever fail with EAGAIN, and should not consume huge
quantities of cpu.

Also, try to make dirops less likely to be written as the result of a
VOP_PUTPAGES(), while ensuring that they are written regularly.
 1.202 05-Apr-2007  perseant correct comment for lfs_putpages
 1.201 04-Mar-2007  christos branches: 1.201.2; 1.201.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.200 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.199 20-Feb-2007  ad Call genfs_node_destroy() where appropriate.
 1.198 09-Feb-2007  ad branches: 1.198.2;
Merge newlock2 to head.
 1.197 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.196 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.195 03-Jan-2007  perseant Change VONWORKLST handling to better match its other uses; in particular,
check memq and clear VWRITEMAPDIRTY at the same time.
 1.194 09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.193 16-Nov-2006  christos branches: 1.193.2;
__unused removal on arguments; approved by core.
 1.192 20-Oct-2006  reinoud Replace the LIST structure mp->mnt_vnodelist to a TAILQ structure since all
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earlierst last.

An extra benefit is the removal of the ugly hack from the Berkly days on
LFS.

In the proces, i've also replaced the various variations hand written loops
by the TAILQ_FOREACH() macro's.
 1.191 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.190 28-Sep-2006  perseant Use lockstatus instead of a homebrewed locking system to control
LFCNWRAPSTOP and LFCNWRAPGO.

Be less verbose about the various looping checks: use log() rather than
printf(), and only log anything if we are really looping ("count = 2" is
not an error condition).

Allow dirops sleeping on available space to be interruptible.
 1.189 15-Sep-2006  perseant branches: 1.189.2;
Don't remark a locked inode with IN_MODIFIED after writing it to disk,
if we ourselves hold the lock. This prevents e.g. mknod from hanging
indefinitely.

Also, always use the return value from VOP_ISLOCKED to determine whether
we hold the lock or someone else does, rather than looking into the lock
structure ourselves.
 1.188 01-Sep-2006  perseant branches: 1.188.2;
Changes to help the roll-forward agent, to wit:

* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
 1.187 06-Aug-2006  martin Fix size confusion with lfs_fhandle - and as it now turns out to be the same
as the lfs compat_30_fhandle, g/c the latter.
Add an alias for the LFCNIFILEFH fcntl, so that binaries compiled in the
meantime (with too large lfs_fhandle) continue to work.

This makes vfs_cleanerd work again after the kernel checks filehandle size
more strictly (problem reported by Kurt Schreiner on current-users).
 1.186 31-Jul-2006  martin Make filehandles opaque to userland
 1.185 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.184 20-Jul-2006  perseant Move the kauth checks up front, so that all new LFS fcntl calls are subject
to the check for superuser privilege.
 1.183 13-Jul-2006  martin Apply _KERNEL_OPT
 1.182 13-Jul-2006  martin Version the lfs_cleanerd internal fcntl() for filehandles too,
so old cleaners should work with newer kernels.
 1.181 13-Jul-2006  martin Fix alignement problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.180 29-Jun-2006  perseant Don't wake up the cleaner if the filesystem is unwrappable, and fix the
compatibility fcntls.

Also includes one-line fixes for an MP locking bug and a zero-length FINFO
problem that manifested during testing.
 1.179 24-Jun-2006  perseant Change LFCNWRAP{STOP,GO} to make them more suitable for snapshotting; in
particular, the caller can now choose whether to wait for the condition
to be met, and if the caller of LFCNWRAPSTOP dies or otherwise closes
the descriptor, the filesystem is started again. Updated the ckckp
regression test to use the new semantics.

dump_lfs(8) now uses the fcntls to implement LFS-style snapshotting through
the -X flag, addressing PR#33457 albeit not using fss(4). Fixed a couple
other problems with dump_lfs that manifested themselves during testing.
 1.178 18-May-2006  perseant branches: 1.178.4;
Break out the finfo array manipulation code into two new functions,
lfs_acquire_finfo() and lfs_release_finfo(). Add a debugging check
for zero-length finfo arrays in the segment summary to avoid future
regressions.
 1.177 17-May-2006  perseant Don't be quite so eager to error out from lfs_putpages() when pages are
busy; if we've sensed a possible 3-way deadlock and are not the pagedaemon,
relock and try again.
 1.176 14-May-2006  elad integrate kauth.
 1.175 12-May-2006  perseant Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.

Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.174 04-May-2006  perseant Change VOP_FCNTL to take an unlocked vnode. Approved by wrstuden@.
 1.173 04-May-2006  perseant Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.

Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.172 02-May-2006  perseant Fix a "locking against myself": lfs_flush_dirops() doesn't need to lock the
vnodes to write their blocks, since it holds the segment lock.
 1.171 01-May-2006  perseant Don't ever partially write dirops, even if we need the cleaner to run.
This increases the chances of the "no clean segments" panic slightly,
but allows us to run the ckckp regression test successfully to completion.
 1.170 30-Apr-2006  perseant Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are also included as well.
 1.169 18-Apr-2006  perseant Yet another MP locking issue.
 1.168 17-Apr-2006  perseant Introduce two fcntl calls that freeze the filesystem right at the point
where segment 0 is being considered for writing. This allows for automated
checkpoint vailidity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.

Include a regression test that does such scanning.

When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small). Discovered by using the aforementioned
regression test.
 1.167 13-Apr-2006  perseant Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.

Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.166 11-Apr-2006  perseant Another MP locking fix.
 1.165 10-Apr-2006  perseant Don't leak vnode references if we fail to lock a vnode in lfs_flush_pchain().
Also fix another (probably only academic) simple_lock protocol error.
 1.164 08-Apr-2006  perseant Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.163 07-Apr-2006  perseant Several minor bug fixes:

* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.162 01-Apr-2006  perseant Make sure we unlock to zero when avoiding 3-way deadlock; otherwise we
simply have a different form of deadlock.
 1.161 31-Mar-2006  perseant Handle the "filesystem is clean" flag correctly when upgrading from
read-only to read-write mount. This makes "root on lfs" work for me,
although it looks like a different traceback from PR#32667.
 1.160 30-Mar-2006  yamt some cleanups after the introduction of GOP_SIZE_MEM flag.
- remove GOP_SIZE_READ/GOP_SIZE_WRITE flags.
they have not been used since the change.
- ufs_balloc_range: remove code which has been no-op since the change.
thanks Konrad Schroder for explaining the original intention of the code.
- ffs_gop_size: don't extend past eof, in the case of GOP_SIZE_MEM.
otherwise genfs_getpages end up to allocate pages past eof unnecessarily.
 1.159 28-Mar-2006  perseant Don't let the pagedaemon wait for pages, since that is just asking for
a deadlock.
 1.158 24-Mar-2006  perseant Improvements to LFS's paging mechanism, to wit:

* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.

* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.157 11-Dec-2005  christos branches: 1.157.4; 1.157.6; 1.157.8; 1.157.10; 1.157.12;
merge ktrace-lwp.
 1.156 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.155 13-Sep-2005  christos branches: 1.155.2;
split out lfs_itimes(). It is used in fsck_lfs.
 1.154 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.153 19-Aug-2005  christos 64 bit inode changes.
 1.152 29-May-2005  christos branches: 1.152.2;
- sprinkle const
- avoid shadow variables.
 1.151 20-May-2005  perseant VOP_LOCK drops the interlock; pick it up again to avoid an "already unlocked"
panic in lfs_putpages.
 1.150 27-Apr-2005  perseant Recognize that we hold the v_interlock when relocking after a flush in
lfs_putpages.
 1.149 25-Apr-2005  skrll Use the right arg structure for lfs_setattr, i.e. s/getattr/setattr/.
 1.148 23-Apr-2005  perseant Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing. Tested in both
directions, and everything appears to work happily, but ymmv.
 1.147 19-Apr-2005  perseant Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.146 18-Apr-2005  perseant Check for the inode having been previously freed, in UNMARK_VNODE().
Avoids a panic when calling mkdir() on a full filesystem.
 1.145 16-Apr-2005  perseant Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
 1.144 16-Apr-2005  perseant Use lfs_malloc() to manage the blkiov arrays that the cleaner functions use,
since the cleaner is likely to operate in a low-memory condition.
 1.143 14-Apr-2005  perseant Tabify leading whitespace
 1.142 14-Apr-2005  perseant Consolidate the hash table we use to maintain the integrity of lfs_avail
into a single, system-wide table, rather than having a separate hash table
per inode. Significantly reduces the "system" cpu usage of your average
file write.
 1.141 01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.140 25-Mar-2005  perseant Don't sleep while holding the vnode interlock. Should take care of the
first panic case in PR #26043.
 1.139 24-Mar-2005  chs avoid the need for recursive locking lfs_flush_dirops() by unlocking
the vnode around the call to this in the caller.
 1.138 23-Mar-2005  perseant Make LFS dirops get their vnode first, before incrementing the dirop count,
to prevent a deadlock trying to call VOP_PUTPAGES() on a VDIROP vnode.
This can happen when a stacked filesystem is mounted on top of an LFS: an
LFS dirop needs to get a vnode, which is available from the upper layer.
The corresponding lower layer vnode, however, is VDIROP, so the upper layer
can't be cleaned out since its VOP_PUTPAGES() is passed through to the lower
layer, which waits for dirops to drain before it can proceed. Deadlock.

Tweak ufs_makeinode() and ufs_mkdir() to pass the a_vpp argument through
to VOP_VALLOC().

Partially addresses PR # 26043, though it probably does not completely fix
the problem described there.
 1.137 08-Mar-2005  simonb branches: 1.137.2;
Tab Police.
 1.136 08-Mar-2005  perseant Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl, while I'm there. A bit of a rototill.
 1.135 26-Feb-2005  perry nuke trailing whitespace
 1.134 26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
 1.133 25-Jan-2005  wrstuden Extend fsync_range(2) to support the FDISKSYNC flag, which requests
that the sync be propogated out through the disk drive caches.
 1.132 22-Apr-2004  yamt branches: 1.132.4; 1.132.6;
check_dirty: fix another PHOLD leak. ("goto top" path)
 1.131 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.130 20-Apr-2004  yamt check_dirty: plug a PHOLD leak. from Greg Oster.
 1.129 26-Feb-2004  yamt branches: 1.129.4;
lfs_putpages: fix a simple_lock mismatch.
 1.128 26-Jan-2004  hannken Fix xxx_strategy() to use the vnode arg instead of bp->b_vp.
 1.127 25-Jan-2004  hannken Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.126 16-Dec-2003  yamt - reduce code duplication.
- use boolean_t where appropriate.
 1.125 16-Dec-2003  yamt g/c lfs_no_inactive.
 1.124 25-Nov-2003  yamt use FINFOSIZE macro.
 1.123 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.122 25-Oct-2003  christos Fix uninitialized variable warnings.
 1.121 21-Oct-2003  fvdl Correct preempt() calls.
 1.120 18-Oct-2003  yamt be more strict about sa->vp.
(make sure the last lfs_updatemata in lfs_putpages takes effect.)
 1.119 14-Oct-2003  dbj add mnt_iflag field to struct mount for internal flags
mv MNT_GONE, MNT_UNMOUNT and MNT_WANTRDWR to this field
additonally add mnt_writeopcountupper and mnt_writeopcountlower fields
in preparation for pending write suspension support work
bump kernel version to 1.6ZD
 1.118 24-Sep-2003  yamt fix a bug of lfs.

genfs_getpages() can read in more blocks than it should due to faked filesize
of lfs_gop_size(). it's a security problem and it makes gcc3 "internal error"

to fix this,
- in genfs_getpages(), always calculate diskeof and memeof separately
so that filesystems (in this case, lfs) can use different strategies
for them.
- introduce GOP_SIZE_MEM flag and use it to request in-core filesize.
(it was an intention of GOP_SIZE_READ,
but after the above change _READ is not a straightforward name)

after this, no one uses GOP_SIZE_{READ,WRITE} anymore but leave them for now.
 1.117 23-Sep-2003  yamt cleanup IN_ADIROP/VDIROP handling a little.
 1.116 23-Sep-2003  yamt remove unnecessary externs of lfs_do_flush.
 1.115 20-Sep-2003  yamt some comments
 1.114 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.113 12-Jul-2003  yamt more MP locks.
 1.112 12-Jul-2003  yamt - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
 1.111 02-Jul-2003  yamt - add a new functions, lfs_writer_enter/leave, and use them instead of
duplicated code fragments.
- add an assertion.
 1.110 02-Jul-2003  yamt drain dirops before aqcuiring seglock. otherwise it might deadlocks.
PR/20676 (Karl Knutsson)
 1.109 29-Jun-2003  fvdl branches: 1.109.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.
 1.108 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.107 28-Jun-2003  darrenr Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.106 07-May-2003  ragge Add a missing ifdef DDB.
 1.105 02-May-2003  perseant Correct arguments to check_dirty, ensuring that all pages in a block are
written if any of them are dirty. Pointed out by yamt.
 1.104 27-Apr-2003  yamt fix a comment.
 1.103 23-Apr-2003  perseant Make LFS work better (though still not "well") as an NFS-exported
filesystem (and other things that needed to be fixed before the tests
would complete), to wit:

* Include the fs ident in the filehandle; improve stale filehandle checks.

* Change definition of blksize() to use the on-dinode size instead of
the inode's i_size, so that fsck_lfs will work properly again.

* Use b_interlock in lfs_vtruncbuf.

* Postpone dirop reclamation until after the seglock has been released,
so that lfs_truncate is not called with the segment lock held.

* Don't loop in lfs_fsync(), just write everything and wait.

* Be more careful about the interlock/uobjlock in lfs_putpages: when we
lose this lock, we have to resynchronize dirtiness of pages in each
block.

* Be sure to always write indirect blocks and update metadata in
lfs_putpages; fixes a bug that caused blocks to be accounted to the
wrong segment.
 1.102 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.101 01-Apr-2003  yamt lfs_strategy is used only for read.
 1.100 28-Mar-2003  perseant Add a sleeper count, to prevent the cleaner from panicing the kernel
when the filesystem is unmounted, relocking the Ifile when its lock is
draining. (We can't use vfs_busy() since the process is sleeping for a
good long time.) Clean up / organize lfs.h, while I'm here.

In lfs_update_single, assert that disk addresses are either negative, or
are still positive when converted to int32_t, to prevent recurrence of a
negative/positive block problem.
 1.99 22-Mar-2003  perseant Unlock ifile inode during streamlined VOP_INACTIVE.
 1.98 21-Mar-2003  perseant KNF (space after keywords).
 1.97 21-Mar-2003  perseant Use VONWORKLST as a heuristic for vnode emptiness, rather than exhaustively
checking the memq.

Take greater care not to dirty the Ifile vnode when unmounting the filesystem.
This should fix a "(vp->v_flag & VONWORKLST) == 0" assertion panic in vgonel
that could occur when unmounting.

Do not allow the Ifile to be mapped for writing.
 1.96 15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.95 08-Mar-2003  perseant Take away "#ifdef LFS_UBC".
 1.94 08-Mar-2003  perseant Add an lfs_strategy() that checks to make sure we're not trying to read
where the cleaner is trying to write, instead of tying up the "live"
buffers (or pages).

Fix a bug in the LFS_UBC case where oversized buffers would not be
checksummed correctly, causing uncleanable segments.

Make sure that wakeup(fs->lfs_iocount) is done if fs->lfs_iocount is 1
as well as 0, since we wait in some places for it to drop to 1.

Activate all pages that make it into lfs_gop_write without the segment
lock held, since they must have been dirtied very recently, even if
PG_DELWRI is not set.
 1.93 04-Mar-2003  perseant Make sure we hold the uobjlock when checking for dirty pages, in lfs_vflush.
Note that pages can become dirty without our knowing it, anyway; don't
panic if that happens.
 1.92 02-Mar-2003  perseant Account SEGUSE_ACTIVE correctly so that the automatic segment cleaning
actually happens.

Add a new fcntl call that will write the minimum necessary to checkpoint
(i.e., for on-disk directory structure to be consistent, not including
updates to file data) so that the cleaner can clean segments more quickly
without sacrificing three-way commit for cleaning.
 1.91 01-Mar-2003  yamt use pid_t for pid.
 1.90 25-Feb-2003  perseant Make fs-specific fcntl macros take three arguments (approved wrstuden).
Let LFS use fcntl for cleaner functions.
 1.89 24-Feb-2003  perseant Add lfs_ioctl vnode op, with ioctls to take over cleaner system call
functionality (not including segment clean, since that is now done
automatically as checkpoints happen).
 1.88 23-Feb-2003  perseant Fix a buffer overflow bug in the LFS_UBC case that manifested itself
either as a mysterious UVM error or as "panic: dirty bufs". Verify
maximum size in lfs_malloc.

Teach lfs_updatemeta and lfs_shellsort about oversized cluster blocks from
lfs_gop_write.

When unwiring pages in lfs_gop_write, deactivate them, under the theory
that the pagedaemon wanted to free them last we knew.
 1.87 22-Feb-2003  yamt fix simple_lock/unlock mismatches.
 1.86 20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.85 19-Feb-2003  yamt wire the pages instead of just dequeue'ing them.
advised by Chuck Silvers.
 1.84 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.83 03-Feb-2003  perseant Don't call a dirop within a dirop: if lfs_rename is actually deleting
a link, call lfs_remove directly before starting dirop rather than
having ufs_rename do it.
 1.82 30-Jan-2003  yamt there's no need to treat VOP_WHITEOUT as dirop
because it modifies only one inode.
 1.81 25-Jan-2003  kleink Fix further printf format warnings for DEBUG, in the wake of daddr_t
having changed.
 1.80 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.79 08-Jan-2003  yamt for lfs_remove/lfs_rmdir, keep removed vnodes marked VDIROP.
(backout parts of rev.1.40)
otherwise, directory structures can be corrupted because checkpoints can
occur via eg. lfs_vflush before parent directory is written.
 1.78 08-Jan-2003  yamt in set_dirop/endop, use normal vref/vrele instead of lfs versions
so that we don't miss lfs_inactivate.
 1.77 08-Jan-2003  yamt add assertions.
 1.76 08-Jan-2003  yamt use lfs_unmark_vnode instead of duplicated code fragments.
 1.75 29-Dec-2002  yamt backout assertions in lfs_inactive.
they can be false when unmounting forcibly.
 1.74 28-Dec-2002  christos fix compile problem.
 1.73 28-Dec-2002  yamt avoid warnings without DIAGNOSTIC.

pointed by Andreas Wrede.
 1.72 28-Dec-2002  yamt dirop inode can't be passed to lfs_inactivate.
 1.71 28-Dec-2002  yamt - in lfs_reserve, vref vnodes that we're locking so that cleaner doesn't
try to reclaim them.
(workaround for deadlock noted in the comment in lfs_reserveavail)
- in lfs_rename, mark vnodes which are being moved as well as directry vnodes.
 1.70 26-Dec-2002  yamt - in lfs_reserve, reserve locked buffer count as well.
- don't wait for locking buf in lfs_bwrite_ext to avoid deadlocks.
- skip lfs_reserve when we're doing dirop.
reserve more (for lfs_truncate) in set_dirop instead.

this mostly solves PR 18972. (and hopefully PR 19196)
 1.69 24-Nov-2002  yamt correct locking for lfs_rmdir. PR 18976.
 1.68 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.67 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.66 22-Sep-2002  jdolecek don't need <sys/conf.h> here
 1.65 16-Jun-2002  perseant For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.

Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.

If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.

lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.

Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.

Tested on i386, test-compiled on alpha.
 1.64 17-May-2002  perseant branches: 1.64.2;
use macros from <sys/queue.h>
 1.63 14-May-2002  perseant branches: 1.63.2;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.62 27-Apr-2002  perseant Make exported LFSes not panic on the first file create.
 1.61 11-Feb-2002  perseant Include the space taken by inodes in the count made by lfs_check();
make VOP_SETATTR call lfs_check. This prevents large numbers of inode
changes (say, at the end of tar(1)) from filling the buffer cache.
 1.60 18-Dec-2001  chs use the new compatibility routines to allow mmap() to work
(in the same non-coherent fashion that it worked pre-UBC)
until someone has time to do it the right way.
 1.59 23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.58 08-Nov-2001  lukem add RCSID
 1.57 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.56 22-Sep-2001  sommerfeld branches: 1.56.2;
Add fifo_putpages() placebo so that the vnode's uobj is unlocked.
 1.55 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.54 24-Aug-2001  chs branches: 1.54.2;
disable mmap() for LFS until it is fixed.
 1.53 17-Aug-2001  chs add getpages/putpages entries for spec vnodes.
 1.52 24-Jul-2001  assar change vop_symlink and vop_mknod to return vpp (the created node)
refed, so that the caller can actually use it. update callers and
file systems that implement these vnode operations
 1.51 13-Jul-2001  perseant Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.50 22-Jan-2001  jdolecek branches: 1.50.2; 1.50.4; 1.50.6;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.49 18-Nov-2000  toshii Make buildable again.
The previous commit was a backout of rev. 1.45, which must be an accident.
 1.48 17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.47 12-Nov-2000  perseant Do not needlessly dirty segment table blocks during lfs_segwrite,
preventing needless disk activity when the filesystem is idle. (PR #10979.)
 1.46 14-Oct-2000  perseant In lfs_truncate, don't overcount the real blocks removed from the inode,
when deallocating a fragment that has not made it to disk yet.

Also, during dirops, give the directory vnode an extra reference in
SET_DIROP, to ensure its continued existence during SET_ENDOP, preventing
a possible NULL-dereference there.

These two changes should close PR #11064.
 1.45 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.
 1.44 09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.43 05-Jul-2000  perseant Clean up accounting of lfs_uinodes (dirty but unwritten inodes).

Make lfs_uinodes a signed quantity for debugging purposes, and set it to
zero as fs mount time.

Enclose setting/clearing of the dirty flags (IN_MODIFIED, IN_ACCESSED,
IN_CLEANING) in macros, and use those macros everywhere. Make
LFS_ITIMES use these macros; updated the ITIMES macro in inode.h to know
about this. Make ufs_getattr use ITIMES instead of FFS_ITIMES.
 1.42 01-Jul-2000  perseant Move SET_ENDOP after vrele to avoid deactivating vnode twice, if
SET_ENDOP triggers a write.
 1.41 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.40 27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.39 22-Jun-2000  perseant Update lfs_vunref for the fact that now a vnode can be locked with no
references (locked for VOP_INACTIVE at the end of vrele) and it's okay.
Check the return value of lfs_vref where appropriate.
Fixes PR #s 10285 and 10352.
 1.38 31-May-2000  perseant branches: 1.38.2;
update for IN_ACCESSED changes
 1.37 27-May-2000  perseant branches: 1.37.2;
Prevent dirops from getting around lfs_check and wedging the buffer cache.
All the dirop vnops now mark the inodes with a new flag, IN_ADIROP, which
is removed as soon as the dirop is done (as opposed to VDIROP which stays
until the file is written). To address one issue raised in PR#9357.
 1.36 13-May-2000  perseant Change the sementics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be wait for, or not, accordingly.

Closes PR#8996.
 1.35 30-Mar-2000  augustss Remove register declarations.
 1.34 15-Dec-1999  perseant Fix error returns on lfs vnops so that locks and reference counts are
preserved. Handle dirop accounting in lfs_vfree for this case as well.
May address PR#8823.
 1.33 03-Dec-1999  perseant Handle the case of a vnode flush while dirops are active correctly in
lfs_segwrite. Also, make sure a flush is called in SET_DIROP before sleeping
on its results. Addresses PR #8863.
 1.32 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that is has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.31 06-Nov-1999  perseant branches: 1.31.2;
Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.30 05-Nov-1999  perseant Better fix for PR# 8577: before setting dirops, check for cross-device
rename and error out. This avoids possible problems with attempting
rename between two LFSs.
 1.29 01-Nov-1999  perseant Check that the destination vnode is on an LFS before trying to twiddle
its superblock. Fixes PR#8577.
 1.28 03-Sep-1999  perseant branches: 1.28.2; 1.28.4; 1.28.6;
Make changes that will allow an LFS filesystem to be used as the root
filesystem. In particular,

- Fix mknod deadlock, described in PR 8172.
- Enable lfs_mountroot.
- Make lfs_writevnodes treat filesystems mounted on lfs device nodes properly,
by flushing that device rather than trying to add blocks to the device inode.

This, in combination with lfs boot blocks, will allow operation of an all-lfs
system.
 1.27 03-Aug-1999  wrstuden Add support for fcntl(2) to generate VOP_FCNTL calls. Any fcntl
call with F_FSCTL set and F_SETFL calls generate calls to a new
fileop fo_fcntl. Add genfs_fcntl() and soo_fcntl() which return 0
for F_SETFL and EOPNOTSUPP otherwise. Have all leaf filesystems
use genfs_fcntl().

Reviewed by: thorpej
Tested by: wrstuden
 1.26 12-Apr-1999  perseant Disallow threshold-initiated cache flush when dirops are active. Also, make
SET_ENDOP use lfs_check instead of inlining most of it.
 1.25 29-Mar-1999  perseant branches: 1.25.2;
Fix the other missing dirop wakeup
 1.24 25-Mar-1999  perseant Since dirop vnodes can't be flushed, they hold a reference until their
dirop is completely written to disk. This means that ordinary calls to
ufs vnops which would ordinarily call VOP_INACTIVE through vrele/vput,
don't. This patch detects that condition after such vnops have been
run, and calls VOP_INACTIVE if it would ordinarily have been called by
the ufs call.
 1.23 25-Mar-1999  perseant clean up unused/required #ifdefs
 1.22 10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.21 05-Mar-1999  mycroft Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.20 06-Nov-1998  cgd argument to dbtob needs to be cast to u_quad_t here to avoid shift lossage
 1.19 01-Sep-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for LFS inodes.
 1.18 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.17 22-Jun-1998  sommerfe defopt for options FIFO
 1.16 05-Jun-1998  kleink Convert fsync vnode operator implementations and usage from the old `waitfor'
argument and MNT_WAIT/MNT_NOWAIT to `flags' and FSYNC_WAIT.
 1.15 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.14 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added an "union inode_ext" to struct inode, for the per-fs extentions.
For now only ext2fs uses it.
- i_din is now an union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the rigth
macro, depending on the time of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.13 07-Sep-1996  mycroft Implement poll(2).
 1.12 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.11 11-May-1996  mycroft Change VOP_UPDATE() semantics:
* Make 2nd and 3rd args timespecs, not timevals.
* Consistently pass a Boolean as the 4th arg (except in LFS).
Also, fix ffs_update() and lfs_update() to actually change the nsec fields.
 1.10 09-Feb-1996  christos lfs prototypes
 1.9 09-Feb-1996  mycroft Fix vop_link, vop_symlink, and vop_remove semantics in several ways:
* Change the argument names to vop_link so they actually make sense.
* Implement vop_link and vop_symlink for all file systems, so they do proper
cleanup.
* Require the file system to decide whether or not linking and unlinking of
directories is allowed, and disable it for all current file systems.
 1.8 01-Feb-1996  jtc Rename struct timespec fields to conform to POSIX.1b
 1.7 15-Jun-1995  cgd compensate for timeval/timespec/stat structure changes.
 1.6 14-Dec-1994  mycroft Sync with CSRG.
 1.5 13-Dec-1994  mycroft Not ready for part of the previous change yet...
 1.4 13-Dec-1994  mycroft Turn lease_check() into a vnode op, per CSRG.
 1.3 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.25.2.6 15-Jan-2000  he Pull up revision 1.34 (requested by perseant):
Fix error returns on lfs vnops so that locks and reference counts
are preserved. Handle dirop accounting in lfs_vfree for this
case as well. Addresses PR#8823.
 1.25.2.5 15-Jan-2000  he Pull up revision 1.28 (requested by perseant):
Address problems related to using an LFS filesystem as the root
filesystem, including mknod hangs. Fixes PR#8172 and PR#9072.
 1.25.2.4 18-Dec-1999  he Pull up revision 1.33 (requested by perseant):
Handle the case of a vnode flush while dirops are active correctly
in lfs_segwrite. Also, make sure a flush is called in SET_DIROP
before sleeping on its results. Addresses PR#8863.
 1.25.2.3 17-Dec-1999  he Pull up revision 1.31 (requested by perseant):
Address locking protocol error for inode hash, and make the
maximum number of active dirops a global quantity.
 1.25.2.2 08-Nov-1999  cgd pull up revs 1.29-1.30 from trunk (requested by perseant):
Check for cross-device rename before setting up dirop markers in
lfs_rename. Addresses PR#8577.
 1.25.2.1 13-Apr-1999  perseant branches: 1.25.2.1.2;
Pull-up of changes made to the trunk on Sunday [1.25->1.26], to wit:

Take out the `#ifdef USE_UFSHASH'; use ufs_hashlock to lock the inode free
list instead of free_lock.

Fix inode reporting in lfs_statfs (the meaning of f_files and f_ffree was
reversed).

Fix "lfs_ifind: dinode xxx not found" panic. When inodes were freed, then
immediately reloaded, their dinodes were located in an inode block which
was not on disk at the advertized location, nor in the cache (although it
would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.

Better checking for held inode locks in lfs_fastvget, for a number of
error conditions. Also change the default setting of lfs_clean_vnhead to
0, which seems to make the locking problems go away (although this is
difficult to test as I can't reliably reproduce them).

Make sure that the wakeup occurs for vnodes that lfs_update might be
sleeping on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which
have dirty buffers), by marking them with the appropriate flag if
dirtybuffers were added while the write was in progress.

Fix block counting during file truncation, if not truncating to zero.

Disallow threshold-initiated cache flush when dirops are active. Also,
make SET_ENDOP use lfs_check instead of inlining most of it.

Improve the debugging printfs in the cleaner syscalls (in particular, make
it obvious that they're coming from lfs).

Check the superblock version field, and refuse to mount the filesystem if
the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and
boundaries, etc., which would not affect existing fields in the
superblock, but which would drastically affect the filesystem, to be
smoothly integrated at a later date.
 1.25.2.1.2.5 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.25.2.1.2.4 11-Jul-1999  chs add placeholders for getpages/putpages.
 1.25.2.1.2.3 02-Jul-1999  thorpej Take two at making a non-converted LFS work in a UBC kernel.
 1.25.2.1.2.2 21-Jun-1999  thorpej Pull in ffs_extern.h to get ffs_balloc_range() prototype for
ufs_readwrite.c
 1.25.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.28.6.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.28.6.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.28.4.2 15-Nov-1999  fvdl Sync with -current
 1.28.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.28.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.28.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.28.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.31.2.2 06-Nov-1999  perseant Address ufs_hashlock/ufs_ihashins protocol bug, discovered while doing a
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
 1.31.2.1 06-Nov-1999  perseant file lfs_vnops.c was added on branch comdex-fall-1999 on 1999-11-06 20:33:07 +0000
 1.37.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.38.2.6 03-Feb-2001  he Pull up revisions 1.47-1.49 (requested by perseant):
o Don't write anything if the filesystem is idle (PR#10979).
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
 1.38.2.5 14-Dec-2000  he Pull up revision 1.45 (requested by fvdl):
Improve NFS performance, possibly with as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.38.2.4 01-Nov-2000  tv Fix pullup of 1.46 [perseant, toshii]:
In lfs_truncate, don't overcount the real blocks removed from the inode,
when deallocating a fragment that has not made it to disk yet.

Also, during dirops, give the directory vnode an extra reference in
SET_DIROP, to ensure its continued existence during SET_ENDOP, preventing
a possible NULL-dereference there.

These two changes should close PR #11064.
 1.38.2.3 01-Nov-2000  tv Pullup 1.46 [perseant, toshii]:
In lfs_truncate, don't overcount the real blocks removed from the inode,
when deallocating a fragment that has not made it to disk yet.

Also, during dirops, give the directory vnode an extra reference in
SET_DIROP, to ensure its continued existence during SET_ENDOP, preventing
a possible NULL-dereference there.

These two changes should close PR #11064.
 1.38.2.2 14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
seperately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.38.2.1 22-Jun-2000  perseant Pull up lfs_vunref fix from the trunk.
 1.50.6.9 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.50.6.8 26-Sep-2002  jdolecek hook in genfs_kqfilter(), kevents seem to work fine
 1.50.6.7 23-Sep-2002  jdolecek add spec kqfilter vnode op
 1.50.6.6 22-Sep-2002  jdolecek add fifo_kqfilter() to fifo ops, to switch on support for kevents
 1.50.6.5 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.50.6.4 16-Mar-2002  jdolecek Catch up with -current.
 1.50.6.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.50.6.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.50.6.1 03-Aug-2001  lukem update to -current
 1.50.4.2 02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.50.4.1 29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.50.2.15 08-Jan-2003  thorpej Sync with HEAD.
 1.50.2.14 08-Jan-2003  thorpej Sync with HEAD.
 1.50.2.13 29-Dec-2002  thorpej Sync with HEAD.
 1.50.2.12 11-Dec-2002  thorpej Sync with HEAD.
 1.50.2.11 11-Nov-2002  nathanw Catch up to -current
 1.50.2.10 18-Oct-2002  nathanw Catch up to -current.
 1.50.2.9 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc) : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.50.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.50.2.7 28-Feb-2002  nathanw Catch up to -current.
 1.50.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.50.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.50.2.4 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.50.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.50.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.50.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.54.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.56.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.63.2.2 20-Jun-2002  gehenna catch up with -current.
 1.63.2.1 30-May-2002  gehenna Catch up with -current.
 1.64.2.1 20-Jun-2002  lukem Pull up revision 1.65 (requested by perseant in ticket #325):
For synchronous writes, keep separate i/o counters for each write, so
processes don't have to wait for one another to finish (e.g., nfsd seems
to be a little happier now, though I haven't measured the difference).
Synchronous checkpoints, however, must always wait for all i/o to finish.
Take the contents of the callback functions and have them run in thread
context instead (aiodoned thread). lfs_iocount no longer has to be
protected in splbio(), and quite a bit less of the segment construction
loop needs to be in splbio() as well.
If lfs_markv is handed a block that is not the correct size according to
the inode, refuse to process it. (Formerly it was extended to the "correct"
size.) This is possibly more prone to deadlock, but less prone to corruption.
lfs_segclean now outright refuses to clean segments that appear to have live
bytes in them. Again this may be more prone to deadlock but avoids
corruption.
Replace ufsspec_close and ufsfifo_close with LFS equivalents; this means
that no UFS functions need to know about LFS_ITIMES any more. Remove
the reference from ufs/inode.h.
Tested on i386, test-compiled on alpha.
 1.109.2.11 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.109.2.10 01-Apr-2005  skrll Sync with HEAD.
 1.109.2.9 08-Mar-2005  skrll Sync with HEAD.
 1.109.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.109.2.7 04-Feb-2005  skrll Sync with HEAD.
 1.109.2.6 30-Oct-2004  skrll Reduced diff to HEAD by restoring the struct proc * argument to lfs_bmapv
 1.109.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.109.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.109.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.109.2.2 03-Aug-2004  skrll Sync with HEAD
 1.109.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.129.4.1 10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.132.6.3 26-Mar-2005  yamt sync with head.
 1.132.6.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.132.6.1 12-Feb-2005  yamt sync with head.
 1.132.4.1 29-Apr-2005  kent sync with -current
 1.137.2.25 10-Aug-2006  tron Apply patch (requested by fair in perseant #1457):
Bring LFS up to current, including a patch (1.95 lfs_alloc.c) that
should prevent the inode free list errors seen on the STABLE branch
subsequent to pullup ticket #1327.
 1.137.2.24 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.177
Don't be quite so eager to error out from lfs_putpages() when pages are
busy; if we've sensed a possible 3-way deadlock and are not the pagedaemon,
relock and try again.
 1.137.2.23 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.93
sys/ufs/lfs/lfs.h: revision 1.106
sys/ufs/lfs/lfs_vfsops.c: revision 1.209
sys/ufs/lfs/lfs_vnops.c: revision 1.175
sys/ufs/lfs/lfs_segment.c: revision 1.178
Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.
Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
 1.137.2.22 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_alloc.c: revision 1.92
sys/ufs/lfs/lfs.h: revision 1.105
sys/ufs/lfs/lfs_vfsops.c: revision 1.207
sys/ufs/lfs/lfs_subr.c: revision 1.59
sys/ufs/lfs/lfs_vnops.c: revision 1.173
sys/ufs/lfs/lfs_bio.c: revision 1.92
Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.
Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
 1.137.2.21 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.172
Fix a "locking against myself": lfs_flush_dirops() doesn't need to lock the
vnodes to write their blocks, since it holds the segment lock.
 1.137.2.20 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.171
sys/ufs/lfs/lfs_extern.h: revision 1.81
sys/ufs/lfs/lfs_segment.c: revision 1.177
Don't ever partially write dirops, even if we need the cleaner to run.
This increases the chances of the "no clean segments" panic slightly,
but allows us to run the ckckp regression test successfully to completion.
 1.137.2.19 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.104
sys/ufs/lfs/lfs_vfsops.c: revision 1.206
sys/ufs/lfs/lfs_vnops.c: revision 1.170
sys/ufs/lfs/lfs_extern.h: revision 1.80
sys/ufs/lfs/lfs_segment.c: revision 1.176
sys/ufs/lfs/lfs_inode.c: revision 1.103 via patch
sys/ufs/lfs/lfs_alloc.c: revision 1.90
Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().
A couple of locking fixes are also included as well.
 1.137.2.18 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.169
Yet another MP locking issue.
 1.137.2.17 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.103
sys/ufs/lfs/lfs_segment.c: revision 1.174
sys/ufs/lfs/lfs_vnops.c: revision 1.168
Introduce two fcntl calls that freeze the filesystem right at the point
where segment 0 is being considered for writing. This allows for automated
checkpoint vailidity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.
Include a regression test that does such scanning.
When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small). Discovered by using the aforementioned
regression test.
 1.137.2.16 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs.h: revision 1.102
sys/ufs/lfs/lfs_segment.c: revision 1.173
sys/ufs/lfs/lfs_vnops.c: revision 1.167 via patch
sys/ufs/lfs/lfs_bio.c: revision 1.91
Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.
Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
 1.137.2.15 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.166
Another MP locking fix.
 1.137.2.14 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.165
Don't leak vnode references if we fail to lock a vnode in lfs_flush_pchain().
Also fix another (probably only academic) simple_lock protocol error.
 1.137.2.13 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.200
sys/ufs/lfs/lfs_vnops.c: revision 1.164
sys/ufs/lfs/lfs_inode.c: revision 1.101
sys/ufs/lfs/lfs_extern.h: revision 1.78
sys/ufs/lfs/lfs.h: revision 1.100
Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
 1.137.2.12 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_balloc.c: revision 1.60
sys/ufs/lfs/lfs_syscalls.c: revision 1.111
sys/ufs/lfs/lfs_segment.c: revision 1.172
sys/ufs/lfs/lfs_vnops.c: revision 1.163
Several minor bug fixes:
* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.137.2.11 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.162
Make sure we unlock to zero when avoiding 3-way deadlock; otherwise we
simply have a different form of deadlock.
 1.137.2.10 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vfsops.c: revision 1.198
sys/ufs/lfs/lfs_vnops.c: revision 1.161
Handle the "filesystem is clean" flag correctly when upgrading from
read-only to read-write mount. This makes "root on lfs" work for me,
although it looks like a different traceback from PR#32667.
 1.137.2.9 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.159
Don't let the pagedaemon wait for pages, since that is just asking for
a deadlock.
 1.137.2.8 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.158
sys/ufs/lfs/lfs_subr.c: revision 1.57
sys/ufs/lfs/lfs_segment.c: revision 1.171
sys/ufs/lfs/lfs.h: revision 1.97
sys/ufs/lfs/lfs_vfsops.c: revision 1.195
sys/ufs/lfs/lfs_extern.h: revision 1.76
Improvements to LFS's paging mechanism, to wit:
* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.
* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).
 1.137.2.7 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.153
sys/ufs/lfs/lfs_debug.c: revision 1.32
sys/ufs/lfs/lfs_alloc.c: revision 1.84
sys/ufs/lfs/lfs_vfsops.c: revision 1.185
sys/ufs/lfs/lfs_segment.c: revision 1.165
64 bit inode changes.
 1.137.2.6 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.152
sys/ufs/lfs/lfs_debug.c: revision 1.31
sys/ufs/lfs/lfs_subr.c: revision 1.53
sys/ufs/lfs/lfs_extern.h: revision 1.68
sys/ufs/lfs/lfs_inode.c: revision 1.96
sys/ufs/lfs/lfs_bio.c: revision 1.86
sys/ufs/lfs/lfs_alloc.c: revision 1.83
sys/ufs/lfs/lfs_vfsops.c: revision 1.181
sys/ufs/lfs/lfs.h: revision 1.88
sys/ufs/lfs/lfs_segment.c: revision 1.164
- sprinkle const
- avoid shadow variables.
 1.137.2.5 20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_vnops.c: revision 1.151
VOP_LOCK drops the interlock; pick it up again to avoid an "already unlocked"
panic in lfs_putpages.
 1.137.2.4 07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.137.2.3 30-Mar-2005  tron Pull up revision 1.140 (requested by perseant in ticket #74):
Don't sleep while holding the vnode interlock. Should take care of the
first panic case in PR #26043.
 1.137.2.2 30-Mar-2005  tron Pull up revision 1.139 (requested by perseant in ticket #74):
avoid the need for recursive locking lfs_flush_dirops() by unlocking
the vnode around the call to this in the caller.
 1.137.2.1 30-Mar-2005  tron Pull up revision 1.138 (requested by perseant in ticket #74):
Make LFS dirops get their vnode first, before incrementing the dirop
count, to prevent a deadlock trying to call VOP_PUTPAGES() on a VDIROP
vnode. This can happen when a stacked filesystem is mounted on top of an
LFS: an LFS dirop needs to get a vnode, which is available from the upper
layer. The corresponding lower layer vnode, however, is VDIROP, so the
upper layer can't be cleaned out since its VOP_PUTPAGES() is passed
through to the lower layer, which waits for dirops to drain before it can
proceed. Deadlock.
Tweak ufs_makeinode() and ufs_mkdir() to pass the a_vpp argument through
to VOP_VALLOC().
Partially addresses PR # 26043, though it probably does not completely fix
the problem described there.
 1.152.2.8 04-Feb-2008  yamt sync with head.
 1.152.2.7 21-Jan-2008  yamt sync with head
 1.152.2.6 07-Dec-2007  yamt sync with head
 1.152.2.5 27-Oct-2007  yamt sync with head.
 1.152.2.4 03-Sep-2007  yamt sync with head.
 1.152.2.3 26-Feb-2007  yamt sync with head.
 1.152.2.2 30-Dec-2006  yamt sync with head.
 1.152.2.1 21-Jun-2006  yamt sync with head.
 1.155.2.2 29-Oct-2005  yamt use lfs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.155.2.1 20-Oct-2005  yamt adapt ufs.
 1.157.12.3 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.157.12.2 31-Mar-2006  tron Merge 2006-03-31 NetBSD-current into the "peter-altq" branch.
 1.157.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.157.10.5 11-May-2006  elad sync with head
 1.157.10.4 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.157.10.3 19-Apr-2006  elad sync with head.
 1.157.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.157.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.157.8.6 03-Sep-2006  yamt sync with head.
 1.157.8.5 11-Aug-2006  yamt sync with head
 1.157.8.4 26-Jun-2006  yamt sync with head.
 1.157.8.3 24-May-2006  yamt sync with head.
 1.157.8.2 11-Apr-2006  yamt sync with head
 1.157.8.1 01-Apr-2006  yamt sync with head.
 1.157.6.2 01-Jun-2006  kardel Sync with head.
 1.157.6.1 22-Apr-2006  simonb Sync with head.
 1.157.4.1 09-Sep-2006  rpaulo sync with head
 1.178.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.188.2.4 01-Feb-2007  ad Sync with head.
 1.188.2.3 30-Jan-2007  ad Remove support for SA. Ok core@.
 1.188.2.2 12-Jan-2007  ad Sync with head.
 1.188.2.1 18-Nov-2006  ad Sync with head.
 1.189.2.2 10-Dec-2006  yamt sync with head.
 1.189.2.1 22-Oct-2006  yamt sync with head
 1.193.2.3 25-Nov-2007  xtraeme Pull up following revision(s) (requested by christos in ticket #994):
sys/ufs/lfs/lfs_vnops.c: revision 1.208 (patch)
Move the "vp = NULL" assignment after the code that requires vp != NULL.
Reported by Chris Ross on current-users.
 1.193.2.2 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.193.2.1 17-Feb-2007  tron branches: 1.193.2.1.2;
Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by upto 25%.
 1.193.2.1.2.2 06-Jan-2008  wrstuden Catch up to netbsd-4.0 release.
 1.193.2.1.2.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.198.2.4 07-May-2007  yamt sync with head.
 1.198.2.3 15-Apr-2007  yamt sync with head.
 1.198.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.198.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.201.4.1 11-Jul-2007  mjf Sync with head.
 1.201.2.12 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it fail under stress test.
 1.201.2.11 20-Aug-2007  ad Sync with HEAD.
 1.201.2.10 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.201.2.9 15-Jul-2007  ad Sync with head.
 1.201.2.8 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.201.2.7 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.201.2.6 08-Jun-2007  ad Sync with head.
 1.201.2.5 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.201.2.4 10-Apr-2007  ad Sync with head.
 1.201.2.3 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.201.2.2 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.201.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.208.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.210.10.2 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.210.10.1 29-Jul-2007  ad file lfs_vnops.c was added on branch matt-mips64 on 2007-07-29 13:31:16 +0000
 1.210.8.1 14-Oct-2007  yamt sync with head.
 1.210.6.3 23-Mar-2008  matt sync with HEAD
 1.210.6.2 09-Jan-2008  matt sync with HEAD
 1.210.6.1 06-Nov-2007  matt sync with HEAD
 1.210.4.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.210.4.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.212.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.212.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.213.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.213.2.4 19-Dec-2007  ad Use a global lfs_lock.
 1.213.2.3 19-Dec-2007  ad Fix some more problems w/lfs on this branch.
 1.213.2.2 19-Dec-2007  ad Get lfs mostly working.
 1.213.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.215.10.5 11-Aug-2010  yamt sync with head.
 1.215.10.4 11-Mar-2010  yamt sync with head
 1.215.10.3 16-May-2009  yamt sync with head
 1.215.10.2 04-May-2009  yamt sync with head.
 1.215.10.1 16-May-2008  yamt sync with head.
 1.215.8.2 17-Jun-2008  yamt sync with head.
 1.215.8.1 18-May-2008  yamt sync with head.
 1.215.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.215.6.3 29-Jun-2008  mjf Sync with HEAD.
 1.215.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.215.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.216.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.216.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.217.2.1 27-Jun-2008  simonb Sync with head.
 1.218.12.1 29-Feb-2012  matt Deal with UVM_PAGE_OWN changes.
 1.218.6.1 19-May-2012  riz Apply patch (requested by buhrow in ticket #1759):


sys/ufs/lfs/lfs_vnops.c patch
sys/ufs/ufs/inode.h patch
sys/ufs/ufs/ufs_extern.h patch
sys/ufs/ufs/ufs_lookup.c patch
sys/ufs/ufs/ufs_vnops.c patch
sys/ufs/ufs/ufs_wapbl.c patch

Port dholland's ufs_rename locking changes to netbsd-5.
[buhrow, ticket #1759]

Hello. More testing has revealed a minor misunderstanding between the
vnode API in -current and 5.x. The below patch, against NetBSD-5.1
sources, rolls all the accumulated patches into one patch set. With this
patch, I believe you can now run with WAPBL, softdep or traditional ufs
semantics with heavy file loads and avoid panics due to resource exhaustion
and/or tstile deadlocks. Testing has been done on I386, both uniprocessor
and multiprocessor, and on Sparc machines in uniprocessor mode, though I
think multiprocessor Sparc would be fine as well. Since these changes are
machine independent, I don't anticipate any issues on any platform. It is
my hope that modulo any final issues that come up in the final round of
testing I'm currently performing, these patches will be ready to be pulled
up into the NetBSD-5 branch.
Finally, I'd like to thank mouse@ and hannken@ for their help and
patience in helping me track down and test the final versions of these
patches. With their assistance, I'm confident these patches make NetBSD-5
a much more stable and robust operating environment in a variety of
setings.
 1.218.4.2 03-Mar-2009  skrll Sync with HEAD.
 1.218.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.219.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.226.4.6 21-May-2011  rmind Fix the build.
 1.226.4.5 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.226.4.4 05-Mar-2011  rmind sync with head
 1.226.4.3 03-Jul-2010  rmind sync with head
 1.226.4.2 30-May-2010  rmind sync with head
 1.226.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.226.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.226.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.234.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.238.6.3 02-Jun-2012  mrg sync to latest -current.
 1.238.6.2 05-Apr-2012  mrg sync to latest -current.
 1.238.6.1 18-Feb-2012  mrg merge to -current.
 1.238.2.6 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.238.2.5 23-Jan-2013  yamt sync with head
 1.238.2.4 23-May-2012  yamt sync with head.
 1.238.2.3 17-Apr-2012  yamt sync with head
 1.238.2.2 06-Nov-2011  yamt remove pg->listq and uobj->memq
 1.238.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.239.2.2 27-Aug-2016  bouyer Pull up following revision(s) (requested by dholland in ticket #1389):
sys/ufs/lfs/lfs_vnops.c: revision 1.304
Fix a deadlock
ok dholland@
 1.239.2.1 17-Mar-2012  bouyer Pull up following revision(s) (requested by perseant in ticket #116):
sys/ufs/lfs/lfs_alloc.c: revision 1.112
tests/fs/vfs/t_rmdirrace.c: revision 1.9
tests/fs/vfs/t_renamerace.c: revision 1.25
sys/ufs/lfs/lfs_vnops.c: revision 1.240
sys/ufs/lfs/lfs_segment.c: revision 1.224
sys/ufs/lfs/lfs_bio.c: revision 1.122
sys/ufs/lfs/lfs_vfsops.c: revision 1.294
sbin/newfs_lfs/make_lfs.c: revision 1.19
sys/ufs/lfs/lfs.h: revision 1.136
Pass t_renamerace and t_rmdirrace tests.
Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.
Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.
Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace crates a lot of inode blocks on a tiny empty disk).
 1.242.2.4 03-Dec-2017  jdolecek update from HEAD
 1.242.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.242.2.2 23-Jun-2013  tls resync from head
 1.242.2.1 25-Feb-2013  tls resync with head
 1.248.4.1 23-Jul-2013  riastradh sync with HEAD
 1.248.2.2 18-May-2014  rmind sync with head
 1.248.2.1 28-Aug-2013  rmind sync with head
 1.262.2.1 10-Aug-2014  tls Rebase.
 1.269.4.6 28-Aug-2017  skrll Sync with HEAD
 1.269.4.5 05-Oct-2016  skrll Sync with HEAD
 1.269.4.4 09-Jul-2016  skrll Sync with HEAD
 1.269.4.3 22-Sep-2015  skrll Sync with HEAD
 1.269.4.2 06-Jun-2015  skrll Sync with HEAD
 1.269.4.1 06-Apr-2015  skrll Sync with HEAD
 1.269.2.2 14-Jul-2016  martin Pull up following revision(s) (requested by dholland in ticket #1205):
sys/ufs/lfs/lfs_vnops.c: revision 1.304
Fix a deadlock
ok dholland@
 1.269.2.1 06-Aug-2015  snj Apply patch (requested by dholland in ticket #935):
Comment out some KASSERTs.
 1.304.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.304.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.304.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.315.2.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.321.4.4 21-Apr-2020  martin Sync with HEAD
 1.321.4.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.321.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.321.4.1 10-Jun-2019  christos Sync with HEAD
 1.321.2.2 18-Jan-2019  pgoyette Synch with HEAD
 1.321.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.324.2.1 17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.325.2.1 29-Feb-2020  ad Sync with head.
 1.331.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.336.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.9 30-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.8 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT

Replace some #if DEBUG by this too. DEBUG is only for expensive
assertions; these are not.
 1.7 01-Sep-2015  dholland branches: 1.7.2; 1.7.4;
Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.6 02-Aug-2015  dholland Pass the fs object to LFS_MAX_DADDR so it can check lfs_is64.

Remove some hackish intentional 64->32 truncations next to the checks
using LFS_MAX_DADDR, and tackle the problem they handled in bmap
instead.

The problem: the magic block pointer value UNWRITTEN has magic value
-2, and if it's not handled specifically, uint32 -> uint64 promotion
turns it into 4294967294, which then causes consternation and
monkeyhouse downstream.

What's here is still kind of a hack, but it's a step forward.
 1.5 28-Jul-2013  dholland branches: 1.5.4; 1.5.8;
Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.4 06-Jun-2013  dholland branches: 1.4.2; 1.4.4;
Remove stray references to ext2fs, chfs, ffs, and mfs.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.4.4.1 28-Aug-2013  rmind sync with head
 1.4.2.4 03-Dec-2017  jdolecek update from HEAD
 1.4.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.2.2 23-Jun-2013  tls resync from head
 1.4.2.1 06-Jun-2013  tls file ulfs_bmap.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.5.8.2 28-Aug-2017  skrll Sync with HEAD
 1.5.8.1 22-Sep-2015  skrll Sync with HEAD
 1.5.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.4.1 28-Jul-2013  yamt file ulfs_bmap.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.7.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.7.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.7.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.9 19-Apr-2018  christos s/static inline/static __inline/g for consistency.
 1.8 20-Jun-2016  dholland branches: 1.8.16;
u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.7 19-Jun-2016  dholland we already have changes here comparable to ufs_bswap.h -r1.20 and -r1.21.
 1.6 18-Oct-2013  christos branches: 1.6.4; 1.6.8;
use __USE() in the right place, instead of (void)var.
 1.5 17-Oct-2013  christos - remove unused variables
- add debug ifdefs for debugging variables
- __USE() where appropriate.
 1.4 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.3 06-Jun-2013  dholland branches: 1.3.2; 1.3.4;
Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.3.4.2 18-May-2014  rmind sync with head
 1.3.4.1 28-Aug-2013  rmind sync with head
 1.3.2.4 03-Dec-2017  jdolecek update from HEAD
 1.3.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.2.2 23-Jun-2013  tls resync from head
 1.3.2.1 06-Jun-2013  tls file ulfs_bswap.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.6.8.1 09-Jul-2016  skrll Sync with HEAD
 1.6.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.4.1 18-Oct-2013  yamt file ulfs_bswap.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.8.16.1 22-Apr-2018  pgoyette Sync with HEAD
 1.13 20-Jun-2016  dholland Massedit u_int{8,16,32,64}_t to uint{8,16,32,64}_t. This effectively
merges ufs/dinode.h 1.25.
 1.12 19-Jun-2016  dholland we are actually synced with ufs/dinode.h 1.24 and ufs/dir.h 1.25.
 1.11 08-Jun-2013  dholland branches: 1.11.2; 1.11.10; 1.11.14;
Move a comment to lfs.h that belongs better there.
 1.10 08-Jun-2013  dholland Move more symbols to lfs.h:
LFS_DIRBLKSIZ
LFS_DIRECTSIZ
LFS_DIRSIZ
LFS_OLDDIRFMT
LFS_NEWDIRFMT
LFS_IFTODT
LFS_DTTOIF
ULFS{,1,2}_MAXSYMLINKLEN
 1.9 08-Jun-2013  dholland Move stuff to lfs.h that's needed by userland:
LFS_DT_*
ULFS_ROOTINO
ULFS_WINO
struct lfs_direct
struct lfs_dirtemplate
struct lfs_odirtemplate
struct ulfs_args

Also fix FFS_MAXNAMLEN -> LFS_MAXNAMLEN in several places.
 1.8 08-Jun-2013  dholland struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.7 08-Jun-2013  dholland Now move LFS_IFMT and friends from ulfs_dinode.h to lfs.h.
 1.6 08-Jun-2013  dholland Stick LFS_ in front of IFMT, IFIFO, IFREG, etc. so as not to conflict
with the UFS copies of these symbols. (Which themselves ought to have
UFS_ stuck on.)
 1.5 08-Jun-2013  dholland Move the dinode (on-disk inode) structures to lfs.h, since they are
and will be obviously required by userland tools that need to read
the on-disk structures.

Also, DINODE{1,2}_SIZE -> LFS_DINODE{1,2}_SIZE.
 1.4 06-Jun-2013  dholland Remove references to Apple UFS.
 1.3 06-Jun-2013  dholland Cleanups to reduce symbol and header exposure:
- move struct ufid from ulfs_inode.h to lfs.h
- lfs.h needs sys/mount.h and sys/pool.h
- ulfs_quota2_subr.c needs lfs_inode.h
- remove ulfs_inode.h from lfs.h in favor of ulfs_dinode.h
- move ULFS_NDADDR, ULFS_NIADDR, ULFS_NXADDR from ulfs_dinode.h to lfs.h
- remove ulfs_dinode.h from lfs.h
- add lfs.h to ulfs_dinode.h
 1.2 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.11.14.1 09-Jul-2016  skrll Sync with HEAD
 1.11.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.11.10.1 08-Jun-2013  yamt file ulfs_dinode.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.11.2.3 03-Dec-2017  jdolecek update from HEAD
 1.11.2.2 23-Jun-2013  tls resync from head
 1.11.2.1 08-Jun-2013  tls file ulfs_dinode.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.9 08-Jun-2013  dholland ulfs_dir.h has been emptied; remove it.
 1.8 08-Jun-2013  dholland Move more symbols to lfs.h:
LFS_DIRBLKSIZ
LFS_DIRECTSIZ
LFS_DIRSIZ
LFS_OLDDIRFMT
LFS_NEWDIRFMT
LFS_IFTODT
LFS_DTTOIF
ULFS{,1,2}_MAXSYMLINKLEN
 1.7 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.6 08-Jun-2013  dholland Move stuff to lfs.h that's needed by userland:
LFS_DT_*
ULFS_ROOTINO
ULFS_WINO
struct lfs_direct
struct lfs_dirtemplate
struct lfs_odirtemplate
struct ulfs_args

Also fix FFS_MAXNAMLEN -> LFS_MAXNAMLEN in several places.
 1.5 08-Jun-2013  dholland struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.4 08-Jun-2013  dholland Split the definitions suitable for userland out of ulfs_inode.h into
lfs_inode.h. Since fsck_lfs, newfs_lfs, and lfs_cleanerd want to reuse
the inode structure for their own internal use, and some of them share
parts of the kernel code as well, the best way forward is to provide a
relatively sanitized header that doesn't bring in stray material.

Shuffle a few other definitions around so that lfs_inode.h depends
only on lfs.h.

Install lfs_inode.h into /usr/include.
 1.3 06-Jun-2013  dholland Remove references to Apple UFS.
 1.2 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.19 07-Aug-2022  simonb If UFS or LFS dirhash is enabled in the kernel, set the dirhash cache
size dependant on memory size. If less than 128MB of memory, default
to no cache. With 128MB of memory or more, use a maximum cache size of
1/64th of memory; cap maximum default cache size to 32MB (for systems
with 2GB of memory or more).

The dirhash cache sizes are still explicityly setable by sysctl(8) or
by adding relevant entry(s) to sysctl.conf(5).
 1.18 14-Mar-2020  ad - Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.
 1.17 20-Jun-2016  dholland branches: 1.17.18;
Merge -r1.37 of ufs_dirhash.c:
clear i_dirhash sooner, but what lock protects it?
 1.16 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.15 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.14 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
 1.13 21-Sep-2015  dholland Oops; LFS_DIRECTSIZ() is going to need the fs as an argument.

Also, it turns out that dirhash needs a compile-time-constant version
of LFS_DIRECTSIZ(LFS_MAXNAMLEN+1), independent of 64-vs-32, so create
LFS_MAXDIRENTRYSIZE for this. Sigh.
 1.12 15-Sep-2015  dholland Pass around struct lfs_dirheader instead of struct lfs_direct.
 1.11 15-Sep-2015  dholland Add an accessor function for directory names.
 1.10 15-Sep-2015  dholland Add and use accessor functions for more of the directory entry fields.
 1.9 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.8 25-Feb-2014  pooka branches: 1.8.4; 1.8.8;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.7 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.6 08-Jun-2013  dholland branches: 1.6.2; 1.6.4;
ulfs_dir.h has been emptied; remove it.
 1.5 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.4 08-Jun-2013  dholland struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.6.4.2 18-May-2014  rmind sync with head
 1.6.4.1 28-Aug-2013  rmind sync with head
 1.6.2.4 03-Dec-2017  jdolecek update from HEAD
 1.6.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.2.2 23-Jun-2013  tls resync from head
 1.6.2.1 08-Jun-2013  tls file ulfs_dirhash.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.8.8.2 09-Jul-2016  skrll Sync with HEAD
 1.8.8.1 22-Sep-2015  skrll Sync with HEAD
 1.8.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.4.1 25-Feb-2014  yamt file ulfs_dirhash.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.17.18.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.12 19-Aug-2021  andvar s/memry/memory+s/softare/software/+s/grapics/graphics+s/ouput/output
 1.11 27-Dec-2019  msaitoh s/inital/initial/
 1.10 20-Jun-2016  dholland branches: 1.10.18;
u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.9 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.8 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
 1.7 21-Sep-2015  dholland Oops; LFS_DIRECTSIZ() is going to need the fs as an argument.

Also, it turns out that dirhash needs a compile-time-constant version
of LFS_DIRECTSIZ(LFS_MAXNAMLEN+1), independent of 64-vs-32, so create
LFS_MAXDIRENTRYSIZE for this. Sigh.
 1.6 15-Sep-2015  dholland Pass around struct lfs_dirheader instead of struct lfs_direct.
 1.5 08-Jun-2013  dholland branches: 1.5.2; 1.5.10; 1.5.14;
DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.4 08-Jun-2013  dholland Move stuff to lfs.h that's needed by userland:
LFS_DT_*
ULFS_ROOTINO
ULFS_WINO
struct lfs_direct
struct lfs_dirtemplate
struct lfs_odirtemplate
struct ulfs_args

Also fix FFS_MAXNAMLEN -> LFS_MAXNAMLEN in several places.
 1.3 08-Jun-2013  dholland struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.2 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.5.14.2 09-Jul-2016  skrll Sync with HEAD
 1.5.14.1 22-Sep-2015  skrll Sync with HEAD
 1.5.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.10.1 08-Jun-2013  yamt file ulfs_dirhash.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.5.2.3 03-Dec-2017  jdolecek update from HEAD
 1.5.2.2 23-Jun-2013  tls resync from head
 1.5.2.1 08-Jun-2013  tls file ulfs_dirhash.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.10.18.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.18 10-Feb-2024  andvar Fix various typos in comments, log messages and documentation.
 1.17 29-Jun-2021  dholland Add containment for the cloning devices hack in vn_open.

Cloning devices (and also things like /dev/stderr) work by allocating
a struct file, stuffing it in the file table (which is a layer
violation), stuffing the file descriptor number for it in a magic
field of struct lwp (which is gross), and then "failing" with one of
two magic errnos, EDUPFD or EMOVEFD.

Before this commit, all callers of vn_open in the kernel (there are
quite a few) were expected to check for these errors and handle the
situation. Needless to say, none of them except for open() itself did,
resulting in internal negative errnos being returned to userspace.

This hack is fairly deeply rooted and cannot be eliminated all at
once. This commit adds logic to handle the magic errnos inside
vn_open; now on success vn_open returns either a vnode or an integer
file descriptor, along with a flag that says whether the underlying
code requested EDUPFD or EMOVEFD. Callers not prepared to cope with
file descriptors can pass NULL for the extra return values, in which
case if a file descriptor would be produced vn_open fails with
EOPNOTSUPP.

Since I'm rearranging vn_open's signature anyway, stop exposing struct
nameidata. Instead, take three arguments: an optional vnode to use as
the starting point (like openat()), the path, and additional namei
flags to use, restricted to NOCHROOT and TRYEMULROOT. (Other namei
behavior, e.g. NOFOLLOW, can be requested via the open flags.)

This change requires a kernel bump. Ride the one an hour ago.
(That was supposed to be coordinated; did not intend to let an hour
slip by. My fault.)
 1.16 16-May-2020  christos branches: 1.16.6;
Add ACL support for FFS. From FreeBSD.
 1.15 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.14 09-Nov-2016  dholland branches: 1.14.16; 1.14.22;
Apply ufs_extattr.c 1.48:
Explain why the lock in here needs to be recursive. Related to PR 46997.

ufs_extattr 1.47 was also committed directly here, so this file is still
fully synced with it.
 1.13 07-Jul-2016  msaitoh branches: 1.13.2;
KNF. Remove extra spaces. No functional change.
 1.12 20-Jun-2016  dholland Merge -r1.46 of ufs_extattr.c: Fix uninitialized mutex usage
 1.11 20-Jun-2016  dholland Merge -r1.45 of ufs_extattr.c:
Fix UFS1 extended attribute backend autocreation deadlock
 1.10 20-Jun-2016  dholland Merge -r1.44 of ufs_extattr.c and related change -r1.302 of ffs_vfops.c:
fix use-after-free on failed unmount with extended attributes enabled.
 1.9 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.8 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.7 07-Feb-2014  hannken branches: 1.7.4; 1.7.8;
Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.6 08-Jun-2013  dholland branches: 1.6.2; 1.6.4;
ulfs_dir.h has been emptied; remove it.
 1.5 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.6.4.1 18-May-2014  rmind sync with head
 1.6.2.4 03-Dec-2017  jdolecek update from HEAD
 1.6.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.2.2 23-Jun-2013  tls resync from head
 1.6.2.1 08-Jun-2013  tls file ulfs_extattr.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.7.8.2 05-Dec-2016  skrll Sync with HEAD
 1.7.8.1 09-Jul-2016  skrll Sync with HEAD
 1.7.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.4.1 07-Feb-2014  yamt file ulfs_extattr.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.13.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.14.22.1 17-Jan-2020  ad Sync with head.
 1.14.16.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.16.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.3 20-Jun-2016  dholland Merge -r1.11 of extattr.h:
Bump UFS1 extended attribute max name length to 256
 1.2 06-Jun-2013  dholland branches: 1.2.2; 1.2.10; 1.2.14;
Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.2.14.1 09-Jul-2016  skrll Sync with HEAD
 1.2.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.10.1 06-Jun-2013  yamt file ulfs_extattr.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.2.2.3 03-Dec-2017  jdolecek update from HEAD
 1.2.2.2 23-Jun-2013  tls resync from head
 1.2.2.1 06-Jun-2013  tls file ulfs_extattr.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.26 18-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.
 1.25 17-Jan-2020  ad branches: 1.25.10;
VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.24 20-Jun-2016  dholland branches: 1.24.18; 1.24.24;
One more batch of already-synced ufs changes:

ufs_extern.h 1.79 is equivalent to ulfs_extern.h 1.14
ufsmount.h 1.43 is (roughly) equivalent to lfs_extern.h 1.102
ufs_inode.c 1.94 does not apply to lfs
ufs_inode.c 1.95 does not apply to lfs either
ufs_readwrite.c 1.108 is equivalent to ulfs_readwrite.c 1.8
ufs_readwrite.c 1.109 is equivalent to ulfs_readwrite.c 1.9
ufs_readwrite.c 1.110 is equivalent to ulfs_readwrite.c 1.10
ufs_readwrite.c 1.111 does not apply to lfs
ufs_readwrite.c 1.112 is equivalent to ulfs_readwrite.c 1.11
ufs_readwrite.c 1.113 is equivalent to ulfs_readwrite.c 1.13
ufs_readwrite.c 1.114 is equivalent to ulfs_readwrite.c 1.14
ufs_readwrite.c 1.115 is equivalent to ulfs_readwrite.c 1.15
ufs_readwrite.c 1.116-1.118 does not apply to lfs
ufs_readwrite.c 1.119-1.120 are equivalent to ulfs_readwrite.c 1.16
ufs_rename.c 1.12 is equivalent to lfs_rename.c 1.8
ufs_vnops.c 1.226 is equivalent to ulfs_vnops.c 1.22 and lfs_vnops.c 1.270
ufs_vnops.c 1.227 is equivalent to ulfs_vnops.c 1.23
ufs_vnops.c 1.228-1.229 are equivalent to ulfs_vnops.c 1.24
ufs_vnops.c 1.230 is equivalent to ulfs_vnops.c 1.25 and lfs_vnops.c 1.271
ufs_vnops.c 1.231 originated in lfs
ufs_vnops.c 1.232 does not apply to lfs
 1.23 20-Jun-2016  dholland Merge (effectively) -r1.78 of ufs_extern.h: shift ulfs_makeinode to
lfs_vnops.c and make it file-static there, as that's the only place
it's used.
 1.22 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.21 19-Jun-2016  dholland Update the ufs versions these files are synced with by 1: the
201306016 commit by hannken@ that removed references to ffs_snapgone
in ufs doesn't need to be synced into lfs.
 1.20 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
 1.19 15-Sep-2015  dholland Pass around struct lfs_dirheader instead of struct lfs_direct.
 1.18 15-Sep-2015  dholland Kill off the ulfs_direct_cache pool.
We no longer allocate temporary struct directs, so we don't need a
pool for them.
 1.17 15-Sep-2015  dholland Kill off ulfs_makedirentry; just pass the data to ulfs_direnter instead.
For now, move one copy of the code that allocates and fills in a
temporary struct lfs_direct to the top of ulfs_direnter; but it should
go away shortly.
 1.16 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.15 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.14 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.13 25-May-2014  hannken branches: 1.13.4;
Remove ulfs_checkpath() and ulfs_readdotdot(). These are relics
from the pre-genfs_rename era.
 1.12 17-May-2014  dholland branches: 1.12.2;
Merge ulfs_mkdir into lfs_mkdir.
 1.11 17-May-2014  dholland Merge ulfs_symlink into lfs_symlink.
 1.10 28-Jul-2013  dholland branches: 1.10.2;
Bring in a copy of ffs_quota2_mount() for reference.
Add stuff to struct lfs that it needs to initialize.
Clear these fields in mount as there's no on-disk support for quota2;
but this increases the chances of being able to add it (or something
like it) in the future.
 1.9 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.8 20-Jul-2013  dholland G/C unused pieces.
 1.7 20-Jul-2013  dholland Collect the pieces of lfs rename into lfs_rename.c, and sprinkle static.
 1.6 08-Jun-2013  dholland branches: 1.6.2; 1.6.4; 1.6.6;
struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.5 06-Jun-2013  dholland Fix some exposed symbols:
LOSTFOUNDINO -> LFS_LOSTFOUNDINO
struct ufid -> struct ulfs_ufid
 1.4 06-Jun-2013  dholland Apparently we also need to cut and paste ffs_snapgone() in order to be
able to link the ufs code.

Instead of actually cutting and pasting it (as it depends on ffs-only
things) implement it as panic. Probably we'll be able to demonstrate
later that it's unreachable.

XXX: Someone should add snapgone to struct ufs_ops in ufs/ufsmount.h,
XXX: and fix ufs/ufs_lookup.c to not hardwire ffs.
 1.3 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.2 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.6.6.1 23-Jul-2013  riastradh sync with HEAD
 1.6.4.2 18-May-2014  rmind sync with head
 1.6.4.1 28-Aug-2013  rmind sync with head
 1.6.2.4 03-Dec-2017  jdolecek update from HEAD
 1.6.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.2.2 23-Jun-2013  tls resync from head
 1.6.2.1 08-Jun-2013  tls file ulfs_extern.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.10.2.1 10-Aug-2014  tls Rebase.
 1.12.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.12.2.1 17-May-2014  yamt file ulfs_extern.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.13.4.4 09-Jul-2016  skrll Sync with HEAD
 1.13.4.3 22-Sep-2015  skrll Sync with HEAD
 1.13.4.2 06-Jun-2015  skrll Sync with HEAD
 1.13.4.1 06-Apr-2015  skrll Sync with HEAD
 1.24.24.1 17-Jan-2020  ad Sync with head.
 1.24.18.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.25.10.1 01-Aug-2021  thorpej Sync with HEAD.
 1.6 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.5 20-Apr-2015  riastradh Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.
 1.4 27-Feb-2014  hannken branches: 1.4.4; 1.4.8;
The current implementation of vn_lock() is racy. Modification of
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.

- Pass down LK_RETRY to the lock operation (hint for deadfs only).

- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.

- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.

With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.

Adresses PR kern/37706 (Forced unmount of file systems is unsafe)

Discussed on tech-kern.

Welcome to 6.99.33
 1.3 06-Jun-2013  dholland branches: 1.3.2; 1.3.4;
Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.3.4.1 18-May-2014  rmind sync with head
 1.3.2.4 03-Dec-2017  jdolecek update from HEAD
 1.3.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.2.2 23-Jun-2013  tls resync from head
 1.3.2.1 06-Jun-2013  tls file ulfs_ihash.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.4.8.1 06-Jun-2015  skrll Sync with HEAD
 1.4.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.4.1 27-Feb-2014  yamt file ulfs_ihash.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.26 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.25 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.24 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.23 31-Dec-2019  ad branches: 1.23.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.22 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.21 28-Oct-2017  pgoyette branches: 1.21.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
 1.20 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.19 26-May-2017  riastradh branches: 1.19.2;
Eliminate crusty debugging sludge.

We have a mostly sane vnode lifecycle now. If this needs debugging,
it should be done once at the call site of VOP_RECLAIM.
 1.18 11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.17 30-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.16 20-Aug-2016  hannken branches: 1.16.2;
Remove now obsolete operation vcache_remove().

Welcome to 7.99.36
 1.15 20-Jun-2016  dholland branches: 1.15.2;
One more batch of already-synced ufs changes:

ufs_extern.h 1.79 is equivalent to ulfs_extern.h 1.14
ufsmount.h 1.43 is (roughly) equivalent to lfs_extern.h 1.102
ufs_inode.c 1.94 does not apply to lfs
ufs_inode.c 1.95 does not apply to lfs either
ufs_readwrite.c 1.108 is equivalent to ulfs_readwrite.c 1.8
ufs_readwrite.c 1.109 is equivalent to ulfs_readwrite.c 1.9
ufs_readwrite.c 1.110 is equivalent to ulfs_readwrite.c 1.10
ufs_readwrite.c 1.111 does not apply to lfs
ufs_readwrite.c 1.112 is equivalent to ulfs_readwrite.c 1.11
ufs_readwrite.c 1.113 is equivalent to ulfs_readwrite.c 1.13
ufs_readwrite.c 1.114 is equivalent to ulfs_readwrite.c 1.14
ufs_readwrite.c 1.115 is equivalent to ulfs_readwrite.c 1.15
ufs_readwrite.c 1.116-1.118 does not apply to lfs
ufs_readwrite.c 1.119-1.120 are equivalent to ulfs_readwrite.c 1.16
ufs_rename.c 1.12 is equivalent to lfs_rename.c 1.8
ufs_vnops.c 1.226 is equivalent to ulfs_vnops.c 1.22 and lfs_vnops.c 1.270
ufs_vnops.c 1.227 is equivalent to ulfs_vnops.c 1.23
ufs_vnops.c 1.228-1.229 are equivalent to ulfs_vnops.c 1.24
ufs_vnops.c 1.230 is equivalent to ulfs_vnops.c 1.25 and lfs_vnops.c 1.271
ufs_vnops.c 1.231 originated in lfs
ufs_vnops.c 1.232 does not apply to lfs
 1.14 20-Jun-2016  dholland Merge ufs_inode.c 1.93: missing unlock on error path.
 1.13 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.12 14-Nov-2015  pgoyette Remove historic references to wapbl.
 1.11 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.10 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.9 28-Jul-2013  dholland branches: 1.9.4; 1.9.6; 1.9.8;
Remove the now-pointless ulfs ops macros.
 1.8 28-Jul-2013  dholland Get rid of the ulfs_ops table as we only have one fs in here now.
 1.7 08-Jun-2013  dholland branches: 1.7.2; 1.7.4;
There is no WAPBL in LFS.
 1.6 08-Jun-2013  dholland mp->mnt_wapbl and mp->mnt_wapbl_replay are always NULL in here.
 1.5 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.7.4.1 28-Aug-2013  rmind sync with head
 1.7.2.4 03-Dec-2017  jdolecek update from HEAD
 1.7.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.2.2 23-Jun-2013  tls resync from head
 1.7.2.1 08-Jun-2013  tls file ulfs_inode.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.9.8.6 28-Aug-2017  skrll Sync with HEAD
 1.9.8.5 05-Oct-2016  skrll Sync with HEAD
 1.9.8.4 09-Jul-2016  skrll Sync with HEAD
 1.9.8.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.9.8.2 22-Sep-2015  skrll Sync with HEAD
 1.9.8.1 06-Jun-2015  skrll Sync with HEAD
 1.9.6.1 10-Jul-2016  martin Pull up following revision(s) (requested by dholland in ticket #1188):
sys/ufs/lfs/ulfs_inode.c: revision 1.14
Merge ufs_inode.c 1.93: missing unlock on error path.
 1.9.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.9.4.1 28-Jul-2013  yamt file ulfs_inode.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.15.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.16.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.19.2.2 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3)
format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of
kernhist(9)"
but it is possible that I've missed some of them. I would be glad
to
update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value
appropriately.
--
Misssed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.19.2.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.21.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.23.2.2 29-Feb-2020  ad Sync with head.
 1.23.2.1 17-Jan-2020  ad Sync with head.
 1.25 17-Feb-2024  mlelstv Whitespace.
 1.24 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.23 08-Jun-2017  chs move some buffer cache internals declarations from buf.h to vfs_bio.c.
this is needed to avoid name conflicts with ZFS and also
makes it clearer that other code shouldn't be messing with these.
remove the LFS debug code that poked around in bufqueues and
remove the BQ_EMPTY bufqueue since nothing uses it anymore.
provide a function to let LFS and wapbl read the value of nbuf for now.
 1.22 21-Jun-2016  dholland branches: 1.22.10;
Revert version 1.19 (make ufid_ino in struct ulfs_ufid 64-bit) -- via
a twisty maze of marginal if not illegal type punning it breaks the
cleaner.

This will need to be done over, but it requires substantially more
mechanism and compat ioctls. Booo.
 1.21 20-Jun-2016  dholland u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.20 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.19 20-Jun-2016  dholland Merge -r1.67 of ufs/inode.h: make the inode field of a filehandle
64-bit instead of truncating to 32 bits. Note that if you're serving
nfs off lfs (but I don't think you are as I think there are known
fatal problems doing so) you'll need to reboot your clients after this
change.

I've used a 64-bit value explicitly instead of ino_t (as in the ufs
structure) because this is a structure whose size ought to be well
defined. I remember some discussion of this when the ufs change was
committed, but not the conclusion (if any) -- if anyone hates this it
can be changed to ino_t easily enough.
 1.18 20-Jun-2016  dholland Merge ufs/inode.h 1.66: remove i_hash from struct inode. This is the
hash table entry link from the old per-fs vnode cache and we don't
need it any more.
 1.17 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.16 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.15 12-Aug-2015  dholland Hack up dinode usage to be 64 vs. 32 as needed. Part 1.

(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
 1.14 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.13 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.12 17-May-2014  dholland branches: 1.12.2; 1.12.6;
Remove the DIROP macros. They are evil, especially the CREATE ones.

This results in some duplicate logic in the creation vnops (symlink,
mknod, create, mkdir) but we will probably be able to factor it out in
a more sensible way later.

Now the creation vnops call getnewvnode explicitly instead of under
multiple layers of obscure gunk. Then we explicitly do lfs_set_dirop,
and afterwards lfs_unset_dirop.
 1.11 18-Mar-2014  riastradh branches: 1.11.2;
Merge riastradh-drm2 to HEAD.
 1.10 20-Jul-2013  dholland Collect the pieces of lfs rename into lfs_rename.c, and sprinkle static.
 1.9 18-Jun-2013  christos branches: 1.9.2; 1.9.4; 1.9.6;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.8 18-Jun-2013  dholland Tuck away a bunch of symbols that don't need to be public.
 1.7 08-Jun-2013  dholland ulfs_dir.h has been emptied; remove it.
 1.6 08-Jun-2013  dholland Split the definitions suitable for userland out of ulfs_inode.h into
lfs_inode.h. Since fsck_lfs, newfs_lfs, and lfs_cleanerd want to reuse
the inode structure for their own internal use, and some of them share
parts of the kernel code as well, the best way forward is to provide a
relatively sanitized header that doesn't bring in stray material.

Shuffle a few other definitions around so that lfs_inode.h depends
only on lfs.h.

Install lfs_inode.h into /usr/include.
 1.5 06-Jun-2013  dholland Cleanups to reduce symbol and header exposure:
- move struct ufid from ulfs_inode.h to lfs.h
- lfs.h needs sys/mount.h and sys/pool.h
- ulfs_quota2_subr.c needs lfs_inode.h
- remove ulfs_inode.h from lfs.h in favor of ulfs_dinode.h
- move ULFS_NDADDR, ULFS_NIADDR, ULFS_NXADDR from ulfs_dinode.h to lfs.h
- remove ulfs_dinode.h from lfs.h
- add lfs.h to ulfs_dinode.h
 1.4 06-Jun-2013  dholland Remove stray references to ext2fs, chfs, ffs, and mfs.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.9.6.1 23-Jul-2013  riastradh sync with HEAD
 1.9.4.2 18-May-2014  rmind sync with head
 1.9.4.1 28-Aug-2013  rmind sync with head
 1.9.2.4 03-Dec-2017  jdolecek update from HEAD
 1.9.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.9.2.2 23-Jun-2013  tls resync from head
 1.9.2.1 18-Jun-2013  tls file ulfs_inode.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.11.2.1 10-Aug-2014  tls Rebase.
 1.12.6.3 28-Aug-2017  skrll Sync with HEAD
 1.12.6.2 09-Jul-2016  skrll Sync with HEAD
 1.12.6.1 22-Sep-2015  skrll Sync with HEAD
 1.12.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.12.2.1 17-May-2014  yamt file ulfs_inode.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.22.10.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.48 08-Sep-2024  rillig fix a/an grammar in obvious cases
 1.47 06-Aug-2022  andvar branches: 1.47.10;
s/blity/bility/ in various words, mainly in comments.
 1.46 05-Sep-2020  riastradh Revert "ufs: Prevent mkdir from choking on deleted directories."

This change made no sense and should not have been committed.
 1.45 05-Sep-2020  riastradh ufs: Prevent mkdir from choking on deleted directories.

Fix some missing uvm_vnp_setsize in screw cases while here.
 1.44 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.43 04-Apr-2020  ad Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.42 14-Mar-2020  ad - Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.
 1.41 10-Jun-2017  maya branches: 1.41.6; 1.41.12;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.40 30-Mar-2017  hannken branches: 1.40.6;
Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.39 20-Jun-2016  dholland branches: 1.39.2; 1.39.4;
Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.38 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.37 19-Jun-2016  dholland we already have ufs_lookup.c 1.125 and ufs_vnops.c 1.218.
 1.36 19-Jun-2016  dholland Update the ufs versions these files are synced with by 1: the
201306016 commit by hannken@ that removed references to ffs_snapgone
in ufs doesn't need to be synced into lfs.
 1.35 14-Nov-2015  pgoyette Remove historic references to wapbl.
 1.34 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
 1.33 21-Sep-2015  dholland Oops; LFS_DIRECTSIZ() is going to need the fs as an argument.

Also, it turns out that dirhash needs a compile-time-constant version
of LFS_DIRECTSIZ(LFS_MAXNAMLEN+1), independent of 64-vs-32, so create
LFS_MAXDIRENTRYSIZE for this. Sigh.
 1.32 15-Sep-2015  dholland Pass around struct lfs_dirheader instead of struct lfs_direct.
 1.31 15-Sep-2015  dholland Add an accessor function for directory names.
 1.30 15-Sep-2015  dholland Tidyups/fixes preparatory to making d_name[] in struct lfs_direct size
0 instead of size LFS_MAXNAMLEN+1, and preparatory to having accessor
functions for d_name. In particular, don't create prototype entries
and copy them, and access the name field only for directory structures
that are in buffers with space for the name to exist.
 1.29 15-Sep-2015  dholland Tidy up ulfs_direnter: don't malloc a temporary struct lfs_direct
and double-copy it. Just write to the destination buffer.
 1.28 15-Sep-2015  dholland Kill off ulfs_makedirentry; just pass the data to ulfs_direnter instead.
For now, move one copy of the code that allocates and fills in a
temporary struct lfs_direct to the top of ulfs_direnter; but it should
go away shortly.
 1.27 15-Sep-2015  dholland Add and use accessor functions for more of the directory entry fields.
 1.26 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.25 11-Jul-2015  mlelstv mp->mnt_stat.f_flag is never set. Use the mnt_flag directly.
This will now actually prevent the 'bad dir' panic if the filesystem
is read-only.
 1.24 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.23 28-Mar-2015  maxv Remove the 'cred' argument from breadn(), and update the man page
accordingly.

ok hannken@
 1.22 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.21 03-Jun-2014  joerg branches: 1.21.4;
Introduce two helper functions to centralise the namecache statistics
in vfs_cache.c. Use consistent locking around the per-cpu data.
 1.20 25-May-2014  hannken Remove ulfs_checkpath() and ulfs_readdotdot(). These are relics
from the pre-genfs_rename era.
 1.19 07-Feb-2014  hannken branches: 1.19.2; 1.19.4;
Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.18 28-Jan-2014  martin Bogus gcc 4.8 maybe-used-uninitialized warning
 1.17 25-Oct-2013  martin Mark a diagnostic-only variable
 1.16 17-Oct-2013  christos - remove unused variables
- add debug ifdefs for debugging variables
- __USE() where appropriate.
 1.15 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.14 28-Jul-2013  dholland Remove the now-pointless ulfs ops macros.
 1.13 28-Jul-2013  dholland Get rid of the ulfs_ops table as we only have one fs in here now.
 1.12 18-Jun-2013  christos branches: 1.12.2; 1.12.4;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.11 08-Jun-2013  dholland ulfs_dir.h has been emptied; remove it.
 1.10 08-Jun-2013  dholland There is no WAPBL in LFS.
 1.9 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.8 08-Jun-2013  dholland Move stuff to lfs.h that's needed by userland:
LFS_DT_*
ULFS_ROOTINO
ULFS_WINO
struct lfs_direct
struct lfs_dirtemplate
struct lfs_odirtemplate
struct ulfs_args

Also fix FFS_MAXNAMLEN -> LFS_MAXNAMLEN in several places.
 1.7 08-Jun-2013  dholland struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.6 06-Jun-2013  dholland Apparently we also need to cut and paste ffs_snapgone() in order to be
able to link the ufs code.

Instead of actually cutting and pasting it (as it depends on ffs-only
things) implement it as panic. Probably we'll be able to demonstrate
later that it's unreachable.

XXX: Someone should add snapgone to struct ufs_ops in ufs/ufsmount.h,
XXX: and fix ufs/ufs_lookup.c to not hardwire ffs.
 1.5 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.12.4.2 18-May-2014  rmind sync with head
 1.12.4.1 28-Aug-2013  rmind sync with head
 1.12.2.4 03-Dec-2017  jdolecek update from HEAD
 1.12.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.12.2.2 23-Jun-2013  tls resync from head
 1.12.2.1 18-Jun-2013  tls file ulfs_lookup.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.19.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.19.4.1 07-Feb-2014  yamt file ulfs_lookup.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.19.2.1 10-Aug-2014  tls Rebase.
 1.21.4.5 28-Aug-2017  skrll Sync with HEAD
 1.21.4.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.21.4.3 22-Sep-2015  skrll Sync with HEAD
 1.21.4.2 06-Jun-2015  skrll Sync with HEAD
 1.21.4.1 06-Apr-2015  skrll Sync with HEAD
 1.39.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.39.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.40.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.41.12.1 19-Jan-2020  ad Set IMNT_SHRLOOKUP and use it for the in-cache case. Need to check what
more can be done with tmpfs though, it can probably do the whole lookup.
 1.41.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.47.10.1 02-Aug-2025  perseant Sync with HEAD
 1.13 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.12 28-Jun-2014  dholland branches: 1.12.4;
Revert the following changes:

src/sys/sys/quotactl.h 1.37
src/sys/compat/netbsd32/netbsd32.h 1.101
src/sys/compat/netbsd32/netbsd32_netbsd.c 1.188, 1.189
src/sys/kern/vfs_quotactl.c 1.39
src/sys/kern/vfs_syscalls.c 1.483
src/sys/ufs/lfs/ulfs_quota.c 1.11
src/sys/ufs/ufs/ufs_quota.c 1.116
src/lib/libquota/quota_kernel.c 1.5

and do them correctly.

If you're going to change the name of something, you need to change
the name of *all* the things with the same name, not just a handful,
and you should change it to something similar so it still matches the
rest of the system rather than just picking an arbitrarily different
name.

Hi, Joerg.

To wit, rename the quotactl "delete" operation to "del", because
"delete" is a reserved word in C++ and for some reason Joerg wants to
run internal interfaces used only by C code through his C++ compiler.
Do not rename it to "remove" instead, because this doesn't match
libquota or the rest of the usage throughout the system; and rename
all the related identifiers, not just the ones that blew the mind of
Joerg's C++ compiler.

Because this is not a user-facing API (the only userland consumer
sys/quotactl.h is libquota) it is sort of ok to make arbitrary
source-incompatible changes; however, by the same token it's completely
unnecessary. If it *were* a user-facing API that someone might have a
semi-rational reason to want to run a C++ compiler on, it would be
incorrect to change it at this point.
 1.11 12-Jun-2014  joerg Don't t use a C++ keyword as field name.
 1.10 22-Nov-2013  dholland branches: 1.10.2; 1.10.4;
fix typo; hi christos
 1.9 16-Nov-2013  dholland This is now equivalent to ufs_quota.c -r1.115.

(it isn't quite the same textually in a few places but this doesn't
really matter)
 1.8 18-Oct-2013  christos fix unused variable warnings
 1.7 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.6 06-Jun-2013  dholland branches: 1.6.2; 1.6.4;
Cleanups and hacks to make lfs userland stuff build:
- lfs_cksum.c doesn't actually need ulfs_inode.h any more.
- neither does lfs_itimes.c.
- add hacks to fsck_lfs to make it compile.
- add hacks to newfs_lfs to make it compile.
- fix warning in ulfs_quota.c when quotas are fully disabled
(as I guess is happening with the rumpity version)

XXX: This commit adds -I${NETBSDSRCDIR}/sys to the Makefiles for
XXX: fsck_lfs, newfs_lfs, and lfs_cleanerd. This needs to be cleaned
XXX: up ASAP; but I consider this less problematic in the short term
XXX: than spewing ulfs_*.h into /usr/include.
 1.5 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.6.4.2 18-May-2014  rmind sync with head
 1.6.4.1 28-Aug-2013  rmind sync with head
 1.6.2.4 03-Dec-2017  jdolecek update from HEAD
 1.6.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.2.2 23-Jun-2013  tls resync from head
 1.6.2.1 06-Jun-2013  tls file ulfs_quota.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.10.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.10.4.1 22-Nov-2013  yamt file ulfs_quota.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.10.2.1 10-Aug-2014  tls Rebase.
 1.12.4.1 09-Jul-2016  skrll Sync with HEAD
 1.7 20-Jun-2016  dholland u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.6 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.5 28-Jun-2014  dholland branches: 1.5.4;
Revert the following changes:

src/sys/sys/quotactl.h 1.37
src/sys/compat/netbsd32/netbsd32.h 1.101
src/sys/compat/netbsd32/netbsd32_netbsd.c 1.188, 1.189
src/sys/kern/vfs_quotactl.c 1.39
src/sys/kern/vfs_syscalls.c 1.483
src/sys/ufs/lfs/ulfs_quota.c 1.11
src/sys/ufs/ufs/ufs_quota.c 1.116
src/lib/libquota/quota_kernel.c 1.5

and do them correctly.

If you're going to change the name of something, you need to change
the name of *all* the things with the same name, not just a handful,
and you should change it to something similar so it still matches the
rest of the system rather than just picking an arbitrarily different
name.

Hi, Joerg.

To wit, rename the quotactl "delete" operation to "del", because
"delete" is a reserved word in C++ and for some reason Joerg wants to
run internal interfaces used only by C code through his C++ compiler.
Do not rename it to "remove" instead, because this doesn't match
libquota or the rest of the usage throughout the system; and rename
all the related identifiers, not just the ones that blew the mind of
Joerg's C++ compiler.

Because this is not a user-facing API (the only userland consumer
sys/quotactl.h is libquota) it is sort of ok to make arbitrary
source-incompatible changes; however, by the same token it's completely
unnecessary. If it *were* a user-facing API that someone might have a
semi-rational reason to want to run a C++ compiler on, it would be
incorrect to change it at this point.
 1.4 06-Jun-2013  dholland branches: 1.4.2; 1.4.8; 1.4.10;
Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.4.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.10.1 06-Jun-2013  yamt file ulfs_quota.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.4.8.1 10-Aug-2014  tls Rebase.
 1.4.2.4 03-Dec-2017  jdolecek update from HEAD
 1.4.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.2.2 23-Jun-2013  tls resync from head
 1.4.2.1 06-Jun-2013  tls file ulfs_quota.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.5.4.1 09-Jul-2016  skrll Sync with HEAD
 1.12 29-Jun-2021  dholland Add containment for the cloning devices hack in vn_open.

Cloning devices (and also things like /dev/stderr) work by allocating
a struct file, stuffing it in the file table (which is a layer
violation), stuffing the file descriptor number for it in a magic
field of struct lwp (which is gross), and then "failing" with one of
two magic errnos, EDUPFD or EMOVEFD.

Before this commit, all callers of vn_open in the kernel (there are
quite a few) were expected to check for these errors and handle the
situation. Needless to say, none of them except for open() itself did,
resulting in internal negative errnos being returned to userspace.

This hack is fairly deeply rooted and cannot be eliminated all at
once. This commit adds logic to handle the magic errnos inside
vn_open; now on success vn_open returns either a vnode or an integer
file descriptor, along with a flag that says whether the underlying
code requested EDUPFD or EMOVEFD. Callers not prepared to cope with
file descriptors can pass NULL for the extra return values, in which
case if a file descriptor would be produced vn_open fails with
EOPNOTSUPP.

Since I'm rearranging vn_open's signature anyway, stop exposing struct
nameidata. Instead, take three arguments: an optional vnode to use as
the starting point (like openat()), the path, and additional namei
flags to use, restricted to NOCHROOT and TRYEMULROOT. (Other namei
behavior, e.g. NOFOLLOW, can be requested via the open flags.)

This change requires a kernel bump. Ride the one an hour ago.
(That was supposed to be coordinated; did not intend to let an hour
slip by. My fault.)
 1.11 20-Jun-2016  dholland branches: 1.11.34;
Merge -r1.20 and -r1.21 of ufs_quota1.c: widen before multiplying.
 1.10 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.9 26-Jul-2015  hannken Remove bogus "mutex_enter(&mntvnode_lock)".
 1.8 24-May-2014  christos branches: 1.8.4;
Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested at, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.
 1.7 17-Mar-2014  hannken branches: 1.7.2; 1.7.4;
Change lfsquota1_handle_cmd_quotaon() and lfs_q1sync()
to use vfs_vnode_iterator.
 1.6 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.5 08-Jun-2013  dholland branches: 1.5.2; 1.5.4;
mp->mnt_wapbl and mp->mnt_wapbl_replay are always NULL in here.
 1.4 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.5.4.2 18-May-2014  rmind sync with head
 1.5.4.1 28-Aug-2013  rmind sync with head
 1.5.2.4 03-Dec-2017  jdolecek update from HEAD
 1.5.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.2.2 23-Jun-2013  tls resync from head
 1.5.2.1 08-Jun-2013  tls file ulfs_quota1.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.7.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.4.1 17-Mar-2014  yamt file ulfs_quota1.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.7.2.1 10-Aug-2014  tls Rebase.
 1.8.4.2 09-Jul-2016  skrll Sync with HEAD
 1.8.4.1 22-Sep-2015  skrll Sync with HEAD
 1.11.34.1 01-Aug-2021  thorpej Sync with HEAD.
 1.5 20-Jun-2016  dholland u_int{8,16,32,64}_t -> uint{8,16,32,64}_t in remaining lfs headers.
 1.4 06-Jun-2013  dholland branches: 1.4.2; 1.4.10; 1.4.14;
Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.4.14.1 09-Jul-2016  skrll Sync with HEAD
 1.4.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.10.1 06-Jun-2013  yamt file ulfs_quota1.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.4.2.3 03-Dec-2017  jdolecek update from HEAD
 1.4.2.2 23-Jun-2013  tls resync from head
 1.4.2.1 06-Jun-2013  tls file ulfs_quota1.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.4 25-Jul-2021  skrll #include <sys/param.h> for COHERENCY_UNIT (and KNF)
 1.3 06-Jun-2013  dholland branches: 1.3.2; 1.3.10; 1.3.54;
Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.3.54.1 01-Aug-2021  thorpej Sync with HEAD.
 1.3.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.3.10.1 06-Jun-2013  yamt file ulfs_quota1_subr.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.3.2.2 23-Jun-2013  tls resync from head
 1.3.2.1 06-Jun-2013  tls file ulfs_quota1_subr.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.35 28-May-2022  andvar s/grabing/grabbing/ in comments.
 1.34 15-Oct-2021  andvar fix typos in comments.
 1.33 05-Dec-2020  thorpej Remove unnecessary inclusion of <sys/timevar.h>.
 1.32 17-Jan-2020  ad branches: 1.32.6;
VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.31 10-Jun-2017  maya branches: 1.31.6; 1.31.12;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.30 30-Mar-2017  hannken branches: 1.30.6;
Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.29 20-Nov-2016  riastradh branches: 1.29.2;
KASSERT(mutex_owner(...)) ---> KASSERT(mutex_owned(...))
 1.28 07-Jul-2016  msaitoh branches: 1.28.2;
KNF. Remove extra spaces. No functional change.
 1.27 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.26 20-Jun-2016  dholland Merge some cosmetic changes from ffs_quota2.c 1.5. I didn't merge the
whitespace changes.
 1.25 20-Jun-2016  dholland Remove stray 'n' in file. silly control key...
 1.24 20-Jun-2016  dholland Merge ufs_quota2.c 1.37: set grace time if lowering the limit causes
the user/group to now be over quota. From Edgar Fu�.
 1.23 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.22 14-Nov-2015  pgoyette Remove historic references to wapbl.
 1.21 28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.20 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.19 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.18 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.17 08-Dec-2014  justin Avoid uninitialized variable error in some cases with gcc
 1.16 28-Jun-2014  dholland branches: 1.16.4;
Revert the following changes:

src/sys/sys/quotactl.h 1.37
src/sys/compat/netbsd32/netbsd32.h 1.101
src/sys/compat/netbsd32/netbsd32_netbsd.c 1.188, 1.189
src/sys/kern/vfs_quotactl.c 1.39
src/sys/kern/vfs_syscalls.c 1.483
src/sys/ufs/lfs/ulfs_quota.c 1.11
src/sys/ufs/ufs/ufs_quota.c 1.116
src/lib/libquota/quota_kernel.c 1.5

and do them correctly.

If you're going to change the name of something, you need to change
the name of *all* the things with the same name, not just a handful,
and you should change it to something similar so it still matches the
rest of the system rather than just picking an arbitrarily different
name.

Hi, Joerg.

To wit, rename the quotactl "delete" operation to "del", because
"delete" is a reserved word in C++ and for some reason Joerg wants to
run internal interfaces used only by C code through his C++ compiler.
Do not rename it to "remove" instead, because this doesn't match
libquota or the rest of the usage throughout the system; and rename
all the related identifiers, not just the ones that blew the mind of
Joerg's C++ compiler.

Because this is not a user-facing API (the only userland consumer
sys/quotactl.h is libquota) it is sort of ok to make arbitrary
source-incompatible changes; however, by the same token it's completely
unnecessary. If it *were* a user-facing API that someone might have a
semi-rational reason to want to run a C++ compiler on, it would be
incorrect to change it at this point.
 1.15 18-Oct-2013  christos branches: 1.15.2; 1.15.4;
fix unused variable warnings
 1.14 18-Oct-2013  christos use __USE() in the right place, instead of (void)var.
 1.13 29-Jul-2013  dholland Fix build both with and without options LFS_EI.
 1.12 29-Jul-2013  dholland Revert previous; it is wrong.
 1.11 28-Jul-2013  pgoyette Remove more unused variables to unbreak the build.
 1.10 28-Jul-2013  dholland Bring in a copy of ffs_quota2_mount() for reference.
Add stuff to struct lfs that it needs to initialize.
Clear these fields in mount as there's no on-disk support for quota2;
but this increases the chances of being able to add it (or something
like it) in the future.
 1.9 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.8 28-Jul-2013  dholland Remove the now-pointless ulfs ops macros.
 1.7 28-Jul-2013  dholland Get rid of the ulfs_ops table as we only have one fs in here now.
 1.6 08-Jun-2013  dholland branches: 1.6.2; 1.6.4;
There is no WAPBL in LFS.
 1.5 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.6.4.2 18-May-2014  rmind sync with head
 1.6.4.1 28-Aug-2013  rmind sync with head
 1.6.2.4 03-Dec-2017  jdolecek update from HEAD
 1.6.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.2.2 23-Jun-2013  tls resync from head
 1.6.2.1 08-Jun-2013  tls file ulfs_quota2.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.15.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.15.4.1 18-Oct-2013  yamt file ulfs_quota2.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.15.2.1 10-Aug-2014  tls Rebase.
 1.16.4.6 28-Aug-2017  skrll Sync with HEAD
 1.16.4.5 05-Dec-2016  skrll Sync with HEAD
 1.16.4.4 09-Jul-2016  skrll Sync with HEAD
 1.16.4.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.16.4.2 22-Sep-2015  skrll Sync with HEAD
 1.16.4.1 06-Apr-2015  skrll Sync with HEAD
 1.28.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.28.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.29.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.30.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.31.12.1 17-Jan-2020  ad Sync with head.
 1.31.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.32.6.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.4 06-Jun-2013  dholland branches: 1.4.2; 1.4.10;
Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.4.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.10.1 06-Jun-2013  yamt file ulfs_quota2.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.4.2.2 23-Jun-2013  tls resync from head
 1.4.2.1 06-Jun-2013  tls file ulfs_quota2.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.7 24-Aug-2023  andvar s/defaut/default/ in comments.
 1.6 06-Jun-2013  dholland branches: 1.6.2; 1.6.10;
Cleanups to reduce symbol and header exposure:
- move struct ufid from ulfs_inode.h to lfs.h
- lfs.h needs sys/mount.h and sys/pool.h
- ulfs_quota2_subr.c needs lfs_inode.h
- remove ulfs_inode.h from lfs.h in favor of ulfs_dinode.h
- move ULFS_NDADDR, ULFS_NIADDR, ULFS_NXADDR from ulfs_dinode.h to lfs.h
- remove ulfs_dinode.h from lfs.h
- add lfs.h to ulfs_dinode.h
 1.5 06-Jun-2013  dholland Remove stray references to ext2fs, chfs, ffs, and mfs.
 1.4 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.6.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.10.1 06-Jun-2013  yamt file ulfs_quota2_subr.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.6.2.2 23-Jun-2013  tls resync from head
 1.6.2.1 06-Jun-2013  tls file ulfs_quota2_subr.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.4 08-Jun-2013  dholland branches: 1.4.2; 1.4.10;
Split the definitions suitable for userland out of ulfs_inode.h into
lfs_inode.h. Since fsck_lfs, newfs_lfs, and lfs_cleanerd want to reuse
the inode structure for their own internal use, and some of them share
parts of the kernel code as well, the best way forward is to provide a
relatively sanitized header that doesn't bring in stray material.

Shuffle a few other definitions around so that lfs_inode.h depends
only on lfs.h.

Install lfs_inode.h into /usr/include.
 1.3 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.2 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.4.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.10.1 08-Jun-2013  yamt file ulfs_quotacommon.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.4.2.2 23-Jun-2013  tls resync from head
 1.4.2.1 08-Jun-2013  tls file ulfs_quotacommon.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.28 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted, for example).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), uses the a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.27 23-Apr-2020  ad PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)

- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
somewhere. Use it to decide whether to do direct-mapped copy, rather than
poking around directly in the vnode in ubc_uiomove(), which is ugly and
doesn't work for tmpfs. It would be nicer to contain all this in UVM but
the filesystem provides the needed locking here (VV_MAPPED) and to
reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS(). Pass in UBC_ISMAPPED where
appropriate.
 1.26 23-Feb-2020  ad branches: 1.26.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.25 20-Jun-2019  christos branches: 1.25.4;
unifdef -DLFS_READWRITE ulfs_readwrite.c
 1.24 10-Jun-2017  maya branches: 1.24.6;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.23 30-Mar-2017  hannken branches: 1.23.6;
Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.22 20-Jun-2016  dholland branches: 1.22.2; 1.22.4;
One more batch of already-synced ufs changes:

ufs_extern.h 1.79 is equivalent to ulfs_extern.h 1.14
ufsmount.h 1.43 is (roughly) equivalent to lfs_extern.h 1.102
ufs_inode.c 1.94 does not apply to lfs
ufs_inode.c 1.95 does not apply to lfs either
ufs_readwrite.c 1.108 is equivalent to ulfs_readwrite.c 1.8
ufs_readwrite.c 1.109 is equivalent to ulfs_readwrite.c 1.9
ufs_readwrite.c 1.110 is equivalent to ulfs_readwrite.c 1.10
ufs_readwrite.c 1.111 does not apply to lfs
ufs_readwrite.c 1.112 is equivalent to ulfs_readwrite.c 1.11
ufs_readwrite.c 1.113 is equivalent to ulfs_readwrite.c 1.13
ufs_readwrite.c 1.114 is equivalent to ulfs_readwrite.c 1.14
ufs_readwrite.c 1.115 is equivalent to ulfs_readwrite.c 1.15
ufs_readwrite.c 1.116-1.118 does not apply to lfs
ufs_readwrite.c 1.119-1.120 are equivalent to ulfs_readwrite.c 1.16
ufs_rename.c 1.12 is equivalent to lfs_rename.c 1.8
ufs_vnops.c 1.226 is equivalent to ulfs_vnops.c 1.22 and lfs_vnops.c 1.270
ufs_vnops.c 1.227 is equivalent to ulfs_vnops.c 1.23
ufs_vnops.c 1.228-1.229 are equivalent to ulfs_vnops.c 1.24
ufs_vnops.c 1.230 is equivalent to ulfs_vnops.c 1.25 and lfs_vnops.c 1.271
ufs_vnops.c 1.231 originated in lfs
ufs_vnops.c 1.232 does not apply to lfs
 1.21 19-Jun-2016  dholland Mark ufs file versions we're already synced with.
 1.20 23-Nov-2015  mlelstv fix assertion checking that bufrd function is used only for large symlinks
that aren't embedded in the inode.
 1.19 24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.18 24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.17 12-Apr-2015  riastradh Strip IO_JOURNALLOCKED, PGO_JOURNALLOCKED out of ulfs_readwrite.c.

These are vestigial from ufs_readwrite.c with wapbl -- lfs does not
have a journal but only the explicit wapbl calls, not these flags,
got ripped out in the transition to ulfs_readwrite.c.
 1.16 12-Apr-2015  riastradh Same putpages->kassert in ulfs_readwrite.c
 1.15 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.14 28-Mar-2015  riastradh Let I/O errors override inode update errors in UFS.

Fixes tests/fs/vfs/t_io:read_fault for UFS.
 1.13 28-Mar-2015  maxv Remove the 'cred' argument from breadn(), and update the man page
accordingly.

ok hannken@
 1.12 28-Mar-2015  riastradh Make some comments match better in ulfs_readwrite.
 1.11 28-Mar-2015  riastradh Factor out post-read/write inode updates in UFS.
 1.10 28-Mar-2015  riastradh Turn some `#if DIAGNOSTIC' into KASSERT.
 1.9 27-Mar-2015  riastradh Tighten some kasserts in ufs_bufio code paths.
 1.8 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.7 17-Oct-2013  christos branches: 1.7.4; 1.7.8;
- remove unused variables
- add debug ifdefs for debugging variables
- __USE() where appropriate.
 1.6 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.5 28-Jul-2013  dholland Remove the now-pointless ulfs ops macros.
 1.4 18-Jun-2013  christos branches: 1.4.2; 1.4.4;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.3 08-Jun-2013  dholland There is no WAPBL in LFS.
 1.2 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.4.4.2 18-May-2014  rmind sync with head
 1.4.4.1 28-Aug-2013  rmind sync with head
 1.4.2.4 03-Dec-2017  jdolecek update from HEAD
 1.4.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.2.2 23-Jun-2013  tls resync from head
 1.4.2.1 18-Jun-2013  tls file ulfs_readwrite.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.7.8.5 28-Aug-2017  skrll Sync with HEAD
 1.7.8.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.7.8.3 22-Sep-2015  skrll Sync with HEAD
 1.7.8.2 06-Jun-2015  skrll Sync with HEAD
 1.7.8.1 06-Apr-2015  skrll Sync with HEAD
 1.7.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.4.1 17-Oct-2013  yamt file ulfs_readwrite.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.22.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.22.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.23.6.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.24.6.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.24.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.25.4.1 29-Feb-2020  ad Sync with head.
 1.26.4.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.10 20-Jul-2013  dholland Collect the pieces of lfs rename into lfs_rename.c, and sprinkle static.
 1.9 19-Jun-2013  dholland branches: 1.9.2; 1.9.4; 1.9.6;
Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.8 08-Jun-2013  dholland ulfs_dir.h has been emptied; remove it.
 1.7 08-Jun-2013  dholland There is no WAPBL in LFS.
 1.6 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.5 08-Jun-2013  dholland struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.4 08-Jun-2013  dholland Stick LFS_ in front of IFMT, IFIFO, IFREG, etc. so as not to conflict
with the UFS copies of these symbols. (Which themselves ought to have
UFS_ stuck on.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.9.6.1 23-Jul-2013  riastradh sync with HEAD
 1.9.4.1 28-Aug-2013  rmind sync with head
 1.9.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.9.2.2 23-Jun-2013  tls resync from head
 1.9.2.1 19-Jun-2013  tls file ulfs_rename.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.4 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.3 14-Nov-2015  pgoyette Remove historic references to wapbl.
 1.2 08-Jun-2013  dholland branches: 1.2.2; 1.2.10; 1.2.14;
There is no WAPBL in LFS.
 1.1 06-Jun-2013  dholland Apparently we also need to cut and paste ffs_snapgone() in order to be
able to link the ufs code.

Instead of actually cutting and pasting it (as it depends on ffs-only
things) implement it as panic. Probably we'll be able to demonstrate
later that it's unreachable.

XXX: Someone should add snapgone to struct ufs_ops in ufs/ufsmount.h,
XXX: and fix ufs/ufs_lookup.c to not hardwire ffs.
 1.2.14.1 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.2.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.10.1 08-Jun-2013  yamt file ulfs_snapshot.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.2.2.3 03-Dec-2017  jdolecek update from HEAD
 1.2.2.2 23-Jun-2013  tls resync from head
 1.2.2.1 08-Jun-2013  tls file ulfs_snapshot.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.16 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.15 22-Dec-2019  ad branches: 1.15.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.
 1.14 10-Dec-2018  maxv Remove unused mbuf.h includes.
 1.13 17-Apr-2017  hannken branches: 1.13.10; 1.13.12;
Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.12 20-Jun-2016  dholland branches: 1.12.2; 1.12.4;
Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.11 15-Sep-2015  dholland Kill off the ulfs_direct_cache pool.
We no longer allocate temporary struct directs, so we don't need a
pool for them.
 1.10 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.9 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.8 08-Jun-2013  dholland branches: 1.8.2; 1.8.10; 1.8.14;
struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.7 06-Jun-2013  dholland Fix some exposed symbols:
LOSTFOUNDINO -> LFS_LOSTFOUNDINO
struct ufid -> struct ulfs_ufid
 1.6 06-Jun-2013  dholland Cleanups to reduce symbol and header exposure:
- move struct ufid from ulfs_inode.h to lfs.h
- lfs.h needs sys/mount.h and sys/pool.h
- ulfs_quota2_subr.c needs lfs_inode.h
- remove ulfs_inode.h from lfs.h in favor of ulfs_dinode.h
- move ULFS_NDADDR, ULFS_NIADDR, ULFS_NXADDR from ulfs_dinode.h to lfs.h
- remove ulfs_dinode.h from lfs.h
- add lfs.h to ulfs_dinode.h
 1.5 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.8.14.3 28-Aug-2017  skrll Sync with HEAD
 1.8.14.2 22-Sep-2015  skrll Sync with HEAD
 1.8.14.1 06-Jun-2015  skrll Sync with HEAD
 1.8.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.10.1 08-Jun-2013  yamt file ulfs_vfsops.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.8.2.3 03-Dec-2017  jdolecek update from HEAD
 1.8.2.2 23-Jun-2013  tls resync from head
 1.8.2.1 08-Jun-2013  tls file ulfs_vfsops.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.12.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.12.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.13.12.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.13.12.1 10-Jun-2019  christos Sync with HEAD
 1.13.10.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.15.2.1 17-Jan-2020  ad Sync with head.
 1.56 27-Mar-2022  christos add a kauth vnode check for creating links
 1.55 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted, for example).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), uses the a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.54 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.53 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.52 28-Oct-2017  pgoyette Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
 1.51 07-Aug-2017  dholland Tidy up ufs_readdir. First step only; there's plenty more that could be
done to improve this code.
 1.50 04-Aug-2017  maya fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size

from dholland
XXX more wrong
 1.49 10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.48 26-Apr-2017  riastradh branches: 1.48.4;
Change VOP_REMOVE and VOP_RMDIR to preserve lock/ref on dvp.

No change to vp -- the plan is to replace the node by the
componentname in the vop parameters, and let all directory vops do
lookups internally.

Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2017/04/17/msg021825.html
 1.47 11-Apr-2017  riastradh Sprinkle lock ownership assertions.
 1.46 30-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.45 13-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT

Replace some #if DEBUG by this too. DEBUG is only for expensive
assertions; these are not.
 1.44 20-Jun-2016  dholland branches: 1.44.2; 1.44.4;
One more batch of already-synced ufs changes:

ufs_extern.h 1.79 is equivalent to ulfs_extern.h 1.14
ufsmount.h 1.43 is (roughly) equivalent to lfs_extern.h 1.102
ufs_inode.c 1.94 does not apply to lfs
ufs_inode.c 1.95 does not apply to lfs either
ufs_readwrite.c 1.108 is equivalent to ulfs_readwrite.c 1.8
ufs_readwrite.c 1.109 is equivalent to ulfs_readwrite.c 1.9
ufs_readwrite.c 1.110 is equivalent to ulfs_readwrite.c 1.10
ufs_readwrite.c 1.111 does not apply to lfs
ufs_readwrite.c 1.112 is equivalent to ulfs_readwrite.c 1.11
ufs_readwrite.c 1.113 is equivalent to ulfs_readwrite.c 1.13
ufs_readwrite.c 1.114 is equivalent to ulfs_readwrite.c 1.14
ufs_readwrite.c 1.115 is equivalent to ulfs_readwrite.c 1.15
ufs_readwrite.c 1.116-1.118 does not apply to lfs
ufs_readwrite.c 1.119-1.120 are equivalent to ulfs_readwrite.c 1.16
ufs_rename.c 1.12 is equivalent to lfs_rename.c 1.8
ufs_vnops.c 1.226 is equivalent to ulfs_vnops.c 1.22 and lfs_vnops.c 1.270
ufs_vnops.c 1.227 is equivalent to ulfs_vnops.c 1.23
ufs_vnops.c 1.228-1.229 are equivalent to ulfs_vnops.c 1.24
ufs_vnops.c 1.230 is equivalent to ulfs_vnops.c 1.25 and lfs_vnops.c 1.271
ufs_vnops.c 1.231 originated in lfs
ufs_vnops.c 1.232 does not apply to lfs
 1.43 20-Jun-2016  dholland With the previous we seem to have the changes from -r1.225 of ufs_vnops.c.
(as that was stuff from moving ffs to the new vcache and lfs has also been
moved, this is not surprising)
 1.42 20-Jun-2016  dholland Merge (effectively) -r1.78 of ufs_extern.h: shift ulfs_makeinode to
lfs_vnops.c and make it file-static there, as that's the only place
it's used.
 1.41 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.40 20-Jun-2016  dholland More already-merged or equivalent changes:

ufs_dirhash.c 1.36 corresponds to ulfs_dirhash.c 1.8
ufs_extattr.c 1.43 corresponds to ulfs_extattr.c 1.7
ufs_lookup.c 1.126 does not apply to lfs
ufs_lookup.c 1.127 we already have
ufs_lookup.c 1.128 does not apply to lfs
ufs_lookup.c 1.129 corresponds to ulfs_lookup.c 1.19
ufs_quota1.c 1.19 corresponds to ulfs_quota1.c 1.7
ufs_quota1.c 1.20 corresponds to ulfs_quota1.c 1.8
ufs_quota2.c 1.36 we have equivalent changes for
ufs_rename.c 1.9 corresponds to lfs_rename.c 1.5
ufs_rename.c 1.10 corresponds to lfs_rename.c 1.6
ufs_vnops.c 1.219 corresponds to lfs_vnops.c 1.260 and ulfs_vnops.c 1.19
ufs_vnops.c 1.220 corresponds to lfs_vnops.c 1.261 and ulfs_vnops.c 1.20
ufs_vnops.c 1.221 was superseded by later changes
ufs_vnops.c 1.222 got fixed independently in lfs
 1.39 19-Jun-2016  dholland we already have ufs_lookup.c 1.125 and ufs_vnops.c 1.218.
 1.38 19-Jun-2016  dholland note that we're synced with ufs_vnops.c -r1.217 and ufsmount.h -r1.41
(those changes removed lfs hooks from ufs so shouldn't be merged across)
 1.37 19-Jun-2016  dholland Merge -r1.216 of ufs_vnops.c: comments about maxsymlinklen handling
 1.36 19-Jun-2016  dholland Merge -r1.215 of ufs_vnops.c: the speed limit is 80
(-r1.214 was ffs-only)
 1.35 14-Nov-2015  pgoyette Remove historic references to wapbl.
 1.34 21-Sep-2015  dholland Add 64-bit directory entry structures, and adjust accessors accordingly.

The LFS64 directory entry has a 64-bit inode number. This is stored as
two 32-bit values to avoid inducing 64-bit alignment requirements.

The exposed type for manipulating directory entries is now
LFS_DIRHEADER, following the same convention as e.g. IFILE and SEGUSE.
(But with LFS_ on it, because.)
 1.33 21-Sep-2015  dholland Oops; LFS_DIRECTSIZ() is going to need the fs as an argument.

Also, it turns out that dirhash needs a compile-time-constant version
of LFS_DIRECTSIZ(LFS_MAXNAMLEN+1), independent of 64-vs-32, so create
LFS_MAXDIRENTRYSIZE for this. Sigh.
 1.32 15-Sep-2015  dholland Pass around struct lfs_dirheader instead of struct lfs_direct.
 1.31 15-Sep-2015  dholland Add an accessor function for directory names.
 1.30 15-Sep-2015  dholland Kill off ulfs_makedirentry; just pass the data to ulfs_direnter instead.
For now, move one copy of the code that allocates and fills in a
temporary struct lfs_direct to the top of ulfs_direnter; but it should
go away shortly.
 1.29 15-Sep-2015  dholland Add and use accessor functions for more of the directory entry fields.
 1.28 01-Sep-2015  dholland Add new accessors for the d_type and d_namlen fields of struct lfs_direct.
Napalm the old byteswap access logic for these.
 1.27 01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.26 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.25 20-Apr-2015  riastradh Make VOP_LINK return directory still locked and referenced.

Ride 7.99.10 bump.
 1.24 20-Apr-2015  riastradh Fix more dvp->v_mount after vput(dvp).
 1.23 27-Mar-2015  riastradh Tighten some kasserts in ufs_bufio code paths.
 1.22 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.21 17-May-2014  dholland branches: 1.21.2; 1.21.6;
Move the ulfs-level (copy of ufs) vnops for symlink, create, and mkdir
into lfs_vnops.c preparatory to folding them into the lfs entry points.

(lfs_vnops.c now has four licenses. sigh.)
 1.20 23-Jan-2014  hannken branches: 1.20.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30
 1.19 17-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29
 1.18 28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.17 28-Jul-2013  dholland Remove the now-pointless ulfs ops macros.
 1.16 28-Jul-2013  dholland Remove ulfsspec_close and ulfsfifo_close as they're not used.
 1.15 21-Jul-2013  dholland Merge logic from ulfs_close(), ulfs_getattr(), and ulfs_strategy()
into the preexisting lfs_*() versions of these functions, and delete
the unused ulfs copies.
 1.14 20-Jul-2013  dholland Remove ulfs_mknod, which is not used.
 1.13 08-Jun-2013  dholland branches: 1.13.2; 1.13.4; 1.13.6;
ulfs_dir.h has been emptied; remove it.
 1.12 08-Jun-2013  dholland There is no WAPBL in LFS.
 1.11 08-Jun-2013  dholland mp->mnt_wapbl and mp->mnt_wapbl_replay are always NULL in here.
 1.10 08-Jun-2013  dholland Merge -r1.213 of ufs_vnops.c:

Committed By: kardel
Date: Sat Jun 8 05:47:02 UTC 2013

fix clearing of system-flags (schg, sappnd). clearing system flags is
possible again at securelevel < 1.
reviewed by christos@
 1.9 08-Jun-2013  dholland DIRBLKSIZ -> LFS_DIRBLKSIZ
DIRECTSIZ -> LFS_DIRECTSIZ
DIRSIZ -> LFS_DIRSIZ
OLDDIRFMT -> LFS_OLDDIRFMT
NEWDIRFMT -> LFS_NEWDIRFMT
IFTODT -> LFS_IFTODT
DTTOIF -> LFS_DTTOIF
 1.8 08-Jun-2013  dholland struct direct -> struct lfs_direct
struct dirtemplate -> struct lfs_dirtemplate
struct odirtemplate -> struct lfs_odirtemplate
DT_* -> LFS_DT_*
 1.7 08-Jun-2013  dholland Stick LFS_ in front of IFMT, IFIFO, IFREG, etc. so as not to conflict
with the UFS copies of these symbols. (Which themselves ought to have
UFS_ stuck on.)
 1.6 06-Jun-2013  dholland Remove stray references to ext2fs, chfs, ffs, and mfs.
 1.5 06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.13.6.1 23-Jul-2013  riastradh sync with HEAD
 1.13.4.2 18-May-2014  rmind sync with head
 1.13.4.1 28-Aug-2013  rmind sync with head
 1.13.2.4 03-Dec-2017  jdolecek update from HEAD
 1.13.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.13.2.2 23-Jun-2013  tls resync from head
 1.13.2.1 08-Jun-2013  tls file ulfs_vnops.c was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.20.2.1 10-Aug-2014  tls Rebase.
 1.21.6.6 28-Aug-2017  skrll Sync with HEAD
 1.21.6.5 09-Jul-2016  skrll Sync with HEAD
 1.21.6.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.21.6.3 22-Sep-2015  skrll Sync with HEAD
 1.21.6.2 06-Jun-2015  skrll Sync with HEAD
 1.21.6.1 06-Apr-2015  skrll Sync with HEAD
 1.21.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.21.2.1 17-May-2014  yamt file ulfs_vnops.c was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.44.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.44.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.44.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.48.4.2 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3)
format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of
kernhist(9)"
but it is possible that I've missed some of them. I would be glad
to
update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value
appropriately.
--
Misssed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.48.4.1 30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.6 08-Jun-2013  dholland G/C
 1.5 08-Jun-2013  dholland There is no WAPBL in LFS.
 1.4 06-Jun-2013  dholland Remove stray references to ext2fs, chfs, ffs, and mfs.
 1.3 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.5 08-Jun-2013  dholland G/C
 1.4 08-Jun-2013  dholland There is no WAPBL in LFS.
 1.3 08-Jun-2013  dholland mp->mnt_wapbl and mp->mnt_wapbl_replay are always NULL in here.
 1.2 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.18 20-Jun-2016  dholland One more batch of already-synced ufs changes:

ufs_extern.h 1.79 is equivalent to ulfs_extern.h 1.14
ufsmount.h 1.43 is (roughly) equivalent to lfs_extern.h 1.102
ufs_inode.c 1.94 does not apply to lfs
ufs_inode.c 1.95 does not apply to lfs either
ufs_readwrite.c 1.108 is equivalent to ulfs_readwrite.c 1.8
ufs_readwrite.c 1.109 is equivalent to ulfs_readwrite.c 1.9
ufs_readwrite.c 1.110 is equivalent to ulfs_readwrite.c 1.10
ufs_readwrite.c 1.111 does not apply to lfs
ufs_readwrite.c 1.112 is equivalent to ulfs_readwrite.c 1.11
ufs_readwrite.c 1.113 is equivalent to ulfs_readwrite.c 1.13
ufs_readwrite.c 1.114 is equivalent to ulfs_readwrite.c 1.14
ufs_readwrite.c 1.115 is equivalent to ulfs_readwrite.c 1.15
ufs_readwrite.c 1.116-1.118 does not apply to lfs
ufs_readwrite.c 1.119-1.120 are equivalent to ulfs_readwrite.c 1.16
ufs_rename.c 1.12 is equivalent to lfs_rename.c 1.8
ufs_vnops.c 1.226 is equivalent to ulfs_vnops.c 1.22 and lfs_vnops.c 1.270
ufs_vnops.c 1.227 is equivalent to ulfs_vnops.c 1.23
ufs_vnops.c 1.228-1.229 are equivalent to ulfs_vnops.c 1.24
ufs_vnops.c 1.230 is equivalent to ulfs_vnops.c 1.25 and lfs_vnops.c 1.271
ufs_vnops.c 1.231 originated in lfs
ufs_vnops.c 1.232 does not apply to lfs
 1.17 20-Jun-2016  dholland Note more already-merged versions:

inode.h 1.68 is subsumed by ulfs_inode.h 1.19
inode.h 1.69-1.72 do not apply to lfs
ufs_extern.h 1.74 was covered when lfs was moved to the new vnode cache
ufs_extern.h 1.75 is equivalent to ulfs_extern.h 1.13
ufs_extern.h 1.76-1.77 do not apply to lfs
ufsmount.h 1.42 does not apply to lfs
ufs_inode.c 1.90 is subsumed by ulfs_inode.c 1.10
ufs_inode.c 1.91-1.92 do not apply to lfs
ufs_lookup.c 1.130 is subsumed by ulfs_lookup.c 1.24
ufs_lookup.c 1.131 is equivalent to ulfs_lookup.c 1.20
ufs_lookup.c 1.132 is equivalent to ulfs_lookup.c 1.21
ufs_lookup.c 1.133 is equivalent to ulfs_lookup.c 1.22
ufs_lookup.c 1.134 is equivalent to ulfs_lookup.c 1.23
ufs_lookup.c 1.135 is equivalent to ulfs_lookup.c 1.25
ufs_quota2.c 1.38 is equivalent to ulfs_quota2.c 1.17
ufs_quota2.c 1.39 is equivalent to ulfs_quota2.c 1.16
ufs_quota2.c 1.40 is equivalent to ulfs_quota2.c 1.18
ufs_vfsops.c 1.53 is subsumed by lfs_vfsops.c 1.324
ufs_vfsops.c 1.54 is subsumed by lfs_vfsops.c 1.324
ufs_vnops.c 1.223-1.224 do not apply to lfs
 1.16 19-Jun-2016  dholland note that we're synced with ufs_vnops.c -r1.217 and ufsmount.h -r1.41
(those changes removed lfs hooks from ufs so shouldn't be merged across)
 1.15 19-Jun-2016  dholland Update the ufs versions these files are synced with by 1: the
201306016 commit by hannken@ that removed references to ffs_snapgone
in ufs doesn't need to be synced into lfs.
 1.14 15-Oct-2015  dholland Move stuff from struct ulfsmount to struct lfs.
 1.13 31-May-2015  hannken Change lfs from hash table to vcache.

- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().

- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.

- Remove lfs_*ref(), these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.

- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.

- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
 1.12 28-Jul-2013  dholland branches: 1.12.4; 1.12.8;
Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.11 28-Jul-2013  dholland Remove the now-pointless ulfs ops macros.
 1.10 28-Jul-2013  dholland Get rid of the ulfs_ops table as we only have one fs in here now.
 1.9 28-Jul-2013  dholland Improve comments in struct ulfsmount.
Also rearrange it to group related items together.
 1.8 28-Jul-2013  dholland Prune unused stuff from struct ulfsmount.
 1.7 08-Jun-2013  dholland branches: 1.7.2; 1.7.4;
Move stuff to lfs.h that's needed by userland:
LFS_DT_*
ULFS_ROOTINO
ULFS_WINO
struct lfs_direct
struct lfs_dirtemplate
struct lfs_odirtemplate
struct ulfs_args

Also fix FFS_MAXNAMLEN -> LFS_MAXNAMLEN in several places.
 1.6 06-Jun-2013  dholland Remove references to Apple UFS.
 1.5 06-Jun-2013  dholland Remove stray references to ext2fs, chfs, ffs, and mfs.
 1.4 06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.3 06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.2 06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.1 06-Jun-2013  dholland Split lfs from ufs, part 1: cut and paste 15000 lines of ufs as "ulfs".

These are verbatim copies except that I've preserved the ufs rcsids
for reference. Also,
ufs/quota.h -> ulfs_quotacommon.h
ufs/ufs_quota.h -> ulfs_quota.h

Splitting lfs from ufs was ok'd by core some years ago. This is not
from my original tree, which became unmergeable after the several sets
of quota changes; I've done the work over again over the last couple
days.
 1.7.4.1 28-Aug-2013  rmind sync with head
 1.7.2.4 03-Dec-2017  jdolecek update from HEAD
 1.7.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.2.2 23-Jun-2013  tls resync from head
 1.7.2.1 08-Jun-2013  tls file ulfsmount.h was added on branch tls-maxphys on 2013-06-23 06:18:39 +0000
 1.12.8.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.12.8.1 06-Jun-2015  skrll Sync with HEAD
 1.12.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.12.4.1 28-Jul-2013  yamt file ulfsmount.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000

RSS XML Feed