Home | History | Annotate | Download | only in lfs
History log of /src/sys/ufs/lfs/lfs_balloc.c
RevisionDateAuthorComments
 1.96  05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.95  23-Feb-2020  riastradh Serialize access to the splay tree with lfs_lock.
 1.94  10-Jun-2017  maya branches: 1.94.6; 1.94.10; 1.94.12;
Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.93  08-Jun-2017  chs move some buffer cache internals declarations from buf.h to vfs_bio.c.
this is needed to avoid name conflicts with ZFS and also
makes it clearer that other code shouldn't be messing with these.
remove the LFS debug code that poked around in bufqueues and
remove the BQ_EMPTY bufqueue since nothing uses it anymore.
provide a function to let LFS and wapbl read the value of nbuf for now.
 1.92  06-Apr-2017  maya branches: 1.92.6;
Provide a LFS_ENTER_LOG (__nothing) in the !DEBUG case.
so I can drop lots of #ifdef DEBUG around this macro. NFCI
 1.91  07-Aug-2016  dholland branches: 1.91.2;
Fix stupid thinko.
 1.90  07-Aug-2016  dholland comments
 1.89  07-Aug-2016  dholland use static properly
 1.88  10-Oct-2015  dholland branches: 1.88.2;
Use accessors for some more indirect block manipulations.
 1.87  01-Sep-2015  dholland Use the lfs dinode accessors in place of the ufs-derived ones.
(Mostly.)

The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.

Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
 1.86  02-Aug-2015  dholland Pass the fs object to LFS_MAX_DADDR so it can check lfs_is64.

Remove some hackish intentional 64->32 truncations next to the checks
using LFS_MAX_DADDR, and tackle the problem they handled in bmap
instead.

The problem: the magic block pointer value UNWRITTEN has magic value
-2, and if it's not handled specifically, uint32 -> uint64 promotion
turns it into 4294967294, which then causes consternation and
monkeyhouse downstream.

What's here is still kind of a hack, but it's a step forward.
 1.85  02-Aug-2015  dholland Fix assorted 64 -> 32 truncations in lfs. Also, some minor tidyups and
corrections in passing.
 1.84  28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.83  24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.82  24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.81  28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.80  28-Jul-2013  dholland branches: 1.80.6;
Add more of the bits for supporting quotas.
 1.79  28-Jul-2013  dholland Migrate the miscellaneous ulfs-level info from struct ulfsmount to
struct lfs.

Put them inside #ifdef _KERNEL there. They are not the only such
members, gross as that is. Unfortunately, moving struct lfs to
lfs_kernel.h does not work.
 1.78  28-Jul-2013  dholland Add lfs_kernel.h for declarations that don't need to be exposed to userland.

lfs currently has the following headers:
lfs.h - on-disk structures and stuff needed for userlevel tools
lfs_inode.h - additional restricted materials for userlevel tools
that operate the fs (newfs_lfs, fsck_lfs, lfs_cleanerd)
lfs_kernel.h - stuff needed only in the kernel

and the following legacy headers that are expected to be mopped up and
folded into one of the above:
lfs_extern.h - function prototypes
ulfs_bswap.h - endian-independent support
ulfs_dinode.h - now contains very little
ulfs_dirhash.h - dirhash support
ulfs_extattr.h - extattr support
ulfs_extern.h - more function prototypes
ulfs_inode.h - assorted kernel-only declarations
ulfs_quota.h - quota support
ulfs_quota1.h - more quota support
ulfs_quota2.h - more quota support
ulfs_quotacommon.h - more quota support
ulfsmount.h - legacy copy of ufsmount material
 1.77  18-Jun-2013  christos branches: 1.77.2;
Prefix most of the cpp macros with lfs_ and LFS_ to avoid conflicts with ffs.
This was done so that boot blocks that want to compile both FFS and LFS in
the same file work.
 1.76  06-Jun-2013  dholland Add lfs_ or ulfs_ in front of extern symbols lacking them, mostly
quota-related (and particularly quota2-related) stuff.
 1.75  06-Jun-2013  dholland Split lfs from ufs step 4:

Massedit all ufs symbols to be "ulfs" instead, to make sure there are
no conflicts with ufs. Confirmed with grep.

(This required changing a few comments that maybe should have been
left alone to say "ulfs", but we'll survive that.)
 1.74  06-Jun-2013  dholland Split lfs from ufs step 3: rearrange config stuff.
Add new options:
LFS_EI
LFS_DIRHASH
LFS_EXTATTR
LFS_EXTATTR_AUTOSTART
LFS_QUOTA
LFS_QUOTA2

and update code referring to the corresponding FFS and UFS config
symbols to use the LFS versions. Disable the one extant reference
to APPLE_UFS in the ulfs files. Use opt_lfs.h only, not opt_ffs.h.
 1.73  06-Jun-2013  dholland Split lfs from ufs, part 2:

Change all <ufs/ufs/foo.h> includes to <ufs/lfs/ulfs_foo.h>.
 1.72  22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.71  20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.70  11-Jul-2011  hannken branches: 1.70.2; 1.70.12;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Adresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.69  16-Feb-2010  mlelstv Three changes in a single commit.

- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.

- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
 1.68  18-Mar-2009  cegger branches: 1.68.2;
bzero -> memset
 1.67  16-May-2008  hannken branches: 1.67.6; 1.67.12;
Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.66  28-Apr-2008  martin branches: 1.66.2;
Remove clause 3 and 4 from TNF licenses
 1.65  15-Feb-2008  ad branches: 1.65.6; 1.65.8; 1.65.10;
The buffer LOCKED flag need not be under the protection of bufcache_lock,
BUSY is enough.
 1.64  02-Jan-2008  ad Merge vmlocking2 to head.
 1.63  08-Oct-2007  ad branches: 1.63.4; 1.63.6; 1.63.10;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.62  15-Feb-2007  ad branches: 1.62.6; 1.62.18; 1.62.20; 1.62.22;
Replace some uses of lockmgr() / simplelocks.
 1.61  14-May-2006  elad integrate kauth.
 1.60  07-Apr-2006  perseant Several minor bug fixes:

* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.59  24-Dec-2005  perry branches: 1.59.4; 1.59.6; 1.59.8; 1.59.10; 1.59.12;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.58  11-Dec-2005  christos merge ktrace-lwp.
 1.57  02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.56  19-Apr-2005  perseant branches: 1.56.2; 1.56.4;
Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
 1.55  16-Apr-2005  perseant Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
 1.54  14-Apr-2005  perseant Tabify leading whitespace
 1.53  14-Apr-2005  perseant Consolidate the hash table we use to maintain the integrity of lfs_avail
into a single, system-wide table, rather than having a separate hash table
per inode. Significantly reduces the "system" cpu usage of your average
file write.
 1.52  01-Apr-2005  perseant Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case. Add debugging segment-lock
assertion statements.
 1.51  02-Mar-2005  perseant branches: 1.51.2;
Put the ISSPACE() check where it belongs. This allows rewriting a file
on a full filesystem while still returning ENOSPC on an attempt to allocate
new blocks.
 1.50  26-Feb-2005  perry nuke trailing whitespace
 1.49  26-Feb-2005  perseant Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
 1.48  25-Jan-2004  hannken branches: 1.48.6; 1.48.8; 1.48.10;
Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.47  30-Dec-2003  pk Replace the traditional buffer memory management -- based on fixed per buffer
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.

This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.

On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
 1.46  29-Oct-2003  mycroft Adjust to remove bogus initializer.
 1.45  25-Oct-2003  christos Fix uninitialized variable warnings.
 1.44  04-Sep-2003  yamt don't call LFS_DEBUG_COUNTLOCKED after bread().
lfs_countlocked doesn't count buffers that isn't on the freelist.
 1.43  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.42  18-May-2003  yamt branches: 1.42.2;
make is_sequential a callback in order to achieve better lfs write clustering.

since lfs always rewrite blocks into the new segment,
current on-disk place of the block doesn't affect to write clustering.

ok'ed by Konrad Schroder.
 1.41  29-Apr-2003  yamt add an assertion.
 1.40  02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.39  15-Mar-2003  perseant Add simple_lock protection for lfs_seglock and lfs_subsys_pages; these will
be expanded to cover other per-fs and subsystem-wide data as well.

Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.

Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
 1.38  28-Feb-2003  perseant Fix a clrbuf() on an uninitialized pointer.
 1.37  20-Feb-2003  perseant Tabify, and fix some comment alignment problems.
 1.36  17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.35  24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.34  11-Dec-2002  yamt take care of B_CLRBUF in lfs_balloc.
otherwise you'll see uninitialized blocks.
 1.33  06-Jul-2002  perseant Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk. The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data. Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
 1.32  14-May-2002  perseant branches: 1.32.2;
Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
This should make it possible to (1) write clusters using a page mapping
instead of malloc, if desired, and (2) schedule blocks for rewriting
(somewhere else) if a write error occurs. Code is present to use
pagemove() to construct the clusters but that is untested and will go away
anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
can be called by lfs_segwrite, and loop on that until it is clean, for
a checkpoint. Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
buffer cache. Both lfs_mountfs and lfs_unmount check since the Ifile can
grow.
* If an inode is not found in a disk block, try rereading the block, under
the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
 1.31  23-Nov-2001  chs add spaces for KNF. confirmed to produce identical objects.
 1.30  08-Nov-2001  lukem add RCSID
 1.29  13-Jul-2001  perseant branches: 1.29.4;
Merge the short-lived perseant-lfsv2 branch into the trunk.

Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default. Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
matched to convenient physical characteristics of the partition (e.g.,
stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
non-512-byte-sector devices. In theory fragments can be as large
as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
doesn't get old data and think it's new. Roll-forward is enabled for
v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
is not yet implemented, but can be without further non-backwards-compatible
changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
that is, the inode is never written *just* because atime was changed.
Because of this the inodes remain near the file data on the disk, rather
than wandering all over as the disk is read repeatedly. This speeds up
repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
I need to look more closely to make sure that the times are only updated
during write(2) and friends, not after-the-fact during a segment write,
and certainly not by the cleaner.
 1.28  30-May-2001  mrg branches: 1.28.2; 1.28.4;
use _KERNEL_OPT
 1.27  21-Nov-2000  perseant branches: 1.27.2;
More locked_queue_* and lfs_avail accounting fixes from Jesse Off
<joff@gci-net.com>. Remove a specious btodb() in lfs_fragextend, and
count blocks shrunk or removed by VOP_TRUNCATE in lfs_avail.
 1.26  17-Nov-2000  perseant Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468). In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect. (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
 1.25  09-Sep-2000  perseant Various bug-fixes to LFS, to wit:


Kernel:

* Add runtime quantity lfs_ravail, the number of disk-blocks reserved
for writing. Writes to the filesystem first reserve a maximum amount
of blocks before their write is allowed to proceed; after the blocks
are allocated the reserved total is reduced by a corresponding amount.

If the lfs_reserve function cannot immediately reserve the requested
number of blocks, the inode is unlocked, and the thread sleeps until
the cleaner has made enough space available for the blocks to be
reserved. In this way large files can be written to the filesystem
(or, smaller files can be written to a nearly-full but thoroughly
clean filesystem) and the cleaner can still function properly.

* Remove explicit switching on dlfs_minfreeseg from the kernel code; it
is now merely a fs-creation parameter used to compute dlfs_avail and
dlfs_bfree (and used by fsck_lfs(8) to check their accuracy). Its
former role is better assumed by a properly computed dlfs_avail.

* Bounds-check inode numbers submitted through lfs_bmapv and lfs_markv.
This prevents a panic, but, if the cleaner is feeding the filesystem
the wrong data, you are still in a world of hurt.

* Cleanup: remove explicit references of DEV_BSIZE in favor of
btodb()/dbtob().

lfs_cleanerd:

* Make -n mean "send N segments' blocks through a single call to
lfs_markv". Previously it had meant "clean N segments though N calls
to lfs_markv, before looking again to see if more need to be cleaned".
The new behavior gives better packing of direct data on disk with as
little metadata as possible, largely alleviating the problem that the
cleaner can consume more disk through inefficient use of metadata than
it frees by moving dirty data away from clean "holes" to produce
entirely clean segments.

* Make -b mean "read as many segments as necessary to write N segments
of dirty data back to disk", rather than its former meaning of "read
as many segments as necessary to free N segments worth of space". The
new meaning, combined with the new -n behavior described above,
further aids in cleaning storage efficiency as entire segments can be
written at once, using as few blocks as possible for segment summaries
and inode blocks.

* Make the cleaner take note of segments which could not be cleaned due
to error, and not attempt to clean them until they are entirely free
of dirty blocks. This prevents the case in which a cleanerd running
with -n 1 and without -b (formerly the default) would spin trying
repeatedly to clean a corrupt segment, while the remaining space
filled and deadlocked the filesystem.

* Update the lfs_cleanerd manual page to describe all the options,
including the changes mentioned here (in particular, the -b and -n
flags were previously undocumented).

fsck_lfs:

* Check, and optionally fix, lfs_avail (to an exact figure) and
lfs_bfree (within a margin of error) in pass 5.

newfs_lfs:

* Reduce the default dlfs_minfreeseg to 1/20 of the total segments.

* Add a warning if the sgs disklabel field is 16 (the default for FFS'
cpg, but not usually desirable for LFS' sgs: 5--8 is a better range).

* Change the calculation of lfs_avail and lfs_bfree, corresponding to
the kernel changes mentioned above.

mount_lfs:

* Add -N and -b options to pass corresponding -n and -b options to
lfs_cleanerd.

* Default to calling lfs_cleanerd with "-b -n 4".


[All of these changes were largely tested in the 1.5 branch, with the
idea that they (along with previous un-pulled-up work) could be applied
to the branch while it was still in ALPHA2; however my test system has
experienced corruption on another filesystem (/dev/console has gone
missing :^), and, while I believe this unrelated to the LFS changes, I
cannot with good conscience request that the changes be pulled up.]
 1.24  04-Jul-2000  perseant Fix errors observed while trying to fill the filesystem with yesterday's
fixes:

- Write copies of bfree and avail in the CLEANERINFO block, so the
cleaner doesn't have to guess which superblock has the current
information (if indeed any do).

- Tighten up accounting of lfs_avail (more needs to be done).

- When cleansing indirect blocks of UNWRITTEN, make sure not to mark
them clean, since they'll need to be rewritten later.
 1.23  03-Jul-2000  perseant Fix i_ffs_blocks in fragment extension case where fragment has not yet
been written to disk.
 1.22  03-Jul-2000  perseant i_lfs_effnblks fixes. Put debugging printfs under #ifdef DEBUG_LFS.
 1.21  03-Jul-2000  perseant Allow the number of free segments reserved for the cleaner to be
parametrized in the filesystem, defaulting to MIN_FREE_SEGS = 2 but set
to something more reasonable at newfs_lfs time.

Note the number of blocks that have been scheduled for writing but which
are not yet on disk in an inode extension, i_lfs_effnblks. Move
i_ffs_effnlink out of the ffs extension and onto the main inode, since
it's used all over the shared code and the lfs extension would clobber
it.

At inode write time, indirect blocks and inode-held blocks of inodes
that have i_lfs_effnblks != i_ffs_blocks are cleansed of UNWRITTEN disk
addresses, so that these never make it to disk.
 1.20  28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.19  27-Jun-2000  perseant Fixes associated with filling an LFS:

Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.

Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.

Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).

Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.

ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
 1.18  06-Jun-2000  perseant branches: 1.18.2;
Protect inode free list with seglock, instead of separate lock, so that
the head of the inode free list (on the superblock) always matches the
rest of the free list (in the ifile).

Protect lfs_fragextend with seglock, to prevent the segment byte count
fudging from making its way to disk.

Don't try to inactivate dirop vnodes that are still in the middle of
their dirop (may address PR#10285).
 1.17  30-May-2000  perseant Don't try to "correct" accounting for fragments being extended but which
have never been written to disk.
 1.16  05-May-2000  perseant branches: 1.16.2;
Change the way LFS does block accounting, from trying to infer from the
buffer cache flags, to marking the inode and/or indirect blocks with a
special disk address UNWRITTEN==-2 when a block is accounted for. (This
address is never written to disk, but only used in-core. This is essentially
the same method of block accounting as on the UBC branch, where the buffer
headers don't exist.) Make sure that truncation is handled properly,
especially in the case of holey files.

Fixes PR#9994.
 1.15  23-Apr-2000  perseant Fix problems outlined in PR#9926:
- lfs_truncate extends the file if called with length > i_ffs_size;
- lfs_truncate errors out if called with length < 0;
- lfs_balloc block accounting corrected for the case of blocks read
into the cache before they exist on disk;
- mp->mnt_stat.f_iosize is initialized in lfs_mountfs.
 1.14  15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that is has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.13  15-Jun-1999  perseant branches: 1.13.2; 1.13.4; 1.13.8;
Minor changes to the segment live bytes calculation. In particular, fixed
a bug in fragment extension that could run the count negative. Also, don't
overcount for inodes, and don't count segment summaries. Thus, for empty
segments the live bytes count should now be exactly zero.
 1.12  24-Mar-1999  mrg branches: 1.12.2; 1.12.4; 1.12.6;
completely remove Mach VM support. all that is left is the all the
header files as UVM still uses (most of) these.
 1.11  10-Mar-1999  perseant New sources should leave the LFS in a more-or-less working state. Changes
include:

- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
 1.10  09-Nov-1998  mycroft GC the B_CACHE bit.
 1.9  09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.8  08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.7  03-Mar-1998  drochner Don't cast the quad_t file size to u_long, this can cause overflows.
 1.6  03-Mar-1998  fvdl Make this compile again with UVM
 1.5  01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.4  11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added an "union inode_ext" to struct inode, for the per-fs extentions.
For now only ext2fs uses it.
- i_din is now an union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the rigth
macro, depending on the time of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.3  09-Feb-1996  christos lfs prototypes
 1.2  29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.1  08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.12.6.1  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
referenre purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.12.4.3  31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.12.4.2  04-Jul-1999  chs a couple steps towards supporting UBC.
 1.12.4.1  21-Jun-1999  thorpej Sync w/ -current.
 1.12.2.1  25-Jun-1999  perry pullup 1.12->1.13 (perseant)
 1.13.8.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.13.4.1  19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.13.2.2  22-Nov-2000  bouyer Sync with HEAD.
 1.13.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.16.2.1  22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.18.2.2  03-Feb-2001  he Pull up revisions 1.26-1.27 (requested by perseant):
o Close up accounting holes in LFS' accounting of immediately-
available-space, number of clean segments, and amount of dirty
space taken up by metadata (PR#11468, PR#11470, PR#11534).
 1.18.2.1  14-Sep-2000  perseant Pull up recent LFS kernel changes (approved by thorpej):

ufs/ufs/inode.h, 1.20--1.22 (add i_lfs_effnblks extension ;
make ITIMES aware of LFS_ITIMES;
_LKM protection so userland progs
compile)
ufs/ufs/ufs_vnops.c, 1.69, 1.71 (remove IN_ADIROP;
use ITIMES instead of FFS_ITIMES)
ufs/ufs/ufs_readwrite.c, 1.27 (use lfs_reserve in lfs_write)
ufs/lfs/lfs.h, 1.26--1.32 (define LFS_EST_* macros ;
change MIN_FREE_SEGS to lfs_minfreesegs ;
add avail and bfree to CLEANERINFO ;
change lfs_uinodes to signed ;
change lfs_dmeta to signed ;
add whitespace to line up structure
members ;
explicit cast to int32_t in LFS_EST_*
macros)
ufs/lfs/lfs_alloc.c, back out 1.34.2.3 (pullups of 1.39, 1.40);
then pull up 1.38 (clean up on error)
1.39--1.43 (restore fvdl's ufs_hashlock fix ;
restore fvdl's ufs_hashlock fix ;
set i_lfs_effnblks ;
use UINO macros ;
add comments and fix long lines)
ufs/lfs/lfs_balloc.c, 1.19 (don't succeed halfway)
1.21--1.25 (use i_lfs_effnblks ;
fix i_lfs_effnblks computation and
quieten ;
fix i_ffs_blocks in unwritten fragment ;
remove useless debugging check ;
add comments and (c) 2000)
ufs/lfs/lfs_bio.c, 1.24--1.30 (cleanup and make lfs_flush_fs take
"struct lfs *" instead of "struct
mount *" ;
use lfs_minfreeseg instead of
MIN_FREE_SEGS ;
use UINO macros, and copy bfree/avail
to CLEANERINFO ;
add lfs_reserve function ;
1.28--1.30 fix printf formatting)
ufs/lfs/lfs_cksum.c, 1.13 (add (c) 2000)
ufs/lfs/lfs_debug.c, 1.11 (use btodb instead of DEV_BSIZE)
ufs/lfs/lfs_extern.h, 1.18, 1.20--1.21 (function prototype changes)
ufs/lfs/lfs_inode.c, 1.38 (rewrite lfs_truncate from
ffs_truncate)
1.40--1.44 (count written and unwritten blocks
seperately ;
use disk block units instead of bytes ;
remove unnecessary "mod" variable ;
correct B_DELWRI to avoid bawrite panic ;
use lfs_reserve)
ufs/lfs/lfs_segment.c, 1.52-1.59 (use lfs_dmeta to note used summaries ;
check for UNWRITTEN in indirect blocks ;
more debugging stuff inside #ifdef
DEBUG_LFS ;
use LK_CANRECURSE ;
don't drop dirty indirect blocks ;
use UINO macros ;
don't hose the free list ;
use btodb() instead of DEV_BSIZE ;
make it compile again (oops))
ufs/lfs/lfs_subr.c, 1.16--1.17 (check for locked inodes before
changing ;
use btodb() instead of DEV_BSIZE, (c)
2000)
ufs/lfs/lfs_syscalls.c, back out 1.41.4.2 (fvdl's ufs_hashlock fix);
then pull up 1.43 (use lfs_dmeta)
1.44--1.45 (restore fvdl's ufs_hashlock fix)
1.46--1.47 (fix lfs_avail leakage from sblock
segments ;
use UINO macros)
1.49 (bounds-check inode numbers in
lfs_markv)
ufs/lfs/lfs_vfsops.c, 1.53 (use LFS_EST_* macros in lfs_statfs)
1.56--1.58 (initialize lfs_minfreeseg, lfs_effnblk ;
initialize lfs_uinodes ;
initialize lfs_ravail)
ufs/lfs/lfs_vnops.c, 1.40 (remove VDIROP from removed files)
1.42--1.44 (move SET_ENDOP below the removal of
VDIROP ;
use UINO macros and add lfs_itimes
function ;
use lfs_reserve in dirops)
 1.27.2.8  11-Dec-2002  thorpej Sync with HEAD.
 1.27.2.7  01-Aug-2002  nathanw Catch up to -current.
 1.27.2.6  20-Jun-2002  nathanw Catch up to -current.
 1.27.2.5  08-Jan-2002  nathanw Catch up to -current.
 1.27.2.4  14-Nov-2001  nathanw Catch up to -current.
 1.27.2.3  24-Aug-2001  nathanw Catch up with -current.
 1.27.2.2  21-Jun-2001  nathanw Catch up to -current.
 1.27.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.28.4.4  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.28.4.3  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.28.4.2  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.28.4.1  03-Aug-2001  lukem update to -current
 1.28.2.2  02-Jul-2001  perseant Change disk addressing unit to be the fragment, instead of the disk sector.
All quantities in the superblock, inodes, indirect blocks, etc. refer now
to this abstract unit (called "fsb" as it is in FFS) instead of disk sectors;
as a consequence segment summary blocks have to be multiples of a fragment in
size. In v1 filesystems, compatibility code ensures that 1 fsb == 1 sector,
regardless of fragment size.

Fragments can now range in size between 512 and 32k; in the event that
LFS_LABELPAD (8k) is smaller than the disk address unit size, an extra
proto-superblock is kept at 8k from the beginning of the disk, to be used
*only* to locate the real superblocks. (Not all of the userland knows about
this yet.)

Almost all of this was done not by me, but by joff.
 1.28.2.1  29-Jun-2001  perseant Get rid of __P(), protoizing where it had not already been done
 1.29.4.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.32.2.1  15-Jul-2002  gehenna catch up with -current.
 1.42.2.5  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.42.2.4  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.42.2.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.42.2.2  18-Sep-2004  skrll Sync with HEAD.
 1.42.2.1  03-Aug-2004  skrll Sync with HEAD
 1.48.10.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.48.8.1  29-Apr-2005  kent sync with -current
 1.48.6.1  10-May-2005  riz Pull up the following revisions (requested by perseant in ticket #1281):

1.8 sys/ufs/lfs/TODO
1.75 sys/ufs/lfs/lfs.h (via patch)
1.74 sys/ufs/lfs/lfs_alloc.c (via patch)
1.49, 1.51 sys/ufs/lfs/lfs_balloc.c (1.51 via patch)
1.78 sys/ufs/lfs/lfs_bio.c
1.62 sys/ufs/lfs/lfs_extern.h (via patch)
1.156 sys/ufs/lfs/lfs_segment.c (via patch)
1.48 sys/ufs/lfs/lfs_subr.c
1.101 sys/ufs/lfs/lfs_syscalls.c
1.163 sys/ufs/lfs/lfs_vfsops.c (via patch)
1.134 sys/ufs/lfs/lfs_vnops.c (via patch)
1.61 sys/ufs/ufs/ufs_readwrite.c (via patch)

1.20 libexec/lfs_cleanerd/clean.h (via patch)
1.52 libexec/lfs_cleanerd/cleanerd.c (via patch)
1.41 libexec/lfs_cleanerd/library.c (via patch)

1.4 regress/sys/fs/lfs/newfs_fsck/Makefile
1.2 regress/sys/fs/lfs/newfs_fsck/mkfs_mount
1.2 regress/sys/fs/lfs/newfs_fsck/smallfiles
1.3 sbin/fsck_lfs/bufcache.c
1.3 sbin/fsck_lfs/bufcache.h
1.3 sbin/fsck_lfs/lfs.h
1.8 sbin/fsck_lfs/lfs.c (via patch)
1.8 sbin/fsck_lfs/pass3.c (via patch)
1.18 sbin/fsck_lfs/pass0.c (via patch)
1.18 sbin/fsck_lfs/utilities.c (via patch)
1.7 sbin/fsck_lfs/segwrite.c
1.19 sbin/fsck_lfs/setup.c (via patch)
1.3 sbin/newfs_lfs/Makefile
0 sbin/newfs_lfs/lfs.c (yes, remove it)
1.1 sbin/newfs_lfs/make_lfs.c
1.15 sbin/newfs_lfs/newfs.c (via patch)

Various minor LFS improvements.

Kernel:

* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this. Should fix PR #29045.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
Fixes PR #26680.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().

cleaner:

* Adapt lfs_cleanerd to use the fcntl call to get the Ifile filehandle,
so it need not be in the namespace.
* Make lfs_cleanerd be more careful when there are very few available
segments.
* Make lfs_cleanerd less verbose when the filesystem is unmounted.

newfs_lfs, fsck_lfs, and regression:

* Extend the lfs library from fsck_lfs(8) so that it can be used with a
not-yet-existent LFS. Make newfs_lfs(8) use this library, so it can
create LFSs whose Ifile is larger than one segment. Addresses PR #11110.
* Make newfs_lfs(8) use strsuftoi64() for its arguments, a la newfs(8).
* Make fsck_lfs(8) respect the "file system is clean" flag.
* Don't let fsck_lfs(8) think it has dirty blocks when invoked with the
-n flag.
* Remove the Ifile from the filesystem namespace. The cleaner now uses
a fcntl call on the root inode to find the Ifile filehandle. (As a
side-effect, addresses PR #29144.)
 1.51.2.2  20-May-2006  riz Pull up following revision(s) (requested by perseant in ticket #1327):
sys/ufs/lfs/lfs_balloc.c: revision 1.60
sys/ufs/lfs/lfs_syscalls.c: revision 1.111
sys/ufs/lfs/lfs_segment.c: revision 1.172
sys/ufs/lfs/lfs_vnops.c: revision 1.163
Several minor bug fixes:
* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
 1.51.2.1  07-May-2005  tron Apply patch (requested by perseant in ticket #242):
* fsck_lfs buffer cache fixes, including PR #29151
* Change fsck_lfs phase 0 message to reflect reality
* fsck_lfs: check phase 5 (cleanerinfo accounting) even on
roll-forward
* Keep better track of the free list during roll-forward, avoiding
a core dump
* Improve hash table use for fsck_lfs buffer and vnode cache
* Document fsck_lfs flag -f, and implement -q
* Add resize_lfs, including kernel support
* Add LFS to mountd's list of exportable filesystem types
* Make the LFS lkm work again [christos@]
* Add MP locking to the LFS kernel subsystem
* Fix pager_map deadlock in lfs_putpages()
* Avoid incomplete file extension that looks like "partial
truncation" to fsck
* Use lfs_malloc for cleaner malloc, since the cleaner often runs
in low-memory conditions.
* Use splay trees, not hash table, to track page allocation for
write.
* Fix mkdir panic on full fs
* Fix page accounting leak by counting differently.
* Use rightly named structure for lfs_getattr [skrll@]
* Cosmetic changes for readability.
 1.56.4.1  20-Oct-2005  yamt adapt ufs.
 1.56.2.5  27-Feb-2008  yamt sync with head.
 1.56.2.4  21-Jan-2008  yamt sync with head
 1.56.2.3  27-Oct-2007  yamt sync with head.
 1.56.2.2  26-Feb-2007  yamt sync with head.
 1.56.2.1  21-Jun-2006  yamt sync with head.
 1.59.12.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.59.10.3  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.59.10.2  19-Apr-2006  elad sync with head.
 1.59.10.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.59.8.2  24-May-2006  yamt sync with head.
 1.59.8.1  11-Apr-2006  yamt sync with head
 1.59.6.2  01-Jun-2006  kardel Sync with head.
 1.59.6.1  22-Apr-2006  simonb Sync with head.
 1.59.4.1  09-Sep-2006  rpaulo sync with head
 1.62.22.1  14-Oct-2007  yamt sync with head.
 1.62.20.3  23-Mar-2008  matt sync with HEAD
 1.62.20.2  09-Jan-2008  matt sync with HEAD
 1.62.20.1  06-Nov-2007  matt sync with HEAD
 1.62.18.1  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.62.6.3  24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and needs to be verified as a whole.
 1.62.6.2  13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.62.6.1  13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.63.10.1  02-Jan-2008  bouyer Sync with HEAD
 1.63.6.2  19-Dec-2007  ad Use a global lfs_lock.
 1.63.6.1  04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.63.4.1  18-Feb-2008  mjf Sync with HEAD.
 1.65.10.3  11-Mar-2010  yamt sync with head
 1.65.10.2  04-May-2009  yamt sync with head.
 1.65.10.1  16-May-2008  yamt sync with head.
 1.65.8.1  18-May-2008  yamt sync with head.
 1.65.6.1  02-Jun-2008  mjf Sync with HEAD.
 1.66.2.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.67.12.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.67.6.1  28-Apr-2009  skrll Sync with HEAD.
 1.68.2.1  30-Apr-2010  uebayasi Sync with HEAD.
 1.70.12.4  03-Dec-2017  jdolecek update from HEAD
 1.70.12.3  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.70.12.2  23-Jun-2013  tls resync from head
 1.70.12.1  25-Feb-2013  tls resync with head
 1.70.2.2  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.70.2.1  23-Jan-2013  yamt sync with head
 1.77.2.1  28-Aug-2013  rmind sync with head
 1.80.6.5  28-Aug-2017  skrll Sync with HEAD
 1.80.6.4  05-Oct-2016  skrll Sync with HEAD
 1.80.6.3  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.80.6.2  22-Sep-2015  skrll Sync with HEAD
 1.80.6.1  06-Apr-2015  skrll Sync with HEAD
 1.88.2.1  26-Apr-2017  pgoyette Sync with HEAD
 1.91.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.92.6.1  30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.94.12.1  29-Feb-2020  ad Sync with head.
 1.94.10.1  17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.94.6.1  08-Apr-2020  martin Merge changes from current as of 20200406

RSS XML Feed