#   $NetBSD: CHANGES,v 1.7 2025/01/07 17:39:45 andvar Exp $

kernel:

- Instead of blindly continuing when it encounters an inode that is
  locked by another process, lfs_markv will process the rest of the
  inodes passed to it and then return EAGAIN.  The cleaner will
  recognize this and not mark the segment clean.  When the cleaner runs
  again, the segment containing the (formerly) locked inode will sort high
  for cleaning, since it is now almost entirely empty.
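
The EAGAIN behaviour described above can be sketched roughly as follows.
This is an illustration only, not the kernel code: struct sketch_inode and
sketch_markv are made-up names, and "moving" an inode is reduced to
setting a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode; not the kernel's struct inode. */
struct sketch_inode {
    bool locked;    /* held by another process */
    bool moved;     /* set once its blocks have been rewritten */
};

/*
 * Process every inode that can be processed.  If any inode was locked,
 * still finish the rest, then report EAGAIN so that the caller does not
 * mark the segment clean.
 */
int
sketch_markv(struct sketch_inode *ino, size_t n)
{
    int error = 0;
    size_t i;

    assert(ino != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (ino[i].locked) {
            error = EAGAIN;     /* remember it, but keep going */
            continue;
        }
        ino[i].moved = true;
    }
    return error;
}
```

A caller that sees EAGAIN would leave the segment dirty and retry on a
later pass, by which time only the formerly locked inode remains in it.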

- A beginning has been made to test keeping atime information in the
  Ifile, instead of in the inodes.  This should make read-mostly
  filesystems significantly faster, since the inodes will then remain
  close to the data blocks on disk; but of course the Ifile will be
  somewhat larger.  This code is not enabled, as it changes the format
  of the Ifile.

- The superblock has been broken into two components: an on-disk
  superblock using fixed-size types, exactly 512 bytes regardless of
  architecture (though it could be enlarged in multiples of the media
  block size up to LFS_SBPAD); and an in-memory superblock containing the
  information only useful to a running LFS, including segment pointers,
  etc.  The superblock checksumming code has been modified to make
  future changes to the superblock format easier.

- Because of the way that lfs_writeseg works, buffers are freed before
  they are really written to disk: their contents are copied into large
  buffers which are written asynchronously.  Because the buffer cache
  does not serve to throttle these writes, and malloced memory is used
  to hold them, there is a danger of running out of kmem_map.  To avoid
  this, a new compile-time parameter, LFS_THROTTLE, is used as an upper
  bound for the number of partial-segments allowed to be in progress
  writing at any given time.
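
A minimal sketch of such a bound, assuming a simple in-flight counter.
SKETCH_THROTTLE, struct sketch_fs and both functions are hypothetical
names, and the limit of 4 is illustrative.  The kernel would presumably
sleep until a write completes; the sketch just reports that the caller
must wait.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_THROTTLE 4   /* illustrative limit, not the kernel's value */

struct sketch_fs {
    int iocount;    /* partial-segment writes currently in flight */
};

/* Try to start a write; false means the limit is reached, so wait. */
bool
sketch_write_begin(struct sketch_fs *fs)
{
    if (fs->iocount >= SKETCH_THROTTLE)
        return false;
    fs->iocount++;
    return true;
}

/* Called on I/O completion: release one slot. */
void
sketch_write_done(struct sketch_fs *fs)
{
    assert(fs->iocount > 0);
    fs->iocount--;
}
```

Since each in-flight write holds malloced memory, capping the count also
caps the memory held by pending writes.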

- If the system crashes between the point that a checkpoint is scheduled
  for writing and the time that the write completes, the filesystem
  could be left in an inconsistent state (no valid checkpoints on
  disk).  To avoid this, we toggle between the first two superblocks
  when checkpointing, and (if it is indicated that no roll-forward agent
  exists) do not allow one checkpoint to occur before the last one has
  completed.  When the filesystem is mounted, it uses the *older* of the
  first two superblocks.
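
The toggle and the mount-time choice might be sketched as below.  This
assumes each superblock carries a checkpoint timestamp; struct sketch_sb,
its tstamp field and both functions are hypothetical names for
illustration, not the on-disk layout.

```c
#include <stdint.h>

/* Hypothetical superblock holding only what the sketch needs. */
struct sketch_sb {
    uint32_t tstamp;    /* assumed: time of the checkpoint it describes */
};

/* Alternate between superblock slots 0 and 1 on each checkpoint. */
int
sketch_next_slot(int last_slot)
{
    return last_slot == 0 ? 1 : 0;
}

/* At mount time, prefer the older of the first two superblocks. */
const struct sketch_sb *
sketch_pick_sb(const struct sketch_sb *sb0, const struct sketch_sb *sb1)
{
    return (sb1->tstamp < sb0->tstamp) ? sb1 : sb0;
}
```

Picking the older copy is the conservative choice here: the newer one may
describe a checkpoint whose write never completed.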

- DIROPs:

  The design of the LFS includes segregating vnodes used in directory
  operations, so that they can be written at the same time during a
  checkpoint, avoiding filesystem inconsistency after a crash.  Code for
  this was partially written for 4.4BSD, but was not complete or enabled.

  In particular, vnodes marked VDIROP could be flushed by getnewvnode at
  any time, negating the usefulness of marking a vnode VDIROP, since if
  the filesystem then crashed it would be inconsistent.  Now, when a
  vnode is first marked VDIROP it is also referenced.  To avoid running
  out of vnodes, an attempt to mark more than LFS_MAXDIROP vnodes with
  VDIROP will sleep, and trigger a partial-segment write when no dirops
  are active.

- LFS maintains a linked list of free inode numbers in the Ifile;
  accesses to this list are now protected by a simple lock.

- lfs_vfree is not allowed to run while an inode has blocks scheduled
  for writing, since that could cause a miscount in lfs_truncate.

- lfs_balloc now correctly extends fragments if a block is written
  beyond the current end-of-file.

- Blocks which have already been gathered into a partial-segment are not
  allowed to be extended, since if they were, any blocks following them
  would either be written in the wrong place, or overwrite other blocks.

- The LFS buffer-header accounting, which triggers a partial-segment
  write if too many buffer-headers are in use by the LFS subsystem, has
  been expanded to include *bytes* used in LFS buffers as well.
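
One way to picture the combined accounting, as a sketch under stated
assumptions: the two limits and all names below are hypothetical, and the
real thresholds are not given in this file.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_BUFS   64u                       /* illustrative limit */
#define SKETCH_MAX_BYTES  ((size_t)512 * 1024)      /* illustrative limit */

struct sketch_acct {
    unsigned nbufs;     /* buffer headers in use by LFS */
    size_t nbytes;      /* bytes held in those buffers */
};

/* Account for one more buffer of the given size. */
void
sketch_acct_add(struct sketch_acct *a, size_t size)
{
    a->nbufs++;
    a->nbytes += size;
}

/* A partial-segment write is due if either limit is exceeded. */
bool
sketch_need_flush(const struct sketch_acct *a)
{
    return a->nbufs > SKETCH_MAX_BUFS || a->nbytes > SKETCH_MAX_BYTES;
}
```

Counting bytes as well as headers catches the case where a few very large
buffers hold a lot of memory while the header count stays low.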

- Reads of the Ifile, which almost always come from the cleaner, can no
  longer trigger a partial-segment write, since this could cause a
  deadlock.

- Support has been added (but not tested, and currently disabled by
  default) for true read-only filesystems.  Currently, if a filesystem
  is mounted read-only the cleaner can still operate on it, but this
  obviously would not be true for read-only media.  (I think the
  original plan was for the roll-forward agent to operate using this
  "feature"?)

- If a fake buffer is created by lfs_markv and another process draws the
  same block in and changes it, the fake buffer is now discarded and
  replaced by the "real" buffer containing the new data.

- An inode which has blocks gathered no longer has IN_MODIFIED set, but
  still does in fact have dirty blocks attached.  lfs_update will now
  wait for such an inode's writes to complete before it runs,
  suppressing a panic in vinvalbuf.

- Many filesystem operations now update the Ifile's mtime, allowing the
  cleaner to detect when the filesystem is idle, and clean more
  vigorously during such times (cf. Blackwell et al., 1995).
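
The idle test this enables could be as simple as the following sketch.
The function name and the idea of a fixed idle threshold are assumptions
for illustration; how the cleaner actually obtains and uses the mtime is
not described here.

```c
#include <stdbool.h>
#include <time.h>

/*
 * True if the Ifile has not been modified for at least idle_secs,
 * i.e. no recent filesystem activity has touched it.
 */
bool
sketch_fs_idle(time_t ifile_mtime, time_t now, double idle_secs)
{
    return difftime(now, ifile_mtime) >= idle_secs;
}
```

A cleaner loop could call this with the current time and clean more
aggressively whenever it returns true.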

- When writing a partial-segment, make sure that the current segment is
  still marked ACTIVE afterward (otherwise the cleaner might try to
  clean it, since it might well be mostly empty).

- Don't trust the cleaner so much.  Sort the blocks during gathering,
  even if they came from the cleaner; verify the location of on-disk
  inodes, even if the cleaner says it knows where they came from.

- The cleaning code (lfs_markv in particular) has been entirely
  rewritten, and the partial-segment writing code changed to match.
  lfs_markv no longer uses its own implementation of lfs_segwrite, but
  marks inodes with IN_CLEANING to differentiate them from the
  non-cleaning inodes.  This change fixes numerous problems with the old
  cleaner, including a buffer overrun, and lost extensions in active
  fragments.  lfs_bmapv looks up and returns the addresses of inode
  blocks, so the cleaner can do something intelligent with them.

  If IN_CLEANING is set on an inode during partial-segment write, only fake
  buffers will be written, and IN_MODIFIED will not be cleared, saving
  us from a panic in vinvalbuf.  The addition of IN_CLEANING also allows
  dirops to be active while cleaning is in progress; otherwise, buffers
  engaged in active dirops might be written ahead of schedule, and
  cause an inconsistent checkpoint to be written to disk.

  (XXX - even now, DIROP blocks can sometimes be written to disk, if we
  are cleaning the same blocks as are active?  Grr, I don't see a good
  solution for this!)

- Added sysctl entries for LFS.  In particular, `writeindir' controls
  whether indirect blocks are written during non-checkpoint writes.
  (Since there is no roll-forward agent as yet, there is no penalty in
  not writing indirect blocks.)

- Wake up the cleaner at fs-unmount time, so it can die (if we unmount
  and then remount, we could conceivably get more than one cleaner
  operating at once).

newfs_lfs:

- The ifile inode is now created with the schg flag set, since nothing
  ever modifies it.  This could be a pain for the roll-forward agent,
  but since that should really run *before* the filesystem is mounted,
  I don't care.

- For large disks, it may be necessary to write one or more indirect
  blocks when the ifile inode is created.  newfs_lfs has been changed to
  write the first indirect block, if necessary.  It should instead just
  build a set of inodes and blocks, and then use the partial-segment
  writing routine mentioned above to write an ifile of whatever size is
  desired.

lfs_cleanerd:

- Now writes information to the syslog.

- Can now deal properly with fragments.

- Sometimes, the cleaner can die.  (Why?)  If this happens and we don't
  notice, we're screwed, since the fs will overfill.  So, the invoked
  cleaner now spawns itself repeatedly, a la init(8), to ensure that a
  cleaner is always present to clean the fs.
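
A respawn loop of this kind can be sketched with fork(2) and waitpid(2).
This is not the lfs_cleanerd source: sketch_supervise and
sketch_failing_job are hypothetical names, and the loop is bounded by
max_spawns only so that the example terminates, where a real supervisor
would presumably loop for as long as the filesystem is mounted.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Example job that "dies" immediately with a non-zero status. */
int
sketch_failing_job(void)
{
    return 1;
}

/*
 * Run job() in a child process and start a fresh child each time one
 * exits, up to max_spawns times.  Returns the number of children
 * started, or -1 if fork() or waitpid() fails.
 */
int
sketch_supervise(int (*job)(void), int max_spawns)
{
    int spawned, status;
    pid_t pid;

    for (spawned = 0; spawned < max_spawns; spawned++) {
        pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0)
            _exit(job());       /* child: do the work, then exit */
        if (waitpid(pid, &status, 0) == -1)
            return -1;          /* parent: wait for the child to die */
    }
    return spawned;
}
```

For example, sketch_supervise(sketch_failing_job, 3) starts three
children in turn, each replacing the one that just exited.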

- Added a flag to clean more actively based on filesystem inactivity
  rather than on low load average, a la Blackwell et al., 1995.

fsck_lfs:

- Exists, although it currently cannot actually fix anything (it is a
  diagnostic tool only at this point).