CHANGES revision 1.2
#   $NetBSD: CHANGES,v 1.2 1999/04/10 18:31:05 perseant Exp $

kernel:

- Instead of blindly continuing when it encounters an inode that is
  locked by another process, lfs_markv will process the rest of the
  inodes passed to it and then return EAGAIN.  The cleaner will
  recognize this and not mark the segment clean.  When the cleaner runs
  again, the segment containing the (formerly) locked inode will sort
  high for cleaning, since it is now almost entirely empty.
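
  Below is a minimal sketch of the skip-and-report control flow described
  above.  The struct and function names are hypothetical, and the real
  lfs_markv does much more work per inode.  The sketch only shows every
  unlocked inode being processed, with EAGAIN returned afterward if any
  were skipped.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an inode passed to lfs_markv. */
struct markv_inode {
    unsigned ino;      /* inode number */
    bool locked;       /* currently locked by another process */
    bool processed;    /* set once its blocks have been handled */
};

/*
 * Handle every inode that is not locked.  Locked inodes are skipped
 * rather than waited on, and EAGAIN is returned at the end if any were
 * skipped, so the caller knows the segment is not yet empty.
 */
int markv_sketch(struct markv_inode *inos, size_t n)
{
    bool skipped = false;

    for (size_t i = 0; i < n; i++) {
        if (inos[i].locked) {
            skipped = true;     /* leave it for a later cleaner pass */
            continue;
        }
        inos[i].processed = true;
    }
    return skipped ? EAGAIN : 0;
}
```

  A cleaner built on this would mark the segment clean only when the call
  returns 0.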

- Experimental code has been added to keep atime information in the
  Ifile instead of in the inodes.  This should make read-mostly
  filesystems significantly faster: reads will no longer dirty the
  inodes, so the inodes will remain close to their data blocks on disk.
  Of course, the Ifile will be somewhat larger.  This code is not
  enabled, because it changes the format of the Ifile.

- The superblock has been split into two components.  The first is an
  on-disk superblock that uses fixed-size types and is exactly 512 bytes
  regardless of architecture (it could be enlarged in multiples of the
  media block size, up to LFS_SBPAD).  The second is an in-memory
  superblock containing information that is only useful to a running
  LFS, such as segment pointers.  The superblock checksumming code has
  been modified to make future changes to the superblock format easier.
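
  Below is a sketch of a fixed-layout on-disk superblock, assuming C11.
  Only the 512-byte size comes from the text.  The field names, their
  order, and the checksum function are hypothetical.  Computing the
  checksum over every byte before the checksum field, using offsetof, is
  one way to let fields be added later without touching the checksum
  code.

```c
#include <stddef.h>
#include <stdint.h>

#define DLFS_SKETCH_SIZE 512  /* on-disk size given in the text */

/* Hypothetical field layout; only the total size comes from the text. */
struct dlfs_sketch {
    uint32_t magic;
    uint32_t version;
    uint32_t bsize;     /* block size */
    uint32_t ssize;     /* segment size */
    uint64_t offset;    /* hypothetical: address of the last checkpoint */
    uint64_t tstamp;    /* hypothetical: time of the last checkpoint */
    uint8_t  pad[DLFS_SKETCH_SIZE - 5 * sizeof(uint32_t) - 2 * sizeof(uint64_t)];
    uint32_t cksum;     /* covers every byte before this field */
};

/* Fails to compile if the layout ever drifts from 512 bytes. */
_Static_assert(sizeof(struct dlfs_sketch) == DLFS_SKETCH_SIZE,
               "on-disk superblock must be exactly 512 bytes");

/* Illustrative checksum only; not the function LFS itself uses. */
uint32_t dlfs_sketch_cksum(const struct dlfs_sketch *sb)
{
    const unsigned char *p = (const unsigned char *)sb;
    uint32_t sum = 0;

    for (size_t i = 0; i < offsetof(struct dlfs_sketch, cksum); i++)
        sum = sum * 31u + p[i];
    return sum;
}
```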

- Because of the way lfs_writeseg works, buffers are freed before they
  are actually written to disk.  Their contents are copied into large
  buffers, which are written asynchronously.  The buffer cache does not
  throttle these writes, and malloced memory is used to hold them, so
  there is a danger of running out of kmem_map.  To avoid this, a new
  compile-time parameter, LFS_THROTTLE, sets an upper bound on the
  number of partial-segment writes that may be in progress at any
  given time.
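
  Below is a sketch of the throttle as a bounded counter of in-flight
  partial-segment writes.  The value given to LFS_THROTTLE here and the
  function names are hypothetical.  A kernel would sleep until a write
  completes rather than return false.

```c
#include <stdbool.h>

#define LFS_THROTTLE 4  /* illustrative value only */

/* Partial-segment writes issued but not yet completed. */
static int lfs_writes_inflight;

/*
 * Called before issuing a partial-segment write.  Returns false when
 * LFS_THROTTLE writes are already in flight; a kernel would sleep here
 * until a completion frees a slot instead of returning.
 */
bool lfs_throttle_try_begin(void)
{
    if (lfs_writes_inflight >= LFS_THROTTLE)
        return false;
    lfs_writes_inflight++;
    return true;
}

/* Called when an asynchronous partial-segment write completes. */
void lfs_throttle_end(void)
{
    if (lfs_writes_inflight > 0)
        lfs_writes_inflight--;
}
```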

- If the system crashes between the time a checkpoint is scheduled for
  writing and the time the write completes, the filesystem could be
  left in an inconsistent state (with no valid checkpoints on disk).  To
  avoid this, we toggle between the first two superblocks when
  checkpointing.  If it is indicated that no roll-forward agent exists,
  we also do not allow a checkpoint to begin before the previous one has
  completed.  When the filesystem is mounted, it uses the *older* of the
  first two superblocks.
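
  Below is a sketch of the three rules above.  It assumes each superblock
  copy carries a checkpoint serial number.  That field and all function
  names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: each superblock copy records a checkpoint serial number. */
struct sb_copy {
    uint64_t serial;
};

/* Checkpoints alternate between superblock 0 and superblock 1. */
int next_checkpoint_sb(int last_written)
{
    return 1 - last_written;
}

/*
 * With no roll-forward agent, a new checkpoint may not start while the
 * previous one is still being written.
 */
bool may_start_checkpoint(bool prev_in_flight, bool have_rollforward)
{
    return have_rollforward || !prev_in_flight;
}

/* At mount time, use the older of the first two superblocks. */
int mount_choose_sb(const struct sb_copy sb[2])
{
    return sb[0].serial <= sb[1].serial ? 0 : 1;
}
```

  Because at most one checkpoint is in flight, a crash can leave at most
  one of the two copies incomplete.  The other copy still describes a
  finished checkpoint.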

- DIROPs:

  The design of the LFS includes segregating vnodes used in directory
  operations, so that they can be written at the same time during a
  checkpoint, avoiding filesystem inconsistency after a crash.  Code for
  this was partially written for 4.4BSD, but was not complete or enabled.

  In particular, vnodes marked VDIROP could be flushed by getnewvnode at
  any time.  This negated the usefulness of marking a vnode VDIROP,
  since if the filesystem then crashed it would be inconsistent.  Now,
  when a vnode is first marked VDIROP it is also referenced.  To avoid
  running out of vnodes, an attempt to mark more than LFS_MAXDIROP
  vnodes with VDIROP will sleep, and will trigger a partial-segment
  write when no dirops are active.
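
  Below is a sketch of the VDIROP limit.  It assumes a simple per-vnode
  flag and use count.  The structs, the LFS_MAXDIROP value, and the
  function names are hypothetical.  A real caller would sleep when
  mark_vdirop returns false instead of returning immediately.

```c
#include <stdbool.h>

#define LFS_MAXDIROP 2   /* illustrative value only */

struct vnode_sketch {
    bool vdirop;     /* marked VDIROP */
    int  usecount;   /* references held on the vnode */
};

struct dirop_state {
    int  ndirop;            /* vnodes currently marked VDIROP */
    int  active_dirops;     /* directory operations in progress */
    bool write_requested;   /* a partial-segment write has been asked for */
};

/*
 * Try to mark vp VDIROP.  A newly marked vnode also gains a reference so
 * getnewvnode cannot recycle it.  Returns false when the limit is reached;
 * the caller would then sleep, and a partial-segment write is requested
 * once no dirops are active.
 */
bool mark_vdirop(struct dirop_state *s, struct vnode_sketch *vp)
{
    if (vp->vdirop)
        return true;
    if (s->ndirop >= LFS_MAXDIROP) {
        if (s->active_dirops == 0)
            s->write_requested = true;
        return false;
    }
    vp->vdirop = true;
    vp->usecount++;
    s->ndirop++;
    return true;
}

/* After the segment write, drop the flag and the extra reference. */
void release_vdirop(struct dirop_state *s, struct vnode_sketch *vp)
{
    if (!vp->vdirop)
        return;
    vp->vdirop = false;
    vp->usecount--;
    s->ndirop--;
}
```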

- LFS maintains a linked list of free inode numbers in the Ifile;
  accesses to this list are now protected by a simple lock.
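
  The free list can be pictured as a singly linked list threaded through
  per-inode entries.  This sketch assumes C11 <stdatomic.h> and uses an
  atomic_flag spinlock to stand in for the kernel's simple lock.  The
  entry layout, NINODES, and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NINODES  8   /* hypothetical Ifile size */
#define FREE_END 0   /* inode 0 is never handed out, so it ends the list */

struct ifile_entry {
    uint32_t nextfree;   /* next free inode number */
};

struct ifile_sketch {
    struct ifile_entry ent[NINODES];
    uint32_t freehd;     /* head of the free list */
    atomic_flag lock;    /* stands in for the kernel's simple lock */
};

static void slock(atomic_flag *l)
{
    while (atomic_flag_test_and_set(l)) {
        /* spin until the holder releases the lock */
    }
}

static void sunlock(atomic_flag *l)
{
    atomic_flag_clear(l);
}

/* Push ino onto the free list. */
void ifile_free(struct ifile_sketch *f, uint32_t ino)
{
    slock(&f->lock);
    f->ent[ino].nextfree = f->freehd;
    f->freehd = ino;
    sunlock(&f->lock);
}

/* Pop a free inode number; FREE_END means none is available. */
uint32_t ifile_alloc(struct ifile_sketch *f)
{
    slock(&f->lock);
    uint32_t ino = f->freehd;
    if (ino != FREE_END)
        f->freehd = f->ent[ino].nextfree;
    sunlock(&f->lock);
    return ino;
}

/* Put inodes 1..NINODES-1 on the free list, lowest number first. */
void ifile_init(struct ifile_sketch *f)
{
    atomic_flag_clear(&f->lock);
    f->freehd = FREE_END;
    for (uint32_t ino = NINODES - 1; ino >= 1; ino--)
        ifile_free(f, ino);
}
```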

- lfs_vfree is not allowed to run while an inode has blocks scheduled
  for writing, since that could cause lfs_truncate to miscount blocks.

- lfs_balloc now correctly extends fragments when a block is written
  beyond the current end-of-file.

- Blocks that have already been gathered into a partial-segment are not
  allowed to be extended.  If they were, any blocks following them would
  either be written in the wrong place or overwrite other blocks.

- The LFS buffer-header accounting, which triggers a partial-segment
  write if too many buffer headers are in use by the LFS subsystem, has
  been expanded to count the *bytes* used in LFS buffers as well.
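
  Below is a sketch of accounting by both header count and bytes, where
  either limit triggers a write.  The struct, its limits, and the
  function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting state; the real thresholds are not given here. */
struct lfs_bufacct {
    int    nbufs,  maxbufs;    /* buffer headers in use by LFS */
    size_t nbytes, maxbytes;   /* bytes held in those buffers */
};

/* Account for a new dirty buffer; return true when a partial-segment
 * write should be triggered because either limit is exceeded. */
bool lfs_acct_add(struct lfs_bufacct *a, size_t bytes)
{
    a->nbufs++;
    a->nbytes += bytes;
    return a->nbufs > a->maxbufs || a->nbytes > a->maxbytes;
}

/* Undo the accounting once the buffer has been written and released. */
void lfs_acct_release(struct lfs_bufacct *a, size_t bytes)
{
    a->nbufs--;
    a->nbytes -= bytes;
}
```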

- Reads of the Ifile, which almost always come from the cleaner, can no
  longer trigger a partial-segment write, since this could cause a
  deadlock.

- Support has been added (but not tested, and currently disabled by
  default) for true read-only filesystems.  Currently, if a filesystem
  is mounted read-only the cleaner can still operate on it, but this
  obviously would not be possible on read-only media.  (I think the
  original plan was for the roll-forward agent to operate using this
  "feature"?)

- If a fake buffer is created by lfs_markv and another process reads the
  same block in and changes it, the fake buffer is now discarded and
  replaced by the "real" buffer containing the new data.

- An inode whose blocks have been gathered no longer has IN_MODIFIED
  set, but it does in fact still have dirty blocks attached.
  lfs_update will now wait for such an inode's writes to complete
  before it runs, preventing a panic in vinvalbuf.

- Many filesystem operations now update the Ifile's mtime.  This allows
  the cleaner to detect when the filesystem is idle and to clean more
  vigorously during such times (cf. Blackwell et al., 1995).
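
  On the cleaner side, idleness can be judged from how long ago the
  Ifile was last modified.  Below is a sketch that assumes the caller
  supplies a path through which the Ifile can be stat()ed.  The idle
  threshold and the function names are hypothetical.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Fetch the modification time of the Ifile at path.  How the cleaner
 * locates the Ifile is not shown; path is supplied by the caller. */
int ifile_mtime(const char *path, time_t *mtime)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *mtime = st.st_mtime;
    return 0;
}

/* The filesystem counts as idle if the Ifile has not changed for at
 * least idle_secs seconds; idle_secs is a hypothetical tunable. */
bool fs_is_idle(time_t now, time_t last_mod, time_t idle_secs)
{
    return now - last_mod >= idle_secs;
}
```

  A cleaner loop might call fs_is_idle(time(NULL), m, idle_secs) after
  refreshing m with ifile_mtime, and clean more aggressively while it
  returns true.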

- When writing a partial-segment, make sure that the current segment is
  still marked ACTIVE afterward (otherwise the cleaner might try to
  clean it, since it might well be mostly empty).

- Don't trust the cleaner so much.  Sort the blocks during gathering,
  even if they came from the cleaner; verify the location of on-disk
  inodes, even if the cleaner says it knows where they came from.
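
  Below is a sketch of sorting gathered blocks with qsort(3).  It assumes
  the sort key is the logical block number.  The struct and function
  names are hypothetical.  The comparator avoids subtracting the 64-bit
  keys, which could overflow.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical description of a gathered block. */
struct gblock {
    int64_t lbn;    /* logical block number within the file */
    int64_t daddr;  /* where the cleaner says it came from (not trusted) */
};

static int cmp_lbn(const void *a, const void *b)
{
    const struct gblock *x = a, *y = b;

    /* -1, 0 or 1 without computing x->lbn - y->lbn */
    return (x->lbn > y->lbn) - (x->lbn < y->lbn);
}

/* Sort regardless of whether the blocks came from the cleaner. */
void sort_gathered(struct gblock *blks, size_t n)
{
    qsort(blks, n, sizeof *blks, cmp_lbn);
}
```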

- The cleaning code (lfs_markv in particular) has been entirely
  rewritten, and the partial-segment writing code has been changed to
  match.  lfs_markv no longer uses its own implementation of
  lfs_segwrite.  Instead, it marks inodes with IN_CLEANING to
  differentiate them from non-cleaning inodes.  This change fixes
  numerous problems with the old cleaner, including a buffer overrun and
  lost extensions in active fragments.  lfs_bmapv looks up and returns
  the addresses of inode blocks, so the cleaner can do something
  intelligent with them.

  If IN_CLEANING is set on an inode during a partial-segment write, only
  fake buffers will be written, and IN_MODIFIED will not be cleared,
  saving us from a panic in vinvalbuf.  The addition of IN_CLEANING also
  allows dirops to be active while cleaning is in progress.  Otherwise,
  buffers engaged in active dirops might be written ahead of schedule
  and cause an inconsistent checkpoint to be written to disk.

  (XXX - even now, DIROP blocks can sometimes be written to disk, if we
  are cleaning the same blocks as are active?  Grr, I don't see a good
  solution for this!)
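
  Below is a sketch of the IN_CLEANING rule for a single inode.  The flag
  names follow the text, but their values, the structs, and the function
  name are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag names follow the text; the values are illustrative only. */
#define IN_MODIFIED 0x1u
#define IN_CLEANING 0x2u

struct buf_sketch {
    bool fake;      /* fake buffer created by the cleaner */
    bool written;   /* included in the partial segment */
};

struct inode_sketch {
    unsigned flags;
    struct buf_sketch *bufs;
    size_t nbufs;
};

/*
 * Gather an inode's dirty buffers into a partial segment.  With
 * IN_CLEANING set, only fake buffers are written and IN_MODIFIED is
 * left alone, because real dirty buffers (which may belong to an active
 * dirop) are still attached.  Returns the number of buffers written.
 */
size_t gather_inode_bufs(struct inode_sketch *ip)
{
    bool cleaning = (ip->flags & IN_CLEANING) != 0;
    size_t n = 0;

    for (size_t i = 0; i < ip->nbufs; i++) {
        if (cleaning && !ip->bufs[i].fake)
            continue;
        ip->bufs[i].written = true;
        n++;
    }
    if (!cleaning)
        ip->flags &= ~IN_MODIFIED;
    return n;
}
```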

- Added sysctl entries for LFS.  In particular, `writeindir' controls
  whether indirect blocks are written during non-checkpoint writes.
  (Since there is no roll-forward agent as yet, there is no penalty in
  not writing indirect blocks.)

- Wake up the cleaner at fs-unmount time, so it can die (if we unmount
  and then remount, we could conceivably get more than one cleaner
  operating at once).

newfs_lfs:

- The ifile inode is now created with the schg flag set, since nothing
  ever modifies it.  This could be a pain for the roll-forward agent,
  but since that should really run *before* the filesystem is mounted,
  I don't care.

- For large disks, it may be necessary to write one or more indirect
  blocks when the ifile inode is created.  Newlfs has been changed to
  write the first indirect block, if necessary.  It should instead just
  build a set of inodes and blocks, and then use the partial-segment
  writing routine mentioned above to write an ifile of whatever size is
  desired.
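
  Below is a back-of-the-envelope sketch of when the first indirect block
  becomes necessary.  It assumes 12 direct block pointers per inode
  (NDADDR, as in 4.4BSD FFS) and 32-bit disk addresses.  The Ifile layout
  parameters (cleaner-info blocks, segment-usage entry size, inode-entry
  size) are hypothetical inputs, not the real structure sizes.

```c
#include <stdbool.h>
#include <stdint.h>

#define NDADDR 12   /* direct block pointers per inode, as in 4.4BSD */

static uint64_t howmany64(uint64_t x, uint64_t y)
{
    return (x + y - 1) / y;
}

/* Hypothetical Ifile layout: cleaner-info blocks, then the segment usage
 * table, then one entry per inode.  All sizes are caller-supplied. */
uint64_t ifile_nblocks(uint64_t bsize, uint64_t cleaner_blks,
                       uint64_t nseg, uint64_t segusage_size,
                       uint64_t ninodes, uint64_t ifentry_size)
{
    return cleaner_blks
        + howmany64(nseg * segusage_size, bsize)
        + howmany64(ninodes * ifentry_size, bsize);
}

/* True when the direct pointers are not enough and newfs must write the
 * first (single) indirect block. */
bool ifile_needs_indirect(uint64_t nblocks)
{
    return nblocks > NDADDR;
}

/* True when even one single-indirect block is not enough, assuming
 * 32-bit disk addresses (bsize / 4 pointers per indirect block). */
bool ifile_needs_double_indirect(uint64_t nblocks, uint64_t bsize)
{
    return nblocks > NDADDR + bsize / 4;
}
```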

lfs_cleanerd:

- Now writes information to the syslog.

- Can now deal properly with fragments.

- Sometimes, the cleaner can die.  (Why?)  If this happens and we don't
  notice, we're screwed, since the fs will overfill.  So, the invoked
  cleaner now spawns itself repeatedly, a la init(8), to ensure that a
  cleaner is always present to clean the fs.
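
  The respawning idea can be sketched with fork(2) and waitpid(2).  This
  is not the lfs_cleanerd code.  The names supervise, child_main, and
  max_runs are hypothetical, and the run limit exists only so that the
  sketch terminates.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run child_main in a forked child and start a new child each time the
 * previous one exits, as init(8) does for getty.  max_runs bounds the
 * loop so the sketch can terminate; a real supervisor would loop forever.
 * Returns the number of children run, or -1 on fork/wait failure.
 */
int supervise(void (*child_main)(void), int max_runs)
{
    int runs = 0;

    while (runs < max_runs) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            child_main();
            _exit(0);          /* never return into the supervisor loop */
        }

        int status;
        pid_t r;
        do {
            r = waitpid(pid, &status, 0);
        } while (r < 0 && errno == EINTR);
        if (r < 0)
            return -1;
        runs++;                /* child died: go round and respawn it */
    }
    return runs;
}
```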

- Added a flag to clean more actively based on filesystem inactivity
  rather than on low load average, a la Blackwell et al., 1995.

fsck_lfs:

- Exists, although it currently cannot actually fix anything (it is a
  diagnostic tool only at this point).