CHANGES revision 1.6 1 1.6 andvar # $NetBSD: CHANGES,v 1.6 2024/02/09 22:08:38 andvar Exp $
2 1.1 perseant
3 1.1 perseant kernel:
4 1.1 perseant
5 1.1 perseant - Instead of blindly continuing when it encounters an inode that is
6 1.1 perseant locked by another process, lfs_markv will process the rest of the
7 1.1 perseant inodes passed to it and then return EAGAIN. The cleaner will
8 1.1 perseant recognize this and not mark the segment clean. When the cleaner runs
9 1.1 perseant again, the segment containing the (formerly) locked inode will sort high
10 1.1 perseant for cleaning, since it is now almost entirely empty.
11 1.1 perseant
12 1.1 perseant - A beginning has been made to test keeping atime information in the
13 1.1 perseant Ifile, instead of on the inodes. This should make read-mostly
14 1.1 perseant filesystems significantly faster, since the inodes will then remain
15 1.1 perseant close to the data blocks on disk; but of course the Ifile will be
16 1.1 perseant somewhat larger. This code is not enabled, as it changes the format
17 1.1 perseant of the Ifile.
18 1.1 perseant
19 1.1 perseant - The superblock has been broken into two components: an on-disk
20 1.1 perseant superblock using fixed-size types, exactly 512 bytes regardless of
21 1.1 perseant architecture (or could be enlarged in multiples of the media block
22 1.1 perseant size up to LFS_SBPAD); and an in-memory superblock containing the
23 1.1 perseant information only useful to a running LFS, including segment pointers,
24 1.1 perseant etc. The superblock checksumming code has been modified to make
25 1.1 perseant future changes to the superblock format easier.
26 1.1 perseant
27 1.1 perseant - Because of the way that lfs_writeseg works, buffers are freed before
28 1.1 perseant they are really written to disk: their contents are copied into large
29 1.1 perseant buffers which are written async. Because the buffer cache does not
30 1.1 perseant serve to throttle these writes, and malloced memory is used to hold them,
31 1.1 perseant there is a danger of running out of kmem_map. To avoid this, a new
32 1.4 wiz compile-time parameter, LFS_THROTTLE, is used as an upper bound for the
33 1.1 perseant number of partial-segments allowed to be in progress writing at any
34 1.1 perseant given time.
35 1.1 perseant
36 1.1 perseant - If the system crashes between the point that a checkpoint is scheduled
37 1.1 perseant for writing and the time that the write completes, the filesystem
38 1.1 perseant could be left in an inconsistent state (no valid checkpoints on
39 1.1 perseant disk). To avoid this, we toggle between the first two superblocks
40 1.1 perseant when checkpointing, and (if it is indicated that no roll-forward agent
41 1.1 perseant exists) do not allow one checkpoint to occur before the last one has
42 1.1 perseant completed. When the filesystem is mounted, it uses the *older* of the
43 1.1 perseant first two superblocks.
44 1.1 perseant
45 1.1 perseant - DIROPs:
46 1.1 perseant
47 1.1 perseant The design of the LFS includes segregating vnodes used in directory
48 1.1 perseant operations, so that they can be written at the same time during a
49 1.1 perseant checkpoint, avoiding filesystem inconsistency after a crash. Code for
50 1.1 perseant this was partially written for 4.4BSD, but was not complete or enabled.
51 1.1 perseant
52 1.1 perseant In particular, vnodes marked VDIROP could be flushed by getnewvnode at
53 1.1 perseant any time, negating the usefulness of marking a vnode VDIROP, since if
54 1.1 perseant the filesystem then crashed it would be inconsistent. Now, when a
55 1.1 perseant vnode is first marked VDIROP it is also referenced. To avoid running
56 1.1 perseant out of vnodes, an attempt to mark more than LFS_MAXDIROP vnodes with
57 1.1 perseant VDIROP will sleep, and trigger a partial-segment write when no dirops
58 1.1 perseant are active.
59 1.1 perseant
60 1.1 perseant - LFS maintains a linked list of free inode numbers in the Ifile;
61 1.1 perseant accesses to this list are now protected by a simple lock.
62 1.1 perseant
63 1.1 perseant - lfs_vfree is not allowed to run while an inode has blocks scheduled
64 1.1 perseant for writing, since that could trigger a miscounting in lfs_truncate.
65 1.1 perseant
66 1.1 perseant - lfs_balloc now correctly extends fragments, if a block is written
67 1.1 perseant beyond the current end-of-file.
68 1.1 perseant
69 1.1 perseant - Blocks which have already been gathered into a partial-segment are not
70 1.1 perseant allowed to be extended, since if they were, any blocks following them
71 1.1 perseant would either be written in the wrong place, or overwrite other blocks.
72 1.1 perseant
73 1.1 perseant - The LFS buffer-header accounting, which triggers a partial-segment
74 1.6 andvar write if too many buffer-headers are in use by the LFS subsystem, has
75 1.1 perseant been expanded to include *bytes* used in LFS buffers as well.
76 1.1 perseant
77 1.1 perseant - Reads of the Ifile, which almost always come from the cleaner, can no
78 1.1 perseant longer trigger a partial-segment write, since this could cause a
79 1.1 perseant deadlock.
80 1.1 perseant
81 1.1 perseant - Support has been added (but not tested, and currently disabled by
82 1.1 perseant default) for true read-only filesystems. Currently, if a filesystem
83 1.1 perseant is mounted read-only the cleaner can still operate on it, but this
84 1.1 perseant obviously would not be true for read-only media. (I think the
85 1.1 perseant original plan was for the roll-forward agent to operate using this
86 1.1 perseant "feature"?)
87 1.1 perseant
88 1.1 perseant - If a fake buffer is created by lfs_markv and another process draws the
89 1.1 perseant same block in and changes it, the fake buffer is now discarded and
90 1.1 perseant replaced by the "real" buffer containing the new data.
91 1.1 perseant
92 1.1 perseant - An inode which has blocks gathered no longer has IN_MODIFIED set, but
93 1.1 perseant still does in fact have dirty blocks attached. lfs_update will now
94 1.1 perseant wait for such an inode's writes to complete before it runs,
95 1.1 perseant suppressing a panic in vinvalbuf.
96 1.1 perseant
97 1.1 perseant - Many filesystem operations now update the Ifile's mtime, allowing the
98 1.1 perseant cleaner to detect when the filesystem is idle, and clean more
99 1.1 perseant vigorously during such times (cf. Blackwell et al., 1995).
100 1.1 perseant
101 1.1 perseant - When writing a partial-segment, make sure that the current segment is
102 1.1 perseant still marked ACTIVE afterward (otherwise the cleaner might try to
103 1.1 perseant clean it, since it might well be mostly empty).
104 1.1 perseant
105 1.1 perseant - Don't trust the cleaner so much. Sort the blocks during gathering,
106 1.1 perseant even if they came from the cleaner; verify the location of on-disk
107 1.1 perseant inodes, even if the cleaner says it knows where they came from.
108 1.1 perseant
109 1.1 perseant - The cleaning code (lfs_markv in particular) has been entirely
110 1.1 perseant rewritten, and the partial-segment writing code changed to match.
111 1.1 perseant lfs_markv no longer uses its own implementation of lfs_segwrite, but
112 1.1 perseant marks inodes with IN_CLEANING to differentiate them from the
113 1.1 perseant non-cleaning inodes. This change fixes numerous problems with the old
114 1.3 toshii cleaner, including a buffer overrun, and lost extensions in active
115 1.1 perseant fragments. lfs_bmapv looks up and returns the addresses of inode
116 1.1 perseant blocks, so the cleaner can do something intelligent with them.
117 1.1 perseant
118 1.1 perseant If IN_CLEANING is set on an inode during partial-segment write, only fake
119 1.1 perseant buffers will be written, and IN_MODIFIED will not be cleared, saving
120 1.1 perseant us from a panic in vinvalbuf. The addition of IN_CLEANING also allows
121 1.1 perseant dirops to be active while cleaning is in progress; otherwise,
122 1.1 perseant buffers engaged in active dirops might be written ahead of schedule,
123 1.1 perseant and cause an inconsistent checkpoint to be written to disk.
124 1.1 perseant
125 1.1 perseant (XXX - even now, DIROP blocks can sometimes be written to disk, if we
126 1.1 perseant are cleaning the same blocks as are active? Grr, I don't see a good
127 1.1 perseant solution for this!)
128 1.1 perseant
129 1.1 perseant - Added sysctl entries for LFS. In particular, `writeindir' controls
130 1.1 perseant whether indirect blocks are written during non-checkpoint writes.
131 1.1 perseant (Since there is no roll-forward agent as yet, there is no penalty in
132 1.1 perseant not writing indirect blocks.)
133 1.1 perseant
134 1.1 perseant - Wake up the cleaner at fs-unmount time, so it can die (if we unmount
135 1.1 perseant and then remount, we could conceivably get more than one cleaner
136 1.1 perseant operating at once).
137 1.1 perseant
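The checkpoint entry above (alternate between the first two superblocks, refuse to start a checkpoint while the previous one is still being written, and mount from the older copy) can be modelled in a few lines of C. This is a minimal sketch for illustration only: the names ckpt_state, ckpt_begin, ckpt_done and ckpt_mount_slot are hypothetical and are not the kernel's, and the serial number is an assumed stand-in for whatever the real superblock uses to order checkpoints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one on-disk superblock copy. */
struct sb_copy {
    uint64_t serial;        /* assumed ordering field; larger is newer */
};

/* Hypothetical in-memory checkpoint state. */
struct ckpt_state {
    struct sb_copy sb[2];   /* the first two superblocks */
    int next;               /* copy the next checkpoint will write */
    bool in_flight;         /* a checkpoint write has not completed */
    uint64_t serial;        /* last serial number assigned */
};

/* Start a checkpoint. Returns the slot to write, or -1 if the
 * previous checkpoint has not finished yet. */
int ckpt_begin(struct ckpt_state *st)
{
    if (st->in_flight)
        return -1;
    st->in_flight = true;
    return st->next;
}

/* Record completion of the write to `slot` and toggle to the
 * other superblock for the next checkpoint. */
void ckpt_done(struct ckpt_state *st, int slot)
{
    assert(st->in_flight && slot == st->next);
    st->sb[slot].serial = ++st->serial;
    st->next = 1 - slot;
    st->in_flight = false;
}

/* At mount time, choose the older of the two copies. */
int ckpt_mount_slot(const struct ckpt_state *st)
{
    return st->sb[0].serial <= st->sb[1].serial ? 0 : 1;
}
```

A caller that gets -1 from ckpt_begin would wait and retry; the sketch leaves that policy out, along with the case where a roll-forward agent is present.
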
138 1.2 perseant newfs_lfs:
139 1.1 perseant
140 1.1 perseant - The ifile inode is now created with the schg flag set, since nothing
141 1.1 perseant ever modifies it. This could be a pain for the roll-forward agent,
142 1.1 perseant but since that should really run *before* the filesystem is mounted,
143 1.1 perseant I don't care.
144 1.1 perseant
145 1.1 perseant - For large disks, it may be necessary to write one or more indirect
146 1.1 perseant blocks when the ifile inode is created. newfs_lfs has been changed to
147 1.1 perseant write the first indirect block, if necessary. It should instead just
148 1.1 perseant build a set of inodes and blocks, and then use the partial-segment
149 1.1 perseant writing routine mentioned above to write an ifile of whatever size is
150 1.1 perseant desired.
151 1.1 perseant
152 1.1 perseant lfs_cleanerd:
153 1.1 perseant
154 1.1 perseant - Now writes information to the syslog.
155 1.1 perseant
156 1.1 perseant - Can now deal properly with fragments.
157 1.1 perseant
158 1.1 perseant - Sometimes, the cleaner can die. (Why?) If this happens and we don't
159 1.1 perseant notice, we're screwed, since the fs will overfill. So, the invoked
160 1.1 perseant cleaner now spawns itself repeatedly, a la init(8), to ensure that a
161 1.1 perseant cleaner is always present to clean the fs.
162 1.1 perseant
163 1.1 perseant - Added a flag to clean more actively when the filesystem is inactive,
164 1.1 perseant rather than when the load average is low (cf. Blackwell et al., 1995).
165 1.1 perseant
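The respawning behaviour described above (a parent that restarts the cleaner whenever it dies, a la init(8)) might look roughly like the following. This is a sketch under stated assumptions, not the lfs_cleanerd source: supervise, crashing_worker and clean_worker are hypothetical names, the max_spawns bound exists only so the example terminates, and stopping after a clean exit is an assumption about the desired policy.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for the cleaner's main loop. */
int crashing_worker(void) { return 1; }   /* simulates a cleaner that dies */
int clean_worker(void)    { return 0; }   /* simulates an orderly shutdown */

/* Run `work` in a child process and respawn it each time it dies
 * abnormally. Returns the number of children spawned, or -1 if
 * fork() or waitpid() fails. */
int supervise(int (*work)(void), int max_spawns)
{
    int spawned = 0;

    while (spawned < max_spawns) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(work());          /* child: do the work, then exit */
        spawned++;

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        /* Assumed policy: a zero exit status means the cleaner was
         * asked to stop, so do not respawn it. */
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            break;
    }
    return spawned;
}
```

With these stand-ins, supervise(crashing_worker, 3) spawns three children before giving up, while supervise(clean_worker, 3) spawns one and stops.
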
166 1.1 perseant fsck_lfs:
167 1.1 perseant
168 1.1 perseant - Exists, although it currently cannot actually fix anything (it is a
169 1.1 perseant diagnostic tool only at this point).
170