README revision 1.1
11.1Smycroft# from: @(#)README 8.1 (Berkeley) 6/11/93 21.1Smycroft# $Id: README,v 1.1 1994/06/08 11:42:16 mycroft Exp $ 31.1Smycroft 41.1SmycroftThe file system is reasonably stable, but incomplete. There are 51.1Smycroftplaces where cleaning performance can be improved dramatically (see 61.1Smycroftcomments in lfs_syscalls.c). For details on the implementation, 71.1Smycroftperformance and why garbage collection always wins, see Dr. Margo 81.1SmycroftSeltzer's thesis available for anonymous ftp from toe.cs.berkeley.edu, 91.1Smycroftin the directory pub/personal/margo/thesis.ps.Z, or the January 1993 101.1SmycroftUSENIX paper. 111.1Smycroft 121.1SmycroftMissing Functionality: 131.1Smycroft Multiple block sizes and/or fragments are not yet implemented. 141.1Smycroft 151.1Smycroft---------- 161.1SmycroftThe disk is laid out in segments. The first segment starts 8K into the 171.1Smycroftdisk (the first 8K is used for boot information). Each segment is composed 181.1Smycroftof the following: 191.1Smycroft 201.1Smycroft An optional super block 211.1Smycroft One or more groups of: 221.1Smycroft segment summary 231.1Smycroft 0 or more data blocks 241.1Smycroft 0 or more inode blocks 251.1Smycroft 261.1SmycroftThe segment summary and inode/data blocks start after the super block (if 271.1Smycroftpresent), and grow toward the end of the segment. 281.1Smycroft 291.1Smycroft _______________________________________________ 301.1Smycroft | | | | | 311.1Smycroft | summary | data/inode | summary | data/inode | 321.1Smycroft | block | blocks | block | blocks | ... 331.1Smycroft |_________|____________|_________|____________| 341.1Smycroft 351.1SmycroftThe data/inode blocks following a summary block are described by the 361.1Smycroftsummary block. In order to permit the segment to be written in any order 371.1Smycroftand in a forward direction only, a checksum is calculated across the 381.1Smycroftblocks described by the summary. Additionally, the summary is checksummed 391.1Smycroftand timestamped. Both of these are intended for recovery; the former is 401.1Smycroftto make it easy to determine that it *is* a summary block and the latter 411.1Smycroftis to make it easy to determine when recovery is finished for partially 421.1Smycroftwritten segments. These checksums are also used by the cleaner. 431.1Smycroft 441.1Smycroft Summary block (detail) 451.1Smycroft ________________ 461.1Smycroft | sum cksum | 471.1Smycroft | data cksum | 481.1Smycroft | next segment | 491.1Smycroft | timestamp | 501.1Smycroft | FINFO count | 511.1Smycroft | inode count | 521.1Smycroft | flags | 531.1Smycroft |______________| 541.1Smycroft | FINFO-1 | 0 or more file info structures, identifying the 551.1Smycroft | . | blocks in the segment. 561.1Smycroft | . | 571.1Smycroft | . | 581.1Smycroft | FINFO-N | 591.1Smycroft | inode-N | 601.1Smycroft | . | 611.1Smycroft | . | 621.1Smycroft | . | 0 or more inode daddr_t's, identifying the inode 631.1Smycroft | inode-1 | blocks in the segment. 641.1Smycroft |______________| 651.1Smycroft 661.1SmycroftInode blocks are blocks of on-disk inodes in the same format as those in 671.1Smycroftthe FFS. However, spare[0] contains the inode number of the inode so we 681.1Smycroftcan find a particular inode on a page. They are packed page_size / 691.1Smycroftsizeof(inode) to a block. Data blocks are exactly as in the FFS. Both 701.1Smycroftinodes and data blocks move around the file system at will. 711.1Smycroft 721.1SmycroftThe file system is described by a super-block which is replicated and 731.1Smycroftoccurs as the first block of the first and other segments. (The maximum 741.1Smycroftnumber of super-blocks is MAXNUMSB). Each super-block maintains a list 751.1Smycroftof the disk addresses of all the super-blocks. The super-block maintains 761.1Smycrofta small amount of checkpoint information, essentially just enough to find 771.1Smycroftthe inode for the IFILE (fs->lfs_idaddr). 781.1Smycroft 791.1SmycroftThe IFILE is visible in the file system, as inode number IFILE_INUM. It 801.1Smycroftcontains information shared between the kernel and various user processes. 811.1Smycroft 821.1Smycroft Ifile (detail) 831.1Smycroft ________________ 841.1Smycroft | cleaner info | Cleaner information per file system. (Page 851.1Smycroft | | granularity.) 861.1Smycroft |______________| 871.1Smycroft | segment | Space available and last modified times per 881.1Smycroft | usage table | segment. (Page granularity.) 891.1Smycroft |______________| 901.1Smycroft | IFILE-1 | Per inode status information: current version #, 911.1Smycroft | . | if currently allocated, last access time and 921.1Smycroft | . | current disk address of containing inode block. 931.1Smycroft | . | If current disk address is LFS_UNUSED_DADDR, the 941.1Smycroft | IFILE-N | inode is not in use, and it's on the free list. 951.1Smycroft |______________| 961.1Smycroft 971.1Smycroft 981.1SmycroftFirst Segment at Creation Time: 991.1Smycroft_____________________________________________________________ 1001.1Smycroft| | | | | | | | 1011.1Smycroft| 8K pad | Super | summary | inode | ifile | root | l + f | 1021.1Smycroft| | block | | block | | dir | dir | 1031.1Smycroft|________|_______|_________|_______|_______|_______|_______| 1041.1Smycroft ^ 1051.1Smycroft Segment starts here. 1061.1Smycroft 1071.1SmycroftSome differences from the Sprite LFS implementation. 1081.1Smycroft 1091.1Smycroft1. The LFS implementation placed the ifile metadata and the super block 1101.1Smycroft at fixed locations. This implementation replicates the super block 1111.1Smycroft and puts each at a fixed location. The checkpoint data is divided into 1121.1Smycroft two parts -- just enough information to find the IFILE is stored in 1131.1Smycroft two of the super blocks, although it is not toggled between them as in 1141.1Smycroft the Sprite implementation. (This was deliberate, to avoid a single 1151.1Smycroft point of failure.) The remaining checkpoint information is treated as 1161.1Smycroft a regular file, which means that the cleaner info, the segment usage 1171.1Smycroft table and the ifile meta-data are stored in normal log segments. 1181.1Smycroft (Tastes great, less filling...) 1191.1Smycroft 1201.1Smycroft2. The segment layout is radically different in Sprite; this implementation 1211.1Smycroft uses something a lot like network framing, where data/inode blocks are 1221.1Smycroft written asynchronously, and a checksum is used to validate any set of 1231.1Smycroft summary and data/inode blocks. Sprite writes summary blocks synchronously 1241.1Smycroft after the data/inode blocks have been written and the existence of the 1251.1Smycroft summary block validates the data/inode blocks. This permits us to write 1261.1Smycroft everything contiguously, even partial segments and their summaries, whereas 1271.1Smycroft Sprite is forced to seek (from the end of the data inode to the summary 1281.1Smycroft which lives at the end of the segment). Additionally, writing the summary 1291.1Smycroft synchronously should cost about 1/2 a rotation per summary. 1301.1Smycroft 1311.1Smycroft3. Sprite LFS distinguishes between different types of blocks in the segment. 1321.1Smycroft Other than inode blocks and data blocks, we don't. 1331.1Smycroft 1341.1Smycroft4. Sprite LFS traverses the IFILE looking for free blocks. We maintain a 1351.1Smycroft free list threaded through the IFILE entries. 1361.1Smycroft 1371.1Smycroft5. The cleaner runs in user space, as opposed to kernel space. It shares 1381.1Smycroft information with the kernel by reading/writing the IFILE and through 1391.1Smycroft cleaner specific system calls. 1401.1Smycroft 141