History log of /src/sys/ufs/ffs/ffs_alloc.c
Revision  Date  Author  Comments
 1.174  27-Jun-2025  andvar s/quadradically/quadratically/ in comments.
 1.173  13-May-2024  msaitoh s/contigous/contiguous/ in comment.
 1.172  07-Jan-2023  chs ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:

commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000

This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.

To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.

Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000

One last pass to get all the unsigned comparisons correct.


In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
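To illustrate the failure mode, here is a minimal sketch. It assumes a 32-bit inode number and a cylinder-group lookup of the form ino / inodes-per-group (the idea behind the ino_to_cg() macro). Both function names are hypothetical and are not code from this commit.

```c
#include <stdint.h>

/* Hypothetical: cylinder group of an inode when ino is held in a signed
 * 32-bit type. Inode numbers >= 2^31 arrive negative, so the quotient
 * is negative as well. */
static int32_t
cg_of_ino_signed(int32_t ino, int32_t ipg)
{
	return ino / ipg;
}

/* Hypothetical: the same lookup with an unsigned 32-bit type, which is
 * correct for every inode number up to 2^32 - 1. */
static uint32_t
cg_of_ino_unsigned(uint32_t ino, uint32_t ipg)
{
	return ino / ipg;
}
```

Converting 0x80000005 to int32_t is implementation-defined in C. On the usual two's-complement compilers it yields a negative number, which is the bug described above. It also shows why newfs must stay below 2^32 inodes: beyond that limit, even the unsigned 32-bit form wraps.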
 1.171  23-Apr-2022  hannken branches: 1.171.4;
Need vnode locked for VOP_FDISCARD().
 1.170  03-Sep-2021  andvar fix typos in comments, mainly s/extention/extension/ and s/sufficent/sufficient/
 1.169  05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.168  26-Jul-2020  chs skip the assertions about page-locking when allocating to the extattr bmap,
since extattrs do not use the page cache.
 1.167  18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.166  23-Feb-2020  ad branches: 1.166.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.165  18-Feb-2020  riastradh Fix non-DIAGNOSTIC build with UVM_PAGE_TRKOWN.
 1.164  14-Apr-2019  kardel branches: 1.164.4; 1.164.6;
PR/53990, PR/52380, PR/52102: UFS2 cylinder group inode allocation botch

Fix rare allocation botch in ffs_nodealloccg().

Conditions:
a) less than
#_of_initialized_inodes(cg->cg_initediblk)
- inodes_per_filesystem_block
are allocated in the cylinder group
b) cg->cg_irotor points to an uninterrupted run of
allocated inodes in the inode bitmap up to the
end of dynamically initialized inodes
(cg->cg_initediblk)

In this case the next inode after this run was returned
without initializing the respective inode block. As the
block is not initialized these inodes could trigger panics
on inode consistency due to old (uninitialized) disk data.

In very rare cases data loss could occur when
the uninitialized inode block is initialized via the
normal mechanism.
Further conditions to occur after the above:
c) no panic
d) no (forced) fsck
e) and more than cg->cg_initediblk - inodes_per_filesystem_block
allocated inodes.

Fix:
Always ensure that the allocated inode lies within the initialized inode
range, extending the initialized inode range as needed.

Add KASSERTMSG() safeguards.

ok hannken@
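The invariant behind the fix can be pictured with a simplified sketch. The structure is hypothetical: its fields only echo cg_initediblk and cg_niblk, and the loop body stands in for the real zeroing and writing of an inode block in ffs_nodealloccg(). The rule it enforces is "never return an inode at or beyond the initialized range; extend the range one inode block at a time first".

```c
#include <stdint.h>

/* Hypothetical, simplified cylinder-group state (not the on-disk struct cg). */
struct cg_sketch {
	uint32_t initediblk;          /* inodes initialized so far */
	uint32_t niblk;               /* total inodes in the group */
	uint32_t blocks_initialized;  /* counts simulated inode-block writes */
};

/*
 * Before handing out cg-relative inode number ino, make sure it lies in
 * the initialized range, extending that range by one inode block
 * (inopb inodes) per step. Returns 0 on success, -1 if ino is outside
 * the group.
 */
static int
ensure_inode_initialized(struct cg_sketch *cg, uint32_t ino, uint32_t inopb)
{
	if (ino >= cg->niblk)
		return -1;
	while (ino >= cg->initediblk && cg->initediblk < cg->niblk) {
		/* a real implementation would zero and write the inode block here */
		cg->initediblk += inopb;
		cg->blocks_initialized++;
	}
	return 0;
}
```

Looping, rather than initializing a single block, covers the case from condition b) above, where the rotor may point several blocks past the initialized range.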
 1.163  10-Dec-2018  jdolecek put back UFS_WAPBL_JUNLOCK_ASSERT(), the underlying rw_write_held() check
doesn't actually have a race since it checks if the rwlock is held by
current lwp
 1.162  10-Dec-2018  jdolecek make UFS_WAPBL_JLOCK_ASSERT() #ifdef DIAGNOSTIC, same as the underlying
function KASSERT(), so that it actually does something; fix code using
it to actually pass correct params, so that it compiles

remove UFS_WAPBL_JUNLOCK_ASSERT(), as that is inherently racy (it's
okay on those places if the rwlock is held by other lwp); depend
on the RW_ASSERT()/LOCKDEBUG inside rw_enter() to catch the case
with wapbl rwlock held by current lwp
 1.161  03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
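The truncation this rename guards against can be shown in a few lines. uimin_sketch() is a hypothetical stand-in for an unsigned-int min() like the one described above, and MIN_SKETCH is a stand-in for the type-generic macro form. Neither is the kernel's own definition.

```c
#include <stdint.h>

/* Stand-in for a min() defined on unsigned int: 64-bit arguments are
 * silently converted (truncated) to 32 bits before comparing. */
static inline unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Stand-in for the MIN()-style macro: no truncation, but each argument
 * may be evaluated twice. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))
```

With big = 0x100000005 and lim = 0x100000010, uimin_sketch(big, lim) compares 5 against 0x10 and returns 5, while MIN_SKETCH(big, lim) returns big unchanged.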
 1.160  19-Jul-2018  ozaki-r Avoid using magic numbers for arguments of workqueue_create (NFC)
 1.159  07-Dec-2017  chs branches: 1.159.2; 1.159.4;
fix the UVM_PAGE_TRKOWN page-locking assertion at the top of ffs_alloc()
to work right for multi-threaded processes.
 1.158  13-Aug-2017  mlelstv Don't time out the discard work queue here. Either destroying a work queue
with pending work items panics or accessing freed resources from the work
item will crash. The timeout needs to be handled gracefully by the driver
that implements the discard operation.

Fixes parts of PR 50725.
 1.157  12-Jul-2017  hannken When initializing more inodes make sure to write them to disk
before writing the cylinder group with updated cg_initediblk.
 1.156  18-Mar-2017  riastradh branches: 1.156.6;
#if DIAGNOSTIC panic ---> KASSERT
 1.155  01-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().
 1.154  30-Oct-2016  christos branches: 1.154.2;
Tidy up panic messages, no functional change.
 1.153  28-Oct-2016  jdolecek reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
succeed; change wapbl_register_deallocation() to return EAGAIN
rather than panic when code hits the limit

callers changed to either loop calling ffs_truncate() using new
utility ufs_truncate_retry() if their semantics requires it, or
just ignore the failure; remove ufs_wapbl_truncate()

this fixes possible user-triggerable panic during truncate, and
resolves WAPBL performance issue with truncates of large files

PR kern/47146 and kern/49175
 1.152  25-Sep-2016  jdolecek adjust ffs_realloccg() so that the logic about allocating full
contiguous block for future fragment expansion doesn't need to call
UFS_WAPBL_REGISTER_DEALLOCATION() or ffs_blkfree(); the free blocks
are now immediately available for use by the expanding file in further i/o

primary driver is safe removal of the deallocation registration and
hence failure point, but this also fixes degenerate case for wapbl,
and similar also for discard - if the file would be actually expanded
before wapbl commit, or before discard queue would be processed,
the filesystem would not yet see the contiguous free blocks, and
would be forced to allocate another fragment elsewhere
 1.151  12-Aug-2015  riastradh branches: 1.151.2;
Need wapbl transaction around ffs_blkfree_cg. Fixes wapbl+discard.
 1.150  08-Aug-2015  mlelstv don't crash when printing error messages when there are no credentials.
don't abuse the printed uid to log the inode number.

The printing/logging of error messages should be simplified.
 1.149  28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.148  17-Mar-2015  hannken Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.147  08-Sep-2014  joerg branches: 1.147.2;
Prefer cprng_fast32 over random. A good distribution even in the lower
bits beats any minor performance advantage random(9) might have,
especially given the disk IO involved.
 1.146  25-Jul-2014  dholland branches: 1.146.2;
Switch the FFS code for discarding free blocks to use VOP_FDISCARD.
 1.145  12-Nov-2013  dholland branches: 1.145.2;
clarify warning printout
 1.144  28-Oct-2013  bad Pull in fix from FreeBSD ffs_alloc.c r121785:
Consider only cylinder groups with at least 75% of the average free space
per cylinder group and 75% of the average free inodes per cylinder group
as candidates for the creation of a new directory. Avoids excessive I/O
scanning for a suitable cylinder group on relatively full file systems.

Tested by sborril and me.

Pullup: netbsd-6, netbsd-5


Original commit message:

Tweak the calculation of minbfree in ffs_dirpref() so that only
those cylinder groups that have at least 75% of the average free
space per cylinder group for that file system are considered as
candidates for the creation of a new directory. The previous formula
for minbfree would set it to zero if the file system was more than
75% full, which allowed cylinder groups with no free space at all
to be chosen as candidates for directory creation, which resulted
in an expensive search for free blocks for each file that was
subsequently created in that directory.

Modify the calculation of minifree in the same way.

Decrease maxcontigdirs as the file system fills to decrease the
likelihood that a cluster of directories will overflow the available
space in a cylinder group.

Reviewed by: mckusick
Tested by: kmarx@vicor.com
MFC after: 2 weeks
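The difference between the two minbfree thresholds can be checked with a small sketch. The "old" formula here is inferred from the commit message: it subtracts a quarter of the group's capacity, so it reaches zero once the file system is more than 75% full. The "new" one takes 75% of the average free count. Clamping the new value to at least one block is an assumption of this sketch. Neither function is copied from ffs_dirpref().

```c
#include <stdint.h>

/* Inferred old threshold: avgbfree minus a quarter of the per-group
 * capacity, clamped at zero. */
static int64_t
minbfree_old(int64_t avgbfree, int64_t blks_per_cg)
{
	int64_t m = avgbfree - blks_per_cg / 4;
	return m < 0 ? 0 : m;
}

/* New threshold: 75% of the average free blocks per group
 * (the lower clamp of 1 is an assumption). */
static int64_t
minbfree_new(int64_t avgbfree)
{
	int64_t m = avgbfree - avgbfree / 4;
	return m < 1 ? 1 : m;
}
```

With 1000 blocks per group and an average of 200 free (80% full), the old threshold is 0, so a group with no free blocks still passes a "free >= minbfree" test. The new threshold is 150, which rejects such a group.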
 1.143  20-Oct-2013  christos always declare needswap
 1.142  20-Oct-2013  christos always declare needswap
 1.141  19-Oct-2013  martin Eliminate a variable only used in diagnostic kernels
 1.140  30-Sep-2013  hannken Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>
 1.139  12-Sep-2013  martin #ifdef a variable just like their use
 1.138  23-Jun-2013  dholland branches: 1.138.2;
Stick ffs_ in front of the following macros:
fragstoblks()
blkstofrags()
fragnum()
blknum()

to finish the job of distinguishing them from the lfs versions, which
Christos renamed the other day.

I believe this is the last of the overtly ambiguous exported symbols
from ffs... or at least, the last of the ones that conflicted with lfs.
ffs still pollutes the C namespace very broadly (as does ufs) and this
needs quite a bit more cleanup.

XXX: boo on macros with lowercase names. But I'm not tackling that just yet.
 1.137  23-Jun-2013  dholland Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.136  23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.135  19-Jun-2013  dholland blkoff() -> ffs_blkoff() stragglers
 1.134  19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.133  22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.132  20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.131  19-Oct-2012  drochner Implement experimental support to pass notifications that a file
was deleted from the filesystem to the disk driver, commonly
known as "discard" or "trim".
fs/driver support is in ffs and ata wd for now.
This is what was posted here:
http://mail-index.netbsd.org/tech-kern/2012/02/28/msg012813.html
with minor cleanup, and the global switch replaced by a mount option.
 1.130  28-Nov-2011  tls branches: 1.130.4; 1.130.8;
Remove arc4random() and arc4randbytes() from the kernel API. Replace
arc4random() hacks in rump with stubs that call the host arc4random() to
get numbers that are hopefully actually random (arc4random() keyed with
stack junk is not). This should fix some of the currently failing anita
tests -- we should no longer generate duplicate "random" MAC addresses in
the test environment.
 1.129  20-Sep-2011  chs branches: 1.129.2;
strengthen the assertions about pages existing during block allocation,
which were incorrectly relaxed last year. add some comments so that
the intent of these is hopefully clearer.

in ufs_balloc_range(), don't free pages or mark them dirty if
allocating their backing store failed. this fixes PR 45369.
 1.128  12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.127  06-Mar-2011  bouyer branches: 1.127.2;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.126  06-Mar-2011  rmind {ffs_nodealloccg,ext2fs_nodealloccg,ext2fs_mapsearch}: use XOR and ffs()
to find free bits in the inode and block bitmaps, instead of the loop.

Obtained from FreeBSD (changes by jhb).
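The XOR-and-ffs() idea can be sketched as follows. The sketch assumes a bitmap in which a set bit means "in use" (as in the inode map), with bit i%8 of byte i/8 standing for entry i. XOR with 0xff turns free slots into set bits, and POSIX ffs() returns the 1-based index of the lowest set bit, or 0 if none is set. This is an illustration of the technique, not the kernel loop.

```c
#include <strings.h>  /* ffs() */

/* Return the index of the first clear (free) bit in map[0..nbytes-1],
 * or -1 if every bit is set (in use). */
static int
first_free_bit(const unsigned char *map, int nbytes)
{
	for (int i = 0; i < nbytes; i++) {
		int freebits = map[i] ^ 0xff;  /* free slots become 1 bits */
		if (freebits != 0)
			return i * 8 + ffs(freebits) - 1;
	}
	return -1;
}
```

For the map { 0xff, 0x07 }, byte 1 XORs to 0xf8, ffs() returns 4, and the first free entry is 8 + 3 = 11. One byte test replaces up to eight single-bit probes.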
 1.125  21-Feb-2010  mlelstv branches: 1.125.2; 1.125.4; 1.125.6;
For the UVM_PAGE_TRKOWN test do not require that the relevant pages
must exist.
 1.124  07-May-2009  elad branches: 1.124.2;
Introduce several actions/requests for authorizing file-system related
operations, specifically quota and block allocation from reserved space.

Modify ufs_quotactl() to accommodate passing "mp" earlier by vfs_busy()ing
it a little bit higher.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/04/26/msg004936.html

Note that the umapfs request mentioned in this thread was NOT added as
there is still on-going discussion regarding the proper implementation.
 1.123  25-Apr-2009  sborrill Fix random 'filesystem full' messages by trapping a couple of 32-bit
overflow areas missed in rev 1.110 and switching cgbase().

Kudos to rump_ffs!
 1.122  22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.121  22-Feb-2009  ad PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc

- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.

- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.

- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Presently they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.

- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.

- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.

- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
 1.120  11-Jan-2009  christos branches: 1.120.2;
merge christos-time_t
 1.119  06-Dec-2008  joerg Split ffs_freefile into a frontend for normal cylinder group and for
snapshot use. Adjust ffs_blkfree_common to get the fs instance passed
in, the original commit didn't account blocks in the snapshots
correctly. Assert that ffs_blkfree is used with the primary fs instance
and that ffs_checkfreefile is only used for snapshots. Move the bdwrite
from ffs_blkfree_common into the caller for symmetry. This creates a
redundant write of unmodified data for ffs_blkfree_snap if a double free
of a block happens.

Reviewed and tested by hannken@.
 1.118  01-Dec-2008  joerg Revert last. Conditionalize variables on FFS_EI.
 1.117  01-Dec-2008  cegger build fix: remove unused variables
 1.116  01-Dec-2008  joerg ffs_blkfree is used in two different ways. The normal usage is to free a
block in the cylinder groups of the filesystem. The other user is the
snapshot code, which wants to modify the copied cylinder groups. Use
different frontends to distinguish the cases in preparation for fine
grained locking for cylinder groups.
 1.115  30-Nov-2008  joerg Split ffs_blkalloc into a frontend that does inode based consistency
checks and a backend that just asserts them. Use the backend in
ffs_wapbl_abort_sync_metadata instead of faking an inode.
 1.114  06-Nov-2008  joerg Remove XXXUBC code for ffs_reallocblks, that has been conditionalized in
2002 and #if 0'ed in 2005. It would need a considerable amount of work
to bring back and obscures the more important block allocation.
 1.113  06-Aug-2008  hannken branches: 1.113.2; 1.113.4;
Do not call UFS_WAPBL_*() when ffs_freefile() is acting on a snapshot.

While here replace the test for VBLK with a convenience variable.
 1.112  31-Jul-2008  hannken Resolve a deadlock when fs_nodealloccg() initializes more inodes on
an UFS2 file system. With the current cylinder group buffer busy it
calls ffs_getblk(). This runs through copy-on-write and may need the
current cylinder group buffer to allocate a new block for the snapshot.

While here write the cylinder group buffer synchronously after
cg_initediblk was changed because fsck_ffs will trust it.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.111  31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.110  11-Jul-2008  simonb Fix potential 32-bit overflow problem in the blockpref code.
mlelstv@ points out FreeBSD fixed the same thing a couple of years
ago - here's the commit message they used on rev 1.127:

Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.

Submitted by: Henry Whincup <henry@jot.to>
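The class of bug fixed here (and again in rev 1.123 via cgbase()) can be illustrated with a hypothetical cgbase()-style calculation, where the first fragment of cylinder group c is fpg * c. The sketch uses unsigned 32-bit arithmetic so that the wraparound is well-defined and easy to demonstrate. It is not the kernel macro.

```c
#include <stdint.h>

/* Product formed in 32 bits and only then widened: wraps modulo 2^32. */
static int64_t
cgbase_32(uint32_t fpg, uint32_t c)
{
	uint32_t base = fpg * c;
	return base;
}

/* Widen one operand first so the multiply itself is 64-bit. */
static int64_t
cgbase_64(uint32_t fpg, uint32_t c)
{
	return (int64_t)fpg * c;
}
```

With 2^20 fragments per group, group 4096 starts at fragment 2^32. The 32-bit form returns 0 and points block preference back at the start of the disk. The widened form returns the correct address.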
 1.109  04-Jun-2008  ad branches: 1.109.2; 1.109.4;
When setting DONE on the buffer, assert that there are no waiters in
biowait().
 1.108  03-Jun-2008  hannken ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.107  16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.106  21-Jan-2008  pooka branches: 1.106.6; 1.106.8; 1.106.10; 1.106.12; 1.106.14;
Sprinkle comments about um_lock status on function entry and exit.
No functional change.
 1.105  02-Jan-2008  ad Merge vmlocking2 to head.
 1.104  01-Nov-2007  hannken branches: 1.104.2; 1.104.4; 1.104.8;
Avoid doing bawrite to initialize inode block while holding cylinder
group block buffer busy. If filesystem has any active snapshots, bawrite
can come back trying to allocate new snapshot data block from the same
cylinder group and cause deadlock.

From FreeBSD Rev. 1.117
 1.103  18-Oct-2007  hannken Ffs_blkfree() and ffs_freefile() take a devvp that may be a regular file whencalled from snapshot creation. Be sure to use the right mount.

Ok: Andrew Doran <ad@netbsd.org>
 1.102  10-Oct-2007  ad branches: 1.102.2;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.101  08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.100  09-Aug-2007  hannken branches: 1.100.2; 1.100.4;
Move snapshot per-mount data from struct ufsmount to mount specific data.
No functional changes.

Welcome to 4.99.28 (struct ufsmount changed size)
 1.99  16-Jul-2007  pooka branches: 1.99.2; 1.99.6;
When allocating blocks, check minfree before asking kauth about
suser. The latter has unknown cost and rarely needs to be called.
 1.98  04-Mar-2007  christos branches: 1.98.2; 1.98.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.97  04-Jan-2007  elad branches: 1.97.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.96  16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.95  15-Oct-2006  yamt ffs_alloc: remove an assertion which is no longer true.
 1.94  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.93  23-Jun-2006  yamt branches: 1.93.4; 1.93.6;
fix a simonb-timecounters regression.
the precision of getnanotime() is not suitable for file timestamps.
esp. when it's nfs-exported.

- introduce vfs_timestamp().
(the name is from freebsd. currently merely a wrapper of nanotime())
- for ufs-like filesystems, use it rather than getnanotime().

XXX check other filesystems.
 1.92  07-Jun-2006  kardel branches: 1.92.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.91  14-May-2006  elad branches: 1.91.2;
integrate kauth.
 1.90  23-Dec-2005  yamt branches: 1.90.4; 1.90.6; 1.90.8; 1.90.10; 1.90.12;
prevent in-core vnode being freed from getting new references.
otherwise, once the corresponding bit in the inode bitmap is cleared,
an unrelated inode with the same inode number can be allocated and
ufs_ihashget() picks a stale in-core vnode for it.

PR/32301 by Matthias Scheler.
 1.89  27-Nov-2005  dsl Force some multiplies to give a 64 bit result to avoid dirsize being zero
and causing a divide by zero trap later.
Fixes a panic noted in netbsd-help.
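Why a large product can end up as a zero divisor (see also rev 1.74 below): a sketch assuming dirsize is roughly average file size times average files per directory, as in ffs_dirpref(). The parameter names and the contig_dirs() helper are hypothetical.

```c
#include <stdint.h>

/* 32-bit product: can wrap to exactly 0 for large averages. */
static uint32_t
dirsize_32(uint32_t avgfilesize, uint32_t avgfpdir)
{
	return avgfilesize * avgfpdir;
}

/* 64-bit product: nonzero whenever both inputs are nonzero. */
static uint64_t
dirsize_64(uint32_t avgfilesize, uint32_t avgfpdir)
{
	return (uint64_t)avgfilesize * avgfpdir;
}

/* Hypothetical: how many directories of that size fit in free_bytes.
 * Safe only because dirsize_64() cannot wrap to zero here. */
static uint64_t
contig_dirs(uint64_t free_bytes, uint32_t avgfilesize, uint32_t avgfpdir)
{
	return free_bytes / dirsize_64(avgfilesize, avgfpdir);
}
```

With a 16 MiB average file size and 256 files per directory, the 32-bit product is 2^32, which wraps to 0. Using that value as a divisor is the trap the commit describes.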
 1.88  02-Nov-2005  yamt branches: 1.88.2;
merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.87  26-Sep-2005  yamt branches: 1.87.2;
always use nanotime rather than time.
it's bad to mix nanotime and time because it sometimes
makes timestamps go backwards.
 1.86  19-Aug-2005  christos 64 bit inode changes.
 1.85  15-Jul-2005  thorpej Use ANSI function decls.
 1.84  06-Jun-2005  dbj branches: 1.84.2;
remove (long) cast on bpref, which is daddr_t
 1.83  29-May-2005  christos - sprinkle const
- avoid shadow variables.
 1.82  22-May-2005  hannken ffs/ffs_alloc.c:
- Add a missing ACTIVECG_CLR().

ffs/ffs_snapshot.c:
- Use async/delayed writes for snapshot creation and sync/uncache these buffers
on end. Reduces the time the file system must be suspended.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
- Byte swap the list of preallocated blocks on read/write instead of access.
- Always keep this list on ip->i_snapblklist so it may be rolled back when the
newest snapshot gets removed. Fixes a rare snapshot corruption when using
more than one snapshot on a file system.

ufs/ufsmount.h:
- Make TAILQ_LAST() possible on member um_snapshots.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
 1.81  26-Feb-2005  perry branches: 1.81.2;
nuke trailing whitespace
 1.80  15-Dec-2004  mycroft branches: 1.80.2; 1.80.4;
Remove some unnecessary (int32_t) casts that would cause us to screw up the
top bit in block addresses.

Also, change some daddr_t->int32_t casts (mostly as arguments to ufs_rw32(),
where they would get promoted anyway) to u_int32_t.
 1.79  11-Oct-2004  dbj print absolute inode number in debug output when freeing free inode occurs.
previously, the number was relative to the cylinder group, which was confusing.
prefix debug message with "ifree:" so this can be differentiated in bug reports.
 1.78  29-Aug-2004  hannken While creating a snapshot inodes must be freed from the
snapshot, not from the file system.
ffs_freefile() needs explicit "fs" and "devvp" arguments.
 1.77  26-May-2004  hannken Don't use VTOI(vp)->i_flags to test for snapshot devices. Will not work
for non-UFS file systems. Test for VBLK vnode instead.
 1.76  25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.75  18-Apr-2004  dbj when enabling ffs compatibility in ffs_reload, use
sblockloc that superblock was read from
also note XXX that ffs_reload doesn't handle superblock moving
 1.74  13-Jan-2004  soren branches: 1.74.2;
With large average filesizes, it was possible to overflow dirsize to zero,
causing a division by zero in ffs_dirpref().

From Barry Bouwsma of Tiengen.
 1.73  09-Jan-2004  dbj never upgrade the superblock or set FS_FLAGS_UPDATED in fs_old_flags
add compatibility for filesystems created before FFSv2 integration
these patches are from pr port-macppc/23926 and should also fix
problems discussed in pr kern/21404 and pr kern/21283
 1.72  30-Dec-2003  pk Replace the traditional buffer memory management -- based on fixed per buffer
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.

This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.

On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
 1.71  27-Nov-2003  mycroft Remove part of previous -- there is NO reason for directory allocation to use
arc4random().
 1.70  05-Sep-2003  itojun use arc4random instead of random (mask with INT32_MAX to avoid getting
negative numbers unexpectedly).
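The following small sketch shows the masking step that the commit describes. Clearing bit 31 of a 32-bit random value guarantees a non-negative int32_t. The function name is hypothetical. The random value is passed in rather than obtained from arc4random(), so the example stays portable.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clear the sign bit so the value converts to a non-negative int32_t.
 * r would normally come from a 32-bit random source.
 */
static int32_t
nonneg_random(uint32_t r)
{
	return (int32_t)(r & (uint32_t)INT32_MAX);
}
```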
 1.69  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.68  29-Jun-2003  fvdl branches: 1.68.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.67  29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.66  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.65  15-May-2003  kristerw The C language does not permit statements of the form
(X ? Y : Z) = 0;
even though gcc handles this by a stupid extension.

Transform these to correct C.

Approved by fvdl.
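For illustration, here are two standard C rewrites of the rejected form. The function and variable names are hypothetical, and this log does not say which rewrite the actual commit used. Selecting an address with the conditional operator keeps the one-line shape. An if/else is the most explicit form.

```c
#include <assert.h>

/* Explicit form: branch, then assign to the chosen variable. */
static void
clear_selected_if(int cond, int *y, int *z)
{
	if (cond)
		*y = 0;
	else
		*z = 0;
}

/* Compact form: the conditional picks an address, which is then stored through. */
static void
clear_selected_ptr(int cond, int *y, int *z)
{
	*(cond ? y : z) = 0;
}
```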
 1.64  04-May-2003  gmcgarry Print pid on error. From Greg A. Woods in PR#17393.
 1.63  17-Apr-2003  fvdl contigdirs was changed to an array of u_int8_t, so don't compare values
to 65535.
 1.62  12-Apr-2003  fvdl Use variables for some cg accesses; makes things more readable and more
similar to FreeBSD. No functional change.
 1.61  10-Apr-2003  fvdl Initialize the 'mirror' i_flags field in struct inode to 0.
 1.60  02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.59  26-Jan-2003  tsutsui More printf format cleanup to reduce casts.
 1.58  24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.57  27-Dec-2002  hannken Clear IN_SPACECOUNTED on (re-)used inodes.
This cures the "unmount pending error:" on softdep umounts.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.56  27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.55  14-May-2002  matt branches: 1.55.4;
Comment out code that's no longer used.
 1.54  10-Apr-2002  mycroft Use blkstofrags() and fragstoblks(). Use &(NBBY-1) rather than %NBBY.
Switch off of fs_fragshift rather than fs_frag (generates better jump tables).
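The following sketch shows why these rewrites are equivalent. It assumes unsigned operands and power-of-two sizes. Under those conditions, n % NBBY equals n & (NBBY - 1), and converting between blocks and fragments reduces to shifts by a fragshift value. The helper names are hypothetical, and fragshift stands in for fs_fragshift.

```c
#include <assert.h>

#ifndef NBBY
#define NBBY 8		/* bits per byte, as BSD system headers define it */
#endif

/* Same result as n % NBBY for unsigned n, without a division. */
static unsigned
bit_in_byte(unsigned n)
{
	return n & (NBBY - 1);
}

/* With 1 << fragshift fragments per block, conversions are shifts. */
static unsigned long
blks_to_frags(unsigned long blks, unsigned fragshift)
{
	return blks << fragshift;
}

static unsigned long
frags_to_blks(unsigned long frags, unsigned fragshift)
{
	return frags >> fragshift;	/* rounds down to whole blocks */
}
```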
 1.53  30-Oct-2001  lukem add __KERNEL_RCSID()
 1.52  19-Sep-2001  lukem branches: 1.52.2;
- ffs_blkpref() changes:
- don't bother updating fs->fs_cgrotor, since it's actually not used in
the kernel. from Manuel Bouyer in [kern/3389]
- when examining cylinder groups from startcg to startcg-1 (wrapping
at fs->fs_ncg), there's no need to check startcg at the end as well
as the start...
- highlight in the struct fs declaration that fs_cgrotor is UNUSED
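The following hypothetical sketch shows a wrapping scan that examines each cylinder group exactly once, so startcg is not checked again at the end. The free-block array and the threshold test are illustrative stand-ins, not the ffs_blkpref() logic.

```c
#include <assert.h>

/*
 * Visit every cylinder group once, starting at startcg and wrapping
 * at ncg; return the first with at least "need" free blocks.
 */
static int
scan_cgs(const long *nbfree, int ncg, int startcg, long need)
{
	assert(ncg > 0 && startcg >= 0 && startcg < ncg);
	for (int i = 0; i < ncg; i++) {
		int cg = (startcg + i) % ncg;	/* wraps to 0 after ncg - 1 */

		if (nbfree[cg] >= need)
			return cg;
	}
	return -1;	/* each group was checked once; none qualified */
}
```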
 1.51  15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to e.g. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.50  06-Sep-2001  lukem branches: 1.50.2;
Incorporate the enhanced ffs_dirpref() by Grigoriy Orlov, as found in
FreeBSD (three commits; the initial work, man page updates, and a fix
to ffs_reload()), with the following differences:
- Be consistent between newfs(8) and tunefs(8) as to the options which
set and control the tuning parameters for this work (avgfilesize & avgfpdir)
- Use u_int16_t instead of u_int8_t to keep track of the number of
contiguous directories (suggested by Chuck Silvers)
- Work within our FFS_EI framework
- Ensure that fs->fs_maxclusters and fs->fs_contigdirs don't point to
the same area of memory

The new algorithm has a marked performance increase, especially when
performing tasks such as untarring pkgsrc.tar.gz, etc.

The original FreeBSD commit messages are attached:

=====
mckusick 2001/04/10 01:39:00 PDT
Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>.
His description of the problem and solution follow. My own tests show
speedups on typical filesystem intensive workloads of 5% to 12% which
is very impressive considering the small amount of code change involved.

------

One day I noticed that some file operations run much faster on
small file systems than on big ones. I've looked at the ffs
algorithms, thought about them, and redesigned the dirpref algorithm.

First I want to describe the results of my tests. These results are old
and I have improved the algorithm after these tests were done. Nevertheless
they show how big the performance speedup may be. I have done two file/directory
intensive tests on two OpenBSD systems with the old and new dirpref algorithms.
The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports".
The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release.
It contains 6596 directories and 13868 files. The test systems are:

1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for
test is at wd1. Size of test file system is 8 Gb, number of cg=991,
size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current
from Dec 2000 with BUFCACHEPERCENT=35

2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system
at wd0, file system for test is at wd1. Size of test file system is 40 Gb,
number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k
OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50

You can get more info about the test systems and methods at:
http://www.ptci.ru/gluk/dirpref/old/dirpref.html

Test Results

          tar -xzf ports.tar.gz              rm -rf ports
mode      old dirpref  new dirpref  speedup  old dirpref  new dirpref  speedup
First system
normal    667          472          1.41     477          331          1.44
async     285          144          1.98     130          14           9.29
sync      768          616          1.25     477          334          1.43
softdep   413          252          1.64     241          38           6.34
Second system
normal    329          81           4.06     263.5        93.5         2.81
async     302          25.7         11.75    112          2.26         49.56
sync      281          57.0         4.93     263          90.5         2.9
softdep   341          40.6         8.4      284          4.76         59.66

"old dirpref" and "new dirpref" columns give the test time in seconds;
speedup is the ratio old dirpref / new dirpref.

------

Algorithm description

The old dirpref algorithm is described in comments:

/*
* Find a cylinder to place a directory.
*
* The policy implemented by this algorithm is to select from
* among those cylinder groups with above the average number of
* free inodes, the one with the smallest number of directories.
*/
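The old policy can be sketched in C as follows. This is an illustration, not the historical kernel code. The per-group arrays stand in for the cs_nifree and cs_ndir summary counts, and "above the average" is taken here to mean at or above the integer average.

```c
#include <assert.h>

/*
 * Old policy: among groups with at least the average number of free
 * inodes, pick the one holding the fewest directories.
 */
static int
old_dirpref(const int *nifree, const int *ndir, int ncg)
{
	long total = 0;
	int best = -1;

	assert(ncg > 0);
	for (int cg = 0; cg < ncg; cg++)
		total += nifree[cg];
	long avgifree = total / ncg;

	for (int cg = 0; cg < ncg; cg++) {
		if (nifree[cg] >= avgifree &&
		    (best == -1 || ndir[cg] < ndir[best]))
			best = cg;
	}
	return best;	/* always found: the maximum is >= the average */
}
```

Because the choice ignores where the parent lives, consecutive directories tend to land in different groups. That is the spreading effect described next.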

A new directory is allocated in a different cylinder group from its
parent directory, resulting in a directory tree that is spread across
all the cylinder groups. This spreading out results in non-optimal
access to the directories and files. When we have a small filesystem
it is not a problem, but when the filesystem is big the performance
degradation becomes very apparent.

What do I mean by a big file system?

1. A big filesystem is one that occupies 20-30 percent or more
of total drive space, i.e. first and last cylinder are physically
located relatively far from each other.
2. It has a relatively large number of cylinder groups, for example
more cylinder groups than 50% of the buffers in the buffer cache.

The first results in long access times, while the second results in
many buffers being used by metadata operations. Such operations use
cylinder group blocks and on-disk inode blocks. The cylinder group
block (fs->fs_cblkno) contains struct cg, inode and block bit maps.
It is 2k in size for the default filesystem parameters. If new and
parent directories are located in different cylinder groups then the
system performs more input/output operations and uses more buffers.
On filesystems with many cylinder groups, lots of cache buffers are
used for metadata operations.

My solution for this problem is very simple. I allocate many directories
in one cylinder group. I also take steps so that the new allocation
method does not cause excessive fragmentation and directory inodes
are not located far from their files' inodes and data.
The algorithm is:
/*
* Find a cylinder group to place a directory.
*
* The policy implemented by this algorithm is to allocate a
* directory inode in the same cylinder group as its parent
* directory, but also to reserve space for its files inodes
* and data. Restrict the number of directories which may be
* allocated one after another in the same cylinder group
* without intervening allocation of files.
*
* If we allocate a first level directory then force allocation
* in another cylinder group.
*/
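The following simplified C sketch shows the core search in the new policy, under stated assumptions. Starting at the parent's group, it takes the first group that still has a minimum number of free inodes and blocks and has not reached the consecutive-directory limit. The struct and thresholds are hypothetical. The forced placement of first-level directories and any fallback search are left to the caller.

```c
#include <assert.h>

struct cgsum {			/* hypothetical per-group summary */
	int nifree;		/* free inodes */
	int nbfree;		/* free blocks */
	int contigdirs;		/* directories created with no file in between */
};

/*
 * Prefer the parent's group (prefcg), then later groups with
 * wraparound, skipping groups that are short on space or that have
 * already taken maxcontigdirs directories in a row.
 */
static int
new_dirpref(const struct cgsum *cgs, int ncg, int prefcg,
    int minifree, int minbfree, int maxcontigdirs)
{
	assert(ncg > 0 && prefcg >= 0 && prefcg < ncg);
	for (int i = 0; i < ncg; i++) {
		int cg = (prefcg + i) % ncg;

		if (cgs[cg].nifree >= minifree &&
		    cgs[cg].nbfree >= minbfree &&
		    cgs[cg].contigdirs < maxcontigdirs)
			return cg;
	}
	return -1;	/* caller falls back to a looser search */
}
```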

My early versions of dirpref gave good results for a wide range of
file operations and different filesystem capacities except one case:
those applications that create their entire directory structure first
and only later fill this structure with files.

My solution for such and similar cases is to limit the number of
directories which may be created one after another in the same cylinder
group without intervening file creations. For this purpose, I allocate
an array of counters at mount time. This array is linked to the superblock
fs->fs_contigdirs[cg]. Each time a directory is created the counter
increases and each time a file is created the counter decreases. A 60Gb
filesystem with 8mb/cg requires 10kb of memory for the counters array.
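The following sketch shows the counter updates. It assumes one 8-bit counter per cylinder group that saturates at 255 and at 0, so it never wraps. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A directory was created in cg: one more directory without a file. */
static void
note_dir_created(uint8_t *contigdirs, int cg)
{
	if (contigdirs[cg] < UINT8_MAX)
		contigdirs[cg]++;
}

/* A file was created in cg: relax the run of directories. */
static void
note_file_created(uint8_t *contigdirs, int cg)
{
	if (contigdirs[cg] > 0)
		contigdirs[cg]--;
}
```

With one byte per group, a 60 GB file system with 8 MB groups has 61440 / 8 = 7,680 groups and needs 7,680 bytes. That is the same order as the roughly 10 KB quoted above.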

maxcontigdirs is the maximum number of directories that may be created
without an intervening file creation. I found in my tests that the best
performance occurs when I restrict the number of directories in one cylinder
group such that all its files may be located in the same cylinder group.
There may be some deterioration in performance if all the file inodes
are in the same cylinder group as their containing directory, but their
data partially resides in a different cylinder group. The maxcontigdirs
value is calculated to try to prevent this condition. Since there is
no way to know how many files and directories will be allocated later
I added two optimization parameters in superblock/tunefs. They are:

int32_t fs_avgfilesize; /* expected average file size */
int32_t fs_avgfpdir; /* expected # of files per directory */

These parameters have reasonable defaults but may be tweaked for special
uses of a filesystem. They are only necessary in rare cases like better
tuning a filesystem being used to store a squid cache.
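The following approximate sketch turns these two tunables into a maxcontigdirs limit. It assumes the limit is bounded in two ways: by how many expected-size directories fit in a group's free space, and by how many directories' worth of inodes the group has free. The result is capped at 255 and never drops below 1. The parameter names are illustrative; the real computation works from superblock summary fields.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Expected directory footprint: avgfpdir files of avgfilesize bytes
 * each, plus avgfpdir inodes.  The product is taken in 64 bits so it
 * cannot wrap.
 */
static uint64_t
calc_maxcontigdirs(uint64_t free_bytes, uint64_t free_inodes,
    uint32_t avgfilesize, uint32_t avgfpdir)
{
	uint64_t dirsize = (uint64_t)avgfilesize * avgfpdir;
	uint64_t n = 255;			/* assumed counter ceiling */

	if (dirsize > 0 && free_bytes / dirsize < n)
		n = free_bytes / dirsize;	/* bounded by data space */
	if (avgfpdir > 0 && free_inodes / avgfpdir < n)
		n = free_inodes / avgfpdir;	/* bounded by inodes */
	return n > 0 ? n : 1;			/* always allow at least one */
}
```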

I have been using this algorithm for about 3 months. I have done
a lot of testing on filesystems with different capacities, average
filesize, average number of files per directory, and so on. I think
this algorithm has no negative impact on filesystem performance. It
works better than the default one in all cases. The new dirpref
will greatly improve untarring/removing/copying of big directories,
decrease load on cvs servers and much more. The new dirpref doesn't
speed up compilation, but it doesn't slow it down either.

Obtained from: Grigoriy Orlov <gluk@ptci.ru>
=====

=====
iedowse 2001/04/23 17:37:17 PDT
Pre-dirpref versions of fsck may zero out the new superblock fields
fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause
panics if these fields were zeroed while a filesystem was mounted
read-only, and then remounted read-write.

Add code to ffs_reload() which copies the fs_contigdirs pointer
from the previous superblock, and reinitialises fs_avgf* if necessary.

Reviewed by: mckusick
=====

=====
nik 2001/04/10 03:36:44 PDT
Add information about the new options to newfs and tunefs which set the
expected average file size and number of files per directory. Could do
with some fleshing out.
=====
 1.49  31-Aug-2001  lukem no need to cast arg to lblktosize() any more
 1.48  30-Aug-2001  lukem be consistent when casting arg to lblktosize() in UVM_PAGE_TRKOWN debug code
 1.47  24-Aug-2001  wiz heirarchy -> hierarchy
 1.46  20-Aug-2001  wiz precede, not preceed.
 1.45  09-Aug-2001  lukem correctly cast arguments to scanc()
 1.44  03-Jun-2001  chs branches: 1.44.4;
fix an error case for quotas.
 1.43  30-May-2001  mrg use _KERNEL_OPT
 1.42  13-Mar-2001  sommerfeld Change ffs_dirpref() to pay attention to the amount of available free
space before deciding which cylinder group should contain a new directory
inode.

Fixes kern/11983; works around some, but not all, of the side effects
of kern/11989.

Tested by me for well over a month on my laptop; preliminary versions of
the fix were tested by Frank van der Linden and Herb Peyerl.
 1.41  05-Feb-2001  chs branches: 1.41.2;
add casts to an assertion in ffs_alloc() so it works with offsets past 4GB.
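The following sketch shows the kind of problem such casts fix, assuming a 32-bit int. Multiplying two 32-bit values wraps before the result is widened, so offsets of 4 GB or more come out wrong. The function names are hypothetical and do not reproduce the ffs_alloc() assertion itself.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplied in 32 bits (with a 32-bit int), then widened: wraps at 4 GB. */
static uint64_t
offset_narrow(uint32_t lbn, uint32_t bsize)
{
	return lbn * bsize;
}

/* Widened before multiplying: the full 64-bit offset survives. */
static uint64_t
offset_wide(uint32_t lbn, uint32_t bsize)
{
	return (uint64_t)lbn * bsize;
}
```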
 1.40  18-Jan-2001  jdolecek constify
 1.39  30-Nov-2000  nathanw Don't set the value of doreallocblks here; it's defined over in vfs_cluster.c
In fact, doreallocblks isn't used here at all. Delete the declaration.
 1.38  30-Nov-2000  jdolecek change vfs.ffs.doreallocblks to 1 by default - this does not have
any bad symptoms any more; the fix for the bug causing problems with this
option was in 4.4BSD-Lite2 and was pulled in together with the softdep changes

See also Keith Smith & Margo Seltzer's paper on the topic at
http://www.eecs.harvard.edu/~keith/papers/realloc.ps.gz
 1.37  27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.36  28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.35  19-May-2000  thorpej branches: 1.35.4;
NULL != 0
 1.34  04-Apr-2000  jdolecek Add a new sysctl variable vfs.ffs.log_changeopt - if this is true,
an optimization strategy change is logged into syslog. Default
is 0 (to not log). This replaces the recent not quite "right"
change to only log the change if kernel is compiled with DEBUG.
 1.33  30-Mar-2000  augustss Remove register declarations.
 1.32  29-Mar-2000  jdolecek Log the optimization changes only if DEBUG. Fixes kern/9697
 1.31  14-Feb-2000  fvdl Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
 1.30  15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.29  24-Mar-1999  mrg branches: 1.29.4; 1.29.8; 1.29.10; 1.29.14;
completely remove Mach VM support. all that is left is the all the
header files as UVM still uses (most of) these.
 1.28  05-Mar-1999  mycroft Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.27  12-Nov-1998  thorpej defopt FFS_EI
 1.26  18-Aug-1998  thorpej branches: 1.26.2;
Back out part of last change (uninitialized work-around).
 1.25  18-Aug-1998  thorpej Add some braces to make egcs happy (ambiguous else warning). Also,
deal with bogus uninitialized warning (__noreturn__ related)
 1.24  09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.23  28-Jul-1998  drochner The fragtbl[], inside[] and around[] variables are needed by "fsck",
so we can't put them inside "#ifdef _KERNEL".
Put declarations inside .c files where needed to preserve namespace.
 1.22  09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.21  08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.20  19-Mar-1998  ross Fix a 64-bit pointer/int warning.
 1.19  18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
on "options FFS_EI". The superblock and inodes (without block addresses) are
byteswapped at disk read/write time; other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
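The following minimal sketch illustrates the swap-on-use idea. It assumes a per-mount needswap flag like the one kept in um_flags. A 32-bit on-disk value is byte-swapped only when the file system's byte order differs from the host's. The helpers are illustrative and are not NetBSD's macros.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value without compiler builtins. */
static uint32_t
bswap32_portable(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	    ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Read a 32-bit metadata field, swapping only for foreign-endian file systems. */
static uint32_t
fs_rw32(uint32_t ondisk, int needswap)
{
	return needswap ? bswap32_portable(ondisk) : ondisk;
}
```

Swapping at the point of use keeps buffers in on-disk order. This matches the commit's note that such metadata is accessed directly in the buffer cache.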
 1.18  01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.17  10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.16  05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.15  11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added an "union inode_ext" to struct inode, for the per-fs extentions.
For now only ext2fs uses it.
- i_din is now an union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.14  10-Mar-1997  mycroft Just increment the generation count. Using the time is bogus and defeats
fsirand(8).
 1.13  12-Oct-1996  christos branches: 1.13.6;
revert previous kprintf changes
 1.12  10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.11  11-May-1996  mycroft Change VOP_UPDATE() semantics:
* Make 2nd and 3rd args timespecs, not timevals.
* Consistently pass a Boolean as the 4th arg (except in LFS).
Also, fix ffs_update() and lfs_update() to actually change the nsec fields.
 1.10  17-Mar-1996  christos Fix printf format strings
 1.9  09-Feb-1996  christos ffs prototypes
 1.8  19-Jul-1995  cgd don't just throw away updates to the cylinder group bitmaps, actually
write them to disk! From Keith Smith at Harvard, via Kirk McKusick.
fixes the occasional `blkfree: freeing free block' that has been seen
when cluster reallocation code is enabled.
 1.7  24-Mar-1995  cgd explicitly cast &time to (struct timeval *) when passing it to VOP_UPDATE.
new prototypes and picky compilers make a volatile mess.
 1.6  16-Dec-1994  mycroft Ignore rotational optimization if nrpos == 1, as suggested by Stefan Esser.
 1.5  14-Dec-1994  mycroft Sync with CSRG.
 1.4  20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.3  04-Jul-1994  mycroft Do the doasyncfree conditionalization better.
 1.2  29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1  08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.3  01-Mar-1998  fvdl Import some files that were changed after Lite2
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.13.6.1  12-Mar-1997  is Merge in changes from Trunk
 1.26.2.4  30-May-1999  chs uvm_vnp_setpageblknos() is gone, and some misc cleanup.
 1.26.2.3  09-Apr-1999  chs in ffs_reallocg(), don't dereference bpp if it's NULL.
 1.26.2.2  25-Feb-1999  chs replace uvm_vnp_relocate() with uvm_vnp_setpageblknos().
 1.26.2.1  09-Nov-1998  chs initial snapshot. lots left to do.
 1.29.14.2  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.29.14.1  21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other archs. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKMs and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.29.10.1  19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.29.8.4  27-Mar-2001  bouyer Sync with HEAD.
 1.29.8.3  11-Feb-2001  bouyer Sync with HEAD.
 1.29.8.2  08-Dec-2000  bouyer Sync with HEAD.
 1.29.8.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.29.4.2  11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.29.4.1  07-Jun-1999  chs merge everything from chs-ubc branch.
 1.35.4.4  25-Nov-2001  he Pull up revision 1.52 (requested by lukem):
Mark fs_cgrotor as unused.
 1.35.4.3  25-Nov-2001  he Pull up revision 1.50 (requested by lukem):
Pull in enhanced ffs_dirpref() algorithm, which provides a
substantial performance improvement through better locality
between parent/child directories and their files, and by easing
the pressure on the buffer cache for metadata operations.
 1.35.4.2  25-Nov-2001  he Pull up revision 1.45 (requested by lukem):
Fix scanc() arguments.
 1.35.4.1  25-Nov-2001  he Pull up revision 1.42 (requested by lukem):
Change ffs_dirpref() to be less pathological.
 1.41.2.12  29-Dec-2002  thorpej Sync with HEAD.
 1.41.2.11  18-Oct-2002  nathanw Catch up to -current.
 1.41.2.10  15-Jul-2002  nathanw Revert to curproc.
 1.41.2.9  24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
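The following small model shows the NULL-safe accessor. It uses hypothetical stand-ins (struct lwp, struct proc, cur_lwp) for the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct proc { int p_pid; };
struct lwp { struct proc *l_proc; };

static struct lwp *cur_lwp;	/* NULL when no LWP is current */

/* Safe to evaluate at any time; yields NULL rather than faulting. */
#define CUR_PROC() ((cur_lwp) ? (cur_lwp)->l_proc : NULL)
```

Evaluating CUR_PROC() is always safe. Dereferencing its result still requires checking for NULL first, as the commit message notes.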
 1.41.2.8  20-Jun-2002  nathanw Catch up to -current.
 1.41.2.7  17-Apr-2002  nathanw Catch up to -current.
 1.41.2.6  14-Nov-2001  nathanw Catch up to -current.
 1.41.2.5  21-Sep-2001  nathanw Catch up to -current.
 1.41.2.4  24-Aug-2001  nathanw Catch up with -current.
 1.41.2.3  21-Jun-2001  nathanw Catch up to -current.
 1.41.2.2  09-Apr-2001  nathanw Catch up with -current.
 1.41.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.44.4.5  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.44.4.4  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.44.4.3  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.44.4.2  13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.44.4.1  25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.50.2.1  01-Oct-2001  fvdl Catch up with -current.
 1.52.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.55.4.1  05-Jan-2003  jmc Pull up revisions 1.56-1.57 (requested by hannken in ticket #1049)
Clear IN_SPACECOUNTED on (re-)used inodes.
This cures the "unmount pending error:" on softdep umounts.
 1.68.2.11  11-Dec-2005  christos Sync with head.
 1.68.2.10  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.68.2.9  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.68.2.8  18-Dec-2004  skrll Sync with HEAD.
 1.68.2.7  19-Oct-2004  skrll Sync with HEAD
 1.68.2.6  21-Sep-2004  skrll Fix the sync with head I botched.
 1.68.2.5  18-Sep-2004  skrll Sync with HEAD.
 1.68.2.4  03-Sep-2004  skrll Sync with HEAD
 1.68.2.3  24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.68.2.2  03-Aug-2004  skrll Sync with HEAD
 1.68.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.74.2.1  27-Apr-2004  jdc Pull up revision 1.75 (requested by dbj in ticket #185)

Fix problems related to superblock upgrade issues which may be
experienced by -current users from 2003.
 1.80.4.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.80.2.1  29-Apr-2005  kent sync with -current
 1.81.2.1  28-May-2005  tron Pull up revision 1.82 (requested by hannken in ticket #334):
ffs/ffs_alloc.c:
- Add a missing ACTIVECG_CLR().
ffs/ffs_snapshot.c:
- Use async/delayed writes for snapshot creation and sync/uncache these buffers
on end. Reduces the time the file system must be suspended.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
- Byte swap the list of preallocated blocks on read/write instead of access.
- Always keep this list on ip->i_snapblklist so it may be rolled back when the
newest snapshot gets removed. Fixes a rare snapshot corruption when using
more than one snapshot on a file system.
ufs/ufsmount.h:
- Make TAILQ_LAST() possible on member um_snapshots.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
 1.84.2.8  04-Feb-2008  yamt sync with head.
 1.84.2.7  21-Jan-2008  yamt sync with head
 1.84.2.6  15-Nov-2007  yamt sync with head.
 1.84.2.5  27-Oct-2007  yamt sync with head.
 1.84.2.4  03-Sep-2007  yamt sync with head.
 1.84.2.3  26-Feb-2007  yamt sync with head.
 1.84.2.2  30-Dec-2006  yamt sync with head.
 1.84.2.1  21-Jun-2006  yamt sync with head.
 1.87.2.2  29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.87.2.1  20-Oct-2005  yamt adapt ufs.
 1.88.2.1  29-Nov-2005  yamt sync with head.
 1.90.12.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.90.10.2  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.90.10.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.90.8.2  26-Jun-2006  yamt sync with head.
 1.90.8.1  24-May-2006  yamt sync with head.
 1.90.6.2  01-Jun-2006  kardel Sync with head.
 1.90.6.1  04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.90.4.1  09-Sep-2006  rpaulo sync with head
 1.91.2.1  19-Jun-2006  chap Sync with head.
 1.92.2.1  13-Jul-2006  gdamore Merge from HEAD.
 1.93.6.2  10-Dec-2006  yamt sync with head.
 1.93.6.1  22-Oct-2006  yamt sync with head
 1.93.4.2  12-Jan-2007  ad Sync with head.
 1.93.4.1  18-Nov-2006  ad Sync with head.
 1.97.2.1  12-Mar-2007  rmind Sync with HEAD.
 1.98.6.1  19-Mar-2007  reinoud Move the structure `cluster_save' to the dead ufs/ffs code that was using
it solely.

Preserved just in case the code is resurrected one day.
 1.98.2.8  23-Oct-2007  ad Sync with head.
 1.98.2.7  24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.98.2.6  20-Aug-2007  ad Sync with HEAD.
 1.98.2.5  17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.98.2.4  13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.98.2.3  06-May-2007  ad ffs_blkfree: don't leak ump->um_lock.
 1.98.2.2  13-Apr-2007  ad Put a per-mount lock around ffs shared data structures, excluding softdep
and quotas. Strategy lifted from FreeBSD.
 1.98.2.1  13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.99.6.3  04-Nov-2007  jmcneill Sync with HEAD.
 1.99.6.2  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.99.6.1  16-Aug-2007  jmcneill Sync with HEAD.
 1.99.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.100.4.1  14-Oct-2007  yamt sync with head.
 1.100.2.3  23-Mar-2008  matt sync with HEAD
 1.100.2.2  09-Jan-2008  matt sync with HEAD
 1.100.2.1  06-Nov-2007  matt sync with HEAD
 1.102.2.2  13-Nov-2007  bouyer Sync with HEAD
 1.102.2.1  25-Oct-2007  bouyer Sync with HEAD.
 1.104.8.2  23-Jan-2008  bouyer Sync with HEAD.
 1.104.8.1  02-Jan-2008  bouyer Sync with HEAD
 1.104.4.1  04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.104.2.1  18-Feb-2008  mjf Sync with HEAD.
 1.106.14.2  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.106.14.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.106.12.3  11-Mar-2010  yamt sync with head
 1.106.12.2  16-May-2009  yamt sync with head
 1.106.12.1  04-May-2009  yamt sync with head.
 1.106.10.3  17-Jun-2008  yamt sync with head.
 1.106.10.2  04-Jun-2008  yamt sync with head
 1.106.10.1  18-May-2008  yamt sync with head.
 1.106.8.8  04-Jan-2009  christos fix diagnostic printfs.
 1.106.8.7  30-Dec-2008  christos fix dev_t printfs
 1.106.8.6  28-Dec-2008  christos deal with new printfs format inconsistencies.
 1.106.8.5  27-Dec-2008  christos merge with head.
 1.106.8.4  09-Nov-2008  christos merge with head.
 1.106.8.3  01-Nov-2008  christos catch up with changes in head.
 1.106.8.2  01-Nov-2008  christos Sync with head.
 1.106.8.1  29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.106.6.4  17-Jan-2009  mjf Sync with HEAD.
 1.106.6.3  28-Sep-2008  mjf Sync with HEAD.
 1.106.6.2  05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.106.6.1  02-Jun-2008  mjf Sync with HEAD.
 1.109.4.2  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.109.4.1  19-Oct-2008  haad Sync with HEAD.
 1.109.2.4  28-Jul-2008  simonb Add support for creating a WAPBL log in the filesystem. Will
create an in-filesystem log on first "mount -o log" if one doesn't
exist, and will then continue to use same log in the future. See
(soon to be added) wapbl(4) for more info.

Adds a new B_CONTIG low-level allocation flag that uses hints in
"struct ffs_inode_ext" to lay out an ffs file's data contiguously.

Thanks to Greg Oster for helping with the design of this and to
Antti Kantee for code review and suggestions.
 1.109.2.3  18-Jul-2008  simonb Sync with head.
 1.109.2.2  12-Jun-2008  martin License police
 1.109.2.1  10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.113.4.3  29-Oct-2013  sborrill Pull up the following revisions(s) (requested by bad in ticket #1888):
sys/ufs/ffs/ffs_alloc.c: revision 1.144 via patch

Pull in fix from FreeBSD ffs_alloc.c r121785:
Consider only cylinder groups with at least 75% of the average free space
per cylinder group and 75% of the average free inodes per cylinder group
as candidates for the creation of a new directory. Avoids excessive I/O
scanning for a suitable cylinder group on relatively full file systems.
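The candidate filter described above can be illustrated with a small C sketch. It assumes a hypothetical, simplified per-group summary (`struct cg_counts`) and a hypothetical helper `pick_dir_cg()`. The real directory-placement code considers further criteria that are omitted here. "75% of the average" is computed as `avg - avg / 4` in integer arithmetic, and the floor of 1 is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-cylinder-group summary; not the on-disk struct csum. */
struct cg_counts {
    uint32_t nifree;    /* free inodes in the group */
    uint32_t nbfree;    /* free blocks in the group */
};

/*
 * Return the first cylinder group, scanning from 'start' and wrapping
 * around, whose free inodes and free blocks are both at least 75% of the
 * file-system-wide per-group averages. Return -1 if no group qualifies.
 */
static int pick_dir_cg(const struct cg_counts *cgs, int ncg, int start)
{
    uint64_t tot_ifree = 0, tot_bfree = 0;
    int i;

    assert(ncg > 0 && start >= 0 && start < ncg);
    for (i = 0; i < ncg; i++) {
        tot_ifree += cgs[i].nifree;
        tot_bfree += cgs[i].nbfree;
    }

    uint64_t avgifree = tot_ifree / (uint64_t)ncg;
    uint64_t avgbfree = tot_bfree / (uint64_t)ncg;
    /* 75% of the average in integer arithmetic; never require less than 1. */
    uint64_t minifree = avgifree - avgifree / 4;
    uint64_t minbfree = avgbfree - avgbfree / 4;
    if (minifree < 1)
        minifree = 1;
    if (minbfree < 1)
        minbfree = 1;

    for (i = 0; i < ncg; i++) {
        int cg = (start + i) % ncg;
        if (cgs[cg].nifree >= minifree && cgs[cg].nbfree >= minbfree)
            return cg;
    }
    return -1;
}
```

Groups that fall well below the averages are skipped immediately instead of being examined further, which is the I/O saving the commit message describes.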
 1.113.4.2  07-May-2009  snj Pull up following revision(s) (requested by sborrill in ticket #726):
sys/ufs/ffs/ffs_alloc.c: revision 1.123 via patch
Fix random 'filesystem full' messages by trapping a couple of 32-bit
overflow areas missed in rev 1.110 and switching cgbase().
Kudos to rump_ffs!
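The kind of 32-bit overflow mentioned above can be reproduced with a short C sketch. The helper names are hypothetical, and the sketch does not reproduce the actual cgbase() macro. It only assumes that the first fragment of cylinder group `cg` is `fpg * cg`, where `fpg` is the number of fragments per group. To keep the demonstration well defined, the 32-bit product is computed in unsigned arithmetic so that it wraps instead of invoking signed-overflow undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: the first fragment of cylinder group 'cg' is taken
 * to be fpg * cg, where 'fpg' is the number of fragments per group.
 */

/* Product evaluated in 32 bits: wraps modulo 2^32 on large file systems. */
static uint32_t cg_start_32(uint32_t fpg, uint32_t cg)
{
    return (uint32_t)(fpg * cg);
}

/* Widen one operand first so the multiplication happens in 64 bits. */
static uint64_t cg_start_64(uint32_t fpg, uint32_t cg)
{
    return (uint64_t)fpg * cg;
}
```

With 65536 fragments per group, group 70000 starts at fragment 4587520000, but the 32-bit version yields 292552704, a much lower address.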
 1.113.4.1  24-Feb-2009  snj branches: 1.113.4.1.2;
Pull up following revision(s) (requested by ad in ticket #490):
sys/kern/vfs_wapbl.c: revision 1.23
sys/miscfs/syncfs/sync_subr.c: revision 1.36
sys/miscfs/syncfs/sync_vnops.c: revision 1.26
sys/ufs/ffs/ffs_alloc.c: revision 1.121
sys/ufs/ffs/ffs_vfsops.c: revision 1.242
sys/ufs/ffs/ffs_vnops.c: revision 1.110
PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Previously they did not reach the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
 1.113.4.1.2.1  07-May-2009  snj branches: 1.113.4.1.2.1.2;
Pull up following revision(s) (requested by sborrill in ticket #726):
sys/ufs/ffs/ffs_alloc.c: revision 1.123 via patch
Fix random 'filesystem full' messages by trapping a couple of 32-bit
overflow areas missed in rev 1.110 and switching cgbase().
Kudos to rump_ffs!
 1.113.4.1.2.1.2.1  21-Apr-2010  matt sync to netbsd-5
 1.113.2.3  28-Apr-2009  skrll Sync with HEAD.
 1.113.2.2  03-Mar-2009  skrll Sync with HEAD.
 1.113.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.120.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.124.2.1  30-Apr-2010  uebayasi Sync with HEAD.
 1.125.6.1  20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compatibility), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
Quota command features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on-disk format storage (all 64-bit wide), integrated into metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on an ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
Quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more: after an unclean shutdown,
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.
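The layout described above (a header holding a free-list head and a hash table, with entries chained by file offsets) can be sketched in C over an in-memory byte array standing in for the quota file. Everything concrete here is a hypothetical assumption: the struct names and fields, the bucket count NHASH, the hash function (id modulo NHASH), and the use of offset 0 as "no entry" (safe because the header occupies offset 0). The sketch ignores the block-boundary rule and does not grow the file when the free list is empty.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NHASH      8      /* hypothetical bucket count */
#define FILE_BYTES 1024   /* hypothetical size of the simulated quota file */

struct q_header {
    uint64_t free_list;       /* offset of first free entry, 0 = none */
    uint64_t hash[NHASH];     /* offset of first in-use entry per bucket */
};

struct q_entry {
    uint32_t id;              /* uid or gid */
    uint64_t next;            /* offset of next entry in the same list */
    uint64_t limit;           /* stand-in for the real quota fields */
};

static unsigned char qfile[FILE_BYTES];   /* simulated quota file */

static void q_read(uint64_t off, void *p, size_t n)
{
    assert(off + n <= FILE_BYTES);
    memcpy(p, qfile + off, n);
}

static void q_write(uint64_t off, const void *p, size_t n)
{
    assert(off + n <= FILE_BYTES);
    memcpy(qfile + off, p, n);
}

/* Write an empty header and thread every entry slot onto the free list. */
static void q_format(void)
{
    struct q_header h;
    struct q_entry e;
    uint64_t off = sizeof h;

    memset(&h, 0, sizeof h);
    memset(&e, 0, sizeof e);
    h.free_list = off;
    for (; off + 2 * sizeof e <= FILE_BYTES; off += sizeof e) {
        e.next = off + sizeof e;
        q_write(off, &e, sizeof e);
    }
    e.next = 0;                       /* last slot ends the free list */
    q_write(off, &e, sizeof e);
    q_write(0, &h, sizeof h);
}

/* Take a slot from the free list and push it onto its hash bucket. */
static uint64_t q_insert(uint32_t id, uint64_t limit)
{
    struct q_header h;
    struct q_entry e;
    uint64_t off;

    q_read(0, &h, sizeof h);
    if ((off = h.free_list) == 0)
        return 0;                     /* the real format would grow the file */
    q_read(off, &e, sizeof e);
    h.free_list = e.next;
    e.id = id;
    e.limit = limit;
    e.next = h.hash[id % NHASH];
    h.hash[id % NHASH] = off;
    q_write(off, &e, sizeof e);
    q_write(0, &h, sizeof h);
    return off;
}

/* Walk the bucket chain for 'id'; return the entry offset or 0. */
static uint64_t q_lookup(uint32_t id, uint64_t *limit)
{
    struct q_header h;
    struct q_entry e;
    uint64_t off;

    q_read(0, &h, sizeof h);
    for (off = h.hash[id % NHASH]; off != 0; off = e.next) {
        q_read(off, &e, sizeof e);
        if (e.id == id) {
            *limit = e.limit;
            return off;
        }
    }
    return 0;
}
```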

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.125.4.1  06-Jun-2011  jruoho Sync with HEAD.
 1.125.2.2  21-Apr-2011  rmind sync with head
 1.125.2.1  16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.127.2.1  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.129.2.4  22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.129.2.3  23-Jan-2013  yamt sync with head
 1.129.2.2  30-Oct-2012  yamt sync with head
 1.129.2.1  17-Apr-2012  yamt sync with head
 1.130.8.5  03-Dec-2017  jdolecek update from HEAD
 1.130.8.4  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.130.8.3  23-Jun-2013  tls resync from head
 1.130.8.2  25-Feb-2013  tls resync with head
 1.130.8.1  20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.130.4.1  29-Oct-2013  sborrill Pull up the following revisions(s) (requested by bad in ticket #978):
sys/ufs/ffs/ffs_alloc.c: revision 1.144

Pull in fix from FreeBSD ffs_alloc.c r121785:
Consider only cylinder groups with at least 75% of the average free space
per cylinder group and 75% of the average free inodes per cylinder group
as candidates for the creation of a new directory. Avoids excessive I/O
scanning for a suitable cylinder group on relatively full file systems.
 1.138.2.1  18-May-2014  rmind sync with head
 1.145.2.1  10-Aug-2014  tls Rebase.
 1.146.2.2  29-May-2019  martin Pull up following revision(s) (requested by kardel in ticket #1697):

sys/ufs/ffs/ffs_alloc.c: revision 1.164

PR/53990, PR/52380, PR/52102: UFS2 cylinder group inode allocation botch

Fix rare allocation botch in ffs_nodealloccg().

Conditions:
a) less than
#_of_initialized_inodes(cg->cg_initediblk)
- inodes_per_filesystem_block
are allocated in the cylinder group
b) cg->cg_irotor points to an uninterrupted run of
allocated inodes in the inode bitmap up to the
end of dynamically initialized inodes
(cg->cg_initediblk)

In this case the next inode after this run was returned
without initializing the respective inode block. As the
block is not initialized these inodes could trigger panics
on inode consistency due to old (uninitialized) disk data.

In very rare cases data loss could occur when
the uninitialized inode block is initialized via the
normal mechanism.

Further conditions to occur after the above:
c) no panic
d) no (forced) fsck
e) and more than cg->cg_initediblk - inodes_per_filesystem_block
allocated inodes.

Fix:

Always ensure that allocation stays within the initialized inode range,
extending the initialized inode range as needed.

Add KASSERTMSG() safeguards.

ok hannken@
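The fix described above can be illustrated with a simplified C sketch. The structure and function names are hypothetical, the inode bitmap is modelled as a plain bool array, and initializing an inode block is reduced to advancing a counter. In the kernel, that step would write the new inode block to disk before the cylinder group carrying the updated cg_initediblk (the count of initialized inodes) is written. The assert stands in for the KASSERTMSG() safeguards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_IPG 64   /* hypothetical upper bound for the sketch */

/* Hypothetical, simplified view of one cylinder group's inode state. */
struct cg_sketch {
    uint32_t ipg;            /* inodes in this cylinder group */
    uint32_t inopb;          /* inodes per file system block */
    uint32_t initediblk;     /* inodes initialized on disk so far */
    uint32_t irotor;         /* where the next search starts */
    uint32_t blocks_inited;  /* inode blocks initialized by this code */
    bool used[MAX_IPG];      /* inode bitmap */
};

/* Allocate a free inode; return its index within the group, or -1 if full. */
static int32_t cg_alloc_inode(struct cg_sketch *cg)
{
    uint32_t i, ino = 0;

    assert(cg->inopb > 0 && cg->ipg <= MAX_IPG);
    for (i = 0; i < cg->ipg; i++) {
        ino = (cg->irotor + i) % cg->ipg;
        if (!cg->used[ino])
            break;
    }
    if (i == cg->ipg)
        return -1;

    /*
     * Never hand out an inode beyond the initialized range: extend the
     * range one inode block at a time until it covers 'ino'.
     */
    while (ino >= cg->initediblk) {
        cg->initediblk += cg->inopb;
        if (cg->initediblk > cg->ipg)
            cg->initediblk = cg->ipg;
        cg->blocks_inited++;
    }
    assert(ino < cg->initediblk);

    cg->used[ino] = true;
    cg->irotor = ino;
    return (int32_t)ino;
}
```

With four initialized inodes that are all allocated, the search lands on inode 4, just past the initialized range, and the sketch initializes one more inode block before returning it rather than handing out an uninitialized inode.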
 1.146.2.1  14-Aug-2015  msaitoh branches: 1.146.2.1.2; 1.146.2.1.6;
Pull up following revision(s) (requested by riastradh in ticket #949):
sys/ufs/ffs/ffs_alloc.c: revision 1.151
Need wapbl transaction around ffs_blkfree_cg. Fixes wapbl+discard.
 1.146.2.1.6.1  29-May-2019  martin Pull up following revision(s) (requested by kardel in ticket #1697):

sys/ufs/ffs/ffs_alloc.c: revision 1.164

PR/53990, PR/52380, PR/52102: UFS2 cylinder group inode allocation botch

Fix rare allocation botch in ffs_nodealloccg().

Conditions:
a) less than
#_of_initialized_inodes(cg->cg_initediblk)
- inodes_per_filesystem_block
are allocated in the cylinder group
b) cg->cg_irotor points to an uninterrupted run of
allocated inodes in the inode bitmap up to the
end of dynamically initialized inodes
(cg->cg_initediblk)

In this case the next inode after this run was returned
without initializing the respective inode block. As the
block is not initialized these inodes could trigger panics
on inode consistency due to old (uninitialized) disk data.

In very rare cases data loss could occur when
the uninitialized inode block is initialized via the
normal mechanism.

Further conditions to occur after the above:
c) no panic
d) no (forced) fsck
e) and more than cg->cg_initediblk - inodes_per_filesystem_block
allocated inodes.

Fix:

Always ensure that allocation stays within the initialized inode range,
extending the initialized inode range as needed.

Add KASSERTMSG() safeguards.

ok hannken@
 1.146.2.1.2.1  29-May-2019  martin Pull up following revision(s) (requested by kardel in ticket #1697):

sys/ufs/ffs/ffs_alloc.c: revision 1.164

PR/53990, PR/52380, PR/52102: UFS2 cylinder group inode allocation botch

Fix rare allocation botch in ffs_nodealloccg().

Conditions:
a) less than
#_of_initialized_inodes(cg->cg_initediblk)
- inodes_per_filesystem_block
are allocated in the cylinder group
b) cg->cg_irotor points to an uninterrupted run of
allocated inodes in the inode bitmap up to the
end of dynamically initialized inodes
(cg->cg_initediblk)

In this case the next inode after this run was returned
without initializing the respective inode block. As the
block is not initialized these inodes could trigger panics
on inode consistency due to old (uninitialized) disk data.

In very rare cases data loss could occur when
the uninitialized inode block is initialized via the
normal mechanism.

Further conditions to occur after the above:
c) no panic
d) no (forced) fsck
e) and more than cg->cg_initediblk - inodes_per_filesystem_block
allocated inodes.

Fix:

Always ensure that allocation stays within the initialized inode range,
extending the initialized inode range as needed.

Add KASSERTMSG() safeguards.

ok hannken@
 1.147.2.5  28-Aug-2017  skrll Sync with HEAD
 1.147.2.4  05-Dec-2016  skrll Sync with HEAD
 1.147.2.3  05-Oct-2016  skrll Sync with HEAD
 1.147.2.2  22-Sep-2015  skrll Sync with HEAD
 1.147.2.1  06-Apr-2015  skrll Sync with HEAD
 1.151.2.2  20-Mar-2017  pgoyette Sync with HEAD
 1.151.2.1  04-Nov-2016  pgoyette Sync with HEAD
 1.154.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.156.6.2  29-May-2019  martin Pull up following revision(s) (requested by kardel in ticket #1272):

sys/ufs/ffs/ffs_alloc.c: revision 1.164

PR/53990, PR/52380, PR/52102: UFS2 cylinder group inode allocation botch

Fix rare allocation botch in ffs_nodealloccg().

Conditions:
a) less than
#_of_initialized_inodes(cg->cg_initediblk)
- inodes_per_filesystem_block
are allocated in the cylinder group
b) cg->cg_irotor points to an uninterrupted run of
allocated inodes in the inode bitmap up to the
end of dynamically initialized inodes
(cg->cg_initediblk)

In this case the next inode after this run was returned
without initializing the respective inode block. As the
block is not initialized these inodes could trigger panics
on inode consistency due to old (uninitialized) disk data.

In very rare cases data loss could occur when
the uninitialized inode block is initialized via the
normal mechanism.

Further conditions to occur after the above:
c) no panic
d) no (forced) fsck
e) and more than cg->cg_initediblk - inodes_per_filesystem_block
allocated inodes.

Fix:

Always ensure that allocation stays within the initialized inode range,
extending the initialized inode range as needed.

Add KASSERTMSG() safeguards.

ok hannken@
 1.156.6.1  24-Jul-2017  snj Pull up following revision(s) (requested by hannken in ticket #129):
sys/ufs/ffs/ffs_alloc.c: revision 1.157
When initializing more inodes make sure to write them to disk
before writing the cylinder group with updated cg_initediblk.
 1.159.4.3  21-Apr-2020  martin Sync with HEAD
 1.159.4.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.159.4.1  10-Jun-2019  christos Sync with HEAD
 1.159.2.2  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.159.2.1  28-Jul-2018  pgoyette Sync with HEAD
 1.164.6.1  29-Feb-2020  ad Sync with head.
 1.164.4.1  21-Mar-2020  martin Pull up following revision(s) (requested by riastradh in ticket #790):

sys/ufs/ffs/ffs_alloc.c: revision 1.165

Fix non-DIAGNOSTIC build with UVM_PAGE_TRKOWN.
 1.166.4.1  20-Apr-2020  bouyer Sync with HEAD
 1.171.4.1  13-May-2023  martin Pull up following revision(s) (requested by chs in ticket #160):

usr.sbin/makefs/ffs/ffs_alloc.c: revision 1.31
sbin/tunefs/tunefs.c: revision 1.58
sbin/fsck_ffs/setup.c: revision 1.105
sbin/fsck_ffs/pass5.c: revision 1.56
usr.sbin/makefs/ffs.c: revision 1.74
usr.sbin/makefs/ffs/mkfs.c: revision 1.42
usr.sbin/makefs/Makefile: revision 1.40
sys/ufs/ffs/fs.h: revision 1.71
sbin/fsdb/fsdb.c: revision 1.54
sbin/resize_ffs/resize_ffs.c: revision 1.58
sbin/fsck_ffs/pass4.c: revision 1.29
usr.sbin/makefs/ffs/ffs_extern.h: revision 1.9
sbin/newfs/mkfs.c: revision 1.133
sys/ufs/ffs/ffs_alloc.c: revision 1.172
sbin/fsck_ffs/pass1b.c: revision 1.24
usr.sbin/dumpfs/dumpfs.c: revision 1.68
sys/ufs/ffs/ffs_extern.h: revision 1.88
usr.sbin/quotacheck/quotacheck.c: revision 1.51
sys/ufs/ffs/ffs_subr.c: revision 1.54
sbin/fsck_ffs/main.c: revision 1.91
sbin/fsck_ffs/pass1.c: revision 1.63

ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:
commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000
This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.
To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.
Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000
One last pass to get all the unsigned comparisons correct.

In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
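The class of bug fixed here can be shown with a short C sketch. The helper names and the range check are hypothetical, not taken from the FreeBSD or NetBSD diffs. `signed_view()` reproduces, with an explicit and well-defined computation, what happens when an unsigned 32-bit inode number is held in a signed 32-bit variable: values of 2^31 and above come out negative, so a check that rejects negative inode numbers wrongly rejects valid large ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Two's-complement reading of a 32-bit inode number, computed explicitly
 * so the result does not depend on implementation-defined conversions.
 */
static int64_t signed_view(uint32_t ino)
{
    if (ino >= UINT32_C(0x80000000))
        return (int64_t)ino - INT64_C(0x100000000);
    return (int64_t)ino;
}

/* Buggy pattern: the inode number is treated as a signed value. */
static bool ino_valid_signed(uint32_t ino, uint32_t maxino)
{
    int64_t s = signed_view(ino);

    return s >= 0 && s < (int64_t)maxino;
}

/* Fixed pattern: the inode number stays unsigned throughout. */
static bool ino_valid_unsigned(uint32_t ino, uint32_t maxino)
{
    return ino < maxino;
}
```

The additional -Wsign-compare cleanups address a related pattern: when a signed and an unsigned operand are compared, the signed one is converted to unsigned, so a negative value silently becomes a very large one.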
