Home | History | Annotate | Download | only in lfs
History log of /src/sys/ufs/lfs/lfs_pages.c
RevisionDateAuthorComments
 1.27  11-Apr-2023  riastradh lfs: Assert page identity doesn't change.

Forgot what I was debugging when I inserted a relookup in my local
tree months or years ago, but whatever it was, if that solved a
problem, this KDASSERT will make the problem more obvious.
 1.26  05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.25  17-Mar-2020  ad Tweak the March 14th change to make page waits interlocked by pg->interlock.
Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude
possible bugs.
 1.24  14-Mar-2020  ad Make uvm_pagemarkdirty() responsible for putting vnodes onto the syncer
work list. Proposed on tech-kern@.
 1.23  14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.22  23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.21  23-Feb-2020  riastradh Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.
 1.20  15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.19  31-Dec-2019  ad branches: 1.19.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.18  20-Dec-2019  ad Fix lfs_putpages() for bsize < nbpg.
 1.17  15-Dec-2019  ad Merge from yamt-pagecache:

- do gang lookup of pages using radixtree.
- remove now unused uvm_object::uo_memq and vm_page::listq.queue.
 1.16  13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.15  19-Aug-2017  maya branches: 1.15.4; 1.15.8;
Ask some question about the code in a XXX comment
 1.14  10-Jun-2017  maya Rename i_flag to i_state.

The similarity to i_flags has previously caused errors.
 1.13  05-Jun-2017  maya Correct confusion between i_flag and i_flags
These will have to be renamed.

Spotted by Riastradh, thanks!
 1.12  04-Jun-2017  hannken Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than a hour ago.
 1.11  01-Apr-2017  maya branches: 1.11.6;
Switch lfs_writer_daemon to use condvar instead of mtsleep.
track thread existence with struct lwp instead of pid + lid,
it's more useful from ddb.
 1.10  30-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().

Add fstrans_start()/fstrans_done() to lfs_putpages().
 1.9  04-Oct-2016  christos branches: 1.9.2;
Grr, the optimizer on mips64 can't handle this... Use MIN_PAGE_SIZE.
 1.8  21-Jul-2016  christos Don't do variable stack allocations for systems with non-const PAGE_SIZE;
instead assume that the smallest pagesize is 1024.
 1.7  12-Aug-2015  dholland branches: 1.7.2;
Make 32-bit and 64-bit versions of SEGSUM.
Also fix some of the FINFO handling as it's closely entangled.
 1.6  02-Aug-2015  dholland Second batch of 64 -> 32 truncations in lfs, along with more minor
tidyups and corrections in passing.
 1.5  28-Jul-2015  dholland Add a new lfs header file: lfs_accessors.h.

This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
 1.4  25-Jul-2015  martin Use accessors in DEBUG and DIAGNOSTIC code as well
 1.3  24-Jul-2015  dholland More lfs superblock accessors.
(This changes the rest of the code over; all the accessors were
already added.)

The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
 1.2  24-Jul-2015  dholland Switch to accessor functions for elements of the LFS on-disk
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)

It also gets rid of cpp abuse in the form of fake structure member
macros.

Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)

XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
 1.1  16-May-2014  dholland branches: 1.1.2; 1.1.4; 1.1.8; 1.1.10;
Move lfs_getpages and lfs_putpages to their own file.
 1.1.10.4  28-Aug-2017  skrll Sync with HEAD
 1.1.10.3  05-Dec-2016  skrll Sync with HEAD
 1.1.10.2  05-Oct-2016  skrll Sync with HEAD
 1.1.10.1  22-Sep-2015  skrll Sync with HEAD
 1.1.8.3  03-Dec-2017  jdolecek update from HEAD
 1.1.8.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.8.1  16-May-2014  tls file lfs_pages.c was added on branch tls-maxphys on 2014-08-20 00:04:44 +0000
 1.1.4.2  10-Aug-2014  tls Rebase.
 1.1.4.1  16-May-2014  tls file lfs_pages.c was added on branch tls-earlyentropy on 2014-08-10 06:56:58 +0000
 1.1.2.2  18-May-2014  rmind sync with head
 1.1.2.1  16-May-2014  rmind file lfs_pages.c was added on branch rmind-smpnet on 2014-05-18 17:46:21 +0000
 1.7.2.3  26-Apr-2017  pgoyette Sync with HEAD
 1.7.2.2  04-Nov-2016  pgoyette Sync with HEAD
 1.7.2.1  26-Jul-2016  pgoyette Sync with HEAD
 1.9.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.11.6.2  30-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #330):
sbin/fsck_lfs/inode.c: 1.69
sbin/fsck_lfs/lfs.c: 1.73
sbin/fsck_lfs/pass6.c: 1.50
sbin/fsck_lfs/segwrite.c: 1.46
sys/ufs/lfs/lfs.h: 1.202-1.203
sys/ufs/lfs/lfs_accessors.h: 1.48
sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
sys/ufs/lfs/lfs_balloc.c: 1.94
sys/ufs/lfs/lfs_bio.c: 1.141
sys/ufs/lfs/lfs_extern.h: 1.113
sys/ufs/lfs/lfs_inode.c: 1.156-1.157
sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
sys/ufs/lfs/lfs_itimes.c: 1.20
sys/ufs/lfs/lfs_pages.c: 1.13-1.15
sys/ufs/lfs/lfs_rename.c: 1.22
sys/ufs/lfs/lfs_segment.c: 1.270-1.275
sys/ufs/lfs/lfs_subr.c: 1.94-1.97
sys/ufs/lfs/lfs_syscalls.c: 1.175
sys/ufs/lfs/lfs_vfsops.c: 1.360
sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
sys/ufs/lfs/ulfs_inode.c: 1.20
sys/ufs/lfs/ulfs_inode.h: 1.24
sys/ufs/lfs/ulfs_lookup.c: 1.41
sys/ufs/lfs/ulfs_quota2.c: 1.31
sys/ufs/lfs/ulfs_readwrite.c: 1.24
sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
Update inode member i_flag --> i_state to keep up with kernel changes
Move definition of IN_ALLMOD near the flag it's a mask for.
Now we can see that it doesn't match all the flags, but changing that will
require more careful thought.
Correct confusion between i_flag and i_flags
These will have to be renamed.
Spotted by Riastradh, thanks!
Add an XXX about the missing flags so it's not buried in a commit
message.
now the XXX count for LFS is 260
Rename i_flag to i_state.
The similarity to i_flags has previously caused errors.
Use continue to denote the no-op loop to match netbsd style
newline for extra clarity.
It isn't safe to drain dirops with seglock held, it'll deadlock if there
are any dirops. drain before grabbing seglock.
lfs_dirops == 0 is always true (as we already drained dirops), so omit
that part of the comparison.
Fixes a lot of LFS deadlocks. PR kern/52301
Many thanks to dholland for help analyzing coredumps
Ifdef out KDASSERT which fires on my machine.
Deduplicate sanity check that seglock is held on segunlock
Revert r1.272 fix to PR kern/52301, the performance hit is making things
unusable.
change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
appropriate?
fix buffer overflow/KASSERT when cookies are supplied
lfs no longer uses the ffs-style struct direct, use the correct minimum
size
from dholland
XXX more wrong
Consistently use {,UN}MARK_VNODE macros rather than function calls.
Not much point doing anything after a panic call
Ask some question about the code in a XXX comment
XXX question our double-flushing of dirops
Fix typo in comment
 1.11.6.1  04-Jun-2017  bouyer pullup the following revisions, requested by hannken in ticket #2:
src/share/man/man9/fstrans.9 1.25
src/sys/kern/vfs_mount.c 1.66
src/sys/kern/vfs_subr.c 1.468
src/sys/kern/vfs_trans.c 1.46
src/sys/kern/vfs_vnode.c 1.94, 1.95, 1.96
src/sys/kern/vnode_if.c 1.105, 1.106
src/sys/kern/vnode_if.sh 1.65, 1.66
src/sys/kern/vnode_if.src 1.76
src/sys/miscfs/genfs/genfs_io.c 1.69
src/sys/miscfs/genfs/genfs_vnops.c 1.196, 1.197
src/sys/miscfs/genfs/layer_extern.h 1.40
src/sys/miscfs/genfs/layer_vfsops.c 1.51
src/sys/miscfs/genfs/layer_vnops.c 1.67
src/sys/miscfs/nullfs/null_vnops.c 1.42
src/sys/miscfs/overlay/overlay_vnops.c 1.24
src/sys/miscfs/umapfs/umap_vnops.c 1.60
src/sys/rump/include/rump/rumpvnode_if.h 1.29, 1.30
src/sys/rump/librump/rumpkern/emul.c 1.182
src/sys/rump/librump/rumpvfs/rumpvnode_if.c 1.29, 1.30
src/sys/sys/fstrans.h 1.11
src/sys/sys/vnode.h 1.278
src/sys/sys/vnode_if.h 1.100, 1.101
src/sys/sys/vnode_impl.h 1.14, 1.15
src/sys/ufs/lfs/lfs_pages.c 1.12

Vnode state, lock and fstrans cleanup:
- Rename vnode state "VS_ACTIVE" to "VS_LOADED" and add synthetic
state "VS_ACTIVE" to assert a loaded vnode with usecount > 0.

- Redo FSTRANS in vnode_if.c and use it for VOP_LOCK and VOP_UNLOCK.

- Cleanup the genfs lock operations.

- Make "struct vnode_impl" member "vi_lock" a krwlock_t again.

- Remove the lock type argument from fstrans_start and
fstrans_start_nowait,
remove now unused FSTRANS state "FSTRANS_SUSPENDING".
 1.15.8.1  17-Aug-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1050):

sys/ufs/lfs/lfs_subr.c: revision 1.101
sys/ufs/lfs/lfs_subr.c: revision 1.102
sys/ufs/lfs/lfs_inode.c: revision 1.158
sys/ufs/lfs/lfs_inode.h: revision 1.25
sys/ufs/lfs/lfs_balloc.c: revision 1.95
sys/ufs/lfs/lfs_pages.c: revision 1.21
sys/ufs/lfs/lfs_vnops.c: revision 1.330
sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
lib/libp2k/p2k.c: revision 1.72
sys/ufs/lfs/lfs.h: revision 1.205
sys/ufs/lfs/lfs.h: revision 1.206
sys/ufs/lfs/lfs_segment.c: revision 1.284
sys/ufs/lfs/lfs.h: revision 1.207
sys/ufs/lfs/lfs_segment.c: revision 1.285
sys/ufs/lfs/lfs_debug.c: revision 1.55
sys/ufs/lfs/lfs_rename.c: revision 1.23
usr.sbin/dumplfs/dumplfs.c: revision 1.65
sys/ufs/lfs/lfs_vfsops.c: revision 1.371
sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
sys/ufs/lfs/lfs_vfsops.c: revision 1.372
sys/ufs/lfs/lfs_vfsops.c: revision 1.373
sbin/fsck_lfs/pass1.c: revision 1.46
sys/ufs/lfs/lfs_vnops.c: revision 1.326
sys/ufs/lfs/lfs_vnops.c: revision 1.327
sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
sys/ufs/lfs/lfs_vnops.c: revision 1.328
sys/ufs/lfs/lfs_subr.c: revision 1.98
sys/ufs/lfs/lfs_extern.h: revision 1.116
sys/ufs/lfs/lfs_vnops.c: revision 1.329
sys/ufs/lfs/lfs_subr.c: revision 1.99
sys/ufs/lfs/lfs_extern.h: revision 1.117
sys/ufs/lfs/lfs_accessors.h: revision 1.49
sys/ufs/lfs/lfs_extern.h: revision 1.118
sys/rump/fs/lib/liblfs/Makefile: revision 1.15
sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
sys/ufs/lfs/lfs_bio.c: revision 1.147
sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix kassert in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here. All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock. The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending. Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway. The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
-> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.
lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.

Dust off the orphan detection code and try to make it work.

Fix !DIAGNOSTIC compile

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
These serve little diagnostic value otherwise.

OR into bp->b_cflags; don't overwrite.

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
in them. This replaces the hack by __aligned(4) __packed, and goes
further:

1. It's not clear that all the other lfs64 data structures are 64-bit
aligned on disk to begin with. We can go through these later and
upgrade them from
struct foo64 {
...
} __aligned(4) __packed;
union foo {
struct foo64 f64;
...
};
to
struct foo64 {
...
};
union foo {
struct foo64 f64 __aligned(8);
...
} __aligned(4) __packed;
if we really want to take advantage of 64-bit memory accesses.
However, the __aligned(4) __packed must remain on the union
because:
2. We access even the lfs32 data structures via a union that has
lfs64 members, and it turns out that compilers will assume access
through a union with 64-bit aligned members implies the whole
union has 64-bit alignment, even if we're only accessing a 32-bit
aligned member.

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.
 1.15.4.1  08-Apr-2020  martin Merge changes from current as of 20200406
 1.19.2.2  29-Feb-2020  ad Sync with head.
 1.19.2.1  17-Jan-2020  ad Sync with head.

RSS XML Feed