History log of /src/tests/fs/vfs/t_renamerace.c
Revision  Date  Author  Comments
 1.44  31-Jan-2022  ryo Extend the time to wait for the thread to quit.

It seems that alarm(1) is not enough time for the thread to actually exit after quittingtime = 1.
It randomly failed with "Test program received signal 14" on a slow environment.
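The shutdown pattern this message describes can be sketched roughly as follows. This is a minimal illustration, not the code in t_renamerace.c: only `quittingtime` and `alarm()` come from the log, while `worker`, `run_and_stop`, and the `grace_seconds` parameter are hypothetical names. The log does not state the new timeout value, so it is left as a parameter.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>
#include <unistd.h>

static atomic_int quittingtime;

/* Hypothetical worker: loops until the main thread asks it to quit. */
static void *
worker(void *arg)
{
	const struct timespec nap = { 0, 1000000 };	/* 1 ms of "work" */

	(void)arg;
	while (!atomic_load(&quittingtime))
		(void)nanosleep(&nap, NULL);
	return NULL;
}

/*
 * Start the worker, ask it to quit, and join it.  alarm() acts as a
 * watchdog: if the join takes longer than grace_seconds, SIGALRM
 * (signal 14) terminates the process, which is the failure the log
 * reports when the grace period is too short.
 */
static int
run_and_stop(unsigned grace_seconds)
{
	pthread_t pt;
	int error;

	assert(grace_seconds > 0);	/* alarm(0) would only cancel */
	atomic_store(&quittingtime, 0);
	if (pthread_create(&pt, NULL, worker, NULL) != 0)
		return -1;
	atomic_store(&quittingtime, 1);
	(void)alarm(grace_seconds);
	error = pthread_join(pt, NULL);
	(void)alarm(0);			/* joined in time: disarm */
	return error == 0 ? 0 : -1;
}
```

On a slow machine the worker may not notice the flag within a one-second grace period, so a larger value makes the watchdog less likely to fire spuriously.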
 1.43  27-Nov-2021  gson Force failure of the nfs_renamerace_cycle, p2k_ffs_renamerace_cycle,
and puffs_renamerace_cycle test cases as they fail only randomly or
only on some systems.
 1.42  23-Oct-2021  hannken After converting msdosfs_rename() to use genfs_sane_rename() the
MSDOS tests should pass.

Tested on QEMU/nvmm archs i386 and amd64.

Should resolve PR kern/43626 (directory renaming more than a little racy)
 1.41  16-Jun-2021  riastradh tests/fs/vfs: Mark udf_renamerace_cycle flaky, PR kern/56253.
 1.40  05-Sep-2020  riastradh Revert "ufs: Prevent mkdir from choking on deleted directories."

This change made no sense and should not have been committed.
 1.39  05-Sep-2020  riastradh ufs: Prevent mkdir from choking on deleted directories.

Fix some missing uvm_vnp_setsize in screw cases while here.
 1.38  05-Sep-2020  riastradh genfs_rename: Fix deadlocks in cross-directory cyclic rename.

Reproducer:

A: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600);
rmdir("c/d/e"); rmdir("c/d"); }
B: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600);
rename("c", "c/d/e"); }
C: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600);
rename("c/d/e", "c"); }
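A bounded, userland version of the three loops above might look like the sketch below. It is an illustration under stated assumptions, not the test added in revision 1.37: the names `struct racer`, `racer_main`, and `run_reproducer` are hypothetical, the loops run a fixed number of rounds instead of forever, a scratch directory under /tmp is assumed, and the mode is 0700 rather than 0600 so the nested mkdir calls also work without root. On a kernel without the deadlock it simply runs to completion.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

struct racer {
	int dirfd;	/* scratch directory all paths are relative to */
	int rounds;	/* bounded, unlike the for (;;) in the log */
	char role;	/* 'A', 'B' or 'C' */
};

static void *
racer_main(void *arg)
{
	const struct racer *r = arg;

	for (int i = 0; i < r->rounds; i++) {
		/* Errors are expected and ignored: the threads race. */
		(void)mkdirat(r->dirfd, "c", 0700);
		(void)mkdirat(r->dirfd, "c/d", 0700);
		(void)mkdirat(r->dirfd, "c/d/e", 0700);
		switch (r->role) {
		case 'A':
			(void)unlinkat(r->dirfd, "c/d/e", AT_REMOVEDIR);
			(void)unlinkat(r->dirfd, "c/d", AT_REMOVEDIR);
			break;
		case 'B':
			(void)renameat(r->dirfd, "c", r->dirfd, "c/d/e");
			break;
		case 'C':
			(void)renameat(r->dirfd, "c/d/e", r->dirfd, "c");
			break;
		}
	}
	return NULL;
}

/* Run A, B and C concurrently; returns 0 if all threads finished. */
static int
run_reproducer(int rounds)
{
	char tmpl[] = "/tmp/renamerace.XXXXXX";
	const char roles[3] = { 'A', 'B', 'C' };
	struct racer r[3];
	pthread_t t[3];
	int dirfd, started = 0, error = 0;

	if (mkdtemp(tmpl) == NULL)
		return -1;
	if ((dirfd = open(tmpl, O_RDONLY | O_DIRECTORY)) == -1) {
		(void)rmdir(tmpl);
		return -1;
	}
	for (int i = 0; i < 3; i++) {
		r[i] = (struct racer){ dirfd, rounds, roles[i] };
		if (pthread_create(&t[i], NULL, racer_main, &r[i]) != 0) {
			error = -1;
			break;
		}
		started++;
	}
	for (int i = 0; i < started; i++)
		(void)pthread_join(t[i], NULL);

	/* Best-effort cleanup of whatever the race left behind. */
	(void)unlinkat(dirfd, "c/d/e", AT_REMOVEDIR);
	(void)unlinkat(dirfd, "c/d", AT_REMOVEDIR);
	(void)unlinkat(dirfd, "c", AT_REMOVEDIR);
	(void)close(dirfd);
	(void)rmdir(tmpl);
	return error;
}
```

Both renames are expected to fail on a correct file system (the source is an ancestor of the target in B, and the target is a non-empty directory in C); the point is only to exercise the locking paths concurrently.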

Deadlock:

- A holds c and wants to lock d; and either
- B holds . and d and wants to lock c, or
- C holds . and d and wants to lock c.

The problem with these is that genfs_rename_enter_separate in B or C
tried lock order .->d->c->e (in A/B, fdvp->tdvp->fvp->tvp; in A/C,
tdvp->fdvp->tvp->fvp) which violates the ancestor->descendant order
.->c->d->e.

The resolution is to change B to do fdvp->fvp->tdvp->tvp and C to do
tdvp->tvp->fdvp->fvp. But there's an edge case: tvp and fvp might be
the same (hard links), and we can't detect that until after we've
looked them both up -- and in some file systems (I'm looking at you,
ufs), there is no mere lookup operation, only lookup-and-lock, so we
can't even hold the lock on one of tvp or fvp when we look up the
other one if there's a chance they might be the same.

Fortunately the cases
(a) tvp = fvp
(b) tvp or fvp is a directory
are mutually exclusive as long as directories cannot be hard-linked.
In case (a) we can just defer locking {tvp, fvp} until the end, because
it can't possibly have {fdvp or fvp, tdvp or tvp} as descendants. In
case (b) we can just lock them in the order fdvp->fvp->tdvp->tvp or
tdvp->tvp->fdvp->fvp if the first one of {fvp, tvp} is a directory,
because it can't possibly coincide with the second one of {fvp, tvp}.

With this change, we can now prove that the locking order is consistent
with the ancestor->descendant partial ordering. Where two nodes are
incommensurate under that partial ordering, they are only ever locked
by rename and there is only ever one rename at a time.

Proof:

- For same-directory renames, genfs_rename_enter_common locks the
directory first and then the children. The order
directory->child[i] is consistent with ancestor->descendant and
child[0]/child[1] are incommensurate.

- For cross-directory renames:

. While a rename is in progress and the fs-wide rename lock is held,
directories can be created or removed but not changed, so the
outcome of gro_genealogy -- which, given fdvp and tdvp, returns
the node N relating fdvp/N/.../tdvp or null if there is none --
can only transition from finding N to not finding N, if one of
the directories is removed while any of the vnodes are unlocked.
Merely creating directories cannot change the ancestry of tdvp,
and concurrent renames are not possible.

Thus, if a gro_genealogy determined the operation to have the
form fdvp/N/.../tdvp, then it might cease to have that form, but
only because tdvp was removed which will harmlessly cause the
rename to fail later on. Similarly, if gro_genealogy determined
the operation _not_ to have the form fdvp/N/.../tdvp then it
can't begin to have that form until after the rename has
completed.

The lock order is,

=> for fdvp/.../tdvp:
1. lock fdvp
2. lookup(/lock/unlock) fvp (consistent with fdvp->fvp)
3. lock fvp if a directory (consistent with fdvp->fvp)
4. lock tdvp (consistent with fdvp->tdvp and possibly fvp->tdvp)
5. lookup(/lock/unlock) tvp (consistent with tdvp->tvp)
6. lock fvp if a nondirectory (fvp->t* or fvp->fdvp is impossible)
7. lock tvp if not fvp (tvp->f* is impossible unless tvp=fvp)

=> for incommensurate fdvp & tdvp, or for tdvp/.../fdvp:
1. lock tdvp
2. lookup(/lock/unlock) tvp (consistent with tdvp->tvp)
3. lock tvp if a directory (consistent with tdvp->tvp)
4. lock fdvp (either incommensurate with tdvp and/or tvp, or
consistent with tdvp(->tvp)->fdvp)
5. lookup(/lock/unlock) fvp (consistent with fdvp->fvp)
6. lock tvp if a nondirectory (tvp->f* or tvp->tdvp is impossible)
7. lock fvp if not tvp (fvp->t* is impossible unless fvp=tvp)
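The two orderings above are mirror images of each other, which a small pure function can make explicit. This is a sketch for illustration only, not the logic in genfs_rename.c: `enum vrole`, `struct rename_case`, and `lock_plan` are hypothetical names, the lookup steps (2 and 5) are omitted because they do not leave a lock held, and both fvp and tvp are assumed to exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the four vnodes involved in a rename. */
enum vrole { FDVP, FVP, TDVP, TVP };

/* Hypothetical description of one cross-directory rename. */
struct rename_case {
	bool fdvp_above_tdvp;	/* gro_genealogy found fdvp/N/.../tdvp */
	bool fvp_is_dir;
	bool tvp_is_dir;
	bool same_node;		/* fvp == tvp, i.e. hard links */
};

/*
 * Fill out[] with the order in which locks are taken and return how
 * many locks that is (3 when fvp == tvp, otherwise 4).
 */
static size_t
lock_plan(const struct rename_case *rc, enum vrole out[4])
{
	/* The "from" side goes first only for fdvp/.../tdvp. */
	const bool f_first = rc->fdvp_above_tdvp;
	const enum vrole dvp1 = f_first ? FDVP : TDVP;
	const enum vrole vp1 = f_first ? FVP : TVP;
	const enum vrole dvp2 = f_first ? TDVP : FDVP;
	const enum vrole vp2 = f_first ? TVP : FVP;
	const bool vp1_is_dir = f_first ? rc->fvp_is_dir : rc->tvp_is_dir;
	size_t n = 0;

	/* Cases (a) and (b) are mutually exclusive. */
	assert(!(rc->same_node && (rc->fvp_is_dir || rc->tvp_is_dir)));

	out[n++] = dvp1;		/* step 1 */
	if (vp1_is_dir)
		out[n++] = vp1;		/* step 3: directory child early */
	out[n++] = dvp2;		/* step 4 */
	if (!vp1_is_dir)
		out[n++] = vp1;		/* step 6: nondirectory deferred */
	if (!rc->same_node)
		out[n++] = vp2;		/* step 7 */
	return n;
}
```

For reproducer B, where fdvp is an ancestor of tdvp and fvp is a directory, this yields fdvp, fvp, tdvp, tvp; for a hard-link rename between unrelated directories it yields tdvp, fdvp, tvp, with the shared node locked once at the end.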

Deadlocks found by hannken@; resolution worked out with dholland@.

XXX I think we could improve concurrency somewhat -- with a likely
big win for applications like tar and rsync that create many files
with temporary names and then rename them to the permanent one in the
same directory -- by making vfs_renamelock a reader/writer lock: any
number of same-directory renames, or exactly one cross-directory
rename, at any one time.
 1.37  05-Sep-2020  riastradh tests/fs/vfs/t_renamerace: Test a screw case hannken@ found.
 1.36  17-Aug-2019  gson The udf_renamerace test case no longer fails due to PR kern/49046, but
it does fail due to PR kern/53865 on real hardware.
 1.35  13-Jan-2019  gson branches: 1.35.2;
Mark the fs/vfs/t_renamerace:udf_renamerace_dirs test case as an
expected failure referencing PR kern/53865, and force failure to avoid
reports of unexpected success as it does not reliably fail under
qemu. This makes the treatment of udf_renamerace_dirs the same as
that of udf_renamerace, only with a different PR. Also, make
whitespace consistent between the two.
 1.34  13-Jan-2017  christos branches: 1.34.12; 1.34.14;
Don't play with "../.." in includes for h_macros.h; deal with it centrally.
Minor fixes.
 1.33  04-May-2016  dholland branches: 1.33.2;
Cite a relevant PR for msdos_renamerace instead of one that was fixed
several years ago.
 1.32  29-Jul-2014  gson Mark the udf_renamerace test case (but not udf_renamerace_dirs) as an
expected failure again, now with a reference to PR kern/49046.
Since the test only fails part of the time, force failure to
avoid failure reports due to unexpected success.
 1.31  25-Jul-2014  pgoyette Remove atf_tc_expect_fail() calls for udf file-system. These tests are
currently passing. As discussed on current-users. Any new failures
should be reported via send-pr.
 1.30  09-Jan-2014  hannken branches: 1.30.2;
Operation sysvbfs_remove() destructs inodes attached to active vnodes.
Defer the destruction to sysvbfs_reclaim().

Disable test t_renamerace:sysvbfs_renamerace as it will exhaust the
inode table (sysvbfs has space for 8 inodes only).

Ok: Izumi Tsutsui <tsutsui@netbsd.org>
 1.29  10-Jul-2013  reinoud Update test cases for UDF now that udf_rename() uses the genfs_rename framework.
 1.28  08-Jul-2013  reinoud Cover the last failing UDF test cases with a reference to PR kern/47986, i.e.
all renames fail until UDF switches over to the new rename framework solving
the locking mechanism.
 1.27  17-Mar-2013  jmmv Fix the t_renamerace:lfs_renamerace_dirs test on fast machines.

This test was failing on my machine when run natively but not causing any
problems when run within qemu, and the failure was "mkdir: No space left
on device".

My understanding of the issue is that this test overflowed the temporary
disk image due to its high rate of file churn and the lfs_cleanerd not
being able to keep up. Note that this test is capped by time, not number
of operations, so this is why the problem does not show up in a slow
emulated system.

To fix this, just bump the test file system image limit a little bit.
(I tried increasing the frequency at which lfs_cleanerd does its thing,
but it wasn't enough.)
 1.26  09-May-2012  riastradh branches: 1.26.2;
Adjust t_renamerace now that ext2fs and ffs have good rename.
 1.25  16-Feb-2012  perseant Pass t_renamerace and t_rmdirrace tests.

Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.

Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.

Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace creates a lot of inode blocks on a tiny empty disk).
 1.24  08-Oct-2011  njoly branches: 1.24.2; 1.24.4;
Slightly adjust skipped messages, makes output more consistent.
 1.23  18-Jul-2011  dholland ffs and ffslog are no longer xfail.
 1.22  14-Mar-2011  pooka Apparently this way of triggering the msdosfs rename vnode leak
does not bite every time (most commonly observed on the amd64/qemu
runs), so add a race condition catcher.
 1.21  06-Mar-2011  pooka Add a race catcher for p2k_ffs renamerace -- it seems like the
problem doesn't trigger always especially in a qemu env (but triggers
100% of the time on my desktop).
 1.20  03-Mar-2011  pooka The re-enabled renamerace test also triggers the recent msdosfs
vnode leak. xfail this under the blanket of PR kern/44661.
 1.19  03-Mar-2011  pooka Apparently my last commit to msdosfs_vnops.c fixed the (harmless?)
buffer overrun in rename (>15 years old bug), so re-enable other
msdosfs rename tests too.
 1.18  11-Jan-2011  pooka branches: 1.18.2;
need unrace-catcher for ffslog
 1.17  07-Jan-2011  pooka xfail PR kern/44336
 1.16  07-Jan-2011  pooka ffs -o log dies in renamerace_dirs just like the rest.
 1.15  02-Jan-2011  pooka + rump_lwproc_newproc -> rump_lwproc_rfork()
+ add a test for rump_lwproc_rfork()
 1.14  11-Nov-2010  pooka skip tests which use features which rumpfs does not support
(namely: vop_rename and a file system size limit)
 1.13  01-Nov-2010  pooka Create the process we use later in the test. Otherwise cwd doesn't
go right and the test fails because of attempting to create files
in the wrong directory.
 1.12  01-Sep-2010  pooka update to new rump lwp/proc interfaces
 1.11  26-Aug-2010  pooka chdir() once per process is enough, no need to do it for every
thread (and doing so would cause occasional failures when some
thread would cd out of the test mountpoint while another thread
was still running in there).
 1.10  26-Aug-2010  pooka Put the workaround for PR kern/43799 into the common nfs unmount routine.
 1.9  25-Aug-2010  pooka Start many more threads for the renamerace since it seems to catch
more errors.

Add a sleepkludge to deal with NFS's sillyrename brokenness.
 1.8  16-Jul-2010  pooka Some of the msdosfs tests are killed by SSP due to stack limit
being exceeded. I cannot figure out what is going on by code
reading, nor repeat this either on my desktop or in qemu, so skip
those tests for msdosfs until I can get to the bottom of it.
 1.7  16-Jul-2010  pooka skip directory test on sysvbfs
 1.6  16-Jul-2010  pooka Fix typo in comment. comment tested by wizd.
 1.5  16-Jul-2010  pooka Fill in PR kern/43626 now that it exists.
 1.4  16-Jul-2010  pooka Do the famous renamerace test using directories. Uh oh, bad idea.
PR coming soon.
 1.3  16-Jul-2010  pooka This test does not always fail for LFS, so apply same kludge as
elsewhere while waiting for atf to grow support for these cases.
 1.2  14-Jul-2010  pooka xfail test on lfs. It goes badaboom faster than you can find your
multipass. Borrow PR kern/43582 used earlier for rmdirrace, as it
looks pretty much like the same problem.
 1.1  14-Jul-2010  pooka Convert "The Original" rename race test from to vfs and retire the
ffs/tmpfs versions. The only difference is that the origamical
one mounted ffs with MNT_LOG (and therein actually lay the bug).
 1.18.2.1  05-Mar-2011  bouyer Sync with HEAD
 1.24.4.1  17-Mar-2012  bouyer Pull up following revision(s) (requested by perseant in ticket #116):
sys/ufs/lfs/lfs_alloc.c: revision 1.112
tests/fs/vfs/t_rmdirrace.c: revision 1.9
tests/fs/vfs/t_renamerace.c: revision 1.25
sys/ufs/lfs/lfs_vnops.c: revision 1.240
sys/ufs/lfs/lfs_segment.c: revision 1.224
sys/ufs/lfs/lfs_bio.c: revision 1.122
sys/ufs/lfs/lfs_vfsops.c: revision 1.294
sbin/newfs_lfs/make_lfs.c: revision 1.19
sys/ufs/lfs/lfs.h: revision 1.136
Pass t_renamerace and t_rmdirrace tests.
Adapt dholland@'s fix to ufs_rename to fix PR kern/43582. Address several
other MP locking issues discovered during the course of investigating the
same problem.
Removed extraneous vn_lock() calls on the Ifile, since the Ifile writes
are controlled by the segment lock.
Fix PR kern/45982 by deemphasizing the estimate of how much metadata
will fill the empty space on disk when the disk is nearly empty
(t_renamerace creates a lot of inode blocks on a tiny empty disk).
 1.24.2.3  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.24.2.2  23-May-2012  yamt sync with head.
 1.24.2.1  17-Apr-2012  yamt sync with head
 1.26.2.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.26.2.1  23-Jun-2013  tls resync from head
 1.30.2.1  10-Aug-2014  tls Rebase.
 1.33.2.1  20-Mar-2017  pgoyette Sync with HEAD
 1.34.14.2  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.34.14.1  10-Jun-2019  christos Sync with HEAD
 1.34.12.1  18-Jan-2019  pgoyette Synch with HEAD
 1.35.2.1  13-Sep-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1083):

sys/miscfs/genfs/genfs_rename.c: revision 1.5
tests/fs/vfs/t_renamerace.c: revision 1.37
tests/fs/vfs/t_renamerace.c: revision 1.38

tests/fs/vfs/t_renamerace: Test a screw case hannken@ found.

genfs_rename: Fix deadlocks in cross-directory cyclic rename.

Reproducer:
A: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600);
rmdir("c/d/e"); rmdir("c/d"); }
B: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600);
rename("c", "c/d/e"); }
C: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600);
rename("c/d/e", "c"); }

Deadlock:
- A holds c and wants to lock d; and either
- B holds . and d and wants to lock c, or
- C holds . and d and wants to lock c.

The problem with these is that genfs_rename_enter_separate in B or C
tried lock order .->d->c->e (in A/B, fdvp->tdvp->fvp->tvp; in A/C,
tdvp->fdvp->tvp->fvp) which violates the ancestor->descendant order
.->c->d->e.

The resolution is to change B to do fdvp->fvp->tdvp->tvp and C to do
tdvp->tvp->fdvp->fvp. But there's an edge case: tvp and fvp might be
the same (hard links), and we can't detect that until after we've
looked them both up -- and in some file systems (I'm looking at you,
ufs), there is no mere lookup operation, only lookup-and-lock, so we
can't even hold the lock on one of tvp or fvp when we look up the
other one if there's a chance they might be the same.

Fortunately the cases
(a) tvp = fvp
(b) tvp or fvp is a directory
are mutually exclusive as long as directories cannot be hard-linked.

In case (a) we can just defer locking {tvp, fvp} until the end, because
it can't possibly have {fdvp or fvp, tdvp or tvp} as descendants. In
case (b) we can just lock them in the order fdvp->fvp->tdvp->tvp or
tdvp->tvp->fdvp->fvp if the first one of {fvp, tvp} is a directory,
because it can't possibly coincide with the second one of {fvp, tvp}.

With this change, we can now prove that the locking order is consistent
with the ancestor->descendant partial ordering. Where two nodes are
incommensurate under that partial ordering, they are only ever locked
by rename and there is only ever one rename at a time.

Proof:
- For same-directory renames, genfs_rename_enter_common locks the
directory first and then the children. The order
directory->child[i] is consistent with ancestor->descendant and
child[0]/child[1] are incommensurate.
- For cross-directory renames:
. While a rename is in progress and the fs-wide rename lock is held,
directories can be created or removed but not changed, so the
outcome of gro_genealogy -- which, given fdvp and tdvp, returns
the node N relating fdvp/N/.../tdvp or null if there is none --
can only transition from finding N to not finding N, if one of
the directories is removed while any of the vnodes are unlocked.
Merely creating directories cannot change the ancestry of tdvp,
and concurrent renames are not possible.
Thus, if a gro_genealogy determined the operation to have the
form fdvp/N/.../tdvp, then it might cease to have that form, but
only because tdvp was removed which will harmlessly cause the
rename to fail later on. Similarly, if gro_genealogy determined
the operation _not_ to have the form fdvp/N/.../tdvp then it
can't begin to have that form until after the rename has
completed.
The lock order is,
=> for fdvp/.../tdvp:
1. lock fdvp
2. lookup(/lock/unlock) fvp (consistent with fdvp->fvp)
3. lock fvp if a directory (consistent with fdvp->fvp)
4. lock tdvp (consistent with fdvp->tdvp and possibly fvp->tdvp)
5. lookup(/lock/unlock) tvp (consistent with tdvp->tvp)
6. lock fvp if a nondirectory (fvp->t* or fvp->fdvp is impossible)
7. lock tvp if not fvp (tvp->f* is impossible unless tvp=fvp)
=> for incommensurate fdvp & tdvp, or for tdvp/.../fdvp:
1. lock tdvp
2. lookup(/lock/unlock) tvp (consistent with tdvp->tvp)
3. lock tvp if a directory (consistent with tdvp->tvp)
4. lock fdvp (either incommensurate with tdvp and/or tvp, or
consistent with tdvp(->tvp)->fdvp)
5. lookup(/lock/unlock) fvp (consistent with fdvp->fvp)
6. lock tvp if a nondirectory (tvp->f* or tvp->tdvp is impossible)
7. lock fvp if not tvp (fvp->t* is impossible unless fvp=tvp)

Deadlocks found by hannken@; resolution worked out with dholland@.

XXX I think we could improve concurrency somewhat -- with a likely
big win for applications like tar and rsync that create many files
with temporary names and then rename them to the permanent one in the
same directory -- by making vfs_renamelock a reader/writer lock: any
number of same-directory renames, or exactly one cross-directory
rename, at any one time.
