History log of /src/sys/miscfs/genfs/genfs_rename.c |
Revision | | Date | Author | Comments |
1.7 |
| 20-Oct-2021 |
thorpej | Don't use genfs_rename_knote() in the "rename foo over hard-link to itself" case, which simply results in removing the "from" name; there are assertions in genfs_rename_knote() that are too strong for that case.
PR kern/56460
|
1.6 |
| 20-Oct-2021 |
thorpej | Overhaul of the EVFILT_VNODE kevent(2) filter:
- Centralize vnode kevent handling in the VOP_*() wrappers, rather than forcing each individual file system to deal with it (except VOP_RENAME(), because VOP_RENAME() is a mess and we currently have 2 different ways of handling it; at least it's reasonably well-centralized in the "new" way). - Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ, compatible with the same events in FreeBSD. - Track which kevent notifications clients are interested in receiving to avoid doing work for events no one cares about (avoiding, e.g. taking locks and traversing the klist to send a NOTE_WRITE when someone is merely watching for a file to be deleted, for example).
In support of the above:
- Add support in vnode_if.sh for specifying PRE- and POST-op handlers, to be invoked before and after vop_pre() and vop_post(), respectively. Basic idea from FreeBSD, but implemented differently. - Add support in vnode_if.sh for specifying CONTEXT fields in the vop_*_args structures. These context fields are used to convey information between the file system VOP function and the VOP wrapper, but do not occupy an argument slot in the VOP_*() call itself. These context fields are initialized and subsequently interpreted by PRE- and POST-op handlers. - Version VOP_REMOVE(), uses the a context field for the file system to report back the resulting link count of the target vnode. Return this in tmpfs, udf, nfs, chfs, ext2fs, lfs, and ufs.
NetBSD 9.99.92.
|
1.5 |
| 05-Sep-2020 |
riastradh | genfs_rename: Fix deadlocks in cross-directory cyclic rename.
Reproducer:
A: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600); rmdir("c/d/e"); rmdir("c/d"); } B: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600); rename("c", "c/d/e"); } C: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600); rename("c/d/e", "c"); }
Deadlock:
- A holds c and wants to lock d; and either - B holds . and d and wants to lock c, or - C holds . and d and wants to lock c.
The problem with these is that genfs_rename_enter_separate in B or C tried lock order .->d->c->e (in A/B, fdvp->tdvp->fvp->tvp; in A/C, tdvp->fdvp->tvp->fvp) which violates the ancestor->descendant order .->c->d->e.
The resolution is to change B to do fdvp->fvp->tdvp->tvp and C to do tdvp->tvp->fdvp->fvp. But there's an edge case: tvp and fvp might be the same (hard links), and we can't detect that until after we've looked them both up -- and in some file systems (I'm looking at you, ufs), there is no mere lookup operation, only lookup-and-lock, so we can't even hold the lock on one of tvp or fvp when we look up the other one if there's a chance they might be the same.
Fortunately the cases (a) tvp = fvp (b) tvp or fvp is a directory are mutually exclusive as long as directories cannot be hard-linked. In case (a) we can just defer locking {tvp, fvp} until the end, because it can't possibly have {fdvp or fvp, tdvp or tvp} as descendants. In case (b) we can just lock them in the order fdvp->fvp->tdvp->tvp or tdvp->tvp->fdvp->fvp if the first one of {fvp, tvp} is a directory, because it can't possibly coincide with the second one of {fvp, tvp}.
With this change, we can now prove that the locking order is consistent with the ancestor->descendant partial ordering. Where two nodes are incommensurate under that partial ordering, they are only ever locked by rename and there is only ever one rename at a time.
Proof:
- For same-directory renames, genfs_rename_enter_common locks the directory first and then the children. The order directory->child[i] is consistent with ancestor->descendant and child[0]/child[1] are incommensurate.
- For cross-directory renames:
. While a rename is in progress and the fs-wide rename lock is held, directories can be created or removed but not changed, so the outcome of gro_genealogy -- which, given fdvp and tdvp, returns the node N relating fdvp/N/.../tdvp or null if there is none -- can only transition from finding N to not finding N, if one of the directories is removed while any of the vnodes are unlocked. Merely creating directories cannot change the ancestry of tdvp, and concurrent renames are not possible.
Thus, if a gro_genealogy determined the operation to have the form fdvp/N/.../tdvp, then it might cease to have that form, but only because tdvp was removed which will harmlessly cause the rename to fail later on. Similarly, if gro_genealogy determined the operation _not_ to have the form fdvp/N/.../tdvp then it can't begin to have that form until after the rename has completed.
The lock order is,
=> for fdvp/.../tdvp: 1. lock fdvp 2. lookup(/lock/unlock) fvp (consistent with fdvp->fvp) 3. lock fvp if a directory (consistent with fdvp->fvp) 4. lock tdvp (consistent with fdvp->tdvp and possibly fvp->tdvp) 5. lookup(/lock/unlock) tvp (consistent with tdvp->tvp) 6. lock fvp if a nondirectory (fvp->t* or fvp->fdvp is impossible) 7. lock tvp if not fvp (tvp->f* is impossible unless tvp=fvp)
=> for incommensurate fdvp & tdvp, or for tdvp/.../fdvp: 1. lock tdvp 2. lookup(/lock/unlock) tvp (consistent with tdvp->tvp) 3. lock tvp if a directory (consistent with tdvp->tvp) 4. lock fdvp (either incommensurate with tdvp and/or tvp, or consistent with tdvp(->tvp)->fdvp) 5. lookup(/lock/unlock) fvp (consistent with fdvp->fvp) 6. lock tvp if a nondirectory (tvp->f* or tvp->tdvp is impossible) 7. lock fvp if not tvp (fvp->t* is impossible unless fvp=tvp)
Deadlocks found by hannken@; resolution worked out with dholland@.
XXX I think we could improve concurrency somewhat -- with a likely big win for applications like tar and rsync that create many files with temporary names and then rename them to the permanent one in the same directory -- by making vfs_renamelock a reader/writer lock: any number of same-directory renames, or exactly one cross-directory rename, at any one time.
|
1.4 |
| 16-May-2020 |
christos | Add ACL support for FFS. From FreeBSD.
|
1.3 |
| 30-Mar-2017 |
hannken | branches: 1.3.18; Remove now redundant calls to fstrans_start()/fstrans_done().
|
1.2 |
| 06-Feb-2014 |
hannken | branches: 1.2.6; 1.2.10; 1.2.14; Move fstrans_start()/fstrans_done() into genfs_insane_rename() to protect the complete rename operation like we do for all other vnode operations.
|
1.1 |
| 08-May-2012 |
riastradh | branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8; 1.1.10; Implement a genfs_rename abstraction.
First major step in incrementally adapting all the file systems to a saner rename VOP protocol.
|
1.1.10.1 |
| 18-May-2014 |
rmind | sync with head
|
1.1.8.2 |
| 03-Dec-2017 |
jdolecek | update from HEAD
|
1.1.8.1 |
| 20-Aug-2014 |
tls | Rebase to HEAD as of a few days ago.
|
1.1.6.2 |
| 02-Jul-2012 |
jdc | Pull up revisions: src/sys/conf/files revision 1.1050 src/sys/miscfs/genfs/genfs.h revision 1.30 via patch src/sys/miscfs/genfs/genfs_rename.c revision 1.1 via patch src/sys/rump/librump/rumpvfs/Makefile.rumpvfs revision 1.33 (requested by riastradh in ticket #286).
Implement a genfs_rename abstraction.
First major step in incrementally adapting all the file systems to a saner rename VOP protocol.
|
1.1.6.1 |
| 08-May-2012 |
jdc | file genfs_rename.c was added on branch netbsd-6 on 2012-07-02 18:01:17 +0000
|
1.1.4.2 |
| 02-Jun-2012 |
mrg | sync to latest -current.
|
1.1.4.1 |
| 08-May-2012 |
mrg | file genfs_rename.c was added on branch jmcneill-usbmp on 2012-06-02 11:09:36 +0000
|
1.1.2.3 |
| 22-May-2014 |
yamt | sync with head.
for a reference, the tree before this commit was tagged as yamt-pagecache-tag8.
this commit was splitted into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
|
1.1.2.2 |
| 23-May-2012 |
yamt | sync with head.
|
1.1.2.1 |
| 08-May-2012 |
yamt | file genfs_rename.c was added on branch yamt-pagecache on 2012-05-23 10:08:14 +0000
|
1.2.14.1 |
| 21-Apr-2017 |
bouyer | Sync with HEAD
|
1.2.10.1 |
| 26-Apr-2017 |
pgoyette | Sync with HEAD
|
1.2.6.1 |
| 28-Aug-2017 |
skrll | Sync with HEAD
|
1.3.18.1 |
| 13-Sep-2020 |
martin | Pull up following revision(s) (requested by riastradh in ticket #1083):
sys/miscfs/genfs/genfs_rename.c: revision 1.5 tests/fs/vfs/t_renamerace.c: revision 1.37 tests/fs/vfs/t_renamerace.c: revision 1.38
tests/fs/vfs/t_renamerace: Test a screw case hannken@ found.
genfs_rename: Fix deadlocks in cross-directory cyclic rename.
Reproducer: A: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600); rmdir("c/d/e"); rmdir("c/d"); } B: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600); rename("c", "c/d/e"); } C: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600); rename("c/d/e", "c"); }
Deadlock: - A holds c and wants to lock d; and either - B holds . and d and wants to lock c, or - C holds . and d and wants to lock c.
The problem with these is that genfs_rename_enter_separate in B or C tried lock order .->d->c->e (in A/B, fdvp->tdvp->fvp->tvp; in A/C, tdvp->fdvp->tvp->fvp) which violates the ancestor->descendant order .->c->d->e.
The resolution is to change B to do fdvp->fvp->tdvp->tvp and C to do tdvp->tvp->fdvp->fvp. But there's an edge case: tvp and fvp might be the same (hard links), and we can't detect that until after we've looked them both up -- and in some file systems (I'm looking at you, ufs), there is no mere lookup operation, only lookup-and-lock, so we can't even hold the lock on one of tvp or fvp when we look up the other one if there's a chance they might be the same.
Fortunately the cases (a) tvp = fvp (b) tvp or fvp is a directory are mutually exclusive as long as directories cannot be hard-linked.
In case (a) we can just defer locking {tvp, fvp} until the end, because it can't possibly have {fdvp or fvp, tdvp or tvp} as descendants. In case (b) we can just lock them in the order fdvp->fvp->tdvp->tvp or tdvp->tvp->fdvp->fvp if the first one of {fvp, tvp} is a directory, because it can't possibly coincide with the second one of {fvp, tvp}.
With this change, we can now prove that the locking order is consistent with the ancestor->descendant partial ordering. Where two nodes are incommensurate under that partial ordering, they are only ever locked by rename and there is only ever one rename at a time.
Proof: - For same-directory renames, genfs_rename_enter_common locks the directory first and then the children. The order directory->child[i] is consistent with ancestor->descendant and child[0]/child[1] are incommensurate. - For cross-directory renames: . While a rename is in progress and the fs-wide rename lock is held, directories can be created or removed but not changed, so the outcome of gro_genealogy -- which, given fdvp and tdvp, returns the node N relating fdvp/N/.../tdvp or null if there is none -- can only transition from finding N to not finding N, if one of the directories is removed while any of the vnodes are unlocked. Merely creating directories cannot change the ancestry of tdvp, and concurrent renames are not possible. Thus, if a gro_genealogy determined the operation to have the form fdvp/N/.../tdvp, then it might cease to have that form, but only because tdvp was removed which will harmlessly cause the rename to fail later on. Similarly, if gro_genealogy determined the operation _not_ to have the form fdvp/N/.../tdvp then it can't begin to have that form until after the rename has completed. The lock order is, => for fdvp/.../tdvp: 1. lock fdvp 2. lookup(/lock/unlock) fvp (consistent with fdvp->fvp) 3. lock fvp if a directory (consistent with fdvp->fvp) 4. lock tdvp (consistent with fdvp->tdvp and possibly fvp->tdvp) 5. lookup(/lock/unlock) tvp (consistent with tdvp->tvp) 6. lock fvp if a nondirectory (fvp->t* or fvp->fdvp is impossible) 7. lock tvp if not fvp (tvp->f* is impossible unless tvp=fvp) => for incommensurate fdvp & tdvp, or for tdvp/.../fdvp: 1. lock tdvp 2. lookup(/lock/unlock) tvp (consistent with tdvp->tvp) 3. lock tvp if a directory (consistent with tdvp->tvp) 4. lock fdvp (either incommensurate with tdvp and/or tvp, or consistent with tdvp(->tvp)->fdvp) 5. lookup(/lock/unlock) fvp (consistent with fdvp->fvp) 6. lock tvp if a nondirectory (tvp->f* or tvp->tdvp is impossible) 7. lock fvp if not tvp (fvp->t* is impossible unless fvp=tvp)
Deadlocks found by hannken@; resolution worked out with dholland@.
XXX I think we could improve concurrency somewhat -- with a likely big win for applications like tar and rsync that create many files with temporary names and then rename them to the permanent one in the same directory -- by making vfs_renamelock a reader/writer lock: any number of same-directory renames, or exactly one cross-directory rename, at any one time.
|