History log of /src/sys/kern/vfs_mount.c
Revision  Date  Author  Comments
 1.110  07-Dec-2024  riastradh vfs(9): Sprinkle SET_ERROR dtrace probes.

PR kern/58378: Kernel error code origination lacks dtrace probes
 1.109  07-Dec-2024  riastradh vfs(9): Fix some more whitespace issues.

No functional change intended.
 1.108  07-Dec-2024  riastradh vfs(9): Sprinkle KNF.

No functional change intended.
 1.107  11-Aug-2024  bad explain why MNT_ASYNC is temporarily cleared

related to PR kern/58564.
 1.106  11-Aug-2024  bad vfs_subr.c: in dounmount restore the async flag before checking it

This avoids the file system being put on the syncer work list and future
modified buffers being flushed to disk by the syncer after an attempt to
unmount it fails.

Adjust the test case to not expect a failure.

fixes PR kern/58564.
 1.105  19-Apr-2024  riastradh branches: 1.105.2;
dounmount: Avoid &((struct vnode_impl *)NULL)->vi_vnode.

Member access of a null pointer is undefined, even if the result
should also be null because vi_vnode is at the start of vnode_impl.

Reported-by: syzbot+a4b2d13c0d6d4dac2d07@syzkaller.appspotmail.com
https://syzkaller.appspot.com/bug?extid=a4b2d13c0d6d4dac2d07
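
The point of revision 1.105 can be illustrated with a minimal userland sketch. This is not the actual patch: the types `vnode_demo` and `vnode_impl_demo` and the helper `impl_to_vnode()` are hypothetical stand-ins. It only shows the general idea of checking the structure pointer for NULL before forming the address of a member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct vnode and struct vnode_impl. */
struct vnode_demo {
	int v_dummy;
};

struct vnode_impl_demo {
	struct vnode_demo vi_vnode;	/* first member, at offset 0 */
	int vi_extra;
};

/*
 * Convert an implementation pointer to the embedded vnode pointer.
 * The NULL check happens on the structure pointer itself, so no
 * member access is ever evaluated through a null pointer.
 */
static struct vnode_demo *
impl_to_vnode(struct vnode_impl_demo *vip)
{
	if (vip == NULL)
		return NULL;
	return &vip->vi_vnode;
}
```

Even though `vi_vnode` sits at offset 0, writing `&vip->vi_vnode` with a null `vip` is what the commit message calls undefined, so the explicit check is the safer form.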
 1.104  17-Jan-2024  hannken Print dangling vnode before panic() to help debug.

PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"
 1.103  28-Dec-2023  hannken Include "veriexec.h" and <sys/verified_exec.h> to run
veriexec_unmountchk() on "NVERIEXEC > 0".
 1.102  24-Feb-2023  riastradh kern: Eliminate most __HAVE_ATOMIC_AS_MEMBAR conditionals.

I'm leaving in the conditional around the legacy membar_enters
(store-before-load, store-before-store) in kern_mutex.c and in
kern_lock.c because they may still matter: store-before-load barriers
tend to be the most expensive kind, so eliding them is probably
worthwhile on x86. (It also may not matter; I just don't care to do
measurements right now, and it's a single valid and potentially
justifiable use case in the whole tree.)

However, membar_release/acquire can be mere instruction barriers on
all TSO platforms including x86, so there's no need to go out of our
way with a bad API to conditionalize them. If the procedure call
overhead is measurable we just could change them to be macros on x86
that expand into __insn_barrier.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2023/02/23/msg028729.html
 1.101  09-Dec-2022  hannken branches: 1.101.2;
Harden layered file systems usage of field "mnt_lower" against
forced unmounts of the lower layer.

- Don't allow "dead_rootmount" as lower layer.

- Take file system busy before a vfs operation walks down the stack.

Reported-by: syzbot+27b35e5675b1753cec03@syzkaller.appspotmail.com
Reported-by: syzbot+99071492e3de2eff49e9@syzkaller.appspotmail.com
 1.100  10-Nov-2022  hannken If built with DEBUG, limit the depth of file system stack so kernel sanitizers
may stress mount/unmount without exhausting the kernel stack.
 1.99  04-Nov-2022  hannken Add a helper to set or clear lower mount and use it.
Always add a reference to the lower mount.

Ride 9.99.105
 1.98  26-Oct-2022  riastradh sys/filedesc.h: New home for extern cwdi0.
 1.97  13-Sep-2022  riastradh vflush(9): Insert `involuntary' preemption point at each vnode.

Currently there is a voluntary yield every 100ms, but that's a long
time. Should help to avoid hogging the CPU while flushing lots of
data to big disks on systems without kpreemption.
 1.96  26-Aug-2022  hannken Two defects in vfs_getnewfsid():

- Parallel mounts may get the same fsid. Always increment "xxxfs_mntid"
to make it unlikely.

- Directly walk "mountlist" to prevent a rare deadlock where one thread
holds a vnode locked, calls vfs_getnewfsid() and the iterator has to
wait for a suspended file system while the thread suspending needs
this vnode lock.
 1.95  22-Aug-2022  hannken Protect changing "v_mountedhere" with file system suspension instead
of vnode lock.
 1.94  08-Jul-2022  hannken Suspend file system after VFS_MOUNT() and before taking mnt_updating.
Prevents deadlock against concurrent unmounts of layered file systems.
 1.93  09-Apr-2022  riastradh sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.
 1.92  28-Mar-2022  riastradh specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.

vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.
 1.91  24-Mar-2022  riastradh vfs(9): Add missing vnode lock around VOP_CLOSE in vfs_mountroot.

Maybe vnode_if.c should be taught to KASSERT the vnode lock now that
locks always work.
 1.90  19-Mar-2022  hannken Lock vnode across VOP_OPEN.
 1.89  16-Mar-2022  andvar s/paniced/panicked/ and s/borken/broken/ in comments.
 1.88  12-Mar-2022  riastradh sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

Thread A                        Thread B
obj->foo = 42;                  obj->baz = 73;
mumble(&obj->bar);              grumble(&obj->quux);
/* membar_exit(); */            /* membar_exit(); */
atomic_dec -- not last          atomic_dec -- last
                                /* membar_enter(); */
                                KASSERT(invariant(obj->foo,
                                    obj->bar));
                                free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

membar_exit();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
	return;
membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
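
For readers outside the kernel, the release pattern described in revision 1.88 can be sketched in portable C11. This is an illustrative userland analogue, not NetBSD code: `struct obj` and `obj_release()` are hypothetical names, and `atomic_thread_fence()` is used here in place of the kernel's membar calls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; names are illustrative only. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

/*
 * Drop one reference, "last one out hit the lights" style.
 * Returns true if the caller dropped the last reference and is
 * therefore responsible for freeing the object.
 */
static bool
obj_release(struct obj *o)
{
	/*
	 * Release fence: orders this thread's earlier accesses to *o
	 * before the decrement (the role of membar_exit/membar_release).
	 */
	atomic_thread_fence(memory_order_release);

	/* atomic_fetch_sub returns the OLD value; 1 means we were last. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_relaxed) != 1)
		return false;

	/*
	 * Acquire fence: orders the decrement before the caller's later
	 * accesses, so it sees what other threads did before they
	 * released (the role of membar_enter/membar_acquire).
	 */
	atomic_thread_fence(memory_order_acquire);
	return true;
}
```

A caller would free the object only when `obj_release()` returns true. The acquire fence is paid only on the last-reference path, which mirrors the early return in the commit message.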
 1.87  04-Feb-2022  hannken Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.
 1.86  16-Feb-2021  hannken Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)
 1.85  19-Nov-2020  hannken We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com
 1.84  13-Oct-2020  hannken branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com
 1.83  23-May-2020  ad Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.82  01-May-2020  hannken Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown
 1.81  21-Apr-2020  ad Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.
 1.80  20-Apr-2020  ad Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.
 1.79  19-Apr-2020  hannken Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown
 1.78  13-Apr-2020  ad Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.
 1.77  13-Apr-2020  maxv hardclock_ticks -> getticks()
 1.76  10-Apr-2020  ad vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.
 1.75  23-Feb-2020  ad branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.
 1.74  17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.73  22-Dec-2019  ad branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.
 1.72  16-Nov-2019  maxv NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
 1.71  19-Aug-2019  christos If we could not start extattr for some reason, don't advertise extattr in the
mount.
 1.70  20-Feb-2019  hannken branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.
 1.69  20-Feb-2019  hannken Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.
 1.68  05-Feb-2019  hannken Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.
 1.67  21-Aug-2017  hannken branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.
 1.66  04-Jun-2017  hannken Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.
 1.65  01-Jun-2017  chs branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.64  24-May-2017  hannken With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73
 1.63  24-May-2017  hannken Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.
 1.62  17-May-2017  hannken Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.
 1.61  07-May-2017  hannken Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().
 1.60  07-May-2017  hannken Move fstrans initialization to vfs_mountalloc().
 1.59  07-May-2017  hannken Remove now invalid comment.
 1.58  17-Apr-2017  hannken branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.
 1.57  17-Apr-2017  hannken No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?
 1.56  17-Apr-2017  hannken Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.55  17-Apr-2017  hannken Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
 1.54  17-Apr-2017  hannken Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.
 1.53  12-Apr-2017  hannken Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().
 1.52  11-Apr-2017  hannken Add an iterator over the currently mounted file systems.

Ride 7.99.68
 1.51  30-Mar-2017  hannken Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.
 1.50  06-Mar-2017  hannken Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.
 1.49  06-Mar-2017  hannken Deny unmounting file systems below layered file systems.
 1.48  22-Feb-2017  hannken Enable fstrans on all file systems.

Welcome to 7.99.61
 1.47  27-Jan-2017  hannken Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.
 1.46  27-Jan-2017  hannken When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.
 1.45  13-Jan-2017  hannken branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().
 1.44  11-Jan-2017  hannken Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.
 1.43  02-Jan-2017  hannken Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54
 1.42  14-Dec-2016  hannken Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().
 1.41  03-Nov-2016  hannken Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.
 1.40  07-Jul-2016  msaitoh branches: 1.40.2;
KNF. Remove extra spaces. No functional change.
 1.39  19-May-2016  hannken Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".
 1.38  19-May-2016  hannken Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.
 1.37  19-Aug-2015  hannken Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.
 1.36  02-Aug-2015  manu Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.
 1.35  06-May-2015  hannken Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@
 1.34  20-Apr-2015  riastradh Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.
 1.33  09-Mar-2015  pooka The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.
 1.32  08-Jan-2015  hannken vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().
 1.31  14-Nov-2014  manu branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.
 1.30  30-May-2014  hannken branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.
 1.29  24-May-2014  christos Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.
 1.28  18-Mar-2014  hannken branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37
 1.27  05-Mar-2014  hannken Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
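
The init / next / destroy calling pattern described in revision 1.27 can be sketched with a self-contained userland mock. Everything prefixed `toy_` below is hypothetical and only mimics the shape of the interface. The real kernel functions also handle locking, references and marker vnodes, none of which is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins: NOT the kernel's struct vnode / struct mount. */
struct toy_vnode {
	int id;
	struct toy_vnode *next;
};

struct toy_mount {
	struct toy_vnode *head;
};

struct toy_iterator {
	struct toy_vnode *cur;
};

/* Allocate an iterator positioned at the start of the mount's list. */
static void
toy_iterator_init(struct toy_mount *mp, struct toy_iterator **itp)
{
	struct toy_iterator *it = malloc(sizeof(*it));

	if (it == NULL)
		abort();
	it->cur = mp->head;
	*itp = it;
}

static void
toy_iterator_destroy(struct toy_iterator *it)
{
	free(it);
}

/* Returns false with *vpp == NULL when done, true with *vpp != NULL otherwise. */
static bool
toy_iterator_next(struct toy_iterator *it, struct toy_vnode **vpp)
{
	if (it->cur == NULL) {
		*vpp = NULL;
		return false;
	}
	*vpp = it->cur;
	it->cur = it->cur->next;
	return true;
}

/* Typical caller: walk every vnode of a mount and count them. */
static int
toy_count_vnodes(struct toy_mount *mp)
{
	struct toy_iterator *it;
	struct toy_vnode *vp;
	int n = 0;

	toy_iterator_init(mp, &it);
	while (toy_iterator_next(it, &vp))
		n++;
	toy_iterator_destroy(it);
	return n;
}
```

The caller in `toy_count_vnodes()` never touches list linkage or intermediate states directly, which is the simplification the commit message is after.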
 1.26  27-Feb-2014  hannken Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.
 1.25  27-Nov-2013  christos one more *_END(head) -> NULL
 1.24  23-Nov-2013  christos change the mountlist CIRCLEQ into a TAILQ
 1.23  29-Oct-2013  hannken Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25
 1.22  25-Oct-2013  martin Mark diagnostic-only variables
 1.21  30-Sep-2013  hannken Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>
 1.20  30-Aug-2013  hannken Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)
 1.19  28-Apr-2013  mlelstv branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.
 1.18  26-Apr-2013  mlelstv Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.
 1.17  13-Feb-2013  hannken Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17
 1.16  14-Dec-2012  pooka Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG
 1.15  27-Oct-2012  chs split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.
 1.14  08-May-2012  gson branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.
 1.13  13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.12  18-Nov-2011  christos branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.
 1.11  14-Oct-2011  hannken branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.
 1.10  07-Oct-2011  hannken As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.
 1.9  01-Sep-2011  christos undo previous
 1.8  01-Sep-2011  christos fix typo.
 1.7  01-Sep-2011  christos Check for v_type before v_rdev because it is cheaper and safer.
 1.6  12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.5  05-Jun-2011  dsl branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.
 1.4  03-Apr-2011  rmind branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.
 1.3  02-Apr-2011  rmind Merge vfs_shutdown1() and vfs_shutdown().
 1.2  02-Apr-2011  rmind - Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.
 1.1  02-Apr-2011  rmind Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.
 1.4.4.1  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.4.2.3  12-Jun-2011  rmind sync with head
 1.4.2.2  21-Apr-2011  rmind sync with head
 1.4.2.1  03-Apr-2011  rmind file vfs_mount.c was added on branch rmind-uvmplock on 2011-04-21 01:42:11 +0000
 1.5.2.2  06-Jun-2011  jruoho Sync with HEAD.
 1.5.2.1  05-Jun-2011  jruoho file vfs_mount.c was added on branch jruoho-x86intr on 2011-06-06 09:09:41 +0000
 1.11.2.5  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.11.2.4  23-Jan-2013  yamt sync with head
 1.11.2.3  30-Oct-2012  yamt sync with head
 1.11.2.2  23-May-2012  yamt sync with head.
 1.11.2.1  17-Apr-2012  yamt sync with head
 1.12.6.2  04-Dec-2014  snj Pull up following revision(s) (requested by manu in ticket #1196):
sys/kern/vfs_mount.c: revision 1.31
sys/ufs/ffs/ffs_vfsops.c: revision 1.302
sys/ufs/ufs/ufs_extattr.c: revision 1.44
Fix use-after-free on failed unmount with extended attribute enabled
When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.
The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart
As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.
 1.12.6.1  19-May-2012  riz Pull up following revision(s) (requested by manu in ticket #259):
sys/kern/vfs_syscalls.c: revision 1.456
sys/kern/vfs_mount.c: revision 1.14
sys/kern/vfs_syscalls.c: revision 1.452
sys/kern/vfs_syscalls.c: revision 1.453
sys/kern/vfs_syscalls.c: revision 1.454
Do not use vp after mount_domount() call as it sets it to NULL on success.
This fixes a panic when starting extended attributes.
Fix mount -o extattr : previous patch fixed a panic but caused operation
to happen on the mount point instead of the mounted filesystem.
Fix the extattr start fix. Looking up the filesystem root vnode again
does not seem to be reliable. Instead save it before mount_domount()
sets it to NULL.
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.
 1.12.4.2  02-Jun-2012  mrg sync to latest -current.
 1.12.4.1  05-Apr-2012  mrg sync to latest -current.
 1.14.2.5  03-Dec-2017  jdolecek update from HEAD
 1.14.2.4  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.14.2.3  23-Jun-2013  tls resync from head
 1.14.2.2  25-Feb-2013  tls resync with head
 1.14.2.1  20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.19.4.1  18-May-2014  rmind sync with head
 1.28.2.1  10-Aug-2014  tls Rebase.
 1.30.2.3  04-Nov-2015  riz Pull up following revision(s) (requested by manu in ticket #962):
sys/kern/vfs_mount.c: revision 1.36
Do not VFS_SYNC before VFS_UNMOUNT on force unmount
VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.
This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.
As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.
 1.30.2.2  09-Jan-2015  martin Pull up following revision(s) (requested by hannken in ticket #398):
sys/kern/vfs_mount.c: revision 1.32
vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().
 1.30.2.1  18-Nov-2014  snj Pull up following revision(s) (requested by manu in ticket #246):
sys/kern/vfs_mount.c: revision 1.31
sys/ufs/ffs/ffs_vfsops.c: revision 1.302
sys/ufs/ufs/ufs_extattr.c: revision 1.44
Fix use-after-free on failed unmount with extended attribute enabled
When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.
The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart
As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.
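The three steps listed above can be modelled with a small userland sketch. Everything here is hypothetical (toy_mount, TOY_MNT_EXTATTR, the toy_* functions and the ordering inside toy_unmount()); it only shows the invariant the fix aims for, namely that the flag is set exactly when the structures are allocated, including after a failed unmount.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MNT_EXTATTR 0x1	/* hypothetical flag value */

struct toy_mount {
	int flags;
	int busy;		/* nonzero makes the toy unmount fail */
	int *extattr;		/* per-mount extattr state, or NULL */
};

/* Allocate the state and only then advertise it through the flag. */
static int
toy_extattr_start(struct toy_mount *mp)
{
	assert(mp->extattr == NULL);
	mp->extattr = calloc(1, sizeof(*mp->extattr));
	if (mp->extattr == NULL)
		return -1;
	mp->flags |= TOY_MNT_EXTATTR;
	return 0;
}

/* Free the state and clear the flag so nobody trusts a stale pointer. */
static void
toy_extattr_stop(struct toy_mount *mp)
{
	free(mp->extattr);
	mp->extattr = NULL;
	mp->flags &= ~TOY_MNT_EXTATTR;
}

/* Returns 0 on success, nonzero if the mount is busy. */
static int
toy_unmount(struct toy_mount *mp)
{
	int was_on = (mp->flags & TOY_MNT_EXTATTR) != 0;

	if (was_on)
		toy_extattr_stop(mp);
	if (mp->busy) {
		/* Unmount failed: bring extattr back to its prior state. */
		if (was_on)
			(void)toy_extattr_start(mp);
		return 1;
	}
	return 0;
}
```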
 1.31.2.7  28-Aug-2017  skrll Sync with HEAD
 1.31.2.6  05-Feb-2017  skrll Sync with HEAD
 1.31.2.5  05-Dec-2016  skrll Sync with HEAD
 1.31.2.4  29-May-2016  skrll Sync with HEAD
 1.31.2.3  22-Sep-2015  skrll Sync with HEAD
 1.31.2.2  06-Jun-2015  skrll Sync with HEAD
 1.31.2.1  06-Apr-2015  skrll Sync with HEAD
 1.40.2.6  26-Apr-2017  pgoyette Sync with HEAD
 1.40.2.5  20-Mar-2017  pgoyette Sync with HEAD
 1.40.2.4  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.40.2.3  04-Nov-2016  pgoyette Sync with HEAD
 1.40.2.2  22-Jul-2016  pgoyette Fix up the error/exit path.

Return actual error, not blindly report success!
 1.40.2.1  20-Jul-2016  pgoyette Adapt machine-independent code to the new {b,c}devsw reference-counting
(using localcount(9)). All callers of {b,c}devsw_lookup() now call
{b,c}devsw_lookup_acquire() which retains a reference on the 'struct
{b,c}devsw'. This reference must be released by the caller once it is
finished with the structure's content (or other data that would disappear
if the 'struct {b,c}devsw' were to disappear).
 1.45.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.58.2.3  19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keyword expansion)
 1.58.2.2  11-May-2017  pgoyette Sync with HEAD
 1.58.2.1  27-Apr-2017  pgoyette Restore all work from the former pgoyette-localcount branch (which is
now abandoned due to a cvs merge botch).

The branch now builds, and installs via anita. There are still some
problems (cgd is non-functional and all atf tests time-out) but they
will get resolved soon.
 1.65.2.2  25-Aug-2017  snj Pull up following revision(s) (requested by hannken in ticket #227):
sys/sys/vnode_impl.h: revision 1.16
sys/kern/vfs_vnode.c: revision 1.97
sys/kern/vfs_vnode.c: revision 1.98
sys/kern/vfs_mount.c: revision 1.67
sys/miscfs/deadfs/dead_vfsops.c: revision 1.8
No need to cache anonymous device vnodes, they will never be looked up.
Set key to (dead_rootmount, 0, NULL) and add assertions.
--
Change forced unmount to revert open device vnodes to anonymous devices.
 1.65.2.1  04-Jun-2017  bouyer pullup the following revisions, requested by hannken in ticket #2:
src/share/man/man9/fstrans.9 1.25
src/sys/kern/vfs_mount.c 1.66
src/sys/kern/vfs_subr.c 1.468
src/sys/kern/vfs_trans.c 1.46
src/sys/kern/vfs_vnode.c 1.94, 1.95, 1.96
src/sys/kern/vnode_if.c 1.105, 1.106
src/sys/kern/vnode_if.sh 1.65, 1.66
src/sys/kern/vnode_if.src 1.76
src/sys/miscfs/genfs/genfs_io.c 1.69
src/sys/miscfs/genfs/genfs_vnops.c 1.196, 1.197
src/sys/miscfs/genfs/layer_extern.h 1.40
src/sys/miscfs/genfs/layer_vfsops.c 1.51
src/sys/miscfs/genfs/layer_vnops.c 1.67
src/sys/miscfs/nullfs/null_vnops.c 1.42
src/sys/miscfs/overlay/overlay_vnops.c 1.24
src/sys/miscfs/umapfs/umap_vnops.c 1.60
src/sys/rump/include/rump/rumpvnode_if.h 1.29, 1.30
src/sys/rump/librump/rumpkern/emul.c 1.182
src/sys/rump/librump/rumpvfs/rumpvnode_if.c 1.29, 1.30
src/sys/sys/fstrans.h 1.11
src/sys/sys/vnode.h 1.278
src/sys/sys/vnode_if.h 1.100, 1.101
src/sys/sys/vnode_impl.h 1.14, 1.15
src/sys/ufs/lfs/lfs_pages.c 1.12

Vnode state, lock and fstrans cleanup:
- Rename vnode state "VS_ACTIVE" to "VS_LOADED" and add synthetic
state "VS_ACTIVE" to assert a loaded vnode with usecount > 0.

- Redo FSTRANS in vnode_if.c and use it for VOP_LOCK and VOP_UNLOCK.

- Cleanup the genfs lock operations.

- Make "struct vnode_impl" member "vi_lock" a krwlock_t again.

- Remove the lock type argument from fstrans_start and
fstrans_start_nowait,
remove now unused FSTRANS state "FSTRANS_SUSPENDING".
 1.67.4.3  21-Apr-2020  martin Sync with HEAD
 1.67.4.2  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.67.4.1  10-Jun-2019  christos Sync with HEAD
 1.70.4.2  01-May-2020  martin Pull up following revision(s) (requested by hannken in ticket #881):

sys/kern/vfs_mount.c: revision 1.82

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown
 1.70.4.1  22-Apr-2020  martin Pull up following revision(s) (requested by gson in ticket #839):

sys/kern/vfs_mount.c: revision 1.79

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown
 1.73.2.3  29-Feb-2020  ad Sync with head.
 1.73.2.2  25-Jan-2020  ad Make cwdinfo use mostly lockless, and largely hide the details in vfs_cwd.c.
 1.73.2.1  17-Jan-2020  ad Sync with head.
 1.75.4.2  25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.75.4.1  20-Apr-2020  bouyer Sync with HEAD
 1.84.2.2  03-Apr-2021  thorpej Sync with HEAD.
 1.84.2.1  14-Dec-2020  thorpej Sync w/ HEAD.
 1.101.2.2  12-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #843):

sys/kern/vfs_mount.c: revision 1.105

dounmount: Avoid &((struct vnode_impl *)NULL)->vi_vnode.

Member access of a null pointer is undefined, even if the result
should also be null because vi_vnode is at the start of vnode_impl.
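A minimal sketch of the pattern this change avoids, with hypothetical toy_* types standing in for the kernel's structures: even when the embedded member sits at offset 0, the member access is only made once the pointer is known to be non-null. The helper name and the explicit NULL test are assumptions for illustration, not the code that was committed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only. */
struct toy_vnode {
	int v_dummy;
};

struct toy_vnode_impl {
	struct toy_vnode vi_vnode;	/* first member, offset 0 */
	int vi_state;
};

/* The embedded vnode is at the start of the implementation struct. */
static_assert(offsetof(struct toy_vnode_impl, vi_vnode) == 0,
    "vi_vnode must be the first member");

/*
 * Convert an implementation pointer to its embedded vnode.
 * Testing for NULL first avoids evaluating &vip->vi_vnode on a
 * null pointer, which is undefined even though the offset is 0.
 */
static struct toy_vnode *
toy_impl_to_vnode(struct toy_vnode_impl *vip)
{
	if (vip == NULL)
		return NULL;
	return &vip->vi_vnode;
}
```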
 1.101.2.1  18-Apr-2024  martin Pull up following revision(s) (requested by hannken in ticket #668):

sys/miscfs/procfs/procfs.h: revision 1.83
sys/miscfs/procfs/procfs.h: revision 1.84
sys/kern/vfs_mount.c: revision 1.104
sys/miscfs/procfs/procfs_vnops.c: revision 1.230
sys/kern/init_main.c: revision 1.547
sys/kern/kern_hook.c: revision 1.15
sys/miscfs/procfs/procfs_vfsops.c: revision 1.112
sys/miscfs/procfs/procfs_vfsops.c: revision 1.113
sys/miscfs/procfs/procfs_vfsops.c: revision 1.114
sys/miscfs/procfs/procfs_subr.c: revision 1.117

Print dangling vnode before panic() to help debug.

PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"
Protect kernel hooks exechook, exithook and forkhook with rwlock.

Lock as writer on establish/disestablish and as reader on list traverse.

For exechook ride "exec_lock" as it is already taken as reader when
traversing the list. Add local locks for exithook and forkhook.

Move exec_init before signal_init as signal_init calls exechook_establish()
that needs "exec_lock".

PR kern/39913 "exec, fork, exit hooks need locking"

Add a hashmap to access all procfs nodes by pid.

Using the exechook to revoke procfs nodes is racy and may deadlock:
one thread runs doexechooks() -> procfs_revoke_vnodes() and wants to suspend
the file system for vgone(), while another thread runs a forced unmount,
has the file system suspended, tries to disestablish the exechook and
waits for doexechooks() to complete.

Establish/disestablish the exechook on module load/unload instead of
mount/unmount and use the hashmap to access all procfs nodes for this pid.

May fix PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"

Remove all procfs nodes for this process on process exit.
 1.105.2.1  02-Aug-2025  perseant Sync with HEAD
