History log of /src/sys/kern/vfs_lookup.c
Revision  Date  Author  Comments
 1.239  14-Sep-2025  andvar Fix various typos in comments and log message.
 1.238  07-Dec-2024  riastradh vfs(9): Sprinkle SET_ERROR dtrace probes.

PR kern/58378: Kernel error code origination lacks dtrace probes
 1.237  07-Dec-2024  riastradh vfs(9): Fix some more whitespace issues.

No functional change intended.
 1.236  07-Dec-2024  riastradh vfs(9): Sprinkle KNF.

No functional change intended.
 1.235  01-Jul-2024  christos refactor slightly so we don't try to read the buffer supplied by userland.
 1.234  01-May-2023  mlelstv branches: 1.234.6;
Default PROC_MACHINE_ARCH to machine_arch and use this for magic
symlinks to resolve "@machine_arch".

This keeps behaviour of magic symlinks and 'uname -p' output the same.
Fixes PR 57320.
 1.233  09-Apr-2023  riastradh kern: KASSERT(A && B) -> KASSERT(A); KASSERT(B)
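
The point of the split in 1.233 is that a failed assertion then names the one condition that was false. A minimal userland sketch, assuming assert(3) as a stand-in for the kernel's KASSERT; `struct result` and `result_check()` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lookup result; not a kernel structure. */
struct result {
	const void *dir;
	const void *obj;
};

/*
 * Returns 1 when both pointers are set. With two separate assertions,
 * a failure report identifies exactly which pointer was NULL.
 */
static int
result_check(const struct result *r)
{
	/* was: assert(r->dir != NULL && r->obj != NULL); */
	assert(r->dir != NULL);
	assert(r->obj != NULL);
	return 1;
}
```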
 1.232  22-Aug-2022  hannken branches: 1.232.4;
Use fstrans_start()/fstrans_done() to cross the mount in lookup_crossmount().
It is sufficient here as it prevents the file system from unmount and
makes it safe to use VFS_ROOT() here.

Removes a rare deadlock where one thread has "foundobj" locked and waits
for "foundobj->v_mountedhere" to resume while the thread holding the file
system suspended tries to lookup a node and needs a lock on "foundobj".
 1.231  10-Feb-2022  hannken Remove the assertion "searchdir != foundobj" from lookup_crossmount().

It will trigger whenever we lookup "." on a directory that becomes
mounted

fd = open("/mnt")
mount(..., "/mnt", ...)
fd2 = openat(fd, ".")

or

Unlocked searchdir gets mounted on between
lookup_fastforward()/lookup_once() and the test for
foundobj->v_mountedhere.

May address syzkaller:

Reported-by: syzbot+9197ac681ce50f707d9a@syzkaller.appspotmail.com
Reported-by: syzbot+eb4854df8ee3c9bc278d@syzkaller.appspotmail.com
Reported-by: syzbot+3cc5b4126ab554f145d3@syzkaller.appspotmail.com
Reported-by: syzbot+7eae48a3ea952efee8c8@syzkaller.appspotmail.com
Reported-by: syzbot+b7f662083ccf8be3e669@syzkaller.appspotmail.com
 1.230  13-Nov-2021  hannken If lookup_fastforward() loses an intermediate searchdir and has to roll
back and retry, it must use the initial searchdir from *searchdir_ret
for lookup_parsepath().
 1.229  29-Jun-2021  dholland Now remove cn_consume from struct componentname.

This change requires a kernel bump.

Note though that I'm not going to version the VOP_LOOKUP args
structure (or any other args structure) as code that doesn't touch
cn_consume doesn't need attention and code that does will fail on it
without further intervention.
 1.228  29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
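
The default behaviour described for 1.228 (consume the path up to the first '/') can be sketched in userland roughly as follows. This is not the kernel's genfs_parsepath(); `parsepath_sketch()` is a hypothetical name, and the real vnode operation also takes the directory vnode and can fail:

```c
#include <stddef.h>

/*
 * Hypothetical userland sketch: report how many bytes of `path` form
 * the next component, i.e. the length up to (not including) the first
 * '/' or the terminating NUL.
 */
static size_t
parsepath_sketch(const char *path)
{
	size_t len = 0;

	while (path[len] != '\0' && path[len] != '/')
		len++;
	return len;
}
```

For the remaining path "usr/bin" this reports 3, so the caller would look up "usr" next. A file system with different naming rules could, in principle, return a different length here.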
 1.227  29-Jun-2021  dholland Adjust namei internals to be able to make an external call to parse
the pathname. (Basically, this means change the signature of
namei_getcomponent(), and thus lookup_parsepath(), to pass in the
directory vnode and to allow failures.)
 1.226  16-Jun-2021  dholland Add a new namei flag NONEXCLHACK for open with O_CREAT and not O_EXCL.

This case needs to be distinguished from the other CREATE operations
because it is supposed to successfully return (and open) the target if
it exists. In the case where that target is the root, or a mount
point, such that there's no parent dir, "real" CREATE operations fail,
but O_CREAT without O_EXCL needs to succeed.

So (a) add the flag, (b) test for it in namei in the situation
described above, (c) set it in open under the appropriate
circumstances, and (d) because this can result in namei returning
ni_dvp of NULL, cope with that case.

Should get into -9 and maybe even -8, because it was prompted by
issues with 3rd-party code. The use of a flag (vs. adding an
additional nameiop, which would be more appropriate) was deliberate to
make the patch small and noninvasive.
 1.225  29-Dec-2020  chs branches: 1.225.4;
Honor LOCKPARENT for ".." of the root directory.

Reported-by: syzbot+f40b9f241b818fd12198@syzkaller.appspotmail.com
 1.224  15-Jun-2020  ad branches: 1.224.2;
lookup_fastforward():

- If the root vnode of a mount is being reclaimed concurrent to a lookup,
it's possible to become confounded and bail out of the loop with both
foundobj=NULL and searchdir=NULL (causing a NULL pointer deref). If that
happens everything should be rolled back to the start for retry. Problem
found and debugged by hannken@.

- If the terminal node was !VDIR then searchdir was needlessly referenced.
No functional impact.
 1.223  04-Jun-2020  riastradh Nix trailing whitespace. NFCI.
 1.222  30-May-2020  ad Fix merge error - adjust assertions.
 1.221  30-May-2020  ad A couple of small changes to lookup that cut 5-10% system time from
"build.sh release" on my test system:

- Crossing mount points during lookup is slow because the set up for, and
act of doing VFS_ROOT() is quite involved. Use the name cache to help
with this. Cache an "impossible" zero-length name with covered vnodes,
that points to the root of the file system mounted there. Use it to cross
mounts. When cache_purge() is called on either of the vnodes involved the
cache entry will disappear. All of the needed calls for that are already
in place (vnode reclaim, unmount, etc).

- In lookup_fastforward(), if the last component has been found and the
parent directory (searchdir) is not going to be returned, then don't get a
reference to it.
 1.220  26-May-2020  ad Make vcache_tryvget() lockless. Reviewed by hannken@.
 1.219  22-Apr-2020  ad lookup_fastforward(): handle dotdot lookups and give up less often in
the union mount case.
 1.218  21-Apr-2020  ad Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.
 1.217  07-Apr-2020  ad branches: 1.217.2;
lookup_fastforward(): failure to vget foundobj vnode also translates into
EOPNOTSUPP; VOP_LOOKUP() should retry it.
 1.216  07-Apr-2020  ad PR kern/55146 (100+ file system test cases failing)

- namei_oneroot(): key on negative return from lookup_fastforward()
(EOPNOTSUPP), not positive.

- lookup_crossmount(): don't lose track of founddir.

From hannken@, with a couple of tweaks.
 1.215  04-Apr-2020  ad Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.214  23-Feb-2020  ad Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.
 1.213  17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.212  18-Jul-2019  hannken branches: 1.212.2; 1.212.4;
Make namei() work with no root dir yet.

From David Holland with minor tweaks from me.

Should fix PR kern/54378 (panic with TLB miss when attempting to reboot)
 1.211  06-Jul-2019  maxv Fix (harmless) uninitialized variable. In the path

namei_tryemulroot -> namei_oneroot -> namei_start

There was a branch where 'ndp->ni_erootdir' was not initialized.
 1.210  17-Mar-2019  hannken With TRYEMULROOT namei_getstartdir() gets used twice so have to
vrele() "ni_rootdir" and "ni_erootdir" on entry.
 1.209  12-Mar-2019  hannken Take a reference on ndp->ni_rootdir and ndp->ni_erootdir.

A multithreaded process may chroot during namei() and we end up with
vn_under() trying to reference the now unreferenced ni_rootdir.

Ok: David Holland <dholland@netbsd.org>

Reported-by: syzbot+889319cdf91a3d0373a9@syzkaller.appspotmail.com
 1.208  09-Jul-2017  dholland branches: 1.208.6;
Fix vnode leak on error, introduced by the openat family changes in -r1.200.
From mjg@freebsd.
 1.207  01-Jun-2017  chs branches: 1.207.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.206  17-Apr-2017  hannken Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.205  22-Apr-2016  riastradh branches: 1.205.2; 1.205.4;
#if DIAGNOSTIC panic ---> KASSERTMSG
 1.204  12-Apr-2016  dholland Fix (finally) the rest of PR 47040.

Revert the supporting logic in -r1.190 of vfs_lookup.c, and fix the
important change to set searchdir = NULL instead of searchdir =
foundobj. Then supply the necessary new supporting logic to cope with
some new cases where searchdir can be null.

This is at the point when lookup_once crosses a mountpoint going down;
the idea was to avoid coupling locks across filesystems as that has a
number of potentially negative consequences. At this stage of namei,
though, it's important to set searchdir to null as this is what is
used later on to handle other cases arising from crossing mount
points. If you set it to be the same as foundobj, that instead creates
the impression that you looked up "/." on the new volume, and that
causes odd things to happen in corner cases such as the one appearing
in PR 47040.

This fix ought to be pulled up to -6 and -7, and it probably could be
safely, but given the delicacy of this code and the fact that it's
taken me more than three years to find the combination of time and
intestinal fortitude to do it, as well as the minor nature of the
resulting wrong behavior observed so far, I think we'll let that part
go.

This change also exposes an annoying corner case: if you cross a mount
point and the root directory vnode of the new volume is not a
directory but a symlink, we now have no searchdir to follow the
symlink relative to. In principle one could hang onto the searchdir
from before calling lookup_once and use that, or complexify the
interface of lookup_once to hang onto it as desired for this case.
Alternatively one could add the necessary null checks to namei_follow
and allow only absolute symlinks in this case, as for an absolute
symlink one doesn't need the old searchdir. However, given that only
broken filesystems have symlinks as their root vnodes, I'm not going
to bother. Instead if this happens we'll just fail with ENOTDIR.
 1.203  24-Aug-2015  pooka to garnish, dust with _KERNEL_OPT
 1.202  21-Apr-2015  riastradh Cull unused INRENAME and INRELOOKUP from callers.
 1.201  07-Feb-2014  hannken branches: 1.201.4; 1.201.6; 1.201.8; 1.201.12;
Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.200  18-Nov-2012  manu branches: 1.200.2;
Add most system calls for POSIX extended API set, part 2, with test cases:
faccessat(2), fchmodat(2), fchownat(2), fstatat(2), mkdirat(2), mkfifoat(2),
mknodat(2), linkat(2), readlinkat(2), symlinkat(2), renameat(2), unlinkat(2),
utimensat(2), openat(2).

Also implement O_SEARCH for openat(2)

Still missing:
- some flags for openat(2)
- fexecve(2) implementation
 1.199  05-Nov-2012  para make DEBUG kernels buildable again (typo)
 1.198  05-Nov-2012  dholland Rename the new ni_startdir (the slot used to hold the starting point
for openat() and friends) to ni_atdir to avoid confusion with a
previously existing (and, alas, still documented) ni_startdir field
that meant something else entirely.
 1.197  05-Nov-2012  dholland Disentangle the namecache from the internals of namei.

- Move the namecache's hash computation to inside the namecache code,
instead of being spread out all over the place. Remove cn_hash from
struct componentname and delete all uses of it.

- It is no longer necessary (if it ever was) for cache_lookup and
cache_lookup_raw to clear MAKEENTRY from cnp->cn_flags for the cases
that cache_enter already checks for.

- Rearrange the interface of cache_lookup (and cache_lookup_raw) to
make it somewhat simpler, to exclude certain nonexistent error
conditions, and (most importantly) to make it not require write access
to cnp->cn_flags.

This change requires a kernel bump.
 1.196  13-Oct-2012  dholland Replace hack implementation of NDAT() for "nameiat" with a proper one.
(This change requires a kernel bump.)
 1.195  10-Oct-2012  dholland In layer_lookup(), clear *vpp before returning EROFS, as otherwise a
stale value can be returned and this causes a diagnostic panic in
namei.

In relookup(), clear *vpp before calling VOP_LOOKUP, as is done in
lookup_once(), as an additional precautionary measure.

(in theory both of these fixes are not required together)

Should fix PR 47040.
 1.194  08-Oct-2012  dholland Add namei-level support for openat() and friends. The way you do it is
by calling NDAT(&nd, dirvp) after NDINIT().

Right now the implementation is vile and unspeakable to avoid changing
the kernel ABI; this way we can get openat() and friends into 6.1. I
will rectify the mess and bump the kernel once things are working.
 1.193  08-Oct-2012  dholland Tidy up namei internals to allow openat() and friends without getting
tangled in nfsd's special cases.
 1.192  27-Sep-2011  christos branches: 1.192.2; 1.192.8; 1.192.12;
include <sys/dirent.h> to make MAXNAMLEN visible.
 1.191  27-Sep-2011  christos use KERNEL_NAME_MAX to enforce the same limit to names as before, and
make sure that MAXNAMLEN == NAME_MAX
 1.190  01-Sep-2011  yamt redo vfs_lookup.c rev.1.126.
when crossing a mount point, don't keep the parent vnode locked.
ie. don't lock a vnode while holding another vnode which belongs to a
different filesystem. otherwise we propagate slowness (or deadness) of a
filesystem to another via vnode lock chain.
 1.189  13-Aug-2011  riastradh Handle absolute symlinks to the root.

Fixes panic on `ln -s / foo && cd foo' found by ober by trying to run
wine.

ok dholland
 1.188  10-Aug-2011  dholland Revert previous, it breaks nullfs. (And I guess there are no tests for
nullfs?)
 1.187  09-Aug-2011  dholland Fail namei immediately if searchdir is unlinked / has been rmdir'd.
Do this by checking if v_size == 0. Should fix PR 44658 (and PR 32661).
 1.186  09-Aug-2011  dholland Include missing part of previous commit to this file. (sigh)
 1.185  09-Aug-2011  dholland Simplify handling of slashes. Provides a proper fix for PR 44961.
 1.184  16-May-2011  dholland Hack for PR 44961: restore the prior "logic" pertaining to looking up /
to prevent a crash when attempting rename("/", "foo"). This is not really
what I want going forward and it may cause e.g. rmdir("blah/") to fail, so
if it causes trouble for anyone back it out. The right fix is going to have
to wait until the qemu/tcp_vtw problems I ran into last night get sorted out.
 1.183  18-Apr-2011  dholland Simplify logic: at the bottom of the loop, instead of checking if we
should continue and if not breaking unconditionally, check if we
should break and if not use the bottom of the loop to continue to the
next iteration.
 1.182  18-Apr-2011  dholland Goto considered harmful: now the "goto alldone" can be dropped by
reversing the sense of the associated test and using the big block I
moved a couple versions back (and didn't reindent on purpose) as the
body of the if statement.

There are now no gotos in namei_oneroot, only normal loop logic.
 1.181  18-Apr-2011  dholland The "goto alldone" from a couple patches back (inside the loop) can
now be changed to a loop break and another null test and goto outside
the loop. In neither of the other two cases for exiting the loop can
foundobj be null.
 1.180  18-Apr-2011  dholland Goto considered harmful: "goto terminal" can now just be "break".
 1.179  18-Apr-2011  dholland Move the big chunk of code at "terminal:" outside the loop; since it
has an unconditional loop break at the end this can be done safely,
now that the other loop break has been patched out.

Add a spurious set of braces to preserve the indent for the moment.
 1.178  18-Apr-2011  dholland Goto still harmful, but use "goto alldone" in place of a loop break
for now anyway.
 1.177  18-Apr-2011  dholland Goto considered harmful; remove dirloop: in favor of using a loop
continue.

This runs the "are we mounted?" test on every directory instead of
only sometimes; however, it's not exactly an expensive test (null
pointer check) and the prior logic wasn't exactly sensible -- it
checked at the beginning and after following a symlink but, for some
reason, not after crossing a mount point.
 1.176  18-Apr-2011  dholland ISSYMLINK is now only referenced inside vfs_lookup.c, and not needed
there, so get rid of it.
 1.175  14-Apr-2011  yamt assertions
 1.174  11-Apr-2011  jakllsch Partially revert part of -r1.167; it was wrong. From dholland.
 1.173  11-Apr-2011  dholland description:
Update comments.
 1.172  11-Apr-2011  dholland Add comment warning about case with LOCKPARENT but not LOCKLEAF. bleh.
 1.171  11-Apr-2011  dholland description:
Remove dead assignment of "error" and simplify some uses of it.
 1.170  11-Apr-2011  dholland description:
Simplify refcount handling/cleanup in three places.
 1.169  11-Apr-2011  dholland description:
Don't assign inside an if-expression without an explicit comparison.
 1.168  11-Apr-2011  dholland description:
Assign NULL to ni_dvp immediately before error return, rather than
halfway through the logic.
 1.167  11-Apr-2011  dholland description:
namei_follow() randomly drops foundobj on success. Do that in the
caller instead. On the other hand, the caller was updating cn_nameptr,
and since that's closely related to the buffer manipulation in
namei_follow, do that there.
 1.166  11-Apr-2011  dholland description:
Update some comments.
 1.165  11-Apr-2011  dholland description:
Don't assign ni_vp until namei_oneroot() returns.
 1.164  11-Apr-2011  dholland description:
Make sure namei_oneroot leaves ni_dvp and ni_vp NULL on error.
 1.163  11-Apr-2011  dholland description:
Cosmetic: names of scratch vnodes.
 1.162  11-Apr-2011  dholland description:
Remove state->lookup_alldone. Don't need it any more; it's set
precisely when succeeding with a null result vnode and it now works to
just check for that case.

(also, when "error" is already 0 we don't need to assign another 0 to
it, even as a precaution.)
 1.161  11-Apr-2011  dholland description:
Pass foundobj to namei_follow() instead of fishing in the global state.
 1.160  11-Apr-2011  dholland description:
Fix lookup_for_nfsd_index() -- it wasn't locking the directory it was
searching. I'm not sure if this is something I introduced or if it's
just been wrong for ages; the code path is used only for serving
index.html in WebNFS and probably just ought to be removed.
 1.159  11-Apr-2011  dholland description:
Ensure we don't leak stale pointers out in ni_dvp or ni_vp on error return.
 1.158  11-Apr-2011  dholland description:
In the test where we check if searchdir is NULL and fail if we needed
to return ni_dvp, also check if searchdir is on a different volume
from foundobj. I believe the NULL test was meant to encompass this
situation, but it definitely doesn't in some cases related to
emulroots. This appears to be a bug, and I'm pretty sure it's not one
I introduced.

(The search directory and result are on different volumes if we
crossed a mount point.)
 1.157  11-Apr-2011  dholland description:
Don't assign ni_dvp until the end of namei_oneroot().
 1.156  11-Apr-2011  dholland description:
Improve previous by manipulating ni_dvp more intelligently.
 1.155  11-Apr-2011  dholland description:
Don't bother conditionally doing vput(ndp->ni_dvp) where it's always null.
(and don't bother testing for null where it never is)
 1.154  11-Apr-2011  dholland description:
In lookup_once(), assign newsearchdir_ret when searchdir is updated,
instead of upon return.
 1.153  11-Apr-2011  dholland description:
vref new vnodes before vrele'ing old vnodes, just in case.
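
The ordering in 1.153 matters when the old and new object are the same and the slot holds the only reference: releasing first would drop the count to zero before the new reference is taken. A minimal sketch with a hypothetical refcounted `struct obj` standing in for a vnode (`obj_ref()`, `obj_rele()` and `slot_set()` are illustrative names, not kernel functions):

```c
#include <stddef.h>

/* Hypothetical refcounted object standing in for a vnode. */
struct obj {
	int refs;
	int freed;	/* set when the last reference is dropped */
};

static void
obj_ref(struct obj *o)
{
	o->refs++;
}

static void
obj_rele(struct obj *o)
{
	if (--o->refs == 0)
		o->freed = 1;
}

/*
 * Replace *slot with newobj. Taking the new reference first keeps the
 * object alive even when *slot == newobj and the slot holds the only
 * reference.
 */
static void
slot_set(struct obj **slot, struct obj *newobj)
{
	struct obj *old = *slot;

	obj_ref(newobj);
	*slot = newobj;
	if (old != NULL)
		obj_rele(old);
}
```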
 1.152  11-Apr-2011  dholland description:
state->namei_startdir has no further reason to exist.
 1.151  11-Apr-2011  dholland description:
namei_end() doesn't really do anything useful at this point, so get
rid of it.
 1.150  11-Apr-2011  dholland description:
As ndp->ni_dvp is also assigned to the updated search dir on every
return from lookup_once(), pass it back instead and update ni_dvp in
the caller.
 1.149  11-Apr-2011  dholland description:
lookup_once() on success always sets ni_vp to the same thing as the
returned foundobj, so do that in the caller instead.
 1.148  11-Apr-2011  dholland description:
In lookup_once(), move the assignments to ni_dvp and ni_vp to just
before function return.
 1.147  11-Apr-2011  dholland Use locals/args instead of state->dp in namei_once(). Remove
state->dp.
 1.146  11-Apr-2011  dholland Split the variable that replaced state->dp into two, to reflect its
actual usage.
 1.145  11-Apr-2011  dholland In namei_oneroot(), use a local in place of state->dp.
state->dp is now only used by/in lookup_once().
 1.144  11-Apr-2011  dholland Improve namei_atsymlink to take the found object as an argument
instead of fetching it from the global state.
 1.143  11-Apr-2011  dholland Move unrelated error handling logic out of namei_parsepath.
 1.142  11-Apr-2011  dholland Move assignment of search directory from ni_dvp outside namei_follow.
 1.141  11-Apr-2011  dholland Improve namei_follow to handle the search dir as an argument instead
of in the global state.
 1.140  11-Apr-2011  dholland Improve namei_start to pass back its result instead of updating the
global state.
 1.139  11-Apr-2011  dholland Simplify.
 1.138  11-Apr-2011  dholland Fold do_lookup into namei.
 1.137  11-Apr-2011  dholland Split TRYEMULROOT handling into its own function.
 1.136  11-Apr-2011  dholland Cut and paste and simplify code used by the other nfsd entry point, so
it won't get in the way.
 1.135  11-Apr-2011  dholland Merge nfsd's cut&paste copy of namei with the master one.
 1.134  11-Apr-2011  dholland More cleanup.
 1.133  11-Apr-2011  dholland Clean up. Move some more code across from nfsd's private entry points.
 1.132  22-Mar-2011  pooka pnbuf_cache is used all over the place outside of vfs, so put it
in one place to avoid many definitions.
 1.131  04-Jan-2011  dholland branches: 1.131.2;
Tsort functions and remove a small #if 0 block leftover from earlier cleanup.
No functional change.
 1.130  02-Jan-2011  dholland Remove the special refcount behavior (adding an extra reference to the
parent dir) associated with SAVESTART in relookup().

Check all call sites to make sure that SAVESTART wasn't set while
calling relookup(); if it was, adjust the refcount behavior. Remove
related references to SAVESTART.

The only code that was reaching the extra ref was msdosfs_rename,
where the refcount behavior was already fairly broken and/or gross;
repair it.

Add a dummy 4th argument to relookup to make sure code that hasn't
been inspected won't compile. (This will go away next time the
relookup semantics change, which they will.)
 1.129  02-Jan-2011  dholland Add an INRELOOKUP namei flag. Sigh. (We don't need more namei flags.)

However, because of a protocol deficiency puffs relies on being able
to keep track of VOP_LOOKUP calls by inspecting their contents, and
this at least allows it to use something vaguely principled instead of
making wild guesses based on whether SAVESTART is set.

Update libp2k to use INRELOOKUP instead of SAVESTART.
 1.128  02-Jan-2011  dholland Remove unused nameidata field ni_startdir.
 1.127  20-Dec-2010  yamt revert vfs_lookup.c rev.1.126 for now because some problems are reported
on source-changes-d@ (thanks pooka) and i don't think i can take a look at
them in a timely manner.
 1.126  17-Dec-2010  yamt - lookup_once: when crossing a mount point, don't keep the parent vnode locked.
ie. don't lock a vnode while holding another vnode which belongs to a
different filesystem. otherwise we propagate slowness (or deadness) of a
filesystem to another via vnode lock chain.
- lookup_parsepath: don't alter vnode states. let the caller do it instead.
- add comments and assertions.
 1.125  30-Nov-2010  dholland Abolish the SAVENAME and HASBUF flags. There is now always a buffer,
so the path in a struct componentname is now always valid during VOP
calls.
 1.124  30-Nov-2010  dholland Abolish struct componentname's cn_pnbuf. Use the path buffer in the
pathbuf object passed to namei as work space instead. (For now a pnbuf
pointer appears in struct nameidata, to support certain unclean things
that haven't been fixed yet, but it will be going away in the future.)

This removes the need for the SAVENAME and HASBUF namei flags.
 1.123  19-Nov-2010  dholland Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.
 1.122  24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.121  08-Jan-2010  pooka branches: 1.121.2; 1.121.4;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has creeped
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.120  27-Sep-2009  dholland Move a big wodge of symlink-following code from nfsd to inside
lookup_for_nfsd(). This code is, or at least should be, the same as
the regular symlink-following code plus an extra flag nfsd needs.

The two lots of code can/will be merged in the future.
 1.119  27-Sep-2009  dholland Rename lookup() to lookup_for_nfsd(), to make it clear just whose
private backdoor entry point this is.

Also, clone the lookup_for_nfsd() entry point as
lookup_for_nfsd_index(), for use by a different call site in nfsd that
does different unclean things with nameidata.
 1.118  09-Aug-2009  dholland Begin splitting lookup() into more tractable pieces too.
 1.117  09-Aug-2009  dholland Begin splitting up namei into smaller pieces.
 1.116  29-Jun-2009  dholland Add namei_simple_kernel and namei_simple_user. These provide the common
case functionality of namei in a simple package with only a couple flags.

A substantial majority of the namei call sites in the kernel can use
this interface; this will isolate those areas from the changes arising
as the internals of namei are fumigated.
 1.115  26-Jun-2009  christos magic symlink cleanup:
- use size_t for len
- don't call strlen multiple times in macro
- add gid
- off by one in bounds calculation
 1.114  04-May-2009  yamt when freeing cn_pnbuf, make it NULL if DIAGNOSTIC.
 1.113  11-Feb-2009  enami Make module (auto)loading under chroot environment actually work:
- NOCHROOT flag must be assigned to different bit from TRYEMULROOT
since the code expected to be executed is in the else clause of
if (flags & TRYEMULROOT).
- Necessary variables aren't set.
 1.112  17-Jan-2009  yamt branches: 1.112.2;
malloc -> kmem_alloc.
 1.111  14-Nov-2008  ad Add a NOCHROOT flag for namei(). Looks outside any chroot and performs the
lookup from the root directory if given an absolute path.
 1.110  20-Aug-2008  pooka branches: 1.110.2; 1.110.4;
Remove my development ifdefs. (hi simon!)
 1.109  31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.108  06-May-2008  ad branches: 1.108.4; 1.108.6;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.107  06-May-2008  ad lookup: Do a vfs_trybusy(). If the file system is being unmounted, then
just fail the operation.
 1.106  30-Apr-2008  ad PR kern/38135 vfs_busy/vfs_trybusy confusion

The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
 1.105  29-Apr-2008  ad kern/38135 vfs_busy/vfs_trybusy confusion

The symptom was that file systems would occasionally not appear
in output from 'df' or 'mount' if the system was busy. Resolution:

- Make mount locks work somewhat like vm_map locks.
- vfs_trybusy() now only fails if the mount is gone, or if someone is
unmounting the file system. Simple contention on mnt_lock doesn't
cause it to fail.
- vfs_busy() will wait even if the file system is being unmounted.
 1.104  30-Jan-2008  ad branches: 1.104.6; 1.104.8; 1.104.10;
PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
 1.103  31-Dec-2007  ad Remove systrace. Ok core@.
 1.102  08-Dec-2007  pooka branches: 1.102.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.101  04-Dec-2007  mjf Implement a new magic string for magic symlinks, @ruid, which expands to the
real user id of the process and use this magic string for per-user tmp.
This should fix PR/35687

Kernel parts reviewed by wrstuden@
 1.100  26-Nov-2007  pooka branches: 1.100.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.99  07-Nov-2007  ad Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.98  10-Oct-2007  ad branches: 1.98.2; 1.98.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.97  15-Aug-2007  ad branches: 1.97.2; 1.97.4;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.
 1.96  12-Aug-2007  pooka Revert code part of rev 1.95, yamt pointed out it changes NFS semantics.
 1.95  12-Aug-2007  pooka CREATE is a write operation in my book, so check for that also when
checking for a readonly lookup. This shouldn't make a difference
now, though, as the only RDONLY lookup is done by getcwd(), and
that a) doesn't create files b) calls LOOKUP directly anyway.

Also, fix comment I managed to miss in the previous commit (I didn't
expect the same comment to be there twice).
 1.94  12-Aug-2007  pooka cn_flags RDONLY brilliantly has nothing to do with the file system
itself being r/o, so fix a couple of misguided comments.
 1.93  09-Jul-2007  ad branches: 1.93.2; 1.93.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.92  19-May-2007  christos - remove pathname_ interface.
- use macros to deal with pathnames in userspace, when veriexec is used.
- reorder the veriexec_ call arguments for consistency.
With help from elad@ finding the last bug.
 1.91  26-Apr-2007  dsl Since ktrace/systrace can sleep, move the VREF(dp) to before them.
 1.90  26-Apr-2007  dsl Be a little less over-zealous about converting ".." at the emulation root
to the real root. Rather than do the check inside lookup() - where it
applies to every ".." in a pathname, explicitly check the start of
the caller-supplied buffers and any absolute symbolic links.
Note that in the latter case the re-search from the real root is suppressed.
Should fix PR kern/36225
 1.89  26-Apr-2007  dsl Pass the emulation root string into namei() from emul_find_interp() so that
the ktrace entries for lookups done during exec can have the full filename.
This is rather a hack :-)
 1.88  26-Apr-2007  dsl Move the ktrace (and systrace) in namei() inside the retry loop for
emulation lookups.
If doing a lookup relative to the emulation root, prepend the emulation root
to the traced filename.
While here pass the filename length through to the ktrace code since namei()
knows the length and ktr_namei() would have to call strlen().
Note that if namei() is being called during execve processing, the emulation
root name isn't available and "/emul/???" is used. Also namei() has to use
strlen() to get the length of the emulation root - even though it is a
compile-time constant string.
 1.87  25-Apr-2007  dsl Move the place where we convert the return value of emulation lookups that
would return the emulation-root to the real root to the main exit path.
Means that lookups of both "/" and "/." get converted from "/emul/xxx" to "/".
 1.86  23-Apr-2007  dsl When we return the real root instead of the emulated root, we may
not have the parent vnode for the emulated root - so don't vput() it.
May fix PR kern/36197.
 1.85  22-Apr-2007  dsl Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.
 1.84  22-Feb-2007  thorpej branches: 1.84.4; 1.84.6;
TRUE -> true, FALSE -> false
 1.83  21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.82  17-Feb-2007  pavel Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.
 1.81  04-Feb-2007  chs branches: 1.81.2;
more fixes for the new vnode locking scheme:
- don't use SAVESTART in calls to relookup() from unionfs,
just vref() the desired vnode when we need to.
- fix locking and refcounting in the unionfs EEXIST error cases.
- release any vnode locks before calling VFS_ROOT(), vfs_busy() is enough.
this allows us to simplify union_root() and fix PR 3006.
- union_lock() doesn't handle shared lock requests correctly,
so convert them to exclusive instead. fixes PR 34775.
- in relookup(), avoid reusing "dp" for different purposes,
the error handling wasn't right. (actually just get rid of dp.)
also, change relookup() to ignore LOCKLEAF and always return the
vnode locked since the callers already expect this.
 1.80  31-Jan-2007  elad PR/35524: Brian de Alwis: panic from free in pathname_get

Patch applied, thanks for the report!
 1.79  07-Jan-2007  pooka update some comments for vnode locking smoergasbord change

amazing -- the description of VOP_LOOKUP is suddenly human-readable
 1.78  07-Jan-2007  pooka Restore name caching behaviour accidentally removed in rev 1.73, using
variation suggested by yamt on tech-kern.

XXX: The exception is that this doesn't any longer prevent caching
of RENAME, which was implied in a weird weird way previously. But
that's handled by the callers currently.
 1.77  27-Dec-2006  chs fix two more problems in the recent changes to lookup():
- don't hold the parent directory vnode locked while traversing mount points.
the fs that's mounted might be an NFS served by a userland process
like the automounter, which might need to traverse the parent directory
in order to complete the lookup.
- in the ENAMETOOLONG case fixed in rev. 1.75, set ni_dvp to dp
since we've logically moved on to using "dp" as the parent.
the caller will then handle vput()ing it as normal.
this fixes PR 35279.
 1.76  24-Dec-2006  elad PR/35278: YAMAMOTO Takashi: veriexec sometimes feeds user va to log(9)

Introduce the (intentionally undocumented) pathname_get(), pathname_path(),
and pathname_put(), to deal with allocating and copying of pathnames from
either kernel- or user-space.
 1.75  13-Dec-2006  yamt lookup: add more missing vput().
 1.74  13-Dec-2006  chs in lookup(), vput() the starting vnode in the case where
we return with both ni_dvp and ni_vp being NULL.
 1.73  09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.72  04-Nov-2006  elad branches: 1.72.2;
Add "@uid" keyword translation, to translate effective user-id of the
process.
 1.71  23-Jul-2006  ad branches: 1.71.4; 1.71.6;
Use the LWP cached credentials where sane.
 1.70  14-May-2006  elad integrate kauth.
 1.69  03-Mar-2006  rumble branches: 1.69.2; 1.69.4; 1.69.6;
Update namei(9) comments and man page to indicate that we operate on
vnodes, not inodes.
 1.68  01-Mar-2006  yamt merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.67  12-Feb-2006  chs convert "magiclinks" from a per-fs mount option to a system-wide sysctl.
as discussed on tech-kern quite some time ago.
 1.66  04-Feb-2006  yamt for some random places, use PNBUF_GET/PUT rather than
- on-stack buffer
- malloc(MAXPATHLEN)
 1.65  27-Dec-2005  chs branches: 1.65.2; 1.65.4; 1.65.6;
change errors returned for various operations on "/" to conform to SUSv3.
as discussed on tech-kern some time back.
 1.64  11-Dec-2005  christos merge ktrace-lwp.
 1.63  06-Jul-2005  thorpej A few tweaks to magic symlinks:
- Add a @{var} syntax in addition to @var. This allows for patterns like
@{ostype}-@{osrelease}-@{machine_arch}.
- Add a @emul variable that expands to the process's emulation name
(e.g. "netbsd", "netbsd32", "linux", etc.)
 1.62  23-Jun-2005  thorpej branches: 1.62.2;
Remove the last references to M_NAMEI; everything should be using PNBUF_*()
now (for a long time now). Remove M_NAMEI, and bump the kernel version to
3.99.7 to reflect its removal.
 1.61  23-Jun-2005  thorpej Implement expansion of special "magic" strings in symlinks into
system-specific values. Submitted by Chris Demetriou in Nov 1995 (!)
in PR kern/1781, modified only slightly by me.

This is enabled on a per-mount basis with the MNT_MAGICLINKS mount
flag. It can be enabled at mountroot() time by building the kernel
with the ROOTFS_MAGICLINKS option.

The following magic strings are supported by the implementation:

@machine        value of MACHINE for the system
@machine_arch   value of MACHINE_ARCH for the system
@hostname       the system host name, as set with sethostname()
@domainname     the system domain name, as set with setdomainname()
@kernel_ident   the kernel config file name
@osrelease      the release number of the OS
@ostype         the name of the OS (always "NetBSD" for NetBSD)

Example usage:

mkdir /arch/i386/bin
mkdir /arch/sparc/bin
ln -s /arch/@machine_arch/bin /bin
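The substitution described in rev 1.61 can be pictured with a small userland sketch. This is not the kernel code: the function name expand_magic(), the struct magic_var table, and the rule that a magic name must be followed by '/' or the end of the string are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value pair; the kernel derives values itself. */
struct magic_var {
	const char *name;
	const char *value;
};

/*
 * Copy "in" to "out", replacing each "@name" that matches the table.
 * Assumption: a name only matches when followed by '/' or end of string.
 * Returns 0 on success, -1 if the result does not fit in "out".
 */
static int
expand_magic(const char *in, const struct magic_var *vars, size_t nvars,
    char *out, size_t outlen)
{
	size_t o = 0;

	assert(in != NULL && out != NULL && outlen > 0);
	while (*in != '\0') {
		const char *piece = in;	/* default: copy one literal byte */
		size_t piecelen = 1;
		size_t skip = 1;

		if (*in == '@') {
			for (size_t i = 0; i < nvars; i++) {
				size_t n = strlen(vars[i].name);

				if (strncmp(in + 1, vars[i].name, n) == 0 &&
				    (in[1 + n] == '/' || in[1 + n] == '\0')) {
					piece = vars[i].value;
					piecelen = strlen(piece);
					skip = 1 + n;
					break;
				}
			}
		}
		/* Leave room for the terminating NUL. */
		if (o + piecelen >= outlen)
			return -1;
		memcpy(out + o, piece, piecelen);
		o += piecelen;
		in += skip;
	}
	out[o] = '\0';
	return 0;
}
```

With a table mapping machine_arch to a value such as "x86_64", the link target /arch/@machine_arch/bin would come out as /arch/x86_64/bin, while an unrecognised @name is copied through unchanged.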
 1.60  05-Jun-2005  thorpej Use ANSI function decls.
 1.59  29-May-2005  christos - add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.
 1.58  08-May-2005  christos Panic strings should not end with \n.
 1.57  08-Mar-2005  wrstuden branches: 1.57.2;
Adjust error case handling. If the VOP_LOOKUP() call unlocked the
parent directory node, remember that.

Addresses locking/lookup issues seen in:

http://mail-index.NetBSD.org/tech-kern/2004/06/20/0019.html
http://mail-index.netbsd.org/tech-kern/2005/01/08/0000.html
 1.56  26-Feb-2005  perry nuke trailing whitespace
 1.55  17-Sep-2004  skrll branches: 1.55.4; 1.55.6;
There's no need to pass a proc value when using UIO_SYSSPACE with
vn_rdwr(9) and uiomove(9).

OK'd by Jason Thorpe
 1.54  08-Dec-2003  hannken branches: 1.54.2; 1.54.4;
Fix the last commit(s). On machines with sizeof(long) != sizeof(int)
the hash compare would fail.
 1.53  06-Dec-2003  yamt fix a debug code to follow recent change about tailing slashes.
 1.52  06-Dec-2003  yamt - turn non-verbose parts of NAMEI_DIAGNOSTIC into normal DEBUG.
- comments on #endif.
 1.51  11-Sep-2003  christos PR/15397: Jason Thorpe: directory operations on pathnames that refer to
directories and have trailing slashes should succeed. Ok'd by kjk.
Fix provided by enami.
 1.50  25-Aug-2003  cb fix a race condition between path resolution in userland
and the subsequent namei(): inform the kernel portion of
valid filenames and then disallow symlink lookups for
those filenames by means of a hook in namei().
with suggestions from provos@

also, add (currently unused) seqnr field to struct
systrace_replace, from provos@
 1.49  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.48  29-Jun-2003  fvdl branches: 1.48.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.47  29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.46  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.45  10-Apr-2003  erh Make sure the directory is still mounted before looking things up in it.
Fixes PR kern/5683.
 1.44  01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.43  20-Jan-2003  christos name the component that should be leaf in the diagnostic.
 1.42  22-Oct-2002  simonb We go to a lot of effort to choose a suitable value for "docache" in
relookup() ... then ignore it! Remove it.
 1.41  02-Aug-2002  soren Make NAMEI_DIAGNOSTIC compile.
 1.40  21-Jun-2002  wrstuden If we're in a chroot, and we are looking up '..', make sure we are
still in the chroot. If not, teleport the lookup to the chroot
and log. Closes an assisted-jail escape method pointed out by
xs@kittenz.org. Patch from xs@kittenz.org and myself
 1.39  08-Dec-2001  lukem branches: 1.39.8; 1.39.10;
- Implement
uint32_t namei_hash(const char *p, const char **ep)
which determines the equivalent MI hash32_str() hash for p.
If *ep != NULL, calculate the hash to the character before ep.
If *ep == NULL, calculate the hash to the first / or NUL found, and
point *ep to that location.
- Use namei_hash() to calculate cn_hash in lookup() and relookup().
Hash distribution goes from 35-40% to 55-70%, with similar profiled
time spent in cache_lookup() and cache_enter() on my P3-600.
- Use namei_hash() to calculate cn_hash in nfs_readdirplusrpc(),
instead of homegrown code (that differed from that in lookup() !)
namei_hash() has better spread and is faster than previous code
(which used a non-constant multiplication).
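The two modes of the ep argument described in rev 1.39 can be sketched as follows. This is an illustration of the calling convention only: the name namei_hash_sketch() and the initial value and multiplier are assumptions, and the kernel's hash32_str() may use different constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants for illustration; not taken from the kernel. */
#define SKETCH_HASH_INIT 5381u
#define SKETCH_HASH_MULT 33u

static uint32_t
namei_hash_sketch(const char *p, const char **ep)
{
	uint32_t hash = SKETCH_HASH_INIT;

	assert(p != NULL && ep != NULL);
	if (*ep != NULL) {
		/* Caller supplied the end: hash the bytes in [p, *ep). */
		for (; p < *ep; p++)
			hash = hash * SKETCH_HASH_MULT + (unsigned char)*p;
	} else {
		/* Stop at the first '/' or NUL and report where we stopped. */
		for (; *p != '\0' && *p != '/'; p++)
			hash = hash * SKETCH_HASH_MULT + (unsigned char)*p;
		*ep = p;
	}
	return hash;
}
```

Under this sketch, a caller that already knows the component length (as nfs_readdirplusrpc() presumably does) passes the end pointer in, while a caller walking a path passes NULL and uses the returned end pointer to find the next component. Both modes give the same hash for the same component.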
 1.38  12-Nov-2001  lukem add RCSIDs
 1.37  17-Oct-2001  thorpej branches: 1.37.2;
Use a pool cache for namei buffers -- it's faster to allocate from
a pool cache than a pool.
 1.36  08-Sep-2001  christos Set the credentials to be used in the NDINIT macro so that syscalls can
hijack them.
 1.35  03-Aug-2000  thorpej branches: 1.35.2; 1.35.4; 1.35.6;
Convert namei pathname buffer allocation to use the pool allocator.
 1.34  27-May-2000  sommerfeld branches: 1.34.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()
 1.33  30-Mar-2000  augustss Get rid of register declarations.
 1.32  03-Aug-1999  wrstuden branches: 1.32.2;
Modify how lookup walks up mount points. As suggested by Konrad
Schroder <perseant@hitl.washington.edu>, unlock the mounted on
vnode before we call VFS_ROOT so that we cover the case where the new
root vnode shares a lock with the mounted-on vnode. Note that we have
asserted vfs_busy on the new fs before unlocking, so no other process can
steal the mount out from under us.
 1.31  08-Jul-1999  wrstuden Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
 1.30  30-Apr-1999  thorpej Break cdir/rdir/cmask info out of struct filedesc, and put it in a new
substructure, `cwdinfo'. Implement optional sharing of this substructure.

This is required for clone(2).
 1.29  07-Apr-1999  wrstuden Fix obscure bug in namei(), which was the cause of PR 7306.

The problem is that if "sl" is a symbolic link, a lookup on "sl/"
will be flagged as the last component. Thus VOP_LOOKUP will lock
the parent directory if LOCKPARENT is set. In order for the symbolic
link to be resolved, this lock needs to be released. namei() would
test for this by checking if ni_pathlen == 1, which it wouldn't as
"/" is left in the name, and namei() would not unlock the parent.
The next call to lookup() to resolve the symbolic link would fail
as the parent was still locked.
 1.28  04-Aug-1998  perry branches: 1.28.6;
Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
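The argument order is the part of this conversion that is easy to get wrong: bcopy() and ovbcopy() take the source first, while memcpy() and memmove() take the destination first. A minimal userland sketch of the converted calls follows; the helper names copy_name(), shift_name() and clear_name() are hypothetical and exist only to show the mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Was: bcopy(src, dst, len). Regions must not overlap. */
static void
copy_name(const char *src, char *dst, size_t len)
{
	assert(src != NULL && dst != NULL);
	memcpy(dst, src, len);		/* destination comes first */
}

/* Was: ovbcopy(buf + from, buf + to, len). Regions may overlap. */
static void
shift_name(char *buf, size_t from, size_t to, size_t len)
{
	assert(buf != NULL);
	memmove(buf + to, buf + from, len);
}

/* Was: bzero(buf, len). */
static void
clear_name(char *buf, size_t len)
{
	assert(buf != NULL);
	memset(buf, 0, len);
}
```

bcmp() is the only one of the four whose arguments keep their order, since the comparison is symmetric.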
 1.27  25-Jun-1998  thorpej defopt KTRACE
 1.26  01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.25  30-Oct-1997  enami Conditionalize the recognition of symbolic link permission by
per fs mount option `symperm'.
 1.24  11-Oct-1997  enami branches: 1.24.2;
Check exec bit of symbolic link when traversing path and do it in vfs layer.
Suggested by der Mouse. Ok'ed by Jason R. Thorpe.
 1.23  08-May-1997  mycroft branches: 1.23.4;
Snapshot of namei() cleanup:
1) Eliminate all of the null component name special cases; handle runs of
slashes and leading and trailing slashes completely differently.
2) Return ENOENT when doing a lookup through an empty symlink.
3) Enforce that we're doing a lookup through a directory in chdir() and
lookup() rather than in foo_lookup().

Not yet finished.
 1.22  08-Apr-1997  kleink Added a sanity check to the force-directory routine, as the CREATE and
RENAME namei() operations may succeed without returning a vnode.
 1.21  08-Apr-1997  kleink POSIX.1 changes to namei():

(1) "" no longer refers to the current working directory;
looking this up will now result in ENOENT.

(2) by stripping off trailing slashes and setting a `forcedir'
flag, make code such as { mkdir("dir", m); rmdir("dir/"); }
actually work.
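The trailing-slash handling in (2) can be sketched in userland as below. The helper name strip_trailing_slashes() and its interface are hypothetical; only the idea of dropping trailing slashes and remembering that fact in a forcedir flag comes from the commit message. Keeping a lone "/" intact is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the length of "path" with trailing slashes removed and set
 * *forcedir if any were stripped, so the caller can later insist that
 * the final component is a directory.
 */
static size_t
strip_trailing_slashes(const char *path, bool *forcedir)
{
	size_t len;

	assert(path != NULL && forcedir != NULL);
	len = strlen(path);
	*forcedir = false;
	/* Keep at least one character so "/" still names the root. */
	while (len > 1 && path[len - 1] == '/') {
		len--;
		*forcedir = true;
	}
	return len;
}
```

For "dir/" this yields length 3 with forcedir set, so rmdir("dir/") looks up the same name as rmdir("dir"). An empty path yields length 0, which per (1) would be rejected with ENOENT by the caller.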
 1.20  25-Oct-1996  cgd make the namei struct members ni_dirp and ni_next, and the componentname
struct member cn_nameptr 'const', since they should never be used to
modify the path name. (Only the pathname buffer, cn_pnbuf, should be
modified.) Propagate the const poisoning to code that uses the namei
and componentname structs.
 1.19  13-Oct-1996  christos backout previous kprintf change
 1.18  10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.17  09-Feb-1996  christos More proto fixes
 1.16  04-Feb-1996  christos First pass at prototyping
 1.15  08-Mar-1995  cgd needs systm.h
 1.14  14-Dec-1994  mycroft Sync with CSRG.
 1.13  29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.12  08-Jun-1994  mycroft Update to 4.4-Lite fs code.
 1.11  18-May-1994  cgd mostly-machine-independent switch, and changes to match. also, hack init_main
 1.10  17-May-1994  cgd copyright foo
 1.9  05-Jan-1994  cgd minor cleanup; extra spaces, patchkit info, etc.
 1.8  04-Jan-1994  cgd add support for union and loopback mounts, from jsp
 1.7  18-Dec-1993  mycroft Canonicalize all #includes.
 1.6  20-Nov-1993  cgd do something better with lookup return values; suggested by BSDI's msdosfs mod
 1.5  07-Sep-1993  ws branches: 1.5.2;
Changes to VFS readdir semantics
NFS changes for better cookie support
ISOFS changes for better Rockridge support and support for generation numbers
 1.4  01-Aug-1993  mycroft Add RCS identifiers (this time on the correct side of the branch), and
incorporate recent changes in netbsd-0-9 branch.
 1.3  20-May-1993  cgd branches: 1.3.2;
add $Id$ strings, and clean up file headers where necessary
 1.2  21-Mar-1993  cgd after 0.2.2 "stable" patches applied
 1.1  21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1  21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.2.1  31-Jul-1993  cgd give names, err, wmesg's, to my "pain" -- i.e. convert sleep() to tsleep()
 1.5.2.2  20-Nov-1993  cgd do something better with lookup return values; suggested by BSDI's msdosfs mod
 1.5.2.1  14-Nov-1993  mycroft Canonicalize all #includes.
 1.23.4.1  14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.24.2.1  31-Oct-1997  mellon Pull rev 1.25 up from trunk (enami)
 1.28.6.1  07-Apr-1999  wrstuden branches: 1.28.6.1.2;
Pull up revision 1.28 -> 1.29, approved by Perry.

Here's the revised commit message from 1.29:

Fix obscure bug in namei(), which was the cause of PR 7306.

The problem is that if "sl" is a symbolic link, a lookup on "sl/"
will be flagged as the last component. Thus VOP_LOOKUP will lock
the parent directory if LOCKPARENT is set. In order for the symbolic
link to be resolved, this lock needs to be released. namei() would
test for this by checking if ni_pathlen == 1, which it wouldn't as
"/" is left in the name, and namei() would not unlock the parent.
The next call to lookup() to resolve the symbolic link would fail
as the parent was still locked.
 1.28.6.1.2.2  02-Aug-1999  thorpej Update from trunk.
 1.28.6.1.2.1  21-Jun-1999  thorpej Sync w/ -current.
 1.32.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.34.4.1  26-Jun-2002  he Pull up revision 1.40 (requested by wrstuden):
Fix a chroot escape method, and log attempts.
 1.35.6.1  01-Oct-2001  fvdl Catch up with -current.
 1.35.4.3  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.35.4.2  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.35.4.1  13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.35.2.7  11-Nov-2002  nathanw Catch up to -current
 1.35.2.6  13-Aug-2002  nathanw Catch up to -current.
 1.35.2.5  01-Aug-2002  nathanw Catch up to -current.
 1.35.2.4  08-Jan-2002  nathanw Catch up to -current.
 1.35.2.3  14-Nov-2001  nathanw Catch up to -current.
 1.35.2.2  22-Oct-2001  nathanw Catch up to -current.
 1.35.2.1  21-Sep-2001  nathanw Catch up to -current.
 1.37.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.39.10.1  21-Jun-2002  lukem Pull up revision 1.40 (requested by wrstuden in ticket #336):
If we're in a chroot, and we are looking up '..', make sure we are
still in the chroot. If not, teleport the lookup to the chroot
and log. Closes an assisted-jail escape method pointed out by
xs@kittenz.org. Patch from xs@kittenz.org and wrstuden
 1.39.8.2  29-Aug-2002  gehenna catch up with -current.
 1.39.8.1  15-Jul-2002  gehenna catch up with -current.
 1.48.2.8  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.48.2.7  01-Apr-2005  skrll Sync with HEAD.
 1.48.2.6  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.48.2.5  21-Sep-2004  skrll Fix the sync with head I botched.
 1.48.2.4  18-Sep-2004  skrll Sync with HEAD.
 1.48.2.3  24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.48.2.2  03-Aug-2004  skrll Sync with HEAD
 1.48.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.54.4.1  16-Mar-2005  tron Pull up revision 1.57 (requested by wrstuden in ticket #1298):
Adjust error case handling. If the VOP_LOOKUP() call unlocked the
parent directory node, remember that.
Addresses locking/lookup issues seen in:
http://mail-index.NetBSD.org/tech-kern/2004/06/20/0019.html
http://mail-index.netbsd.org/tech-kern/2005/01/08/0000.html
 1.54.2.1  16-Mar-2005  tron Pull up revision 1.57 (requested by wrstuden in ticket #1298):
Adjust error case handling. If the VOP_LOOKUP() call unlocked the
parent directory node, remember that.
Addresses locking/lookup issues seen in:
http://mail-index.NetBSD.org/tech-kern/2004/06/20/0019.html
http://mail-index.netbsd.org/tech-kern/2005/01/08/0000.html
 1.55.6.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.55.4.1  29-Apr-2005  kent sync with -current
 1.57.2.3  20-Jan-2006  riz Back out tickets 490, 559, and 560, which added "magic symlinks", at
the request of chs@ (thorpej@ concurs), as there is consensus that
this should be changed to a system-wide tunable, rather than a mount
option.
 1.57.2.2  29-Dec-2005  riz Pull up following revision(s) (requested by thorpej in ticket #559):
sys/kern/vfs_lookup.c: revision 1.63
A few tweaks to magic symlinks:
- Add a @{var} syntax in addition to @var. This allows for
patterns like
@{ostype}-@{osrelease}-@{machine_arch}.
- Add a @emul variable that expands to the process's emulation name
(e.g. "netbsd", "netbsd32", "linux", etc.)
 1.57.2.1  29-Dec-2005  riz Pull up following revision(s) (requested by thorpej in ticket #490):
lib/libc/sys/mount.2: revision 1.33
sys/sys/systm.h: revision 1.179
sys/sys/fstypes.h: revision 1.4
include/mntopts.h: revision 1.6
sys/conf/newvers.sh: revision 1.41
sys/kern/vfs_syscalls.c: revision 1.223
sys/conf/files: revision 1.720
sys/kern/vfs_lookup.c: revision 1.61
share/man/man7/symlink.7: revision 1.7
sbin/mount/mount.8: revision 1.47
sys/kern/init_main.c: revision 1.248 via patch
share/man/man4/options.4: revision 1.280 via patch
Implement expansion of special "magic" strings in symlinks into
system-specific values. Submitted by Chris Demetriou in Nov 1995 (!)
in PR kern/1781, modified only slightly by me.
This is enabled on a per-mount basis with the MNT_MAGICLINKS mount
flag. It can be enabled at mountroot() time by building the kernel
with the ROOTFS_MAGICLINKS option.
The following magic strings are supported by the implementation:
@machine        value of MACHINE for the system
@machine_arch   value of MACHINE_ARCH for the system
@hostname       the system host name, as set with sethostname()
@domainname     the system domain name, as set with setdomainname()
@kernel_ident   the kernel config file name
@osrelease      the release number of the OS
@ostype         the name of the OS (always "NetBSD" for NetBSD)
Example usage:
mkdir /arch/i386/bin
mkdir /arch/sparc/bin
ln -s /arch/@machine_arch/bin /bin
 1.62.2.9  04-Feb-2008  yamt sync with head.
 1.62.2.8  21-Jan-2008  yamt sync with head
 1.62.2.7  07-Dec-2007  yamt sync with head
 1.62.2.6  15-Nov-2007  yamt sync with head.
 1.62.2.5  27-Oct-2007  yamt sync with head.
 1.62.2.4  03-Sep-2007  yamt sync with head.
 1.62.2.3  26-Feb-2007  yamt sync with head.
 1.62.2.2  30-Dec-2006  yamt sync with head.
 1.62.2.1  21-Jun-2006  yamt sync with head.
 1.65.6.2  01-Jun-2006  kardel Sync with head.
 1.65.6.1  22-Apr-2006  simonb Sync with head.
 1.65.4.1  09-Sep-2006  rpaulo sync with head
 1.65.2.2  18-Feb-2006  yamt sync with head.
 1.65.2.1  31-Dec-2005  yamt uio_segflg/uio_lwp -> uio_vmspace.
 1.69.6.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.69.4.2  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.69.4.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.69.2.2  11-Aug-2006  yamt sync with head
 1.69.2.1  24-May-2006  yamt sync with head.
 1.71.6.2  18-Dec-2006  yamt sync with head.
 1.71.6.1  10-Dec-2006  yamt sync with head.
 1.71.4.4  09-Feb-2007  ad Sync with HEAD.
 1.71.4.3  01-Feb-2007  ad Sync with head.
 1.71.4.2  12-Jan-2007  ad Sync with head.
 1.71.4.1  18-Nov-2006  ad Sync with head.
 1.72.2.4  14-Nov-2012  riz Pull up following revision(s) (requested by dholland in ticket #1466):
sys/kern/vfs_lookup.c: revision 1.195
sys/miscfs/genfs/layer_vnops.c: revision 1.51
In layer_lookup(), clear *vpp before returning EROFS, as otherwise a
stale value can be returned and this causes a diagnostic panic in
namei.
In relookup(), clear *vpp before calling VOP_LOOKUP, as is done in
lookup_once(), as an additional precautionary measure.
(in theory both of these fixes are not required together)
Should fix PR 47040.
 1.72.2.3  17-Feb-2007  tron branches: 1.72.2.3.6;
Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.72.2.2  07-Feb-2007  tron Pull up following revision(s) (requested by elad in ticket #399):
sys/kern/vfs_lookup.c: revision 1.80
PR/35524: Brian de Alwis: panic from free in pathname_get
Patch applied, thanks for the report!
 1.72.2.1  03-Jan-2007  tron Pull up following revision(s) (requested by elad in ticket #304):
sys/kern/vfs_syscalls.c: revision 1.282
sys/kern/vfs_lookup.c: revision 1.76
sys/sys/namei.h: revision 1.47
PR/35278: YAMAMOTO Takashi: veriexec sometimes feeds user va to log(9)
Introduce the (intentionally undocumented) pathname_get(), pathname_path(),
and pathname_put(), to deal with allocating and copying of pathnames from
either kernel- or user-space.
 1.72.2.3.6.1  14-Nov-2012  riz Pull up following revision(s) (requested by dholland in ticket #1466):
sys/kern/vfs_lookup.c: revision 1.195
sys/miscfs/genfs/layer_vnops.c: revision 1.51
In layer_lookup(), clear *vpp before returning EROFS, as otherwise a
stale value can be returned and this causes a diagnostic panic in
namei.
In relookup(), clear *vpp before calling VOP_LOOKUP, as is done in
lookup_once(), as an additional precautionary measure.
(in theory both of these fixes are not required together)
Should fix PR 47040.
 1.81.2.2  07-May-2007  yamt sync with head.
 1.81.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.84.6.1  11-Jul-2007  mjf Sync with head.
 1.84.4.5  01-Sep-2007  ad Update for pool_cache API changes.
 1.84.4.4  20-Aug-2007  ad Sync with HEAD.
 1.84.4.3  17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.84.4.2  08-Jun-2007  ad Sync with head.
 1.84.4.1  21-Mar-2007  ad - Put a lock around the proc's CWD info (work in progress).
- Replace some more simplelocks.
- Make lbolt a condvar.
 1.93.6.5  09-Dec-2007  jmcneill Sync with HEAD.
 1.93.6.4  27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.93.6.3  11-Nov-2007  joerg Sync with HEAD.
 1.93.6.2  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.93.6.1  16-Aug-2007  jmcneill Sync with HEAD.
 1.93.2.2  03-Sep-2007  skrll Sync with HEAD.
 1.93.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.97.4.1  14-Oct-2007  yamt sync with head.
 1.97.2.4  23-Mar-2008  matt sync with HEAD
 1.97.2.3  09-Jan-2008  matt sync with HEAD
 1.97.2.2  08-Nov-2007  matt sync with -HEAD
 1.97.2.1  06-Nov-2007  matt sync with HEAD
 1.98.4.4  18-Feb-2008  mjf Sync with HEAD.
 1.98.4.3  27-Dec-2007  mjf Sync with HEAD.
 1.98.4.2  08-Dec-2007  mjf Sync with HEAD.
 1.98.4.1  19-Nov-2007  mjf Sync with HEAD.
 1.98.2.1  13-Nov-2007  bouyer Sync with HEAD
 1.100.2.2  26-Dec-2007  ad Sync with head.
 1.100.2.1  08-Dec-2007  ad Sync with head.
 1.102.4.1  02-Jan-2008  bouyer Sync with HEAD
 1.104.10.6  11-Aug-2010  yamt sync with head.
 1.104.10.5  11-Mar-2010  yamt sync with head
 1.104.10.4  19-Aug-2009  yamt sync with head.
 1.104.10.3  18-Jul-2009  yamt sync with head.
 1.104.10.2  04-May-2009  yamt sync with head.
 1.104.10.1  16-May-2008  yamt sync with head.
 1.104.8.1  18-May-2008  yamt sync with head.
 1.104.6.2  17-Jan-2009  mjf Sync with HEAD.
 1.104.6.1  02-Jun-2008  mjf Sync with HEAD.
 1.108.6.2  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.108.6.1  19-Oct-2008  haad Sync with HEAD.
 1.108.4.1  10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.110.4.2  06-Nov-2012  riz Pull up following revision(s) (requested by dholland in ticket #1814):
sys/kern/vfs_lookup.c: revision 1.195
sys/miscfs/genfs/layer_vnops.c: revision 1.51
In layer_lookup(), clear *vpp before returning EROFS, as otherwise a
stale value can be returned and this causes a diagnostic panic in
namei.
In relookup(), clear *vpp before calling VOP_LOOKUP, as is done in
lookup_once(), as an additional precautionary measure.
(in theory both of these fixes are not required together)
Should fix PR 47040.
 1.110.4.1  17-Nov-2008  snj Pull up following revision(s) (requested by ad in ticket #76):
sys/sys/namei.h: revision 1.61
sys/kern/vfs_lookup.c: revision 1.111
Add a NOCHROOT flag for namei(). Looks outside any chroot and performs the
lookup from the root directory if given an absolute path.
 1.110.2.2  03-Mar-2009  skrll Sync with HEAD.
 1.110.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.112.2.2  23-Jul-2009  jym Sync with HEAD.
 1.112.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.121.4.4  31-May-2011  rmind sync with head
 1.121.4.3  21-Apr-2011  rmind sync with head
 1.121.4.2  05-Mar-2011  rmind sync with head
 1.121.4.1  03-Jul-2010  rmind sync with head
 1.121.2.1  17-Aug-2010  uebayasi Sync with HEAD.
 1.131.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.192.12.3  03-Dec-2017  jdolecek update from HEAD
 1.192.12.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.192.12.1  20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.192.8.1  18-Nov-2012  msaitoh Pull up following revision(s) (requested by dholland in ticket #664):
sys/kern/vfs_lookup.c: revision 1.195
sys/miscfs/genfs/layer_vnops.c: revision 1.51
In layer_lookup(), clear *vpp before returning EROFS, as otherwise a
stale value can be returned and this causes a diagnostic panic in
namei.
In relookup(), clear *vpp before calling VOP_LOOKUP, as is done in
lookup_once(), as an additional precautionary measure.
(in theory both of these fixes are not required together)
Should fix PR 47040.
 1.192.2.3  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.192.2.2  16-Jan-2013  yamt sync with (a bit old) head
 1.192.2.1  30-Oct-2012  yamt sync with head
 1.200.2.1  18-May-2014  rmind sync with head
 1.201.12.1  10-Jul-2017  martin Pull up following revision(s) (requested by dh in ticket #1451):
sys/kern/vfs_lookup.c: revision 1.208
Fix vnode leak on error, introduced by the openat family changes in -r1.200.
From mjg@freebsd.
 1.201.8.1  10-Jul-2017  martin Pull up following revision(s) (requested by dh in ticket #1451):
sys/kern/vfs_lookup.c: revision 1.208
Fix vnode leak on error, introduced by the openat family changes in -r1.200.
From mjg@freebsd.
 1.201.6.4  28-Aug-2017  skrll Sync with HEAD
 1.201.6.3  22-Apr-2016  skrll Sync with HEAD
 1.201.6.2  22-Sep-2015  skrll Sync with HEAD
 1.201.6.1  06-Jun-2015  skrll Sync with HEAD
 1.201.4.1  10-Jul-2017  martin Pull up following revision(s) (requested by dh in ticket #1451):
sys/kern/vfs_lookup.c: revision 1.208
Fix vnode leak on error, introduced by the openat family changes in -r1.200.
From mjg@freebsd.
 1.205.4.1  21-Apr-2017  bouyer Sync with HEAD
 1.205.2.1  26-Apr-2017  pgoyette Sync with HEAD
 1.207.2.2  21-Jun-2021  martin Pull up following revision(s) (requested by dholland in ticket #1685):

sys/sys/namei.src: revision 1.59 (via patch)
sys/kern/vfs_vnops.c: revision 1.215
sys/kern/vfs_lookup.c: revision 1.226

Add a new namei flag NONEXCLHACK for open with O_CREAT and not O_EXCL.
This case needs to be distinguished from the other CREATE operations
because it is supposed to successfully return (and open) the target if
it exists. In the case where that target is the root, or a mount
point, such that there's no parent dir, "real" CREATE operations fail,
but O_CREAT without O_EXCL needs to succeed.

So (a) add the flag, (b) test for it in namei in the situation
described above, (c) set it in open under the appropriate
circumstances, and (d) because this can result in namei returning
ni_dvp of NULL, cope with that case.

Should get into -9 and maybe even -8, because it was prompted by
issues with 3rd-party code. The use of a flag (vs. adding an
additional nameiop, which would be more appropriate) was deliberate to
make the patch small and noninvasive.
 1.207.2.1  10-Jul-2017  martin Pull up following revision(s) (requested by dh in ticket #116):
sys/kern/vfs_lookup.c: revision 1.208
Fix vnode leak on error, introduced by the openat family changes in -r1.200.
From mjg@freebsd.
 1.208.6.3  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.208.6.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.208.6.1  10-Jun-2019  christos Sync with HEAD
 1.212.4.11  03-Mar-2020  ad lookup_fastforward(): bail out on mount -o union
 1.212.4.10  29-Feb-2020  ad Sync with head.
 1.212.4.9  25-Jan-2020  ad Make cwdinfo use mostly lockless, and largely hide the details in vfs_cwd.c.
 1.212.4.8  24-Jan-2020  ad Add a comment.
 1.212.4.7  23-Jan-2020  ad Improve an assertion.
 1.212.4.6  23-Jan-2020  ad - Change style of new code slightly to match rest of file.
- NFS lookup needs to cross mountpoint too.
- Update comments.
 1.212.4.5  22-Jan-2020  ad Fast-forward through the namecache was stopping one component too soon when
there was an obstacle, e.g. a mountpoint. The obstacle should be returned
not the parent directory.
 1.212.4.4  19-Jan-2020  ad - Add a LOCKSHARED flag to namei (matching FreeBSD) indicating that we want
the leaf locked with LK_SHARED.

- Add an IMNT_SHRLOOKUP flag to struct mount indicating that the file
system can do VOP_LOOKUP() with a shared lock. If it encounters
something tricky, VOP_LOOKUP() is free to return ENOLCK and namei() will
retry the lookup with an exclusive lock. If the file system has this flag
set, namei() will try with shared locks for all of the "read only"
lookups, i.e. nameiop=LOOKUP or !ISLASTCN.

- vfs_getcwd: only take vnode locks when really needed, take shared locks if
possible, and where the namecache has identity info for the directories,
do it all in the namecache.

- vfs_lookup: when crossing mountpoints take only a shared lock on the
covered vnode; don't need anything else.
 1.212.4.3  17-Jan-2020  ad vfs_lookup:

- Do the easy component name lookups directly in the namecache without
taking vnode locks nor vnode references (between the start and the leaf /
parent), which seems to largely solve the lock contention problem with
namei(). It needs support from the file system, which has to tell the
name cache about directory permissions (only ffs and tmpfs tried so far),
and I'm not sure how or if it can work with layered file systems yet.
Work in progress.

vfs_cache:

- Make the rbtree operations more efficient: inline the lookup, and key on a
64-bit hash value (32 bits plus 16 bits length) rather than names.

- Take namecache stuff out of vnode_impl, and take the rwlocks, and put them
all together in an nchnode struct which is mapped 1:1 with vnodes. Saves
memory and nicer cache profile.

- Add a routine to help vfs_lookup do its easy component name lookups.

- Report some more stats.

- Tidy up the file a bit.
 1.212.4.2  17-Jan-2020  ad Sync with head.
 1.212.4.1  16-Jan-2020  ad Push the vnode locking in namei() about as far back as it will go.
 1.212.2.1  21-Jun-2021  martin Pull up following revision(s) (requested by dholland in ticket #1296):

sys/sys/namei.src: revision 1.59 (via patch)
sys/kern/vfs_vnops.c: revision 1.215
sys/kern/vfs_lookup.c: revision 1.226

Add a new namei flag NONEXCLHACK for open with O_CREAT and not O_EXCL.
This case needs to be distinguished from the other CREATE operations
because it is supposed to successfully return (and open) the target if
it exists. In the case where that target is the root, or a mount
point, such that there's no parent dir, "real" CREATE operations fail,
but O_CREAT without O_EXCL needs to succeed.

So (a) add the flag, (b) test for it in namei in the situation
described above, (c) set it in open under the appropriate
circumstances, and (d) because this can result in namei returning
ni_dvp of NULL, cope with that case.

Should get into -9 and maybe even -8, because it was prompted by
issues with 3rd-party code. The use of a flag (vs. adding an
additional nameiop, which would be more appropriate) was deliberate to
make the patch small and noninvasive.
 1.217.2.1  25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.224.2.1  03-Jan-2021  thorpej Sync w/ HEAD.
 1.225.4.2  01-Aug-2021  thorpej Sync with HEAD.
 1.225.4.1  17-Jun-2021  thorpej Sync w/ HEAD.
 1.232.4.1  28-May-2023  martin Pull up following revision(s) (requested by gutteridge in ticket #175):

sys/sys/proc.h: revision 1.371
sys/kern/vfs_lookup.c: revision 1.234

Default PROC_MACHINE_ARCH to machine_arch and use this for magic
symlinks to resolve "@machine_arch".

This keeps behaviour of magic symlinks and 'uname -p' output the same.
Fixes PR 57320.
 1.234.6.1  02-Aug-2025  perseant Sync with HEAD
