Home | History | Annotate | Line # | Download | only in man9
namei.9 revision 1.37
 $NetBSD: namei.9,v 1.37 2016/05/07 08:52:10 wiz Exp $

Copyright (c) 2001, 2005, 2006 The NetBSD Foundation, Inc.
All rights reserved.

This code is derived from software contributed to The NetBSD Foundation
by Gregory McGarry.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

.Dd May 6, 2016 .Dt NAMEI 9 .Os .Sh NAME .Nm namei , .Nm NDINIT , .Nm NDAT , .Nm namei_simple_kernel , .Nm namei_simple_user .Nm relookup , .Nm lookup_for_nfsd , .Nm lookup_for_nfsd_index , .Nd pathname lookup .Sh SYNOPSIS n sys/namei.h n sys/uio.h n sys/vnode.h .Fn NDINIT "struct nameidata *ndp" "u_long op" "u_long flags" \ "struct pathbuf *pathbuf" .Fn NDAT "struct nameidata *ndp" "struct vnode *dvp" .Ft int .Fn namei "struct nameidata *ndp" .Ft int .Fn namei_simple_kernel "const char *path" "namei_simple_flags_t sflags" \ "struct vnode **ret" .Ft int .Fn namei_simple_user "const char *path" "namei_simple_flags_t sflags" \ "struct vnode **ret" .Ft int .Fn relookup "struct vnode *dvp" "struct vnode **vpp" \ "struct componentname *cnp" .Ft int .Fn lookup_for_nfsd "struct nameidata *ndp" "struct vnode *startdir" \ "int neverfollow" .Ft int .Fn lookup_for_nfsd_index "struct nameidata *ndp" .Sh DESCRIPTION The .Nm interface is used to convert pathnames to file system vnodes. The name of the interface is actually a contraction of the words .Em name and .Em inode for name-to-inode conversion, in the days before the .Xr vfs 9 interface was implemented.

p Except for the simple forms, the arguments passed to the functions are encapsulated in the .Em nameidata structure. The public interface to this structure is as follows. l -offset indent t It contains a member .Em ni_erootdir that may be set to the emulation root directory before the call if the .Dv EMULROOTSET flag is also set and .Dv TRYEMULROOT is in effect. This is only necessary (or allowed) if the emulation root is not yet set in the current process. t It contains a member .Em ni_vp that upon successful return contains the result vnode. t It contains a member .Em ni_dvp that upon successful return contains the vnode of the directory containing the result vnode or .Dv NULL . If the .Dv LOCKPARENT flag is set, the containing vnode is returned; otherwise this field is left set to .Dv NULL . t It contains a member .Em ni_cnd that is a .Em componentname structure (described in just a moment). This may be used by callers to pass to certain vnode operations that operate on pathnames. .El

p The other fields in the structure should not be examined or altered directly. Note that the .Xr nfs 4 code misuses the .Em nameidata structure and currently has an incestuous relationship with the .Nm code. This is gradually being cleaned up. Other code should initialize the .Em nameidata structure with the .Fn NDINIT macro and perhaps also the .Fn NDAT macro.

p The .Em componentname structure has the following layout: d -literal struct componentname { /* * Arguments to VOP_LOOKUP and directory VOP routines. */ uint32_t cn_nameiop; /* namei operation */ uint32_t cn_flags; /* flags to namei */ kauth_cred_t cn_cred; /* credentials */ const char *cn_nameptr; /* pointer to looked up name */ size_t cn_namelen; /* length of looked up comp */ /* * Side result from VOP_LOOKUP. */ size_t cn_consume; /* chars to consume in lookup */ }; .Ed This structure contains the information about a single directory component name, along with certain other information required by vnode operations. See .Xr vnodeops 9 for more information about these vnode operations.

p The members: l -tag -offset indent -width cn_consumexx -compact t cn_nameiop The type of operation in progress; indicates the basic operating mode of namei. May be one of .Dv LOOKUP , .Dv CREATE , .Dv DELETE , or .Dv RENAME . These modes are described below. t cn_flags Additional flags affecting the operation of namei. These are described below as well. t cn_cred The credentials to use for the lookup or other operation the .Em componentname is passed to. This may match the credentials of the current process or it may not, depending on where the original operation request came from and how it has been routed. t cn_nameptr The name of this directory component, followed by the rest of the path being looked up. t cn_namelen The length of the name of this directory component. The name is not in general null terminated, although the complete string (the full remaining path) always is. t cn_consume This field starts at zero; it may be set to a larger value by implementations of .Xr VOP_LOOKUP 9 to indicate how many more characters beyond .Em cn_namelen are being consumed. New uses of this feature are discouraged and should be discussed. .El .Ss Operating modes The .Dv LOOKUP mode is the baseline behavior, for all ordinary lookups that simply return a vnode. In this mode, if the requested object does not exist, .Er ENOENT is returned.

p The .Dv CREATE mode is used when doing the lookup for operations that create names in the file system namespace, such as .Xr mkdir 2 . In this mode, if the requested name already exists, .Er EEXIST is returned. Otherwise if the requested name does not exist, the lookup succeeds and the returned vnode .Em ni_vp is set to .Dv NULL .

p The .Dv DELETE mode is used when doing the lookup for operations that remove names from the file system namespace, such as .Xr rmdir 2 ; and the .Dv RENAME mode is used when doing (one of the) lookups for .Xr rename 2 .

p The mode may affect both the .Xr VOP_LOOKUP 9 implementation of the file system involved and the internal operation of .Nm , such as interactions with the .Xr namecache 9 . In .Xr ffs 4 , for example, the .Xr VOP_LOOKUP 9 implementation reacts to .Dv CREATE , .Dv DELETE , and .Dv RENAME by computing additional information needed to operate on directory entries. This information is consumed by the vnode operations expected to be called subsequently. If one of these modes is used (and owing to failure or whatever other circumstances) the eventual vnode operation is not called, .Xr VOP_ABORTOP 9 should be used to ensure the state involved is cleaned up correctly. .Ss Flags The following flags may appear. Some are set by .Nm ; some are set by .Xr VOP_LOOKUP 9 ; some are set by the caller in the .Fa flags argument of .Fn NDINIT . Not all possible combinations are sensible or legal. l -tag -offset indent -width NOCROSSMOUNT -compact t Dv FOLLOW Follow symbolic links encountered as the last path component. Used by operations that do not address symbolic links directly, e.g. .Xr stat 2 . Does not affect symbolic links found in the middle of a path. Set by the caller. Used internally. t Dv NOFOLLOW Do not follow symbolic links encountered as the last path component. Used by operations that address symbolic links explicitly, such as .Xr lstat 2 . Set by the caller. Used internally. Note: the value of .Dv NOFOLLOW is 0 ; it exists on the grounds that every .Nm call should explicitly choose either .Dv FOLLOW or .Dv NOFOLLOW . t Dv LOCKLEAF Upon successful completion, the resultant named vnode .Em ni_vp is to be left locked. (Otherwise, the vnode is unlocked before .Nm returns.) Set by the caller; used internally. t Dv LOCKPARENT Upon successful completion, the parent directory containing the named vnode is left locked and returned in .Em ni_dvp as described above. Otherwise, .Em ni_dvp is set to .Dv NULL . Set by the caller; used internally. t Dv TRYEMULROOT If set, the path is looked up in emulation root of the current process first. If that fails, the system root is used. Set by the caller; used internally. t Dv EMULROOTSET As described above, indicates that the caller has set .Em ni_erootdir prior to calling .Nm . This is only useful or permitted when the emulation in the current process is partway through being set up. Set by the caller; used internally. t Dv NOCHROOT Bypass normal .Xr chroot 8 handling for absolute paths. Set by the caller and used internally. t Dv NOCROSSMOUNT Do not cross mount points. Set by the caller and used internally. t Dv RDONLY Enforce read-only behavior. Set by the caller and used internally. May also be used by file-system-level vnode functions. t Dv DOWHITEOUT Allow whiteouts to be seen as objects instead of functioning as .Dq nothing there . Set by the caller; used by vnode functions. t Dv ISWHITEOUT The current path component is a whiteout. Set by .Xr VOP_LOOKUP 9 implementations and consumed by other vnode functions and the .Xr namecache 9 . t Dv ISDOTDOT The current pathname component is ``..''. Set internally; used by vnode operations. t Dv ISLASTCN The current path component is the last component found in the pathname. Set internally; used by vnode functions. t Dv REQUIREDIR The current object to be looked up must be a directory. Set and used internally; any external references are abusive. t Dv CREATEDIR Accept slashes after a component name that does not exist. This only makes sense in .Dv CREATE mode and when creating a directory. Set by the caller and used internally. t Dv NOCACHE The name looked up should not be entered in the .Xr namecache 9 . Set and used internally. (Currently sometimes set by callers, but most likely this is improper/incorrect.) May also be used by file-system-level vnode functions. t Dv MAKEENTRY The current pathname component should be added to the .Xr namecache 9 . Set and used internally. Also abused by the .Xr nfs 4 code. .El

p The following additional historic flags have been removed from .Nx and should be handled as follows if porting code from elsewhere: l -tag -offset indent -width NOCROSSMOUNT -compact t Dv INRENAME Part of a misbegotten and incorrect locking scheme. Any file-system-level code using this is presumptively incorrect. File systems should use the .Xr genfs_rename 9 interface to handle locking in .Fn VOP_RENAME . t Dv INRELOOKUP Used at one point for signaling to .Xr puffs 3 to work around a protocol deficiency that was later rectified. t Dv ISSYMLINK Useless internal state. t Dv SAVESTART Unclean setting affect vnode reference counting. Now effectively never in effect. Any code referring to this is suspect. t Dv SAVENAME Unclean setting relating to responsibility for freeing pathname buffers in the days before the .Em pathbuf structure. Now effectively always in effect; the caller of .Nm owns the .Em pathbuf structure and is always responsible for destroying it. t Dv HASBUF Related to SAVENAME. Any uses can be replaced with .Dq true . .El All access to the .Nm interface must be in process context. Pathname lookups cannot be done in interrupt context. .Sh FUNCTIONS l -tag -width compact t Fn NDINIT "ndp" "op" "flags" "pathbuf" Initialise a nameidata structure pointed to by .Fa ndp for use by the .Nm interface. The operating mode and flags (as documented above) are specified by .Fa op and .Fa flags respectively. The pathname is passed as a pathbuf structure, which should be initialized using one of the .Xr pathbuf 9 operations. Destroying the pathbuf is the responsibility of the caller; this must not be done until the caller is finished with all of the .Nm results and all of the nameidata contents except for the result vnode.

p This routine stores the credentials of the calling thread .Va ( curlwp ) in .Fa ndp . .Fn NDINIT sets the credentials using .Xr kauth_cred_get 9 . In the rare case that another set of credentials is required for the namei operation, .Em ndp-\*[Gt]ni_cnd.cn_cred must be set manually after .Fn NDINIT . t Fn NDAT "ndp" "dvp" This macro is used after .Fn NDINIT to set the starting directory. This supersedes the current process's current working directory as the initial point of departure for looking up relative paths. This mechanism is used by .Xr openat 2 and related calls. t Fn namei "ndp" Convert a pathname into a pointer to a vnode. The nameidata structure pointed to by .Fa ndp should be initialized with the .Fn NDINIT macro, and perhaps also the .Fn NDAT macro. Direct initialization of members of struct nameidata is .Em not supported and may (will) break silently in the future.

p The vnode for the pathname is returned in .Em ndp-\*[Gt]ni_vp . The parent directory is returned locked in .Em ndp-\*[Gt]ni_dvp iff .Dv LOCKPARENT is specified.

p Any or all of the flags documented above as set by the caller can be enabled by passing them (OR'd together) as the .Fa flags argument of .Fn NDINIT . As discussed above every such call should explicitly contain either .Dv FOLLOW or .Dv NOFOLLOW to control the behavior regarding final symbolic links. t Fn namei_simple_kernel "path" "sflags" "ret" Look up the path .Fa path and translate it to a vnode, returned in .Fa ret . The .Fa path argument must be a kernel

q Dv UIO_SYSSPACE pointer. The .Fa sflags argument chooses the precise behavior. It may be set to one of the following symbols: l -tag -offset indent -width NSM_NOFOLLOW_TRYEMULROOT -compact t Dv NSM_NOFOLLOW_NOEMULROOT t Dv NSM_NOFOLLOW_TRYEMULROOT t Dv NSM_FOLLOW_NOEMULROOT t Dv NSM_FOLLOW_TRYEMULROOT .El These select (or not) the .Dv FOLLOW/NOFOLLOW and .Dv TRYEMULROOT flags. Other flags are not available through this interface, which is nonetheless sufficient for more than half the .Fn namei usage in the kernel. Note that the encoding of .Fa sflags has deliberately been arranged to be type-incompatible with anything else. This prevents various possible accidents while the .Fn namei interface is being rototilled. t Fn namei_simple_user "path" "sflags" "ret" This function is the same as .Fn namei_simple_kernel except that the .Fa path argument shall be a user pointer

q Dv UIO_USERSPACE rather than a kernel pointer. t Fn relookup "dvp" "vpp" "cnp" Reacquire a path name component is a directory. This is a quicker way to lookup a pathname component when the parent directory is known. The locked parent directory vnode is specified by .Fa dvp and the pathname component by .Fa cnp . The vnode of the pathname is returned in the address specified by .Fa vpp . Note that one may only use .Fn relookup to repeat a lookup of a final path component previously done by .Nm , and one must use the same .Em componentname structure that call produced. Otherwise the behavior is undefined and likely adverse. t Fn lookup_for_nfsd "ndp" "startdir" "neverfollow" This is a private entry point into .Nm used by the NFS server code. It looks up a path starting from .Fa startdir . If .Fa neverfollow is set, .Em any symbolic link (not just at the end of the path) will cause an error. Otherwise, it follows symlinks normally. It should not be used by new code. t Fn lookup_for_nfsd_index "ndp" This is a (second) private entry point into .Nm used by the NFS server code. It looks up a single path component. It should not be used by new code. .El .Sh INTERNALS The .Em nameidata structure has the following layout: d -literal struct nameidata { /* * Arguments to namei. */ struct vnode *ni_atdir; /* startup dir, cwd if null */ struct pathbuf *ni_pathbuf; /* pathname container */ char *ni_pnbuf; /* extra pathname buffer ref (XXX) */ /* * Internal starting state. (But see notes.) */ struct vnode *ni_rootdir; /* logical root directory */ struct vnode *ni_erootdir; /* emulation root directory */ /* * Results from namei. */ struct vnode *ni_vp; /* vnode of result */ struct vnode *ni_dvp; /* vnode of intermediate directory */ /* * Internal current state. */ size_t ni_pathlen; /* remaining chars in path */ const char *ni_next; /* next location in pathname */ unsigned int ni_loopcnt; /* count of symlinks encountered */ /* * Lookup parameters: this structure describes the subset of * information from the nameidata structure that is passed * through the VOP interface. */ struct componentname ni_cnd; }; .Ed

p These fields are: l -tag -offset indent -width ni_erootdirx -compact t ni_atdir The directory to use for the starting point of relative paths. If null, the current process's current directory is used. This is initialized to .Dv NULL by .Fn NDINIT and set by .Fn NDAT . t ni_pathbuf The abstract path buffer in use, passed as an argument to .Fn NDINIT . The name pointers that appear elsewhere, such as in the .Em componentname structure, point into this buffer. It is owned by the caller and must not be destroyed until all .Nm operations are complete. See .Xr pathbuf 9 . t ni_pnbuf This is the name pointer used during .Nm . It points into .Fa ni_pathbuf . It is not initialized until entry into .Nm . t ni_rootdir The root directory to use as the starting point for absolute paths. This is retrieved from the current process's current root directory when .Nm starts up. It is not initialized by .Fn NDINIT . t ni_erootdir The root directory to use as the emulation root, for processes running in emulation. This is retrieved from the current process's emulation root directory when .Nm starts up and not initialized by .Fn NDINIT . As described elsewhere, it may be set by the caller if the .Dv EMULROOTSET flag is used, but this should only be done when the current process's emulation root directory is not yet initialized. (And ideally in the future things would be tidied so that this is not necessary.) t ni_vp t ni_dvp Returned vnodes, as described above. These only contain valid values if .Nm returns successfully. t ni_pathlen The length of the full current remaining path string in .Fa ni_pnbuf . This is not initialized by .Fn NDINIT and is used only internally. t ni_next The remaining part of the path, after the current component found in the .Em componentname structure. This is not initialized by .Fn NDINIT and is used only internally. t ni_loopcnt The number of symbolic links encountered (and traversed) so far. If this exceeds a limit, .Nm fails with .Er ELOOP . This is not initialized by .Fn NDINIT and is used only internally. t ni_cnd The .Em componentname structure holding the current directory component, and also the mode, flags, and credentials. The mode, flags, and credentials are initialized by .Fn NDINIT ; the rest is not initialized until .Nm runs. .El

p There is also a .Em namei_state structure that is hidden within

a vfs_lookup.c . This contains the following additional state: l -tag -offset indent -width attempt_retry -compact t docache A flag indicating whether to cache the last pathname component. t rdonly The read-only state, initialized from the .Dv RDONLY flag. t slashes The number of trailing slashes found after the current pathname component. t attempt_retry Set on some error cases (and not others) to indicate that a failure in the emulation root should be followed by a retry in the real system root. .El

p The state in .Em namei_state is genuinely private to .Nm . Note that much of the state in .Em nameidata should also be private, but is currently not because it is misused in some fashion by outside code, usually .Xr nfs 4 .

p The control flow within the .Nm portions of

a vfs_lookup.c is as follows. l -tag t Fn namei does a complete path lookup by calling .Fn namei_init , .Fn namei_tryemulroot , and .Fn namei_cleanup . t Fn namei_init sets up the basic internal state and makes some (precondition-type) assertions. t Fn namei_cleanup makes some postcondition-type assertions; it currently does nothing besides this. t Fn namei_tryemulroot handles .Dv TRYEMULROOT by calling .Fn namei_oneroot once or twice as needed, and attends to making sure the original pathname is preserved for the second try. t Fn namei_oneroot does a complete path search from a single root directory. It begins with .Fn namei_start , then calls .Fn lookup_once (and if necessary, .Fn namei_follow ) repeatedly until done. It also handles returning the result vnode(s) in the requested state. t Fn namei_start sets up the initial state and locking; it calls .Fn namei_getstartdir . t Fn namei_getstartdir initializes the root directory state (both .Fa ni_rootdir and .Fa ni_erootdir ) and picks the starting directory, consuming the leading slashes of an absolute path and handling the magic .Dq /../ string for bypassing the emulation root. A different version .Fn namei_getstartdir_for_nfsd is used for lookups coming from .Xr nfsd 8 as those are required to have different semantics. t Fn lookup_once calls .Fn VOP_LOOKUP for one path component, also handling any needed crossing of mount points (either up or down) and coping with locking requirements. t Fn lookup_parsepath is called prior to each .Fn lookup_once call to examine the pathname and find where the next component starts. t Fn namei_follow reads the contents of a symbolic link and updates both the path buffer and the search directory accordingly. .El

p As a final note be advised that the magic return value associated with .Dv CREATE mode is different for .Nm than it is for .Fn VOP_LOOKUP . The latter .Dq fails with .Er EJUSTRETURN . .Nm translates this into succeeding and returning a null vnode. .Sh CODE REFERENCES The name lookup subsystem is implemented within the file

a sys/kern/vfs_lookup.c . .Sh SEE ALSO .Xr intro 9 , .Xr namecache 9 , .Xr vfs 9 , .Xr vnode 9 , .Xr vnodeops 9 .Sh BUGS There should be no such thing as operating modes. Only .Dv LOOKUP is actually needed. The behavior where removing an object looks it up within .Nm and then calls into the file system (which must look it up again internally or cache state from .Fn VOP_LOOKUP ) is particularly contorted.

p Most of the flags are equally bogus.

p Most of the contents of the .Em nameidata structure should be private and hidden within .Nm ; currently it cannot be because of abuse elsewhere.

p The .Dv EMULROOTSET flag is messy.

p There is no good way to support file systems that want to use a more elaborate pathname schema than the customary slash-delimited components.