Commit 1a50ede2 authored by Linus Torvalds
Pull nfsd updates from Chuck Lever:
 "Several substantial changes this time around:

   - Previously, exporting an NFS mount via NFSD was considered to be an
     unsupported feature. With v5.11, the community has attempted to
     make re-exporting a first-class feature of NFSD.

     This would enable the Linux in-kernel NFS server to be used as an
     intermediate cache for a remotely-located primary NFS server, for
     example, even with other NFS server implementations, like a NetApp
     filer, as the primary.

   - A short series of patches brings support for multiple RPC/RDMA data
     chunks per RPC transaction to the Linux NFS server's RPC/RDMA
     transport implementation.

     This is a part of the RPC/RDMA spec that the other premier
     NFS/RDMA implementation (Solaris) has had for a very long time, and
     completes the implementation of RPC/RDMA version 1 in the Linux
     kernel's NFS server.

   - Long ago, NFSv4 support was introduced to NFSD using a series of C
     macros that hid dprintk's and goto's. Over time, the kernel's XDR
     implementation has been greatly improved, but these C macros have
     remained and become fallow. A series of patches in this pull
     request completely replaces those macros with the use of current
     kernel XDR infrastructure. Benefits include:

       - More robust input sanitization in NFSD's NFSv4 XDR decoders.

       - Make it easier to use common kernel library functions that use
         XDR stream APIs (for example, GSS-API).

       - Align the structure of the source code with the RFCs so it is
         easier to learn, verify, and maintain our XDR implementation.

       - Removal of more than a hundred hidden dprintk() call sites.

       - Removal of some explicit manipulation of pages to help make the
         eventual transition to xdr->bvec smoother.

   - On top of several related fixes in 5.10-rc, there are a few more
     fixes to get the Linux NFSD implementation of NFSv4.2 inter-server
     copy up to speed.

  And as usual, there is a pinch of seasoning in the form of a
  collection of unrelated minor bug fixes and clean-ups.

  Many thanks to all who contributed this time around!"

* tag 'nfsd-5.11' of git://git.linux-nfs.org/projects/cel/cel-2.6: (131 commits)
  nfsd: Record NFSv4 pre/post-op attributes as non-atomic
  nfsd: Set PF_LOCAL_THROTTLE on local filesystems only
  nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE
  exportfs: Add a function to return the raw output from fh_to_dentry()
  nfsd: close cached files prior to a REMOVE or RENAME that would replace target
  nfsd: allow filesystems to opt out of subtree checking
  nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
  Revert "nfsd4: support change_attr_type attribute"
  nfsd4: don't query change attribute in v2/v3 case
  nfsd: minor nfsd4_change_attribute cleanup
  nfsd: simplify nfsd4_change_info
  nfsd: only call inode_query_iversion in the I_VERSION case
  nfs_common: need lock during iterate through the list
  NFSD: Fix 5 seconds delay when doing inter server copy
  NFSD: Fix sparse warning in nfs4proc.c
  SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcall
  sunrpc: clean-up cache downcall
  nfsd: Fix message level for normal termination
  NFSD: Remove macros that are no longer used
  NFSD: Replace READ* macros in nfsd4_decode_compound()
  ...
parents 9867cb1f 716a8bc7
+52 −0
@@ -154,6 +154,11 @@ struct which has the following members:
    to find potential names, and matches inode numbers to find the correct
    match.

  flags
    Some filesystems may need to be handled differently than others. The
    export_operations struct also includes a flags field that allows the
    filesystem to communicate such information to nfsd. See the Export
    Operations Flags section below for more explanation.

A filehandle fragment consists of an array of 1 or more 4-byte words,
together with a one byte "type".
@@ -163,3 +168,50 @@ generated by encode_fh, in which case it will have been padded with
nuls.  Rather, the encode_fh routine should choose a "type" which
indicates to decode_fh how much of the filehandle is valid, and how
it should be interpreted.

Export Operations Flags
-----------------------
In addition to the operation vector pointers, struct export_operations also
contains a "flags" field that allows the filesystem to tell nfsd that it
needs to be handled differently from other filesystems. The following flags
are defined:

  EXPORT_OP_NOWCC - disable NFSv3 WCC attributes on this filesystem
    RFC 1813 recommends that servers always send weak cache consistency
    (WCC) data to the client after each operation. The server should
    atomically collect attributes about the inode, do an operation on it,
    and then collect the attributes afterward. This allows the client to
    skip issuing GETATTRs in some situations but means that the server
    is calling vfs_getattr for almost all RPCs. On some filesystems
    (particularly those that are clustered or networked) this is expensive
    and atomicity is difficult to guarantee. This flag indicates to nfsd
    that it should skip providing WCC attributes to the client in NFSv3
    replies when doing operations on this filesystem. Consider enabling
    this on filesystems that have an expensive ->getattr inode operation,
    or when atomicity between pre and post operation attribute collection
    is impossible to guarantee.

  EXPORT_OP_NOSUBTREECHK - disallow subtree checking on this fs
    Many NFS operations deal with filehandles, which the server must then
    vet to ensure that they live inside of an exported tree. When the
    export consists of an entire filesystem, this is trivial. nfsd can just
    ensure that the filehandle lives on the filesystem. When only part of a
    filesystem is exported, however, nfsd must walk the ancestors of the
    inode to ensure that it's within an exported subtree. This is an
    expensive operation and not all filesystems can support it properly.
    This flag exempts the filesystem from subtree checking and causes
    exportfs to get back an error if it tries to enable subtree checking
    on it.

  EXPORT_OP_CLOSE_BEFORE_UNLINK - always close cached files before unlinking
    On some exportable filesystems (such as NFS) unlinking a file that
    is still open can cause a fair bit of extra work. For instance,
    the NFS client will do a "sillyrename" to ensure that the file
    sticks around while it's still open. When reexporting, that open
    file is held by nfsd so we usually end up doing a sillyrename, and
    then immediately deleting the sillyrenamed file just afterward when
    the link count actually goes to zero. Sometimes this delete can race
    with other operations (for instance an rmdir of the parent directory).
    This flag causes nfsd to close any open files for this inode _before_
    calling into the vfs to do an unlink or a rename that would replace
    an existing file.
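
The flag descriptions above can be illustrated with a small userspace sketch.
It is not kernel code: struct demo_export_operations and every demo_* name
are hypothetical stand-ins, the numeric flag values are assumed for
illustration, and returning -EINVAL when subtree checking is refused is
likewise an assumption. It only shows how a filesystem might advertise the
flags and how a server-side caller might test them.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names come from the text above; the numeric values are assumed. */
#define EXPORT_OP_NOWCC               0x1UL
#define EXPORT_OP_NOSUBTREECHK        0x2UL
#define EXPORT_OP_CLOSE_BEFORE_UNLINK 0x4UL

/* Hypothetical, cut-down stand-in for struct export_operations. */
struct demo_export_operations {
	unsigned long flags;
};

/* A networked filesystem that is being re-exported might set all three. */
const struct demo_export_operations demo_netfs_export_ops = {
	.flags = EXPORT_OP_NOWCC | EXPORT_OP_NOSUBTREECHK |
		 EXPORT_OP_CLOSE_BEFORE_UNLINK,
};

/* A local filesystem that needs no special handling sets none. */
const struct demo_export_operations demo_localfs_export_ops = {
	.flags = 0,
};

/* Should WCC attributes be collected and sent in NFSv3 replies? */
bool demo_should_send_wcc(const struct demo_export_operations *ops)
{
	assert(ops != NULL);
	return !(ops->flags & EXPORT_OP_NOWCC);
}

/*
 * Validate an export request. Returns 0 on success, or -EINVAL (assumed
 * error code) when subtree checking is requested on a filesystem that
 * has opted out of it.
 */
int demo_check_export(const struct demo_export_operations *ops,
		      bool want_subtree_check)
{
	assert(ops != NULL);
	if (want_subtree_check && (ops->flags & EXPORT_OP_NOSUBTREECHK))
		return -EINVAL;
	return 0;
}

/* Should cached open files be closed before an unlink or replacing rename? */
bool demo_close_before_unlink(const struct demo_export_operations *ops)
{
	assert(ops != NULL);
	return (ops->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) != 0;
}
```

With these definitions, demo_check_export(&demo_netfs_export_ops, true)
fails while the same request against demo_localfs_export_ops succeeds.
Keeping the policy in a flags word means the caller tests a bit instead of
special-casing individual filesystem types.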
+24 −8
@@ -417,9 +417,11 @@ int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len,
}
EXPORT_SYMBOL_GPL(exportfs_encode_fh);

struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
		int fh_len, int fileid_type,
		int (*acceptable)(void *, struct dentry *), void *context)
struct dentry *
exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
		       int fileid_type,
		       int (*acceptable)(void *, struct dentry *),
		       void *context)
{
	const struct export_operations *nop = mnt->mnt_sb->s_export_op;
	struct dentry *result, *alias;
@@ -432,10 +434,8 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
	if (!nop || !nop->fh_to_dentry)
		return ERR_PTR(-ESTALE);
	result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type);
	if (PTR_ERR(result) == -ENOMEM)
		return ERR_CAST(result);
	if (IS_ERR_OR_NULL(result))
		return ERR_PTR(-ESTALE);
		return result;

	/*
	 * If no acceptance criteria was specified by caller, a disconnected
@@ -561,10 +561,26 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,

 err_result:
	dput(result);
	if (err != -ENOMEM)
		err = -ESTALE;
	return ERR_PTR(err);
}
EXPORT_SYMBOL_GPL(exportfs_decode_fh_raw);

struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
				  int fh_len, int fileid_type,
				  int (*acceptable)(void *, struct dentry *),
				  void *context)
{
	struct dentry *ret;

	ret = exportfs_decode_fh_raw(mnt, fid, fh_len, fileid_type,
				     acceptable, context);
	if (IS_ERR_OR_NULL(ret)) {
		if (ret == ERR_PTR(-ENOMEM))
			return ret;
		return ERR_PTR(-ESTALE);
	}
	return ret;
}
EXPORT_SYMBOL_GPL(exportfs_decode_fh);

MODULE_LICENSE("GPL");
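
The split between exportfs_decode_fh_raw() and exportfs_decode_fh() in the
diff above can be mimicked in userspace. The sketch below is an illustration
under stated assumptions, not the kernel implementation: the demo_* helpers
are hypothetical imitations of ERR_PTR(), PTR_ERR() and IS_ERR_OR_NULL(),
and the "raw" decoder simply hands back whatever a simulated fh_to_dentry
produced. It shows why a caller that wants to tell a timeout apart from a
stale handle would use the raw variant, while existing callers keep seeing
only -ENOMEM or -ESTALE.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the errno range encoded at the top of the address space. */
#define DEMO_MAX_ERRNO 4095

/* Hypothetical userspace imitations of the kernel's error-pointer helpers. */
static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline bool demo_is_err_or_null(const void *p)
{
	return !p || (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Hypothetical stand-in for struct dentry. */
struct demo_dentry {
	int ino;
};

/* A valid object used as a "successful lookup" result. */
struct demo_dentry demo_root = { .ino = 2 };

/*
 * Raw variant: returns the lower layer's result untouched, so NULL and
 * every error code (for example -ETIMEDOUT) stay visible to the caller.
 */
struct demo_dentry *demo_decode_fh_raw(struct demo_dentry *lower_result)
{
	return lower_result;
}

/*
 * Compatibility wrapper: keeps -ENOMEM, and folds NULL and every other
 * error into -ESTALE, matching the wrapper shown in the diff.
 */
struct demo_dentry *demo_decode_fh(struct demo_dentry *lower_result)
{
	struct demo_dentry *ret = demo_decode_fh_raw(lower_result);

	if (demo_is_err_or_null(ret)) {
		if (ret == demo_err_ptr(-ENOMEM))
			return ret;
		return demo_err_ptr(-ESTALE);
	}
	return ret;
}
```

Passing demo_err_ptr(-ETIMEDOUT) through demo_decode_fh_raw() preserves the
timeout, whereas demo_decode_fh() reports it as -ESTALE. Layering the old
entry point on top of the raw one leaves its behaviour unchanged for callers
that do not care about the distinction.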
+1 −1
@@ -697,7 +697,7 @@ bl_alloc_lseg(struct pnfs_layout_hdr *lo, struct nfs4_layoutget_res *lgr,

	xdr_init_decode_pages(&xdr, &buf,
			lgr->layoutp->pages, lgr->layoutp->len);
	xdr_set_scratch_buffer(&xdr, page_address(scratch), PAGE_SIZE);
	xdr_set_scratch_page(&xdr, scratch);

	status = -EIO;
	p = xdr_inline_decode(&xdr, 4);
+1 −1
@@ -510,7 +510,7 @@ bl_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
		goto out;

	xdr_init_decode_pages(&xdr, &buf, pdev->pages, pdev->pglen);
	xdr_set_scratch_buffer(&xdr, page_address(scratch), PAGE_SIZE);
	xdr_set_scratch_page(&xdr, scratch);

	p = xdr_inline_decode(&xdr, sizeof(__be32));
	if (!p)
+1 −1
@@ -576,7 +576,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
		goto out_nopages;

	xdr_init_decode_pages(&stream, &buf, xdr_pages, buflen);
	xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
	xdr_set_scratch_page(&stream, scratch);

	do {
		if (entry->label)