Commit 92dbc9de authored by Linus Torvalds
Pull overlayfs updates from Miklos Szeredi:

 - Allow unprivileged mounting in a user namespace.

   For quite some time the security model of overlayfs has been that
   operations on underlying layers shall be performed with the
   privileges of the mounting task.

   This way an unprivileged user cannot gain privileges by the act of
   mounting an overlayfs instance. A full audit of all function calls
   made by the overlayfs code has been performed to see whether they
   conform to this model, and this branch contains some fixes in this
   regard.

 - Support running on copied filesystem images by optionally disabling
   UUID verification.

 - Bug fixes as well as documentation updates.

* tag 'ovl-update-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: unprivieged mounts
  ovl: do not get metacopy for userxattr
  ovl: do not fail because of O_NOATIME
  ovl: do not fail when setting origin xattr
  ovl: user xattr
  ovl: simplify file splice
  ovl: make ioctl() safe
  ovl: check privs before decoding file handle
  vfs: verify source area in vfs_dedupe_file_range_one()
  vfs: move cap_convert_nscap() call into vfs_setxattr()
  ovl: fix incorrect extent info in metacopy case
  ovl: expand warning in ovl_d_real()
  ovl: document lower modification caveats
  ovl: warn about orphan metacopy
  ovl: doc clarification
  ovl: introduce new "uuid=off" option for inodes index feature
  ovl: propagate ovl_fs to ovl_decode_real_fh and ovl_encode_real_fh
parents 65de0b89 459c7c56
+28 −8
@@ -97,11 +97,13 @@ directory trees to be in the same filesystem and there is no
requirement that the root of a filesystem be given for either upper or
lower.

The lower filesystem can be any filesystem supported by Linux and does
not need to be writable.  The lower filesystem can even be another
overlayfs.  The upper filesystem will normally be writable and if it
is it must support the creation of trusted.* extended attributes, and
must provide valid d_type in readdir responses, so NFS is not suitable.
A wide range of filesystems supported by Linux can be the lower filesystem,
but not all filesystems that are mountable by Linux have the features
needed for OverlayFS to work.  The lower filesystem does not need to be
writable.  The lower filesystem can even be another overlayfs.  The upper
filesystem will normally be writable and if it is it must support the
creation of trusted.* and/or user.* extended attributes, and must provide
valid d_type in readdir responses, so NFS is not suitable.

A read-only overlay of two read-only filesystems may use any
filesystem type.
@@ -467,14 +469,18 @@ summarized in the `Inode properties`_ table above.
Changes to underlying filesystems
---------------------------------

Offline changes, when the overlay is not mounted, are allowed to either
the upper or the lower trees.

Changes to the underlying filesystems while part of a mounted overlay
filesystem are not allowed.  If the underlying filesystem is changed,
the behavior of the overlay is undefined, though it will not result in
a crash or deadlock.

Offline changes, when the overlay is not mounted, are allowed to the
upper tree.  Offline changes to the lower tree are only allowed if the
"metadata only copy up", "inode index", and "redirect_dir" features
have not been used.  If the lower tree is modified and any of these
features has been used, the behavior of the overlay is undefined,
though it will not result in a crash or deadlock.

When the overlay NFS export feature is enabled, overlay filesystems
behavior on offline changes of the underlying lower layer is different
than the behavior when NFS export is disabled.
@@ -563,6 +569,11 @@ This verification may cause significant overhead in some cases.
Note: the mount options index=off,nfs_export=on are conflicting for a
read-write mount and will result in an error.

Note: the mount option uuid=off can be used to replace UUID of the underlying
filesystem in file handles with null, and effectively disable UUID checks. This
can be useful in case the underlying disk is copied and the UUID of this copy
is changed. This is only applicable if all lower/upper/work directories are on
the same filesystem, otherwise it will fall back to normal behaviour.
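
The effect of uuid=off on file handles can be illustrated with a small user-space sketch. It is not the kernel code: `struct uuid16`, `handle_uuid()` and `handle_uuid_ok()` are hypothetical names chosen for illustration. The encode side mirrors the `if (ofs->config.uuid)` test visible in ovl_encode_real_fh() in this commit; the verify side is an assumption based on the note above (a null UUID is expected when checks are off), since the decode path is not shown here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a 16-byte filesystem UUID. */
struct uuid16 {
	unsigned char b[16];
};

/*
 * Encode side: store the filesystem UUID in the handle only when UUID
 * checks are enabled; with uuid=off the stored UUID stays null (all zero).
 */
static struct uuid16 handle_uuid(bool uuid_on, const struct uuid16 *fs_uuid)
{
	struct uuid16 out;

	memset(&out, 0, sizeof(out));
	if (uuid_on)
		out = *fs_uuid;
	return out;
}

/*
 * Verify side (assumed behaviour): with checks on, the stored UUID must
 * match the filesystem; with checks off, only a null UUID is accepted.
 */
static bool handle_uuid_ok(bool uuid_on, const struct uuid16 *stored,
			   const struct uuid16 *fs_uuid)
{
	static const struct uuid16 null_uuid = { { 0 } };

	if (uuid_on)
		return memcmp(stored, fs_uuid, sizeof(*stored)) == 0;
	return memcmp(stored, &null_uuid, sizeof(*stored)) == 0;
}
```

Under this sketch, a handle encoded with checks off remains acceptable after the underlying disk is copied and given a new UUID, which is the use case the note describes.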

Volatile mount
--------------
@@ -583,6 +594,15 @@ fresh one. In very limited cases where the user knows that the system has
not crashed and contents of upperdir are intact, the "volatile" directory
can be removed.


User xattr
----------

The "-o userxattr" mount option forces overlayfs to use the
"user.overlay." xattr namespace instead of "trusted.overlay.".  This is
useful for unprivileged mounting of overlayfs.
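
As a rough user-space illustration of what the option switches, the sketch below selects the private xattr prefix from a boolean and tests a name against it. It is loosely modelled on ovl_is_private_xattr() as changed by this commit, but `is_private_xattr()`, its `userxattr` parameter and the two macros are illustrative names, not kernel API.

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative copies of the two namespaces named in the text above. */
#define XATTR_TRUSTED_PREFIX "trusted.overlay."
#define XATTR_USER_PREFIX    "user.overlay."

/*
 * Return true if @name lies in the overlay-private xattr namespace that
 * is active for the given mount configuration.
 */
static bool is_private_xattr(bool userxattr, const char *name)
{
	const char *prefix = userxattr ? XATTR_USER_PREFIX
				       : XATTR_TRUSTED_PREFIX;

	return strncmp(name, prefix, strlen(prefix)) == 0;
}
```

With `userxattr` set, only names under "user.overlay." count as private, so "trusted.overlay." names are no longer treated specially, and vice versa.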


Testsuite
---------

+16 −12
@@ -275,7 +275,8 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
	return err;
}

struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
struct ovl_fh *ovl_encode_real_fh(struct ovl_fs *ofs, struct dentry *real,
				  bool is_upper)
{
	struct ovl_fh *fh;
	int fh_type, dwords;
@@ -319,6 +320,7 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
	if (is_upper)
		fh->fb.flags |= OVL_FH_FLAG_PATH_UPPER;
	fh->fb.len = sizeof(fh->fb) + buflen;
	if (ofs->config.uuid)
		fh->fb.uuid = *uuid;

	return fh;
@@ -328,8 +330,8 @@ out_err:
	return ERR_PTR(err);
}

int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
		   struct dentry *upper)
int ovl_set_origin(struct ovl_fs *ofs, struct dentry *dentry,
		   struct dentry *lower, struct dentry *upper)
{
	const struct ovl_fh *fh = NULL;
	int err;
@@ -340,7 +342,7 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
	 * up and a pure upper inode.
	 */
	if (ovl_can_decode_fh(lower->d_sb)) {
		fh = ovl_encode_real_fh(lower, false);
		fh = ovl_encode_real_fh(ofs, lower, false);
		if (IS_ERR(fh))
			return PTR_ERR(fh);
	}
@@ -352,7 +354,8 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
				 fh ? fh->fb.len : 0, 0);
	kfree(fh);

	return err;
	/* Ignore -EPERM from setting "user.*" on symlink/special */
	return err == -EPERM ? 0 : err;
}

/* Store file handle of @upper dir in @index dir entry */
@@ -362,7 +365,7 @@ static int ovl_set_upper_fh(struct ovl_fs *ofs, struct dentry *upper,
	const struct ovl_fh *fh;
	int err;

	fh = ovl_encode_real_fh(upper, true);
	fh = ovl_encode_real_fh(ofs, upper, true);
	if (IS_ERR(fh))
		return PTR_ERR(fh);

@@ -380,6 +383,7 @@ static int ovl_set_upper_fh(struct ovl_fs *ofs, struct dentry *upper,
static int ovl_create_index(struct dentry *dentry, struct dentry *origin,
			    struct dentry *upper)
{
	struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
	struct dentry *indexdir = ovl_indexdir(dentry->d_sb);
	struct inode *dir = d_inode(indexdir);
	struct dentry *index = NULL;
@@ -402,7 +406,7 @@ static int ovl_create_index(struct dentry *dentry, struct dentry *origin,
	if (WARN_ON(ovl_test_flag(OVL_INDEX, d_inode(dentry))))
		return -EIO;

	err = ovl_get_index_name(origin, &name);
	err = ovl_get_index_name(ofs, origin, &name);
	if (err)
		return err;

@@ -411,7 +415,7 @@ static int ovl_create_index(struct dentry *dentry, struct dentry *origin,
	if (IS_ERR(temp))
		goto free_name;

	err = ovl_set_upper_fh(OVL_FS(dentry->d_sb), upper, temp);
	err = ovl_set_upper_fh(ofs, upper, temp);
	if (err)
		goto out;

@@ -521,7 +525,7 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp)
	 * hard link.
	 */
	if (c->origin) {
		err = ovl_set_origin(c->dentry, c->lowerpath.dentry, temp);
		err = ovl_set_origin(ofs, c->dentry, c->lowerpath.dentry, temp);
		if (err)
			return err;
	}
@@ -700,7 +704,7 @@ out_dput:
static int ovl_do_copy_up(struct ovl_copy_up_ctx *c)
{
	int err;
	struct ovl_fs *ofs = c->dentry->d_sb->s_fs_info;
	struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb);
	bool to_index = false;

	/*
@@ -722,7 +726,7 @@ static int ovl_do_copy_up(struct ovl_copy_up_ctx *c)

	if (to_index) {
		c->destdir = ovl_indexdir(c->dentry->d_sb);
		err = ovl_get_index_name(c->lowerpath.dentry, &c->destname);
		err = ovl_get_index_name(ofs, c->lowerpath.dentry, &c->destname);
		if (err)
			return err;
	} else if (WARN_ON(!c->parent)) {
+6 −4
@@ -211,7 +211,8 @@ static int ovl_check_encode_origin(struct dentry *dentry)
	return 1;
}

static int ovl_dentry_to_fid(struct dentry *dentry, u32 *fid, int buflen)
static int ovl_dentry_to_fid(struct ovl_fs *ofs, struct dentry *dentry,
			     u32 *fid, int buflen)
{
	struct ovl_fh *fh = NULL;
	int err, enc_lower;
@@ -226,7 +227,7 @@ static int ovl_dentry_to_fid(struct dentry *dentry, u32 *fid, int buflen)
		goto fail;

	/* Encode an upper or lower file handle */
	fh = ovl_encode_real_fh(enc_lower ? ovl_dentry_lower(dentry) :
	fh = ovl_encode_real_fh(ofs, enc_lower ? ovl_dentry_lower(dentry) :
				ovl_dentry_upper(dentry), !enc_lower);
	if (IS_ERR(fh))
		return PTR_ERR(fh);
@@ -249,6 +250,7 @@ fail:
static int ovl_encode_fh(struct inode *inode, u32 *fid, int *max_len,
			 struct inode *parent)
{
	struct ovl_fs *ofs = OVL_FS(inode->i_sb);
	struct dentry *dentry;
	int bytes, buflen = *max_len << 2;

@@ -260,7 +262,7 @@ static int ovl_encode_fh(struct inode *inode, u32 *fid, int *max_len,
	if (WARN_ON(!dentry))
		return FILEID_INVALID;

	bytes = ovl_dentry_to_fid(dentry, fid, buflen);
	bytes = ovl_dentry_to_fid(ofs, dentry, fid, buflen);
	dput(dentry);
	if (bytes <= 0)
		return FILEID_INVALID;
@@ -680,7 +682,7 @@ static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
	if (!ovl_upper_mnt(ofs))
		return ERR_PTR(-EACCES);

	upper = ovl_decode_real_fh(fh, ovl_upper_mnt(ofs), true);
	upper = ovl_decode_real_fh(ofs, fh, ovl_upper_mnt(ofs), true);
	if (IS_ERR_OR_NULL(upper))
		return upper;

+21 −123
@@ -53,9 +53,10 @@ static struct file *ovl_open_realfile(const struct file *file,
	err = inode_permission(realinode, MAY_OPEN | acc_mode);
	if (err) {
		realfile = ERR_PTR(err);
	} else if (!inode_owner_or_capable(realinode)) {
		realfile = ERR_PTR(-EPERM);
	} else {
		if (!inode_owner_or_capable(realinode))
			flags &= ~O_NOATIME;

		realfile = open_with_fake_path(&file->f_path, flags, realinode,
					       current_cred());
	}
@@ -75,12 +76,6 @@ static int ovl_change_flags(struct file *file, unsigned int flags)
	struct inode *inode = file_inode(file);
	int err;

	flags |= OVL_OPEN_FLAGS;

	/* If some flag changed that cannot be changed then something's amiss */
	if (WARN_ON((file->f_flags ^ flags) & ~OVL_SETFL_MASK))
		return -EIO;

	flags &= OVL_SETFL_MASK;

	if (((flags ^ file->f_flags) & O_APPEND) && IS_APPEND(inode))
@@ -397,48 +392,6 @@ out_unlock:
	return ret;
}

static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
			 struct pipe_inode_info *pipe, size_t len,
			 unsigned int flags)
{
	ssize_t ret;
	struct fd real;
	const struct cred *old_cred;

	ret = ovl_real_fdget(in, &real);
	if (ret)
		return ret;

	old_cred = ovl_override_creds(file_inode(in)->i_sb);
	ret = generic_file_splice_read(real.file, ppos, pipe, len, flags);
	revert_creds(old_cred);

	ovl_file_accessed(in);
	fdput(real);
	return ret;
}

static ssize_t
ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
			  loff_t *ppos, size_t len, unsigned int flags)
{
	struct fd real;
	const struct cred *old_cred;
	ssize_t ret;

	ret = ovl_real_fdget(out, &real);
	if (ret)
		return ret;

	old_cred = ovl_override_creds(file_inode(out)->i_sb);
	ret = iter_file_splice_write(pipe, real.file, ppos, len, flags);
	revert_creds(old_cred);

	ovl_file_accessed(out);
	fdput(real);
	return ret;
}

static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
{
	struct fd real;
@@ -541,46 +494,31 @@ static long ovl_real_ioctl(struct file *file, unsigned int cmd,
			   unsigned long arg)
{
	struct fd real;
	const struct cred *old_cred;
	long ret;

	ret = ovl_real_fdget(file, &real);
	if (ret)
		return ret;

	old_cred = ovl_override_creds(file_inode(file)->i_sb);
	ret = security_file_ioctl(real.file, cmd, arg);
	if (!ret)
	if (!ret) {
		/*
		 * Don't override creds, since we currently can't safely check
		 * permissions before doing so.
		 */
		ret = vfs_ioctl(real.file, cmd, arg);
	revert_creds(old_cred);
	}

	fdput(real);

	return ret;
}

static unsigned int ovl_iflags_to_fsflags(unsigned int iflags)
{
	unsigned int flags = 0;

	if (iflags & S_SYNC)
		flags |= FS_SYNC_FL;
	if (iflags & S_APPEND)
		flags |= FS_APPEND_FL;
	if (iflags & S_IMMUTABLE)
		flags |= FS_IMMUTABLE_FL;
	if (iflags & S_NOATIME)
		flags |= FS_NOATIME_FL;

	return flags;
}

static long ovl_ioctl_set_flags(struct file *file, unsigned int cmd,
				unsigned long arg, unsigned int flags)
				unsigned long arg)
{
	long ret;
	struct inode *inode = file_inode(file);
	unsigned int oldflags;

	if (!inode_owner_or_capable(inode))
		return -EACCES;
@@ -591,10 +529,13 @@ static long ovl_ioctl_set_flags(struct file *file, unsigned int cmd,

	inode_lock(inode);

	/* Check the capability before cred override */
	oldflags = ovl_iflags_to_fsflags(READ_ONCE(inode->i_flags));
	ret = vfs_ioc_setflags_prepare(inode, oldflags, flags);
	if (ret)
	/*
	 * Prevent copy up if immutable and has no CAP_LINUX_IMMUTABLE
	 * capability.
	 */
	ret = -EPERM;
	if (!ovl_has_upperdata(inode) && IS_IMMUTABLE(inode) &&
	    !capable(CAP_LINUX_IMMUTABLE))
		goto unlock;

	ret = ovl_maybe_copy_up(file_dentry(file), O_WRONLY);
@@ -613,46 +554,6 @@ unlock:

}

static long ovl_ioctl_set_fsflags(struct file *file, unsigned int cmd,
				  unsigned long arg)
{
	unsigned int flags;

	if (get_user(flags, (int __user *) arg))
		return -EFAULT;

	return ovl_ioctl_set_flags(file, cmd, arg, flags);
}

static unsigned int ovl_fsxflags_to_fsflags(unsigned int xflags)
{
	unsigned int flags = 0;

	if (xflags & FS_XFLAG_SYNC)
		flags |= FS_SYNC_FL;
	if (xflags & FS_XFLAG_APPEND)
		flags |= FS_APPEND_FL;
	if (xflags & FS_XFLAG_IMMUTABLE)
		flags |= FS_IMMUTABLE_FL;
	if (xflags & FS_XFLAG_NOATIME)
		flags |= FS_NOATIME_FL;

	return flags;
}

static long ovl_ioctl_set_fsxflags(struct file *file, unsigned int cmd,
				   unsigned long arg)
{
	struct fsxattr fa;

	memset(&fa, 0, sizeof(fa));
	if (copy_from_user(&fa, (void __user *) arg, sizeof(fa)))
		return -EFAULT;

	return ovl_ioctl_set_flags(file, cmd, arg,
				   ovl_fsxflags_to_fsflags(fa.fsx_xflags));
}

long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
	long ret;
@@ -663,12 +564,9 @@ long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
		ret = ovl_real_ioctl(file, cmd, arg);
		break;

	case FS_IOC_SETFLAGS:
		ret = ovl_ioctl_set_fsflags(file, cmd, arg);
		break;

	case FS_IOC_FSSETXATTR:
		ret = ovl_ioctl_set_fsxflags(file, cmd, arg);
	case FS_IOC_SETFLAGS:
		ret = ovl_ioctl_set_flags(file, cmd, arg);
		break;

	default:
@@ -801,8 +699,8 @@ const struct file_operations ovl_file_operations = {
#ifdef CONFIG_COMPAT
	.compat_ioctl	= ovl_compat_ioctl,
#endif
	.splice_read    = ovl_splice_read,
	.splice_write   = ovl_splice_write,
	.splice_read    = generic_file_splice_read,
	.splice_write   = iter_file_splice_write,

	.copy_file_range	= ovl_copy_file_range,
	.remap_file_range	= ovl_remap_file_range,
+10 −4
@@ -329,8 +329,14 @@ static const char *ovl_get_link(struct dentry *dentry,

bool ovl_is_private_xattr(struct super_block *sb, const char *name)
{
	return strncmp(name, OVL_XATTR_PREFIX,
		       sizeof(OVL_XATTR_PREFIX) - 1) == 0;
	struct ovl_fs *ofs = sb->s_fs_info;

	if (ofs->config.userxattr)
		return strncmp(name, OVL_XATTR_USER_PREFIX,
			       sizeof(OVL_XATTR_USER_PREFIX) - 1) == 0;
	else
		return strncmp(name, OVL_XATTR_TRUSTED_PREFIX,
			       sizeof(OVL_XATTR_TRUSTED_PREFIX) - 1) == 0;
}

int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
@@ -476,7 +482,7 @@ static int ovl_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
		      u64 start, u64 len)
{
	int err;
	struct inode *realinode = ovl_inode_real(inode);
	struct inode *realinode = ovl_inode_realdata(inode);
	const struct cred *old_cred;

	if (!realinode->i_op->fiemap)
@@ -690,7 +696,7 @@ static void ovl_fill_inode(struct inode *inode, umode_t mode, dev_t rdev)
 * For the first, copy up case, the union nlink does not change, whether the
 * operation succeeds or fails, but the upper inode nlink may change.
 * Therefore, before copy up, we store the union nlink value relative to the
 * lower inode nlink in the index inode xattr trusted.overlay.nlink.
 * lower inode nlink in the index inode xattr .overlay.nlink.
 *
 * For the second, upper hardlink case, the union nlink should be incremented
 * or decremented IFF the operation succeeds, aligned with nlink change of the