Merge branch 'bpf-sockopt-hooks' (2ec1899e) · Commits · 戴 / test

Documentation/bpf/index.rst

+1 −0

		@@ -42,6 +42,7 @@ Program types
		.. toctree::
		   :maxdepth: 1

		   prog_cgroup_sockopt
		   prog_cgroup_sysctl
		   prog_flow_dissector

Documentation/bpf/prog_cgroup_sockopt.rst

0 → 100644

+93 −0

		.. SPDX-License-Identifier: GPL-2.0

		============================
		BPF_PROG_TYPE_CGROUP_SOCKOPT
		============================

		The ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` program type can be attached to two
		cgroup hooks:

		* ``BPF_CGROUP_GETSOCKOPT`` - called every time a process executes the
		  ``getsockopt`` system call.
		* ``BPF_CGROUP_SETSOCKOPT`` - called every time a process executes the
		  ``setsockopt`` system call.

		The context (``struct bpf_sockopt``) exposes the associated socket (``sk``)
		and all input arguments: ``level``, ``optname``, ``optval`` and ``optlen``.

		BPF_CGROUP_SETSOCKOPT
		=====================

		``BPF_CGROUP_SETSOCKOPT`` is triggered before the kernel handles the
		socket option, and it has a writable context: it can modify the supplied
		arguments before they are passed down to the kernel. This hook has access
		to the cgroup and socket local storage.

		If a BPF program sets ``optlen`` to -1, control is returned to userspace
		after all other BPF programs in the cgroup chain finish (i.e. the kernel's
		``setsockopt`` handling is not executed).

		Note that ``optlen`` cannot be increased beyond the user-supplied
		value. It can only be decreased or set to -1. Any other value
		triggers ``EFAULT``.
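The ``optlen`` rules above can be restated as a small decision function. This is a userspace sketch for illustration only, not the kernel's implementation; ``enum setsockopt_outcome`` and ``check_setsockopt_optlen`` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical outcome of a BPF_CGROUP_SETSOCKOPT run (illustration only). */
enum setsockopt_outcome {
	SETSOCKOPT_CALL_KERNEL,	/* pass the (possibly modified) args to the kernel */
	SETSOCKOPT_SKIP_KERNEL,	/* optlen == -1: return to userspace, skip kernel handling */
	SETSOCKOPT_FAULT,	/* invalid optlen: the syscall fails with EFAULT */
};

/*
 * Hypothetical model of the optlen rule.
 * user_optlen: the length userspace passed to setsockopt().
 * bpf_optlen:  the value left in the context after the program chain ran.
 */
enum setsockopt_outcome check_setsockopt_optlen(int user_optlen, int bpf_optlen)
{
	assert(user_optlen >= 0);

	if (bpf_optlen == -1)
		return SETSOCKOPT_SKIP_KERNEL;
	/* optlen may only shrink (down to 0); growing it or other negatives fault. */
	if (bpf_optlen < -1 || bpf_optlen > user_optlen)
		return SETSOCKOPT_FAULT;
	return SETSOCKOPT_CALL_KERNEL;
}
```

For example, with a user-supplied ``optlen`` of 8, a program may shrink it to 4 or 0, while 9 or -2 would lead to ``EFAULT``.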

		Return Type
		-----------

		* ``0`` - reject the syscall; ``EPERM`` is returned to userspace.
		* ``1`` - success; continue with the next BPF program in the cgroup chain.
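The return-value semantics can be sketched as a userspace model. This sketch assumes that the verdicts of all programs in the chain are combined with a logical AND, so a single ``0`` anywhere rejects the syscall; ``sockopt_chain_verdict`` is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model: fold the return values of every program in the
 * effective cgroup chain into a syscall result. Assumes the verdicts are
 * ANDed, so only an all-1 chain lets the syscall proceed.
 * Returns 0 to continue, -EPERM to reject.
 */
int sockopt_chain_verdict(const int *prog_rets, size_t n_progs)
{
	int allow = 1;

	assert(prog_rets != NULL || n_progs == 0);
	for (size_t i = 0; i < n_progs; i++)
		allow &= (prog_rets[i] == 1);	/* only 1 means "allow" */
	return allow ? 0 : -EPERM;
}
```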

		BPF_CGROUP_GETSOCKOPT
		=====================

		``BPF_CGROUP_GETSOCKOPT`` is triggered after the kernel handles the
		socket option. The BPF hook can observe ``optval``, ``optlen`` and
		``retval`` to see what the kernel returned. It can override ``optval``,
		adjust ``optlen`` and reset ``retval`` to 0. If ``optlen`` has been
		increased above the initial ``getsockopt`` value (i.e. the userspace
		buffer is too small), ``EFAULT`` is returned.

		This hook has access to the cgroup and socket local storage.

		Note that the only acceptable values for ``retval`` are 0 and the
		original value that the kernel returned. Any other value triggers
		``EFAULT``.
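The two ``EFAULT`` conditions for ``BPF_CGROUP_GETSOCKOPT`` can be modelled as a single validation step. This is a hedged userspace sketch, not the kernel's code; ``check_getsockopt_ctx`` and its parameter names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of the post-chain checks for BPF_CGROUP_GETSOCKOPT.
 * initial_optlen: size of the userspace buffer passed to getsockopt().
 * kernel_retval:  what the kernel's own getsockopt handling returned.
 * bpf_optlen, bpf_retval: values left in the context by the programs.
 * Returns 0 if the context is acceptable, -EFAULT otherwise.
 */
int check_getsockopt_ctx(int initial_optlen, int kernel_retval,
			 int bpf_optlen, int bpf_retval)
{
	/* The userspace buffer cannot hold more than initial_optlen bytes. */
	if (bpf_optlen > initial_optlen)
		return -EFAULT;
	/* retval may only be left as the kernel's value or reset to 0. */
	if (bpf_retval != 0 && bpf_retval != kernel_retval)
		return -EFAULT;
	return 0;
}
```

For instance, if the kernel returned ``-ENOPROTOOPT``, a program may keep that value or clear it to 0, but replacing it with ``-EINVAL`` would be rejected.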

		Return Type
		-----------

		* ``0`` - reject the syscall; ``EPERM`` is returned to userspace.
		* ``1`` - success: copy ``optval`` and ``optlen`` to userspace and return
		  ``retval`` from the syscall (note that this can be overwritten by
		  the BPF program from the parent cgroup).
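How a ``BPF_CGROUP_GETSOCKOPT`` verdict turns into the syscall result can be sketched as follows. This userspace model uses plain buffers in place of ``copy_to_user``; ``finish_getsockopt`` and its parameters are hypothetical, and it assumes ``ctx_optlen`` has already passed the ``EFAULT`` checks described earlier.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical model of the getsockopt return path.
 * ctx_optval/ctx_optlen/ctx_retval: the (possibly modified) context values.
 * user_optval/user_optlen: stand-ins for the userspace buffer and length.
 */
int finish_getsockopt(int verdict, const char *ctx_optval, int ctx_optlen,
		      int ctx_retval, char *user_optval, int *user_optlen)
{
	if (verdict == 0)
		return -EPERM;	/* a program rejected the syscall; nothing is copied */

	/* Success: copy optval and optlen back, then return retval. */
	assert(ctx_optlen >= 0);
	memcpy(user_optval, ctx_optval, (size_t)ctx_optlen);
	*user_optlen = ctx_optlen;
	return ctx_retval;
}
```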

		Cgroup Inheritance
		==================

		Suppose there is the following cgroup hierarchy, where a
		``BPF_CGROUP_GETSOCKOPT`` program is attached at each level with the
		``BPF_F_ALLOW_MULTI`` flag::

		  A (root, parent)
		   \
		    B (child)

		When an application in cgroup B calls the ``getsockopt`` syscall,
		the programs are executed from the bottom up: B, then A. The first
		program (B) sees the result of the kernel's ``getsockopt``. It can
		optionally adjust ``optval`` and ``optlen`` and reset ``retval`` to 0.
		After that, control passes to the second program (A), which sees the
		same context as B, including any modifications.

		The same applies to ``BPF_CGROUP_SETSOCKOPT``: if programs are attached
		to A and B, the trigger order is B, then A. If B changes any of the
		input arguments (``level``, ``optname``, ``optval``, ``optlen``), the
		next program in the chain (A) sees those changes rather than the
		original ``setsockopt`` arguments. The potentially modified values
		are then passed down to the kernel.
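The bottom-up ordering with a shared, mutable context can be illustrated with a userspace simulation. Everything here is hypothetical: ``struct sockopt_ctx`` is a simplified stand-in for ``struct bpf_sockopt``, ``run_sockopt_chain`` is not a kernel function, the option values are arbitrary, and the sketch assumes verdicts are ANDed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the fields of struct bpf_sockopt. */
struct sockopt_ctx {
	int level;
	int optname;
	int optlen;
	int retval;
};

typedef int (*sockopt_prog_fn)(struct sockopt_ctx *ctx);

/*
 * Run a program chain ordered from the caller's own cgroup up to the root
 * (e.g. {B, A}). All programs share one ctx, so later programs see earlier
 * modifications. Returns 1 if every program returned 1, 0 otherwise.
 */
int run_sockopt_chain(const sockopt_prog_fn *progs, size_t n,
		      struct sockopt_ctx *ctx)
{
	int verdict = 1;

	assert(ctx != NULL);
	for (size_t i = 0; i < n; i++)
		verdict &= (progs[i](ctx) == 1);
	return verdict;
}

/* Example programs (hypothetical): B rewrites optname, A records what it saw. */
int seen_by_a;

int prog_b(struct sockopt_ctx *ctx)
{
	if (ctx->optname == 1)
		ctx->optname = 2;
	return 1;
}

int prog_a(struct sockopt_ctx *ctx)
{
	seen_by_a = ctx->optname;
	return 1;
}
```

With the chain ``{prog_b, prog_a}``, A observes ``optname == 2`` (B's rewrite), and that rewritten value is what would be passed on afterwards.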

		Example
		=======

		See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
		of a BPF program that handles socket options.

include/linux/bpf-cgroup.h

+45 −0

		@@ -124,6 +124,14 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
						  loff_t ppos, void *new_buf,
						  enum bpf_attach_type type);

		int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level,
						       int *optname, char __user *optval,
						       int *optlen, char **kernel_optval);
		int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
						       int optname, char __user *optval,
						       int __user *optlen, int max_optlen,
						       int retval);

		static inline enum bpf_cgroup_storage_type cgroup_storage_type(
			struct bpf_map *map)
		{
		@@ -286,6 +294,38 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
			__ret; \
		})

		#define BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock, level, optname, optval, optlen, \
						       kernel_optval) \
		({ \
			int __ret = 0; \
			if (cgroup_bpf_enabled) \
				__ret = __cgroup_bpf_run_filter_setsockopt(sock, level, \
									   optname, optval, \
									   optlen, \
									   kernel_optval); \
			__ret; \
		})

		#define BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen) \
		({ \
			int __ret = 0; \
			if (cgroup_bpf_enabled) \
				get_user(__ret, optlen); \
			__ret; \
		})

		#define BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock, level, optname, optval, optlen, \
						       max_optlen, retval) \
		({ \
			int __ret = retval; \
			if (cgroup_bpf_enabled) \
				__ret = __cgroup_bpf_run_filter_getsockopt(sock, level, \
									   optname, optval, \
									   optlen, max_optlen, \
									   retval); \
			__ret; \
		})

		int cgroup_bpf_prog_attach(const union bpf_attr *attr,
					   enum bpf_prog_type ptype, struct bpf_prog *prog);
		int cgroup_bpf_prog_detach(const union bpf_attr *attr,
		@@ -357,6 +397,11 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
		#define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
		#define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; })
		#define BPF_CGROUP_RUN_PROG_SYSCTL(head,table,write,buf,count,pos,nbuf) ({ 0; })
		#define BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen) ({ 0; })
		#define BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock, level, optname, optval, \
						       optlen, max_optlen, retval) ({ retval; })
		#define BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock, level, optname, optval, optlen, \
						       kernel_optval) ({ 0; })

		#define for_each_cgroup_storage_type(stype) for (; false; )

include/linux/bpf.h

+2 −0

		@@ -518,6 +518,7 @@ struct bpf_prog_array {
		struct bpf_prog_array *bpf_prog_array_alloc(u32 prog_cnt, gfp_t flags);
		void bpf_prog_array_free(struct bpf_prog_array *progs);
		int bpf_prog_array_length(struct bpf_prog_array *progs);
		bool bpf_prog_array_is_empty(struct bpf_prog_array *array);
		int bpf_prog_array_copy_to_user(struct bpf_prog_array *progs,
						__u32 __user *prog_ids, u32 cnt);

		@@ -1051,6 +1052,7 @@ extern const struct bpf_func_proto bpf_spin_unlock_proto;
		extern const struct bpf_func_proto bpf_get_local_storage_proto;
		extern const struct bpf_func_proto bpf_strtol_proto;
		extern const struct bpf_func_proto bpf_strtoul_proto;
		extern const struct bpf_func_proto bpf_tcp_sock_proto;

		/* Shared helpers among cBPF and eBPF. */
		void bpf_user_rnd_init_once(void);

include/linux/bpf_types.h

+1 −0

		@@ -30,6 +30,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, raw_tracepoint_writable)
		#ifdef CONFIG_CGROUP_BPF
		BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
		BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SYSCTL, cg_sysctl)
		BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCKOPT, cg_sockopt)
		#endif
		#ifdef CONFIG_BPF_LIRC_MODE2
		BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2)
