Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next (150f29f5) · Commits · 戴 / test

Documentation/bpf/bpf_devel_QA.rst

+12 −7

		@@ -149,7 +149,7 @@ In case the patch or patch series has to be reworked and sent out
		again in a second or later revision, it is also required to add a
		version number (``v2``, ``v3``, ...) into the subject prefix::

		git format-patch --subject-prefix='PATCH net-next v2' start..finish
		git format-patch --subject-prefix='PATCH bpf-next v2' start..finish

		When changes have been requested to the patch series, always send the
		whole patch series again with the feedback incorporated (never send
		@@ -479,12 +479,13 @@ LLVM's static compiler lists the supported targets through

		$ llc --version
		LLVM (http://llvm.org/):
		LLVM version 6.0.0svn
		LLVM version 10.0.0
		Optimized build.
		Default target: x86_64-unknown-linux-gnu
		Host CPU: skylake

		Registered Targets:
		aarch64 - AArch64 (little endian)
		bpf - BPF (host endian)
		bpfeb - BPF (big endian)
		bpfel - BPF (little endian)
		@@ -517,6 +518,10 @@ from the git repositories::
		The built binaries can then be found in the build/bin/ directory, which
		you can add to your PATH variable.

		Set ``-DLLVM_TARGETS_TO_BUILD`` to the targets you wish to build. A full
		list of targets can be found in the llvm-project/llvm/lib/Target
		directory.

		Q: Reporting LLVM BPF issues
		----------------------------
		Q: Should I notify BPF kernel maintainers about issues in LLVM's BPF code

Documentation/bpf/btf.rst

+25 −0

		@@ -724,6 +724,31 @@ want to define unused entry in BTF_ID_LIST, like::
		BTF_ID_UNUSED
		BTF_ID(struct, task_struct)

		The ``BTF_SET_START/END`` macro pair defines a sorted list of BTF ID
		values and their count, with the following syntax::

		BTF_SET_START(set)
		BTF_ID(type1, name1)
		BTF_ID(type2, name2)
		BTF_SET_END(set)

		resulting in the following layout in the .BTF_ids section::

		__BTF_ID__set__set:
		.zero 4
		__BTF_ID__type1__name1__3:
		.zero 4
		__BTF_ID__type2__name2__4:
		.zero 4

		The ``struct btf_id_set set;`` variable is defined to access the list.

		The ``typeX`` name can be one of the following::

		struct, union, typedef, func

		and is used as a filter when resolving the BTF ID value.

		All the BTF ID lists and sets are compiled into the .BTF_ids section and
		resolved during the linking phase of the kernel build by the
		``resolve_btfids`` tool.
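Because a set is kept sorted, testing whether an ID belongs to it can be done with a binary search. The user-space sketch below models that idea under stated assumptions: ``struct id_set``, ``id_set_new()`` and ``id_set_contains()`` are hypothetical names chosen for illustration (not the kernel's declarations), and the sorting step in ``id_set_new()`` only mimics the sorted layout described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a resolved set: a count followed by the IDs,
 * sorted in ascending order. */
struct id_set {
	uint32_t cnt;
	uint32_t ids[];
};

/* Three-way compare without subtraction, so large IDs cannot overflow. */
static int id_cmp(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/* Build a set from unsorted IDs; the caller releases it with free(). */
struct id_set *id_set_new(const uint32_t *ids, uint32_t cnt)
{
	struct id_set *set = malloc(sizeof(*set) + (size_t)cnt * sizeof(uint32_t));

	if (!set)
		return NULL;
	set->cnt = cnt;
	if (cnt)
		memcpy(set->ids, ids, (size_t)cnt * sizeof(uint32_t));
	qsort(set->ids, cnt, sizeof(uint32_t), id_cmp);
	return set;
}

/* Binary search is valid only because the IDs are sorted. */
bool id_set_contains(const struct id_set *set, uint32_t id)
{
	assert(set != NULL);
	return bsearch(&id, set->ids, set->cnt, sizeof(uint32_t), id_cmp) != NULL;
}
```

For example, building a set from ``{42, 5, 17}`` stores the IDs as ``5, 17, 42``. Looking up ``17`` then succeeds and looking up ``6`` fails.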

Documentation/bpf/index.rst

+1 −0

		@@ -52,6 +52,7 @@ Program types
		prog_cgroup_sysctl
		prog_flow_dissector
		bpf_lsm
		prog_sk_lookup


		Map types

Documentation/bpf/prog_sk_lookup.rst

0 → 100644

+98 −0

		.. SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)

		=====================
		BPF sk_lookup program
		=====================

		The BPF sk_lookup program type (``BPF_PROG_TYPE_SK_LOOKUP``) introduces
		programmability into the socket lookup performed by the transport layer
		when a packet is to be delivered locally.

		When invoked, a BPF sk_lookup program can select a socket that will
		receive the incoming packet by calling the ``bpf_sk_assign()`` BPF helper
		function.

		Hooks for a common attach point (``BPF_SK_LOOKUP``) exist for both TCP and UDP.

		Motivation
		==========

		The BPF sk_lookup program type was introduced to address setups where
		binding sockets to an address with the ``bind()`` socket call is
		impractical, such as:

		1. receiving connections on a range of IP addresses, e.g. 192.0.2.0/24, when
		binding to the wildcard address ``INADDR_ANY`` is not possible due to a port
		conflict,
		2. receiving connections on all or a wide range of ports, e.g. an L7 proxy use
		case.

		Such setups would require creating and ``bind()``'ing one socket for each
		IP address/port pair in the range, leading to resource consumption and
		potential latency spikes during socket lookup.

		Attachment
		==========

		A BPF sk_lookup program can be attached to a network namespace with the
		``bpf(BPF_LINK_CREATE, ...)`` syscall, using the ``BPF_SK_LOOKUP`` attach
		type and a netns FD as the attachment ``target_fd``.

		Multiple programs can be attached to one network namespace. Programs will be
		invoked in the same order as they were attached.

		Hooks
		=====

		The attached BPF sk_lookup programs run whenever the transport layer needs to
		find a listening (TCP) or an unconnected (UDP) socket for an incoming packet.

		Incoming traffic to established (TCP) and connected (UDP) sockets is delivered
		as usual without triggering the BPF sk_lookup hook.

		The attached BPF programs must return either the ``SK_PASS`` or the
		``SK_DROP`` verdict code. As with other BPF program types that are network
		filters, ``SK_PASS`` signifies that the socket lookup should continue on to
		the regular hashtable-based lookup, while ``SK_DROP`` causes the transport
		layer to drop the packet.

		A BPF sk_lookup program can also select a socket to receive the packet by
		calling the ``bpf_sk_assign()`` BPF helper. Typically, the program looks up
		a socket in a map holding sockets, such as ``SOCKMAP`` or ``SOCKHASH``, and
		passes a ``struct bpf_sock *`` to the ``bpf_sk_assign()`` helper to record
		the selection. Selecting a socket only takes effect if the program
		terminates with the ``SK_PASS`` code.

		When multiple programs are attached, the end result is determined from the
		return codes of all the programs according to the following rules:

		1. If any program returned ``SK_PASS`` and selected a valid socket, the socket
		is used as the result of the socket lookup.
		2. If more than one program returned ``SK_PASS`` and selected a socket, the last
		selection takes effect.
		3. If any program returned ``SK_DROP``, and no program returned ``SK_PASS`` and
		selected a socket, socket lookup fails.
		4. If all programs returned ``SK_PASS`` and none of them selected a socket,
		socket lookup continues on.
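The four rules can be modelled as a small fold over the programs' results, taken in attachment order. This is a hypothetical user-space sketch, not kernel code: the ``verdict`` enum stands in for ``SK_PASS``/``SK_DROP``, and sockets are represented by non-zero integers, with 0 meaning that no socket was selected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SK_DROP / SK_PASS verdicts. */
enum verdict { VERDICT_DROP, VERDICT_PASS };

/* Result of one program run; selected_sk == 0 means "no selection". */
struct prog_result {
	enum verdict verdict;
	int selected_sk;
};

enum lookup_outcome { LOOKUP_USE_SOCKET, LOOKUP_FAIL, LOOKUP_CONTINUE };

/* Combine results given in attachment order. On LOOKUP_USE_SOCKET the
 * chosen socket is stored in *sk. */
enum lookup_outcome combine_results(const struct prog_result *res, size_t n,
				    int *sk)
{
	int selected = 0;
	bool saw_drop = false;

	assert(sk != NULL && (res != NULL || n == 0));
	for (size_t i = 0; i < n; i++) {
		if (res[i].verdict == VERDICT_PASS && res[i].selected_sk != 0)
			selected = res[i].selected_sk; /* rules 1-2: last selection wins */
		else if (res[i].verdict == VERDICT_DROP)
			saw_drop = true;               /* a drop's selection is ignored */
	}
	if (selected != 0) {
		*sk = selected;
		return LOOKUP_USE_SOCKET;
	}
	return saw_drop ? LOOKUP_FAIL : LOOKUP_CONTINUE; /* rules 3-4 */
}
```

A selection made by a program that returns the drop verdict is ignored, which matches the statement that selecting a socket only takes effect with ``SK_PASS``.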

		API
		===

		In its context, an instance of ``struct bpf_sk_lookup``, a BPF sk_lookup
		program receives information about the packet that triggered the socket
		lookup, namely:

		* IP version (``AF_INET`` or ``AF_INET6``),
		* L4 protocol identifier (``IPPROTO_TCP`` or ``IPPROTO_UDP``),
		* source and destination IP addresses,
		* source and destination L4 ports,
		* the socket that has been selected with ``bpf_sk_assign()``.

		Refer to the ``struct bpf_sk_lookup`` declaration in the ``linux/bpf.h``
		user API header, and to the `bpf-helpers(7)
		<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man-page
		section on ``bpf_sk_assign()``, for details.
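As an illustration of how these fields might drive a decision, the sketch below checks whether a packet matches the first motivating scenario: TCP traffic to any port within 192.0.2.0/24. It is plain user-space C, not a BPF program. ``struct lookup_info`` and its ``dst_ip4``/``dst_port`` fields are hypothetical, and addresses are kept in host byte order for simplicity, whereas the real ``struct bpf_sk_lookup`` fields have their own documented byte orders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>   /* AF_INET */
#include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP */

/* Hypothetical subset of the lookup information, in host byte order. */
struct lookup_info {
	uint32_t family;    /* AF_INET or AF_INET6 */
	uint32_t protocol;  /* IPPROTO_TCP or IPPROTO_UDP */
	uint32_t dst_ip4;   /* destination IPv4 address */
	uint32_t dst_port;  /* destination L4 port */
};

/* True if addr lies within prefix/len, with len in 0..32. */
static bool in_prefix4(uint32_t addr, uint32_t prefix, unsigned int len)
{
	assert(len <= 32);
	uint32_t mask = len == 0 ? 0 : UINT32_MAX << (32 - len);

	return (addr & mask) == (prefix & mask);
}

/* Match TCP to any port in prefix/len; dst_port is deliberately ignored. */
bool wants_packet(const struct lookup_info *info, uint32_t prefix,
		  unsigned int len)
{
	return info->family == AF_INET && info->protocol == IPPROTO_TCP &&
	       in_prefix4(info->dst_ip4, prefix, len);
}
```

On a match, a real program would look up the target socket in a ``SOCKMAP`` or ``SOCKHASH`` and pass it to ``bpf_sk_assign()`` before returning ``SK_PASS``.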

		Example
		=======

		See ``tools/testing/selftests/bpf/prog_tests/sk_lookup.c`` for the reference
		implementation.

Documentation/networking/af_xdp.rst

+58 −10

		@@ -258,14 +258,21 @@ socket into zero-copy mode or fail.
		XDP_SHARED_UMEM bind flag
		-------------------------

		This flag enables you to bind multiple sockets to the same UMEM, but
		only if they share the same queue id. In this mode, each socket has
		its own RX and TX rings, but the UMEM (tied to the first socket
		created) only has a single FILL ring and a single COMPLETION
		ring. To use this mode, create the first socket and bind it in the normal
		way. Create a second socket and create an RX and a TX ring, or at
		least one of them, but no FILL or COMPLETION rings as the ones from
		the first socket will be used. In the bind call, set the
		This flag enables you to bind multiple sockets to the same UMEM. It
		works on the same queue id, between queue ids and between
		netdevs/devices. In this mode, each socket has its own RX and TX
		rings as usual, but you are going to have one or more FILL and
		COMPLETION ring pairs. You have to create one of these pairs per
		unique netdev and queue id tuple that you bind to.
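The rule of one ring pair per unique tuple amounts to counting distinct (netdev, queue id) pairs among the sockets that share a UMEM. The sketch below does exactly that. ``struct umem_binding`` and ``fill_comp_pairs_needed()`` are hypothetical helpers, not libbpf API, and identifying a netdev by its interface index is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of where one socket sharing the UMEM is bound. */
struct umem_binding {
	int ifindex;            /* netdev, identified by interface index */
	unsigned int queue_id;
};

/* Number of FILL/COMPLETION ring pairs needed: one per unique
 * (netdev, queue id) tuple. */
size_t fill_comp_pairs_needed(const struct umem_binding *b, size_t n)
{
	size_t pairs = 0;

	assert(b != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		bool seen = false;

		/* Was this tuple already bound by an earlier socket? */
		for (size_t j = 0; j < i && !seen; j++)
			seen = b[j].ifindex == b[i].ifindex &&
			       b[j].queue_id == b[i].queue_id;
		if (!seen)
			pairs++;
	}
	return pairs;
}
```

Three sockets on queue 0 of one netdev need a single pair, while sockets bound to (netdev 2, queue 0), (2, 1), (3, 0) and (2, 1) need three. The quadratic scan is chosen for brevity and is adequate when only a few sockets share a UMEM.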

		Let us start with the case where we would like to share a UMEM between
		sockets bound to the same netdev and queue id. The UMEM (tied to the
		first socket created) will only have a single FILL ring and a single
		COMPLETION ring, as there is only one unique netdev,queue_id tuple that
		we have bound to. To use this mode, create the first socket and bind
		it in the normal way. Create a second socket and create an RX and a TX
		ring, or at least one of them, but no FILL or COMPLETION rings, as the
		ones from the first socket will be used. In the bind call, set the
		XDP_SHARED_UMEM option and provide the initial socket's fd in the
		sxdp_shared_umem_fd field. You can attach an arbitrary number of extra
		sockets this way.
		@@ -305,11 +312,41 @@ concurrently. There are no synchronization primitives in the
		libbpf code that protects multiple users at this point in time.

		Libbpf uses this mode if you create more than one socket tied to the
		same umem. However, note that you need to supply the
		same UMEM. However, note that you need to supply the
		XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD libbpf_flag with the
		xsk_socket__create calls and load your own XDP program as there is no
		built-in one in libbpf that will route the traffic for you.

		The second case is when you share a UMEM between sockets that are
		bound to different queue ids and/or netdevs. In this case you have to
		create one FILL ring and one COMPLETION ring for each unique
		netdev,queue_id pair. Let us say you want to create two sockets bound
		to two different queue ids on the same netdev. Create the first socket
		and bind it in the normal way. Create a second socket and create an RX
		and a TX ring, or at least one of them, and then one FILL and
		COMPLETION ring for this socket. Then, in the bind call, set the
		XDP_SHARED_UMEM option and provide the initial socket's fd in the
		sxdp_shared_umem_fd field as you registered the UMEM on that
		socket. These two sockets will now share one and the same UMEM.

		There is no need to supply an XDP program like the one in the previous
		case where sockets were bound to the same queue id and
		device. Instead, use the NIC's packet steering capabilities to steer
		the packets to the right queue. In the previous example, there is only
		one queue shared among sockets, so the NIC cannot do this steering. It
		can only steer between queues.

		In libbpf, you need to use the xsk_socket__create_shared() API as it
		takes a reference to a FILL ring and a COMPLETION ring that will be
		created for you and bound to the shared UMEM. You can use this
		function for all the sockets you create, or you can use it for the
		second and following ones and use xsk_socket__create() for the first
		one. Both methods yield the same result.

		Note that a UMEM can be shared between sockets on the same queue id
		and device, as well as between queues on the same device and between
		devices at the same time.

		XDP_USE_NEED_WAKEUP bind flag
		-----------------------------

		@@ -364,7 +401,7 @@ resources by only setting up one of them. Both the FILL ring and the
		COMPLETION ring are mandatory as you need to have a UMEM tied to your
		socket. But if the XDP_SHARED_UMEM flag is used, any socket after the
		first one does not have a UMEM and should in that case not have any
		FILL or COMPLETION rings created as the ones from the shared umem will
		FILL or COMPLETION rings created as the ones from the shared UMEM will
		be used. Note that the rings are single-producer single-consumer, so
		do not try to access them from multiple processes at the same
		time. See the XDP_SHARED_UMEM section.
		@@ -567,6 +604,17 @@ A: The short answer is no, that is not supported at the moment. The
		switch, or other distribution mechanism, in your NIC to direct
		traffic to the correct queue id and socket.

		Q: My packets are sometimes corrupted. What is wrong?

		A: Care has to be taken not to feed the same buffer in the UMEM into
		more than one ring at the same time. If you, for example, feed the
		same buffer into the FILL ring and the TX ring at the same time, the
		NIC might receive data into the buffer at the same time it is
		sending it. This will cause some packets to become corrupted. The same
		goes for feeding the same buffer into the FILL rings belonging to
		different queue ids or netdevs bound with the XDP_SHARED_UMEM flag.
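One way to catch this class of bug during development is to track, per UMEM frame, which ring currently owns it, and to refuse to enqueue a frame the application does not own. The following is a hypothetical debug aid, not part of libbpf. It assumes fixed-size frames addressed by index (for example, a descriptor address divided by the frame size), and ``OWNER_FILL`` covers any FILL ring, including those of other queue ids or netdevs sharing the UMEM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical debug aid: who currently owns each UMEM frame. */
enum frame_owner { OWNER_APP, OWNER_FILL, OWNER_TX };

struct frame_tracker {
	enum frame_owner *owner; /* one entry per frame */
	size_t nframes;
};

/* Hand frame idx to a ring. Refuses (returns false) unless the
 * application owns the frame, e.g. when it already sits on a FILL
 * ring and is about to be put on the TX ring as well. */
bool frame_give(struct frame_tracker *t, size_t idx, enum frame_owner ring)
{
	assert(ring != OWNER_APP);
	if (idx >= t->nframes || t->owner[idx] != OWNER_APP)
		return false;
	t->owner[idx] = ring;
	return true;
}

/* Call when the frame comes back to the application: via the RX ring
 * for FILL buffers, via the COMPLETION ring for TX buffers. */
void frame_return(struct frame_tracker *t, size_t idx)
{
	assert(idx < t->nframes && t->owner[idx] != OWNER_APP);
	t->owner[idx] = OWNER_APP;
}
```

Frames handed to a FILL ring return to the application through the RX ring, and frames handed to the TX ring return through the COMPLETION ring, so ``frame_return()`` belongs at those two points.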

		Credits
		=======
