Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next (f7fb7c1a) · Commits · 戴 / test

Documentation/bpf/bpf_design_QA.rst

+12 −12

Original line number	Diff line number	Diff line
		@@ -36,27 +36,27 @@ consideration important quirks of other architectures) and
		defines calling convention that is compatible with C calling
		convention of the linux kernel on those architectures.

		Q: can multiple return values be supported in the future?
		Q: Can multiple return values be supported in the future?
		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
		A: NO. BPF allows only register R0 to be used as return value.

		Q: can more than 5 function arguments be supported in the future?
		Q: Can more than 5 function arguments be supported in the future?
		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
		A: NO. BPF calling convention only allows registers R1-R5 to be used
		as arguments. BPF is not a standalone instruction set.
		(unlike x64 ISA that allows msft, cdecl and other conventions)

		Q: can BPF programs access instruction pointer or return address?
		Q: Can BPF programs access instruction pointer or return address?
		-----------------------------------------------------------------
		A: NO.

		Q: can BPF programs access stack pointer ?
		Q: Can BPF programs access stack pointer ?
		------------------------------------------
		A: NO.

		Only frame pointer (register R10) is accessible.
		From compiler point of view it's necessary to have stack pointer.
		For example LLVM defines register R11 as stack pointer in its
		For example, LLVM defines register R11 as stack pointer in its
		BPF backend, but it makes sure that generated code never uses it.

		Q: Does C-calling convention diminishes possible use cases?
		@@ -66,8 +66,8 @@ A: YES.
		BPF design forces addition of major functionality in the form
		of kernel helper functions and kernel objects like BPF maps with
		seamless interoperability between them. It lets kernel call into
		BPF programs and programs call kernel helpers with zero overhead.
		As all of them were native C code. That is particularly the case
		BPF programs and programs call kernel helpers with zero overhead,
		as all of them were native C code. That is particularly the case
		for JITed BPF programs that are indistinguishable from
		native kernel C code.

		@@ -75,9 +75,9 @@ Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
		------------------------------------------------------------------------
		A: Soft yes.

		At least for now until BPF core has support for
		At least for now, until BPF core has support for
		bpf-to-bpf calls, indirect calls, loops, global variables,
		jump tables, read only sections and all other normal constructs
		jump tables, read-only sections, and all other normal constructs
		that C code can produce.

		Q: Can loops be supported in a safe way?
		@@ -109,16 +109,16 @@ For example why BPF_JNE and other compare and jumps are not cpu-like?
		A: This was necessary to avoid introducing flags into ISA which are
		impossible to make generic and efficient across CPU architectures.

		Q: why BPF_DIV instruction doesn't map to x64 div?
		Q: Why BPF_DIV instruction doesn't map to x64 div?
		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
		A: Because if we picked one-to-one relationship to x64 it would have made
		it more complicated to support on arm64 and other archs. Also it
		needs div-by-zero runtime check.

		Q: why there is no BPF_SDIV for signed divide operation?
		Q: Why there is no BPF_SDIV for signed divide operation?
		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
		A: Because it would be rarely used. llvm errors in such case and
		prints a suggestion to use unsigned divide instead
		prints a suggestion to use unsigned divide instead.

		Q: Why BPF has implicit prologue and epilogue?
		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
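Note on the BPF_DIV hunk above: the answer says the instruction "needs div-by-zero runtime check". The following is a minimal userspace sketch of what such a check amounts to for the unsigned 64-bit case. It assumes that a zero divisor produces 0 in the destination rather than a trap, which is how the kernel's BPF instruction set documentation defines it. The function name `bpf_div64_sketch` is hypothetical and is not kernel code.

```c
#include <stdint.h>

/*
 * Sketch of an unsigned 64-bit BPF_DIV with the runtime zero check.
 * Assumption: a zero divisor yields 0 instead of faulting, unlike the
 * x64 div instruction, which raises a divide error.
 */
static uint64_t bpf_div64_sketch(uint64_t dst, uint64_t src)
{
	if (src == 0)
		return 0;	/* guarded case: no hardware exception */
	return dst / src;	/* plain unsigned division otherwise */
}
```

Because the division is unsigned, a negative value reinterpreted as `uint64_t` divides as a large positive number, which is relevant to the BPF_SDIV question in the same hunk.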

Documentation/bpf/btf.rst

+147 −169

File changed.

Preview size limit exceeded, changes collapsed.

Documentation/networking/af_xdp.rst

+35 −1

		@@ -295,6 +295,41 @@ using::
		For XDP_SKB mode, use the switch "-S" instead of "-N" and all options
		can be displayed with "-h", as usual.

		FAQ
		=======

		Q: I am not seeing any traffic on the socket. What am I doing wrong?

		A: When a netdev of a physical NIC is initialized, Linux usually
		allocates one Rx and Tx queue pair per core. So on an 8-core system,
		queue ids 0 to 7 will be allocated, one per core. In the AF_XDP
		bind call or the xsk_socket__create libbpf function call, you
		specify a specific queue id to bind to and it is only the traffic
		towards that queue you are going to get on your socket. So in the
		example above, if you bind to queue 0, you are NOT going to get any
		traffic that is distributed to queues 1 through 7. If you are
		lucky, you will see the traffic, but usually it will end up on one
		of the queues you have not bound to.

		There are a number of ways to solve the problem of getting the
		traffic you want to the queue id you bound to. If you want to see
		all the traffic, you can force the netdev to only have 1 queue, queue
		id 0, and then bind to queue 0. You can use ethtool to do this::

		sudo ethtool -L <interface> combined 1

		If you want to only see part of the traffic, you can program the
		NIC through ethtool to filter out your traffic to a single queue id
		that you can bind your XDP socket to. Here is one example in which
		UDP traffic to and from port 4242 are sent to queue 2::

		sudo ethtool -N <interface> rx-flow-hash udp4 fn
		sudo ethtool -N <interface> flow-type udp4 src-port 4242 dst-port \
		4242 action 2

		A number of other ways are possible, all up to the capabilities of
		the NIC you have.

		Credits
		=======

		@@ -309,4 +344,3 @@ Credits
		- Michael S. Tsirkin
		- Qi Z Zhang
		- Willem de Bruijn
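Note on the FAQ added in the af_xdp.rst hunk above: the point that a socket only sees traffic steered to the queue id it is bound to can be illustrated with a toy userspace model. Everything in this sketch is an assumption made for illustration: `toy_flow`, `toy_rx_queue` and `toy_flows_seen` are hypothetical names, and the hash is an arbitrary mixing function, not the NIC's actual receive hash.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow description: just a UDP source/destination port pair. */
struct toy_flow {
	uint16_t src_port;
	uint16_t dst_port;
};

/*
 * Toy stand-in for the NIC's queue selection. Real hardware uses its own
 * hash and steering rules; this only spreads flows across nr_queues.
 * nr_queues must be non-zero.
 */
static unsigned int toy_rx_queue(const struct toy_flow *f,
				 unsigned int nr_queues)
{
	uint32_t h = ((uint32_t)f->src_port << 16) | f->dst_port;

	h ^= h >> 16;
	h *= 0x45d9f3bU;
	h ^= h >> 16;
	return h % nr_queues;
}

/* Count the flows that land on the one queue the socket is bound to. */
static size_t toy_flows_seen(const struct toy_flow *flows, size_t nr_flows,
			     unsigned int nr_queues, unsigned int bound_queue)
{
	size_t seen = 0;

	for (size_t i = 0; i < nr_flows; i++)
		if (toy_rx_queue(&flows[i], nr_queues) == bound_queue)
			seen++;
	return seen;
}
```

With `nr_queues` set to 1, every flow maps to queue 0, which mirrors the `ethtool -L <interface> combined 1` suggestion in the FAQ. With 8 queues, a socket bound to queue 0 sees only the flows that happen to hash there.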

Documentation/networking/filter.txt

+1 −1

		@@ -829,7 +829,7 @@ tracing filters may do to maintain counters of events, for example. Register R9
		is not used by socket filters either, but more complex filters may be running
		out of registers and would have to resort to spill/fill to stack.

		Internal BPF can used as generic assembler for last step performance
		Internal BPF can be used as a generic assembler for last step performance
		optimizations, socket filters and seccomp are using it as assembler. Tracing
		filters may use it as assembler to generate code from kernel. In kernel usage
		may not be bounded by security considerations, since generated internal BPF code

include/linux/bpf.h

+9 −0

		@@ -16,6 +16,7 @@
		#include <linux/rbtree_latch.h>
		#include <linux/numa.h>
		#include <linux/wait.h>
		#include <linux/u64_stats_sync.h>

		struct bpf_verifier_env;
		struct perf_event;
		@@ -340,6 +341,12 @@ enum bpf_cgroup_storage_type {

		#define MAX_BPF_CGROUP_STORAGE_TYPE __BPF_CGROUP_STORAGE_MAX

		struct bpf_prog_stats {
		u64 cnt;
		u64 nsecs;
		struct u64_stats_sync syncp;
		};

		struct bpf_prog_aux {
		atomic_t refcnt;
		u32 used_map_cnt;
		@@ -389,6 +396,7 @@ struct bpf_prog_aux {
		* main prog always has linfo_idx == 0
		*/
		u32 linfo_idx;
		struct bpf_prog_stats __percpu *stats;
		union {
		struct work_struct work;
		struct rcu_head rcu;
		@@ -559,6 +567,7 @@ void bpf_map_area_free(void *base);
		void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);

		extern int sysctl_unprivileged_bpf_disabled;
		extern int sysctl_bpf_stats_enabled;

		int bpf_map_new_fd(struct bpf_map *map, int flags);
		int bpf_prog_new_fd(struct bpf_prog *prog);
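Note on the bpf.h hunks above: `struct bpf_prog_stats` is attached to `struct bpf_prog_aux` as a `__percpu` pointer, so each CPU has its own `cnt` and `nsecs` slot. The following userspace sketch shows the general per-CPU counter pattern this implies: update the local slot, then sum all slots when reading. It is an analogue under stated assumptions, not the kernel implementation. `NR_CPUS_SIM`, `prog_stats_sim`, `stats_record` and `stats_sum` are hypothetical names, the meaning given to the two fields is inferred from their names, and the `u64_stats_sync` member is left out because it has no userspace equivalent here.

```c
#include <stdint.h>

#define NR_CPUS_SIM 4	/* hypothetical CPU count for this sketch */

/* Userspace stand-in for the struct added in the diff (no syncp member). */
struct prog_stats_sim {
	uint64_t cnt;	/* assumed: number of program runs on this CPU */
	uint64_t nsecs;	/* assumed: accumulated run time on this CPU */
};

/* Record one run of the given duration in the slot of one simulated CPU. */
static void stats_record(struct prog_stats_sim *percpu, int cpu,
			 uint64_t nsecs)
{
	percpu[cpu].cnt++;
	percpu[cpu].nsecs += nsecs;
}

/* Fold all per-CPU slots into one total, as a reader of the stats would. */
static struct prog_stats_sim stats_sum(const struct prog_stats_sim *percpu)
{
	struct prog_stats_sim total = { 0, 0 };

	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++) {
		total.cnt += percpu[cpu].cnt;
		total.nsecs += percpu[cpu].nsecs;
	}
	return total;
}
```

Keeping one slot per CPU means writers never contend on a shared counter. In the kernel, the `syncp` member is what lets readers get a consistent 64-bit snapshot of a slot, including on 32-bit systems.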
