Commit 29d9f30d authored by Linus Torvalds
Pull networking updates from David Miller:
 "Highlights:

   1) Fix the iwlwifi regression, from Johannes Berg.

   2) Support BSS coloring and 802.11 encapsulation offloading in
      hardware, from John Crispin.

   3) Fix some potential Spectre issues in qtnfmac, from Sergey
      Matyukevich.

   4) Add TTL decrement action to openvswitch, from Matteo Croce.

   5) Allow parallelization through flow_action setup by not taking the
      RTNL mutex, from Vlad Buslov.

   6) A lot of zero-length array to flexible-array conversions, from
      Gustavo A. R. Silva.

   7) Align XDP statistics names across several drivers for consistency,
      from Lorenzo Bianconi.

   8) Add various pieces of infrastructure for offloading conntrack, and
      make use of it in mlx5 driver, from Paul Blakey.

   9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.

  10) Lots of parallelization improvements during configuration changes
      in mlxsw driver, from Ido Schimmel.

  11) Add support to devlink for generic packet traps, which report
      packets dropped during ACL processing. And use them in mlxsw
      driver. From Jiri Pirko.

  12) Support bcmgenet on ACPI, from Jeremy Linton.

  13) Make BPF compatible with RT, from Thomas Gleixner, Alexei
      Starovoitov, and yours truly.

  14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.

  15) Fix sysfs permissions when network devices change namespaces, from
      Christian Brauner.

  16) Add a flags element to ethtool_ops so that drivers can more simply
      indicate which coalescing parameters they actually support, and
      therefore the generic layer can validate the user's ethtool
      request. Use this in all drivers, from Jakub Kicinski.

  17) Offload FIFO qdisc in mlxsw, from Petr Machata.

  18) Support UDP sockets in sockmap, from Lorenz Bauer.

  19) Fix stretch ACK bugs in several TCP congestion control modules,
      from Pengcheng Yang.

  20) Support virtual functions in octeontx2 driver, from Tomasz
      Duszynski.

  21) Add region operations for devlink and use it in ice driver to dump
      NVM contents, from Jacob Keller.

  22) Add support for hw offload of MACSEC, from Antoine Tenart.

  23) Add support for BPF programs that can be attached to LSM hooks,
      from KP Singh.

  24) Support for multiple paths, path managers, and counters in MPTCP.
      From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
      and others.

  25) More progress on adding the netlink interface to ethtool, from
      Michal Kubecek"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
  net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
  cxgb4/chcr: nic-tls stats in ethtool
  net: dsa: fix oops while probing Marvell DSA switches
  net/bpfilter: remove superfluous testing message
  net: macb: Fix handling of fixed-link node
  net: dsa: ksz: Select KSZ protocol tag
  netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
  net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
  net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
  net: stmmac: create dwmac-intel.c to contain all Intel platform
  net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
  net: dsa: bcm_sf2: Add support for matching VLAN TCI
  net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
  net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
  net: dsa: bcm_sf2: Disable learning for ASP port
  net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
  net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
  net: dsa: b53: Restore VLAN entries upon (re)configuration
  net: dsa: bcm_sf2: Fix overflow checks
  hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
  ...
parents 56a451b7 7f80ccfe
+2 −1
@@ -67,7 +67,8 @@ two flavors of JITs, the newer eBPF JIT currently supported on:
  - sparc64
  - mips64
  - s390x
-  - riscv
+  - riscv64
+  - riscv32

And the older cBPF JIT supported on the following archs:

+12 −17
@@ -20,11 +20,11 @@ Reporting bugs
Q: How do I report bugs for BPF kernel code?
--------------------------------------------
A: Since all BPF kernel development as well as bpftool and iproute2 BPF
-loader development happens through the netdev kernel mailing list,
+loader development happens through the bpf kernel mailing list,
please report any found issues around BPF to the following mailing
list:

- netdev@vger.kernel.org
+ bpf@vger.kernel.org

This may also include issues related to XDP, BPF tracing, etc.

@@ -46,17 +46,12 @@ Submitting patches

Q: To which mailing list do I need to submit my BPF patches?
------------------------------------------------------------
-A: Please submit your BPF patches to the netdev kernel mailing list:
-
- netdev@vger.kernel.org
+A: Please submit your BPF patches to the bpf kernel mailing list:

-Historically, BPF came out of networking and has always been maintained
-by the kernel networking community. Although these days BPF touches
-many other subsystems as well, the patches are still routed mainly
-through the networking community.
+ bpf@vger.kernel.org

In case your patch has changes in various different subsystems (e.g.
-tracing, security, etc), make sure to Cc the related kernel mailing
+networking, tracing, security, etc), make sure to Cc the related kernel mailing
lists and maintainers from there as well, so they are able to review
the changes and provide their Acked-by's to the patches.

@@ -168,7 +163,7 @@ a BPF point of view.
Be aware that this is not a final verdict that the patch will
automatically get accepted into net or net-next trees eventually:

-On the netdev kernel mailing list reviews can come in at any point
+On the bpf kernel mailing list reviews can come in at any point
in time. If discussions around a patch conclude that they cannot
get included as-is, we will either apply a follow-up fix or drop
them from the trees entirely. Therefore, we also reserve to rebase
@@ -494,15 +489,15 @@ A: You need cmake and gcc-c++ as build requisites for LLVM. Once you have
that set up, proceed with building the latest LLVM and clang version
from the git repositories::

-     $ git clone http://llvm.org/git/llvm.git
-     $ cd llvm/tools
-     $ git clone --depth 1 http://llvm.org/git/clang.git
-     $ cd ..; mkdir build; cd build
-     $ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
+     $ git clone https://github.com/llvm/llvm-project.git
+     $ mkdir -p llvm-project/llvm/build/install
+     $ cd llvm-project/llvm/build
+     $ cmake .. -G "Ninja" -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
+                -DLLVM_ENABLE_PROJECTS="clang"    \
                -DBUILD_SHARED_LIBS=OFF           \
                -DCMAKE_BUILD_TYPE=Release        \
                -DLLVM_BUILD_RUNTIME=OFF
-     $ make -j $(getconf _NPROCESSORS_ONLN)
+     $ ninja

The built binaries can then be found in the build/bin/ directory, which
you can add to the PATH variable.
+142 −0
.. SPDX-License-Identifier: GPL-2.0+
.. Copyright (C) 2020 Google LLC.

================
LSM BPF Programs
================

These BPF programs allow runtime instrumentation of the LSM hooks by privileged
users to implement system-wide MAC (Mandatory Access Control) and Audit
policies using eBPF.

Structure
---------

The example shows an eBPF program that can be attached to the ``file_mprotect``
LSM hook:

.. c:function:: int file_mprotect(struct vm_area_struct *vma, unsigned long reqprot, unsigned long prot);

Other LSM hooks which can be instrumented can be found in
``include/linux/lsm_hooks.h``.

eBPF programs that use :doc:`/bpf/btf` do not need to include kernel headers
for accessing information from the attached eBPF program's context. They can
simply declare the structures in the eBPF program and only specify the fields
that need to be accessed.

.. code-block:: c

	struct mm_struct {
		unsigned long start_brk, brk, start_stack;
	} __attribute__((preserve_access_index));

	struct vm_area_struct {
		unsigned long vm_start, vm_end;
		struct mm_struct *vm_mm;
	} __attribute__((preserve_access_index));


.. note:: The order of the fields is irrelevant.

This can be further simplified (if one has access to the BTF information at
build time) by generating the ``vmlinux.h`` with:

.. code-block:: console

	# bpftool btf dump file <path-to-btf-vmlinux> format c > vmlinux.h

.. note:: ``path-to-btf-vmlinux`` can be ``/sys/kernel/btf/vmlinux`` if the
	  build environment matches the environment the BPF programs are
	  deployed in.

The ``vmlinux.h`` can then simply be included in the BPF programs without
requiring the definition of the types.

The eBPF programs can be declared using the ``BPF_PROG``
macros defined in `tools/lib/bpf/bpf_tracing.h`_. In this
example:

	* ``"lsm/file_mprotect"`` indicates the LSM hook that the program must
	  be attached to
	* ``mprotect_audit`` is the name of the eBPF program

.. code-block:: c

	SEC("lsm/file_mprotect")
	int BPF_PROG(mprotect_audit, struct vm_area_struct *vma,
		     unsigned long reqprot, unsigned long prot, int ret)
	{
		/* ret is the return value from the previous BPF program
		 * or 0 if it's the first hook.
		 */
		if (ret != 0)
			return ret;

		int is_heap;

		is_heap = (vma->vm_start >= vma->vm_mm->start_brk &&
			   vma->vm_end <= vma->vm_mm->brk);

		/* Return an -EPERM or write information to the perf events buffer
		 * for auditing
		 */
		if (is_heap)
			return -EPERM;

		/* Allow the operation otherwise. */
		return 0;
	}

The ``__attribute__((preserve_access_index))`` is a clang feature that allows
the BPF verifier to update the offsets for the access at runtime using the
:doc:`/bpf/btf` information. Since the BPF verifier is aware of the types, it
also validates all the accesses made to the various types in the eBPF program.
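
As a rough illustration of the decision logic only, the following userspace
sketch mirrors the checks made by ``mprotect_audit``. The struct definitions
are hypothetical stand-ins for the kernel types, ``mprotect_audit_decision``
is an invented name, and returning 0 for non-heap mappings is an assumption
(0 meaning that the hook allows the operation).

```c
#include <errno.h>

/* Hypothetical userspace stand-ins for the kernel structures; only the
 * fields read by the check are declared.
 */
struct mm_struct {
	unsigned long start_brk, brk;
};

struct vm_area_struct {
	unsigned long vm_start, vm_end;
	struct mm_struct *vm_mm;
};

/* Mirrors the decision made in mprotect_audit: pass through an earlier
 * non-zero verdict, deny changes to mappings inside the heap, and
 * (by assumption) allow everything else.
 */
static int mprotect_audit_decision(const struct vm_area_struct *vma, int ret)
{
	int is_heap;

	if (ret != 0)
		return ret;

	is_heap = (vma->vm_start >= vma->vm_mm->start_brk &&
		   vma->vm_end <= vma->vm_mm->brk);

	return is_heap ? -EPERM : 0;
}
```

Calling the function with a mapping that lies between ``start_brk`` and
``brk`` yields ``-EPERM``; a mapping outside that range yields 0, and a
non-zero ``ret`` from an earlier program is returned unchanged.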

Loading
-------

eBPF programs can be loaded with the :manpage:`bpf(2)` syscall's
``BPF_PROG_LOAD`` operation:

.. code-block:: c

	struct bpf_object *obj;

	obj = bpf_object__open("./my_prog.o");
	bpf_object__load(obj);

This can be simplified by using a skeleton header generated by ``bpftool``:

.. code-block:: console

	# bpftool gen skeleton my_prog.o > my_prog.skel.h

and the program can be loaded by including ``my_prog.skel.h`` and using
the generated helper, ``my_prog__open_and_load``.

Attachment to LSM Hooks
-----------------------

The LSM allows attachment of eBPF programs as LSM hooks using :manpage:`bpf(2)`
syscall's ``BPF_RAW_TRACEPOINT_OPEN`` operation or more simply by
using the libbpf helper ``bpf_program__attach_lsm``.

The program can be detached from the LSM hook by *destroying* the ``link``
returned by ``bpf_program__attach_lsm`` using ``bpf_link__destroy``.

One can also use the helpers generated in ``my_prog.skel.h`` i.e.
``my_prog__attach`` for attachment and ``my_prog__destroy`` for cleaning up.

Examples
--------

An example eBPF program can be found in
`tools/testing/selftests/bpf/progs/lsm.c`_ and the corresponding
userspace code in `tools/testing/selftests/bpf/prog_tests/test_lsm.c`_.

.. Links
.. _tools/lib/bpf/bpf_tracing.h:
   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/lib/bpf/bpf_tracing.h
.. _tools/testing/selftests/bpf/progs/lsm.c:
   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/bpf/progs/lsm.c
.. _tools/testing/selftests/bpf/prog_tests/test_lsm.c:
   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/bpf/prog_tests/test_lsm.c
+213 −0
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)

==============
BPF drgn tools
==============

drgn scripts are a convenient and easy-to-use mechanism to retrieve arbitrary
kernel data structures. drgn does not rely on kernel UAPI to read the data.
Instead it reads directly from ``/proc/kcore`` or vmcore and pretty-prints
the data based on DWARF debug information from vmlinux.

This document describes BPF related drgn tools.

See `drgn/tools`_ for all tools available at the moment and `drgn/doc`_ for
more details on drgn itself.

bpf_inspect.py
--------------

Description
===========

`bpf_inspect.py`_ is a tool intended to inspect BPF programs and maps. It can
iterate over all programs and maps in the system and print basic information
about these objects, including id, type and name.

The main use-case `bpf_inspect.py`_ covers is to show BPF programs of types
``BPF_PROG_TYPE_EXT`` and ``BPF_PROG_TYPE_TRACING`` attached to other BPF
programs via ``freplace``/``fentry``/``fexit`` mechanisms, since there is no
user-space API to get this information.

Getting started
===============

List BPF programs (full names are obtained from BTF)::

    % sudo bpf_inspect.py prog
        27: BPF_PROG_TYPE_TRACEPOINT         tracepoint__tcp__tcp_send_reset
      4632: BPF_PROG_TYPE_CGROUP_SOCK_ADDR   tw_ipt_bind
     49464: BPF_PROG_TYPE_RAW_TRACEPOINT     raw_tracepoint__sched_process_exit

List BPF maps::

      % sudo bpf_inspect.py map
        2577: BPF_MAP_TYPE_HASH                tw_ipt_vips
        4050: BPF_MAP_TYPE_STACK_TRACE         stack_traces
        4069: BPF_MAP_TYPE_PERCPU_ARRAY        ned_dctcp_cntr

Find BPF programs attached to BPF program ``test_pkt_access``::

      % sudo bpf_inspect.py p | grep test_pkt_access
         650: BPF_PROG_TYPE_SCHED_CLS          test_pkt_access
         654: BPF_PROG_TYPE_TRACING            test_main                        linked:[650->25: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access()]
         655: BPF_PROG_TYPE_TRACING            test_subprog1                    linked:[650->29: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access_subprog1()]
         656: BPF_PROG_TYPE_TRACING            test_subprog2                    linked:[650->31: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access_subprog2()]
         657: BPF_PROG_TYPE_TRACING            test_subprog3                    linked:[650->21: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access_subprog3()]
         658: BPF_PROG_TYPE_EXT                new_get_skb_len                  linked:[650->16: BPF_TRAMP_REPLACE test_pkt_access->get_skb_len()]
         659: BPF_PROG_TYPE_EXT                new_get_skb_ifindex              linked:[650->23: BPF_TRAMP_REPLACE test_pkt_access->get_skb_ifindex()]
         660: BPF_PROG_TYPE_EXT                new_get_constant                 linked:[650->19: BPF_TRAMP_REPLACE test_pkt_access->get_constant()]

The output shows a program ``test_pkt_access`` with id 650, and multiple
other tracing and ext programs attached to functions in
``test_pkt_access``.

For example the line::

         658: BPF_PROG_TYPE_EXT                new_get_skb_len                  linked:[650->16: BPF_TRAMP_REPLACE test_pkt_access->get_skb_len()]

means that BPF program id 658, type ``BPF_PROG_TYPE_EXT``, name
``new_get_skb_len`` replaces (``BPF_TRAMP_REPLACE``) function ``get_skb_len()``
that has BTF id 16 in BPF program id 650, name ``test_pkt_access``.
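
To make the field layout concrete, here is a small C sketch that splits such
a line into its parts with ``sscanf``. It is not part of drgn or
``bpf_inspect.py``; the ``struct linked_prog`` record and both function names
are hypothetical, and the format string assumes the exact output shape shown
above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one "linked:[...]" line of the output above. */
struct linked_prog {
	int id;               /* id of the attached BPF program, e.g. 658 */
	char type[64];        /* program type, e.g. BPF_PROG_TYPE_EXT */
	char name[64];        /* program name, e.g. new_get_skb_len */
	int target_id;        /* id of the program attached to, e.g. 650 */
	int target_btf_id;    /* BTF id of the target function, e.g. 16 */
	char tramp[64];       /* attach kind, e.g. BPF_TRAMP_REPLACE */
	char target_name[64]; /* target program name, e.g. test_pkt_access */
	char target_func[64]; /* target function, e.g. get_skb_len */
};

/* Returns 0 on success, -1 if the line does not have the expected shape
 * (for instance a program line without a "linked:[...]" part).
 */
static int parse_linked_line(const char *line, struct linked_prog *out)
{
	int n = sscanf(line,
		       " %d: %63s %63s linked:[%d->%d: %63s %63[^-]->%63[^(]",
		       &out->id, out->type, out->name,
		       &out->target_id, &out->target_btf_id,
		       out->tramp, out->target_name, out->target_func);

	return n == 8 ? 0 : -1;
}

/* True if the parsed line describes a BPF_TRAMP_REPLACE attachment. */
static int is_freplace(const struct linked_prog *p)
{
	return strcmp(p->tramp, "BPF_TRAMP_REPLACE") == 0;
}
```

Passing the example line to ``parse_linked_line`` fills in the ids 658, 650
and 16 along with the names described above, while a line without a
``linked:[...]`` part is rejected.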

Getting help:

.. code-block:: none

    % sudo bpf_inspect.py
    usage: bpf_inspect.py [-h] {prog,p,map,m} ...

    drgn script to list BPF programs or maps and their properties
    unavailable via kernel API.

    See https://github.com/osandov/drgn/ for more details on drgn.

    optional arguments:
      -h, --help      show this help message and exit

    subcommands:
      {prog,p,map,m}
        prog (p)      list BPF programs
        map (m)       list BPF maps

Customization
=============

The script is intended to be customized by developers to print relevant
information about BPF programs, maps and other objects.

For example, to print ``struct bpf_prog_aux`` for BPF program id 53077:

.. code-block:: none

    % git diff
    diff --git a/tools/bpf_inspect.py b/tools/bpf_inspect.py
    index 650e228..aea2357 100755
    --- a/tools/bpf_inspect.py
    +++ b/tools/bpf_inspect.py
    @@ -112,7 +112,9 @@ def list_bpf_progs(args):
             if linked:
                 linked = f" linked:[{linked}]"

    -        print(f"{id_:>6}: {type_:32} {name:32} {linked}")
    +        if id_ == 53077:
    +            print(f"{id_:>6}: {type_:32} {name:32}")
    +            print(f"{bpf_prog.aux}")


     def list_bpf_maps(args):

It produces the output::

    % sudo bpf_inspect.py p
     53077: BPF_PROG_TYPE_XDP                tw_xdp_policer
    *(struct bpf_prog_aux *)0xffff8893fad4b400 = {
            .refcnt = (atomic64_t){
                    .counter = (long)58,
            },
            .used_map_cnt = (u32)1,
            .max_ctx_offset = (u32)8,
            .max_pkt_offset = (u32)15,
            .max_tp_access = (u32)0,
            .stack_depth = (u32)8,
            .id = (u32)53077,
            .func_cnt = (u32)0,
            .func_idx = (u32)0,
            .attach_btf_id = (u32)0,
            .linked_prog = (struct bpf_prog *)0x0,
            .verifier_zext = (bool)0,
            .offload_requested = (bool)0,
            .attach_btf_trace = (bool)0,
            .func_proto_unreliable = (bool)0,
            .trampoline_prog_type = (enum bpf_tramp_prog_type)BPF_TRAMP_FENTRY,
            .trampoline = (struct bpf_trampoline *)0x0,
            .tramp_hlist = (struct hlist_node){
                    .next = (struct hlist_node *)0x0,
                    .pprev = (struct hlist_node **)0x0,
            },
            .attach_func_proto = (const struct btf_type *)0x0,
            .attach_func_name = (const char *)0x0,
            .func = (struct bpf_prog **)0x0,
            .jit_data = (void *)0x0,
            .poke_tab = (struct bpf_jit_poke_descriptor *)0x0,
            .size_poke_tab = (u32)0,
            .ksym_tnode = (struct latch_tree_node){
                    .node = (struct rb_node [2]){
                            {
                                    .__rb_parent_color = (unsigned long)18446612956263126665,
                                    .rb_right = (struct rb_node *)0x0,
                                    .rb_left = (struct rb_node *)0xffff88a0be3d0088,
                            },
                            {
                                    .__rb_parent_color = (unsigned long)18446612956263126689,
                                    .rb_right = (struct rb_node *)0x0,
                                    .rb_left = (struct rb_node *)0xffff88a0be3d00a0,
                            },
                    },
            },
            .ksym_lnode = (struct list_head){
                    .next = (struct list_head *)0xffff88bf481830b8,
                    .prev = (struct list_head *)0xffff888309f536b8,
            },
            .ops = (const struct bpf_prog_ops *)xdp_prog_ops+0x0 = 0xffffffff820fa350,
            .used_maps = (struct bpf_map **)0xffff889ff795de98,
            .prog = (struct bpf_prog *)0xffffc9000cf2d000,
            .user = (struct user_struct *)root_user+0x0 = 0xffffffff82444820,
            .load_time = (u64)2408348759285319,
            .cgroup_storage = (struct bpf_map *[2]){},
            .name = (char [16])"tw_xdp_policer",
            .security = (void *)0xffff889ff795d548,
            .offload = (struct bpf_prog_offload *)0x0,
            .btf = (struct btf *)0xffff8890ce6d0580,
            .func_info = (struct bpf_func_info *)0xffff889ff795d240,
            .func_info_aux = (struct bpf_func_info_aux *)0xffff889ff795de20,
            .linfo = (struct bpf_line_info *)0xffff888a707afc00,
            .jited_linfo = (void **)0xffff8893fad48600,
            .func_info_cnt = (u32)1,
            .nr_linfo = (u32)37,
            .linfo_idx = (u32)0,
            .num_exentries = (u32)0,
            .extable = (struct exception_table_entry *)0xffffffffa032d950,
            .stats = (struct bpf_prog_stats *)0x603fe3a1f6d0,
            .work = (struct work_struct){
                    .data = (atomic_long_t){
                            .counter = (long)0,
                    },
                    .entry = (struct list_head){
                            .next = (struct list_head *)0x0,
                            .prev = (struct list_head *)0x0,
                    },
                    .func = (work_func_t)0x0,
            },
            .rcu = (struct callback_head){
                    .next = (struct callback_head *)0x0,
                    .func = (void (*)(struct callback_head *))0x0,
            },
    }


.. Links
.. _drgn/doc: https://drgn.readthedocs.io/en/latest/
.. _drgn/tools: https://github.com/osandov/drgn/tree/master/tools
.. _bpf_inspect.py:
   https://github.com/osandov/drgn/blob/master/tools/bpf_inspect.py
+4 −2
@@ -45,14 +45,16 @@ Program types
   prog_cgroup_sockopt
   prog_cgroup_sysctl
   prog_flow_dissector
+   bpf_lsm


-Testing BPF
-===========
+Testing and debugging BPF
+=========================

.. toctree::
   :maxdepth: 1

+   drgn
   s390

