Commit ae575c8a authored by Alexei Starovoitov's avatar Alexei Starovoitov
Browse files

Merge branch 'jmp32-insns'

Jiong Wang says:

====================
v3 -> v4:
 - Fixed rebase issue. JMP32 checks were missing in two new functions:
    + kernel/bpf/verifier.c:insn_is_cond_jump
    + drivers/net/ethernet/netronome/nfp/bpf/main.h:is_mbpf_cond_jump
   (Daniel)
 - Further rebased on top of latest llvm-readelf change.

v2 -> v3:
 - Added missed check on JMP32 inside bpf_jit_build_body. (Sandipan)
 - Wrap ?: statements in s390 port with brace. They are used by macros
   which doesn't guard the operand with brace.
 - Fixed the ',' issues test_verifier change.
 - Reorder two selftests patches to be near each other.
 - Rebased on top of latest bpf-next.

v1 -> v2:
 - Updated encoding. Use reserved insn class 0x6 instead of packing with
   existing BPF_JMP. (Alexei)
 - Updated code comments in s390 port. (Martin)
 - Separate JIT function for jeq32_imm in NFP port. (Jakub)
 - Re-implemented auto-testing support. (Jakub)
 - Moved testcases to test_verifer.c, plus more unit tests. (Jakub)
 - Fixed JEQ/JNE range deduction. (Jakub)
 - Also supported JSET in this patch set.
 - Fixed/Improved range deduction for all the other operations. All C
   programs under bpf selftest passed verification now.
 - Improved min/max code implementation.
 - Fixed bpftool/disassembler.

Current eBPF ISA has 32-bit sub-register and has defined a set of ALU32
instructions.

However, there is no JMP32 instructions, the consequence is code-gen for
32-bit sub-registers is not efficient. For example, explicit sign-extension
from 32-bit to 64-bit is needed for signed comparison.

Adding JMP32 instruction therefore could complete eBPF ISA on 32-bit
sub-register support. This also match those JMP32 instructions in most JIT
backends, for example x64-64 and AArch64. These new eBPF JMP32 instructions
could have one-to-one map on them.

A few verifier ALU32 related bugs has been fixed recently, and JMP32
introduced by this set further improves BPF sub-register ecosystem. Once
this is landed, BPF programs using 32-bit sub-register ISA could get
reasonably good support from verifier and JIT compilers. Users then could
compare the runtime efficiency of one BPF program under both modes, and
could use the one shown better from benchmark result.

From benchmark results on some Cilium BPF programs, for 64-bit arches,
after JMP32 introduced, programs compiled with -mattr=+alu32 (meaning
enable sub-register usage) are smaller in code size and generally smaller
in verifier processed insn number.

Benchmark results
===
Text size in bytes (generated by "size")
---
LLVM code-gen option   default  alu32  alu32/jmp32  change Vs.  change Vs.
                                                    alu32       default
bpf_lb-DLB_L3.o:       6456     6280   6160         -1.91%      -4.58%
bpf_lb-DLB_L4.o:       7848     7664   7136         -6.89%      -9.07%
bpf_lb-DUNKNOWN.o:     2680     2664   2568         -3.60%      -4.18%
bpf_lxc.o:             104824   104744 97360        -7.05%      -7.12%
bpf_netdev.o:          23456    23576  21632        -8.25%      -7.78%
bpf_overlay.o:         16184    16304  14648        -10.16%     -9.49%

Processed instruction number
---
LLVM code-gen option   default  alu32  alu32/jmp32  change Vs.  change Vs.
                                                    alu32       default
bpf_lb-DLB_L3.o:       1579     1281   1295         +1.09%      -17.99%
bpf_lb-DLB_L4.o:       2045     1663   1556         -6.43%      -23.91%
bpf_lb-DUNKNOWN.o:     606      513    501          -2.34%      -17.33%
bpf_lxc.o:             85381    103218 94435        -8.51%      +10.60%
bpf_netdev.o:          5246     5809   5200         -10.48%     -0.08%
bpf_overlay.o:         2443     2705   2456         -9.02%      -0.53%

It is even better for 32-bit arches like x32, arm32 and nfp etc, as now
some conditional jump will become JMP32 which doesn't require code-gen for
high 32-bit comparison.

Encoding
===
The new JMP32 instructions are using new BPF_JMP32 class which is using
the reserved eBPF class number 0x6. And BPF_JA/CALL/EXIT only exist for
BPF_JMP, they are reserved opcode for BPF_JMP32.

LLVM support
===
A couple of unit tests has been added and included in this set. Also LLVM
code-gen for JMP32 has been added, so you could just compile any BPF C
program with both -mcpu=probe and -mattr=+alu32 specified. If you are
compiling on a machine with kernel patched by this set, LLVM will select
the ISA automatically based on host probe results. Otherwise specify
-mcpu=v3 and -mattr=+alu32 could also force use JMP32 ISA.

   LLVM support could be found at:

     https://github.com/Netronome/llvm/tree/jmp32-v2



   (clang driver also taught about the new "v3" processor, will send out
    merge request for both clang and llvm once kernel set landed.)

JIT backends support
===
A couple of JIT backends has been supported in this set except SPARC and
MIPS. It shouldn't be a big issue for these two ports as LLVM default won't
generate JMP32 insns, it will only generate them when host machine is
probed to be with the support.

Thanks.
====================

Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
parents dbbd79ae 3ef84346
Loading
Loading
Loading
Loading
+8 −7
Original line number Diff line number Diff line
@@ -865,7 +865,7 @@ Three LSB bits store instruction class which is one of:
  BPF_STX   0x03          BPF_STX   0x03
  BPF_ALU   0x04          BPF_ALU   0x04
  BPF_JMP   0x05          BPF_JMP   0x05
  BPF_RET   0x06          [ class 6 unused, for future if needed ]
  BPF_RET   0x06          BPF_JMP32 0x06
  BPF_MISC  0x07          BPF_ALU64 0x07

When BPF_CLASS(code) == BPF_ALU or BPF_JMP, 4th bit encodes source operand ...
@@ -902,9 +902,9 @@ If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:
  BPF_ARSH  0xc0  /* eBPF only: sign extending shift right */
  BPF_END   0xd0  /* eBPF only: endianness conversion */

If BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of:
If BPF_CLASS(code) == BPF_JMP or BPF_JMP32 [ in eBPF ], BPF_OP(code) is one of:

  BPF_JA    0x00
  BPF_JA    0x00  /* BPF_JMP only */
  BPF_JEQ   0x10
  BPF_JGT   0x20
  BPF_JGE   0x30
@@ -912,8 +912,8 @@ If BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of:
  BPF_JNE   0x50  /* eBPF only: jump != */
  BPF_JSGT  0x60  /* eBPF only: signed '>' */
  BPF_JSGE  0x70  /* eBPF only: signed '>=' */
  BPF_CALL  0x80  /* eBPF only: function call */
  BPF_EXIT  0x90  /* eBPF only: function return */
  BPF_CALL  0x80  /* eBPF BPF_JMP only: function call */
  BPF_EXIT  0x90  /* eBPF BPF_JMP only: function return */
  BPF_JLT   0xa0  /* eBPF only: unsigned '<' */
  BPF_JLE   0xb0  /* eBPF only: unsigned '<=' */
  BPF_JSLT  0xc0  /* eBPF only: signed '<' */
@@ -936,8 +936,9 @@ Classic BPF wastes the whole BPF_RET class to represent a single 'ret'
operation. Classic BPF_RET | BPF_K means copy imm32 into return register
and perform function exit. eBPF is modeled to match CPU, so BPF_JMP | BPF_EXIT
in eBPF means function exit only. The eBPF program needs to store return
value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is currently
unused and reserved for future use.
value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is used as
BPF_JMP32 to mean exactly the same operations as BPF_JMP, but with 32-bit wide
operands for the comparisons instead.

For load and store instructions the 8-bit 'code' field is divided as:

+44 −9
Original line number Diff line number Diff line
@@ -1083,12 +1083,17 @@ static inline void emit_ldx_r(const s8 dst[], const s8 src,

/* Arithmatic Operation */
static inline void emit_ar_r(const u8 rd, const u8 rt, const u8 rm,
			     const u8 rn, struct jit_ctx *ctx, u8 op) {
			     const u8 rn, struct jit_ctx *ctx, u8 op,
			     bool is_jmp64) {
	switch (op) {
	case BPF_JSET:
		if (is_jmp64) {
			emit(ARM_AND_R(ARM_IP, rt, rn), ctx);
			emit(ARM_AND_R(ARM_LR, rd, rm), ctx);
			emit(ARM_ORRS_R(ARM_IP, ARM_LR, ARM_IP), ctx);
		} else {
			emit(ARM_ANDS_R(ARM_IP, rt, rn), ctx);
		}
		break;
	case BPF_JEQ:
	case BPF_JNE:
@@ -1096,17 +1101,24 @@ static inline void emit_ar_r(const u8 rd, const u8 rt, const u8 rm,
	case BPF_JGE:
	case BPF_JLE:
	case BPF_JLT:
		if (is_jmp64) {
			emit(ARM_CMP_R(rd, rm), ctx);
			/* Only compare low halve if high halve are equal. */
			_emit(ARM_COND_EQ, ARM_CMP_R(rt, rn), ctx);
		} else {
			emit(ARM_CMP_R(rt, rn), ctx);
		}
		break;
	case BPF_JSLE:
	case BPF_JSGT:
		emit(ARM_CMP_R(rn, rt), ctx);
		if (is_jmp64)
			emit(ARM_SBCS_R(ARM_IP, rm, rd), ctx);
		break;
	case BPF_JSLT:
	case BPF_JSGE:
		emit(ARM_CMP_R(rt, rn), ctx);
		if (is_jmp64)
			emit(ARM_SBCS_R(ARM_IP, rd, rm), ctx);
		break;
	}
@@ -1615,6 +1627,17 @@ exit:
	case BPF_JMP | BPF_JLT | BPF_X:
	case BPF_JMP | BPF_JSLT | BPF_X:
	case BPF_JMP | BPF_JSLE | BPF_X:
	case BPF_JMP32 | BPF_JEQ | BPF_X:
	case BPF_JMP32 | BPF_JGT | BPF_X:
	case BPF_JMP32 | BPF_JGE | BPF_X:
	case BPF_JMP32 | BPF_JNE | BPF_X:
	case BPF_JMP32 | BPF_JSGT | BPF_X:
	case BPF_JMP32 | BPF_JSGE | BPF_X:
	case BPF_JMP32 | BPF_JSET | BPF_X:
	case BPF_JMP32 | BPF_JLE | BPF_X:
	case BPF_JMP32 | BPF_JLT | BPF_X:
	case BPF_JMP32 | BPF_JSLT | BPF_X:
	case BPF_JMP32 | BPF_JSLE | BPF_X:
		/* Setup source registers */
		rm = arm_bpf_get_reg32(src_hi, tmp2[0], ctx);
		rn = arm_bpf_get_reg32(src_lo, tmp2[1], ctx);
@@ -1641,6 +1664,17 @@ exit:
	case BPF_JMP | BPF_JLE | BPF_K:
	case BPF_JMP | BPF_JSLT | BPF_K:
	case BPF_JMP | BPF_JSLE | BPF_K:
	case BPF_JMP32 | BPF_JEQ | BPF_K:
	case BPF_JMP32 | BPF_JGT | BPF_K:
	case BPF_JMP32 | BPF_JGE | BPF_K:
	case BPF_JMP32 | BPF_JNE | BPF_K:
	case BPF_JMP32 | BPF_JSGT | BPF_K:
	case BPF_JMP32 | BPF_JSGE | BPF_K:
	case BPF_JMP32 | BPF_JSET | BPF_K:
	case BPF_JMP32 | BPF_JLT | BPF_K:
	case BPF_JMP32 | BPF_JLE | BPF_K:
	case BPF_JMP32 | BPF_JSLT | BPF_K:
	case BPF_JMP32 | BPF_JSLE | BPF_K:
		if (off == 0)
			break;
		rm = tmp2[0];
@@ -1652,7 +1686,8 @@ go_jmp:
		rd = arm_bpf_get_reg64(dst, tmp, ctx);

		/* Check for the condition */
		emit_ar_r(rd[0], rd[1], rm, rn, ctx, BPF_OP(code));
		emit_ar_r(rd[0], rd[1], rm, rn, ctx, BPF_OP(code),
			  BPF_CLASS(code) == BPF_JMP);

		/* Setup JUMP instruction */
		jmp_offset = bpf2a32_offset(i+off, i, ctx);
+2 −0
Original line number Diff line number Diff line
@@ -62,6 +62,7 @@
#define ARM_INST_ADDS_I		0x02900000

#define ARM_INST_AND_R		0x00000000
#define ARM_INST_ANDS_R		0x00100000
#define ARM_INST_AND_I		0x02000000

#define ARM_INST_BIC_R		0x01c00000
@@ -172,6 +173,7 @@
#define ARM_ADC_I(rd, rn, imm)	_AL3_I(ARM_INST_ADC, rd, rn, imm)

#define ARM_AND_R(rd, rn, rm)	_AL3_R(ARM_INST_AND, rd, rn, rm)
#define ARM_ANDS_R(rd, rn, rm)	_AL3_R(ARM_INST_ANDS, rd, rn, rm)
#define ARM_AND_I(rd, rn, imm)	_AL3_I(ARM_INST_AND, rd, rn, imm)

#define ARM_BIC_R(rd, rn, rm)	_AL3_R(ARM_INST_BIC, rd, rn, rm)
+30 −7
Original line number Diff line number Diff line
@@ -362,7 +362,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
	const s16 off = insn->off;
	const s32 imm = insn->imm;
	const int i = insn - ctx->prog->insnsi;
	const bool is64 = BPF_CLASS(code) == BPF_ALU64;
	const bool is64 = BPF_CLASS(code) == BPF_ALU64 ||
			  BPF_CLASS(code) == BPF_JMP;
	const bool isdw = BPF_SIZE(code) == BPF_DW;
	u8 jmp_cond;
	s32 jmp_offset;
@@ -559,7 +560,17 @@ emit_bswap_uxt:
	case BPF_JMP | BPF_JSLT | BPF_X:
	case BPF_JMP | BPF_JSGE | BPF_X:
	case BPF_JMP | BPF_JSLE | BPF_X:
		emit(A64_CMP(1, dst, src), ctx);
	case BPF_JMP32 | BPF_JEQ | BPF_X:
	case BPF_JMP32 | BPF_JGT | BPF_X:
	case BPF_JMP32 | BPF_JLT | BPF_X:
	case BPF_JMP32 | BPF_JGE | BPF_X:
	case BPF_JMP32 | BPF_JLE | BPF_X:
	case BPF_JMP32 | BPF_JNE | BPF_X:
	case BPF_JMP32 | BPF_JSGT | BPF_X:
	case BPF_JMP32 | BPF_JSLT | BPF_X:
	case BPF_JMP32 | BPF_JSGE | BPF_X:
	case BPF_JMP32 | BPF_JSLE | BPF_X:
		emit(A64_CMP(is64, dst, src), ctx);
emit_cond_jmp:
		jmp_offset = bpf2a64_offset(i + off, i, ctx);
		check_imm19(jmp_offset);
@@ -601,7 +612,8 @@ emit_cond_jmp:
		emit(A64_B_(jmp_cond, jmp_offset), ctx);
		break;
	case BPF_JMP | BPF_JSET | BPF_X:
		emit(A64_TST(1, dst, src), ctx);
	case BPF_JMP32 | BPF_JSET | BPF_X:
		emit(A64_TST(is64, dst, src), ctx);
		goto emit_cond_jmp;
	/* IF (dst COND imm) JUMP off */
	case BPF_JMP | BPF_JEQ | BPF_K:
@@ -614,12 +626,23 @@ emit_cond_jmp:
	case BPF_JMP | BPF_JSLT | BPF_K:
	case BPF_JMP | BPF_JSGE | BPF_K:
	case BPF_JMP | BPF_JSLE | BPF_K:
		emit_a64_mov_i(1, tmp, imm, ctx);
		emit(A64_CMP(1, dst, tmp), ctx);
	case BPF_JMP32 | BPF_JEQ | BPF_K:
	case BPF_JMP32 | BPF_JGT | BPF_K:
	case BPF_JMP32 | BPF_JLT | BPF_K:
	case BPF_JMP32 | BPF_JGE | BPF_K:
	case BPF_JMP32 | BPF_JLE | BPF_K:
	case BPF_JMP32 | BPF_JNE | BPF_K:
	case BPF_JMP32 | BPF_JSGT | BPF_K:
	case BPF_JMP32 | BPF_JSLT | BPF_K:
	case BPF_JMP32 | BPF_JSGE | BPF_K:
	case BPF_JMP32 | BPF_JSLE | BPF_K:
		emit_a64_mov_i(is64, tmp, imm, ctx);
		emit(A64_CMP(is64, dst, tmp), ctx);
		goto emit_cond_jmp;
	case BPF_JMP | BPF_JSET | BPF_K:
		emit_a64_mov_i(1, tmp, imm, ctx);
		emit(A64_TST(1, dst, tmp), ctx);
	case BPF_JMP32 | BPF_JSET | BPF_K:
		emit_a64_mov_i(is64, tmp, imm, ctx);
		emit(A64_TST(is64, dst, tmp), ctx);
		goto emit_cond_jmp;
	/* function call */
	case BPF_JMP | BPF_CALL:
+1 −0
Original line number Diff line number Diff line
@@ -337,6 +337,7 @@
#define PPC_INST_DIVWU			0x7c000396
#define PPC_INST_DIVD			0x7c0003d2
#define PPC_INST_RLWINM			0x54000000
#define PPC_INST_RLWINM_DOT		0x54000001
#define PPC_INST_RLWIMI			0x50000000
#define PPC_INST_RLDICL			0x78000000
#define PPC_INST_RLDICR			0x78000004
Loading