Commit 4511708b authored by Thomas Gleixner

Merge tag 'perf-core-for-mingo-5.4-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo:

Intel PT:

  Adrian Hunter:

  - Add PEBS via Intel PT support, the kernel bits went via PeterZ.

perf record:

  Alexander Shishkin:

  - Add an option to take an AUX snapshot on exit.

  Tan Xiaojun:

  - Support aarch64 random socket_id assignment, just as was fixed for S/390.

tools:

  Andy Shevchenko:

  - Keep list of tools in alphabetical order on 'make -C tools help'.

perf session:

  Arnaldo Carvalho de Melo:

  - Avoid infinite loop when seeing invalid header.size, reported by
    Vince Weaver using a perf.data fuzzer.

Documentation:

  Vince Weaver:

  - Clarify HEADER_SAMPLE_TOPOLOGY format in the perf.data spec.

perf config:

  Arnaldo Carvalho de Melo:

  - Honour $PERF_CONFIG env var to specify alternate .perfconfig.

perf test:

  Arnaldo Carvalho de Melo:

  - Disable ~/.perfconfig to get default output in 'perf trace' tests.

perf top:

  Arnaldo Carvalho de Melo:

  - Set display thread COMM to help with debugging.

  - Collapse and resort evsels in a group, so that we have output
    similar to 'perf report' when using event groups, i.e.

      perf top -e '{cycles,instructions}'

    Will have two columns, and the instructions one will work.

core:

  Igor Lubashev:

  - Detect if libcap development files are available so that we
    can use capabilities to match the checks made by the kernel instead
    of using plain (geteuid() == 0).

Intel:

  Haiyan Song:

  - Add Icelake V1.00 event file.

perf trace:

  Leo Yan:

  - Fix segmentation fault when accessing syscall info on arm64.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
parents 7f06d0aa 1cd8fa28
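
The "perf session" item in the message above mentions an infinite loop on an invalid header.size. The sketch below illustrates why such a loop can happen and one way to guard against it; it is not the code from this merge. It assumes the usual 8-byte perf_event_header layout (u32 type, u16 misc, u16 size, with size counting the header itself) and little-endian data; walk_records() is a hypothetical helper name.

```python
import struct

# Assumed layout of struct perf_event_header: u32 type, u16 misc, u16 size.
# Little-endian is an assumption made for this illustration.
HEADER = struct.Struct("<IHH")


def walk_records(buf: bytes):
    """Yield (type, payload) for each record in buf.

    A record whose size is smaller than the header would never advance
    the offset (size == 0 loops forever), so it is rejected up front.
    """
    offset = 0
    while offset + HEADER.size <= len(buf):
        rec_type, _misc, size = HEADER.unpack_from(buf, offset)
        if size < HEADER.size or offset + size > len(buf):
            raise ValueError(f"invalid header.size {size} at offset {offset}")
        yield rec_type, buf[offset + HEADER.size:offset + size]
        offset += size
```

Without the size check, a fuzzed record with size 0 would leave offset unchanged and the loop would spin on the same header indefinitely.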
+80 −8
@@ -41,10 +41,11 @@ Related CVEs

The following CVE entries describe Spectre variants:

   =============   =======================  =================
   =============   =======================  ==========================
   CVE-2017-5753   Bounds check bypass      Spectre variant 1
   CVE-2017-5715   Branch target injection  Spectre variant 2
   =============   =======================  =================
   CVE-2019-1125   Spectre v1 swapgs        Spectre variant 1 (swapgs)
   =============   =======================  ==========================

Problem
-------
@@ -78,6 +79,13 @@ There are some extensions of Spectre variant 1 attacks for reading data
over the network, see :ref:`[12] <spec_ref12>`. However such attacks
are difficult, low bandwidth, fragile, and are considered low risk.

Note that, despite the "Bounds Check Bypass" name, Spectre variant 1 is not
only about user-controlled array bounds checks.  It can affect any
conditional checks.  The kernel entry code interrupt, exception, and NMI
handlers all have conditional swapgs checks.  Those may be problematic
in the context of Spectre v1, as kernel code can speculatively run with
a user GS.

Spectre variant 2 (Branch Target Injection)
-------------------------------------------

@@ -132,6 +140,9 @@ not cover all possible attack vectors.
1. A user process attacking the kernel
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Spectre variant 1
~~~~~~~~~~~~~~~~~

   The attacker passes a parameter to the kernel via a register or
   via a known address in memory during a syscall. Such parameter may
   be used later by the kernel as an index to an array or to derive
@@ -144,7 +155,40 @@ not cover all possible attack vectors.
   potentially be influenced for Spectre attacks, new "nospec" accessor
   macros are used to prevent speculative loading of data.

   Spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch
Spectre variant 1 (swapgs)
~~~~~~~~~~~~~~~~~~~~~~~~~~

   An attacker can train the branch predictor to speculatively skip the
   swapgs path for an interrupt or exception.  If they initialize the GS
   register to a user-space value and the swapgs is speculatively
   skipped, subsequent GS-related percpu accesses in the speculation
   window will be done with the attacker-controlled GS value.  This
   could cause privileged memory to be accessed and leaked.

   For example:

   ::

     if (coming from user space)
         swapgs
     mov %gs:<percpu_offset>, %reg
     mov (%reg), %reg1

   When coming from user space, the CPU can speculatively skip the
   swapgs, and then do a speculative percpu load using the user GS
   value.  So the user can speculatively force a read of any kernel
   value.  If a gadget exists which uses the percpu value as an address
   in another load/store, then the contents of the kernel value may
   become visible via an L1 side channel attack.

   A similar attack exists when coming from kernel space.  The CPU can
   speculatively do the swapgs, causing the user GS to get used for the
   rest of the speculative window.

Spectre variant 2
~~~~~~~~~~~~~~~~~

   A Spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch
   target buffer (BTB) before issuing a syscall to launch an attack.
   After entering the kernel, the kernel could use the poisoned branch
   target buffer on indirect jump and jump to gadget code in speculative
@@ -280,11 +324,18 @@ The sysfs file showing Spectre variant 1 mitigation status is:

The possible values in this file are:

  =======================================  =================================
  'Mitigation: __user pointer sanitation'  Protection in kernel on a case by
                                           case base with explicit pointer
                                           sanitation.
  =======================================  =================================
  .. list-table::

     * - 'Not affected'
       - The processor is not vulnerable.
     * - 'Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers'
       - The swapgs protections are disabled; otherwise it has
         protection in the kernel on a case by case basis with explicit
         pointer sanitization and usercopy LFENCE barriers.
     * - 'Mitigation: usercopy/swapgs barriers and __user pointer sanitization'
       - Protection in the kernel on a case by case basis with explicit
         pointer sanitization, usercopy LFENCE barriers, and swapgs LFENCE
         barriers.

However, the protections are put in place on a case by case basis,
and there is no guarantee that all possible attack vectors for Spectre
@@ -366,12 +417,27 @@ Turning on mitigation for Spectre variant 1 and Spectre variant 2
1. Kernel mitigation
^^^^^^^^^^^^^^^^^^^^

Spectre variant 1
~~~~~~~~~~~~~~~~~

   For the Spectre variant 1, vulnerable kernel code (as determined
   by code audit or scanning tools) is annotated on a case by case
   basis to use nospec accessor macros for bounds clipping :ref:`[2]
   <spec_ref2>` to avoid any usable disclosure gadgets. However, it may
   not cover all attack vectors for Spectre variant 1.

   Copy-from-user code has an LFENCE barrier to prevent the access_ok()
   check from being mis-speculated.  The barrier is done by the
   barrier_nospec() macro.

   For the swapgs variant of Spectre variant 1, LFENCE barriers are
   added to interrupt, exception and NMI entry where needed.  These
   barriers are done by the FENCE_SWAPGS_KERNEL_ENTRY and
   FENCE_SWAPGS_USER_ENTRY macros.

Spectre variant 2
~~~~~~~~~~~~~~~~~

   For Spectre variant 2 mitigation, the compiler turns indirect calls or
   jumps in the kernel into equivalent return trampolines (retpolines)
   :ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` to go to the target
@@ -473,6 +539,12 @@ Mitigation control on the kernel command line
Spectre variant 2 mitigation can be disabled or force enabled at the
kernel command line.

	nospectre_v1

		[X86,PPC] Disable mitigations for Spectre Variant 1
		(bounds check bypass). With this option data leaks are
		possible in the system.

	nospectre_v2

		[X86] Disable all mitigations for the Spectre variant 2
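
The three Spectre variant 1 status strings listed in the hunks above can be told apart with a few string checks. The following is a minimal sketch, not part of the patch: the sysfs path is assumed from the usual /sys/devices/system/cpu/vulnerabilities layout (it is not shown in this excerpt), and classify(), swapgs_barriers_enabled() and read_status() are hypothetical helper names.

```python
from pathlib import Path

# Assumed location of the status file; not shown in the diff excerpt.
SPECTRE_V1 = Path("/sys/devices/system/cpu/vulnerabilities/spectre_v1")


def classify(status: str) -> str:
    """Map a status string to 'not-affected', 'vulnerable', 'mitigated' or 'unknown'."""
    status = status.strip()
    if status == "Not affected":
        return "not-affected"
    if status.startswith("Vulnerable"):
        return "vulnerable"
    if status.startswith("Mitigation"):
        return "mitigated"
    return "unknown"


def swapgs_barriers_enabled(status: str) -> bool:
    """True only for a 'Mitigation: ...' value that mentions swapgs barriers."""
    return classify(status) == "mitigated" and "swapgs" in status


def read_status(path: Path = SPECTRE_V1) -> str:
    """Return the raw status line, or an empty string if the file is absent."""
    try:
        return path.read_text().strip()
    except OSError:
        return ""
```

On a running system, swapgs_barriers_enabled(read_status()) would report whether the swapgs LFENCE barriers are in effect; the older 'Mitigation: __user pointer sanitation' string is classified as mitigated but without swapgs barriers.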
+4 −4
@@ -2604,7 +2604,7 @@
				expose users to several CPU vulnerabilities.
				Equivalent to: nopti [X86,PPC]
					       kpti=0 [ARM64]
					       nospectre_v1 [PPC]
					       nospectre_v1 [X86,PPC]
					       nobp=0 [S390]
					       nospectre_v2 [X86,PPC,S390,ARM64]
					       spectre_v2_user=off [X86]
@@ -2965,9 +2965,9 @@
			nosmt=force: Force disable SMT, cannot be undone
				     via the sysfs control file.

	nospectre_v1	[PPC] Disable mitigations for Spectre Variant 1 (bounds
			check bypass). With this option data leaks are possible
			in the system.
	nospectre_v1	[X86,PPC] Disable mitigations for Spectre Variant 1
			(bounds check bypass). With this option data leaks are
			possible in the system.

	nospectre_v2	[X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for
			the Spectre variant 2 (indirect branch prediction)
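
Whether Spectre variant 1 mitigations were switched off at boot can be estimated from the kernel command line. The sketch below is illustrative only: it treats "mitigations=off" as the option whose "Equivalent to:" list appears in the hunk above, splits on whitespace rather than following the kernel's own parser, and assumes that a later mitigations= value overrides an earlier one. spectre_v1_disabled() and read_cmdline() are hypothetical helper names.

```python
def spectre_v1_disabled(cmdline: str) -> bool:
    """Return True if the command line asks for Spectre v1 mitigations off."""
    tokens = cmdline.split()  # simplification: ignores quoted parameters
    if "nospectre_v1" in tokens:
        return True
    # Assumption: the last mitigations= occurrence is the one that counts.
    values = [t.split("=", 1)[1] for t in tokens if t.startswith("mitigations=")]
    return bool(values) and values[-1] == "off"


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the boot command line, or an empty string if it is unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""
```

The sysfs status file remains the authoritative source; this check only reflects what was requested on the command line.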
+0 −162
===================
RISC-V CPU Bindings
===================

The device tree allows describing the layout of CPUs in a system through
the "cpus" node, which in turn contains a number of subnodes (i.e. "cpu")
defining properties for every cpu.

Bindings for CPU nodes follow the Devicetree Specification, available from:

https://www.devicetree.org/specifications/

with updates for 32-bit and 64-bit RISC-V systems provided in this document.

===========
Terminology
===========

This document uses some terminology common to the RISC-V community that is not
widely used, the definitions of which are listed here:

* hart: A hardware execution context, which contains all the state mandated by
  the RISC-V ISA: a PC and some registers.  This terminology is designed to
  disambiguate software's view of execution contexts from any particular
  microarchitectural implementation strategy.  For example, my Intel laptop is
  described as having one socket with two cores, each of which has two hyper
  threads.  Therefore this system has four harts.

=====================================
cpus and cpu node bindings definition
=====================================

The RISC-V architecture, in accordance with the Devicetree Specification,
requires the cpus and cpu nodes to be present and contain the properties
described below.

- cpus node

        Description: Container of cpu nodes

        The node name must be "cpus".

        A cpus node must define the following properties:

        - #address-cells
                Usage: required
                Value type: <u32>
                Definition: must be set to 1
        - #size-cells
                Usage: required
                Value type: <u32>
                Definition: must be set to 0

- cpu node

        Description: Describes a hart context

        PROPERTIES

        - device_type
                Usage: required
                Value type: <string>
                Definition: must be "cpu"
        - reg
                Usage: required
                Value type: <u32>
                Definition: The hart ID of this CPU node
        - compatible:
                Usage: required
                Value type: <stringlist>
                Definition: must contain "riscv", may contain one of
                            "sifive,rocket0"
        - mmu-type:
                Usage: optional
                Value type: <string>
                Definition: Specifies the CPU's MMU type.  Possible values are
                            "riscv,sv32"
                            "riscv,sv39"
                            "riscv,sv48"
        - riscv,isa:
                Usage: required
                Value type: <string>
                Definition: Contains the RISC-V ISA string of this hart.  These
                            ISA strings are defined by the RISC-V ISA manual.

Example: SiFive Freedom U540G Development Kit
---------------------------------------------

This system contains two harts: a hart marked as disabled that's used for
low-level system tasks and should be ignored by Linux, and a second hart that
Linux is allowed to run on.

        cpus {
                #address-cells = <1>;
                #size-cells = <0>;
                timebase-frequency = <1000000>;
                cpu@0 {
                        clock-frequency = <1600000000>;
                        compatible = "sifive,rocket0", "riscv";
                        device_type = "cpu";
                        i-cache-block-size = <64>;
                        i-cache-sets = <128>;
                        i-cache-size = <16384>;
                        next-level-cache = <&L15 &L0>;
                        reg = <0>;
                        riscv,isa = "rv64imac";
                        status = "disabled";
                        L10: interrupt-controller {
                                #interrupt-cells = <1>;
                                compatible = "riscv,cpu-intc";
                                interrupt-controller;
                        };
                };
                cpu@1 {
                        clock-frequency = <1600000000>;
                        compatible = "sifive,rocket0", "riscv";
                        d-cache-block-size = <64>;
                        d-cache-sets = <64>;
                        d-cache-size = <32768>;
                        d-tlb-sets = <1>;
                        d-tlb-size = <32>;
                        device_type = "cpu";
                        i-cache-block-size = <64>;
                        i-cache-sets = <64>;
                        i-cache-size = <32768>;
                        i-tlb-sets = <1>;
                        i-tlb-size = <32>;
                        mmu-type = "riscv,sv39";
                        next-level-cache = <&L15 &L0>;
                        reg = <1>;
                        riscv,isa = "rv64imafdc";
                        status = "okay";
                        tlb-split;
                        L13: interrupt-controller {
                                #interrupt-cells = <1>;
                                compatible = "riscv,cpu-intc";
                                interrupt-controller;
                        };
                };
        };

Example: Spike ISA Simulator with 1 Hart
----------------------------------------

This device tree matches the Spike ISA golden model as run with `spike -p1`.

        cpus {
                cpu@0 {
                        device_type = "cpu";
                        reg = <0x00000000>;
                        status = "okay";
                        compatible = "riscv";
                        riscv,isa = "rv64imafdc";
                        mmu-type = "riscv,sv48";
                        clock-frequency = <0x3b9aca00>;
                        interrupt-controller {
                                #interrupt-cells = <0x00000001>;
                                interrupt-controller;
                                compatible = "riscv,cpu-intc";
                        };
                };
        };
+16 −0
@@ -10,6 +10,18 @@ maintainers:
  - Paul Walmsley <paul.walmsley@sifive.com>
  - Palmer Dabbelt <palmer@sifive.com>

description: |
  This document uses some terminology common to the RISC-V community
  that is not widely used, the definitions of which are listed here:

  hart: A hardware execution context, which contains all the state
  mandated by the RISC-V ISA: a PC and some registers.  This
  terminology is designed to disambiguate software's view of execution
  contexts from any particular microarchitectural implementation
  strategy.  For example, an Intel laptop containing one socket with
  two cores, each of which has two hyperthreads, could be described as
  having four harts.

properties:
  compatible:
    items:
@@ -50,6 +62,10 @@ properties:
      User-Level ISA document, available from
      https://riscv.org/specifications/

      While the isa strings in the ISA specification are case
      insensitive, letters in the riscv,isa string must be all
      lowercase to simplify parsing.

  timebase-frequency:
    type: integer
    minimum: 1
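
The lowercase requirement on the riscv,isa string added in the hunk above can be checked mechanically. This is a minimal sketch under the assumption that only letter case is being tested; it does not validate that the string names a real ISA, and both function names are hypothetical.

```python
def isa_string_is_lowercase(isa: str) -> bool:
    """True if the string is non-empty and contains no uppercase letters."""
    return bool(isa) and not any(ch.isupper() for ch in isa)


def normalize_isa_string(isa: str) -> str:
    """Lowercase an ISA string, as a device tree author might before use."""
    return isa.lower()
```

For instance, the "rv64imafdc" value used in the bindings examples passes the check, while "RV64IMAFDC" would need normalizing first.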
+1 −1
@@ -19,7 +19,7 @@ properties:
  compatible:
    items:
      - enum:
          - sifive,freedom-unleashed-a00
          - sifive,hifive-unleashed-a00
      - const: sifive,fu540-c000
      - const: sifive,fu540
...