Commit 87cfeb19 authored by Ingo Molnar

Merge tag 'perf-core-for-mingo-5.8-20200420' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo:

kernel + tools/perf:

  Alexey Budankov:

  - Introduce CAP_PERFMON to kernel and user space.

callchains:

  Adrian Hunter:

  - Allow using Intel PT to synthesize callchains for regular events.

  Kan Liang:

  - Stitch LBR records from multiple samples to get deeper backtraces,
    there are caveats, see the csets for details.

perf script:

  Andreas Gerstmayr:

  - Add flamegraph.py script

BPF:

  Jiri Olsa:

  - Synthesize bpf_trampoline/dispatcher ksymbol events.

perf stat:

  Arnaldo Carvalho de Melo:

  - Honour --timeout for forked workloads.

  Stephane Eranian:

  - Force error in fallback on :k events, to avoid counting nothing when
    the user asks for kernel events but is not allowed to.

perf bench:

  Ian Rogers:

  - Add event synthesis benchmark.

tools api fs:

  Stephane Eranian:

  - Make xxx__mountpoint() more scalable.

libtraceevent:

  He Zhe:

  - Handle return value of asprintf.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
parents 18bf3408 12e89e65
+61 −25
.. _perf_security:

Perf Events and tool security
Perf events and tool security
=============================

Overview
@@ -42,11 +42,11 @@ categories:
Data that belong to the fourth category can potentially contain
sensitive process data. If PMUs in some monitoring modes capture values
of execution context registers or data from process memory then access
to such monitoring capabilities requires to be ordered and secured
properly. So, perf_events/Perf performance monitoring is the subject for
security access control management [5]_ .
to such monitoring modes requires to be ordered and secured properly.
So, perf_events performance monitoring and observability operations are
the subject for security access control management [5]_ .

perf_events/Perf access control
perf_events access control
-------------------------------

To perform security checks, the Linux implementation splits processes
@@ -66,11 +66,25 @@ into distinct units, known as capabilities [6]_ , which can be
independently enabled and disabled on per-thread basis for processes and
files of unprivileged users.

Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated
Unprivileged processes with enabled CAP_PERFMON capability are treated
as privileged processes with respect to perf_events performance
monitoring and bypass *scope* permissions checks in the kernel.

Unprivileged processes using perf_events system call API is also subject
monitoring and observability operations, thus, bypass *scope* permissions
checks in the kernel. CAP_PERFMON implements the principle of least
privilege [13]_ (POSIX 1003.1e: 2.2.2.39) for performance monitoring and
observability operations in the kernel and provides a secure approach to
performance monitoring and observability in the system.

For backward compatibility reasons the access to perf_events monitoring and
observability operations is also open for CAP_SYS_ADMIN privileged
processes but CAP_SYS_ADMIN usage for secure monitoring and observability
use cases is discouraged with respect to the CAP_PERFMON capability.
If system audit records [14]_ for a process using perf_events system call
API contain denial records of acquiring both CAP_PERFMON and CAP_SYS_ADMIN
capabilities then providing the process with CAP_PERFMON capability singly
is recommended as the preferred secure approach to resolve double access
denial logging related to usage of performance monitoring and observability.

Unprivileged processes using perf_events system call are also subject
for PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose
outcome determines whether monitoring is permitted. So unprivileged
processes provided with CAP_SYS_PTRACE capability are effectively
@@ -82,14 +96,14 @@ performance analysis of monitored processes or a system. For example,
CAP_SYSLOG capability permits reading kernel space memory addresses from
/proc/kallsyms file.

perf_events/Perf privileged users
Privileged Perf users groups
---------------------------------

Mechanisms of capabilities, privileged capability-dumb files [6]_ and
file system ACLs [10]_ can be used to create a dedicated group of
perf_events/Perf privileged users who are permitted to execute
performance monitoring without scope limits. The following steps can be
taken to create such a group of privileged Perf users.
file system ACLs [10]_ can be used to create dedicated groups of
privileged Perf users who are permitted to execute performance monitoring
and observability without scope limits. The following steps can be
taken to create such groups of privileged Perf users.

1. Create perf_users group of privileged Perf users, assign perf_users
   group to Perf tool executable and limit access to the executable for
@@ -108,30 +122,51 @@ taken to create such a group of privileged Perf users.
   -rwxr-x---  2 root perf_users  11M Oct 19 15:12 perf

2. Assign the required capabilities to the Perf tool executable file and
   enable members of perf_users group with performance monitoring
   enable members of perf_users group with monitoring and observability
   privileges [6]_ :

::

   # setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf
   # setcap -v "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf
   # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
   # setcap -v "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
   perf: OK
   # getcap perf
   perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep
   perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep

If the libcap installed doesn't yet support "cap_perfmon", use "38" instead,
i.e.:

::

   # setcap "38,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf

Note that you may need to have 'cap_ipc_lock' in the mix for tools such as
'perf top', alternatively use 'perf top -m N', to reduce the memory that
it uses for the perf ring buffer, see the memory allocation section below.

Using a libcap without support for CAP_PERFMON will make cap_get_flag(caps, 38,
CAP_EFFECTIVE, &val) fail, which will lead the default event to be 'cycles:u',
so as a workaround explicitly ask for the 'cycles' event, i.e.:

::

  # perf top -e cycles

To get kernel and user samples with a perf binary with just CAP_PERFMON.
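
When the installed libcap cannot name CAP_PERFMON, one way to confirm that a process really holds capability 38 is to inspect the effective mask directly. The following is a minimal sketch, not part of this commit: it assumes the caller has already read the text of /proc/<pid>/status into a buffer, and the helper name capeff_has() is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Capability number quoted in the text above; older libcap releases
 * may not know the symbolic name "cap_perfmon". */
#define CAP_PERFMON_NR 38

/*
 * Hypothetical helper: return 1 if capability `cap` is set in the
 * "CapEff:" line of `status` (the text of /proc/<pid>/status), 0 if it
 * is clear, and -1 if no such line could be parsed.
 */
int capeff_has(const char *status, int cap)
{
	const char *p;
	unsigned long long mask;

	assert(status != NULL);
	assert(cap >= 0 && cap < 64);

	p = strstr(status, "CapEff:");
	if (!p)
		return -1;
	p += strlen("CapEff:");

	/* Skip the separator, but never run past the end of the line. */
	while (*p == ' ' || *p == '\t')
		p++;
	if (!isxdigit((unsigned char)*p))
		return -1;

	/* The mask is printed as a hexadecimal number. */
	mask = strtoull(p, NULL, 16);
	return (int)((mask >> cap) & 1ULL);
}
```

A caller would typically pass the contents of /proc/self/status and CAP_PERFMON_NR; a result of 1 indicates the bit for capability 38 is present in the effective set.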

As a result, members of perf_users group are capable of conducting
performance monitoring by using functionality of the configured Perf
tool executable that, when executes, passes perf_events subsystem scope
checks.
performance monitoring and observability by using functionality of the
configured Perf tool executable that, when executes, passes perf_events
subsystem scope checks.

This specific access control management is only available to superuser
or root running processes with CAP_SETPCAP, CAP_SETFCAP [6]_
capabilities.

perf_events/Perf unprivileged users
Unprivileged users
-----------------------------------

perf_events/Perf *scope* and *access* control for unprivileged processes
perf_events *scope* and *access* control for unprivileged processes
is governed by perf_event_paranoid [2]_ setting:

-1:
@@ -166,7 +201,7 @@ is governed by perf_event_paranoid [2]_ setting:
     perf_event_mlock_kb locking limit is imposed but ignored for
     unprivileged processes with CAP_IPC_LOCK capability.

perf_events/Perf resource control
Resource control
---------------------------------

Open file descriptors
@@ -227,4 +262,5 @@ Bibliography
.. [10] `<http://man7.org/linux/man-pages/man5/acl.5.html>`_
.. [11] `<http://man7.org/linux/man-pages/man2/getrlimit.2.html>`_
.. [12] `<http://man7.org/linux/man-pages/man5/limits.conf.5.html>`_
.. [13] `<https://sites.google.com/site/fullycapable>`_
.. [14] `<http://man7.org/linux/man-pages/man8/auditd.8.html>`_
+11 −5
@@ -721,7 +721,13 @@ perf_event_paranoid
===================

Controls use of the performance events system by unprivileged
users (without CAP_SYS_ADMIN).  The default value is 2.
users (without CAP_PERFMON).  The default value is 2.

For backward compatibility reasons access to system performance
monitoring and observability remains open for CAP_SYS_ADMIN
privileged processes but CAP_SYS_ADMIN usage for secure system
performance monitoring and observability operations is discouraged
with respect to CAP_PERFMON use cases.

===  ==================================================================
 -1  Allow use of (almost) all events by all users.
@@ -730,13 +736,13 @@ users (without CAP_SYS_ADMIN). The default value is 2.
     ``CAP_IPC_LOCK``.

>=0  Disallow ftrace function tracepoint by users without
     ``CAP_SYS_ADMIN``.
     ``CAP_PERFMON``.

     Disallow raw tracepoint access by users without ``CAP_SYS_ADMIN``.
     Disallow raw tracepoint access by users without ``CAP_PERFMON``.

>=1  Disallow CPU event access by users without ``CAP_SYS_ADMIN``.
>=1  Disallow CPU event access by users without ``CAP_PERFMON``.

>=2  Disallow kernel profiling by users without ``CAP_SYS_ADMIN``.
>=2  Disallow kernel profiling by users without ``CAP_PERFMON``.
===  ==================================================================
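
The table above can be read as a set of thresholds: each class of operation becomes unavailable to users without the capability once perf_event_paranoid reaches a given value. The following is a small user-space model of that reading only, not kernel code; the enum and function names are hypothetical, and `has_cap_perfmon` stands for "holds CAP_PERFMON, or CAP_SYS_ADMIN for backward compatibility".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the operation classes listed in the table. */
enum perf_op {
	PERF_OP_RAW_TRACEPOINT,   /* ftrace function / raw tracepoint access */
	PERF_OP_CPU_EVENT,        /* CPU event access */
	PERF_OP_KERNEL_PROFILING, /* kernel profiling */
};

/* Lowest perf_event_paranoid value at which each class is disallowed. */
int paranoid_threshold(enum perf_op op)
{
	switch (op) {
	case PERF_OP_RAW_TRACEPOINT:
		return 0;
	case PERF_OP_CPU_EVENT:
		return 1;
	case PERF_OP_KERNEL_PROFILING:
		return 2;
	}
	assert(!"unknown perf_op");
	return 0;
}

/* True if the table permits `op` at the given paranoid level. */
bool paranoid_permits(int paranoid, enum perf_op op, bool has_cap_perfmon)
{
	if (has_cap_perfmon)
		return true;
	return paranoid < paranoid_threshold(op);
}
```

With the default value of 2, an unprivileged caller in this model is denied all three classes, while a value of 1 re-enables kernel profiling only.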


+1 −1
@@ -300,7 +300,7 @@ static ssize_t perf_write(struct file *file, const char __user *buf,
	else
		return -EFAULT;

	if (!capable(CAP_SYS_ADMIN))
	if (!perfmon_capable())
		return -EACCES;

	if (count != sizeof(uint32_t))
+2 −2
@@ -976,7 +976,7 @@ static int thread_imc_event_init(struct perf_event *event)
	if (event->attr.type != event->pmu->type)
		return -ENOENT;

	if (!capable(CAP_SYS_ADMIN))
	if (!perfmon_capable())
		return -EACCES;

	/* Sampling not supported */
@@ -1412,7 +1412,7 @@ static int trace_imc_event_init(struct perf_event *event)
	if (event->attr.type != event->pmu->type)
		return -ENOENT;

	if (!capable(CAP_SYS_ADMIN))
	if (!perfmon_capable())
		return -EACCES;

	/* Return if this is a counting event */
+6 −7
@@ -3390,10 +3390,10 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf,
	/* Similar to perf's kernel.perf_paranoid_cpu sysctl option
	 * we check a dev.i915.perf_stream_paranoid sysctl option
	 * to determine if it's ok to access system wide OA counters
	 * without CAP_SYS_ADMIN privileges.
	 * without CAP_PERFMON or CAP_SYS_ADMIN privileges.
	 */
	if (privileged_op &&
	    i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
	    i915_perf_stream_paranoid && !perfmon_capable()) {
		DRM_DEBUG("Insufficient privileges to open i915 perf stream\n");
		ret = -EACCES;
		goto err_ctx;
@@ -3586,9 +3586,8 @@ static int read_properties_unlocked(struct i915_perf *perf,
			} else
				oa_freq_hz = 0;

			if (oa_freq_hz > i915_oa_max_sample_rate &&
			    !capable(CAP_SYS_ADMIN)) {
				DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without root privileges\n",
			if (oa_freq_hz > i915_oa_max_sample_rate && !perfmon_capable()) {
				DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without CAP_PERFMON or CAP_SYS_ADMIN privileges\n",
					  i915_oa_max_sample_rate);
				return -EACCES;
			}
@@ -4009,7 +4008,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
		return -EINVAL;
	}

	if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
	if (i915_perf_stream_paranoid && !perfmon_capable()) {
		DRM_DEBUG("Insufficient privileges to add i915 OA config\n");
		return -EACCES;
	}
@@ -4156,7 +4155,7 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
		return -ENOTSUPP;
	}

	if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
	if (i915_perf_stream_paranoid && !perfmon_capable()) {
		DRM_DEBUG("Insufficient privileges to remove i915 OA config\n");
		return -EACCES;
	}
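
Every hunk above swaps capable(CAP_SYS_ADMIN) for perfmon_capable(). The definition of that helper is not among the hunks shown here; based on the documentation changes and the updated i915 comment ("without CAP_PERFMON or CAP_SYS_ADMIN privileges"), the sketch below assumes it accepts either capability. It is a user-space model over an effective-capability bitmask, with hypothetical function names, and not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAP_SYS_ADMIN_NR 21 /* value from linux/capability.h */
#define CAP_PERFMON_NR   38 /* value quoted in the documentation hunk */

/* True if bit `cap` is set in the effective capability mask. */
bool cap_in_mask(uint64_t effective, int cap)
{
	assert(cap >= 0 && cap < 64);
	return ((effective >> cap) & UINT64_C(1)) != 0;
}

/*
 * Assumed semantics: CAP_PERFMON is the preferred capability, and
 * CAP_SYS_ADMIN is still honoured for backward compatibility.
 */
bool model_perfmon_capable(uint64_t effective)
{
	return cap_in_mask(effective, CAP_PERFMON_NR) ||
	       cap_in_mask(effective, CAP_SYS_ADMIN_NR);
}
```

Under this model, a process holding only CAP_PERFMON passes every converted check, which is what allows the perf binary to be granted cap_perfmon instead of cap_sys_admin.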