Commit 56b2147f authored by Ingo Molnar's avatar Ingo Molnar
Browse files

Merge tag 'perf-core-for-mingo-5.5-20191107' of...

Merge tag 'perf-core-for-mingo-5.5-20191107' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

 into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

perf report:

  Jin Yao:

  - Introduce --total-cycles, for basic block profiling, further using data
    obtained from LBR, an example should suffice:

      # perf record -b
      ^C[ perf record: Woken up 595 times to write data ]
      [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]

      # perf evlist -v
      cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY

      # perf report --total-cycles --stdio
      # To display the perf.data header info, please use --header/--header-only options.
      #
      # Total Lost Samples: 0
      #
      # Samples: 6M of event 'cycles'
      # Event count (approx.): 6299936
      #
      # Sampled  Sampled   Avg     Avg
      # Cycles%  Cycles  Cycles%  Cycles                 [Program Block Range]     Shared Object
      # .......  ......  .......  .....   ....................................  ................
      #
         2.17%     1.7M   0.08%     607       [compiler.h:199 -> common.c:221]  [kernel.vmlinux]
         0.72%   544.5K   0.03%     230     [entry_64.S:657 -> entry_64.S:662]  [kernel.vmlinux]
         0.56%   541.8K   0.09%     672       [compiler.h:199 -> common.c:300]  [kernel.vmlinux]
         0.39%   293.2K   0.01%     104   [list_debug.c:43 -> list_debug.c:61]  [kernel.vmlinux]
         0.36%   278.6K   0.03%     272   [entry_64.S:1289 -> entry_64.S:1308]  [kernel.vmlinux]

perf record:

  Adrian Hunter:

  - Allow storing perf.data in a directory together with a copy of /proc/kcore.

  Jiwei Sun:

  - Add support for limit perf output file size, i.e.:

    # perf record --all-cpus -F 10000 --max-size=4M sleep 10h
    [ perf record: perf size limit reached (4097 KB), stopping session ]
    [ perf record: Woken up 6 times to write data ]
    [ perf record: Captured and wrote 4.048 MB perf.data (54094 samples) ]
    Terminated
    # ls -lah perf.data
    -rw-------. 1 root root 4.1M Nov  7 15:27 perf.data
    #

perf stat:

  Jiri Olsa:

  - Add --per-node agregation support:

    In live mode:

      # perf stat  -a -I 1000 -e cycles --per-node
      #           time node   cpus             counts unit events
           1.000542550 N0       20          6,202,097      cycles
           1.000542550 N1       20            639,559      cycles
           2.002040063 N0       20          7,412,495      cycles
           2.002040063 N1       20          2,185,577      cycles
           3.003451699 N0       20          6,508,917      cycles
           3.003451699 N1       20            765,607      cycles
      ...

    Or in the record/report stat session:

      # perf stat record -a -I 1000 -e cycles
      #           time             counts unit events
           1.000536937         10,008,468      cycles
           2.002090152          9,578,539      cycles
           3.003625233          7,647,869      cycles
           4.005135036          7,032,086      cycles
      ^C     4.340902364          3,923,893      cycles

      # perf stat report --per-node
      #           time node   cpus             counts unit events
           1.000536937 N0       20          9,355,086      cycles
           1.000536937 N1       20            653,382      cycles
           2.002090152 N0       20          7,712,838      cycles
           2.002090152 N1       20          1,865,701      cycles
       ...

perf probe:

  Masami Hiramatsu:

  Various fixes related to recent additions to the DWARF format:

  - Fix to find range-only function instance

  - Walk function lines in lexical blocks

  - Fix to show function entry line as probe-able

  - Fix wrong address verification

  - Fix to probe a function which has no entry pc

  - Fix to probe an inline function which has no entry pc

  - Fix to list probe event with correct line number

  - Fix to show inlined function callsite without entry_pc

  - Fix to show ranges of variables in functions without entry_pc

  - Return a better scope DIE if there is no best scope

  - Skip end-of-sequence and non statement lines

  - Filter out instances except for inlined subroutine and subprogram

  - Fix to show calling lines of inlined functions

  - Skip overlapped location on searching variables

perf inject:

  Adrian Hunter:

  - Do not strip evsels with --strip, as they are needed for create_gcov
    (see the autofdo example in tools/perf/Documentation/intel-pt.txt).

Intel PT:

  Adrian Hunter:

  - Intel PT uses an auxtrace_cache to store the results of code-walking, to avoid
    repeated decoding. Add an auxtrace_cache__remove to handle text poke events.

core:

  Andi Kleen:

  - Always preserve errno while cleaning up perf_event_open failures.

llvm:

  Arnaldo Carvalho de Melo:

  - No need to tell that the request for saving a .o file for BPF events, as
    expressed in ~/.perfconfig was satisfied, make that a debug message.

perf vendor events:

Intel:

  Haiyan Song:

  - Update CascadelakeX events to v1.05.

  - Update all the Intel JSON metrics from TMAM 3.6.

Treewide:

  Ian Rogers:

  - Improve error paths, plugging leaks found using LLVM tools
    such as libFuzzer.

jevents:

  Yunfeng Ye:

  - Fix resource leak in process_mapfile() and main()

perf kvm:

  Igor Lubashev:

  - Use evlist layer api when possible.

libsubcmd:

  James Clark:

  - Move EXTRA_FLAGS to the end to allow overriding existing flags.

  - Use -O0 with DEBUG=1

perf diff:

  Jin Yao:

  - Don't use hack to skip column length calculation

CoreSight ETM:

  Leo yan:

  - Fix definition of macro TO_CS_QUEUE_NR

ARM64:

  John Garry:

  - Do not try to include libelf header files when its feature detection
    failed, fixing the cross build for ARM64.

perf tests:

  Leo Yan:

  - Fix out of bounds memory access in the backward ring buffer test.

Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
parents 8f05c1ff 7fa46cbf
Loading
Loading
Loading
Loading
+6 −3
Original line number Diff line number Diff line
@@ -19,8 +19,7 @@ MAKEFLAGS += --no-print-directory

LIBFILE = $(OUTPUT)libsubcmd.a

CFLAGS := $(EXTRA_WARNINGS) $(EXTRA_CFLAGS)
CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -fPIC
CFLAGS := -ggdb3 -Wall -Wextra -std=gnu99 -fPIC

ifeq ($(DEBUG),0)
  ifeq ($(feature-fortify-source), 1)
@@ -28,7 +27,9 @@ ifeq ($(DEBUG),0)
  endif
endif

ifeq ($(CC_NO_CLANG), 0)
ifeq ($(DEBUG),1)
  CFLAGS += -O0
else ifeq ($(CC_NO_CLANG), 0)
  CFLAGS += -O3
else
  CFLAGS += -O6
@@ -43,6 +44,8 @@ CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE

CFLAGS += -I$(srctree)/tools/include/

CFLAGS += $(EXTRA_WARNINGS) $(EXTRA_CFLAGS)

SUBCMD_IN := $(OUTPUT)libsubcmd-in.o

all:
+7 −0
Original line number Diff line number Diff line
@@ -571,6 +571,13 @@ config terms. For example: 'cycles/overwrite/' and 'instructions/no-overwrite/'.

Implies --tail-synthesize.

--kcore::
Make a copy of /proc/kcore and place it into a directory with the perf data file.

--max-size=<size>::
Limit the sample data max size, <size> is expected to be a number with
appended unit character - B/K/M/G

SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-list[1]
+11 −0
Original line number Diff line number Diff line
@@ -525,6 +525,17 @@ include::itrace.txt[]
	Configure time quantum for time sort key. Default 100ms.
	Accepts s, us, ms, ns units.

--total-cycles::
	When --total-cycles is specified, it supports sorting for all blocks by
	'Sampled Cycles%'. This is useful to concentrate on the globally hottest
	blocks. In output, there are some new columns:

	'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles
	'Sampled Cycles'  - block sampled cycles aggregation
	'Avg Cycles%'     - block average sampled cycles / sum of total block average
			    sampled cycles
	'Avg Cycles'      - block average sampled cycles

include::callchain-overhead-calculation.txt[]

SEE ALSO
+5 −0
Original line number Diff line number Diff line
@@ -217,6 +217,11 @@ core number and the number of online logical processors on that physical process
Aggregate counts per monitored threads, when monitoring threads (-t option)
or processes (-p option).

--per-node::
Aggregate counts per NUMA nodes for system-wide mode measurements. This
is a useful mode to detect imbalance between NUMA nodes. To enable this
mode, use --per-node in addition to -a. (system-wide).

-D msecs::
--delay msecs::
After starting the program, wait msecs before measuring. This is useful to
+63 −0
Original line number Diff line number Diff line
perf.data directory format

DISCLAIMER This is not ABI yet and is subject to possible change
           in following versions of perf. We will remove this
           disclaimer once the directory format soaks in.


This document describes the on-disk perf.data directory format.

The layout is described by HEADER_DIR_FORMAT feature.
Currently it holds only version number (0):

  HEADER_DIR_FORMAT = 24

  struct {
     uint64_t version;
  }

The current only version value 0 means that:
  - there is a single perf.data file named 'data' within the directory.
  e.g.

    $ tree -ps perf.data
    perf.data
    └── [-rw-------       25912]  data

Future versions are expected to describe different data files
layout according to special needs.

Currently the only 'perf record' option to output to a directory is
the --kcore option which puts a copy of /proc/kcore into the directory.
e.g.

  $ sudo perf record --kcore uname
  Linux
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.015 MB perf.data (9 samples) ]
  $ sudo tree -ps perf.data
  perf.data
  ├── [-rw-------       23744]  data
  └── [drwx------        4096]  kcore_dir
      ├── [-r--------     6731125]  kallsyms
      ├── [-r--------    40230912]  kcore
      └── [-r--------        5419]  modules

  1 directory, 4 files
  $ sudo perf script -v
  build id event received for vmlinux: 1eaa285996affce2d74d8e66dcea09a80c9941de
  build id event received for [vdso]: 8bbaf5dc62a9b644b4d4e4539737e104e4a84541
  build id event received for /lib/x86_64-linux-gnu/libc-2.28.so: 5b157f49586a3ca84d55837f97ff466767dd3445
  Samples for 'cycles' event do not have CPU attribute set. Skipping 'cpu' field.
  Using CPUID GenuineIntel-6-8E-A
  Using perf.data/kcore_dir/kcore for kernel data
  Using perf.data/kcore_dir/kallsyms for symbols
              perf 15316 2060795.480902:          1 cycles:  ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
              perf 15316 2060795.480906:          1 cycles:  ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
              perf 15316 2060795.480908:          7 cycles:  ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
              perf 15316 2060795.480910:        119 cycles:  ffffffffa2caa54a native_write_msr+0xa (vmlinux)
              perf 15316 2060795.480912:       2109 cycles:  ffffffffa2c9b7b0 native_apic_msr_write+0x0 (vmlinux)
              perf 15316 2060795.480914:      37606 cycles:  ffffffffa2f121fe perf_event_addr_filters_exec+0x2e (vmlinux)
             uname 15316 2060795.480924:     588287 cycles:  ffffffffa303a56d page_counter_try_charge+0x6d (vmlinux)
             uname 15316 2060795.481067:    2261945 cycles:  ffffffffa301438f kmem_cache_free+0x4f (vmlinux)
             uname 15316 2060795.481643:    2172167 cycles:      7f1a48c393c0 _IO_un_link+0x0 (/lib/x86_64-linux-gnu/libc-2.28.so)
Loading