Commit 8cacac6e authored by Ingo Molnar

Merge tag 'perf-core-for-mingo-5.5-20191122' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

perf report:

  Jin Yao:

  - Allow entering the annotation view (symbol source/assembly +
    overhead/cycles/etc column) from the 'perf report --total-cycles'
    interface.

    E.g.:

      # perf record --all-cpus --branch-any --all-kernel
      ^C[ perf record: Woken up 5 times to write data ]
      #
      # perf evlist -v
      cycles: size: 120, { sample_period, sample_freq }: 4000,
      sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK,
      read_format: ID, disabled: 1, inherit: 1, exclude_user: 1, mmap: 1, comm: 1, freq: 1, task: 1,
      precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1,
      bpf_event: 1, branch_sample_type: ANY
      #
      # perf report --total-cycles
      #
      # Samples: 78762 of event 'cycles'
      Sampled  Sampled Avg      Avg
      Cycles%  Cycles  Cycles%  Cycles                           [Program Block Range]     Shared Object
        1.72%    95.8K   0.00%     254                        [msr.h:105 -> msr.h:166]  [kernel.vmlinux]
        1.56%   107.6K   0.00%     618                [compiler.h:199 -> common.c:301]  [kernel.vmlinux]
        0.83%    46.3K   0.00%     409              [entry_64.S:153 -> entry_64.S:175]  [kernel.vmlinux]
        0.83%    46.1K   0.00%      83                  [jump_label.h:41 -> tsc.c:230]  [kernel.vmlinux]
        0.64%    36.9K   0.01%    1.4K            [hda_intel.c:904 -> hda_intel.c:916]   [snd_hda_intel]
        0.57%    30.2K   0.00%     282                      [file.c:710 -> file.c:730]  [kernel.vmlinux]
        0.48%    25.8K   0.00%      82              [spinlock.c:158 -> spinlock.c:160]  [kernel.vmlinux]
        0.45%    23.7K   0.00%     369  [tick-broadcast.c:585 -> tick-broadcast.c:586]  [kernel.vmlinux]
        0.44%    24.4K   0.00%      73                       [msr.h:236 -> tsc.c:1088]  [kernel.vmlinux]
        0.43%    22.7K   0.00%     144                [cpuidle.c:229 -> cpuidle.c:232]  [kernel.vmlinux]

    Then press 'A' or Enter on one of those lines, just like with 'perf top',
    say on the top one, [msr.h:105 -> msr.h:166], and this shows up:

      Samples: 78K of event 'cycles', 4000 Hz, Event count (approx.): 78762
      native_write_msr  /lib/modules/5.4.0-rc8/build/vmlinux [Percent: local period]
      Percent│ IPC Cycle (Average IPC: 0.02, IPC Coverage: 50.0%)
             │
             │             Disassembly of section .text:
             │
             │             ffffffff8106c480 <native_write_msr>:
             │             __wrmsr():
             │             return EAX_EDX_VAL(val, low, high);
             │             }
             │
             │             static inline void notrace __wrmsr(unsigned int msr, u32 low, u32 high)
             │             {
             │             asm volatile("1: wrmsr\n"
       49.16 │0.02           mov   %edi,%ecx
             │0.02           mov   %esi,%eax
             │0.02           wrmsr
             │             arch_static_branch():
             │             #include <linux/stringify.h>
             │             #include <linux/types.h>
             │
             │             static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
             │             {
             │             asm_volatile_goto("1:"
        0.79 │0.02           nop
             │             native_write_msr():
             │             {
             │             __wrmsr(msr, low, high);
             │
             │             if (msr_tracepoint_active(__tracepoint_write_msr))
             │             do_trace_write_msr(msr, ((u64)high << 32 | low), 0);
             │             }
       50.05 │0.02  254    ← retq
             │             do_trace_write_msr(msr, ((u64)high << 32 | low), 0);
             │               shl   $0x20,%rdx
             │               mov   %esi,%esi
             │               or    %rdx,%rsi
             │               xor   %edx,%edx
             │             → jmpq  do_trace_write_msr

    This still needs improving: the annotation view should show the source
    code line numbers, so that one can go from a program block to the
    annotation view and see those line numbers straight away.

auxtrace/Intel PT:

  Adrian Hunter:

  - Add support for AUX area sampling. This requires new kernel
    functionality that will land in 5.5; it's already in tip.

    This includes querying kernel capabilities so that it fails gracefully
    with older kernels, and dumping AUX area samples in 'perf report -D' and
    'perf script'.

perf.data:

  Alexey Budankov:

  - Fix decompression of PERF_RECORD_COMPRESSED records.

core:

  Arnaldo Carvalho de Melo:

  - Use the 'dcacheline' cmp routine to find the right DSOs, taking into
    account the 'maj', 'min', 'ino' and 'ino_generation' fields, which got
    moved from 'struct map' to 'struct dso', where they belong.

    This further reduces the size of 'struct map'; there is still more work
    to do to try to get it down to at most one cacheline.

libtraceevent:

  Hewenliang:

  - Fix memory leakage in copy_filter_type().

  Sudip Mukherjee:

  - Fix header installation.

perf parse:

  Ian Rogers:

  - Fix potential memory leak when handling tracepoint errors, found using
    LLVM's libFuzzer.

perf probe:

  Colin Ian King:

  - Fix spelling mistake "addrees" -> "address".

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
parents 8f6ee51d 4584f084
+8 −2
@@ -141,8 +141,9 @@ enum perf_event_sample_format {
	PERF_SAMPLE_TRANSACTION			= 1U << 17,
	PERF_SAMPLE_REGS_INTR			= 1U << 18,
	PERF_SAMPLE_PHYS_ADDR			= 1U << 19,
+	PERF_SAMPLE_AUX				= 1U << 20,

-	PERF_SAMPLE_MAX = 1U << 20,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 21,		/* non-ABI */

	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
};
@@ -300,6 +301,7 @@ enum perf_event_read_format {
					/* add: sample_stack_user */
#define PERF_ATTR_SIZE_VER4	104	/* add: sample_regs_intr */
#define PERF_ATTR_SIZE_VER5	112	/* add: aux_watermark */
+#define PERF_ATTR_SIZE_VER6	120	/* add: aux_sample_size */

/*
 * Hardware event_id to monitor via a performance monitoring event:
@@ -424,7 +426,9 @@ struct perf_event_attr {
	 */
	__u32	aux_watermark;
	__u16	sample_max_stack;
-	__u16	__reserved_2;	/* align to __u64 */
+	__u16	__reserved_2;
+	__u32	aux_sample_size;
+	__u32	__reserved_3;
};

/*
@@ -864,6 +868,8 @@ enum perf_event_type {
	 *	{ u64			abi; # enum perf_sample_regs_abi
	 *	  u64			regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
	 *	{ u64			phys_addr;} && PERF_SAMPLE_PHYS_ADDR
+	 *	{ u64			size;
+	 *	  char			data[size]; } && PERF_SAMPLE_AUX
	 * };
	 */
	PERF_RECORD_SAMPLE			= 9,
+4 −4
@@ -232,10 +232,10 @@ install_pkgconfig:

install_headers:
	$(call QUIET_INSTALL, headers) \
-		$(call do_install,event-parse.h,$(DESTDIR)$(includedir_SQ),644); \
-		$(call do_install,event-utils.h,$(DESTDIR)$(includedir_SQ),644); \
-		$(call do_install,trace-seq.h,$(DESTDIR)$(includedir_SQ),644); \
-		$(call do_install,kbuffer.h,$(DESTDIR)$(includedir_SQ),644)
+		$(call do_install,event-parse.h,$(includedir_SQ),644); \
+		$(call do_install,event-utils.h,$(includedir_SQ),644); \
+		$(call do_install,trace-seq.h,$(includedir_SQ),644); \
+		$(call do_install,kbuffer.h,$(includedir_SQ),644)

install: install_lib

+7 −2
@@ -1473,8 +1473,10 @@ static int copy_filter_type(struct tep_event_filter *filter,
	if (strcmp(str, "TRUE") == 0 || strcmp(str, "FALSE") == 0) {
		/* Add trivial event */
		arg = allocate_arg();
-		if (arg == NULL)
+		if (arg == NULL) {
+			free(str);
			return -1;
+		}

		arg->type = TEP_FILTER_ARG_BOOLEAN;
		if (strcmp(str, "TRUE") == 0)
@@ -1483,8 +1485,11 @@ static int copy_filter_type(struct tep_event_filter *filter,
			arg->boolean.value = 0;

		filter_type = add_filter_type(filter, event->id);
-		if (filter_type == NULL)
+		if (filter_type == NULL) {
+			free(str);
+			free_arg(arg);
			return -1;
+		}

		filter_type->filter = arg;

+57 −2
@@ -434,6 +434,56 @@ pwr_evt Enable power events. The power events provide information about
		"0" otherwise.


+AUX area sampling option
+------------------------
+
+To select Intel PT "sampling" the AUX area sampling option can be used:
+
+	--aux-sample
+
+Optionally it can be followed by the sample size in bytes e.g.
+
+	--aux-sample=8192
+
+In addition, the Intel PT event to sample must be defined e.g.
+
+	-e intel_pt//u
+
+Samples on other events will be created containing Intel PT data e.g. the
+following will create Intel PT samples on the branch-misses event, note the
+events must be grouped using {}:
+
+	perf record --aux-sample -e '{intel_pt//u,branch-misses:u}'
+
+An alternative to '--aux-sample' is to add the config term 'aux-sample-size' to
+events.  In this case, the grouping is implied e.g.
+
+	perf record -e intel_pt//u -e branch-misses/aux-sample-size=8192/u
+
+is the same as:
+
+	perf record -e '{intel_pt//u,branch-misses/aux-sample-size=8192/u}'
+
+but allows for also using an address filter e.g.:
+
+	perf record -e intel_pt//u --filter 'filter * @/bin/ls' -e branch-misses/aux-sample-size=8192/u -- ls
+
+It is important to select a sample size that is big enough to contain at least
+one PSB packet.  If not a warning will be displayed:
+
+	Intel PT sample size (%zu) may be too small for PSB period (%zu)
+
+The calculation used for that is: if sample_size <= psb_period + 256 display the
+warning.  When sampling is used, psb_period defaults to 0 (2KiB).
+
+The default sample size is 4KiB.
+
+The sample size is passed in aux_sample_size in struct perf_event_attr.  The
+sample size is limited by the maximum event size which is 64KiB.  It is
+difficult to know how big the event might be without the trace sample attached,
+but the tool validates that the sample size is not greater than 60KiB.
+
+
new snapshot option
-------------------

@@ -487,8 +537,8 @@ their mlock limit (which defaults to 64KiB but is not multiplied by the number
of cpus).

In full-trace mode, powers of two are allowed for buffer size, with a minimum
-size of 2 pages.  In snapshot mode, it is the same but the minimum size is
-1 page.
+size of 2 pages.  In snapshot mode or sampling mode, it is the same but the
+minimum size is 1 page.

The mmap size and auxtrace mmap size are displayed if the -vv option is used e.g.

@@ -501,12 +551,17 @@ Intel PT modes of operation

Intel PT can be used in 2 modes:
	full-trace mode
+	sample mode
	snapshot mode

Full-trace mode traces continuously e.g.

	perf record -e intel_pt//u uname

+Sample mode attaches a Intel PT sample to other events e.g.
+
+	perf record --aux-sample -e intel_pt//u -e branch-misses:u
+
Snapshot mode captures the available data when a signal is sent e.g.

	perf record -v -e intel_pt//u -S ./loopy 1000000000 &
+9 −0
@@ -62,6 +62,9 @@ OPTIONS
		    like this: name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\'.
	  - 'aux-output': Generate AUX records instead of events. This requires
			  that an AUX area event is also provided.
+	  - 'aux-sample-size': Set sample size for AUX area sampling. If the
+	  '--aux-sample' option has been used, set aux-sample-size=0 to disable
+	  AUX area sampling for the event.

          See the linkperf:perf-list[1] man page for more parameters.

@@ -433,6 +436,12 @@ can be specified in a string that follows this option:
In Snapshot Mode trace data is captured only when signal SIGUSR2 is received
and on exit if the above 'e' option is given.

+--aux-sample[=OPTIONS]::
+Select AUX area sampling. At least one of the events selected by the -e option
+must be an AUX area event. Samples on other events will be created containing
+data from the AUX area. Optionally sample size may be specified, otherwise it
+defaults to 4KiB.
+
--proc-map-timeout::
When processing pre-existing threads /proc/XXX/mmap, it may take a long time,
because the file may be huge. A time out is needed in such cases.