Commit 2322d6c5 authored by Linus Torvalds's avatar Linus Torvalds
Browse files

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more perf tooling updates from Thomas Gleixner:
 "Perf tool updates and fixes:

  perf stat:

   - Display user and system time for workload targets (Jiri Olsa)

  perf record:

   - Enable arbitrary event names thru name= modifier (Alexey Budankov)

  PowerPC:

   - Add a python script for hypervisor call statistics (Ravi Bangoria)

  Intel PT: (Adrian Hunter)

   - Fix sync_switch INTEL_PT_SS_NOT_TRACING

   - Fix decoding to accept CBR between FUP and corresponding TIP

   - Fix MTC timing after overflow

   - Fix "Unexpected indirect branch" error

  perf test:

   - record+probe_libc_inet_pton:
      - To get the symbol table for dynamic shared objects on ubuntu we
        need to pass the -D/--dynamic command line option, unlike with
        the fedora distros (Arnaldo Carvalho de Melo)

   - code-reading:
      - Fix perf_env setup for PTI entry trampolines (Adrian Hunter)

   - kmod-path:
      - Add tests for vdso32 and vdsox32 (Adrian Hunter)

   - Use header file util/debug.h (Thomas Richter)

  perf annotate:

   - Make the various UI backends (stdio, TUI, gtk) use more
     consistently structs with annotation options as specified by the
     user (Arnaldo Carvalho de Melo)

   - Move annotation specific knobs from the symbol_conf global kitchen
     sink to the annotation option structs (Arnaldo Carvalho de Melo)

  perf script:

   - Add more PMU fields to python scripts event handler dict (Jin Yao)

  Core:

   - Fix misleading error for some unparsable events mentioning PMUs
     when those are not involved in the problem (Jiri Olsa)

   - Consider BSS symbols when processing /proc/kallsyms ('B' and 'b')
     (Arnaldo Carvalho de Melo)

   - Be more robust when trying to use per-symbol histograms, checking
     for unlikely but possible cases where the space for the histograms
     wasn't allocated, print a debug message for such cases (Arnaldo
     Carvalho de Melo)

   - Fix symbol and object code resolution for vdso32 and vdsox32
     (Adrian Hunter)

   - No need to check for null when passing pointers to foo__get() style
     refcount grabbing helpers, just like in the kernel and with free(),
     its safe to pass a NULL pointer to avoid having to check it before
     each and every foo__get() call (Arnaldo Carvalho de Melo)

   - Remove some dead code (quote.[ch]) (Arnaldo Carvalho de Melo)

   - Remove some needless globals, making them local (Arnaldo Carvalho
     de Melo)

   - Reduce usage of symbol_conf.use_callchain, using other means of
     finding out if callchains are in use or available for specific
     events, as we evolved this codebase to allow requesting callchains
     for just a subset of the monitored events. In time it will help
     polish recording and showing mixed sets accross the various tools:

        perf record -e cycles/call-graph=fp/,cache-misses/call-graph=dwarf/,instructions'

     (Arnaldo Carvalho de Melo)

   - Consider PTI entry trampolines in map__rip_2objdump() (Adrian
     Hunter)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  perf script python: Add dict fields introduction to Documentation
  perf script python: Add more PMU fields to event handler dict
  perf script python: Move dsoname code to a new function
  perf symbols: Add BSS symbols when reading from /proc/kallsyms
  perf annnotate: Make __symbol__inc_addr_samples handle src->histograms == NULL
  perf intel-pt: Fix "Unexpected indirect branch" error
  perf intel-pt: Fix MTC timing after overflow
  perf intel-pt: Fix decoding to accept CBR between FUP and corresponding TIP
  perf intel-pt: Fix sync_switch INTEL_PT_SS_NOT_TRACING
  perf script powerpc: Python script for hypervisor call statistics
  perf test record+probe_libc_inet_pton: Ask 'nm' for dynamic symbols
  perf map: Consider PTI entry trampolines in rip_2objdump()
  perf test code-reading: Fix perf_env setup for PTI entry trampolines
  perf tools: Fix pmu events parsing rule
  perf stat: Display user and system time
  perf record: Enable arbitrary event names thru name= modifier
  perf tools: Fix symbol and object code resolution for vdso32 and vdsox32
  perf tests kmod-path: Add tests for vdso32 and vdsox32
  perf hists: Check if a hist_entry has callchains before using them
  perf hists: Introduce hist_entry__has_callchain() method
  ...
parents 9f3fbe85 2696ec45
Loading
Loading
Loading
Loading
+5 −1
Original line number Diff line number Diff line
@@ -124,7 +124,11 @@ The available PMUs and their raw parameters can be listed with
For example the raw event "LSD.UOPS" core pmu event above could
be specified as

  perf stat -e cpu/event=0xa8,umask=0x1,name=LSD.UOPS_CYCLES,cmask=1/ ...
  perf stat -e cpu/event=0xa8,umask=0x1,name=LSD.UOPS_CYCLES,cmask=0x1/ ...

  or using extended name syntax

  perf stat -e cpu/event=0xa8,umask=0x1,cmask=0x1,name=\'LSD.UOPS_CYCLES:cmask=0x1\'/ ...

PER SOCKET PMUS
---------------
+3 −0
Original line number Diff line number Diff line
@@ -57,6 +57,9 @@ OPTIONS
			 FP mode, "dwarf" for DWARF mode, "lbr" for LBR mode and
			 "no" for disable callgraph.
	  - 'stack-size': user stack size for dwarf mode
	  - 'name' : User defined event name. Single quotes (') may be used to
		    escape symbols in the name from parsing by shell and tool
		    like this: name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\'.

          See the linkperf:perf-list[1] man page for more parameters.

+26 −0
Original line number Diff line number Diff line
@@ -610,6 +610,32 @@ Various utility functions for use with perf script:
  nsecs_str(nsecs) - returns printable string in the form secs.nsecs
  avg(total, n) - returns average given a sum and a total number of values

SUPPORTED FIELDS
----------------

Currently supported fields:

ev_name, comm, pid, tid, cpu, ip, time, period, phys_addr, addr,
symbol, dso, time_enabled, time_running, values, callchain,
brstack, brstacksym, datasrc, datasrc_decode, iregs, uregs,
weight, transaction, raw_buf, attr.

Some fields have sub items:

brstack:
    from, to, from_dsoname, to_dsoname, mispred,
    predicted, in_tx, abort, cycles.

brstacksym:
    items: from, to, pred, in_tx, abort (converted string)

For example,
We can use this code to print brstack "from", "to", "cycles".

if 'brstack' in dict:
	for entry in dict['brstack']:
		print "from %s, to %s, cycles %s" % (entry["from"], entry["to"], entry["cycles"])

SEE ALSO
--------
linkperf:perf-script[1]
+29 −11
Original line number Diff line number Diff line
@@ -310,20 +310,38 @@ Users who wants to get the actual value can apply --no-metric-only.
EXAMPLES
--------

$ perf stat -- make -j
$ perf stat -- make

 Performance counter stats for 'make -j':
   Performance counter stats for 'make':

    8117.370256  task clock ticks     #      11.281 CPU utilization factor
            678  context switches     #       0.000 M/sec
            133  CPU migrations       #       0.000 M/sec
         235724  pagefaults           #       0.029 M/sec
    24821162526  CPU cycles           #    3057.784 M/sec
    18687303457  instructions         #    2302.138 M/sec
      172158895  cache references     #      21.209 M/sec
       27075259  cache misses         #       3.335 M/sec
        83723.452481      task-clock:u (msec)       #    1.004 CPUs utilized
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
           3,228,188      page-faults:u             #    0.039 M/sec
     229,570,665,834      cycles:u                  #    2.742 GHz
     313,163,853,778      instructions:u            #    1.36  insn per cycle
      69,704,684,856      branches:u                #  832.559 M/sec
       2,078,861,393      branch-misses:u           #    2.98% of all branches

 Wall-clock time elapsed:   719.554352 msecs
        83.409183620 seconds time elapsed

        74.684747000 seconds user
         8.739217000 seconds sys

TIMINGS
-------
As displayed in the example above we can display 3 types of timings.
We always display the time the counters were enabled/alive:

        83.409183620 seconds time elapsed

For workload sessions we also display time the workloads spent in
user/system lands:

        74.684747000 seconds user
         8.739217000 seconds sys

Those times are the very same as displayed by the 'time' tool.

CSV FORMAT
----------
+2 −2
Original line number Diff line number Diff line
@@ -189,7 +189,7 @@ out_error:
	return -1;
}

int perf_env__lookup_objdump(struct perf_env *env)
int perf_env__lookup_objdump(struct perf_env *env, const char **path)
{
	/*
	 * For live mode, env->arch will be NULL and we can use
@@ -198,5 +198,5 @@ int perf_env__lookup_objdump(struct perf_env *env)
	if (env->arch == NULL)
		return 0;

	return perf_env__lookup_binutils_path(env, "objdump", &objdump_path);
	return perf_env__lookup_binutils_path(env, "objdump", path);
}
Loading