Commit 3071f13d authored by Agustin Vega-Frias, committed by Will Deacon

perf: qcom: Add L3 cache PMU driver



This adds a new dynamic PMU to the Perf Events framework to program
and control the L3 cache PMUs in some Qualcomm Technologies SOCs.

The driver supports a distributed cache architecture where the overall
cache for a socket is comprised of multiple slices each with its own PMU.
Access to each individual PMU is provided even though all CPUs share all
the slices. User space needs to aggregate the individual counts to provide
a global picture.

The driver exports formatting and event information to sysfs so it can
be used by the perf user space tools with the syntaxes:
   perf stat -a -e l3cache_0_0/read-miss/
   perf stat -a -e l3cache_0_0/event=0x21/

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Agustin Vega-Frias <agustinv@codeaurora.org>
[will: fixed sparse issues]
Signed-off-by: Will Deacon <will.deacon@arm.com>
parent c09adab0
+25 −0
Qualcomm Datacenter Technologies L3 Cache Performance Monitoring Unit (PMU)
===========================================================================

This driver supports the L3 cache PMUs found in Qualcomm Datacenter Technologies
Centriq SoCs. The L3 cache on these SOCs is composed of multiple slices, shared
by all cores within a socket. Each slice is exposed as a separate uncore perf
PMU with device name l3cache_<socket>_<instance>. User space is responsible
for aggregating across slices.
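
Since aggregation is left to user space, a minimal sketch of one way to do it
follows. It is not part of the patch. It assumes the counts were collected with
"perf stat -a -x, -e ..." and that each CSV line carries the counter value in
the first field and the event name in the third; the sample lines are
hypothetical, and event specifications containing commas (such as the "lc"
flag) would need a different field separator.

```python
import re
from collections import defaultdict

# PMU naming taken from the documentation: l3cache_<socket>_<instance>.
EVENT_RE = re.compile(r"^l3cache_(\d+)_(\d+)/([^/]+)/$")


def aggregate_slices(csv_lines):
    """Sum per-slice counts into one total per (socket, event)."""
    totals = defaultdict(int)
    for line in csv_lines:
        fields = line.strip().split(",")
        if len(fields) < 3:
            continue
        match = EVENT_RE.match(fields[2])
        if match is None or not fields[0].isdigit():
            # Skip unrelated events and values such as "<not counted>".
            continue
        socket, _instance, event = match.groups()
        totals[(int(socket), event)] += int(fields[0])
    return dict(totals)


# Hypothetical sample lines, not real measurements.
sample = [
    "1200,,l3cache_0_0/read-miss/,1000000,100.00,,",
    "800,,l3cache_0_1/read-miss/,1000000,100.00,,",
    "<not counted>,,l3cache_0_2/read-miss/,0,0.00,,",
    "500,,l3cache_1_0/read-miss/,1000000,100.00,,",
]

for (socket, event), count in sorted(aggregate_slices(sample).items()):
    print(f"socket {socket} {event}: {count}")
```

Keying the totals by socket keeps the per-socket picture intact; summing those
totals again gives a system-wide figure if that is what is wanted.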

The driver provides a description of its available events and configuration
options in sysfs, see /sys/devices/l3cache*. Given that these are uncore PMUs
the driver also exposes a "cpumask" sysfs attribute which contains a mask
consisting of one CPU per socket which will be used to handle all the PMU
events on that socket.
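
The sysfs layout described above can be inspected from user space. The sketch
below is illustrative only and not part of the patch: it assumes each PMU
directory holds a "cpumask" file and an "events" subdirectory, as is common for
perf PMUs, and the helper name discover_l3_pmus is made up for this example.

```python
import glob
import os


def discover_l3_pmus(sysfs_root="/sys/devices"):
    """Return {pmu_name: {"cpumask": str or None, "events": [names]}}."""
    pmus = {}
    for path in sorted(glob.glob(os.path.join(sysfs_root, "l3cache*"))):
        info = {"cpumask": None, "events": []}

        # The cpumask names the CPU that handles events for the socket.
        cpumask_path = os.path.join(path, "cpumask")
        if os.path.isfile(cpumask_path):
            with open(cpumask_path) as f:
                info["cpumask"] = f.read().strip()

        # Assumed layout: one file per named event under "events".
        events_dir = os.path.join(path, "events")
        if os.path.isdir(events_dir):
            info["events"] = sorted(os.listdir(events_dir))

        pmus[os.path.basename(path)] = info
    return pmus


if __name__ == "__main__":
    for name, info in discover_l3_pmus().items():
        print(name, "cpumask:", info["cpumask"], "events:", info["events"])
```

On a machine without these PMUs the function simply returns an empty dict, so
it can be run anywhere to check whether the driver is present.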

The hardware implements 32bit event counters and has a flat 8bit event space
exposed via the "event" format attribute. In addition to the 32bit physical
counters the driver supports virtual 64bit hardware counters by using hardware
counter chaining. This feature is exposed via the "lc" (long counter) format
flag. E.g.:

  perf stat -e l3cache_0_0/read-miss,lc/
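
To illustrate what chaining two 32bit counters into one 64bit value means, here
is a generic model of the idea. The driver source is collapsed in this view, so
this is not the driver's code and makes no claim about how the hardware or the
driver implement it; the class name ChainedCounter is hypothetical.

```python
MASK32 = 0xFFFFFFFF


class ChainedCounter:
    """Model of two 32-bit counters chained into one 64-bit value."""

    def __init__(self):
        self.lo = 0  # counts events
        self.hi = 0  # counts overflows of the low counter

    def add(self, events):
        """Count events; a carry out of the low half bumps the high half."""
        total = self.lo + events
        self.lo = total & MASK32
        self.hi = (self.hi + (total >> 32)) & MASK32

    def read(self):
        """Read hi, lo, then hi again, retrying if hi changed in between.

        In this single-threaded model the first attempt always succeeds;
        the loop only shows the usual pattern for reading a split counter.
        """
        while True:
            hi = self.hi
            lo = self.lo
            if hi == self.hi:
                return (hi << 32) | lo


counter = ChainedCounter()
counter.add(0xFFFFFFFF)  # low half is now at its maximum
counter.add(2)           # wraps the low half and carries into the high half
print(hex(counter.read()))
```

A single 32bit counter would have wrapped to 1 after the second call, which is
the loss of information the long counter flag is meant to avoid.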

Given that these are uncore PMUs the driver does not support sampling, therefore
"perf record" will not work. Per-task perf sessions are not supported.
+10 −0
@@ -21,6 +21,16 @@ config QCOM_L2_PMU
	  Adds the L2 cache PMU into the perf events subsystem for
	  monitoring L2 cache events.

config QCOM_L3_PMU
	bool "Qualcomm Technologies L3-cache PMU"
	depends on ARCH_QCOM && ARM64 && PERF_EVENTS && ACPI
	select QCOM_IRQ_COMBINER
	help
	   Provides support for the L3 cache performance monitor unit (PMU)
	   in Qualcomm Technologies processors.
	   Adds the L3 cache PMU into the perf events subsystem for
	   monitoring L3 cache events.

config XGENE_PMU
        depends on PERF_EVENTS && ARCH_XGENE
        bool "APM X-Gene SoC PMU"
+1 −0
obj-$(CONFIG_ARM_PMU) += arm_pmu.o
obj-$(CONFIG_QCOM_L2_PMU)	+= qcom_l2_pmu.o
obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
+849 −0

File added.

Preview size limit exceeded, changes collapsed.

+1 −0
@@ -137,6 +137,7 @@ enum cpuhp_state {
	CPUHP_AP_PERF_ARM_CCN_ONLINE,
	CPUHP_AP_PERF_ARM_L2X0_ONLINE,
	CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE,
	CPUHP_AP_PERF_ARM_QCOM_L3_ONLINE,
	CPUHP_AP_WORKQUEUE_ONLINE,
	CPUHP_AP_RCUTREE_ONLINE,
	CPUHP_AP_ONLINE_DYN,