Merge branches 'pm-cpuidle' and 'pm-em' (4c5744a0)

Documentation/driver-api/thermal/power_allocator.rst

+11 −1

Original line number	Diff line number	Diff line
		@@ -71,7 +71,9 @@ to the speed-grade of the silicon. `sustainable_power` is therefore
		simply an estimate, and may be tuned to affect the aggressiveness of
		the thermal ramp. For reference, the sustainable power of a 4" phone
		is typically 2000mW, while on a 10" tablet is around 4500mW (may vary
		-depending on screen size).
		+depending on screen size). It is possible to have the power value
		+expressed in an abstract scale. The sustained power should be aligned
		+to the scale used by the related cooling devices.

		If you are using device tree, do add it as a property of the
		thermal-zone. For example::
		@@ -269,3 +271,11 @@ won't be very good. Note that this is not particular to this
		governor, step-wise will also misbehave if you call its throttle()
		faster than the normal thermal framework tick (due to interrupts for
		example) as it will overreact.

		+Energy Model requirements
		+=========================
		+
		+Another important thing is the consistent scale of the power values
		+provided by the cooling devices. All of the cooling devices in a single
		+thermal zone should have power values reported either in milli-Watts
		+or scaled to the same 'abstract scale'.

Documentation/power/energy-model.rst

+25 −5

		@@ -20,6 +20,21 @@ possible source of information on its own, the EM framework intervenes as an
		abstraction layer which standardizes the format of power cost tables in the
		kernel, hence enabling to avoid redundant work.

		+The power values might be expressed in milli-Watts or in an 'abstract scale'.
		+Multiple subsystems might use the EM and it is up to the system integrator to
		+check that the requirements for the power value scale types are met. An example
		+can be found in the Energy-Aware Scheduler documentation
		+Documentation/scheduler/sched-energy.rst. For some subsystems like thermal or
		+powercap power values expressed in an 'abstract scale' might cause issues.
		+These subsystems are more interested in estimation of power used in the past,
		+thus the real milli-Watts might be needed. An example of these requirements can
		+be found in the Intelligent Power Allocation in
		+Documentation/driver-api/thermal/power_allocator.rst.
		+Kernel subsystems might implement automatic detection to check whether EM
		+registered devices have inconsistent scale (based on EM internal flag).
		+Important thing to keep in mind is that when the power values are expressed in
		+an 'abstract scale' deriving real energy in milli-Joules would not be possible.

		The figure below depicts an example of drivers (Arm-specific here, but the
		approach is applicable to any architecture) providing power costs to the EM
		framework, and interested clients reading the data from it::
		@@ -73,7 +88,7 @@ Drivers are expected to register performance domains into the EM framework by
		calling the following API::

		int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
		-struct em_data_callback cb, cpumask_t cpus);
		+struct em_data_callback cb, cpumask_t cpus, bool milliwatts);

		Drivers must provide a callback function returning <frequency, power> tuples
		for each performance state. The callback function provided by the driver is free
		@@ -81,6 +96,10 @@ to fetch data from any relevant location (DT, firmware, ...), and by any mean
		deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
		performance domains using cpumask. For other devices than CPUs the last
		argument must be set to NULL.
		+The last argument 'milliwatts' is important to set with correct value. Kernel
		+subsystems which use EM might rely on this flag to check if all EM devices use
		+the same scale. If there are different scales, these subsystems might decide
		+to: return warning/error, stop working or panic.
		See Section 3. for an example of driver implementing this
		callback, and kernel/power/energy_model.c for further documentation on this
		API.
		@@ -156,7 +175,8 @@ EM framework::
		37 nr_opp = foo_get_nr_opp(policy);
		38
		39 /* And register the new performance domain */
		-40 em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
		-41
		-42 return 0;
		-43 }
		+40 em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
		+41 true);
		+42
		+43 return 0;
		+44 }
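
Review note on the `milliwatts` flag: the documentation added above says that EM clients might use the flag to detect devices that report power on different scales. A minimal userspace sketch of such a check is shown below. It is not the kernel implementation; `struct em_dev_sketch` and `em_scales_consistent()` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an EM-registered device. */
struct em_dev_sketch {
	const char *name;
	bool milliwatts;	/* true: real milli-Watts, false: abstract scale */
};

/*
 * Return true when every device reports power on the same scale,
 * i.e. every flag matches the flag of the first device.
 * An empty set is treated as consistent.
 */
static bool em_scales_consistent(const struct em_dev_sketch *devs, size_t n)
{
	size_t i;

	assert(devs != NULL || n == 0);

	for (i = 1; i < n; i++) {
		if (devs[i].milliwatts != devs[0].milliwatts)
			return false;
	}
	return true;
}
```

A client modelled this way could refuse to bind, or only warn, when the function returns false. Which reaction is appropriate is left to the subsystem, as the documentation text states.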

Documentation/scheduler/sched-energy.rst

+5 −0

		@@ -350,6 +350,11 @@ independent EM framework in Documentation/power/energy-model.rst.
		Please also note that the scheduling domains need to be re-built after the
		EM has been registered in order to start EAS.

		+EAS uses the EM to make a forecasting decision on energy usage and thus it is
		+more focused on the difference when checking possible options for task
		+placement. For EAS it doesn't matter whether the EM power values are expressed
		+in milli-Watts or in an 'abstract scale'.


		6.3 - Energy Model complexity
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
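
Review note on the EAS paragraph added in this hunk: the claim that the scale does not matter for EAS can be illustrated with a small sketch. It assumes, purely for illustration, that an 'abstract scale' is the real value multiplied by a positive constant; the source does not define the mapping. `pick_cheapest()` and `rescale()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick the index of the candidate with the lowest estimated cost.
 * Only the ordering of the costs matters here, not their unit.
 */
static size_t pick_cheapest(const unsigned long *cost, size_t n)
{
	size_t i, best = 0;

	assert(cost != NULL && n > 0);

	for (i = 1; i < n; i++) {
		if (cost[i] < cost[best])
			best = i;
	}
	return best;
}

/* Multiply every cost by a positive factor (assumed 'abstract scale'). */
static void rescale(const unsigned long *in, unsigned long *out, size_t n,
		    unsigned long factor)
{
	size_t i;

	for (i = 0; i < n; i++)
		out[i] = in[i] * factor;
}
```

Under that assumption the chosen candidate is the same for both tables, because a positive scaling preserves the ordering. Recovering absolute energy from the scaled table would still require knowing the factor.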

drivers/base/power/domain.c

+35 −16

		@@ -1363,41 +1363,60 @@ static void genpd_complete(struct device *dev)
		 	genpd_unlock(genpd);
		 }

		-/**
		- * genpd_syscore_switch - Switch power during system core suspend or resume.
		- * @dev: Device that normally is marked as "always on" to switch power for.
		- *
		- * This routine may only be called during the system core (syscore) suspend or
		- * resume phase for devices whose "always on" flags are set.
		- */
		-static void genpd_syscore_switch(struct device *dev, bool suspend)
		+static void genpd_switch_state(struct device *dev, bool suspend)
		 {
		 	struct generic_pm_domain *genpd;
		+	bool use_lock;

		 	genpd = dev_to_genpd_safe(dev);
		 	if (!genpd)
		 		return;

		+	use_lock = genpd_is_irq_safe(genpd);
		+
		+	if (use_lock)
		+		genpd_lock(genpd);
		+
		 	if (suspend) {
		 		genpd->suspended_count++;
		-		genpd_sync_power_off(genpd, false, 0);
		+		genpd_sync_power_off(genpd, use_lock, 0);
		 	} else {
		-		genpd_sync_power_on(genpd, false, 0);
		+		genpd_sync_power_on(genpd, use_lock, 0);
		 		genpd->suspended_count--;
		 	}
		+
		+	if (use_lock)
		+		genpd_unlock(genpd);
		 }

		-void pm_genpd_syscore_poweroff(struct device *dev)
		+/**
		+ * dev_pm_genpd_suspend - Synchronously try to suspend the genpd for @dev
		+ * @dev: The device that is attached to the genpd, that can be suspended.
		+ *
		+ * This routine should typically be called for a device that needs to be
		+ * suspended during the syscore suspend phase. It may also be called during
		+ * suspend-to-idle to suspend a corresponding CPU device that is attached to a
		+ * genpd.
		+ */
		+void dev_pm_genpd_suspend(struct device *dev)
		 {
		-	genpd_syscore_switch(dev, true);
		+	genpd_switch_state(dev, true);
		 }
		-EXPORT_SYMBOL_GPL(pm_genpd_syscore_poweroff);
		+EXPORT_SYMBOL_GPL(dev_pm_genpd_suspend);

		-void pm_genpd_syscore_poweron(struct device *dev)
		+/**
		+ * dev_pm_genpd_resume - Synchronously try to resume the genpd for @dev
		+ * @dev: The device that is attached to the genpd, which needs to be resumed.
		+ *
		+ * This routine should typically be called for a device that needs to be resumed
		+ * during the syscore resume phase. It may also be called during suspend-to-idle
		+ * to resume a corresponding CPU device that is attached to a genpd.
		+ */
		+void dev_pm_genpd_resume(struct device *dev)
		 {
		-	genpd_syscore_switch(dev, false);
		+	genpd_switch_state(dev, false);
		 }
		-EXPORT_SYMBOL_GPL(pm_genpd_syscore_poweron);
		+EXPORT_SYMBOL_GPL(dev_pm_genpd_resume);

		#else /* !CONFIG_PM_SLEEP */
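
Review note on `genpd_switch_state()`: the hunk above takes the genpd lock only when the domain is IRQ-safe and passes the same `use_lock` value down to the power on/off helpers. The control flow can be sketched in plain userspace C as below. This is a model under stated assumptions, not kernel code: `struct domain_sketch` is hypothetical, a counter stands in for the real lock, and a boolean stands in for the actual power transition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a power domain; not the kernel struct. */
struct domain_sketch {
	bool irq_safe;		/* models the result of genpd_is_irq_safe() */
	int lock_depth;		/* models a held lock as a simple counter */
	int suspended_count;
	bool powered;
};

/* Mirror the lock/switch/unlock ordering of the hunk above. */
static void domain_switch_state(struct domain_sketch *d, bool suspend)
{
	bool use_lock;

	if (!d)
		return;

	use_lock = d->irq_safe;

	if (use_lock)
		d->lock_depth++;	/* stands in for genpd_lock() */

	if (suspend) {
		d->suspended_count++;
		d->powered = false;	/* stands in for genpd_sync_power_off() */
	} else {
		d->powered = true;	/* stands in for genpd_sync_power_on() */
		d->suspended_count--;
	}

	if (use_lock)
		d->lock_depth--;	/* stands in for genpd_unlock() */

	/* The lock must be balanced on every path through the function. */
	assert(d->lock_depth >= 0);
}
```

In this model a suspend followed by a resume returns the counters to their starting values, and the lock counter is only touched for domains marked IRQ-safe.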

drivers/clocksource/sh_cmt.c

+4 −4

		@@ -658,7 +658,7 @@ static void sh_cmt_clocksource_suspend(struct clocksource *cs)
		 		return;

		 	sh_cmt_stop(ch, FLAG_CLOCKSOURCE);
		-	pm_genpd_syscore_poweroff(&ch->cmt->pdev->dev);
		+	dev_pm_genpd_suspend(&ch->cmt->pdev->dev);
		 }

		 static void sh_cmt_clocksource_resume(struct clocksource *cs)
		@@ -668,7 +668,7 @@ static void sh_cmt_clocksource_resume(struct clocksource *cs)
		 	if (!ch->cs_enabled)
		 		return;

		-	pm_genpd_syscore_poweron(&ch->cmt->pdev->dev);
		+	dev_pm_genpd_resume(&ch->cmt->pdev->dev);
		 	sh_cmt_start(ch, FLAG_CLOCKSOURCE);
		 }

		@@ -760,7 +760,7 @@ static void sh_cmt_clock_event_suspend(struct clock_event_device *ced)
		 {
		 	struct sh_cmt_channel *ch = ced_to_sh_cmt(ced);

		-	pm_genpd_syscore_poweroff(&ch->cmt->pdev->dev);
		+	dev_pm_genpd_suspend(&ch->cmt->pdev->dev);
		 	clk_unprepare(ch->cmt->clk);
		 }

		@@ -769,7 +769,7 @@ static void sh_cmt_clock_event_resume(struct clock_event_device *ced)
		 	struct sh_cmt_channel *ch = ced_to_sh_cmt(ced);

		 	clk_prepare(ch->cmt->clk);
		-	pm_genpd_syscore_poweron(&ch->cmt->pdev->dev);
		+	dev_pm_genpd_resume(&ch->cmt->pdev->dev);
		 }

		 static int sh_cmt_register_clockevent(struct sh_cmt_channel *ch,
