Commit 642e53ea authored by Linus Torvalds

Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Various NUMA scheduling updates: harmonize the load-balancer and
     NUMA placement logic to not work against each other. The intended
     result is better locality, better utilization and fewer migrations.

   - Introduce Thermal Pressure tracking and optimizations, to improve
     task placement on thermally overloaded systems.

   - Implement frequency invariant scheduler accounting on (some) x86
     CPUs. This is done by observing and sampling the 'recent' CPU
     frequency average at ~tick boundaries. The CPU provides this data
     via the APERF/MPERF MSRs. This hopefully makes our capacity
     estimates more precise and keeps tasks on the same CPU better even
     if it might seem overloaded at a lower momentary frequency. (As
     usual, turbo mode is a complication that we resolve by observing
     the maximum frequency and renormalizing to it.)

   - Add asymmetric CPU capacity wakeup scan to improve capacity
     utilization on asymmetric topologies. (big.LITTLE systems)

   - PSI fixes and optimizations.

   - RT scheduling capacity awareness fixes & improvements.

   - Optimize the CONFIG_RT_GROUP_SCHED constraints code.

   - Misc fixes, cleanups and optimizations - see the changelog for
     details"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
  threads: Update PID limit comment according to futex UAPI change
  sched/fair: Fix condition of avg_load calculation
  sched/rt: cpupri_find: Trigger a full search as fallback
  kthread: Do not preempt current task if it is going to call schedule()
  sched/fair: Improve spreading of utilization
  sched: Avoid scale real weight down to zero
  psi: Move PF_MEMSTALL out of task->flags
  MAINTAINERS: Add maintenance information for psi
  psi: Optimize switching tasks inside shared cgroups
  psi: Fix cpu.pressure for cpu.max and competing cgroups
  sched/core: Distribute tasks within affinity masks
  sched/fair: Fix enqueue_task_fair warning
  thermal/cpu-cooling, sched/core: Move the arch_set_thermal_pressure() API to generic scheduler code
  sched/rt: Remove unnecessary push for unfit tasks
  sched/rt: Allow pulling unfitting task
  sched/rt: Optimize cpupri_find() on non-heterogenous systems
  sched/rt: Re-instate old behavior in select_task_rq_rt()
  sched/rt: cpupri_find: Implement fallback mechanism for !fit case
  sched/fair: Fix reordering of enqueue/dequeue_task_fair()
  sched/fair: Fix runnable_avg for throttled cfs
  ...
parents 9b82f05f 313f16e2
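
The frequency-invariance item in the pull message describes sampling the APERF/MPERF counters and renormalizing to the maximum (turbo) frequency. The arithmetic can be sketched in user-space C as below. This is a minimal illustration, not the kernel's implementation: the function name `freq_scale`, the `max_ratio_x1024` parameter, and the 1024-based fixed-point scale are assumptions made for the example.

```c
#include <stdint.h>

#define CAPACITY_SHIFT 10
#define CAPACITY_SCALE (1u << CAPACITY_SHIFT) /* 1024 == running at max frequency */

/*
 * Hypothetical helper (not a kernel function).
 *
 * delta_aperf / delta_mperf approximates (average frequency / base frequency)
 * over the sampling window, because APERF advances at the actual clock and
 * MPERF advances at a constant reference clock.
 *
 * max_ratio_x1024 is (max turbo frequency / base frequency) * 1024,
 * for example 1536 for a CPU whose turbo ceiling is 1.5x its base clock.
 *
 * Returns a scale factor in [0, 1024]. Small deltas are assumed, so the
 * left shift below does not overflow 64 bits.
 */
static uint64_t freq_scale(uint64_t delta_aperf, uint64_t delta_mperf,
                           uint64_t max_ratio_x1024)
{
    uint64_t scale;

    if (delta_mperf == 0 || max_ratio_x1024 == 0)
        return CAPACITY_SCALE; /* no usable sample: assume full speed */

    /* (aperf / mperf) * 1024, renormalized by the turbo ratio */
    scale = (delta_aperf << (2 * CAPACITY_SHIFT)) /
            (delta_mperf * max_ratio_x1024);

    /* Clamp: observed frequency can briefly exceed the assumed maximum */
    return scale > CAPACITY_SCALE ? CAPACITY_SCALE : scale;
}
```

With a 1.5x turbo ceiling, a window in which APERF advanced 750 counts per 1000 MPERF counts yields half of the maximum scale, so utilization measured in that window would be weighted accordingly.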
+16 −0
@@ -4428,6 +4428,22 @@
			incurs a small amount of overhead in the scheduler
			but is useful for debugging and performance tuning.

	sched_thermal_decay_shift=
			[KNL, SMP] Set a decay shift for scheduler thermal
			pressure signal. Thermal pressure signal follows the
			default decay period of other scheduler pelt
			signals(usually 32 ms but configurable). Setting
			sched_thermal_decay_shift will left shift the decay
			period for the thermal pressure signal by the shift
			value.
			i.e. with the default pelt decay period of 32 ms
			sched_thermal_decay_shift   thermal pressure decay pr
				1			64 ms
				2			128 ms
			and so on.
			Format: integer between 0 and 10
			Default is 0.

	skew_tick=	[KNL] Offset the periodic timer tick per cpu to mitigate
			xtime_lock contention on larger systems, and/or RCU lock
			contention on all systems with CONFIG_MAXSMP set.
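
The relationship documented for sched_thermal_decay_shift in the hunk above (a left shift of the default 32 ms PELT decay period) can be checked with a few lines of C. This is a sketch only: `thermal_decay_period_ms` is a hypothetical helper, and returning 0 for an out-of-range shift is a choice made for this example, not necessarily how the kernel treats invalid values.

```c
#define PELT_DEFAULT_DECAY_MS 32u /* default PELT decay period quoted in the text */
#define MAX_DECAY_SHIFT       10  /* documented format: integer between 0 and 10 */

/*
 * Hypothetical helper: thermal pressure decay period in milliseconds for a
 * given sched_thermal_decay_shift value, assuming the 32 ms default.
 * Returns 0 when the shift is outside the documented range.
 */
static unsigned int thermal_decay_period_ms(int shift)
{
    if (shift < 0 || shift > MAX_DECAY_SHIFT)
        return 0;

    return PELT_DEFAULT_DECAY_MS << shift;
}
```

Under these assumptions a shift of 1 gives 64 ms and a shift of 2 gives 128 ms, matching the table in the documentation text, and the maximum shift of 10 gives 32768 ms.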
+6 −8
@@ -61,8 +61,8 @@ setup that list.
  address of the associated 'lock entry', plus or minus, of what will
  be called the 'lock word', from that 'lock entry'.  The 'lock word'
  is always a 32 bit word, unlike the other words above.  The 'lock
  word' holds 3 flag bits in the upper 3 bits, and the thread id (TID)
  of the thread holding the lock in the bottom 29 bits.  See further
  word' holds 2 flag bits in the upper 2 bits, and the thread id (TID)
  of the thread holding the lock in the bottom 30 bits.  See further
  below for a description of the flag bits.

  The third word, called 'list_op_pending', contains transient copy of
@@ -128,7 +128,7 @@ that thread's robust_futex linked lock list a given time.
A given futex lock structure in a user shared memory region may be held
at different times by any of the threads with access to that region. The
thread currently holding such a lock, if any, is marked with the threads
TID in the lower 29 bits of the 'lock word'.
TID in the lower 30 bits of the 'lock word'.

When adding or removing a lock from its list of held locks, in order for
the kernel to correctly handle lock cleanup regardless of when the task
@@ -141,7 +141,7 @@ On insertion:
 1) set the 'list_op_pending' word to the address of the 'lock entry'
    to be inserted,
 2) acquire the futex lock,
 3) add the lock entry, with its thread id (TID) in the bottom 29 bits
 3) add the lock entry, with its thread id (TID) in the bottom 30 bits
    of the 'lock word', to the linked list starting at 'head', and
 4) clear the 'list_op_pending' word.

@@ -155,7 +155,7 @@ On removal:

On exit, the kernel will consider the address stored in
'list_op_pending' and the address of each 'lock word' found by walking
the list starting at 'head'.  For each such address, if the bottom 29
the list starting at 'head'.  For each such address, if the bottom 30
bits of the 'lock word' at offset 'offset' from that address equals the
exiting threads TID, then the kernel will do two things:

@@ -180,7 +180,5 @@ any point:
    future kernel configuration changes) elements.

When the kernel sees a list entry whose 'lock word' doesn't have the
current threads TID in the lower 29 bits, it does nothing with that
current threads TID in the lower 30 bits, it does nothing with that
entry, and goes on to the next entry.

Bit 29 (0x20000000) of the 'lock word' is reserved for future use.
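
The updated 'lock word' layout in the hunks above (2 flag bits in the upper 2 bits, TID in the bottom 30 bits) can be illustrated with a small decoding sketch. The helper names and the LOCK_WORD_* macros are local to this example; their values are chosen to mirror FUTEX_WAITERS, FUTEX_OWNER_DIED and FUTEX_TID_MASK from the futex UAPI header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring the futex UAPI constants (names are illustrative) */
#define LOCK_WORD_WAITERS    0x80000000u /* bit 31: other threads are waiting */
#define LOCK_WORD_OWNER_DIED 0x40000000u /* bit 30: previous owner exited while holding */
#define LOCK_WORD_TID_MASK   0x3fffffffu /* bottom 30 bits: TID of the holder */

/* Extract the owner TID from a 32-bit lock word */
static uint32_t lock_word_tid(uint32_t word)
{
    return word & LOCK_WORD_TID_MASK;
}

/* Extract the two flag bits from a 32-bit lock word */
static uint32_t lock_word_flags(uint32_t word)
{
    return word & ~LOCK_WORD_TID_MASK;
}

/*
 * The comparison described for exit-time cleanup: an entry is only acted
 * on when the bottom 30 bits equal the exiting thread's TID.
 */
static bool lock_word_held_by(uint32_t word, uint32_t tid)
{
    return lock_word_tid(word) == (tid & LOCK_WORD_TID_MASK);
}
```

For example, a lock word of 0x800004d2 decodes to TID 1234 with the waiters flag set, so `lock_word_held_by(0x800004d2, 1234)` is true while the same check against any other TID is false.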
+6 −0
@@ -13552,6 +13552,12 @@ F: net/psample
F:	include/net/psample.h
F:	include/uapi/linux/psample.h
PRESSURE STALL INFORMATION (PSI)
M:	Johannes Weiner <hannes@cmpxchg.org>
S:	Maintained
F:	kernel/sched/psi.c
F:	include/linux/psi*
PSTORE FILESYSTEM
M:	Kees Cook <keescook@chromium.org>
M:	Anton Vorontsov <anton@enomsg.org>
+3 −0
@@ -16,6 +16,9 @@
/* Enable topology flag updates */
#define arch_update_cpu_topology topology_update_cpu_topology

/* Replace task scheduler's default thermal pressure retrieve API */
#define arch_scale_thermal_pressure topology_get_thermal_pressure

#else

static inline void init_cpu_topology(void) { }
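
The `#define arch_scale_thermal_pressure topology_get_thermal_pressure` line in the hunk above relies on a common override pattern: generic code supplies a fallback only when the architecture has not defined the name. The sketch below imitates that pattern in a single user-space file. It assumes the generic fallback is guarded by `#ifndef` and returns 0; the `demo_` array, `DEMO_NR_CPUS`, and `capacity_after_thermal` are hypothetical names invented for this example.

```c
#define DEMO_NR_CPUS 4

/* Hypothetical per-CPU store standing in for the arch topology code's data */
static unsigned long demo_thermal_pressure[DEMO_NR_CPUS];

/* Stand-in for the architecture-provided getter named in the hunk */
static unsigned long topology_get_thermal_pressure(int cpu)
{
    return (cpu >= 0 && cpu < DEMO_NR_CPUS) ? demo_thermal_pressure[cpu] : 0;
}

/* "Arch header": replace the scheduler's default retrieve API */
#define arch_scale_thermal_pressure topology_get_thermal_pressure

/* "Generic header": fallback used only when no arch override exists */
#ifndef arch_scale_thermal_pressure
static unsigned long arch_scale_thermal_pressure(int cpu)
{
    (void)cpu;
    return 0; /* no thermal pressure information available */
}
#endif

/* Illustrative consumer: capacity left after subtracting thermal pressure */
static unsigned long capacity_after_thermal(int cpu, unsigned long max_capacity)
{
    unsigned long pressure = arch_scale_thermal_pressure(cpu);

    return pressure >= max_capacity ? 0 : max_capacity - pressure;
}
```

Because the macro is defined before the `#ifndef` check, the fallback is compiled out and every caller of `arch_scale_thermal_pressure()` reaches the architecture's getter without any runtime indirection.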
+1 −0
@@ -62,6 +62,7 @@ CONFIG_ARCH_ZX=y
CONFIG_ARCH_ZYNQMP=y
CONFIG_ARM64_VA_BITS_48=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_SMT=y
CONFIG_NUMA=y
CONFIG_SECCOMP=y
CONFIG_KEXEC=y