Commit 94d18ee9 authored by Linus Torvalds

Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "This cycle's RCU changes were:

   - A few more RCU flavor consolidation cleanups.

   - Updates to RCU's list-traversal macros improving lockdep usability.

   - Forward-progress improvements for no-CBs CPUs: Avoid ignoring
     incoming callbacks during grace-period waits.

   - Forward-progress improvements for no-CBs CPUs: Use ->cblist
     structure to take advantage of others' grace periods.

   - Also added a small commit that avoids needlessly inflicting
     scheduler-clock ticks on callback-offloaded CPUs.

   - Forward-progress improvements for no-CBs CPUs: Reduce contention on
     ->nocb_lock guarding ->cblist.

   - Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass
     list to further reduce contention on ->nocb_lock guarding ->cblist.

   - Miscellaneous fixes.

   - Torture-test updates.

   - minor LKMM updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (86 commits)
  MAINTAINERS: Update from paulmck@linux.ibm.com to paulmck@kernel.org
  rcu: Don't include <linux/ktime.h> in rcutiny.h
  rcu: Allow rcu_do_batch() to dynamically adjust batch sizes
  rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload
  rcu/nocb: Reduce __call_rcu_nocb_wake() leaf rcu_node ->lock contention
  rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention
  rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks()
  rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake()
  rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed
  rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended
  rcu/nocb: Add bypass callback queueing
  rcu/nocb: Atomic ->len field in rcu_segcblist structure
  rcu/nocb: Unconditionally advance and wake for excessive CBs
  rcu/nocb: Reduce ->nocb_lock contention with separate ->nocb_gp_lock
  rcu/nocb: Reduce contention at no-CBs invocation-done time
  rcu/nocb: Reduce contention at no-CBs registry-time CB advancement
  rcu/nocb: Round down for number of no-CBs grace-period kthreads
  rcu/nocb: Avoid ->nocb_lock capture by corresponding CPU
  rcu/nocb: Avoid needless wakeups of no-CBs grace-period kthread
  rcu/nocb: Make __call_rcu_nocb_wake() safe for many callbacks
  ...
parents d75a43c6 4a0fa886
+72 −1
@@ -2129,6 +2129,8 @@ Some of the relevant points of interest are as follows:
<li>	<a href="#Hotplug CPU">Hotplug CPU</a>.
<li>	<a href="#Scheduler and RCU">Scheduler and RCU</a>.
<li>	<a href="#Tracing and RCU">Tracing and RCU</a>.
+<li>	<a href="#Accesses to User Memory and RCU">
+Accesses to User Memory and RCU</a>.
<li>	<a href="#Energy Efficiency">Energy Efficiency</a>.
<li>	<a href="#Scheduling-Clock Interrupts and RCU">
	Scheduling-Clock Interrupts and RCU</a>.
@@ -2512,7 +2514,7 @@ disabled across the entire RCU read-side critical section.
<p>
It is possible to use tracing on RCU code, but tracing itself
uses RCU.
-For this reason, <tt>rcu_dereference_raw_notrace()</tt>
+For this reason, <tt>rcu_dereference_raw_check()</tt>
is provided for use by tracing, which avoids the destructive
recursion that could otherwise ensue.
This API is also used by virtualization in some architectures,
@@ -2521,6 +2523,75 @@ cannot be used.
The tracing folks both located the requirement and provided the
needed fix, so this surprise requirement was relatively painless.

+<h3><a name="Accesses to User Memory and RCU">
+Accesses to User Memory and RCU</a></h3>
+
+<p>
+The kernel needs to access user-space memory, for example, to access
+data referenced by system-call parameters.
+The <tt>get_user()</tt> macro does this job.
+
+<p>
+However, user-space memory might well be paged out, which means
+that <tt>get_user()</tt> might well page-fault and thus block while
+waiting for the resulting I/O to complete.
+It would be a very bad thing for the compiler to reorder
+a <tt>get_user()</tt> invocation into an RCU read-side critical
+section.
+For example, suppose that the source code looked like this:
+
+<blockquote>
+<pre>
+ 1 rcu_read_lock();
+ 2 p = rcu_dereference(gp);
+ 3 v = p-&gt;value;
+ 4 rcu_read_unlock();
+ 5 get_user(user_v, user_p);
+ 6 do_something_with(v, user_v);
+</pre>
+</blockquote>
+
+<p>
+The compiler must not be permitted to transform this source code into
+the following:
+
+<blockquote>
+<pre>
+ 1 rcu_read_lock();
+ 2 p = rcu_dereference(gp);
+ 3 get_user(user_v, user_p); // BUG: POSSIBLE PAGE FAULT!!!
+ 4 v = p-&gt;value;
+ 5 rcu_read_unlock();
+ 6 do_something_with(v, user_v);
+</pre>
+</blockquote>
+
+<p>
+If the compiler did make this transformation in a
+<tt>CONFIG_PREEMPT=n</tt> kernel build, and if <tt>get_user()</tt> did
+page fault, the result would be a quiescent state in the middle
+of an RCU read-side critical section.
+This misplaced quiescent state could result in line&nbsp;4 being
+a use-after-free access, which could be bad for your kernel's
+actuarial statistics.
+Similar examples can be constructed with the call to <tt>get_user()</tt>
+preceding the <tt>rcu_read_lock()</tt>.
+
+<p>
+Unfortunately, <tt>get_user()</tt> doesn't have any particular
+ordering properties, and in some architectures the underlying <tt>asm</tt>
+isn't even marked <tt>volatile</tt>.
+And even if it was marked <tt>volatile</tt>, the above access to
+<tt>p-&gt;value</tt> is not volatile, so the compiler would not have any
+reason to keep those two accesses in order.
+
+<p>
+Therefore, the Linux-kernel definitions of <tt>rcu_read_lock()</tt>
+and <tt>rcu_read_unlock()</tt> must act as compiler barriers,
+at least for outermost instances of <tt>rcu_read_lock()</tt> and
+<tt>rcu_read_unlock()</tt> within a nested set of RCU read-side critical
+sections.
+
<h3><a name="Energy Efficiency">Energy Efficiency</a></h3>

<p>
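The compiler-barrier requirement stated in the hunk above can be illustrated with a minimal user-space sketch. Every name prefixed with toy_ is hypothetical and merely stands in for the kernel primitive of similar name, and barrier() is written in GCC/Clang inline-asm syntax, which is an assumption about the toolchain. This is not the kernel's implementation; it only shows where a compiler barrier would have to sit so that the user-memory access cannot be moved into the read-side critical section.

```c
#include <assert.h>

/*
 * Compiler-only barrier (GCC/Clang syntax, assumed toolchain).  It emits no
 * instructions; the "memory" clobber tells the compiler that memory may have
 * changed, so it may not move memory accesses across this point.
 */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical nesting counter standing in for RCU reader state. */
static int toy_rcu_nesting;

static inline void toy_rcu_read_lock(void)
{
	toy_rcu_nesting++;
	barrier();	/* later accesses may not be hoisted above the lock */
}

static inline void toy_rcu_read_unlock(void)
{
	barrier();	/* earlier accesses may not sink below the unlock */
	toy_rcu_nesting--;
	assert(toy_rcu_nesting >= 0);
}

struct toy_foo {
	int value;
};

static struct toy_foo toy_data = { .value = 42 };
static struct toy_foo *gp = &toy_data;

/*
 * Hypothetical stand-in for get_user().  The real macro may page-fault and
 * block, so this toy version checks that it runs outside any reader.
 */
static int toy_get_user(int *dst, const int *src)
{
	assert(toy_rcu_nesting == 0);
	*dst = *src;
	return 0;
}

/* Mirrors the six-line example in the hunk above. */
static int toy_reader(const int *user_p)
{
	struct toy_foo *p;
	int v;
	int user_v;

	toy_rcu_read_lock();
	p = gp;			/* the kernel would use rcu_dereference(gp) */
	v = p->value;
	toy_rcu_read_unlock();
	toy_get_user(&user_v, user_p);
	return v + user_v;	/* stand-in for do_something_with(v, user_v) */
}
```

Calling toy_reader() with a pointer to an int performs the protected load first and the simulated user-memory access afterwards. The two barrier() invocations are what forbid the compiler from interleaving the two, since neither the load of p->value nor the toy user access is volatile.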
+6 −0
@@ -57,6 +57,12 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
	CONFIG_PREEMPT_RCU case, you might see stall-warning
	messages.

+	You can use the rcutree.kthread_prio kernel boot parameter to
+	increase the scheduling priority of RCU's kthreads, which can
+	help avoid this problem.  However, please note that doing this
+	can increase your system's context-switch rate and thus degrade
+	performance.
+
o	A periodic interrupt whose handler takes longer than the time
	interval between successive pairs of interrupts.  This can
	prevent RCU's kthreads and softirq handlers from running.
+11 −6
@@ -3842,12 +3842,13 @@
			RCU_BOOST is not set, valid values are 0-99 and
			the default is zero (non-realtime operation).

-	rcutree.rcu_nocb_leader_stride= [KNL]
-			Set the number of NOCB kthread groups, which
-			defaults to the square root of the number of
-			CPUs.  Larger numbers reduces the wakeup overhead
-			on the per-CPU grace-period kthreads, but increases
-			that same overhead on each group's leader.
+	rcutree.rcu_nocb_gp_stride= [KNL]
+			Set the number of NOCB callback kthreads in
+			each group, which defaults to the square root
+			of the number of CPUs.	Larger numbers reduce
+			the wakeup overhead on the global grace-period
+			kthread, but increases that same overhead on
+			each group's NOCB grace-period kthread.

	rcutree.qhimark= [KNL]
			Set threshold of queued RCU callbacks beyond which
@@ -4052,6 +4053,10 @@
	rcutorture.verbose= [KNL]
			Enable additional printk() statements.

+	rcupdate.rcu_cpu_stall_ftrace_dump= [KNL]
+			Dump ftrace buffer after reporting RCU CPU
+			stall warning.
+
	rcupdate.rcu_cpu_stall_suppress= [KNL]
			Suppress RCU CPU stall warning messages.
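
The default described above for rcutree.rcu_nocb_gp_stride (the square root of the number of CPUs) can be worked through with a small sketch. The helper names are hypothetical. Rounding the square root down and placing consecutive CPU numbers in the same group are assumptions made for illustration, not a description of the kernel's code.

```c
#include <assert.h>

/* Integer square root, rounded down (hypothetical helper). */
static unsigned int toy_isqrt(unsigned int n)
{
	unsigned int r = 0;

	/* Widen before multiplying so (r + 1)^2 cannot overflow. */
	while ((unsigned long long)(r + 1) * (r + 1) <= n)
		r++;
	return r;
}

/* Assumed default: floor(sqrt(nr_cpus)), but never less than one. */
static unsigned int toy_default_stride(unsigned int nr_cpus)
{
	unsigned int stride = toy_isqrt(nr_cpus);

	return stride ? stride : 1;
}

/* Number of groups needed when each group holds up to 'stride' CPUs. */
static unsigned int toy_nr_groups(unsigned int nr_cpus, unsigned int stride)
{
	assert(stride > 0);
	return (nr_cpus + stride - 1) / stride;
}

/* Assumed grouping: consecutive CPU numbers share a group. */
static unsigned int toy_group_of(unsigned int cpu, unsigned int stride)
{
	assert(stride > 0);
	return cpu / stride;
}
```

Under these assumptions, a 16-CPU system gets a stride of 4 and therefore 4 groups, while a 24-CPU system also gets a stride of 4 but needs 6 groups. A larger stride means fewer groups for the global grace-period kthread to wake, at the cost of more callback kthreads for each group's grace-period kthread to wake.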

+8 −8
@@ -9325,7 +9325,7 @@ F: drivers/misc/lkdtm/*

LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM)
M:	Alan Stern <stern@rowland.harvard.edu>
-M:	Andrea Parri <andrea.parri@amarulasolutions.com>
+M:	Andrea Parri <parri.andrea@gmail.com>
M:	Will Deacon <will@kernel.org>
M:	Peter Zijlstra <peterz@infradead.org>
M:	Boqun Feng <boqun.feng@gmail.com>
@@ -9333,7 +9333,7 @@ M: Nicholas Piggin <npiggin@gmail.com>
M:	David Howells <dhowells@redhat.com>
M:	Jade Alglave <j.alglave@ucl.ac.uk>
M:	Luc Maranget <luc.maranget@inria.fr>
-M:	"Paul E. McKenney" <paulmck@linux.ibm.com>
+M:	"Paul E. McKenney" <paulmck@kernel.org>
R:	Akira Yokosawa <akiyks@gmail.com>
R:	Daniel Lustig <dlustig@nvidia.com>
L:	linux-kernel@vger.kernel.org
@@ -10362,7 +10362,7 @@ F: drivers/platform/x86/mlx-platform.c

MEMBARRIER SUPPORT
M:	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
-M:	"Paul E. McKenney" <paulmck@linux.ibm.com>
+M:	"Paul E. McKenney" <paulmck@kernel.org>
L:	linux-kernel@vger.kernel.org
S:	Supported
F:	kernel/sched/membarrier.c
@@ -13465,7 +13465,7 @@ S: Orphan
F:	drivers/net/wireless/ray*

RCUTORTURE TEST FRAMEWORK
-M:	"Paul E. McKenney" <paulmck@linux.ibm.com>
+M:	"Paul E. McKenney" <paulmck@kernel.org>
M:	Josh Triplett <josh@joshtriplett.org>
R:	Steven Rostedt <rostedt@goodmis.org>
R:	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
@@ -13512,7 +13512,7 @@ F: arch/x86/include/asm/resctrl_sched.h
F:	Documentation/x86/resctrl*

READ-COPY UPDATE (RCU)
-M:	"Paul E. McKenney" <paulmck@linux.ibm.com>
+M:	"Paul E. McKenney" <paulmck@kernel.org>
M:	Josh Triplett <josh@joshtriplett.org>
R:	Steven Rostedt <rostedt@goodmis.org>
R:	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
@@ -13670,7 +13670,7 @@ F: include/linux/reset-controller.h
RESTARTABLE SEQUENCES SUPPORT
M:	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
M:	Peter Zijlstra <peterz@infradead.org>
-M:	"Paul E. McKenney" <paulmck@linux.ibm.com>
+M:	"Paul E. McKenney" <paulmck@kernel.org>
M:	Boqun Feng <boqun.feng@gmail.com>
L:	linux-kernel@vger.kernel.org
S:	Supported
@@ -14710,7 +14710,7 @@ F: mm/sl?b*

SLEEPABLE READ-COPY UPDATE (SRCU)
M:	Lai Jiangshan <jiangshanlai@gmail.com>
-M:	"Paul E. McKenney" <paulmck@linux.ibm.com>
+M:	"Paul E. McKenney" <paulmck@kernel.org>
M:	Josh Triplett <josh@joshtriplett.org>
R:	Steven Rostedt <rostedt@goodmis.org>
R:	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
@@ -16209,7 +16209,7 @@ F: drivers/platform/x86/topstar-laptop.c

TORTURE-TEST MODULES
M:	Davidlohr Bueso <dave@stgolabs.net>
-M:	"Paul E. McKenney" <paulmck@linux.ibm.com>
+M:	"Paul E. McKenney" <paulmck@kernel.org>
M:	Josh Triplett <josh@joshtriplett.org>
L:	linux-kernel@vger.kernel.org
S:	Supported
+2 −4
@@ -264,15 +264,13 @@ int __cpu_disable(void)
	return 0;
}

-static DECLARE_COMPLETION(cpu_died);
-
/*
 * called on the thread which is asking for a CPU to be shutdown -
 * waits until shutdown has completed, or it is timed out.
 */
void __cpu_die(unsigned int cpu)
{
-	if (!wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000))) {
+	if (!cpu_wait_death(cpu, 5)) {
		pr_err("CPU%u: cpu didn't die\n", cpu);
		return;
	}
@@ -319,7 +317,7 @@ void arch_cpu_idle_dead(void)
	 * this returns, power and/or clocks can be removed at any point
	 * from this CPU and its cache by platform_cpu_kill().
	 */
-	complete(&cpu_died);
+	(void)cpu_report_death();

	/*
	 * Ensure that the cache lines associated with that completion are