Commit 6c06b66e authored by Ingo Molnar's avatar Ingo Molnar

Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu

Pull RCU and LKMM changes from Paul E. McKenney:

 - A few more RCU flavor consolidation cleanups.

 - Miscellaneous fixes.

 - Updates to RCU's list-traversal macros improving lockdep usability.

 - Torture-test updates.

 - Forward-progress improvements for no-CBs CPUs: Avoid ignoring
   incoming callbacks during grace-period waits.

 - Forward-progress improvements for no-CBs CPUs: Use ->cblist
   structure to take advantage of others' grace periods.

 - A related small commit avoids needlessly inflicting scheduler-clock
   ticks on callback-offloaded CPUs.

 - Forward-progress improvements for no-CBs CPUs: Reduce contention
   on ->nocb_lock guarding ->cblist.

 - Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass
   list to further reduce contention on ->nocb_lock guarding ->cblist.

 - LKMM updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
parents bb7ba806 07f038a4
+72 −1
@@ -2129,6 +2129,8 @@ Some of the relevant points of interest are as follows:
<li>	<a href="#Hotplug CPU">Hotplug CPU</a>.
<li>	<a href="#Scheduler and RCU">Scheduler and RCU</a>.
<li>	<a href="#Tracing and RCU">Tracing and RCU</a>.
<li>	<a href="#Accesses to User Memory and RCU">
Accesses to User Memory and RCU</a>.
<li>	<a href="#Energy Efficiency">Energy Efficiency</a>.
<li>	<a href="#Scheduling-Clock Interrupts and RCU">
	Scheduling-Clock Interrupts and RCU</a>.
@@ -2512,7 +2514,7 @@ disabled across the entire RCU read-side critical section.
<p>
It is possible to use tracing on RCU code, but tracing itself
uses RCU.
-For this reason, <tt>rcu_dereference_raw_notrace()</tt>
+For this reason, <tt>rcu_dereference_raw_check()</tt>
is provided for use by tracing, which avoids the destructive
recursion that could otherwise ensue.
This API is also used by virtualization in some architectures,
@@ -2521,6 +2523,75 @@ cannot be used.
The tracing folks both located the requirement and provided the
needed fix, so this surprise requirement was relatively painless.

<h3><a name="Accesses to User Memory and RCU">
Accesses to User Memory and RCU</a></h3>

<p>
The kernel needs to access user-space memory, for example, to access
data referenced by system-call parameters.
The <tt>get_user()</tt> macro does this job.

<p>
However, user-space memory might well be paged out, which means
that <tt>get_user()</tt> might well page-fault and thus block while
waiting for the resulting I/O to complete.
It would be a very bad thing for the compiler to reorder
a <tt>get_user()</tt> invocation into an RCU read-side critical
section.
For example, suppose that the source code looked like this:

<blockquote>
<pre>
 1 rcu_read_lock();
 2 p = rcu_dereference(gp);
 3 v = p-&gt;value;
 4 rcu_read_unlock();
 5 get_user(user_v, user_p);
 6 do_something_with(v, user_v);
</pre>
</blockquote>

<p>
The compiler must not be permitted to transform this source code into
the following:

<blockquote>
<pre>
 1 rcu_read_lock();
 2 p = rcu_dereference(gp);
 3 get_user(user_v, user_p); // BUG: POSSIBLE PAGE FAULT!!!
 4 v = p-&gt;value;
 5 rcu_read_unlock();
 6 do_something_with(v, user_v);
</pre>
</blockquote>

<p>
If the compiler did make this transformation in a
<tt>CONFIG_PREEMPT=n</tt> kernel build, and if <tt>get_user()</tt> did
page fault, the result would be a quiescent state in the middle
of an RCU read-side critical section.
This misplaced quiescent state could result in line&nbsp;4 being
a use-after-free access, which could be bad for your kernel's
actuarial statistics.
Similar examples can be constructed with the call to <tt>get_user()</tt>
preceding the <tt>rcu_read_lock()</tt>.
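
<p>
For instance (an illustrative variant, not taken from the patch itself),
given

<blockquote>
<pre>
 1 get_user(user_v, user_p);
 2 rcu_read_lock();
 3 p = rcu_dereference(gp);
 4 v = p-&gt;value;
 5 rcu_read_unlock();
 6 do_something_with(v, user_v);
</pre>
</blockquote>

<p>
the compiler must not be permitted to delay the <tt>get_user()</tt> on
line&nbsp;1 into the read-side critical section spanning
lines&nbsp;2-5, for exactly the same reason.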

<p>
Unfortunately, <tt>get_user()</tt> doesn't have any particular
ordering properties, and in some architectures the underlying <tt>asm</tt>
isn't even marked <tt>volatile</tt>.
And even if it was marked <tt>volatile</tt>, the above access to
<tt>p-&gt;value</tt> is not volatile, so the compiler would not have any
reason to keep those two accesses in order.

<p>
Therefore, the Linux-kernel definitions of <tt>rcu_read_lock()</tt>
and <tt>rcu_read_unlock()</tt> must act as compiler barriers,
at least for outermost instances of <tt>rcu_read_lock()</tt> and
<tt>rcu_read_unlock()</tt> within a nested set of RCU read-side critical
sections.

<h3><a name="Energy Efficiency">Energy Efficiency</a></h3>

<p>
+6 −0
@@ -57,6 +57,12 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
	CONFIG_PREEMPT_RCU case, you might see stall-warning
	messages.

	You can use the rcutree.kthread_prio kernel boot parameter to
	increase the scheduling priority of RCU's kthreads, which can
	help avoid this problem.  However, please note that doing this
	can increase your system's context-switch rate and thus degrade
	performance.
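
	For example (an illustrative value, not a recommendation from
	this document), adding the following to the kernel command line
	runs RCU's kthreads at real-time priority 50:

		rcutree.kthread_prio=50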

o	A periodic interrupt whose handler takes longer than the time
	interval between successive pairs of interrupts.  This can
	prevent RCU's kthreads and softirq handlers from running.
+11 −6
@@ -3837,12 +3837,13 @@
			RCU_BOOST is not set, valid values are 0-99 and
			the default is zero (non-realtime operation).

-	rcutree.rcu_nocb_leader_stride= [KNL]
-			Set the number of NOCB kthread groups, which
-			defaults to the square root of the number of
-			CPUs.  Larger numbers reduces the wakeup overhead
-			on the per-CPU grace-period kthreads, but increases
-			that same overhead on each group's leader.
+	rcutree.rcu_nocb_gp_stride= [KNL]
+			Set the number of NOCB callback kthreads in
+			each group, which defaults to the square root
+			of the number of CPUs.	Larger numbers reduce
+			the wakeup overhead on the global grace-period
+			kthread, but increases that same overhead on
+			each group's NOCB grace-period kthread.
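			As an illustration (values hypothetical, not
			from this patch), a 64-CPU system defaults to
			groups of sqrt(64)=8, and could be booted with
			"rcutree.rcu_nocb_gp_stride=16" to halve the
			number of groups at the cost of more wakeup
			work per group's grace-period kthread.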

	rcutree.qhimark= [KNL]
			Set threshold of queued RCU callbacks beyond which
@@ -4047,6 +4048,10 @@
	rcutorture.verbose= [KNL]
			Enable additional printk() statements.

	rcupdate.rcu_cpu_stall_ftrace_dump= [KNL]
			Dump ftrace buffer after reporting RCU CPU
			stall warning.

	rcupdate.rcu_cpu_stall_suppress= [KNL]
			Suppress RCU CPU stall warning messages.

+1 −1
@@ -9340,7 +9340,7 @@ F: drivers/misc/lkdtm/*

LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM)
M:	Alan Stern <stern@rowland.harvard.edu>
-M:	Andrea Parri <andrea.parri@amarulasolutions.com>
+M:	Andrea Parri <parri.andrea@gmail.com>
M:	Will Deacon <will@kernel.org>
M:	Peter Zijlstra <peterz@infradead.org>
M:	Boqun Feng <boqun.feng@gmail.com>
+2 −4
@@ -264,15 +264,13 @@ int __cpu_disable(void)
	return 0;
}

-static DECLARE_COMPLETION(cpu_died);
-
/*
 * called on the thread which is asking for a CPU to be shutdown -
 * waits until shutdown has completed, or it is timed out.
 */
void __cpu_die(unsigned int cpu)
{
-	if (!wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000))) {
+	if (!cpu_wait_death(cpu, 5)) {
		pr_err("CPU%u: cpu didn't die\n", cpu);
		return;
	}
@@ -319,7 +317,7 @@ void arch_cpu_idle_dead(void)
	 * this returns, power and/or clocks can be removed at any point
	 * from this CPU and its cache by platform_cpu_kill().
	 */
-	complete(&cpu_died);
+	(void)cpu_report_death();

	/*
	 * Ensure that the cache lines associated with that completion are