Commit 6c06b66e authored by Ingo Molnar's avatar Ingo Molnar

Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu

Pull RCU and LKMM changes from Paul E. McKenney:

 - A few more RCU flavor consolidation cleanups.

 - Miscellaneous fixes.

 - Updates to RCU's list-traversal macros improving lockdep usability.

 - Torture-test updates.

 - Forward-progress improvements for no-CBs CPUs: Avoid ignoring
   incoming callbacks during grace-period waits.

 - Forward-progress improvements for no-CBs CPUs: Use ->cblist
   structure to take advantage of others' grace periods.

 - A related small commit avoids needlessly inflicting scheduler-clock
   ticks on callback-offloaded CPUs.

 - Forward-progress improvements for no-CBs CPUs: Reduce contention
   on ->nocb_lock guarding ->cblist.

 - Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass
   list to further reduce contention on ->nocb_lock guarding ->cblist.

 - LKMM updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
parents bb7ba806 07f038a4
+72 −1
@@ -2129,6 +2129,8 @@ Some of the relevant points of interest are as follows:
<li>	<a href="#Hotplug CPU">Hotplug CPU</a>.
<li>	<a href="#Scheduler and RCU">Scheduler and RCU</a>.
<li>	<a href="#Tracing and RCU">Tracing and RCU</a>.
<li>	<a href="#Accesses to User Memory and RCU">
Accesses to User Memory and RCU</a>.
<li>	<a href="#Energy Efficiency">Energy Efficiency</a>.
<li>	<a href="#Scheduling-Clock Interrupts and RCU">
	Scheduling-Clock Interrupts and RCU</a>.
@@ -2512,7 +2514,7 @@ disabled across the entire RCU read-side critical section.
<p>
It is possible to use tracing on RCU code, but tracing itself
uses RCU.
-For this reason, <tt>rcu_dereference_raw_notrace()</tt>
+For this reason, <tt>rcu_dereference_raw_check()</tt>
is provided for use by tracing, which avoids the destructive
recursion that could otherwise ensue.
This API is also used by virtualization in some architectures,
@@ -2521,6 +2523,75 @@ cannot be used.
The tracing folks both located the requirement and provided the
needed fix, so this surprise requirement was relatively painless.

<h3><a name="Accesses to User Memory and RCU">
Accesses to User Memory and RCU</a></h3>

<p>
The kernel needs to access user-space memory, for example, to access
data referenced by system-call parameters.
The <tt>get_user()</tt> macro does this job.

<p>
However, user-space memory might well be paged out, which means
that <tt>get_user()</tt> might well page-fault and thus block while
waiting for the resulting I/O to complete.
It would be a very bad thing for the compiler to reorder
a <tt>get_user()</tt> invocation into an RCU read-side critical
section.
For example, suppose that the source code looked like this:

<blockquote>
<pre>
 1 rcu_read_lock();
 2 p = rcu_dereference(gp);
 3 v = p-&gt;value;
 4 rcu_read_unlock();
 5 get_user(user_v, user_p);
 6 do_something_with(v, user_v);
</pre>
</blockquote>

<p>
The compiler must not be permitted to transform this source code into
the following:

<blockquote>
<pre>
 1 rcu_read_lock();
 2 p = rcu_dereference(gp);
 3 get_user(user_v, user_p); // BUG: POSSIBLE PAGE FAULT!!!
 4 v = p-&gt;value;
 5 rcu_read_unlock();
 6 do_something_with(v, user_v);
</pre>
</blockquote>

<p>
If the compiler did make this transformation in a
<tt>CONFIG_PREEMPT=n</tt> kernel build, and if <tt>get_user()</tt> did
page fault, the result would be a quiescent state in the middle
of an RCU read-side critical section.
This misplaced quiescent state could result in line&nbsp;4 being
a use-after-free access, which could be bad for your kernel's
actuarial statistics.
Similar examples can be constructed with the call to <tt>get_user()</tt>
preceding the <tt>rcu_read_lock()</tt>.
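
<p>
For instance (an illustrative variant, not taken from the patch itself),
given

<blockquote>
<pre>
 1 get_user(user_v, user_p);
 2 rcu_read_lock();
 3 p = rcu_dereference(gp);
 4 v = p-&gt;value;
 5 rcu_read_unlock();
 6 do_something_with(v, user_v);
</pre>
</blockquote>

<p>
the compiler must not be permitted to delay the <tt>get_user()</tt> on
line&nbsp;1 into the read-side critical section spanning
lines&nbsp;2-5, for exactly the same reason.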

<p>
Unfortunately, <tt>get_user()</tt> doesn't have any particular
ordering properties, and in some architectures the underlying <tt>asm</tt>
isn't even marked <tt>volatile</tt>.
And even if it was marked <tt>volatile</tt>, the above access to
<tt>p-&gt;value</tt> is not volatile, so the compiler would not have any
reason to keep those two accesses in order.

<p>
Therefore, the Linux-kernel definitions of <tt>rcu_read_lock()</tt>
and <tt>rcu_read_unlock()</tt> must act as compiler barriers,
at least for outermost instances of <tt>rcu_read_lock()</tt> and
<tt>rcu_read_unlock()</tt> within a nested set of RCU read-side critical
sections.

<h3><a name="Energy Efficiency">Energy Efficiency</a></h3>

<p>
+6 −0
@@ -57,6 +57,12 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
	CONFIG_PREEMPT_RCU case, you might see stall-warning
	messages.

	You can use the rcutree.kthread_prio kernel boot parameter to
	increase the scheduling priority of RCU's kthreads, which can
	help avoid this problem.  However, please note that doing this
	can increase your system's context-switch rate and thus degrade
	performance.
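
	For example (an illustrative value, not a recommendation from
	this document), adding the following to the kernel command line
	runs RCU's kthreads at real-time priority 50:

		rcutree.kthread_prio=50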

o	A periodic interrupt whose handler takes longer than the time
	interval between successive pairs of interrupts.  This can
	prevent RCU's kthreads and softirq handlers from running.
+11 −6
@@ -3837,12 +3837,13 @@
			RCU_BOOST is not set, valid values are 0-99 and
			the default is zero (non-realtime operation).

-	rcutree.rcu_nocb_leader_stride= [KNL]
-			Set the number of NOCB kthread groups, which
-			defaults to the square root of the number of
-			CPUs.  Larger numbers reduces the wakeup overhead
-			on the per-CPU grace-period kthreads, but increases
-			that same overhead on each group's leader.
+	rcutree.rcu_nocb_gp_stride= [KNL]
+			Set the number of NOCB callback kthreads in
+			each group, which defaults to the square root
+			of the number of CPUs.	Larger numbers reduce
+			the wakeup overhead on the global grace-period
+			kthread, but increases that same overhead on
+			each group's NOCB grace-period kthread.
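			As an illustration (values hypothetical, not
			from this patch), a 64-CPU system defaults to
			groups of sqrt(64)=8, and could be booted with
			"rcutree.rcu_nocb_gp_stride=16" to halve the
			number of groups at the cost of more wakeup
			work per group's grace-period kthread.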

	rcutree.qhimark= [KNL]
			Set threshold of queued RCU callbacks beyond which
@@ -4047,6 +4048,10 @@
	rcutorture.verbose= [KNL]
			Enable additional printk() statements.

	rcupdate.rcu_cpu_stall_ftrace_dump= [KNL]
			Dump ftrace buffer after reporting RCU CPU
			stall warning.

	rcupdate.rcu_cpu_stall_suppress= [KNL]
			Suppress RCU CPU stall warning messages.

+1 −1
@@ -9340,7 +9340,7 @@ F: drivers/misc/lkdtm/*

LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM)
M:	Alan Stern <stern@rowland.harvard.edu>
-M:	Andrea Parri <andrea.parri@amarulasolutions.com>
+M:	Andrea Parri <parri.andrea@gmail.com>
M:	Will Deacon <will@kernel.org>
M:	Peter Zijlstra <peterz@infradead.org>
M:	Boqun Feng <boqun.feng@gmail.com>
+2 −4
@@ -264,15 +264,13 @@ int __cpu_disable(void)
	return 0;
}

-static DECLARE_COMPLETION(cpu_died);
-
/*
 * called on the thread which is asking for a CPU to be shutdown -
 * waits until shutdown has completed, or it is timed out.
 */
void __cpu_die(unsigned int cpu)
{
-	if (!wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000))) {
+	if (!cpu_wait_death(cpu, 5)) {
		pr_err("CPU%u: cpu didn't die\n", cpu);
		return;
	}
@@ -319,7 +317,7 @@ void arch_cpu_idle_dead(void)
	 * this returns, power and/or clocks can be removed at any point
	 * from this CPU and its cache by platform_cpu_kill().
	 */
-	complete(&cpu_died);
+	(void)cpu_report_death();

	/*
	 * Ensure that the cache lines associated with that completion are