Commit 77e4ef99 authored by Tejun Heo

threadgroup: extend threadgroup_lock() to cover exit and exec



threadgroup_lock() only protected against new additions to
the threadgroup, which was inherently somewhat incomplete and
problematic for its only user cgroup.  On-going migration could race
against exec and exit leading to interesting problems - the symmetry
between various attach methods, task exiting during method execution,
->exit() racing against attach methods, migrating task switching basic
properties during exec and so on.

This patch extends threadgroup_lock() such that it protects against
all three threadgroup altering operations - fork, exit and exec.  For
exit, threadgroup_change_begin/end() calls are added to exit_signals
around assertion of PF_EXITING.  For exec, threadgroup_[un]lock() are
updated to also grab and release cred_guard_mutex.

With this change, threadgroup_lock() guarantees that the target
threadgroup will remain stable - no new task will be added, no new
PF_EXITING will be set and exec won't happen.

The next patch will update cgroup so that it can take full advantage
of this change.

-v2: beefed up comment as suggested by Frederic.

-v3: narrowed scope of protection in exit path as suggested by
     Frederic.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
parent 257058ae
+41 −6
@@ -635,11 +635,13 @@ struct signal_struct {
#endif
#ifdef CONFIG_CGROUPS
	/*
	 * The group_rwsem prevents threads from forking with
	 * CLONE_THREAD while held for writing. Use this for fork-sensitive
	 * threadgroup-wide operations. It's taken for reading in fork.c in
	 * copy_process().
	 * Currently only needed write-side by cgroups.
	 * group_rwsem prevents new tasks from entering the threadgroup and
	 * member tasks from exiting, more specifically, setting of
	 * PF_EXITING.  fork and exit paths are protected with this rwsem
	 * using threadgroup_change_begin/end().  Users which require
	 * threadgroup to remain stable should use threadgroup_[un]lock()
	 * which also takes care of exec path.  Currently, cgroup is the
	 * only user.
	 */
	struct rw_semaphore group_rwsem;
#endif
@@ -2371,7 +2373,6 @@ static inline void unlock_task_sighand(struct task_struct *tsk,
	spin_unlock_irqrestore(&tsk->sighand->siglock, *flags);
}

/* See the declaration of group_rwsem in signal_struct. */
#ifdef CONFIG_CGROUPS
static inline void threadgroup_change_begin(struct task_struct *tsk)
{
@@ -2381,13 +2382,47 @@ static inline void threadgroup_change_end(struct task_struct *tsk)
{
	up_read(&tsk->signal->group_rwsem);
}

/**
 * threadgroup_lock - lock threadgroup
 * @tsk: member task of the threadgroup to lock
 *
 * Lock the threadgroup @tsk belongs to.  No new task is allowed to enter
 * and member tasks aren't allowed to exit (as indicated by PF_EXITING) or
 * perform exec.  This is useful for cases where the threadgroup needs to
 * stay stable across blockable operations.
 *
 * fork and exit paths explicitly call threadgroup_change_{begin|end}() for
 * synchronization.  While held, no new task will be added to threadgroup
 * and no existing live task will have its PF_EXITING set.
 *
 * During exec, a task goes and puts its thread group through unusual
 * changes.  After de-threading, exclusive access is assumed to resources
 * which are usually shared by tasks in the same group - e.g. sighand may
 * be replaced with a new one.  Also, the exec'ing task takes over group
 * leader role including its pid.  Exclude these changes while locked by
 * grabbing cred_guard_mutex which is used to synchronize exec path.
 */
static inline void threadgroup_lock(struct task_struct *tsk)
{
	/*
	 * exec uses exit for de-threading, nesting group_rwsem inside
	 * cred_guard_mutex. Grab cred_guard_mutex first.
	 */
	mutex_lock(&tsk->signal->cred_guard_mutex);
	down_write(&tsk->signal->group_rwsem);
}

/**
 * threadgroup_unlock - unlock threadgroup
 * @tsk: member task of the threadgroup to unlock
 *
 * Reverse threadgroup_lock().
 */
static inline void threadgroup_unlock(struct task_struct *tsk)
{
	up_write(&tsk->signal->group_rwsem);
	mutex_unlock(&tsk->signal->cred_guard_mutex);
}
#else
static inline void threadgroup_change_begin(struct task_struct *tsk) {}
+10 −0
@@ -2359,8 +2359,15 @@ void exit_signals(struct task_struct *tsk)
	int group_stop = 0;
	sigset_t unblocked;

	/*
	 * @tsk is about to have PF_EXITING set - lock out users which
	 * expect stable threadgroup.
	 */
	threadgroup_change_begin(tsk);

	if (thread_group_empty(tsk) || signal_group_exit(tsk->signal)) {
		tsk->flags |= PF_EXITING;
		threadgroup_change_end(tsk);
		return;
	}

@@ -2370,6 +2377,9 @@ void exit_signals(struct task_struct *tsk)
	 * see wants_signal(), do_signal_stop().
	 */
	tsk->flags |= PF_EXITING;

	threadgroup_change_end(tsk);

	if (!signal_pending(tsk))
		goto out;