Commit ad4ecbcb authored by Shailabh Nagar, committed by Linus Torvalds

[PATCH] delay accounting taskstats interface send tgid once

Send per-tgid data only once during exit of a thread group instead of once
with each member thread exit.

Currently, when a thread exits, besides its per-tid data, the per-tgid data
of its thread group is also sent out, if its thread group is non-empty.
The per-tgid data sent consists of the sum of per-tid stats for all
*remaining* threads of the thread group.

This patch modifies this sending in two ways:

- the per-tgid data is sent only when the last thread of a thread group
  exits.  This cuts down heavily on the overhead of sending/receiving
  per-tgid data, especially when other users of the taskstats
  interface aren't interested in per-tgid stats.

- the semantics of the per-tgid data sent are changed.  Instead of being
  the sum of per-tid data for remaining threads, the value now sent is the
  true total accumulated statistics for all threads that are/were part of
  the thread group.
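The following is a minimal user-space sketch of the old and new per-tgid semantics described above. It makes several assumptions:

- A single counter stands in for struct taskstats.
- No locking is modelled. In the kernel, the group total lives in sig->stats and is protected by stats_lock.
- model_group, old_tgid_value and model_thread_exit are hypothetical names, not kernel functions.

old_tgid_value computes what the old scheme reported after an exit, which is the sum over threads still alive. model_thread_exit folds each exiting thread into a running group total and reports that total only when the last thread leaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_THREADS 8

/* Hypothetical model of one thread group. */
struct model_group {
	uint64_t delay[MODEL_MAX_THREADS]; /* one counter stands in for struct taskstats */
	bool alive[MODEL_MAX_THREADS];     /* thread has not exited yet */
	int nr_alive;                      /* number of live threads */
	uint64_t exited_total;             /* new scheme: grows at every exit */
};

/* Old scheme: per-tgid value = sum over the *remaining* threads only. */
static uint64_t old_tgid_value(const struct model_group *g)
{
	uint64_t sum = 0;

	for (int i = 0; i < MODEL_MAX_THREADS; i++)
		if (g->alive[i])
			sum += g->delay[i];
	return sum;
}

/*
 * New scheme: fold thread i into the group total as it exits.
 * Returns true, and stores the total for all threads (past and present)
 * in *tgid, only when thread i was the last one in the group.
 */
static bool model_thread_exit(struct model_group *g, int i, uint64_t *tgid)
{
	assert(g->alive[i]);
	g->alive[i] = false;
	g->nr_alive--;
	g->exited_total += g->delay[i];
	if (g->nr_alive > 0)
		return false;            /* group still alive: per-tid record only */
	*tgid = g->exited_total;
	return true;
}
```

Consider three threads with counters 10, 20 and 30. The first two exits produce no per-tgid record; the old scheme would have sent 50 and then 30 at those points. The last exit reports 60, which is the total over all three threads.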

The patch also addresses a minor issue where failure of one accounting
subsystem to fill in the taskstats structure was causing the taskstats
record not to be sent at all.
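One way to get that behaviour is sketched below, assuming each accounting subsystem supplies its own fill routine. Every routine is tried, and a failure is counted rather than treated as fatal, so the caller can send the record regardless. This is a hypothetical user-space model, not the kernel's taskstats code. model_fill_record, model_fill_delay and model_fill_io_broken are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; each field is filled by a different subsystem. */
struct model_record {
	unsigned long delay_total;   /* filled by the delay-accounting stand-in */
	unsigned long io_bytes;      /* filled by a second, hypothetical subsystem */
};

typedef int (*model_fill_fn)(struct model_record *);

static int model_fill_delay(struct model_record *r)
{
	r->delay_total = 42;
	return 0;
}

static int model_fill_io_broken(struct model_record *r)
{
	(void)r;
	return -1;   /* simulated failure; leaves its field untouched */
}

/*
 * Run every filler and return how many failed. A failure does not abort
 * the loop, so later subsystems still contribute and the caller can
 * still send the (partially filled) record.
 */
static int model_fill_record(struct model_record *r,
			     const model_fill_fn *fillers, size_t n)
{
	int failures = 0;

	assert(r != NULL);
	memset(r, 0, sizeof(*r));    /* fields of failed fillers stay zero */
	for (size_t i = 0; i < n; i++)
		if (fillers[i](r) != 0)
			failures++;
	return failures;
}
```

Suppose the failing routine is called first. model_fill_record still produces a record with delay_total set to 42 and reports one failure, while the failing subsystem's field simply stays zero.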

The patch has been tested for stability by running cerberus for over 4 hours
on an SMP system.

[akpm@osdl.org: bugfixes]
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
parent 25890454
+4 −9
@@ -48,9 +48,10 @@ counter (say cpu_delay_total) for a task will give the delay
 experienced by the task waiting for the corresponding resource
 in that interval.
 
-When a task exits, records containing the per-task and per-process statistics
-are sent to userspace without requiring a command. More details are given in
-the taskstats interface description.
+When a task exits, records containing the per-task statistics
+are sent to userspace without requiring a command. If it is the last exiting
+task of a thread group, the per-tgid statistics are also sent. More details
+are given in the taskstats interface description.
 
 The getdelays.c userspace utility in this directory allows simple commands to
 be run and the corresponding delay statistics to be displayed. It also serves
@@ -107,9 +108,3 @@ IO count delay total
 	0	0
 MEM	count	delay total
 	0	0
-
-
-
-
-
+13 −20
@@ -32,12 +32,11 @@ The response contains statistics for a task (if pid is specified) or the sum of
 statistics for all tasks of the process (if tgid is specified).
 
 To obtain statistics for tasks which are exiting, userspace opens a multicast
-netlink socket. Each time a task exits, two records are sent by the kernel to
-each listener on the multicast socket. The first the per-pid task's statistics
-and the second is the sum for all tasks of the process to which the task
-belongs (the task does not need to be the thread group leader). The need for
-per-tgid stats to be sent for each exiting task is explained in the per-tgid
-stats section below.
+netlink socket. Each time a task exits, its per-pid statistics is always sent
+by the kernel to each listener on the multicast socket. In addition, if it is
+the last thread exiting its thread group, an additional record containing the
+per-tgid stats are also sent. The latter contains the sum of per-pid stats for
+all threads in the thread group, both past and present.
 
 getdelays.c is a simple utility demonstrating usage of the taskstats interface
 for reporting delay accounting statistics.
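From a listener's point of view, the process-wide total now arrives as a single record when the last thread exits. The listener no longer has to rebuild it from per-pid records and a task<->process mapping. The sketch below models this with an in-memory array of events. The event layout, the MODEL_REC_* tags and model_consume are hypothetical. Real taskstats messages are generic netlink attributes, which this code does not parse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tags for the two kinds of exit records. */
enum model_kind { MODEL_REC_PID, MODEL_REC_TGID };

struct model_event {
	enum model_kind kind;
	int id;               /* pid for MODEL_REC_PID, tgid for MODEL_REC_TGID */
	unsigned long value;  /* stands in for the statistics payload */
};

/*
 * Count the per-pid records and take the process-wide total for
 * want_tgid directly from its per-tgid record, if one was sent.
 * Returns true if that per-tgid record was seen.
 */
static bool model_consume(const struct model_event *ev, size_t n, int want_tgid,
			  size_t *pid_records, unsigned long *tgid_total)
{
	bool seen_tgid = false;

	assert(pid_records != NULL && tgid_total != NULL);
	*pid_records = 0;
	for (size_t i = 0; i < n; i++) {
		if (ev[i].kind == MODEL_REC_PID) {
			(*pid_records)++;
		} else if (ev[i].id == want_tgid) {
			*tgid_total = ev[i].value;   /* no userspace summing needed */
			seen_tgid = true;
		}
	}
	return seen_tgid;
}
```

Take a process with tgid 100 and threads 101 and 102. The listener sees two per-pid records and reads the process total straight from the one per-tgid record that follows the last exit.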
@@ -104,20 +103,14 @@ stats in userspace alone is inefficient and potentially inaccurate (due to lack
 of atomicity).
 
 However, maintaining per-process, in addition to per-task stats, within the
-kernel has space and time overheads. Hence the taskstats implementation
-dynamically sums up the per-task stats for each task belonging to a process
-whenever per-process stats are needed.
-
-Not maintaining per-tgid stats creates a problem when userspace is interested
-in getting these stats when the process dies i.e. the last thread of
-a process exits. It isn't possible to simply return some aggregated per-process
-statistic from the kernel.
-
-The approach taken by taskstats is to return the per-tgid stats *each* time
-a task exits, in addition to the per-pid stats for that task. Userspace can
-maintain task<->process mappings and use them to maintain the per-process stats
-in userspace, updating the aggregate appropriately as the tasks of a process
-exit.
+kernel has space and time overheads. To address this, the taskstats code
+accumalates each exiting task's statistics into a process-wide data structure.
+When the last task of a process exits, the process level data accumalated also
+gets sent to userspace (along with the per-task data).
+
+When a user queries to get per-tgid data, the sum of all other live threads in
+the group is added up and added to the accumalated total for previously exited
+threads of the same thread group.
 
 Extending taskstats
 -------------------
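The query rule in the new text reduces to one addition: the stats already folded in from exited threads, plus a fresh sum over the threads still alive. The sketch below assumes one counter stands in for struct taskstats and that the caller has already collected the live threads' values. model_tgid_query is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Per-tgid query under the new scheme (sketch): start from the total
 * accumulated as threads exited, then add every live thread's value.
 */
static uint64_t model_tgid_query(uint64_t exited_total,
				 const uint64_t *live, size_t nr_live)
{
	uint64_t sum = exited_total;   /* stats folded in at each exit */

	assert(live != NULL || nr_live == 0);
	for (size_t i = 0; i < nr_live; i++)
		sum += live[i];            /* threads that are still running */
	return sum;
}
```

Once every thread has exited, the live set is empty. The query then returns the accumulated total unchanged, which is the same value sent with the last thread's exit.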
+12 −0
@@ -2240,6 +2240,12 @@ M: tsbogend@alpha.franken.de
 L:	netdev@vger.kernel.org
 S:	Maintained
 
+PER-TASK DELAY ACCOUNTING
+P:	Shailabh Nagar
+M:	nagar@watson.ibm.com
+L:	linux-kernel@vger.kernel.org
+S:	Maintained
+
 PERSONALITY HANDLING
 P:	Christoph Hellwig
 M:	hch@infradead.org
@@ -2767,6 +2773,12 @@ P: Deepak Saxena
 M:	dsaxena@plexity.net
 S:	Maintained
 
+TASKSTATS STATISTICS INTERFACE
+P:	Shailabh Nagar
+M:	nagar@watson.ibm.com
+L:	linux-kernel@vger.kernel.org
+S:	Maintained
+
 TI PARALLEL LINK CABLE DRIVER
 P:     Romain Lievin
 M:     roms@lpg.ticalc.org
+4 −0
@@ -463,6 +463,10 @@ struct signal_struct {
 #ifdef CONFIG_BSD_PROCESS_ACCT
 	struct pacct_struct pacct;	/* per-process accounting information */
 #endif
+#ifdef CONFIG_TASKSTATS
+	spinlock_t stats_lock;
+	struct taskstats *stats;
+#endif
 };
 
 /* Context switch must be unlocked if interrupts are to be enabled */
+55 −16
@@ -19,36 +19,75 @@ enum {
 extern kmem_cache_t *taskstats_cache;
 extern struct mutex taskstats_exit_mutex;
 
-static inline void taskstats_exit_alloc(struct taskstats **ptidstats,
-					struct taskstats **ptgidstats)
+static inline void taskstats_exit_alloc(struct taskstats **ptidstats)
 {
 	*ptidstats = kmem_cache_zalloc(taskstats_cache, SLAB_KERNEL);
-	*ptgidstats = kmem_cache_zalloc(taskstats_cache, SLAB_KERNEL);
 }
 
-static inline void taskstats_exit_free(struct taskstats *tidstats,
-					struct taskstats *tgidstats)
+static inline void taskstats_exit_free(struct taskstats *tidstats)
 {
 	if (tidstats)
 		kmem_cache_free(taskstats_cache, tidstats);
-	if (tgidstats)
-		kmem_cache_free(taskstats_cache, tgidstats);
 }
 
-extern void taskstats_exit_send(struct task_struct *, struct taskstats *,
-				struct taskstats *);
-extern void taskstats_init_early(void);
+static inline void taskstats_tgid_init(struct signal_struct *sig)
+{
+	spin_lock_init(&sig->stats_lock);
+	sig->stats = NULL;
+}
+
+static inline void taskstats_tgid_alloc(struct signal_struct *sig)
+{
+	struct taskstats *stats;
+	unsigned long flags;
+
+	stats = kmem_cache_zalloc(taskstats_cache, SLAB_KERNEL);
+	if (!stats)
+		return;
+
+	spin_lock_irqsave(&sig->stats_lock, flags);
+	if (!sig->stats) {
+		sig->stats = stats;
+		stats = NULL;
+	}
+	spin_unlock_irqrestore(&sig->stats_lock, flags);
+
+	if (stats)
+		kmem_cache_free(taskstats_cache, stats);
+}
 
+static inline void taskstats_tgid_free(struct signal_struct *sig)
+{
+	struct taskstats *stats = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sig->stats_lock, flags);
+	if (sig->stats) {
+		stats = sig->stats;
+		sig->stats = NULL;
+	}
+	spin_unlock_irqrestore(&sig->stats_lock, flags);
+	if (stats)
+		kmem_cache_free(taskstats_cache, stats);
+}
+
+extern void taskstats_exit_send(struct task_struct *, struct taskstats *, int);
+extern void taskstats_init_early(void);
+extern void taskstats_tgid_alloc(struct signal_struct *);
 #else
-static inline void taskstats_exit_alloc(struct taskstats **ptidstats,
-					struct taskstats **ptgidstats)
+static inline void taskstats_exit_alloc(struct taskstats **ptidstats)
 {}
-static inline void taskstats_exit_free(struct taskstats *ptidstats,
-					struct taskstats *ptgidstats)
+static inline void taskstats_exit_free(struct taskstats *ptidstats)
 {}
 static inline void taskstats_exit_send(struct task_struct *tsk,
 				       struct taskstats *tidstats,
-					struct taskstats *tgidstats)
+				       int group_dead)
+{}
+static inline void taskstats_tgid_init(struct signal_struct *sig)
+{}
+static inline void taskstats_tgid_alloc(struct signal_struct *sig)
+{}
+static inline void taskstats_tgid_free(struct signal_struct *sig)
 {}
 static inline void taskstats_init_early(void)
 {}
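taskstats_tgid_alloc follows an install-if-absent sequence:

- It allocates a buffer before taking stats_lock.
- It installs the new buffer only if sig->stats is still NULL.
- It frees its own copy if another caller got there first.

The allocation has to stay outside the lock because a SLAB_KERNEL allocation may sleep, and sleeping is not allowed while holding a spinlock. The same pattern is sketched below in user space. It assumes a pthread mutex in place of the spinlock and calloc/free in place of the slab cache. The model_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct model_stats {
	unsigned long cpu_delay_total;
};

/* Stand-in for the two fields added to signal_struct. */
struct model_signal {
	pthread_mutex_t stats_lock;   /* plays the role of spinlock_t stats_lock */
	struct model_stats *stats;    /* shared per-group buffer, NULL until allocated */
};

static void model_tgid_init(struct model_signal *sig)
{
	assert(sig != NULL);
	pthread_mutex_init(&sig->stats_lock, NULL);
	sig->stats = NULL;
}

/* Allocate outside the lock, install only if still absent, else discard. */
static void model_tgid_alloc(struct model_signal *sig)
{
	struct model_stats *stats = calloc(1, sizeof(*stats));

	if (!stats)
		return;

	pthread_mutex_lock(&sig->stats_lock);
	if (!sig->stats) {
		sig->stats = stats;
		stats = NULL;             /* ownership moved to sig */
	}
	pthread_mutex_unlock(&sig->stats_lock);

	free(stats);                  /* lost the race (or NULL): free(NULL) is a no-op */
}

/* Detach the buffer under the lock, free it after unlocking. */
static void model_tgid_free(struct model_signal *sig)
{
	struct model_stats *stats;

	pthread_mutex_lock(&sig->stats_lock);
	stats = sig->stats;
	sig->stats = NULL;
	pthread_mutex_unlock(&sig->stats_lock);

	free(stats);
}
```

Calling model_tgid_alloc twice leaves the first buffer installed and frees the second. model_tgid_free detaches the buffer under the lock and releases it only after unlocking, mirroring the kmem_cache_free placement in the hunk above.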