Commit 8a5dbc65 authored by Jakub Kicinski's avatar Jakub Kicinski Committed by Linus Torvalds
Browse files

mm/memcg: prepare for swap over-high accounting and penalty calculation

Patch series "memcg: Slow down swap allocation as the available space
gets depleted", v6.

Tejun describes the problem as follows:

When swap runs out, there's an abrupt change in system behavior - the
anonymous memory suddenly becomes unmanageable which readily breaks any
sort of memory isolation and can bring down the whole system.  To avoid
that, oomd [1] monitors free swap space and triggers kills when it drops
below the specific threshold (e.g.  15%).

While this works, it's far from ideal:

 - Depending on IO performance and total swap size, a given
   headroom might not be enough or too much.

 - oomd has to monitor swap depletion in addition to the usual
   pressure metrics and it currently doesn't consider memory.swap.max.

Solve this by adapting parts of the approach that memory.high uses -
slow down allocation as the resource gets depleted turning the depletion
behavior from abrupt cliff one to gradual degradation observable through
memory pressure metric.

[1] https://github.com/facebookincubator/oomd



This patch (of 4):

Slice the memory overage calculation logic a little bit so we can reuse
it to apply a similar penalty to the swap.  The logic which accesses the
memory-specific fields (use and high values) has to be taken out of
calculate_high_delay().

Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: default avatarShakeel Butt <shakeelb@google.com>
Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20200527195846.102707-1-kuba@kernel.org
Link: http://lkml.kernel.org/r/20200527195846.102707-2-kuba@kernel.org


Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent 54b512e9
Loading
Loading
Loading
Loading
+35 −27
Original line number Diff line number Diff line
@@ -2321,25 +2321,12 @@ static void high_work_func(struct work_struct *work)
 #define MEMCG_DELAY_PRECISION_SHIFT 20
 #define MEMCG_DELAY_SCALING_SHIFT 14

/*
 * Get the number of jiffies that we should penalise a mischievous cgroup which
 * is exceeding its memory.high by checking both it and its ancestors.
 */
static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
					  unsigned int nr_pages)
static u64 calculate_overage(unsigned long usage, unsigned long high)
{
	unsigned long penalty_jiffies;
	u64 max_overage = 0;

	do {
		unsigned long usage, high;
	u64 overage;

		usage = page_counter_read(&memcg->memory);
		high = READ_ONCE(memcg->high);

	if (usage <= high)
			continue;
		return 0;

	/*
	 * Prevent division by 0 in overage calculation by acting as if
@@ -2349,13 +2336,33 @@ static unsigned long calculate_high_delay(struct mem_cgroup *memcg,

	overage = usage - high;
	overage <<= MEMCG_DELAY_PRECISION_SHIFT;
		overage = div64_u64(overage, high);
	return div64_u64(overage, high);
}

		if (overage > max_overage)
			max_overage = overage;
static u64 mem_find_max_overage(struct mem_cgroup *memcg)
{
	u64 overage, max_overage = 0;

	do {
		overage = calculate_overage(page_counter_read(&memcg->memory),
					    READ_ONCE(memcg->high));
		max_overage = max(overage, max_overage);
	} while ((memcg = parent_mem_cgroup(memcg)) &&
		 !mem_cgroup_is_root(memcg));

	return max_overage;
}

/*
 * Get the number of jiffies that we should penalise a mischievous cgroup which
 * is exceeding its memory.high by checking both it and its ancestors.
 */
static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
					  unsigned int nr_pages,
					  u64 max_overage)
{
	unsigned long penalty_jiffies;

	if (!max_overage)
		return 0;

@@ -2411,7 +2418,8 @@ void mem_cgroup_handle_over_high(void)
	 * memory.high is breached and reclaim is unable to keep up. Throttle
	 * allocators proactively to slow down excessive growth.
	 */
	penalty_jiffies = calculate_high_delay(memcg, nr_pages);
	penalty_jiffies = calculate_high_delay(memcg, nr_pages,
					       mem_find_max_overage(memcg));

	/*
	 * Don't sleep if the amount of jiffies this memcg owes us is so low