Commit ee01c4d7 authored by Linus Torvalds

Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:
 "More mm/ work, plenty more to come

  Subsystems affected by this patch series: slub, memcg, gup, kasan,
  pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
  thp, mmap, kconfig"

* akpm: (131 commits)
  arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  riscv: support DEBUG_WX
  mm: add DEBUG_WX support
  drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
  mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
  powerpc/mm: drop platform defined pmd_mknotpresent()
  mm: thp: don't need to drain lru cache when splitting and mlocking THP
  hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
  sparc32: register memory occupied by kernel as memblock.memory
  include/linux/memblock.h: fix minor typo and unclear comment
  mm, mempolicy: fix up gup usage in lookup_node
  tools/vm/page_owner_sort.c: filter out unneeded line
  mm: swap: memcg: fix memcg stats for huge pages
  mm: swap: fix vmstats for huge pages
  mm: vmscan: limit the range of LRU type balancing
  mm: vmscan: reclaim writepage is IO cost
  mm: vmscan: determine anon/file pressure balance at the reclaim root
  mm: balance LRU lists based on relative thrashing
  mm: only count actual rotations as LRU reclaim cost
  ...
parents c444eb56 09587a09
+7 −12
@@ -199,11 +199,11 @@ An RSS page is unaccounted when it's fully unmapped. A PageCache page is
 unaccounted when it's removed from radix-tree. Even if RSS pages are fully
 unmapped (by kswapd), they may exist as SwapCache in the system until they
 are really freed. Such SwapCaches are also accounted.
-A swapped-in page is not accounted until it's mapped.
+A swapped-in page is accounted after adding into swapcache.
 
 Note: The kernel does swapin-readahead and reads multiple swaps at once.
-This means swapped-in pages may contain pages for other tasks than a task
-causing page fault. So, we avoid accounting at swap-in I/O.
+Since page's memcg recorded into swap whatever memsw enabled, the page will
+be accounted after swapin.
 
 At page migration, accounting information is kept.
 
@@ -222,18 +222,13 @@ the cgroup that brought it in -- this will happen on memory pressure).
 But see section 8.2: when moving a task to another cgroup, its pages may
 be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
 
-Exception: If CONFIG_MEMCG_SWAP is not used.
-When you do swapoff and make swapped-out pages of shmem(tmpfs) to
-be backed into memory in force, charges for pages are accounted against the
-caller of swapoff rather than the users of shmem.
-
-2.4 Swap Extension (CONFIG_MEMCG_SWAP)
+2.4 Swap Extension
 --------------------------------------
 
-Swap Extension allows you to record charge for swap. A swapped-in page is
-charged back to original page allocator if possible.
+Swap usage is always recorded for each of cgroup. Swap Extension allows you to
+read and limit it.
 
-When swap is accounted, following files are added.
+When CONFIG_SWAP is enabled, following files are added.
 
  - memory.memsw.usage_in_bytes.
  - memory.memsw.limit_in_bytes.
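The memsw counters listed in this hunk report memory plus swap usage. As a rough illustration of how they can be read, the Python sketch below derives a cgroup's swap usage as `memory.memsw.usage_in_bytes` minus `memory.usage_in_bytes`. It assumes a cgroup v1 memory controller mounted at the conventional `/sys/fs/cgroup/memory` and a hypothetical group named `mygroup`; the helper names are illustrative only, and the result is approximate because the two counters are read at slightly different moments.

```python
import tempfile
from pathlib import Path


def read_counter(cgroup_dir: Path, name: str) -> int:
    """Read one numeric cgroup v1 control file, e.g. memory.usage_in_bytes."""
    return int((cgroup_dir / name).read_text().strip())


def swap_usage_bytes(cgroup_dir: Path) -> int:
    """Approximate swap usage: (memory + swap) minus memory, clamped at zero."""
    memsw = read_counter(cgroup_dir, "memory.memsw.usage_in_bytes")
    mem = read_counter(cgroup_dir, "memory.usage_in_bytes")
    return max(memsw - mem, 0)


if __name__ == "__main__":
    # On a real system (assumption: v1 hierarchy, hypothetical group "mygroup"):
    #   swap_usage_bytes(Path("/sys/fs/cgroup/memory/mygroup"))
    # Here the two control files are faked in a temporary directory instead.
    with tempfile.TemporaryDirectory() as tmp:
        group = Path(tmp)
        (group / "memory.memsw.usage_in_bytes").write_text("3145728\n")
        (group / "memory.usage_in_bytes").write_text("1048576\n")
        print(swap_usage_bytes(group))  # 2097152
```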
+27 −13
@@ -834,12 +834,15 @@
 			See also Documentation/networking/decnet.rst.
 
 	default_hugepagesz=
-			[same as hugepagesz=] The size of the default
-			HugeTLB page size. This is the size represented by
-			the legacy /proc/ hugepages APIs, used for SHM, and
-			default size when mounting hugetlbfs filesystems.
-			Defaults to the default architecture's huge page size
-			if not specified.
+			[HW] The size of the default HugeTLB page. This is
+			the size represented by the legacy /proc/ hugepages
+			APIs.  In addition, this is the default hugetlb size
+			used for shmget(), mmap() and mounting hugetlbfs
+			filesystems.  If not specified, defaults to the
+			architecture's default huge page size.  Huge page
+			sizes are architecture dependent.  See also
+			Documentation/admin-guide/mm/hugetlbpage.rst.
+			Format: size[KMG]
 
 	deferred_probe_timeout=
 			[KNL] Debugging option to set a timeout in seconds for
@@ -1484,13 +1487,24 @@
 			hugepages using the cma allocator. If enabled, the
 			boot-time allocation of gigantic hugepages is skipped.
 
-	hugepages=	[HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
-	hugepagesz=	[HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
-			On x86-64 and powerpc, this option can be specified
-			multiple times interleaved with hugepages= to reserve
-			huge pages of different sizes. Valid pages sizes on
-			x86-64 are 2M (when the CPU supports "pse") and 1G
-			(when the CPU supports the "pdpe1gb" cpuinfo flag).
+	hugepages=	[HW] Number of HugeTLB pages to allocate at boot.
+			If this follows hugepagesz (below), it specifies
+			the number of pages of hugepagesz to be allocated.
+			If this is the first HugeTLB parameter on the command
+			line, it specifies the number of pages to allocate for
+			the default huge page size.  See also
+			Documentation/admin-guide/mm/hugetlbpage.rst.
+			Format: <integer>
+
+	hugepagesz=
+			[HW] The size of the HugeTLB pages.  This is used in
+			conjunction with hugepages (above) to allocate huge
+			pages of a specific size at boot.  The pair
+			hugepagesz=X hugepages=Y can be specified once for
+			each supported huge page size. Huge page sizes are
+			architecture dependent.  See also
+			Documentation/admin-guide/mm/hugetlbpage.rst.
+			Format: size[KMG]
 
 	hung_task_panic=
 			[KNL] Should the hung task detector generate panics.
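Both `default_hugepagesz=` and `hugepagesz=` now document their value as `size[KMG]`. A minimal Python sketch of that format follows, assuming a decimal number with an optional, case-insensitive K, M or G suffix that scales by powers of 1024. The name `parse_size` is hypothetical; the kernel's own `memparse()` accepts further forms (larger suffixes, hexadecimal numbers) that this sketch deliberately rejects.

```python
# Scale factors for the optional suffix in "size[KMG]" (powers of 1024).
_SCALE = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30}


def parse_size(value: str) -> int:
    """Convert a boot-parameter size such as "2M" or "1g" to bytes."""
    value = value.strip()
    suffix = value[-1:].lower()
    if suffix in _SCALE:
        digits, scale = value[:-1], _SCALE[suffix]
    else:
        digits, scale = value, 1
    # Only plain decimal digits are accepted before the optional suffix.
    if not digits.isdigit():
        raise ValueError(f"invalid size: {value!r}")
    return int(digits) * scale


if __name__ == "__main__":
    for arg in ("2M", "1G", "2048k", "4096"):
        print(f"hugepagesz={arg} -> {parse_size(arg)} bytes")
```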
+35 −0
@@ -100,6 +100,41 @@ with a huge page size selection parameter "hugepagesz=<size>". <size> must
 be specified in bytes with optional scale suffix [kKmMgG].  The default huge
 page size may be selected with the "default_hugepagesz=<size>" boot parameter.
 
+Hugetlb boot command line parameter semantics
+hugepagesz - Specify a huge page size.  Used in conjunction with hugepages
+	parameter to preallocate a number of huge pages of the specified
+	size.  Hence, hugepagesz and hugepages are typically specified in
+	pairs such as:
+		hugepagesz=2M hugepages=512
+	hugepagesz can only be specified once on the command line for a
+	specific huge page size.  Valid huge page sizes are architecture
+	dependent.
+hugepages - Specify the number of huge pages to preallocate.  This typically
+	follows a valid hugepagesz or default_hugepagesz parameter.  However,
+	if hugepages is the first or only hugetlb command line parameter it
+	implicitly specifies the number of huge pages of default size to
+	allocate.  If the number of huge pages of default size is implicitly
+	specified, it can not be overwritten by a hugepagesz,hugepages
+	parameter pair for the default size.
+	For example, on an architecture with 2M default huge page size:
+		hugepages=256 hugepagesz=2M hugepages=512
+	will result in 256 2M huge pages being allocated and a warning message
+	indicating that the hugepages=512 parameter is ignored.  If a hugepages
+	parameter is preceded by an invalid hugepagesz parameter, it will
+	be ignored.
+default_hugepagesz - Specify the default huge page size.  This parameter can
+	only be specified once on the command line.  default_hugepagesz can
+	optionally be followed by the hugepages parameter to preallocate a
+	specific number of huge pages of default size.  The number of default
+	sized huge pages to preallocate can also be implicitly specified as
+	mentioned in the hugepages section above.  Therefore, on an
+	architecture with 2M default huge page size:
+		hugepages=256
+		default_hugepagesz=2M hugepages=256
+		hugepages=256 default_hugepagesz=2M
+	will all result in 256 2M huge pages being allocated.  Valid default
+	huge page size is architecture dependent.
+
 When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages``
 indicates the current number of pre-allocated huge pages of the default size.
 Thus, one can use the following command to dynamically allocate/deallocate
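The boot parameter rules added in this hunk are order-dependent, so a small model can help check a command line before booting it. The Python sketch below is a simplified reading of the documented semantics only, not the kernel's parser (which lives in `mm/hugetlb.c`). It assumes, hypothetically, an architecture whose default huge page size is 2M and whose valid sizes are 2M and 1G, and it compares sizes as literal strings rather than parsing them; `plan_hugepages` is an illustrative name.

```python
from typing import Dict, List, Optional, Tuple

ARCH_DEFAULT = "2M"         # assumption: architecture default huge page size
VALID_SIZES = {"2M", "1G"}  # assumption: e.g. x86-64 with "pse" and "pdpe1gb"


def plan_hugepages(cmdline: str) -> Tuple[Dict[str, int], List[str]]:
    """Model the documented hugepagesz/hugepages/default_hugepagesz rules.

    Returns ({page size: pages to preallocate}, [warnings]).
    """
    counts: Dict[str, int] = {}
    warnings: List[str] = []
    default_size: Optional[str] = None
    implicit_default: Optional[int] = None  # hugepages= before any size parameter
    target: Optional[str] = None            # size the next hugepages= applies to
    seen_size_param = False                 # has a valid size parameter appeared?
    skip_next = False                       # last size parameter was rejected

    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == "hugepagesz":
            # Invalid sizes, and sizes that already have a count, are rejected.
            if value not in VALID_SIZES or value in counts:
                warnings.append(f"hugepagesz={value} ignored")
                target, skip_next = None, True
            else:
                seen_size_param, target, skip_next = True, value, False
        elif key == "default_hugepagesz":
            # Only the first valid default_hugepagesz= takes effect.
            if default_size is not None or value not in VALID_SIZES:
                warnings.append(f"default_hugepagesz={value} ignored")
                target, skip_next = None, True
            else:
                seen_size_param, target, skip_next = True, value, False
                default_size = value
        elif key == "hugepages":
            if skip_next:
                # A hugepages= after a rejected size parameter is dropped.
                warnings.append(f"hugepages={value} ignored")
                skip_next = False
            elif not seen_size_param:
                # Only the first hugetlb parameter sets the implicit default count.
                if implicit_default is None:
                    implicit_default = int(value)
                else:
                    warnings.append(f"hugepages={value} ignored")
            elif target is not None:
                counts[target] = int(value)
            else:
                warnings.append(f"hugepages={value} ignored")

    default_size = default_size or ARCH_DEFAULT
    if implicit_default is not None:
        # An implicit default count cannot be overridden by an explicit pair.
        if default_size in counts:
            warnings.append(f"hugepages={counts[default_size]} ignored")
        counts[default_size] = implicit_default
    return counts, warnings


if __name__ == "__main__":
    for line in ("hugepages=256",
                 "default_hugepagesz=2M hugepages=256",
                 "hugepages=256 default_hugepagesz=2M",
                 "hugepages=256 hugepagesz=2M hugepages=512"):
        print(line, "->", plan_hugepages(line))
```

With these assumptions, the first three example command lines from the hunk all yield 256 pages of 2M with no warnings, and the fourth yields 256 pages plus a warning that `hugepages=512` was ignored.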
+7 −0
@@ -220,6 +220,13 @@ memory. A lower value can prevent THPs from being
 collapsed, resulting fewer pages being collapsed into
 THPs, and lower memory access performance.
 
+``max_ptes_shared`` specifies how many pages can be shared across multiple
+processes. Exceeding the number would block the collapse::
+
+	/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
+
+A higher value may increase memory footprint for some workloads.
+
 Boot parameter
 ==============
 
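To make the new `max_ptes_shared` threshold concrete, the Python sketch below models the check as a running count of scanned pages that are mapped more than once, refusing the collapse as soon as that count exceeds `max_ptes_shared`. It is a simplified illustration: treating "shared" as a mapcount greater than one is an assumption about khugepaged's scan, and `may_collapse` and `read_max_ptes_shared` are hypothetical helper names. The sysfs path is the one given in the hunk.

```python
from pathlib import Path
from typing import Iterable

KNOB = Path("/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared")


def read_max_ptes_shared(path: Path = KNOB) -> int:
    """Return the current max_ptes_shared value (requires sysfs to be present)."""
    return int(path.read_text().strip())


def may_collapse(mapcounts: Iterable[int], max_ptes_shared: int) -> bool:
    """Model: allow collapse unless more than max_ptes_shared pages are shared.

    mapcounts holds, for each page in the candidate range, how many times it
    is mapped (hypothetical input; the kernel inspects page state directly).
    """
    shared = 0
    for count in mapcounts:
        if count > 1:
            shared += 1
            if shared > max_ptes_shared:
                return False  # exceeding the limit blocks the collapse
    return True


if __name__ == "__main__":
    pages = [1] * 500 + [2] * 12    # 512 pages, 12 of them shared
    print(may_collapse(pages, 8))   # False: 12 shared pages > 8
    print(may_collapse(pages, 16))  # True
```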
+18 −5
@@ -831,14 +831,27 @@ tooling to work, you can do::
 swappiness
 ==========
 
-This control is used to define how aggressive the kernel will swap
-memory pages.  Higher values will increase aggressiveness, lower values
-decrease the amount of swap.  A value of 0 instructs the kernel not to
-initiate swap until the amount of free and file-backed pages is less
-than the high water mark in a zone.
+This control is used to define the rough relative IO cost of swapping
+and filesystem paging, as a value between 0 and 200. At 100, the VM
+assumes equal IO cost and will thus apply memory pressure to the page
+cache and swap-backed pages equally; lower values signify more
+expensive swap IO, higher values indicates cheaper.
 
+Keep in mind that filesystem IO patterns under memory pressure tend to
+be more efficient than swap's random IO. An optimal value will require
+experimentation and will also be workload-dependent.
+
 The default value is 60.
 
+For in-memory swap, like zram or zswap, as well as hybrid setups that
+have swap on faster devices than the filesystem, values beyond 100 can
+be considered. For example, if the random IO against the swap device
+is on average 2x faster than IO from the filesystem, swappiness should
+be 133 (x + 2x = 200, 2x = 133.33).
+
+At 0, the kernel will not initiate swap until the amount of free and
+file-backed pages is less than the high watermark in a zone.
+
 
 unprivileged_userfaultfd
 ========================
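The arithmetic in the new swappiness text can be written out directly. The Python sketch below assumes that swappiness is the relative weight given to swap-backed pages and `200 - swappiness` the weight given to file pages, which is how the `x + 2x = 200` example reads; it does not model any further balancing the kernel performs at runtime. Both function names are hypothetical.

```python
from typing import Tuple


def pressure_weights(swappiness: int) -> Tuple[int, int]:
    """Relative reclaim pressure on (swap-backed, file) pages; they sum to 200."""
    if not 0 <= swappiness <= 200:
        raise ValueError("swappiness must be between 0 and 200")
    return swappiness, 200 - swappiness


def swappiness_for_speedup(ratio: float) -> float:
    """Swappiness when random swap IO is `ratio` times faster than filesystem IO.

    File IO gets weight x and swap IO gets ratio * x; solving
    x + ratio * x = 200 gives x = 200 / (1 + ratio), so swappiness = ratio * x.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 200 * ratio / (1 + ratio)


if __name__ == "__main__":
    print(pressure_weights(60))                 # (60, 140), the default
    print(pressure_weights(100))                # (100, 100): equal IO cost
    print(round(swappiness_for_speedup(2), 2))  # 133.33, as in the text
```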