Commit 1064d857 authored by Linus Torvalds's avatar Linus Torvalds
Browse files

Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

 - a couple of hotfixes

 - almost all of the rest of MM

 - lib/ updates

 - binfmt_elf updates

 - autofs updates

 - quite a lot of misc fixes and updates
    - reiserfs, fatfs
    - signals
    - exec
    - cpumask
    - rapidio
    - sysctl
    - pids
    - eventfd
    - gcov
    - panic
    - pps

 - gdb script updates

 - ipc updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (126 commits)
  mm: memcontrol: fix NUMA round-robin reclaim at intermediate level
  mm: memcontrol: fix recursive statistics correctness & scalabilty
  mm: memcontrol: move stat/event counting functions out-of-line
  mm: memcontrol: make cgroup stats and events query API explicitly local
  drivers/virt/fsl_hypervisor.c: prevent integer overflow in ioctl
  drivers/virt/fsl_hypervisor.c: dereferencing error pointers in ioctl
  mm, memcg: rename ambiguously named memory.stat counters and functions
  arch: remove <asm/sizes.h> and <asm-generic/sizes.h>
  treewide: replace #include <asm/sizes.h> with #include <linux/sizes.h>
  fs/block_dev.c: Remove duplicate header
  fs/cachefiles/namei.c: remove duplicate header
  include/linux/sched/signal.h: replace `tsk' with `task'
  fs/coda/psdev.c: remove duplicate header
  ipc: do cyclic id allocation for the ipc object.
  ipc: conserve sequence numbers in ipcmni_extend mode
  ipc: allow boot time extension of IPCMNI from 32k to 16M
  ipc/mqueue: optimize msg_get()
  ipc/mqueue: remove redundant wq task assignment
  ipc: prevent lockup on alloc_msg and free_msg
  scripts/gdb: print cached rate in lx-clk-summary
  ...
parents 35c99ffa def0fdae
Loading
Loading
Loading
Loading
+107 −0
Original line number Diff line number Diff line
@@ -63,6 +63,110 @@ as well as medium and long term trends. The total absolute stall time
spikes which wouldn't necessarily make a dent in the time averages,
or to average trends over custom time frames.

Monitoring for pressure thresholds
==================================

Users can register triggers and use poll() to be woken up when resource
pressure exceeds certain thresholds.

A trigger describes the maximum cumulative stall time over a specific
time window, e.g. 100ms of total stall time within any 500ms window to
generate a wakeup event.

To register a trigger user has to open psi interface file under
/proc/pressure/ representing the resource to be monitored and write the
desired threshold and time window. The open file descriptor should be
used to wait for trigger events using select(), poll() or epoll().
The following format is used:

<some|full> <stall amount in us> <time window in us>

For example writing "some 150000 1000000" into /proc/pressure/memory
would add 150ms threshold for partial memory stall measured within
1sec time window. Writing "full 50000 1000000" into /proc/pressure/io
would add 50ms threshold for full io stall measured within 1sec time window.

Triggers can be set on more than one psi metric and more than one trigger
for the same psi metric can be specified. However for each trigger a separate
file descriptor is required to be able to poll it separately from others,
therefore for each trigger a separate open() syscall should be made even
when opening the same psi interface file.

Monitors activate only when system enters stall state for the monitored
psi metric and deactivates upon exit from the stall state. While system is
in the stall state psi signal growth is monitored at a rate of 10 times per
tracking window.

The kernel accepts window sizes ranging from 500ms to 10s, therefore min
monitoring update interval is 50ms and max is 1s. Min limit is set to
prevent overly frequent polling. Max limit is chosen as a high enough number
after which monitors are most likely not needed and psi averages can be used
instead.

When activated, psi monitor stays active for at least the duration of one
tracking window to avoid repeated activations/deactivations when system is
bouncing in and out of the stall state.

Notifications to the userspace are rate-limited to one per tracking window.

The trigger will de-register when the file descriptor used to define the
trigger  is closed.

Userspace monitor usage example
===============================

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/*
 * Monitor memory partial stall with 1s tracking window size
 * and 150ms threshold.
 */
int main() {
	const char trig[] = "some 150000 1000000";
	struct pollfd fds;
	int n;

	fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
	if (fds.fd < 0) {
		printf("/proc/pressure/memory open error: %s\n",
			strerror(errno));
		return 1;
	}
	fds.events = POLLPRI;

	if (write(fds.fd, trig, strlen(trig) + 1) < 0) {
		printf("/proc/pressure/memory write error: %s\n",
			strerror(errno));
		return 1;
	}

	printf("waiting for events...\n");
	while (1) {
		n = poll(&fds, 1, -1);
		if (n < 0) {
			printf("poll error: %s\n", strerror(errno));
			return 1;
		}
		if (fds.revents & POLLERR) {
			printf("got POLLERR, event source is gone\n");
			return 0;
		}
		if (fds.revents & POLLPRI) {
			printf("event triggered!\n");
		} else {
			printf("unknown event received: 0x%x\n", fds.revents);
			return 1;
		}
	}

	return 0;
}

Cgroup2 interface
=================

@@ -71,3 +175,6 @@ mounted, pressure stall information is also tracked for tasks grouped
into cgroups. Each subdirectory in the cgroupfs mountpoint contains
cpu.pressure, memory.pressure, and io.pressure files; the format is
the same as the /proc/pressure/ files.

Per-cgroup psi monitors can be specified and used the same way as
system-wide ones.
+16 −1
Original line number Diff line number Diff line
@@ -1830,6 +1830,9 @@
	ip=		[IP_PNP]
			See Documentation/filesystems/nfs/nfsroot.txt.

	ipcmni_extend	[KNL] Extend the maximum number of unique System V
			IPC identifiers from 32,768 to 16,777,216.

	irqaffinity=	[SMP] Set the default irq affinity mask
			The argument is a cpu list, as described above.

@@ -3174,6 +3177,16 @@
			This will also cause panics on machine check exceptions.
			Useful together with panic=30 to trigger a reboot.

	page_alloc.shuffle=
			[KNL] Boolean flag to control whether the page allocator
			should randomize its free lists. The randomization may
			be automatically enabled if the kernel detects it is
			running on a platform with a direct-mapped memory-side
			cache, and this parameter can be used to
			override/disable that behavior. The state of the flag
			can be read from sysfs at:
			/sys/module/page_alloc/parameters/shuffle.

	page_owner=	[KNL] Boot-time page_owner enabling option.
			Storage of the information about who allocated
			each page is disabled in default. With this switch,
@@ -4054,7 +4067,9 @@
				[[,]s[mp]#### \
				[[,]b[ios] | a[cpi] | k[bd] | t[riple] | e[fi] | p[ci]] \
				[[,]f[orce]
			Where reboot_mode is one of warm (soft) or cold (hard) or gpio,
			Where reboot_mode is one of warm (soft) or cold (hard) or gpio
					(prefix with 'panic_' to set mode for panic
					reboot only),
			      reboot_type is one of bios, acpi, kbd, triple, efi, or pci,
			      reboot_force is either force or not specified,
			      reboot_cpu is s[mp]#### with #### being the processor
+2 −2
Original line number Diff line number Diff line
@@ -147,10 +147,10 @@ Division Functions
.. kernel-doc:: include/linux/math64.h
   :internal:

.. kernel-doc:: lib/div64.c
.. kernel-doc:: lib/math/div64.c
   :functions: div_s64_rem div64_u64_rem div64_u64 div64_s64

.. kernel-doc:: lib/gcd.c
.. kernel-doc:: lib/math/gcd.c
   :export:

UUID/GUID
+14 −4
Original line number Diff line number Diff line
@@ -34,10 +34,6 @@ Configure the kernel with::
        CONFIG_DEBUG_FS=y
        CONFIG_GCOV_KERNEL=y

select the gcc's gcov format, default is autodetect based on gcc version::

        CONFIG_GCOV_FORMAT_AUTODETECT=y

and to get coverage data for the entire kernel::

        CONFIG_GCOV_PROFILE_ALL=y
@@ -169,6 +165,20 @@ b) gcov is run on the BUILD machine
      [user@build] gcov -o /tmp/coverage/tmp/out/init main.c


Note on compilers
-----------------

GCC and LLVM gcov tools are not necessarily compatible. Use gcov_ to work with
GCC-generated .gcno and .gcda files, and use llvm-cov_ for Clang.

.. _gcov: http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
.. _llvm-cov: https://llvm.org/docs/CommandGuide/llvm-cov.html

Build differences between GCC and Clang gcov are handled by Kconfig. It
automatically selects the appropriate gcov format depending on the detected
toolchain.


Troubleshooting
---------------

+7 −0
Original line number Diff line number Diff line
@@ -7,6 +7,10 @@ Required properties:
- compatible: should be "pps-gpio"
- gpios: one PPS GPIO in the format described by ../gpio/gpio.txt

Additional required properties for the PPS ECHO functionality:
- echo-gpios: one PPS ECHO GPIO in the format described by ../gpio/gpio.txt
- echo-active-ms: duration in ms of the active portion of the echo pulse

Optional properties:
- assert-falling-edge: when present, assert is indicated by a falling edge
                       (instead of by a rising edge)
@@ -19,5 +23,8 @@ Example:
		gpios = <&gpio1 26 GPIO_ACTIVE_HIGH>;
		assert-falling-edge;

		echo-gpios = <&gpio1 27 GPIO_ACTIVE_HIGH>;
		echo-active-ms = <100>;

		compatible = "pps-gpio";
	};
Loading