Merge branch 'akpm' (patches from Andrew) (63bef48f) · Commits · 戴 / test

Documentation/admin-guide/kernel-parameters.txt

+11 −2

Original line number	Diff line number	Diff line
		@@ -2573,13 +2573,22 @@
		For details see: Documentation/admin-guide/hw-vuln/mds.rst

		mem=nn[KMG] [KNL,BOOT] Force usage of a specific amount of memory
		Amount of memory to be used when the kernel is not able
		to see the whole system memory or for test.
		Amount of memory to be used in the following cases:

		1 for test;
		2 when the kernel is not able to see the whole system memory;
		3 memory that lies beyond the 'mem=' boundary is excluded from
		the hypervisor and then assigned to KVM guests.

		[X86] Work as limiting max address. Use together
		with memmap= to avoid physical address space collisions.
		Without memmap= PCI devices could be placed at addresses
		belonging to unused RAM.

		Note that this only takes effect during boot time, since
		in case 3 above memory may need to be hot-added after boot
		if the hypervisor's system memory is not sufficient.

		mem=nopentium [BUGS=X86-32] Disable usage of 4MB pages for kernel
		memory.

Documentation/admin-guide/mm/transhuge.rst

+14 −0

		@@ -310,6 +310,11 @@ thp_fault_fallback
		is incremented if a page fault fails to allocate
		a huge page and instead falls back to using small pages.

		thp_fault_fallback_charge
		is incremented if a page fault fails to charge a huge page and
		instead falls back to using small pages even though the
		allocation was successful.

		thp_collapse_alloc_failed
		is incremented if khugepaged found a range
		of pages that should be collapsed into one huge page but failed
		@@ -319,6 +324,15 @@ thp_file_alloc
		is incremented every time a file huge page is successfully
		allocated.

		thp_file_fallback
		is incremented if an attempt to allocate a file huge page
		fails and instead falls back to using small pages.

		thp_file_fallback_charge
		is incremented if a file huge page cannot be charged and instead
		falls back to using small pages even though the allocation was
		successful.

		thp_file_mapped
		is incremented every time a file huge page is mapped into
		user address space.
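
As a rough illustration of how the counters described above might be consumed, the sketch below looks up a single counter by exact name in text formatted like ``/proc/vmstat`` (one ``name value`` pair per line). The helper name ``vmstat_lookup`` is hypothetical and not part of any kernel or libc interface; the caller is assumed to have read the file into a NUL-terminated buffer first.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find counter `name` in /proc/vmstat-style text.
 * Returns 0 and stores the value on success, -1 if the counter is absent
 * or its value cannot be parsed.
 */
static int vmstat_lookup(const char *text, const char *name,
                         unsigned long long *value)
{
    size_t name_len = strlen(name);
    const char *line = text;

    while (line && *line) {
        /* Require a space after the name so that "thp_fault_fallback"
         * does not match "thp_fault_fallback_charge". */
        if (strncmp(line, name, name_len) == 0 && line[name_len] == ' ') {
            if (sscanf(line + name_len, "%llu", value) == 1)
                return 0;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

The exact-name match matters here because several of the new counters share a prefix with an existing one; a plain substring search would report the wrong value.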

Documentation/admin-guide/mm/userfaultfd.rst

+51 −0

		@@ -108,6 +108,57 @@ UFFDIO_COPY. They're atomic as in guaranteeing that nothing can see an
		half copied page since it'll keep userfaulting until the copy has
		finished.

		Notes:

		- If you requested UFFDIO_REGISTER_MODE_MISSING when registering then
		you must provide some kind of page in your thread after reading from
		the uffd. You must provide either UFFDIO_COPY or UFFDIO_ZEROPAGE.
		The normal behavior of the OS automatically providing a zero page on
		an anonymous mapping is not in place.

		- None of the page-delivering ioctls default to the range that you
		registered with. You must fill in all fields for the appropriate
		ioctl struct including the range.

		- You get the address of the access that triggered the missing page
		event out of a struct uffd_msg that you read in the thread from the
		uffd. You can supply as many pages as you want with UFFDIO_COPY or
		UFFDIO_ZEROPAGE. Keep in mind that unless you used DONTWAKE then
		the first of any of those IOCTLs wakes up the faulting thread.

		- Be sure to test for all errors including (pollfd[0].revents &
		POLLERR). This can happen, e.g. when ranges supplied were
		incorrect.
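
The notes above can be sketched roughly as follows. This is a minimal illustration, not the canonical handler: the function name ``resolve_missing_fault`` and its parameters are hypothetical, and it assumes the caller has already read a ``struct uffd_msg`` from the uffd and prepared a source buffer of one page. It fills in every field of ``struct uffdio_copy`` explicitly, since the range is never taken from the registration.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: resolve one missing-page fault by copying
 * page_size bytes from src into the page containing the faulting
 * address. Returns 0 on success, -1 with errno set on failure.
 */
static int resolve_missing_fault(int uffd, const struct uffd_msg *msg,
                                 const void *src, size_t page_size)
{
    struct uffdio_copy copy;

    if (msg->event != UFFD_EVENT_PAGEFAULT) {
        errno = EINVAL;
        return -1;
    }

    memset(&copy, 0, sizeof(copy));
    /* The range is not implied by registration: align the address here. */
    copy.dst = msg->arg.pagefault.address & ~((uint64_t)page_size - 1);
    copy.src = (uint64_t)(uintptr_t)src;
    copy.len = page_size;
    copy.mode = 0; /* DONTWAKE not set, so the faulting thread is woken */

    if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
        return -1;
    return 0;
}
```

A caller would typically invoke this from the thread that polls and reads the uffd, checking the return value and ``errno`` on every call as the last note recommends.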

		Write Protect Notifications
		---------------------------

		This is equivalent to (but faster than) using mprotect and a SIGSEGV
		signal handler.

		First you need to register a range with UFFDIO_REGISTER_MODE_WP.
		Instead of using mprotect(2) you use ioctl(uffd, UFFDIO_WRITEPROTECT,
		struct uffdio_writeprotect *) with mode = UFFDIO_WRITEPROTECT_MODE_WP
		in the struct passed in. The range does not default to, and does not
		have to be identical to, the range you registered with. You can write
		protect as many ranges as you like (inside the registered range).
		Then, in the thread reading from uffd, the struct uffd_msg will have
		msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP set. Now you send
		ioctl(uffd, UFFDIO_WRITEPROTECT, struct uffdio_writeprotect *) again
		with UFFDIO_WRITEPROTECT_MODE_WP cleared in the mode field of the
		struct. This wakes up the faulting thread, which will continue to run
		with writes allowed. This allows you to do the bookkeeping about the
		write in the uffd reading thread before the ioctl.

		If you registered with both UFFDIO_REGISTER_MODE_MISSING and
		UFFDIO_REGISTER_MODE_WP then you need to think about the sequence in
		which you supply a page and undo write protect. Note that there is a
		difference between writes into a WP area and into a !WP area. The
		former will have UFFD_PAGEFAULT_FLAG_WP set, the latter
		UFFD_PAGEFAULT_FLAG_WRITE. The latter did not fail on protection but
		you still need to supply a page when UFFDIO_REGISTER_MODE_MISSING was
		used.
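
The write-protect sequence described above might look roughly like the following sketch. The helper names ``set_write_protect`` and ``is_wp_fault`` are hypothetical; the sketch assumes kernel headers that define ``UFFDIO_WRITEPROTECT`` and a range that was already registered with UFFDIO_REGISTER_MODE_WP.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: write-protect (protect != 0) or un-protect
 * (protect == 0) an explicit range. Un-protecting also wakes the
 * faulting thread. Returns the ioctl result (-1 on error).
 */
static int set_write_protect(int uffd, void *start, size_t len, int protect)
{
    struct uffdio_writeprotect wp;

    memset(&wp, 0, sizeof(wp));
    wp.range.start = (uint64_t)(uintptr_t)start;
    wp.range.len = len;
    wp.mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0;
    return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
}

/*
 * Hypothetical helper: true if the message reports a write into a
 * write-protected area, as opposed to a plain missing-page write.
 */
static int is_wp_fault(const struct uffd_msg *msg)
{
    return msg->event == UFFD_EVENT_PAGEFAULT &&
           (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP) != 0;
}
```

In a handler registered for both modes, one plausible ordering is to test ``is_wp_fault()`` first, do the bookkeeping, and call ``set_write_protect(uffd, addr, len, 0)``; faults without the WP flag still need a page supplied through UFFDIO_COPY or UFFDIO_ZEROPAGE.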

		QEMU/KVM
		========

Documentation/vm/free_page_reporting.rst

0 → 100644

+40 −0

		.. _free_page_reporting:

		=====================
		Free Page Reporting
		=====================

		Free page reporting is an API by which a device can register to receive
		lists of pages that are currently unused by the system. This is useful in
		the case of virtualization where a guest is then able to use this data to
		notify the hypervisor that it is no longer using certain pages in memory.

		For the driver, typically a balloon driver, to make use of this functionality
		it will allocate and initialize a page_reporting_dev_info structure. The
		field within the structure it will populate is the "report" function
		pointer used to process the scatterlist. It must also guarantee that it can
		handle at least PAGE_REPORTING_CAPACITY worth of scatterlist entries per
		call to the function. A call to page_reporting_register will register the
		page reporting interface with the reporting framework assuming no other
		page reporting devices are already registered.

		Once registered the page reporting API will begin reporting batches of
		pages to the driver. The API will start reporting pages 2 seconds after
		the interface is registered and will continue to do so 2 seconds after any
		page of a sufficiently high order is freed.

		Pages reported will be stored in the scatterlist passed to the reporting
		function with the final entry having the end bit set in entry nent - 1.
		While pages are being processed by the report function they will not be
		accessible to the allocator. Once the report function has been completed
		the pages will be returned to the free area from which they were obtained.

		Prior to removing a driver that is making use of free page reporting it
		is necessary to call page_reporting_unregister to have the
		page_reporting_dev_info structure that is currently in use by free page
		reporting removed. Doing this will prevent further reports from being
		issued via the interface. If another driver or the same driver is
		registered it is possible for it to resume where the previous driver had
		left off in terms of reporting free pages.

		Alexander Duyck, Dec 04, 2019

Documentation/vm/zswap.rst

+12 −8

		@@ -35,9 +35,11 @@ Zswap evicts pages from compressed cache on an LRU basis to the backing swap
		device when the compressed pool reaches its size limit. This requirement had
		been identified in prior community discussions.

		Zswap is disabled by default but can be enabled at boot time by setting
		the ``enabled`` attribute to 1 at boot time. ie: ``zswap.enabled=1``. Zswap
		can also be enabled and disabled at runtime using the sysfs interface.
		Whether Zswap is enabled at boot time depends on whether
		the ``CONFIG_ZSWAP_DEFAULT_ON`` Kconfig option is enabled or not.
		This setting can then be overridden by providing the kernel command line
		``zswap.enabled=`` option, for example ``zswap.enabled=0``.
		Zswap can also be enabled and disabled at runtime using the sysfs interface.
		An example command to enable zswap at runtime, assuming sysfs is mounted
		at ``/sys``, is::

		@@ -64,9 +66,10 @@ allocation in zpool is not directly accessible by address. Rather, a handle is
		returned by the allocation routine and that handle must be mapped before being
		accessed. The compressed memory pool grows on demand and shrinks as compressed
		pages are freed. The pool is not preallocated. By default, a zpool
		of type zbud is created, but it can be selected at boot time by
		setting the ``zpool`` attribute, e.g. ``zswap.zpool=zbud``. It can
		also be changed at runtime using the sysfs ``zpool`` attribute, e.g.::
		of the type selected by the ``CONFIG_ZSWAP_ZPOOL_DEFAULT`` Kconfig option is created,
		but it can be overridden at boot time by setting the ``zpool`` attribute,
		e.g. ``zswap.zpool=zbud``. It can also be changed at runtime using the sysfs
		``zpool`` attribute, e.g.::

		echo zbud > /sys/module/zswap/parameters/zpool
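
To check which value is actually in effect after the Kconfig default, the command line and any runtime change have been applied, a program could read the module parameter back. The sketch below is one way to do that, assuming sysfs is mounted at ``/sys``; the function names ``read_first_line`` and ``read_zswap_param`` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of an open stream into buf, stripping the newline. */
static int read_first_line(FILE *f, char *buf, size_t size)
{
    if (size == 0 || !fgets(buf, (int)size, f))
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/*
 * Hypothetical helper: read a zswap module parameter such as "enabled",
 * "zpool" or "compressor". Returns 0 on success, -1 on any failure.
 */
static int read_zswap_param(const char *name, char *buf, size_t size)
{
    char path[256];
    FILE *f;
    int ret;

    ret = snprintf(path, sizeof(path),
                   "/sys/module/zswap/parameters/%s", name);
    if (ret < 0 || ret >= (int)sizeof(path))
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1; /* zswap not built in, or sysfs mounted elsewhere */

    ret = read_first_line(f, buf, size);
    fclose(f);
    return ret;
}
```

For example, ``read_zswap_param("zpool", buf, sizeof(buf))`` would be expected to leave the current zpool type in ``buf`` on a kernel with zswap available.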

		@@ -97,8 +100,9 @@ controlled policy:
		* max_pool_percent - The maximum percentage of memory that the compressed
		pool can occupy.

		The default compressor is lzo, but it can be selected at boot time by
		setting the ``compressor`` attribute, e.g. ``zswap.compressor=lzo``.
		The default compressor is selected by the ``CONFIG_ZSWAP_COMPRESSOR_DEFAULT``
		Kconfig option, but it can be overridden at boot time by setting the
		``compressor`` attribute, e.g. ``zswap.compressor=lzo``.
		It can also be changed at runtime using the sysfs "compressor"
		attribute, e.g.::
