Commit 8c1b724d authored by Linus Torvalds
Pull kvm updates from Paolo Bonzini:
 "ARM:
   - GICv4.1 support

   - 32bit host removal

  PPC:
   - secure (encrypted) guests under the Protected Execution Framework
     ultravisor

  s390:
   - allow disabling GISA (hardware interrupt injection) and protected
     VMs/ultravisor support.

  x86:
   - New dirty bitmap flag that sets all bits in the bitmap when dirty
     page logging is enabled; this is faster because it doesn't require
     bulk modification of the page tables.

   - Initial work on making nested SVM event injection more similar to
     VMX, and less buggy.

   - Various cleanups to MMU code (though the big ones and related
     optimizations were delayed to 5.8). Instead of using cr3 in
     function names which occasionally means eptp, KVM too has
     standardized on "pgd".

   - A large refactoring of CPUID features, which now use an array that
     parallels the core x86_features.

   - Some removal of pointer chasing from kvm_x86_ops, which will also
     be switched to static calls as soon as they are available.

   - New Tigerlake CPUID features.

   - More bugfixes, optimizations and cleanups.

  Generic:
   - selftests: cleanups, new MMU notifier stress test, steal-time test

   - CSV output for kvm_stat"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
  x86/kvm: fix a missing-prototypes "vmread_error"
  KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
  KVM: VMX: Add a trampoline to fix VMREAD error handling
  KVM: SVM: Annotate svm_x86_ops as __initdata
  KVM: VMX: Annotate vmx_x86_ops as __initdata
  KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
  KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
  KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
  KVM: VMX: Configure runtime hooks using vmx_x86_ops
  KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
  KVM: x86: Move init-only kvm_x86_ops to separate struct
  KVM: Pass kvm_init()'s opaque param to additional arch funcs
  s390/gmap: return proper error code on ksm unsharing
  KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
  KVM: Fix out of range accesses to memslots
  KVM: X86: Micro-optimize IPI fastpath delay
  KVM: X86: Delay read msr data iff writes ICR MSR
  KVM: PPC: Book3S HV: Add a capability for enabling secure guests
  KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
  KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
  ...
parents f14a9532 514ccc19
+5 −0
@@ -3821,6 +3821,11 @@
			before loading.
			See Documentation/admin-guide/blockdev/ramdisk.rst.

+	prot_virt=	[S390] enable hosting protected virtual machines
+			isolated from the hypervisor (if hardware supports
+			that).
+			Format: <bool>
+
	psi=		[KNL] Enable or disable pressure stall information
			tracking.
			Format: <bool>
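
As a rough illustration of the `Format: <bool>` convention used by prot_virt= above, the sketch below shows how a userspace tool might look for the parameter in a command line string. The helper names are hypothetical, the accepted spellings (1/0, y/n, on/off) are an assumption modeled on common kernel boolean parsing, and only the first occurrence of the key is examined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: interpret the start of a value as a boolean.
 * Accepted spellings are an assumption, not taken from the diff above.
 * Returns 1 (true), 0 (false) or -1 (not a recognized boolean).
 */
static int parse_bool_word(const char *v)
{
	switch (v[0]) {
	case '1': case 'y': case 'Y':
		return 1;
	case '0': case 'n': case 'N':
		return 0;
	case 'o': case 'O':
		if (v[1] == 'n' || v[1] == 'N')
			return 1;
		if (v[1] == 'f' || v[1] == 'F')
			return 0;
		return -1;
	default:
		return -1;
	}
}

/*
 * Hypothetical helper: find "prot_virt=" at the start of a space-separated
 * word and parse its value. Returns 1, 0, or -1 if absent or malformed.
 */
static int parse_prot_virt(const char *cmdline)
{
	static const char key[] = "prot_virt=";
	const size_t key_len = sizeof(key) - 1;
	const char *p = cmdline;

	assert(cmdline != NULL);

	while ((p = strstr(p, key)) != NULL) {
		if (p == cmdline || p[-1] == ' ')
			return parse_bool_word(p + key_len);
		p += key_len;	/* matched inside another word; keep looking */
	}
	return -1;
}
```

A caller would treat -1 as "not specified" and fall back to whatever default applies on the system in question.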
+105 −23
@@ -1574,8 +1574,8 @@ This ioctl would set vcpu's xcr to the value userspace specified.
  };

  #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		BIT(0)
-  #define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1)
-  #define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2)
+  #define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1) /* deprecated */
+  #define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2) /* deprecated */

  struct kvm_cpuid_entry2 {
	__u32 function;
@@ -1626,13 +1626,6 @@ emulate them efficiently. The fields in each entry are defined as follows:

        KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
           if the index field is valid
-        KVM_CPUID_FLAG_STATEFUL_FUNC:
-           if cpuid for this function returns different values for successive
-           invocations; there will be several entries with the same function,
-           all with this flag set
-        KVM_CPUID_FLAG_STATE_READ_NEXT:
-           for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
-           the first entry to be read by a cpu

   eax, ebx, ecx, edx:
         the values returned by the cpuid instruction for
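
With the two stateful flags deprecated, KVM_CPUID_FLAG_SIGNIFCANT_INDEX is the only flag in these hunks that still affects how an entry is selected. The following is a minimal sketch of that selection logic, assuming a local stand-in struct (`cpuid_entry`, field layout assumed from the documented `struct kvm_cpuid_entry2`) and locally defined flag macros without the KVM_ prefix. Both helper functions are hypothetical, and clearing the deprecated bits is just one conservative option for userspace, not something the diff requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as listed in the documentation hunk; names are local. */
#define CPUID_FLAG_SIGNIFCANT_INDEX	(1u << 0)
#define CPUID_FLAG_STATEFUL_FUNC	(1u << 1) /* deprecated */
#define CPUID_FLAG_STATE_READ_NEXT	(1u << 2) /* deprecated */

/* Local stand-in for struct kvm_cpuid_entry2 (layout is an assumption). */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

/*
 * Return the first entry matching 'function'. The index is compared only
 * when the entry marks it as significant.
 */
static const struct cpuid_entry *
find_cpuid_entry(const struct cpuid_entry *e, size_t n,
		 uint32_t function, uint32_t index)
{
	assert(e != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (e[i].function != function)
			continue;
		if ((e[i].flags & CPUID_FLAG_SIGNIFCANT_INDEX) &&
		    e[i].index != index)
			continue;
		return &e[i];
	}
	return NULL;
}

/* Drop the deprecated bits from every entry before handing them on. */
static void clear_deprecated_flags(struct cpuid_entry *e, size_t n)
{
	assert(e != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		e[i].flags &= ~(CPUID_FLAG_STATEFUL_FUNC |
				CPUID_FLAG_STATE_READ_NEXT);
}
```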
@@ -2117,7 +2110,8 @@ Errors:

  ======   ============================================================
  ENOENT   no such register
-  EINVAL   invalid register ID, or no such register
+  EINVAL   invalid register ID, or no such register or used with VMs in
+           protected virtualization mode on s390
  EPERM    (arm64) register access not allowed before vcpu finalization
  ======   ============================================================

@@ -2552,7 +2546,8 @@ Errors include:

  ======== ============================================================
  ENOENT   no such register
-  EINVAL   invalid register ID, or no such register
+  EINVAL   invalid register ID, or no such register or used with VMs in
+           protected virtualization mode on s390
  EPERM    (arm64) register access not allowed before vcpu finalization
  ======== ============================================================
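
The error table above can be turned into a small classification step in userspace. This is only a sketch: the enum and function names are hypothetical, and the function takes a positive errno value as would be read from `errno` after a failed register ioctl. Note that EINVAL now covers three distinct causes, so it cannot be narrowed further from the error code alone.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical classification of the documented ONE_REG error codes. */
enum one_reg_result {
	ONE_REG_OK,
	ONE_REG_NO_SUCH_REGISTER,	/* ENOENT */
	ONE_REG_INVALID_OR_PROTECTED,	/* EINVAL: bad ID, no such register,
					 * or s390 protected virtualization */
	ONE_REG_NOT_FINALIZED,		/* EPERM: arm64, vcpu not finalized */
	ONE_REG_OTHER,
};

/* Map a non-negative errno value (0 meaning success) to a result class. */
static enum one_reg_result classify_one_reg_error(int err)
{
	assert(err >= 0);

	switch (err) {
	case 0:
		return ONE_REG_OK;
	case ENOENT:
		return ONE_REG_NO_SUCH_REGISTER;
	case EINVAL:
		return ONE_REG_INVALID_OR_PROTECTED;
	case EPERM:
		return ONE_REG_NOT_FINALIZED;
	default:
		return ONE_REG_OTHER;
	}
}
```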

@@ -3347,8 +3342,8 @@ The member 'flags' is used for passing flags from userspace.
::

  #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		BIT(0)
-  #define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1)
-  #define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2)
+  #define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1) /* deprecated */
+  #define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2) /* deprecated */

  struct kvm_cpuid_entry2 {
	__u32 function;
@@ -3394,13 +3389,6 @@ The fields in each entry are defined as follows:

        KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
           if the index field is valid
-        KVM_CPUID_FLAG_STATEFUL_FUNC:
-           if cpuid for this function returns different values for successive
-           invocations; there will be several entries with the same function,
-           all with this flag set
-        KVM_CPUID_FLAG_STATE_READ_NEXT:
-           for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
-           the first entry to be read by a cpu

   eax, ebx, ecx, edx:

@@ -4649,6 +4637,60 @@ the clear cpu reset definition in the POP. However, the cpu is not put
into ESA mode. This reset is a superset of the initial reset.


+4.125 KVM_S390_PV_COMMAND
+-------------------------
+
+:Capability: KVM_CAP_S390_PROTECTED
+:Architectures: s390
+:Type: vm ioctl
+:Parameters: struct kvm_pv_cmd
+:Returns: 0 on success, < 0 on error
+
+::
+
+  struct kvm_pv_cmd {
+	__u32 cmd;	/* Command to be executed */
+	__u16 rc;	/* Ultravisor return code */
+	__u16 rrc;	/* Ultravisor return reason code */
+	__u64 data;	/* Data or address */
+	__u32 flags;    /* flags for future extensions. Must be 0 for now */
+	__u32 reserved[3];
+  };
+
+cmd values:
+
+KVM_PV_ENABLE
+  Allocate memory and register the VM with the Ultravisor, thereby
+  donating memory to the Ultravisor that will become inaccessible to
+  KVM. All existing CPUs are converted to protected ones. After this
+  command has succeeded, any CPU added via hotplug will become
+  protected during its creation as well.
+
+  Errors:
+
+  =====      =============================
+  EINTR      an unmasked signal is pending
+  =====      =============================
+
+KVM_PV_DISABLE
+
+  Deregister the VM from the Ultravisor and reclaim the memory that
+  had been donated to the Ultravisor, making it usable by the kernel
+  again.  All registered VCPUs are converted back to non-protected
+  ones.
+
+KVM_PV_VM_SET_SEC_PARMS
+  Pass the image header from VM memory to the Ultravisor in
+  preparation of image unpacking and verification.
+
+KVM_PV_VM_UNPACK
+  Unpack (protect and decrypt) a page of the encrypted boot image.
+
+KVM_PV_VM_VERIFY
+  Verify the integrity of the unpacked image. Only if this succeeds,
+  KVM is allowed to start protected VCPUs.
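
The command descriptions above suggest a boot sequence of enable, set security parameters, unpack page by page, then verify. The sketch below models that ordering in userspace under several assumptions: `struct pv_cmd` is a local mirror of the documented `struct kvm_pv_cmd`, the `PV_*` command numbers are placeholders and not the uapi values, and the strict ordering enforced by `pv_boot_advance()` is inferred from the prose rather than stated by it. No ioctl is issued here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the documented struct kvm_pv_cmd. */
struct pv_cmd {
	uint32_t cmd;		/* command to be executed */
	uint16_t rc;		/* Ultravisor return code */
	uint16_t rrc;		/* Ultravisor return reason code */
	uint64_t data;		/* data or address */
	uint32_t flags;		/* must be 0 for now */
	uint32_t reserved[3];
};

/* Placeholder command numbers for illustration; NOT the uapi values. */
enum pv_cmd_id { PV_ENABLE, PV_DISABLE, PV_SET_SEC_PARMS, PV_UNPACK, PV_VERIFY };

/* Hypothetical userspace view of how far the protected boot has progressed. */
enum pv_boot_state { PV_OFF, PV_ENABLED, PV_HEADER_SET, PV_UNPACKING, PV_VERIFIED };

/* Zero everything (flags, reserved, rc, rrc), then fill in the request. */
static void pv_cmd_init(struct pv_cmd *c, uint32_t cmd, uint64_t data)
{
	assert(c != NULL);
	memset(c, 0, sizeof(*c));
	c->cmd = cmd;
	c->data = data;
}

/*
 * Record that 'cmd' succeeded. Returns false, leaving the state untouched,
 * if the command does not fit the assumed ordering.
 */
static bool pv_boot_advance(enum pv_boot_state *s, enum pv_cmd_id cmd)
{
	assert(s != NULL);

	switch (cmd) {
	case PV_ENABLE:
		if (*s != PV_OFF)
			return false;
		*s = PV_ENABLED;
		return true;
	case PV_SET_SEC_PARMS:
		if (*s != PV_ENABLED)
			return false;
		*s = PV_HEADER_SET;
		return true;
	case PV_UNPACK:		/* may repeat, one page per call */
		if (*s != PV_HEADER_SET && *s != PV_UNPACKING)
			return false;
		*s = PV_UNPACKING;
		return true;
	case PV_VERIFY:
		if (*s != PV_UNPACKING)
			return false;
		*s = PV_VERIFIED;
		return true;
	case PV_DISABLE:
		if (*s == PV_OFF)
			return false;
		*s = PV_OFF;
		return true;
	}
	return false;
}

/* Protected VCPUs may only be started once verification has succeeded. */
static bool pv_can_start_vcpus(enum pv_boot_state s)
{
	return s == PV_VERIFIED;
}
```

Zeroing the whole structure before each call keeps `flags` and `reserved` at 0, which the structure comment requires for now.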


5. The kvm_run structure
========================

@@ -5707,8 +5749,13 @@ and injected exceptions.
:Architectures: x86, arm, arm64, mips
:Parameters: args[0] whether feature should be enabled or not

-With this capability enabled, KVM_GET_DIRTY_LOG will not automatically
-clear and write-protect all pages that are returned as dirty.
+Valid flags are::
+
+  #define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE   (1 << 0)
+  #define KVM_DIRTY_LOG_INITIALLY_SET           (1 << 1)
+
+With KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE set, KVM_GET_DIRTY_LOG will not
+automatically clear and write-protect all pages that are returned as dirty.
Rather, userspace will have to do this operation separately using
KVM_CLEAR_DIRTY_LOG.

@@ -5719,18 +5766,42 @@ than requiring to sync a full memslot; this ensures that KVM does not
take spinlocks for an extended period of time.  Second, in some cases a
large amount of time can pass between a call to KVM_GET_DIRTY_LOG and
userspace actually using the data in the page.  Pages can be modified
-during this time, which is inefficint for both the guest and userspace:
+during this time, which is inefficient for both the guest and userspace:
the guest will incur a higher penalty due to write protection faults,
while userspace can see false reports of dirty pages.  Manual reprotection
helps reducing this time, improving guest performance and reducing the
number of dirty log false positives.
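
Clearing in ranges rather than per memslot can be sketched as a simple range iterator. The struct and function names below are hypothetical. The 64-page alignment of each range, with an exception for a final range that ends exactly at the end of the slot, is an assumption based on the rules documented for KVM_CLEAR_DIRTY_LOG elsewhere in the API document and does not appear in this hunk. The code only computes ranges and issues no ioctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CLEAR_ALIGN 64u	/* assumed alignment for first_page and num_pages */

/* One range to hand to a clear operation (hypothetical type). */
struct clear_range {
	uint64_t first_page;
	uint32_t num_pages;
};

/*
 * Produce the next range of at most 'chunk_pages' pages within a slot of
 * 'slot_pages' pages. '*cursor' starts at 0 and is advanced on each call.
 * Returns false once the whole slot has been covered.
 */
static bool next_clear_range(uint64_t slot_pages, uint32_t chunk_pages,
			     uint64_t *cursor, struct clear_range *out)
{
	uint64_t left;

	assert(cursor != NULL && out != NULL);
	assert(chunk_pages != 0 && chunk_pages % CLEAR_ALIGN == 0);

	if (*cursor >= slot_pages)
		return false;

	left = slot_pages - *cursor;
	out->first_page = *cursor;
	/* The last range may be shorter; it then ends at the slot boundary. */
	out->num_pages = left < chunk_pages ? (uint32_t)left : chunk_pages;
	*cursor += out->num_pages;
	return true;
}
```

Smaller chunks shorten the time any single clear operation runs, at the cost of more calls; the right chunk size is workload dependent.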

+With KVM_DIRTY_LOG_INITIALLY_SET set, all the bits of the dirty bitmap
+will be initialized to 1 when created.  This also improves performance because
+dirty logging can be enabled gradually in small chunks on the first call
+to KVM_CLEAR_DIRTY_LOG.  KVM_DIRTY_LOG_INITIALLY_SET depends on
+KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (it is also only available on
+x86 for now).
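
The dependency between the two flags can be checked before the capability is enabled. The following is a minimal sketch, assuming locally defined copies of the two flag values shown earlier and a caller-supplied `is_x86` hint; the function names are hypothetical and the kernel performs its own validation regardless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the documented flag values (names shortened). */
#define DIRTY_LOG_MANUAL_PROTECT_ENABLE	(1u << 0)
#define DIRTY_LOG_INITIALLY_SET		(1u << 1)

/* Check a flag combination against the rules stated in the text. */
static bool dirty_log_flags_valid(uint64_t flags, bool is_x86)
{
	const uint64_t known = DIRTY_LOG_MANUAL_PROTECT_ENABLE |
			       DIRTY_LOG_INITIALLY_SET;

	if (flags & ~known)
		return false;	/* unknown bits */
	if (flags & DIRTY_LOG_INITIALLY_SET) {
		/* INITIALLY_SET depends on MANUAL_PROTECT_ENABLE ... */
		if (!(flags & DIRTY_LOG_MANUAL_PROTECT_ENABLE))
			return false;
		/* ... and is only available on x86 for now. */
		if (!is_x86)
			return false;
	}
	return true;
}

/* Value to pass as args[0] when enabling the capability. */
static uint64_t dirty_log_cap_arg(uint64_t flags, bool is_x86)
{
	assert(dirty_log_flags_valid(flags, is_x86));
	return flags;
}
```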

KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make
it hard or impossible to use it correctly.  The availability of
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 signals that those bugs are fixed.
Userspace should not try to use KVM_CAP_MANUAL_DIRTY_LOG_PROTECT.

+7.19 KVM_CAP_PPC_SECURE_GUEST
+------------------------------
+
+:Architectures: ppc
+
+This capability indicates that KVM is running on a host that has
+ultravisor firmware and thus can support a secure guest.  On such a
+system, a guest can ask the ultravisor to make it a secure guest,
+one whose memory is inaccessible to the host except for pages which
+are explicitly requested to be shared with the host.  The ultravisor
+notifies KVM when a guest requests to become a secure guest, and KVM
+has the opportunity to veto the transition.
+
+If present, this capability can be enabled for a VM, meaning that KVM
+will allow the transition to secure guest mode.  Otherwise KVM will
+veto the transition.
+
8. Other capabilities.
======================

@@ -6027,3 +6098,14 @@ Architectures: s390

This capability indicates that the KVM_S390_NORMAL_RESET and
KVM_S390_CLEAR_RESET ioctls are available.

+8.23 KVM_CAP_S390_PROTECTED
+
+Architecture: s390
+
+
+This capability indicates that the Ultravisor has been initialized and
+KVM can therefore start protected VMs.
+This capability governs the KVM_S390_PV_COMMAND ioctl and the
+KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected
+guests when the state change is invalid.
+5 −0
@@ -11,6 +11,11 @@ hypervisor when running as a guest (under Xen, KVM or any other
hypervisor), or any hypervisor-specific interaction when the kernel is
used as a host.

+Note: KVM/arm has been removed from the kernel. The API described
+here is still valid though, as it allows the kernel to kexec when
+booted at HYP. It can also be used by a hypervisor other than KVM
+if necessary.
+
On arm and arm64 (without VHE), the kernel doesn't run in hypervisor
mode, but still needs to interact with it, allowing a built-in
hypervisor to be either installed or torn down.
+2 −9
@@ -108,16 +108,9 @@ Groups:
      mask or unmask the adapter, as specified in mask

    KVM_S390_IO_ADAPTER_MAP
-      perform a gmap translation for the guest address provided in addr,
-      pin a userspace page for the translated address and add it to the
-      list of mappings
-
-      .. note:: A new mapping will be created unconditionally; therefore,
-	        the calling code should avoid making duplicate mappings.
-
+      This is now a no-op. The mapping is purely done by the irq route.
    KVM_S390_IO_ADAPTER_UNMAP
-      release a userspace page for the translated address specified in addr
-      from the list of mappings
+      This is now a no-op. The mapping is purely done by the irq route.

  KVM_DEV_FLIC_AISM
    modify the adapter-interruption-suppression mode for a given isc if the
+2 −0
@@ -18,6 +18,8 @@ KVM
   nested-vmx
   ppc-pv
   s390-diag
+   s390-pv
+   s390-pv-boot
   timekeeping
   vcpu-requests
