A page state change is typically followed by an access of the page(s) and
results in another VMEXIT in order to map the page into the nested page
table. Depending on the size of page state change request, this can
generate a number of additional VMEXITs. For example, under SNP, when
Linux is utilizing lazy memory acceptance, memory is typically accepted in
4M chunks. A page state change request is submitted to mark the pages as
private, followed by validation of the memory. Since the guest_memfd
currently only supports 4K pages, each page validation will result in
VMEXIT to map the page, resulting in 1024 additional exits.
When performing a page state change, invoke KVM_PRE_FAULT_MEMORY for the
size of the page state change in order to pre-map the pages and avoid the
additional VMEXITs. This helps speed up boot times.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/f5411c42340bd2f5c14972551edb4e959995e42b.1743193824.git.thomas.lendacky@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The immediate form of MSR access instructions are primarily motivated by
performance, not code size: by having the MSR number in an immediate, it
is available *much* earlier in the pipeline, which allows the hardware
much more leeway about how a particular MSR is handled.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Link: https://lore.kernel.org/r/20250103084827.1820007-4-xin@zytor.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386/kvm: Intel TDX support
* target/i386/emulate: more lflags cleanups
* meson: remove need for explicit listing of dependencies in hw_common_arch and
target_common_arch
* rust: small fixes
* hpet: Reorganize register decoding to be more similar to Rust code
* target/i386: fixes for AMD models
* target/i386: new EPYC-Turin CPU model
# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmg4BxwUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroP67gf+PEP4EDQP0AJUfxXYVsczGf5snGjz
# ro8jYmKG+huBZcrS6uPK5zHYxtOI9bHr4ipTHJyHd61lyzN6Ys9amPbs/CRE2Q4x
# Ky4AojPhCuaL2wHcYNcu41L+hweVQ3myj97vP3hWvkatulXYeMqW3/4JZgr4WZ69
# A9LGLtLabobTz5yLc8x6oHLn/BZ2y7gjd2LzTz8bqxx7C/kamjoDrF2ZHbX9DLQW
# BKWQ3edSO6rorSNHWGZsy9BE20AEkW2LgJdlV9eXglFEuEs6cdPKwGEZepade4bQ
# Rdt2gHTlQdUDTFmAbz8pttPxFGMC9Zpmb3nnicKJpKQAmkT/x4k9ncjyAQ==
# =XmkU
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 29 May 2025 03:05:00 EDT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (77 commits)
target/i386/tcg/helper-tcg: fix file references in comments
target/i386: Add support for EPYC-Turin model
target/i386: Update EPYC-Genoa for Cache property, perfmon-v2, RAS and SVM feature bits
target/i386: Add couple of feature bits in CPUID_Fn80000021_EAX
target/i386: Update EPYC-Milan CPU model for Cache property, RAS, SVM feature bits
target/i386: Update EPYC-Rome CPU model for Cache property, RAS, SVM feature bits
target/i386: Update EPYC CPU model for Cache property, RAS, SVM feature bits
rust: make declaration of dependent crates more consistent
docs: Add TDX documentation
i386/tdx: Validate phys_bits against host value
i386/tdx: Make invtsc default on
i386/tdx: Don't treat SYSCALL as unavailable
i386/tdx: Fetch and validate CPUID of TD guest
target/i386: Print CPUID subleaf info for unsupported feature
i386: Remove unused parameter "uint32_t bit" in feature_word_description()
i386/cgs: Introduce x86_confidential_guest_check_features()
i386/tdx: Define supported KVM features for TDX
i386/tdx: Add XFD to supported bit of TDX
i386/tdx: Add supported CPUID bits relates to XFAM
i386/tdx: Add supported CPUID bits related to TD Attributes
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
ARMCPU typedef is declared in "cpu-qom.h". Include it in
order to avoid when refactoring unrelated headers:
target/arm/hvf_arm.h:23:41: error: unknown type name 'ARMCPU'
23 | void hvf_arm_set_cpu_features_from_host(ARMCPU *cpu);
| ^
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-id: 20250513173928.77376-10-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
ARMCPU typedef is declared in "cpu-qom.h". Include it in
order to avoid when refactoring unrelated headers:
target/arm/kvm_arm.h:54:29: error: unknown type name 'ARMCPU'
54 | bool write_list_to_kvmstate(ARMCPU *cpu, int level);
| ^
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-id: 20250513173928.77376-9-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
arm-qmp-cmds.c uses ARM_MAX_VQ, which is defined in "cpu.h".
Include the latter to avoid when refactoring unrelated headers:
target/arm/arm-qmp-cmds.c:83:19: error: use of undeclared identifier 'ARM_MAX_VQ'
83 | QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
| ^
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-id: 20250513173928.77376-8-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
"target/arm/cpu-features.h" dereferences the ARMISARegisters
structure, which is defined in "cpu.h". Include the latter to
avoid when refactoring unrelated headers:
In file included from target/arm/internals.h:33:
target/arm/cpu-features.h:45:54: error: unknown type name 'ARMISARegisters'
45 | static inline bool isar_feature_aa32_thumb_div(const ARMISARegisters *id)
| ^
target/arm/cpu-features.h:47:12: error: use of undeclared identifier 'R_ID_ISAR0_DIVIDE_SHIFT'
47 | return FIELD_EX32(id->id_isar0, ID_ISAR0, DIVIDE) != 0;
| ^
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-id: 20250513173928.77376-7-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
CPReadFn type definitions use the CPUARMState type, itself
declared in "cpu.h". Include this file in order to avoid when
refactoring headers:
../target/arm/cpregs.h:241:27: error: unknown type name 'CPUARMState'
typedef uint64_t CPReadFn(CPUARMState *env, const ARMCPRegInfo *opaque);
^
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250513173928.77376-5-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
accel/tcg: Fix atomic_mmu_lookup vs TLB_FORCE_SLOW
linux-user: implement pgid field of /proc/self/stat
target/sh4: Use MO_ALIGN for system UNALIGN()
target/microblaze: Use TARGET_LONG_BITS == 32 for system mode
accel/tcg: Add TCGCPUOps.pointer_wrap
target/*: Populate TCGCPUOps.pointer_wrap
# -----BEGIN PGP SIGNATURE-----
#
# iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmg2xZAdHHJpY2hhcmQu
# aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV/VmAgAu5PHIARUuNqneUPQ
# 2JxqpZHGVbaXE0ACi9cslpfThFM/I4OXmK21ZWb1dHB3qasNiKU8cdImXSUVH3dj
# DLsr/tliReerZGUoHEtFsYd+VOtqb3wcrvXxnzG/xB761uZjFCnqwy4MrXMfSXVh
# 6w+eysWOblYHQb9rAZho4nyw6BgjYAX2vfMFxLJBcDP/fjILFB7xoXHEyqKWMmE1
# 0enA0KUotyLOCRXVEXSsfPDYD8szXfMkII3YcGnscthm5j58oc3skVdKFGVjNkNb
# /aFpyvoU7Vp3JpxkYEIWLQrRM75VSb1KzJwMipHgYy3GoV++BrY10T0jyEPrx0iq
# RFzK4A==
# =XQzq
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 28 May 2025 04:13:04 EDT
# gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg: issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F
* tag 'pull-tcg-20250528' of https://gitlab.com/rth7680/qemu: (28 commits)
accel/tcg: Assert TCGCPUOps.pointer_wrap is set
target/sparc: Fill in TCGCPUOps.pointer_wrap
target/s390x: Fill in TCGCPUOps.pointer_wrap
target/riscv: Fill in TCGCPUOps.pointer_wrap
target/ppc: Fill in TCGCPUOps.pointer_wrap
target/mips: Fill in TCGCPUOps.pointer_wrap
target/loongarch: Fill in TCGCPUOps.pointer_wrap
target/i386: Fill in TCGCPUOps.pointer_wrap
target/arm: Fill in TCGCPUOps.pointer_wrap
target: Use cpu_pointer_wrap_uint32 for 32-bit targets
target: Use cpu_pointer_wrap_notreached for strict align targets
accel/tcg: Add TCGCPUOps.pointer_wrap
target/sh4: Use MO_ALIGN for system UNALIGN()
tcg: Drop TCGContext.page_{mask,bits}
tcg: Drop TCGContext.tlb_dyn_max_bits
target/microblaze: Simplify compute_ldst_addr_type{a,b}
target/microblaze: Drop DisasContext.r0
target/microblaze: Use TARGET_LONG_BITS == 32 for system mode
target/microblaze: Fix printf format in mmu_translate
target/microblaze: Use TCGv_i64 for compute_ldst_addr_ea
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Found that some of the cache properties are not set correctly for EPYC models.
l1d_cache.no_invd_sharing should not be true.
l1i_cache.no_invd_sharing should not be true.
L2.self_init should be true.
L2.inclusive should be true.
L3.inclusive should not be true.
L3.no_invd_sharing should be true.
Fix these cache properties.
Also add the missing RAS and SVM features bits on AMD EPYC-Genoa model.
The SVM feature bits are used in nested guests.
perfmon-v2 : Allow guests to make use of the PerfMonV2 features.
succor : Software uncorrectable error containment and recovery capability.
overflow-recov : MCA overflow recovery support.
lbrv : LBR virtualization
tsc-scale : MSR based TSC rate control
vmcb-clean : VMCB clean bits
flushbyasid : Flush by ASID
pause-filter : Pause intercept filter
pfthreshold : PAUSE filter threshold
v-vmsave-vmload: Virtualized VMLOAD and VMSAVE
vgif : Virtualized GIF
fs-gs-base-ns : WRMSR to {FS,GS,KERNEL_GS}_BASE is non-serializing
The feature details are available in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/afe3f05d4116124fd5795f28fc23d7b396140313.1746734284.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Found that some of the cache properties are not set correctly for EPYC models.
l1d_cache.no_invd_sharing should not be true.
l1i_cache.no_invd_sharing should not be true.
L2.self_init should be true.
L2.inclusive should be true.
L3.inclusive should not be true.
L3.no_invd_sharing should be true.
Fix these cache properties.
Also add the missing RAS and SVM features bits on AMD EPYC-Milan model.
The SVM feature bits are used in nested guests.
succor : Software uncorrectable error containment and recovery capability.
overflow-recov : MCA overflow recovery support.
lbrv : LBR virtualization
tsc-scale : MSR based TSC rate control
vmcb-clean : VMCB clean bits
flushbyasid : Flush by ASID
pause-filter : Pause intercept filter
pfthreshold : PAUSE filter threshold
v-vmsave-vmload : Virtualized VMLOAD and VMSAVE
vgif : Virtualized GIF
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/c619c0e09a9d5d496819ed48d69181d65f416891.1746734284.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Found that some of the cache properties are not set correctly for EPYC models.
l1d_cache.no_invd_sharing should not be true.
l1i_cache.no_invd_sharing should not be true.
L2.self_init should be true.
L2.inclusive should be true.
L3.inclusive should not be true.
L3.no_invd_sharing should be true.
Fix these cache properties.
Also add the missing RAS and SVM features bits on AMD EPYC-Rome. The SVM
feature bits are used in nested guests.
succor : Software uncorrectable error containment and recovery capability.
overflow-recov : MCA overflow recovery support.
lbrv : LBR virtualization
tsc-scale : MSR based TSC rate control
vmcb-clean : VMCB clean bits
flushbyasid : Flush by ASID
pause-filter : Pause intercept filter
pfthreshold : PAUSE filter threshold
v-vmsave-vmload : Virtualized VMLOAD and VMSAVE
vgif : Virtualized GIF
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/8265af72057b84c99ac3a02a5487e32759cc69b1.1746734284.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Found that some of the cache properties are not set correctly for EPYC models.
l1d_cache.no_invd_sharing should not be true.
l1i_cache.no_invd_sharing should not be true.
L2.self_init should be true.
L2.inclusive should be true.
L3.inclusive should not be true.
L3.no_invd_sharing should be true.
Fix the cache properties.
Also add the missing RAS and SVM features bits on AMD
EPYC CPU models. The SVM feature bits are used in nested guests.
succor : Software uncorrectable error containment and recovery capability.
overflow-recov : MCA overflow recovery support.
lbrv : LBR virtualization
tsc-scale : MSR based TSC rate control
vmcb-clean : VMCB clean bits
flushbyasid : Flush by ASID
pause-filter : Pause intercept filter
pfthreshold : PAUSE filter threshold
v-vmsave-vmload : Virtualized VMLOAD and VMSAVE
vgif : Virtualized GIF
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/515941861700d7066186c9600bc5d96a1741ef0c.1746734284.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On Intel CPU, the value of CPUID_EXT2_SYSCALL depends on the mode of
the vcpu. It's 0 outside 64-bit mode and 1 in 64-bit mode.
The initial state of TDX vcpu is 32-bit protected mode. At the time of
calling KVM_TDX_GET_CPUID, vcpu hasn't started running so the value read
is 0.
In reality, 64-bit mode should always be supported. So mark
CPUID_EXT2_SYSCALL always supported to avoid false warning.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-53-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use KVM_TDX_GET_CPUID to get the CPUIDs that are managed and enfored
by TDX module for TD guest. Check QEMU's configuration against the
fetched data.
Print wanring message when 1. a feature is not supported but requested
by QEMU or 2. QEMU doesn't want to expose a feature while it is enforced
enabled.
- If cpu->enforced_cpuid is not set, prints the warning message of both
1) and 2) and tweak QEMU's configuration.
- If cpu->enforced_cpuid is set, quit if any case of 1) or 2).
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-52-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some CPUID bits are controlled by XFAM. They are not covered by
tdx_caps.cpuid (which only contians the directly configurable bits), but
they are actually supported when the related XFAM bit is supported.
Add these XFAM controlled bits to TDX supported CPUID bits based on the
supported_xfam.
Besides, incorporate the supported_xfam into the supported CPUID leaf of
0xD.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-48-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For TDX, some CPUID feature bit is configured via TD attributes. They
are not covered by tdx_caps.cpuid (which only contians the directly
configurable CPUID bits), but they are actually supported when the
related attributre bit is supported.
Note, LASS and KeyLocker are not supported by KVM for TDX, nor does
QEMU support it (see TDX_SUPPORTED_TD_ATTRS). They are defined in
tdx_attrs_maps[] for the completeness of the existing TD Attribute
bits that are related with CPUID features.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-47-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
TDX architecture forcibly sets some CPUID bits for TD guest that VMM
cannot disable it. They are fixed1 bits.
Fixed1 bits are not covered by tdx_caps.cpuid (which only contains the
directly configurable bits), while fixed1 bits are supported for TD guest
obviously.
Add fixed1 bits to tdx_supported_cpuid. Besides, set all the fixed1
bits to the initial set of KVM's support since KVM might not report them
as supported.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-46-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Maintain a TDX specific supported CPUID set, and use it to mask the
common supported CPUID value of KVM. It can avoid newly added supported
features (reported via KVM_GET_SUPPORTED_CPUID) for common VMs being
falsely reported as supported for TDX.
As the first step, initialize the TDX supported CPUID set with all the
configurable CPUID bits. It's not complete because there are other CPUID
bits are supported for TDX but not reported as directly configurable.
E.g. the XFAM related bits, attribute related bits and fixed-1 bits.
They will be handled in the future.
Also, what matters are the CPUID bits related to QEMU's feature word.
Only mask the CPUID leafs which are feature word leaf.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-45-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, QEMU exposes CPUID 0x1f to guest only when necessary, i.e.,
when topology level that cannot be enumerated by leaf 0xB, e.g., die or
module level, are configured for the guest, e.g., -smp xx,dies=2.
However, TDX architecture forces to require CPUID 0x1f to configure CPU
topology.
Introduce a bool flag, enable_cpuid_0x1f, in CPU for the case that
requires CPUID leaf 0x1f to be exposed to guest.
Introduce a new function x86_has_cpuid_0x1f(), which is the wrapper of
cpu->enable_cpuid_0x1f and x86_has_extended_topo() to check if it needs
to enable cpuid leaf 0x1f for the guest.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-34-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>