KVM's userspace access interface to the GICD enable and active bits
is via set/clear register pairs which implement the hardware's "write
1s to the clear register to clear the 0 bits, and write 1s to the set
register to set the 1 bits" semantics. We didn't get this right,
because we were writing 0 to the clear register.
Writing 0 to GICD_IC{ENABLE,ACTIVE}R architecturally has no effect on
interrupt status (all writes are simply ignored by KVM) and doesn't
comply with the intention of "first write to the clear-reg to clear
all bits".
Write all 1's to actually clear the enable/active status.
This didn't have any adverse effects on migration because there
we start with a clean VM state; it would be guest-visible when
doing a system reset, but since Linux always cleans up the
register state of the GIC during bootup before it enables it
most users won't have run into a problem here.
Cc: qemu-stable@nongnu.org
Fixes: 367b9f527b ("hw/intc/arm_gicv3_kvm: Implement get/put functions")
Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
Message-id: 20250729161650.43758-3-zenghui.yu@linux.dev
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
As per the arm-vgic-v3 kernel doc [1]:
Accesses to GICD_ICPENDR register region and GICR_ICPENDR0 registers
have RAZ/WI semantics, meaning that reads always return 0 and writes
are always ignored.
The state behind these registers (both 0 and 1 bits) is written by
writing to the GICD_ISPENDR and GICR_ISPENDR0 registers, unlike
some of the other set/clear register pairs.
Remove the useless writes to ICPENDR registers in kvm_arm_gicv3_put().
[1] https://docs.kernel.org/virt/kvm/devices/arm-vgic-v3.html
Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
Message-id: 20250729161650.43758-2-zenghui.yu@linux.dev
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
QEMU start failed when smp cpu < smp maxcpus , because qemu send a NULL
cpu to KVM, this patch adds a check for kvm_ipi_access_regs() to fix it.
run with '-smp 1,maxcpus=4,sockets=4,cores=1,threads=1'
we got:
Unexpected error in kvm_device_access() at ../accel/kvm/kvm-all.c:3477:
qemu-system-loongarch64: KVM_SET_DEVICE_ATTR failed: Group 1073741825 attr 0x0000000000010000: Invalid argument
Signed-off-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Message-ID: <20250725081213.3867592-1-gaosong@loongson.cn>
In preparation to implement POOL context push, add support for POOL
NVP context save/restore.
The NVP p bit is defined in the spec as follows:
If TRUE, the CPPR of a Pool VP in the NVP is updated during store of
the context with the CPPR of the Hard context it was running under.
It's not clear whether non-pool VPs always or never get CPPR updated.
Before this patch, OS contexts always save CPPR, so we will assume that
is the behaviour.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-41-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The tctx "signaling" registers (PIPR, CPPR, NSR) raise an interrupt on
the target CPU thread. The POOL and PHYS rings both raise hypervisor
interrupts, so they both share one set of signaling registers in the
PHYS ring. The PHYS NSR register contains a field that indicates which
ring has presented the interrupt being signaled to the CPU.
This sharing results in all the "alt_regs" throughout the code. alt_regs
is not very descriptive, and worse is that the name is used for
conversions in both directions, i.e., to find the presenting ring from
the signaling ring, and the signaling ring from the presenting ring.
Instead of alt_regs, use the names sig_regs and sig_ring, and regs and
ring for the presenting ring being worked on. Add a helper function to
get the sign_regs, and add some asserts to ensure the POOL regs are
never used to signal interrupts.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-34-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
xive_tctx_pipr_present() as implemented with xive_tctx_pipr_update()
causes VP-directed (group==0) interrupt to be presented in PIPR and NSR
despite being a lower priority than the currently presented group
interrupt.
This must not happen. The IPB bit should record the low priority VP
interrupt, but PIPR and NSR must not present the lower priority
interrupt.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-32-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Booting AIX in a PowerVM partition requires the use of the "Acknowledge
O/S Interrupt to even O/S reporting line" special operation provided by
the IBM XIVE interrupt controller. This operation is invoked by writing
a byte (data is irrelevant) to offset 0xC10 of the Thread Interrupt
Management Area (TIMA). It can be used by software to notify the XIVE
logic that the interrupt was received.
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-26-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
When an XIVE context is pulled while it has an active, unacknowledged
group interrupt, XIVE will check to see if a context on another thread
can handle the interrupt and, if so, notify that context. If there
are no contexts that can handle the interrupt, then the interrupt is
added to a backlog and XIVE will attempt to escalate the interrupt,
if configured to do so, allowing the higher privileged handler to
activate a context that can handle the original interrupt.
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-23-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The group interrupt delivery flow selects the group backlog scan if
LSMFB < IPB, but that scan may find an interrupt with a priority >=
IPB. In that case, the VP-direct interrupt should be chosen. This
extends to selecting the lowest prio between POOL and PHYS rings.
Implement this just by re-starting the selection logic if the
backlog irq was not found or priority did not match LSMFB (LSMFB
is updated so next time around it would see the right value and
not loop infinitely).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-13-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
A problem was seen where uart interrupts would be lost resulting in the
console hanging. Traces showed that a lower priority interrupt was
preempting a higher priority interrupt, which would result in the higher
priority interrupt never being handled.
The new interrupt's priority was being compared against the CPPR
(Current Processor Priority Register) instead of the PIPR (Post
Interrupt Priority Register), as was required by the XIVE spec.
This allowed for a window between raising an interrupt and ACK'ing
the interrupt where a lower priority interrupt could slip in.
Fixes: 26c55b9941 ("ppc/xive2: Process group backlog when updating the CPPR")
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-10-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The current xive algorithm for finding a matching group vCPU
target always uses the first vCPU found. And, since it always
starts the search with thread 0 of a core, thread 0 is almost
always used to handle group interrupts. This can lead to additional
interrupt latency and poor performance for interrupt intensive
work loads.
Changing this to use a simple round-robin algorithm for deciding which
thread number to use when starting a search, which leads to a more
distributed use of threads for handling group interrupts.
[npiggin: Also round-robin among threads, not just cores]
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-9-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>