RISC-V Patches for the 4.2 Soft Freeze, Part 1, v3
This contains quite a few patches that I'd like to target for 4.2.
They're mostly emulation fixes for the sifive_u board, which now much
more closely matches the hardware and can therefor run the same fireware
as what gets loaded onto the board. Additional user-visible
improvements include:
* support for loading initrd files from the command line into Linux, via
/chosen/linux,initrd-{start,end} device tree nodes.
* The conversion of LOG_TRACE to trace events.
* The addition of clock DT nodes for our uart and ethernet.
This also includes some preliminary work for the H extension patches,
but does not include the H extension patches as I haven't had time to
review them yet.
This passes my OE boot test on 32-bit and 64-bit virt machines, as well
as a 64-bit upstream Linux boot on the sifive_u machine. It has been
fixed to actually pass "make check" this time.
Changes since v2 (never made it to the list):
* Sets the sifive_u machine default core count to 2 instead of 5.
Changes since v1 <20190910190513.21160-1-palmer@sifive.com>:
* Sets the sifive_u machine default core count to 5 instead of 1, as
it's impossible to have a single core sifive_u machine.
# gpg: Signature made Tue 17 Sep 2019 16:43:30 BST
# gpg: using RSA key 00CE76D1834960DFCE886DF8EF4CA1502CCBAB41
# gpg: issuer "palmer@dabbelt.com"
# gpg: Good signature from "Palmer Dabbelt <palmer@dabbelt.com>" [unknown]
# gpg: aka "Palmer Dabbelt <palmer@sifive.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 00CE 76D1 8349 60DF CE88 6DF8 EF4C A150 2CCB AB41
* remotes/palmer/tags/riscv-for-master-4.2-sf1-v3: (48 commits)
gdbstub: riscv: fix the fflags registers
target/riscv: Use TB_FLAGS_MSTATUS_FS for floating point
target/riscv: Fix mstatus dirty mask
target/riscv: Use both register name and ABI name
riscv: sifive_u: Update model and compatible strings in device tree
riscv: sifive_u: Remove handcrafted clock nodes for UART and ethernet
riscv: sifive_u: Fix broken GEM support
riscv: sifive_u: Instantiate OTP memory with a serial number
riscv: sifive: Implement a model for SiFive FU540 OTP
riscv: roms: Update default bios for sifive_u machine
riscv: sifive_u: Change UART node name in device tree
riscv: sifive_u: Update UART base addresses and IRQs
riscv: sifive_u: Reference PRCI clocks in UART and ethernet nodes
riscv: sifive_u: Add PRCI block to the SoC
riscv: sifive_u: Generate hfclk and rtcclk nodes
riscv: sifive: Implement PRCI model for FU540
riscv: sifive_u: Update PLIC hart topology configuration string
riscv: sifive_u: Update hart configuration to reflect the real FU540 SoC
riscv: sifive_u: Set the minimum number of cpus to 2
riscv: hart: Add a "hartid-base" property to RISC-V hart array
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In the past we did not have a model for PRCI, hence two handcrafted
clock nodes ("/soc/ethclk" and "/soc/uartclk") were created for the
purpose of supplying hard-coded clock frequencies. But now since we
have added the PRCI support in QEMU, we don't need them any more.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
At present the GEM support in sifive_u machine is seriously broken.
The GEM block register base was set to a weird number (0x100900FC),
which for no way could work with the cadence_gem model in QEMU.
Not like other GEM variants, the FU540-specific GEM has a management
block to control 10/100/1000Mbps link speed changes, that is mapped
to 0x100a0000. We can simply map it into MMIO space without special
handling using create_unimplemented_device().
Update the GEM node compatible string to use the official name used
by the upstream Linux kernel, and add the management block reg base
& size to the <reg> property encoding.
Tested with upstream U-Boot and Linux kernel MACB drivers.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
This adds an OTP memory with a given serial number to the sifive_u
machine. With such support, the upstream U-Boot for sifive_fu540
boots out of the box on the sifive_u machine.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
This implements a simple model for SiFive FU540 OTP (One-Time
Programmable) Memory interface, primarily for reading out the
stored serial number from the first 1 KiB of the 16 KiB OTP
memory reserved by SiFive for internal use.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Now that we have added a PRCI node, update existing UART and ethernet
nodes to reference PRCI as their clock sources, to keep in sync with
the Linux kernel device tree.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Add PRCI mmio base address and size mappings to sifive_u machine,
and generate the corresponding device tree node.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
To keep in sync with Linux kernel device tree, generate hfclk and
rtcclk nodes in the device tree, to be referenced by PRCI node.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
This adds a simple PRCI model for FU540 (sifive_u). It has different
register layout from the existing PRCI model for FE310 (sifive_e).
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
The FU540-C000 includes a 64-bit E51 RISC-V core and four 64-bit U54
RISC-V cores. Currently the sifive_u machine only populates 4 U54
cores. Update the max cpu number to 5 to reflect the real hardware,
by creating 2 CPU clusters as containers for RISC-V hart arrays to
populate heterogeneous harts.
The cpu nodes in the generated DTS have been updated as well.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
At present each hart's hartid in a RISC-V hart array is assigned
the same value of its index in the hart array. But for a system
that has multiple hart arrays, this is not the case any more.
Add a new "hartid-base" property so that hartid number can be
assigned based on the property value.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Current SiFive PRCI model only works with sifive_e machine, as it
only emulates registers or PRCI block in the FE310 SoC.
Rename the file name to make it clear that it is for sifive_e.
This also prefix "sifive_e"/"SIFIVE_E" for all macros, variables
and functions.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Chih-Min Chao <chihmin.chao@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
This adds a reset opcode for sifive_test device to trigger a system
reset for testing purpose.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
This adds a helper routine for finding firmware. It is currently
used only for "-bios default" case.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
* Fix Patchew CI failures (myself)
* i386 fw_cfg refactoring (Philippe)
* pmem bugfix (Stefan)
* Support for accessing cstate MSRs (Wanpeng)
* exec.c cleanups (Wei Yang)
* Improved throttling (Yury)
* elf-ops.h coverity fix (Stefano)
# gpg: Signature made Mon 16 Sep 2019 16:13:12 BST
# gpg: using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream: (29 commits)
hw/i386/pc: Extract the x86 generic fw_cfg code
hw/i386/pc: Rename pc_build_feature_control() as generic fw_cfg_build_*
hw/i386/pc: Let pc_build_feature_control() take a MachineState argument
hw/i386/pc: Let pc_build_feature_control() take a FWCfgState argument
hw/i386/pc: Rename pc_build_smbios() as generic fw_cfg_build_smbios()
hw/i386/pc: Let pc_build_smbios() take a generic MachineState argument
hw/i386/pc: Let pc_build_smbios() take a FWCfgState argument
hw/i386/pc: Replace PCMachineState argument with MachineState in fw_cfg_arch_create
hw/i386/pc: Pass the CPUArchIdList array by argument
hw/i386/pc: Pass the apic_id_limit value by argument
hw/i386/pc: Pass the boot_cpus value by argument
hw/i386/pc: Rename bochs_bios_init as more generic fw_cfg_arch_create
hw/i386/pc: Use address_space_memory in place
hw/i386/pc: Extract e820 memory layout code
hw/i386/pc: Use e820_get_num_entries() to access e820_entries
cpus: Fix throttling during vm_stop
qemu-thread: Add qemu_cond_timedwait
memory: inline and optimize devend_memop
memory: fetch pmem size in get_file_size()
elf-ops.h: fix int overflow in load_elf()
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The new function is needed to implement conditional sleep for CPU
throttling. It's possible to reuse qemu_sem_timedwait, but it's more
difficult than just add qemu_cond_timedwait.
Also moved compute_abs_deadline function up the code to reuse it in
qemu_cond_timedwait_impl win32.
Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190909131335.16848-2-yury-kotov@yandex-team.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
devend_memop can rely on the fact that the result is always either
0 or MO_BSWAP, corresponding respectively to host endianness and
the opposite. Native (target) endianness in turn can be either
the host endianness, in which case MO_BSWAP is only returned for
host-opposite endianness, or the opposite, in which case 0 is only
returned for host endianness.
With this in mind, devend_memop can be compiled as a setcond+shift
for every target. Do this and, while at it, move it to
include/exec/memory.h since !NEED_CPU_H files do not (and should not)
need it.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce this new per-machine hook to give any machine class a chance
to do a sanity check on the to-be-hotplugged device as a sanity test.
This will be used for x86 to try to detect some illegal configuration
of devices, e.g., possible conflictions between vfio-pci and x86
vIOMMU.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190916080718.3299-3-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Neither stat(2) nor lseek(2) report the size of Linux devdax pmem
character device nodes. Commit 314aec4a6e
("hostmem-file: reject invalid pmem file sizes") added code to
hostmem-file.c to fetch the size from sysfs and compare against the
user-provided size=NUM parameter:
if (backend->size > size) {
error_setg(errp, "size property %" PRIu64 " is larger than "
"pmem file \"%s\" size %" PRIu64, backend->size,
fb->mem_path, size);
return;
}
It turns out that exec.c:qemu_ram_alloc_from_fd() already has an
equivalent size check but it skips devdax pmem character devices because
lseek(2) returns 0:
if (file_size > 0 && file_size < size) {
error_setg(errp, "backing store %s size 0x%" PRIx64
" does not match 'size' option 0x" RAM_ADDR_FMT,
mem_path, file_size, size);
return NULL;
}
This patch moves the devdax pmem file size code into get_file_size() so
that we check the memory size in a single place:
qemu_ram_alloc_from_fd(). This simplifies the code and makes it more
general.
This also fixes the problem that hostmem-file only checks the devdax
pmem file size when the pmem=on parameter is given. An unchecked
size=NUM parameter can lead to SIGBUS in QEMU so we must always fetch
the file size for Linux devdax pmem character device nodes.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20190830093056.12572-1-stefanha@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Emulate read errors in the DMA Checksum Register for high frequencies
and optimistic settings of the Read Timing Compensation Register. This
will help in tuning the SPI timing calibration algorithm. Errors are
only injected when the property "inject_failure" is set to true as
suggested by Philippe.
The values below are those to expect from the first flash device of
the FMC controller of a palmetto-bmc machine.
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Message-id: 20190904070506.1052-8-clg@kaod.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The FMC controller on the Aspeed SoCs support DMA to access the flash
modules. It can operate in a normal mode, to copy to or from the flash
module mapping window, or in a checksum calculation mode, to evaluate
the best clock settings for reads.
The model introduces two custom address spaces for DMAs: one for the
AHB window of the FMC flash devices and one for the DRAM. The latter
is populated using a "dram" link set from the machine with the RAM
container region.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Joel Stanley <joel@jms.id.au>
Message-id: 20190904070506.1052-6-clg@kaod.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
GPIO pins are arranged in groups of 8 pins labeled A,B,..,Y,Z,AA,AB,AC.
(Note that the ast2400 controller only goes up to group AB).
A set has four groups (except set AC which only has one) and is
referred to by the groups it is composed of (eg ABCD,EFGH,...,YZAAAB).
Each set is accessed and controlled by a bank of 14 registers.
These registers operate on a per pin level where each bit in the register
corresponds to a pin, except for the command source registers. The command
source registers operate on a per group level where bits 24, 16, 8 and 0
correspond to each group in the set.
eg. registers for set ABCD:
|D7...D0|C7...C0|B7...B0|A7...A0| <- GPIOs
|31...24|23...16|15....8|7.....0| <- bit position
Note that there are a couple of groups that only have 4 pins.
There are two ways that this model deviates from the behaviour of the
actual controller:
(1) The only control source driving the GPIO pins in the model is the ARM
model (as there currently aren't models for the LPC or Coprocessor).
(2) None of the registers in the model are reset tolerant (needs
integration with the watchdog).
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Tested-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20190904070506.1052-2-clg@kaod.org
[clg: fixed missing header files
made use of HWADDR_PRIx to fix compilation on windows ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Migration pull 2019-09-12
New feature:
UUID validation check from Yury Kotov
plus a bunch of fixes.
# gpg: Signature made Thu 12 Sep 2019 14:48:28 BST
# gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20190912a:
migration: fix one typo in comment of function migration_total_bytes()
migration/qemu-file: fix potential buf waste for extra buf_index adjustment
migration/qemu-file: remove check on writev_buffer in qemu_put_compression_data
migration: Fix postcopy bw for recovery
tests/migration: Add a test for validate-uuid capability
tests/libqtest: Allow setting expected exit status
migration: Add validate-uuid capability
qemu-file: Rework old qemu_fflush comment
migration: register_savevm_live doesn't need dev
hw/net/vmxnet3: Fix leftover unregister_savevm
migration: cleanup check on ops in savevm.handlers iterations
migration: multifd_send_thread always post p->sem_sync when error happen
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Block layer patches:
- qcow2: Allow overwriting multiple compressed clusters at once for
better performance
- nfs: add support for nfs_umount
- file-posix: write_zeroes fixes
- qemu-io, blockdev-create, pr-manager: Fix crashes and memory leaks
- qcow2: Fix the calculation of the maximum L2 cache size
- vpc: Fix return code for vpc_co_create()
- blockjob: Code cleanup
- iotests improvements (e.g. for use with valgrind)
# gpg: Signature made Fri 13 Sep 2019 11:19:19 BST
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (23 commits)
qcow2: Stop overwriting compressed clusters one by one
block/create: Do not abort if a block driver is not available
qemu-io: Don't leak pattern file in error path
iotests: extend sleeping time under Valgrind
iotests: extended timeout under Valgrind
iotests: Valgrind fails with nonexistent directory
iotests: Add casenotrun report to bash tests
iotests: exclude killed processes from running under Valgrind
iotests: allow Valgrind checking all QEMU processes
block/nfs: add support for nfs_umount
block/nfs: tear down aio before nfs_close
iotests: skip 232 when run tests as root
iotests: Test blockdev-create for vpc
iotests: Restrict nbd Python tests to nbd
iotests: Restrict file Python tests to file
iotests: Add supported protocols to execute_test()
vpc: Return 0 from vpc_co_create() on success
file-posix: Fix has_write_zeroes after NO_FALLBACK
pr-manager: Fix invalid g_free() crash bug
iotests: Test reverse sub-cluster qcow2 writes
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In job_finish_sync job_enter should be enough for a job to make some
progress and draining is a wrong tool for it. So use job_enter directly
here and drop job_drain with all related staff not used more.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: John Snow <jsnow@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
On Sparc and PowerMac, the bit 0 of the address selects the register
type (control or data) and bit 1 selects the channel (B or A).
On m68k Macintosh and NeXTcube, the bit 0 selects the channel and
bit 1 the register type.
This patch introduces a new parameter (bit_swap) to the device interface
to indicate bits usage must be swapped between registers and channels.
For the moment all the machines use the bit 0, but this change will be
needed to emulate the Quadra 800 or NeXTcube machine.
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
[thh: added NeXTcube to the patch description]
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190831074519.32613-5-huth@tuxfamily.org>
Signed-off-by: Thomas Huth <huth@tuxfamily.org>
It is still quite incomplete (no SCSI, no floppy emulation, no network,
etc.), but the firmware already shows up the debug monitor prompt in the
framebuffer display, so at least the very basics are already working.
This code has been taken from Bryce Lanham's GSoC 2011 NeXT branch at
https://github.com/blanham/qemu-NeXT/blob/next-cube/hw/next-cube.c
and altered quite a bit to fit the latest interface and coding conventions
of the current QEMU.
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190831074519.32613-4-huth@tuxfamily.org>
Signed-off-by: Thomas Huth <huth@tuxfamily.org>
It is likely still quite incomplete (e.g. mouse and interrupts are not
implemented yet), but it is good enough for keyboard input at the firmware
monitor.
This code has been taken from Bryce Lanham's GSoC 2011 NeXT branch at
https://github.com/blanham/qemu-NeXT/blob/next-cube/hw/next-kbd.c
and altered to fit the latest interface of the current QEMU (e.g. to use
memory_region_init_io() instead of cpu_register_physical_memory()).
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190831074519.32613-3-huth@tuxfamily.org>
Signed-off-by: Thomas Huth <huth@tuxfamily.org>
Commit fe0480d6 and friends added BDRV_REQ_NO_FALLBACK as a way to
avoid wasting time on a preliminary write-zero request that will later
be rewritten by actual data, if it is known that the write-zero
request will use a slow fallback; but in doing so, could not optimize
for NBD. The NBD specification is now considering an extension that
will allow passing on those semantics; this patch updates the new
protocol bits and 'qemu-nbd --list' output to recognize the bit, as
well as the new errno value possible when using the new flag; while
upcoming patches will improve the client to use the feature when
present, and the server to advertise support for it.
The NBD spec recommends (but not requires) that ENOTSUP be avoided for
all but failures of a fast zero (the only time it is mandatory to
avoid an ENOTSUP failure is when fast zero is supported but not
requested during write zeroes; the questionable use is for ENOTSUP to
other actions like a normal write request). However, clients that get
an unexpected ENOTSUP will either already be treating it the same as
EINVAL, or may appreciate the extra bit of information. We were
equally loose for returning EOVERFLOW in more situations than
recommended by the spec, so if it turns out to be a problem in
practice, a later patch can tighten handling for both error codes.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: tweak commit message, also handle EOPNOTSUPP]
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be
advertised when the server promises cache consistency between
simultaneous clients (basically, rules that determine what FUA and
flush from one client are able to guarantee for reads from another
client). When we don't permit simultaneous clients (such as qemu-nbd
without -e), the bit makes no sense; and for writable images, we
probably have a lot more work before we can declare that actions from
one client are cache-consistent with actions from another. But for
read-only images, where flush isn't changing any data, we might as
well advertise multi-conn support. What's more, advertisement of the
bit makes it easier for clients to determine if 'qemu-nbd -e' was in
use, where a second connection will succeed rather than hang until the
first client goes away.
This patch affects qemu as server in advertising the bit. We may want
to consider patches to qemu as client to attempt parallel connections
for higher throughput by spreading the load over those connections
when a server advertises multi-conn, but for now sticking to one
connection per nbd:// BDS is okay.
See also: https://bugzilla.redhat.com/1708300
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190815185024.7010-1-eblake@redhat.com>
[eblake: tweak blockdev-nbd.c to not request shared when writable,
fix iotest 233]
Reviewed-by: John Snow <jsnow@redhat.com>
Allow page table bit to swap endianness.
Reorganize watchpoints out of i/o path.
Return host address from probe_write / probe_access.
# gpg: Signature made Tue 03 Sep 2019 16:47:50 BST
# gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg: issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F
* remotes/rth/tags/pull-tcg-20190903: (36 commits)
tcg: Factor out probe_write() logic into probe_access()
tcg: Make probe_write() return a pointer to the host page
s390x/tcg: Pass a size to probe_write() in do_csst()
hppa/tcg: Call probe_write() also for CONFIG_USER_ONLY
mips/tcg: Call probe_write() for CONFIG_USER_ONLY as well
tcg: Enforce single page access in probe_write()
tcg: Factor out CONFIG_USER_ONLY probe_write() from s390x code
s390x/tcg: Fix length calculation in probe_write_access()
s390x/tcg: Use guest_addr_valid() instead of h2g_valid() in probe_write_access()
tcg: Check for watchpoints in probe_write()
cputlb: Handle watchpoints via TLB_WATCHPOINT
cputlb: Remove double-alignment in store_helper
cputlb: Fix size operand for tlb_fill on unaligned store
exec: Factor out cpu_watchpoint_address_matches
cputlb: Fold TLB_RECHECK into TLB_INVALID_MASK
exec: Factor out core logic of check_watchpoint()
exec: Move user-only watchpoint stubs inline
target/sparc: sun4u Invert Endian TTE bit
target/sparc: Add TLB entry with attributes
cputlb: Byte swap memory transaction attribute
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>