mirror/xemu - xemu - Gitea: Git with a cup of tea

mirror of https://github.com/mborgerson/xemu.git synced 2026-03-12 06:34:54 +00:00

Author	SHA1	Message	Date
Fiona Ebner	6b89e851fa	block: add bdrv_graph_wrlock_drained() convenience wrapper Many write-locked sections are also drained sections. A new bdrv_graph_wrunlock_drained() wrapper around bdrv_graph_wrunlock() is introduced, which will begin a drained section first. A global variable is used so bdrv_graph_wrunlock() knows if it also needs to end such a drained section. Both the aio_poll call in bdrv_graph_wrlock() and the aio_bh_poll() in bdrv_graph_wrunlock() can re-enter a write-locked section. While for the latter, ending the drain could be moved to before the call, the former requires that the variable is a counter and not just a boolean. Since the wrapper calls bdrv_drain_all_begin(), which must be called with the graph unlocked, mark the wrapper as GRAPH_UNLOCKED too. The switch to the new helpers was generated with the following commands and then manually checked: find . -name '.c' -exec sed -i -z 's/bdrv_drain_all_begin();\n\sbdrv_graph_wrlock();/bdrv_graph_wrlock_drained();/g' {} ';' find . -name '.c' -exec sed -i -z 's/bdrv_graph_wrunlock();\n\sbdrv_drain_all_end();/bdrv_graph_wrunlock();/g' {} ';' Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-25-f.ebner@proxmox.com> [kwolf: Removed redundant GRAPH_UNLOCKED] Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:40:58 +02:00
Fiona Ebner	d75f8ed1d7	block: move drain outside of quorum_del_child() The quorum_del_child() callback runs under the graph lock, so it is not allowed to drain. It is only called as the .bdrv_del_child() callback, which is only called in the bdrv_del_child() function, which also runs under the graph lock. The bdrv_del_child() function is called by qmp_x_blockdev_change(). A drained section was already introduced there by commit "block: move drain out of quorum_add_child()". This finally finishes moving out the drain to places that are not under the graph lock started in "block: move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK". Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-17-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-06-04 18:16:34 +02:00
Fiona Ebner	b13f546545	block: move drain outside of bdrv_root_unref_child() This is part of resolving the deadlock mentioned in commit "block: move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK". bdrv_root_unref_child() is called by: 1. blk_remove_bs(), where a drained section is introduced. 2. bdrv_unref_child(), which runs under the graph lock, so the drain will be moved further up to its callers. 3. block_job_remove_all_bdrv(), where a drained section is introduced. For all callers of bdrv_unref_child() and its generated bdrv_co_unref_child() coroutine variant, a drained section is introduced, they are not explicilty listed here. The caller quorum_del_child() holds the graph lock, so it is not actually allowed to drain. This will be addressed in the next commit. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-16-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-06-04 18:16:34 +02:00
Fiona Ebner	0414930d3a	block: move drain outside of quorum_add_child() This is part of resolving the deadlock mentioned in commit "block: move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK". The quorum_add_child() callback runs under the graph lock, so it is not allowed to drain. It is only called as the .bdrv_add_child() callback, which is only called in the bdrv_add_child() function, which also runs under the graph lock. The bdrv_add_child() function is called by qmp_x_blockdev_change(), where a drained section is introduced. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-15-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-06-04 18:16:34 +02:00
Fiona Ebner	77f3965ba7	block: move drain outside of bdrv_attach_child() This is part of resolving the deadlock mentioned in commit "block: move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK". The function bdrv_attach_child() runs under the graph lock, so it is not allowed to drain. It is called by: 1. replication_start() 2. quorum_add_child() 3. bdrv_open_child_common() 4. Throughout test-bdrv-graph-mod.c and test-bdrv-drain.c unit tests. In all callers, a drained section is introduced. The function quorum_add_child() runs under the graph lock, so it is not actually allowed to drain. This will be addressed by the following commit. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20250530151125.955508-14-f.ebner@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-06-04 18:16:34 +02:00
Eric Blake	c33159dec7	block: Expand block status mode from bool to flags This patch is purely mechanical, changing bool want_zero into an unsigned int for bitwise-or of flags. As of this patch, all implementations are unchanged (the old want_zero==true is now mode==BDRV_WANT_PRECISE which is a superset of BDRV_WANT_ZERO); but the callers in io.c that used to pass want_zero==false are now prepared for future driver changes that can now distinguish bewteen BDRV_WANT_ZERO vs. BDRV_WANT_ALLOCATED. The next patch will actually change the file-posix driver along those lines, now that we have more-specific hints. As for the background why this patch is useful: right now, the file-posix driver recognizes that if allocation is being queried, the entire image can be reported as allocated (there is no backing file to refer to) - but this throws away information on whether the entire image reads as zero (trivially true if lseek(SEEK_HOLE) at offset 0 returns -ENXIO, a bit more complicated to prove if the raw file was created with 'qemu-img create' since we intentionally allocate a small chunk of all-zero data to help with alignment probing). Later patches will add a generic algorithm for seeing if an entire file reads as zeroes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20250509204341.3553601-16-eblake@redhat.com>	2025-05-14 15:33:34 -05:00
Daniel P. Berrangé	407bc4bf90	qapi: Move include/qapi/qmp/ to include/qobject/ The general expectation is that header files should follow the same file/path naming scheme as the corresponding source file. There are various historical exceptions to this practice in QEMU, with one of the most notable being the include/qapi/qmp/ directory. Most of the headers there correspond to source files in qobject/. This patch corrects most of that inconsistency by creating include/qobject/ and moving the headers for qobject/ there. This also fixes MAINTAINERS for include/qapi/qmp/dispatch.h: scripts/get_maintainer.pl now reports "QAPI" instead of "No maintainers found". Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> #s390x Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20241118151235.2665921-2-armbru@redhat.com> [Rebased]	2025-02-10 15:33:16 +01:00
Markus Armbruster	ef834aa2b2	qapi/crypto: Rename QCryptoHashAlgorithm to *Algo, and drop prefix QAPI's 'prefix' feature can make the connection between enumeration type and its constants less than obvious. It's best used with restraint. QCryptoHashAlgorithm has a 'prefix' that overrides the generated enumeration constants' prefix to QCRYPTO_HASH_ALG. We could simply drop 'prefix', but then the prefix becomes QCRYPTO_HASH_ALGORITHM, which is rather long. We could additionally rename the type to QCryptoHashAlg, but I think the abbreviation "alg" is less than clear. Rename the type to QCryptoHashAlgo instead. The prefix becomes to QCRYPTO_HASH_ALGO. Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20240904111836.3273842-12-armbru@redhat.com> [Conflicts with merge commit `7bbadc60b5` resolved]	2024-09-10 14:02:16 +02:00
Stefan Hajnoczi	6bc30f1949	graph-lock: remove AioContext locking Stop acquiring/releasing the AioContext lock in bdrv_graph_wrlock()/bdrv_graph_unlock() since the lock no longer has any effect. The distinction between bdrv_graph_wrunlock() and bdrv_graph_wrunlock_ctx() becomes meaningless and they can be collapsed into one function. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20231205182011.1976568-6-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-12-21 22:49:27 +01:00
Kevin Wolf	6bc0bcc89f	block: Fix deadlocks in bdrv_graph_wrunlock() bdrv_graph_wrunlock() calls aio_poll(), which may run callbacks that have a nested event loop. Nested event loops can depend on other iothreads making progress, so in order to allow them to make progress it must not hold the AioContext lock of another thread while calling aio_poll(). This introduces a @bs parameter to bdrv_graph_wrunlock() whose AioContext is temporarily dropped (which matches bdrv_graph_wrlock()), and a bdrv_graph_wrunlock_ctx() that can be used if the BlockDriverState doesn't necessarily exist any more when unlocking. This also requires a change to bdrv_schedule_unref(), which was relying on the incorrectly taken lock. It needs to take the lock itself now. While this is a separate bug, it can't be fixed a separate patch because otherwise the intermediate state would either deadlock or try to release a lock that we don't even hold. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20231115172012.112727-3-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [kwolf: Fixed up bdrv_schedule_unref()] Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-11-21 12:45:21 +01:00
Kevin Wolf	4026f1c4f3	block: Mark bdrv_get_parent_name() and callers GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_get_parent_name() need to hold a reader lock for the graph because it accesses the parents list of a node. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20230929145157.45443-13-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-10-12 16:31:33 +02:00
Kevin Wolf	9def6082cf	block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK The functions read the parents list in the generic block layer, so we need to hold the graph lock already there. The BlockDriver implementations actually modify the graph, so it has to be a writer lock. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230911094620.45040-22-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-09-20 17:46:01 +02:00
Kevin Wolf	32a8aba37e	block: Mark bdrv_unref_child() GRAPH_WRLOCK Instead of taking the writer lock internally, require callers to already hold it when calling bdrv_unref_child(). These callers will typically already hold the graph lock once the locking work is completed, which means that they can't call functions that take it internally. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230911094620.45040-21-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-09-20 17:46:01 +02:00
Kevin Wolf	afdaeb9ea0	block: Mark bdrv_attach_child() GRAPH_WRLOCK Instead of taking the writer lock internally, require callers to already hold it when calling bdrv_attach_child_common(). These callers will typically already hold the graph lock once the locking work is completed, which means that they can't call functions that take it internally. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230911094620.45040-13-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-09-20 17:46:01 +02:00
Kevin Wolf	533c6e4ee8	block: Mark bdrv_recurse_can_replace() and callers GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_recurse_can_replace() need to hold a reader lock for the graph because it accesses the children list of a node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-20-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-10 14:16:54 +02:00
Kevin Wolf	8ab8140a04	block: Mark bdrv_co_refresh_total_sectors() and callers GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_refresh_total_sectors() need to hold a reader lock for the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-24-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:33 +01:00
Kevin Wolf	b9b10c35e5	block: Mark public read/write functions GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_pread/pwrite() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-12-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:17 +01:00
Kevin Wolf	7b1fb72e2c	block: Mark read/write in block/io.c GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_driver_*() need to hold a reader lock for the graph. It doesn't add the annotation to public functions yet. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-11-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:16 +01:00
Kevin Wolf	abaf8b750b	block: Mark bdrv_co_pwrite_zeroes() and callers GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_pwrite_zeroes() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-10-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:14 +01:00
Emanuele Giuseppe Esposito	8809534933	block: Mark bdrv_co_flush() and callers GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_flush() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-8-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:12 +01:00
Kevin Wolf	7ff9579e60	block: Mark bdrv_co_block_status() and callers GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_block_status() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-5-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:07 +01:00
Emanuele Giuseppe Esposito	c86422c554	block: Convert bdrv_refresh_total_sectors() to co_wrapper_mixed BlockDriver->bdrv_getlength is categorized as IO callback, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since the callback traverses the block nodes graph, which however is only possible in a coroutine. Therefore turn it into a co_wrapper to move the actual function into a coroutine where the lock can be taken. Because now this function creates a new coroutine and polls, we need to take the AioContext lock where it is missing, for the only reason that internally co_wrapper calls AIO_WAIT_WHILE and it expects to release the AioContext lock. This is especially messy when a co_wrapper creates a coroutine and polls in bdrv_open_driver, because this function has so many callers in so many context that it can easily lead to deadlocks. Therefore the new rule for bdrv_open_driver is that the caller must always hold the AioContext lock of the given bs (except if it is a coroutine), because the function calls bdrv_refresh_total_sectors() which is now a co_wrapper. Once the rwlock is ultimated and placed in every place it needs to be, we will poll using AIO_WAIT_WHILE_UNLOCKED and remove the AioContext lock. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-7-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:32 +01:00
Markus Armbruster	54fde4ff06	qapi block: Elide redundant has_FOO in generated C The has_FOO for pointer-valued FOO are redundant, except for arrays. They are also a nuisance to work with. Recent commit "qapi: Start to elide redundant has_FOO in generated C" provided the means to elide them step by step. This is the step for qapi/block*.json. Said commit explains the transformation in more detail. There is one instance of the invariant violation mentioned there: qcow2_signal_corruption() passes false, "" when node_name is an empty string. Take care to pass NULL then. The previous two commits cleaned up two more. Additionally, helper bdrv_latency_histogram_stats() loses its output parameters and returns a value instead. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Hanna Reitz <hreitz@redhat.com> Cc: qemu-block@nongnu.org Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221104160712.3005652-11-armbru@redhat.com> [Fixes for #ifndef LIBRBD_SUPPORTS_ENCRYPTION and MacOS squashed in]	2022-12-14 20:03:25 +01:00
Kevin Wolf	2ffc10d53b	quorum: Remove unnecessary forward declaration Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221006122607.162769-1-kwolf@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-07 12:11:41 +02:00
Paolo Bonzini	2987ae7d84	quorum: add missing coroutine_fn annotations Callers of coroutine_fn must be coroutine_fn themselves, or the call must be within "if (qemu_in_coroutine())". Apply coroutine_fn to functions where this holds. Reviewed-by: Alberto Faria <afaria@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220922084924.201610-19-pbonzini@redhat.com> [kwolf: Fixed up coding style] Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-07 12:11:41 +02:00
Peter Maydell	5df022cf2e	osdep: Move memalign-related functions to their own header Move the various memalign-related functions out of osdep.h and into their own header, which we include only where they are used. While we're doing this, add some brief documentation comments. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20220226180723.1706285-10-peter.maydell@linaro.org	2022-03-07 13:16:49 +00:00
Vladimir Sementsov-Ogievskiy	f34b2bcf8c	block: use int64_t instead of int in driver write_zeroes handlers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver write_zeroes handlers bytes parameter to int64_t. The only caller of all updated function is bdrv_co_do_pwrite_zeroes(). bdrv_co_do_pwrite_zeroes() itself is of course OK with widening of callee parameter type. Also, bdrv_co_do_pwrite_zeroes()'s max_write_zeroes is limited to INT_MAX. So, updated functions all are safe, they will not get "bytes" larger than before. Still, let's look through all updated functions, and add assertions to the ones which are actually unprepared to values larger than INT_MAX. For these drivers also set explicit max_pwrite_zeroes limit. Let's go: blkdebug: calculations can't overflow, thanks to bdrv_check_qiov_request() in generic layer. rule_check() and bdrv_co_pwrite_zeroes() both have 64bit argument. blklogwrites: pass to blk_log_writes_co_log() with 64bit argument. blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pwrite_zeroes() which is OK copy-before-write: Calls cbw_do_copy_before_write() and bdrv_co_pwrite_zeroes, both have 64bit argument. file-posix: both handler calls raw_do_pwrite_zeroes, which is updated. In raw_do_pwrite_zeroes() calculations are OK due to bdrv_check_qiov_request(), bytes go to RawPosixAIOData::aio_nbytes which is uint64_t. Check also where that uint64_t gets handed: handle_aiocb_write_zeroes_block() passes a uint64_t[2] to ioctl(BLKZEROOUT), handle_aiocb_write_zeroes() calls do_fallocate() which takes off_t (and we compile to always have 64-bit off_t), as does handle_aiocb_write_zeroes_unmap. All look safe. gluster: bytes go to GlusterAIOCB::size which is int64_t and to glfs_zerofill_async works with off_t. iscsi: Aha, here we deal with iscsi_writesame16_task() that has uint32_t num_blocks argument and iscsi_writesame16_task() has uint16_t argument. Make comments, add assertions and clarify max_pwrite_zeroes calculation. iscsi_allocmap_() functions already has int64_t argument is_byte_request_lun_aligned is simple to update, do it. mirror_top: pass to bdrv_mirror_top_do_write which has uint64_t argument nbd: Aha, here we have protocol limitation, and NBDRequest::len is uint32_t. max_pwrite_zeroes is cleanly set to 32bit value, so we are OK for now. nvme: Again, protocol limitation. And no inherent limit for write-zeroes at all. But from code that calculates cdw12 it's obvious that we do have limit and alignment. Let's clarify it. Also, obviously the code is not prepared to handle bytes=0. Let's handle this case too. trace events already 64bit preallocate: pass to handle_write() and bdrv_co_pwrite_zeroes(), both 64bit. rbd: pass to qemu_rbd_start_co() which is 64bit. qcow2: offset + bytes and alignment still works good (thanks to bdrv_check_qiov_request()), so tail calculation is OK qcow2_subcluster_zeroize() has 64bit argument, should be OK trace events updated qed: qed_co_request wants int nb_sectors. Also in code we have size_t used for request length which may be 32bit. So, let's just keep INT_MAX as a limit (aligning it down to pwrite_zeroes_alignment) and don't care. raw-format: Is OK. raw_adjust_offset and bdrv_co_pwrite_zeroes are both 64bit. throttle: Both throttle_group_co_io_limits_intercept() and bdrv_co_pwrite_zeroes() are 64bit. vmdk: pass to vmdk_pwritev which is 64bit quorum: pass to quorum_co_pwritev() which is 64bit Hooray! At this point all block drivers are prepared to support 64bit write-zero requests, or have explicitly set max_pwrite_zeroes. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-8-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: use <= rather than < in assertions relying on max_pwrite_zeroes] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:32 -05:00
Vladimir Sementsov-Ogievskiy	e75abedab7	block: use int64_t instead of uint64_t in driver write handlers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver write handlers parameters which are already 64bit to signed type. While being here, convert also flags parameter to be BdrvRequestFlags. Now let's consider all callers. Simple git grep '\->bdrv_$aio\\|co$_pwritev$_part$\?' shows that's there three callers of driver function: bdrv_driver_pwritev() and bdrv_driver_pwritev_compressed() in block/io.c, both pass int64_t, checked by bdrv_check_qiov_request() to be non-negative. qcow2_save_vmstate() does bdrv_check_qiov_request(). Still, the functions may be called directly, not only by drv->... Let's check: git grep '\.bdrv_$aio\\|co$_pwritev$_part$\?\s*=' \| \ awk '{print $4}' \| sed 's/,//' \| sed 's/&//' \| sort \| uniq \| \ while read func; do git grep "$func(" \| \ grep -v "$func(BlockDriverState"; done shows several callers: qcow2: qcow2_co_truncate() write at most up to @offset, which is checked in generic qcow2_co_truncate() by bdrv_check_request(). qcow2_co_pwritev_compressed_task() pass the request (or part of the request) that already went through normal write path, so it should be OK qcow: qcow_co_pwritev_compressed() pass int64_t, it's updated by this patch quorum: quorum_co_pwrite_zeroes() pass int64_t and int - OK throttle: throttle_co_pwritev_compressed() pass int64_t, it's updated by this patch vmdk: vmdk_co_pwritev_compressed() pass int64_t, it's updated by this patch Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:31 -05:00
Vladimir Sementsov-Ogievskiy	f7ef38dd13	block: use int64_t instead of uint64_t in driver read handlers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver read handlers parameters which are already 64bit to signed type. While being here, convert also flags parameter to be BdrvRequestFlags. Now let's consider all callers. Simple git grep '\->bdrv_$aio\\|co$_preadv$_part$\?' shows that's there three callers of driver function: bdrv_driver_preadv() in block/io.c, passes int64_t, checked by bdrv_check_qiov_request() to be non-negative. qcow2_load_vmstate() does bdrv_check_qiov_request(). do_perform_cow_read() has uint64_t argument. And a lot of things in qcow2 driver are uint64_t, so converting it is big job. But we must not work with requests that don't satisfy bdrv_check_qiov_request(), so let's just assert it here. Still, the functions may be called directly, not only by drv->... Let's check: git grep '\.bdrv_$aio\\|co$_preadv$_part$\?\s*=' \| \ awk '{print $4}' \| sed 's/,//' \| sed 's/&//' \| sort \| uniq \| \ while read func; do git grep "$func(" \| \ grep -v "$func(BlockDriverState"; done The only one such caller: QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1); ... ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0); in tests/unit/test-bdrv-drain.c, and it's OK obviously. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix typos] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:31 -05:00
Lukas Straub	5529b02da2	block/quorum: Provide .bdrv_co_flush instead of .bdrv_co_flush_to_disk The quorum block driver uses a custom flush callback to handle the case when some children return io errors. In that case it still returns success if enough children are healthy. However, it provides it as the .bdrv_co_flush_to_disk callback, not as .bdrv_co_flush. This causes the block layer to do it's own generic flushing for the children instead, which doesn't handle errors properly. Fix this by providing .bdrv_co_flush instead of .bdrv_co_flush_to_disk so the block layer uses the custom flush callback. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reported-by: Minghao Yuan <meeho@qq.com> Message-Id: <20210518134214.11ccf05f@gecko.fritz.box> Tested-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	bc52024959	block: check return value of bdrv_open_child and drop error propagation This patch is generated by cocci script: @@ symbol bdrv_open_child, errp, local_err; expression file; @@ file = bdrv_open_child(..., - &local_err + errp ); - if (local_err) + if (!file) { ... - error_propagate(errp, local_err); ... } with command spatch --sp-file x.cocci --macro-file scripts/cocci-macro-file.h \ --in-place --no-show-diff --max-width 80 --use-gitgrep block Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-4-vsementsov@virtuozzo.com> [eblake: fix qcow2_do_open() to use ERRP_GUARD, necessary as the only caller to pass allow_none=true] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 15:07:09 -06:00
Alberto Garcia	5cddb2e95f	quorum: Implement bdrv_co_pwrite_zeroes() This simply calls bdrv_co_pwrite_zeroes() in all children. bs->supported_zero_flags is also set to the flags that are supported by all children. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <2f09c842781fe336b4c2e40036bba577b7430190.1605286097.git.berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Alberto Garcia	ef9bba1484	quorum: Implement bdrv_co_block_status() The quorum driver does not implement bdrv_co_block_status() and because of that it always reports to contain data even if all its children are known to be empty. One consequence of this is that if we for example create a quorum with a size of 10GB and we mirror it to a new image the operation will write 10GB of actual zeroes to the destination image wasting a lot of time and disk space. Since a quorum has an arbitrary number of children of potentially different formats there is no way to report all possible allocation status flags in a way that makes sense, so this implementation only reports when a given region is known to contain zeroes (BDRV_BLOCK_ZERO) or not (BDRV_BLOCK_DATA). If all children agree that a region contains zeroes then we can return BDRV_BLOCK_ZERO using the smallest size reported by the children (because all agree that a region of at least that size contains zeroes). If at least one child disagrees we have to return BDRV_BLOCK_DATA. In this case we use the largest of the sizes reported by the children that didn't return BDRV_BLOCK_ZERO (because we know that there won't be an agreement for at least that size). Signed-off-by: Alberto Garcia <berto@igalia.com> Tested-by: Tao Xu <tao3.xu@intel.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <db83149afcf0f793effc8878089d29af4c46ffe1.1605286097.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Markus Armbruster	6cc0667d9b	Tweak a few "Parameter 'NAME' expects THING" error message Change to "expects a THING" where that's an obvious improvement Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201113082626.2725812-11-armbru@redhat.com>	2020-12-10 17:16:44 +01:00
Max Reitz	9ca5b0e842	quorum: Require WRITE perm with rewrite-corrupted Using rewrite-corrupted means quorum may issue writes to its children just from receiving read requests from its parents. Thus, it must take the WRITE permission when rewrite-corrupted is used. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201113211718.261671-2-mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-11-17 12:38:28 +01:00
Lukas Straub	5eb9a3c7b0	block/quorum.c: stable children names If we remove the child with the highest index from the quorum, decrement s->next_child_index. This way we get stable children names as long as we only remove the last child. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Fixes: https://bugs.launchpad.net/bugs/1881231 Reviewed-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <5d5f930424c1c770754041aa8ad6421dc4e2b58e.1596536719.git.lukasstraub2@web.de> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:12 +02:00
Markus Armbruster	a5f9b9df25	error: Reduce unnecessary error propagation When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away, even when we need to keep error_propagate() for other error paths. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-38-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	dcfe480544	error: Avoid unnecessary error_propagate() after error_setg() Replace error_setg(&err, ...); error_propagate(errp, err); by error_setg(errp, ...); Related pattern: if (...) { error_setg(&err, ...); goto out; } ... out: error_propagate(errp, err); return; When all paths to label out are that way, replace by if (...) { error_setg(errp, ...); return; } and delete the label along with the error_propagate(). When we have at most one other path that actually needs to propagate, and maybe one at the end that where propagation is unnecessary, e.g. foo(..., &err); if (err) { goto out; } ... bar(..., &err); out: error_propagate(errp, err); return; move the error_propagate() to where it's needed, like if (...) { foo(..., &err); error_propagate(errp, err); return; } ... bar(..., errp); return; and transform the error_setg() as above. In some places, the transformation results in obviously unnecessary error_propagate(). The next few commits will eliminate them. Bonus: the elimination of gotos will make later patches in this series easier to review. Candidates for conversion tracked down with this Coccinelle script: @@ identifier err, errp; expression list args; @@ - error_setg(&err, args); + error_setg(errp, args); ... when != err error_propagate(errp, err); Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-34-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	235e59cf03	qemu-option: Use returned bool to check for failure The previous commit enables conversion of foo(..., &err); if (err) { ... } to if (!foo(..., &err)) { ... } for QemuOpts functions that now return true / false on success / error. Coccinelle script: @@ identifier fun = { opts_do_parse, parse_option_bool, parse_option_number, parse_option_size, qemu_opt_parse, qemu_opt_rename, qemu_opt_set, qemu_opt_set_bool, qemu_opt_set_number, qemu_opts_absorb_qdict, qemu_opts_do_parse, qemu_opts_from_qdict_entry, qemu_opts_set, qemu_opts_validate }; expression list args, args2; typedef Error; Error *err; @@ - fun(args, &err, args2); - if (err) + if (!fun(args, &err, args2)) { ... } A few line breaks tidied up manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-15-armbru@redhat.com> [Conflict with commit `0b6786a9c1` "block/amend: refactor qcow2 amend options" resolved by rerunning Coccinelle on master's version]	2020-07-10 15:17:35 +02:00
Max Reitz	e5d8a40685	block: Drop @child_class from bdrv_child_perm() Implementations should decide the necessary permissions based on @role. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-35-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	36ee58d13b	block: Switch child_format users to child_of_bds Both users (quorum and blkverify) use child_format for not-really-filtered children, so the appropriate BdrvChildRole in both cases is DATA. (Note that this will cause bdrv_inherited_options() to force-allow format probing.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-22-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	bf8e925eb5	block: Pass BdrvChildRole to bdrv_child_perm() For now, all callers pass 0 and no callee evaluates this value. Later patches will change both. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-7-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	258b776515	block: Add BdrvChildRole to BdrvChild For now, it is always set to 0. Later patches in this series will ensure that all callers pass an appropriate combination of flags. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	bd86fb990c	block: Rename BdrvChildRole to BdrvChildClass This structure nearly only contains parent callbacks for child state changes. It cannot really reflect a child's role, because different roles may overlap (as we will see when real roles are introduced), and because parents can have custom callbacks even when the child fulfills a standard role. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200513110544.176672-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	3c7f75b321	quorum: Stop marking it as a filter Quorum is not a filter, for example because it cannot guarantee which of its children will serve the next request. Thus, any of its children may differ from the data visible to quorum's parents. We have other filters with multiple children, but they differ in this aspect: - blkverify quits the whole qemu process if its children differ. As such, we can always skip it when we want to skip it (as a filter node) by going to any of its children. Both have the same data. - replication generally serves requests from bs->file, so this is its only actually filtered child. - Block job filters currently only have one child, but they will probably get more children in the future. Still, they will always have only one actually filtered child. Having "filters" as a dedicated node category only makes sense if you can skip them by going to a one fixed child that always shows the same data as the filter node. Quorum cannot fulfill this, so it is not a filter. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-13-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:40 +01:00
Max Reitz	6b4907cf42	block: Remove bdrv_recurse_is_first_non_filter() It no longer has any users. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-11-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:40 +01:00
Max Reitz	a3ed794b36	quorum: Implement .bdrv_recurse_can_replace() Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200218103454.296704-9-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:40 +01:00
Max Reitz	37a3791b38	quorum: Fix child permissions Quorum cannot share WRITE or RESIZE on its children. Presumably, it only does so because as a filter, it seemed intuitively correct to point its .bdrv_child_perm to bdrv_filter_default_perm(). However, it is not really a filter, and bdrv_filter_default_perm() does not work for it, so we have to provide a custom .bdrv_child_perm implementation. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:39 +01:00
Markus Armbruster	0b8fa32f55	Include qemu/module.h where needed, drop it from qemu-common.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]	2019-06-12 13:18:33 +02:00
Alberto Garcia	b441dc71c0	block: Make bdrv_root_attach_child() unref child_bs on failure A consequence of the previous patch is that bdrv_attach_child() transfers the reference to child_bs from the caller to parent_bs, which will drop it on bdrv_close() or when someone calls bdrv_unref_child(). But this only happens when bdrv_attach_child() succeeds. If it fails then the caller is responsible for dropping the reference to child_bs. This patch makes bdrv_attach_child() take the reference also when there is an error, freeing the caller for having to do it. A similar situation happens with bdrv_root_attach_child(), so the changes on this patch affect both functions. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 20dfb3d9ccec559cdd1a9690146abad5d204a186.1557754872.git.berto@igalia.com [mreitz: Removed now superfluous BdrvChild * variable in bdrv_open_child()] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00

1 2 3

140 Commits