From 79fbbed7a530c211752b198feafe811860b17c8a Mon Sep 17 00:00:00 2001
From: Miroslav Rezanina <mrezanin@redhat.com>
Date: Fri, 12 Oct 2018 07:31:11 +0200
Subject: [PATCH 001/407] Initial redhat build

This patch introduces the Red Hat build structure in the redhat subdirectory.
In addition, several issues are fixed in the QEMU tree:

 - Change of app name for sasl_server_init in VNC code from qemu to qemu-kvm
  - As we use qemu-kvm as name in all places, this is updated to be consistent
 - Man page renamed from qemu to qemu-kvm
  - man page is installed using make install so we have to fix it in qemu tree

This rebase includes changes up to qemu-kvm-6.1.0-5.el9

Rebase notes (3.1.0):
- added new configure options

Rebase notes (4.0.0):
- Added dependency to perl-Test-Harness (upstream)
- Added dependency to python3-sphinx (upstream)
- Change location of icons (upstream)
- Remove .desktop file (added upstream)
- Added qemu-trace-stap (added upstream)
- Removed elf2dmp (added upstream)
- Remove .buildinfo
- Added pvh.bin rom (added upstream)
- Added interop documentation files
- Use python module instead of qemu.py (upstream)

Rebase notes (4.1.0):
- Remove edk2 files generated by build
- Switch to rhel-8.1-candidate build target
- Remove specs documentation
- Switched from libssh2 to libssh
- Add rc0 tarball usage hacks
- Added BuildRequires for wget, rpm-build and python3-sphinx
- Removed new unpacked files
- Update configure line to use new options

Rebase notes (4.2.0):
- Disable iotest run during make check
- README renamed to README.rst (upstream)
- Removed ui-spice-app.so
- Added relevant changes from "505f7f4 redhat: Adding slirp to the exploded tree"
- Removed qemu-ga.8 install from spec file - installed by make
- Removed spapr-rtas.bin (upstream)
- Require newer SLOF (20191022)

Rebase notes (5.1.0):
- Use python3 for virtio_seg_max_adjust.py test
- Removed qemu-trace-stap shebang from spec file
- Added virtiofsd.1 (upstream)
- Use out-of-tree build
- New documentation structure (upstream)
- Update local build
- Removing installed qemu-storage-daemon (added upstream)
- Removing opensbi-riscv32-sifive_u-fw_jump.bin (added upstream)
- Disable iotests (moved from Enable make check commit)
- Added missing configure options
- Reorder configure options
- qemu-pr-helper moved to /usr/libexec/ (upstream)
- Added submodules for usb-redir, smartcard-reader and qxl display (upstream)
- Added setting rc version in Makefile for build
- removed --disable-vxhs configure option (removed upstream)
- bumped required libusbx-devel version to 1.0.23
- bumped libfdt version to 1.6.0

Rebase notes (5.2.0 rc0):
- Move libfdt dependency to qemu-kvm-core
- Move manpage rename from Makefile to spec file
- rename with-confsuffix configure option to with-suffix (upstream)
- Bump libusbx Requires version to 1.0.23
- Manual copy of keymaps in spec file (BZ 1875217)
- Removed /usr/share/qemu-kvm/npcm7xx_bootrom.bin, considering it
  unpackaged for now.
- Removed /usr/share/qemu-kvm/qboot.rom, considering unpackaged.
- Added build dependency for meson and ninja-build
- hw/s390/s390-pci-vfio.c hack - set NULL for g_autofree variables
- Removed Changelog (upstream)
- Fix in directory used for docs (upstream adds %name so we do not pass it in configure)
- Package various .so as part of qemu-kvm-core package.

Rebase notes (5.2.0 rc2):
- Added fix for dtrace build on RHEL 8.4.0

Rebase notes (5.2.0 rc3):
- Added man page for qemu-pr-helper
- Added new configure options
- Update qemu-kiwi patches to v4

Rebase notes (6.0.0):
- update tracetool usage in spec file
- remove qemu-storage-daemon-qmp-ref man page
- remove qemu-storage-daemon man page
- Added devel documentation
- do not package virtfs-proxy-helper files
- Use --with-git-submodules instead of --(enable|disable)-git-update
- Minor build fixes for sending upstream
- g_autofree initialization fixed upstream
- Updated rc information usage
- do not package hw-s390x-virtio-gpu-ccw.so
- Disable new switch options

Rebase notes (6.1.0):
- Fix warning issue in block.c
- Download tarball from dist-git cache
- Removed sheepdog driver
- Added new display modules:
  - hw-display-virtio-gpu-gl.so
  - hw-display-virtio-gpu-pci-gl.so
  - hw-display-virtio-vga-gl.so
- sasl fix moved from ui/vnc.c to ui/vnc-auth-sasl.c
- Added accel-qtest-%{kvm_target} and accel-tcg-%{kvm_target}
- Added about docs
- Use -q option for setup
- Added hw-usb-host.so
- Disable new options (bpf, nvmm, slirp-smbd)

Rebase notes (6.2.0):
- Using internal meson
- removed --disable-jemalloc and --disable-tcmalloc configure options
- added audio-oss.so
- added fdt requirement for x86_64
- tests/acceptance renamed to tests/avocado
- added multiboot_dma.bin
- Removed conflict relics
- Updated configure options

Merged patches (3.1.0):
- 01f0c9f RHEL8: Add disable configure options to qemu spec file
- Spec file cleanups

Merged patches (4.0.0):
- aa4297c Add edk2 Requires to qemu-kvm
- d124ff5779 Fixing brew build target
- eb204b5 Introduce the qemu-kvm-tests rpm
- 223cf0c Load kvm module during boot (partial)

Merged patches (4.1.0):
- ebb6e97 redhat: Fix LOCALVERSION creation
- b0ab0cc redhat: enable tpmdev passthrough (not disabling tests)
- 7cb3c4a Enable libpmem to support nvdimm
- 8943607 qemu-kvm.spec: bump libseccomp >= 2.4.0
- 27b7c44 rh: set CONFIG_BOCHS_DISPLAY=y for x86 (partial)
- e1fe9fe x86_64-rh-devices: enable TPM emulation (partial)

Merged patches (4.2.0):
- 69e1fb2 enable virgla
- d4f6115 enable virgl, for real this time ...

Merged patches (5.1.0):
- 5edf6bd Add support for rh-brew-module
- f77d52d redhat: ship virtiofsd vhost-user device backend
- 63f12d4 redhat: Always use module build target for rh-brew (modified)
- 9b1e140 redhat: updating the modular target
- 44b8bd0 spec: Fix python shenigans for tests

Merged patches (5.2.0 rc0):
- 9238ce7 Add support for simpletrace
- 5797cff Remove explicit glusterfs-api dependency
- fd62478 disable virgl
- 0205018 redhat: link /etc/qemu-ga/fsfreeze-hook to /etc/qemu-kvm/
- 3645097 redhat: Make all generated so files executable (not only block-*)

Merged patches (5.2.0 rc2):
- pjw 99657 redhat: introduces disable_everything macro into the configure call
- pjw 99659 redhat: scripts/extract_build_cmd.py - Avoid listing empty lines
- pjw 99658 redhat: Fixing rh-local build
- pjw 99660 redhat: Add qemu-kiwi subpackage
- d2e59ce redhat: add (un/pre)install systemd hooks for qemu-ga

Merged patches (5.2.0 rc3):
- pjw 99887 - redhat: allow Makefile rh-prep builddep to fail
- pjw 99885 - redhat: adding rh-rpm target

Merged patches (6.0.0):
- 5ab9954a3b spec: find system python via meson
- cd0f7db11f build-system: use b_staticpic=false
- 80d2dec42c udev-kvm-check: remove the "exceeded subscription limit" message
- 38959d51c0 redhat: Allow make to inherit params from parent make for rh-local
- 1e0cfe458f redhat: moving all documentation files to qemu-kvm-docs
- d7a594d02b redhat: makes qemu respect system's crypto profile
- e2bbf1572b spec: Package qemu-storage-daemon
- 92f10993ba spec: ui-spice sub-package
- 8931e46069 spec: ui-opengl sub-package

Merged patches (6.1.0):
- 7bb57541b3 redhat: Install the s390-netboot.img that we've built
- b4a8531f41 redhat: Fix "unversioned Obsoletes" warning
- 141a1693c7 redhat: Move qemu-kvm-docs dependency to qemu-kvm
- d75f59c6f9 redhat: introducting qemu-kvm-hw-usbredir
- a934d8bf44 redhat: use the standard vhost-user JSON path

Merged patches (6.2.0):
- 4f3f04bbb6 spec: Remove qemu-kiwi build
---
 README.systemtap                        | 43 +++++++++++++++++++++++++
 meson.build                             |  4 ++-
 scripts/qemu-guest-agent/fsfreeze-hook  |  2 +-
 scripts/systemtap/conf.d/qemu_kvm.conf  |  4 +++
 scripts/systemtap/script.d/qemu_kvm.stp |  1 +
 tests/check-block.sh                    |  2 ++
 ui/vnc-auth-sasl.c                      |  2 +-
 7 files changed, 55 insertions(+), 3 deletions(-)
 create mode 100644 README.systemtap
 create mode 100644 scripts/systemtap/conf.d/qemu_kvm.conf
 create mode 100644 scripts/systemtap/script.d/qemu_kvm.stp

diff --git a/README.systemtap b/README.systemtap
new file mode 100644
index 00000000000..ad913fc9903
--- /dev/null
+++ b/README.systemtap
@@ -0,0 +1,43 @@
+QEMU tracing using systemtap-initscript
+---------------------------------------
+
+You can capture QEMU trace data all the time using systemtap-initscript.  This
+uses SystemTap's flight recorder mode to trace all running guests to a
+fixed-size buffer on the host.  Old trace entries are overwritten by new
+entries when the buffer size wraps.
+
+1. Install the systemtap-initscript package:
+  # yum install systemtap-initscript
+
+2. Install the systemtap scripts and the conf file:
+  # cp /usr/share/qemu-kvm/systemtap/script.d/qemu_kvm.stp /etc/systemtap/script.d/
+  # cp /usr/share/qemu-kvm/systemtap/conf.d/qemu_kvm.conf /etc/systemtap/conf.d/
+
+The set of trace events to enable is given in qemu_kvm.stp.  This SystemTap
+script can be customized to add or remove trace events provided in
+/usr/share/systemtap/tapset/qemu-kvm-simpletrace.stp.
+
+SystemTap customizations can be made to qemu_kvm.conf to control the flight
+recorder buffer size and whether to store traces in memory only or disk too.
+See stap(1) for option documentation.
+
+3. Start the systemtap service.
+  # service systemtap start qemu_kvm
+
+4. Make the service start at boot time.
+  # chkconfig systemtap on
+
+5. Confirm that the service works.
+  # service systemtap status qemu_kvm
+  qemu_kvm is running...
+
+When you want to inspect the trace buffer, perform the following steps:
+
+1. Dump the trace buffer.
+  # staprun -A qemu_kvm >/tmp/trace.log
+
+2. Start the systemtap service because the preceding step stops the service.
+  # service systemtap start qemu_kvm
+
+3. Translate the trace record to readable format.
+  # /usr/share/qemu-kvm/simpletrace.py --no-header /usr/share/qemu-kvm/trace-events /tmp/trace.log
diff --git a/meson.build b/meson.build
index 96de1a6ef94..5f6ba86dbb8 100644
--- a/meson.build
+++ b/meson.build
@@ -2108,7 +2108,9 @@ if capstone_opt == 'internal'
     # Include all configuration defines via a header file, which will wind up
     # as a dependency on the object file, and thus changes here will result
     # in a rebuild.
-    '-include', 'capstone-defs.h'
+    '-include', 'capstone-defs.h',
+
+    '-Wp,-D_GLIBCXX_ASSERTIONS',
   ]
 
   libcapstone = static_library('capstone',
diff --git a/scripts/qemu-guest-agent/fsfreeze-hook b/scripts/qemu-guest-agent/fsfreeze-hook
index 13aafd48451..e9b84ec0284 100755
--- a/scripts/qemu-guest-agent/fsfreeze-hook
+++ b/scripts/qemu-guest-agent/fsfreeze-hook
@@ -8,7 +8,7 @@
 # request, it is issued with "thaw" argument after filesystem is thawed.
 
 LOGFILE=/var/log/qga-fsfreeze-hook.log
-FSFREEZE_D=$(dirname -- "$0")/fsfreeze-hook.d
+FSFREEZE_D=$(dirname -- "$(realpath "$0")")/fsfreeze-hook.d
 
 # Check whether file $1 is a backup or rpm-generated file and should be ignored
 is_ignored_file() {
diff --git a/scripts/systemtap/conf.d/qemu_kvm.conf b/scripts/systemtap/conf.d/qemu_kvm.conf
new file mode 100644
index 00000000000..372d8160a48
--- /dev/null
+++ b/scripts/systemtap/conf.d/qemu_kvm.conf
@@ -0,0 +1,4 @@
+# Force load uprobes (see BZ#1118352)
+stap -e 'probe process("/usr/libexec/qemu-kvm").function("main") { printf("") }' -c true
+
+qemu_kvm_OPT="-s4" # per-CPU buffer size, in megabytes
diff --git a/scripts/systemtap/script.d/qemu_kvm.stp b/scripts/systemtap/script.d/qemu_kvm.stp
new file mode 100644
index 00000000000..c04abf94496
--- /dev/null
+++ b/scripts/systemtap/script.d/qemu_kvm.stp
@@ -0,0 +1 @@
+probe qemu.kvm.simpletrace.handle_qmp_command,qemu.kvm.simpletrace.monitor_protocol_*,qemu.kvm.simpletrace.migrate_set_state {}
diff --git a/tests/check-block.sh b/tests/check-block.sh
index f86cb863de5..6d38340d491 100755
--- a/tests/check-block.sh
+++ b/tests/check-block.sh
@@ -69,6 +69,8 @@ else
     fi
 fi
 
+exit 0
+
 cd tests/qemu-iotests
 
 # QEMU_CHECK_BLOCK_AUTO is used to disable some unstable sub-tests
diff --git a/ui/vnc-auth-sasl.c b/ui/vnc-auth-sasl.c
index 47fdae5b215..2a950caa2aa 100644
--- a/ui/vnc-auth-sasl.c
+++ b/ui/vnc-auth-sasl.c
@@ -42,7 +42,7 @@
 
 bool vnc_sasl_server_init(Error **errp)
 {
-    int saslErr = sasl_server_init(NULL, "qemu");
+    int saslErr = sasl_server_init(NULL, "qemu-kvm");
 
     if (saslErr != SASL_OK) {
         error_setg(errp, "Failed to initialize SASL auth: %s",
-- 
Gitee


From 03cc195024c0eac84019e892e80880e0fccb8619 Mon Sep 17 00:00:00 2001
From: Miroslav Rezanina <mrezanin@redhat.com>
Date: Wed, 2 Sep 2020 09:11:07 +0200
Subject: [PATCH 002/407] Enable/disable devices for RHEL

This commit collects all changes related to the set of supported devices.

Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>

Rebase notes (qemu 3.1.0):
- spapr_rng disabled in default_config
- new hyperv.mak in default configs
- Move changes from x86_64-softmmu.mak to i386-softmmu.mak
- Added CONFIG_VIRTIO_MMIO to aarch64-softmmu.mak
- Removed config_vga_isa.c changes as no longer needed
- Removed new devices

Rebase notes (4.0.0):
- Added CONFIG_PCI_EXPRESS_GENERIC_BRIDGE for aarch64-softmmu.mak
- Added CONFIG_ARM_VIRT for aarch64-softmmu.mak
- Switch to KConfig (upstream)
 - Using device whitelist + without-default-devices option

Rebase notes (4.1.0):
- Added CONFIG_USB_OHCI_PCI for ppc64
- Added CONFIG_XIVE_KVM for ppc64
- Added CONFIG_ACPI_PCI for x86_64
- Added CONFIG_SEMIHOSTING for aarch64
- Cleanup aarch64 devices
- Do not build a15mpcore.c
- Removed ide-isa.c stub file
- Use CONFIG_USB_EHCI_PCI on x86_64 (new upstream)

Rebase notes (4.2.0-rc0):
- Use conditional build for isa-superio.c (upstream change)
- Rename PCI_PIIX to PCI_I440FX (upstream change)

Rebase notes (4.2.0-rc3):
- Disabled ccid-card-emulated (patch 92566)
- Disabled vfio-pci-igd-lpc-bridge (patch 92565)

Rebase notes (5.1.0):
- added CONFIG_PCI_EXPRESS on ppc64 (due to upstream dependency)
- Added CONFIG_NVDIMM
- updated cortex-a15 disabling to upstream code
- Add CONFIG_ACPI_APEI for aarch64
- removed obsolete hw/bt/Makefile.objs chunk
- removed unnecessary changes in target/i386/cpu.c

Rebase notes (5.2.0 rc0):
- Added CONFIG_USB_XHCI_PCI on aarch64 ppc64 and x86_64
- remove vl.c hack for no hpet
- Enable CONFIG_PTIMER for aarch64
- Do not package hw-display-virtio-gpu.so on s390x

Rebase notes (5.2.0 rc1):
- Added CONFIG_ARM_GIC for aarch64 (required for build)

Rebase notes (weekly-210113):
- Removed XICS_KVM, XICS_SPAPR, XIVE_KVM and XIVE_SPAPR config (removed upstream)

Rebase notes (weekly-210120):
- Add CONFIG_ARM_COMPATIBLE_SEMIHOSTING option

Rebase notes (weekly-210203):
- Rename CONFIG_PVPANIC to CONFIG_PVPANIC_ISA

Rebase notes (weekly-210317):
- Add new USB_STORAGE_CORE and USB_STORAGE_CLASSIC config for ppc64 and x86_64
- Update disabling TCG cpus for AArch64

Rebase notes (weekly-210519):
- Do not use CONFIG_SPICE and CONFIG_OPENGL in default configs

Rebase notes (weekly-210623):
- Add CONFIG_TPM for archs that use TPM functionality

Rebase notes (weekly-210714):
- default_configs moved to configs

Rebase notes (6.1.0 rc2):
- Use --with-device-ARCH configure option to use redhat config files

Rebase notes (6.2.0 rc3):
- Do not remove -no-hpet documentation

Merged patches (qemu 3.1.0):
- d51e082 Re-enable CONFIG_HYPERV_TESTDEV
- 4b889f3 Declare cirrus-vga as deprecated
- b579d32 Do not build bluetooth support
- 3eef52a Disable CONFIG_IPMI and CONFIG_I2C for ppc64
- 9caf292 Disable CONFIG_CAN_BUS and CONFIG_CAN_SJA1000

Merged patches (4.1.0):
- 20a51f6 fdc: Revert downstream disablement of device "floppy"
- f869cc0 fdc: Restrict floppy controllers to RHEL-7 machine types
- 5909721 aarch64: Compile out IOH3420
- 27b7c44 rh: set CONFIG_BOCHS_DISPLAY=y for x86 (partial)
- 495a27d x86_64-rh-devices: add missing TPM passthrough
- e1fe9fe x86_64-rh-devices: enable TPM emulation (partial)

Merged patches (4.2.0):
- f7587dd RHEL: disable hostmem-memfd

Merged patches (5.1.0):
- 4543a3c i386: Remove cpu64-rhel6 CPU model
- 96533 aarch64: Remove tcg cpu types (pjw commit)
- 559d589 Revert "RHEL: disable hostmem-memfd"
- 441128e enable ramfb

Merged patches (5.2.0 rc0):
- f70eb50 RHEL-only: Enable vTPM for POWER in downstream configs
- 69d8ae7 redhat: fix 5.0 rebase missing ISA TPM TIS
- 8310f89 RHEL-only: Enable vTPM for ARM in downstream configs
- 4a8ccfd Disable TPM passthrough backend on ARM

Merged patches (6.0.0):
- ff817df9e3 config: enable VFIO_CCW
- 70d3924521 redhat: Add some devices for exporting upstream machine types
 - without machine type chunks
- efac91b2b4 default-configs: Enable vhost-user-blk

Merged patches (weekly-210630):
- 59a178acff disable CONFIG_USB_STORAGE_BOT

Merged patches (6.1.0 rc2):
- 86f0025f16 aarch64: Add USB storage devices
---
 .../aarch64-softmmu/aarch64-rh-devices.mak    |  31 ++++++
 .../ppc64-softmmu/ppc64-rh-devices.mak        |  36 ++++++
 configs/devices/rh-virtio.mak                 |  10 ++
 .../s390x-softmmu/s390x-rh-devices.mak        |  16 +++
 .../x86_64-softmmu/x86_64-rh-devices.mak      | 104 ++++++++++++++++++
 .../x86_64-upstream-devices.mak               |   4 +
 hw/acpi/ich9.c                                |   4 +-
 hw/arm/meson.build                            |   2 +-
 hw/block/fdc.c                                |  10 ++
 hw/char/parallel.c                            |   9 ++
 hw/cpu/meson.build                            |   5 +-
 hw/display/cirrus_vga.c                       |   3 +
 hw/ide/piix.c                                 |   5 +-
 hw/input/pckbd.c                              |   2 +
 hw/net/e1000.c                                |   2 +
 hw/ppc/spapr_cpu_core.c                       |   2 +
 hw/timer/hpet.c                               |   8 ++
 hw/usb/meson.build                            |   2 +-
 target/arm/cpu_tcg.c                          |  10 ++
 target/ppc/cpu-models.c                       |  10 ++
 target/s390x/cpu_models_sysemu.c              |   3 +
 target/s390x/kvm/kvm.c                        |   8 ++
 22 files changed, 279 insertions(+), 7 deletions(-)
 create mode 100644 configs/devices/aarch64-softmmu/aarch64-rh-devices.mak
 create mode 100644 configs/devices/ppc64-softmmu/ppc64-rh-devices.mak
 create mode 100644 configs/devices/rh-virtio.mak
 create mode 100644 configs/devices/s390x-softmmu/s390x-rh-devices.mak
 create mode 100644 configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
 create mode 100644 configs/devices/x86_64-softmmu/x86_64-upstream-devices.mak

diff --git a/configs/devices/aarch64-softmmu/aarch64-rh-devices.mak b/configs/devices/aarch64-softmmu/aarch64-rh-devices.mak
new file mode 100644
index 00000000000..0d4f9e6e4bd
--- /dev/null
+++ b/configs/devices/aarch64-softmmu/aarch64-rh-devices.mak
@@ -0,0 +1,31 @@
+include ../rh-virtio.mak
+
+CONFIG_ARM_GIC_KVM=y
+CONFIG_ARM_GIC=y
+CONFIG_ARM_SMMUV3=y
+CONFIG_ARM_V7M=y
+CONFIG_ARM_VIRT=y
+CONFIG_EDID=y
+CONFIG_PCIE_PORT=y
+CONFIG_PCI_DEVICES=y
+CONFIG_PCI_TESTDEV=y
+CONFIG_PFLASH_CFI01=y
+CONFIG_SCSI=y
+CONFIG_SEMIHOSTING=y
+CONFIG_USB=y
+CONFIG_USB_XHCI=y
+CONFIG_USB_XHCI_PCI=y
+CONFIG_USB_STORAGE_CORE=y
+CONFIG_USB_STORAGE_CLASSIC=y
+CONFIG_VFIO=y
+CONFIG_VFIO_PCI=y
+CONFIG_VIRTIO_MMIO=y
+CONFIG_VIRTIO_PCI=y
+CONFIG_XIO3130=y
+CONFIG_NVDIMM=y
+CONFIG_ACPI_APEI=y
+CONFIG_TPM=y
+CONFIG_TPM_EMULATOR=y
+CONFIG_TPM_TIS_SYSBUS=y
+CONFIG_PTIMER=y
+CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y
diff --git a/configs/devices/ppc64-softmmu/ppc64-rh-devices.mak b/configs/devices/ppc64-softmmu/ppc64-rh-devices.mak
new file mode 100644
index 00000000000..73e3ee02933
--- /dev/null
+++ b/configs/devices/ppc64-softmmu/ppc64-rh-devices.mak
@@ -0,0 +1,36 @@
+include ../rh-virtio.mak
+
+CONFIG_DIMM=y
+CONFIG_MEM_DEVICE=y
+CONFIG_NVDIMM=y
+CONFIG_PCI=y
+CONFIG_PCI_DEVICES=y
+CONFIG_PCI_TESTDEV=y
+CONFIG_PCI_EXPRESS=y
+CONFIG_PSERIES=y
+CONFIG_SCSI=y
+CONFIG_SPAPR_VSCSI=y
+CONFIG_TEST_DEVICES=y
+CONFIG_USB=y
+CONFIG_USB_OHCI=y
+CONFIG_USB_OHCI_PCI=y
+CONFIG_USB_SMARTCARD=y
+CONFIG_USB_STORAGE_CORE=y
+CONFIG_USB_STORAGE_CLASSIC=y
+CONFIG_USB_XHCI=y
+CONFIG_USB_XHCI_NEC=y
+CONFIG_USB_XHCI_PCI=y
+CONFIG_VFIO=y
+CONFIG_VFIO_PCI=y
+CONFIG_VGA=y
+CONFIG_VGA_PCI=y
+CONFIG_VHOST_USER=y
+CONFIG_VIRTIO_PCI=y
+CONFIG_VIRTIO_VGA=y
+CONFIG_WDT_IB6300ESB=y
+CONFIG_XICS=y
+CONFIG_XIVE=y
+CONFIG_TPM=y
+CONFIG_TPM_SPAPR=y
+CONFIG_TPM_EMULATOR=y
+CONFIG_TPM_PASSTHROUGH=y
diff --git a/configs/devices/rh-virtio.mak b/configs/devices/rh-virtio.mak
new file mode 100644
index 00000000000..94ede1b5f68
--- /dev/null
+++ b/configs/devices/rh-virtio.mak
@@ -0,0 +1,10 @@
+CONFIG_VIRTIO=y
+CONFIG_VIRTIO_BALLOON=y
+CONFIG_VIRTIO_BLK=y
+CONFIG_VIRTIO_GPU=y
+CONFIG_VIRTIO_INPUT=y
+CONFIG_VIRTIO_INPUT_HOST=y
+CONFIG_VIRTIO_NET=y
+CONFIG_VIRTIO_RNG=y
+CONFIG_VIRTIO_SCSI=y
+CONFIG_VIRTIO_SERIAL=y
diff --git a/configs/devices/s390x-softmmu/s390x-rh-devices.mak b/configs/devices/s390x-softmmu/s390x-rh-devices.mak
new file mode 100644
index 00000000000..165c082e873
--- /dev/null
+++ b/configs/devices/s390x-softmmu/s390x-rh-devices.mak
@@ -0,0 +1,16 @@
+include ../rh-virtio.mak
+
+CONFIG_PCI=y
+CONFIG_S390_CCW_VIRTIO=y
+CONFIG_S390_FLIC=y
+CONFIG_S390_FLIC_KVM=y
+CONFIG_SCLPCONSOLE=y
+CONFIG_SCSI=y
+CONFIG_TERMINAL3270=y
+CONFIG_VFIO=y
+CONFIG_VFIO_AP=y
+CONFIG_VFIO_CCW=y
+CONFIG_VFIO_PCI=y
+CONFIG_VHOST_USER=y
+CONFIG_VIRTIO_CCW=y
+CONFIG_WDT_DIAG288=y
diff --git a/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak b/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
new file mode 100644
index 00000000000..ddf036f0423
--- /dev/null
+++ b/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
@@ -0,0 +1,104 @@
+include ../rh-virtio.mak
+include x86_64-upstream-devices.mak
+
+CONFIG_AC97=y
+CONFIG_ACPI=y
+CONFIG_ACPI_PCI=y
+CONFIG_ACPI_CPU_HOTPLUG=y
+CONFIG_ACPI_MEMORY_HOTPLUG=y
+CONFIG_ACPI_NVDIMM=y
+CONFIG_ACPI_SMBUS=y
+CONFIG_ACPI_VMGENID=y
+CONFIG_ACPI_X86=y
+CONFIG_ACPI_X86_ICH=y
+CONFIG_AHCI=y
+CONFIG_APIC=y
+CONFIG_APM=y
+CONFIG_BOCHS_DISPLAY=y
+CONFIG_DIMM=y
+CONFIG_E1000E_PCI_EXPRESS=y
+CONFIG_E1000_PCI=y
+CONFIG_EDU=y
+CONFIG_FDC=y
+CONFIG_FDC_SYSBUS=y
+CONFIG_FW_CFG_DMA=y
+CONFIG_HDA=y
+CONFIG_HYPERV=y
+CONFIG_HYPERV_TESTDEV=y
+CONFIG_I2C=y
+CONFIG_I440FX=y
+CONFIG_I8254=y
+CONFIG_I8257=y
+CONFIG_I8259=y
+CONFIG_I82801B11=y
+CONFIG_IDE_CORE=y
+CONFIG_IDE_PCI=y
+CONFIG_IDE_PIIX=y
+CONFIG_IDE_QDEV=y
+CONFIG_IOAPIC=y
+CONFIG_IOH3420=y
+CONFIG_ISA_BUS=y
+CONFIG_ISA_DEBUG=y
+CONFIG_ISA_TESTDEV=y
+CONFIG_LPC_ICH9=y
+CONFIG_MC146818RTC=y
+CONFIG_MEM_DEVICE=y
+CONFIG_NVDIMM=y
+CONFIG_PAM=y
+CONFIG_PC=y
+CONFIG_PCI=y
+CONFIG_PCIE_PORT=y
+CONFIG_PCI_DEVICES=y
+CONFIG_PCI_EXPRESS=y
+CONFIG_PCI_EXPRESS_Q35=y
+CONFIG_PCI_I440FX=y
+CONFIG_PCI_TESTDEV=y
+CONFIG_PCKBD=y
+CONFIG_PCSPK=y
+CONFIG_PC_ACPI=y
+CONFIG_PC_PCI=y
+CONFIG_PFLASH_CFI01=y
+CONFIG_PVPANIC_ISA=y
+CONFIG_PXB=y
+CONFIG_Q35=y
+CONFIG_QXL=y
+CONFIG_RTL8139_PCI=y
+CONFIG_SCSI=y
+CONFIG_SERIAL=y
+CONFIG_SERIAL_ISA=y
+CONFIG_SERIAL_PCI=y
+CONFIG_SEV=y
+CONFIG_SGA=y
+CONFIG_SMBIOS=y
+CONFIG_SMBUS_EEPROM=y
+CONFIG_TEST_DEVICES=y
+CONFIG_USB=y
+CONFIG_USB_EHCI=y
+CONFIG_USB_EHCI_PCI=y
+CONFIG_USB_SMARTCARD=y
+CONFIG_USB_STORAGE_CORE=y
+CONFIG_USB_STORAGE_CLASSIC=y
+CONFIG_USB_UHCI=y
+CONFIG_USB_XHCI=y
+CONFIG_USB_XHCI_NEC=y
+CONFIG_USB_XHCI_PCI=y
+CONFIG_VFIO=y
+CONFIG_VFIO_PCI=y
+CONFIG_VGA=y
+CONFIG_VGA_CIRRUS=y
+CONFIG_VGA_PCI=y
+CONFIG_VHOST_USER=y
+CONFIG_VHOST_USER_BLK=y
+CONFIG_VIRTIO_PCI=y
+CONFIG_VIRTIO_VGA=y
+CONFIG_VMMOUSE=y
+CONFIG_VMPORT=y
+CONFIG_VTD=y
+CONFIG_WDT_IB6300ESB=y
+CONFIG_WDT_IB700=y
+CONFIG_XIO3130=y
+CONFIG_TPM=y
+CONFIG_TPM_CRB=y
+CONFIG_TPM_TIS_ISA=y
+CONFIG_TPM_EMULATOR=y
+CONFIG_TPM_PASSTHROUGH=y
diff --git a/configs/devices/x86_64-softmmu/x86_64-upstream-devices.mak b/configs/devices/x86_64-softmmu/x86_64-upstream-devices.mak
new file mode 100644
index 00000000000..2cd20f54d2f
--- /dev/null
+++ b/configs/devices/x86_64-softmmu/x86_64-upstream-devices.mak
@@ -0,0 +1,4 @@
+# We need "isa-parallel"
+CONFIG_PARALLEL=y
+# We need "hpet"
+CONFIG_HPET=y
diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index ebe08ed831f..381ef2ddcf4 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -438,8 +438,8 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm)
     static const uint32_t gpe0_len = ICH9_PMIO_GPE0_LEN;
     pm->acpi_memory_hotplug.is_enabled = true;
     pm->cpu_hotplug_legacy = true;
-    pm->disable_s3 = 0;
-    pm->disable_s4 = 0;
+    pm->disable_s3 = 1;
+    pm->disable_s4 = 1;
     pm->s4_val = 2;
     pm->use_acpi_hotplug_bridge = true;
     pm->keep_pci_slot_hpc = true;
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index 721a8eb8bed..87ed4dd914e 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -31,7 +31,7 @@ arm_ss.add(when: 'CONFIG_VEXPRESS', if_true: files('vexpress.c'))
 arm_ss.add(when: 'CONFIG_ZYNQ', if_true: files('xilinx_zynq.c'))
 arm_ss.add(when: 'CONFIG_SABRELITE', if_true: files('sabrelite.c'))
 
-arm_ss.add(when: 'CONFIG_ARM_V7M', if_true: files('armv7m.c'))
+#arm_ss.add(when: 'CONFIG_ARM_V7M', if_true: files('armv7m.c'))
 arm_ss.add(when: 'CONFIG_EXYNOS4', if_true: files('exynos4210.c'))
 arm_ss.add(when: 'CONFIG_PXA2XX', if_true: files('pxa2xx.c', 'pxa2xx_gpio.c', 'pxa2xx_pic.c'))
 arm_ss.add(when: 'CONFIG_DIGIC', if_true: files('digic.c'))
diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 21d18ac2e36..97fa6de4239 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -48,6 +48,8 @@
 #include "qom/object.h"
 #include "fdc-internal.h"
 
+#include "hw/boards.h"
+
 /********************************************************/
 /* debug Floppy devices */
 
@@ -2337,6 +2339,14 @@ void fdctrl_realize_common(DeviceState *dev, FDCtrl *fdctrl, Error **errp)
     FDrive *drive;
     static int command_tables_inited = 0;
 
+    /* Restricted for Red Hat Enterprise Linux: */
+    MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
+    if (!strstr(mc->name, "-rhel7.")) {
+        error_setg(errp, "Device %s is not supported with machine type %s",
+                   object_get_typename(OBJECT(dev)), mc->name);
+        return;
+    }
+
     if (fdctrl->fallback == FLOPPY_DRIVE_TYPE_AUTO) {
         error_setg(errp, "Cannot choose a fallback FDrive type of 'auto'");
         return;
diff --git a/hw/char/parallel.c b/hw/char/parallel.c
index b45e67bfbb9..e5f108211b8 100644
--- a/hw/char/parallel.c
+++ b/hw/char/parallel.c
@@ -29,6 +29,7 @@
 #include "chardev/char-parallel.h"
 #include "chardev/char-fe.h"
 #include "hw/acpi/aml-build.h"
+#include "hw/boards.h"
 #include "hw/irq.h"
 #include "hw/isa/isa.h"
 #include "hw/qdev-properties.h"
@@ -534,6 +535,14 @@ static void parallel_isa_realizefn(DeviceState *dev, Error **errp)
     int base;
     uint8_t dummy;
 
+    /* Restricted for Red Hat Enterprise Linux */
+    MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
+    if (strstr(mc->name, "rhel")) {
+        error_setg(errp, "Device %s is not supported with machine type %s",
+                   object_get_typename(OBJECT(dev)), mc->name);
+        return;
+    }
+
     if (!qemu_chr_fe_backend_connected(&s->chr)) {
         error_setg(errp, "Can't create parallel device, empty char device");
         return;
diff --git a/hw/cpu/meson.build b/hw/cpu/meson.build
index 9e52fee9e77..bb71c9f3e72 100644
--- a/hw/cpu/meson.build
+++ b/hw/cpu/meson.build
@@ -1,6 +1,7 @@
-softmmu_ss.add(files('core.c', 'cluster.c'))
+#softmmu_ss.add(files('core.c', 'cluster.c'))
+softmmu_ss.add(files('core.c'))
 
 specific_ss.add(when: 'CONFIG_ARM11MPCORE', if_true: files('arm11mpcore.c'))
 specific_ss.add(when: 'CONFIG_REALVIEW', if_true: files('realview_mpcore.c'))
 specific_ss.add(when: 'CONFIG_A9MPCORE', if_true: files('a9mpcore.c'))
-specific_ss.add(when: 'CONFIG_A15MPCORE', if_true: files('a15mpcore.c'))
+#specific_ss.add(when: 'CONFIG_A15MPCORE', if_true: files('a15mpcore.c'))
diff --git a/hw/display/cirrus_vga.c b/hw/display/cirrus_vga.c
index fdca6ca659f..fa1a7eee514 100644
--- a/hw/display/cirrus_vga.c
+++ b/hw/display/cirrus_vga.c
@@ -2945,6 +2945,9 @@ static void pci_cirrus_vga_realize(PCIDevice *dev, Error **errp)
      PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
      int16_t device_id = pc->device_id;
 
+     warn_report("'cirrus-vga' is deprecated, "
+                 "please use a different VGA card instead");
+
      /* follow real hardware, cirrus card emulated has 4 MB video memory.
        Also accept 8 MB/16 MB for backward compatibility. */
      if (s->vga.vram_size_mb != 4 && s->vga.vram_size_mb != 8 &&
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index ce89fd0aa36..fbcf802b138 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -232,7 +232,8 @@ static void piix3_ide_class_init(ObjectClass *klass, void *data)
     k->device_id = PCI_DEVICE_ID_INTEL_82371SB_1;
     k->class_id = PCI_CLASS_STORAGE_IDE;
     set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
-    dc->hotpluggable = false;
+    /* Disabled for Red Hat Enterprise Linux: */
+    dc->user_creatable = false;
 }
 
 static const TypeInfo piix3_ide_info = {
@@ -261,6 +262,8 @@ static void piix4_ide_class_init(ObjectClass *klass, void *data)
     k->class_id = PCI_CLASS_STORAGE_IDE;
     set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
     dc->hotpluggable = false;
+    /* Disabled for Red Hat Enterprise Linux: */
+    dc->user_creatable = false;
 }
 
 static const TypeInfo piix4_ide_info = {
diff --git a/hw/input/pckbd.c b/hw/input/pckbd.c
index baba62f357a..bc360347ea4 100644
--- a/hw/input/pckbd.c
+++ b/hw/input/pckbd.c
@@ -796,6 +796,8 @@ static void i8042_class_initfn(ObjectClass *klass, void *data)
     dc->vmsd = &vmstate_kbd_isa;
     isa->build_aml = i8042_build_aml;
     set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
+    /* Disabled for Red Hat Enterprise Linux: */
+    dc->user_creatable = false;
 }
 
 static const TypeInfo i8042_info = {
diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index f5bc81296d1..282d01e3748 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -1821,6 +1821,7 @@ static const E1000Info e1000_devices[] = {
         .revision  = 0x03,
         .phy_id2   = E1000_PHY_ID2_8254xx_DEFAULT,
     },
+#if 0 /* Disabled for Red Hat Enterprise Linux 7 */
     {
         .name      = "e1000-82544gc",
         .device_id = E1000_DEV_ID_82544GC_COPPER,
@@ -1833,6 +1834,7 @@ static const E1000Info e1000_devices[] = {
         .revision  = 0x03,
         .phy_id2   = E1000_PHY_ID2_8254xx_DEFAULT,
     },
+#endif
 };
 
 static void e1000_register_types(void)
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 58e7341cb78..8ba34f6a1df 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -370,10 +370,12 @@ static const TypeInfo spapr_cpu_core_type_infos[] = {
         .instance_size = sizeof(SpaprCpuCore),
         .class_size = sizeof(SpaprCpuCoreClass),
     },
+#if 0  /* Disabled for Red Hat Enterprise Linux */
     DEFINE_SPAPR_CPU_CORE_TYPE("970_v2.2"),
     DEFINE_SPAPR_CPU_CORE_TYPE("970mp_v1.0"),
     DEFINE_SPAPR_CPU_CORE_TYPE("970mp_v1.1"),
     DEFINE_SPAPR_CPU_CORE_TYPE("power5+_v2.1"),
+#endif
     DEFINE_SPAPR_CPU_CORE_TYPE("power7_v2.3"),
     DEFINE_SPAPR_CPU_CORE_TYPE("power7+_v2.1"),
     DEFINE_SPAPR_CPU_CORE_TYPE("power8_v2.0"),
diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index 9520471be2c..202e0325240 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -733,6 +733,14 @@ static void hpet_realize(DeviceState *dev, Error **errp)
     int i;
     HPETTimer *timer;
 
+    /* Restricted for Red Hat Enterprise Linux */
+    MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
+    if (strstr(mc->name, "rhel")) {
+        error_setg(errp, "Device %s is not supported with machine type %s",
+                   object_get_typename(OBJECT(dev)), mc->name);
+        return;
+    }
+
     if (!s->intcap) {
         warn_report("Hpet's intcap not initialized");
     }
diff --git a/hw/usb/meson.build b/hw/usb/meson.build
index de853d780dd..0776ae6a20e 100644
--- a/hw/usb/meson.build
+++ b/hw/usb/meson.build
@@ -52,7 +52,7 @@ softmmu_ss.add(when: 'CONFIG_USB_SMARTCARD', if_true: files('dev-smartcard-reade
 if cacard.found()
   usbsmartcard_ss = ss.source_set()
   usbsmartcard_ss.add(when: 'CONFIG_USB_SMARTCARD',
-                      if_true: [cacard, files('ccid-card-emulated.c', 'ccid-card-passthru.c')])
+                      if_true: [cacard, files('ccid-card-passthru.c')])
   hw_usb_modules += {'smartcard': usbsmartcard_ss}
 endif
 
diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c
index 13d0e9b1954..3826fa5122c 100644
--- a/target/arm/cpu_tcg.c
+++ b/target/arm/cpu_tcg.c
@@ -22,6 +22,7 @@
 /* CPU models. These are not needed for the AArch64 linux-user build. */
 #if !defined(CONFIG_USER_ONLY) || !defined(TARGET_AARCH64)
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 #if !defined(CONFIG_USER_ONLY) && defined(CONFIG_TCG)
 static bool arm_v7m_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
 {
@@ -375,6 +376,7 @@ static void cortex_a9_initfn(Object *obj)
     cpu->ccsidr[1] = 0x200fe019; /* 16k L1 icache. */
     define_arm_cp_regs(cpu, cortexa9_cp_reginfo);
 }
+#endif /* disabled for RHEL */
 
 #ifndef CONFIG_USER_ONLY
 static uint64_t a15_l2ctlr_read(CPUARMState *env, const ARMCPRegInfo *ri)
@@ -400,6 +402,7 @@ static const ARMCPRegInfo cortexa15_cp_reginfo[] = {
     REGINFO_SENTINEL
 };
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 static void cortex_a7_initfn(Object *obj)
 {
     ARMCPU *cpu = ARM_CPU(obj);
@@ -445,6 +448,7 @@ static void cortex_a7_initfn(Object *obj)
     cpu->ccsidr[2] = 0x711fe07a; /* 4096K L2 unified cache */
     define_arm_cp_regs(cpu, cortexa15_cp_reginfo); /* Same as A15 */
 }
+#endif /* disabled for RHEL */
 
 static void cortex_a15_initfn(Object *obj)
 {
@@ -488,6 +492,7 @@ static void cortex_a15_initfn(Object *obj)
     define_arm_cp_regs(cpu, cortexa15_cp_reginfo);
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 static void cortex_m0_initfn(Object *obj)
 {
     ARMCPU *cpu = ARM_CPU(obj);
@@ -928,6 +933,7 @@ static void arm_v7m_class_init(ObjectClass *oc, void *data)
 
     cc->gdb_core_xml_file = "arm-m-profile.xml";
 }
+#endif /* disabled for RHEL */
 
 #ifndef TARGET_AARCH64
 /*
@@ -1007,6 +1013,7 @@ static void arm_max_initfn(Object *obj)
 #endif /* !TARGET_AARCH64 */
 
 static const ARMCPUInfo arm_tcg_cpus[] = {
+#if 0 /* Disabled for Red Hat Enterprise Linux */
     { .name = "arm926",      .initfn = arm926_initfn },
     { .name = "arm946",      .initfn = arm946_initfn },
     { .name = "arm1026",     .initfn = arm1026_initfn },
@@ -1022,7 +1029,9 @@ static const ARMCPUInfo arm_tcg_cpus[] = {
     { .name = "cortex-a7",   .initfn = cortex_a7_initfn },
     { .name = "cortex-a8",   .initfn = cortex_a8_initfn },
     { .name = "cortex-a9",   .initfn = cortex_a9_initfn },
+#endif /* disabled for RHEL */
     { .name = "cortex-a15",  .initfn = cortex_a15_initfn },
+#if 0 /* Disabled for Red Hat Enterprise Linux */
     { .name = "cortex-m0",   .initfn = cortex_m0_initfn,
                              .class_init = arm_v7m_class_init },
     { .name = "cortex-m3",   .initfn = cortex_m3_initfn,
@@ -1053,6 +1062,7 @@ static const ARMCPUInfo arm_tcg_cpus[] = {
     { .name = "pxa270-b1",   .initfn = pxa270b1_initfn },
     { .name = "pxa270-c0",   .initfn = pxa270c0_initfn },
     { .name = "pxa270-c5",   .initfn = pxa270c5_initfn },
+#endif /* disabled for RHEL */
 #ifndef TARGET_AARCH64
     { .name = "max",         .initfn = arm_max_initfn },
 #endif
diff --git a/target/ppc/cpu-models.c b/target/ppc/cpu-models.c
index 4baa111713b..d779c4d1d52 100644
--- a/target/ppc/cpu-models.c
+++ b/target/ppc/cpu-models.c
@@ -66,6 +66,7 @@
 #define POWERPC_DEF(_name, _pvr, _type, _desc)                              \
     POWERPC_DEF_SVR(_name, _desc, _pvr, POWERPC_SVR_NONE, _type)
 
+#if 0  /* Embedded and 32-bit CPUs disabled for Red Hat Enterprise Linux */
     /* Embedded PowerPC                                                      */
     /* PowerPC 401 family                                                    */
     POWERPC_DEF("401",           CPU_POWERPC_401,                    401,
@@ -740,8 +741,10 @@
                 "PowerPC 7447A v1.2 (G4)")
     POWERPC_DEF("7457a_v1.2",    CPU_POWERPC_74x7A_v12,              7455,
                 "PowerPC 7457A v1.2 (G4)")
+#endif
     /* 64 bits PowerPC                                                       */
 #if defined(TARGET_PPC64)
+#if 0  /* Disabled for Red Hat Enterprise Linux */
     POWERPC_DEF("970_v2.2",      CPU_POWERPC_970_v22,                970,
                 "PowerPC 970 v2.2")
     POWERPC_DEF("970fx_v1.0",    CPU_POWERPC_970FX_v10,              970,
@@ -760,6 +763,7 @@
                 "PowerPC 970MP v1.1")
     POWERPC_DEF("power5+_v2.1",  CPU_POWERPC_POWER5P_v21,            POWER5P,
                 "POWER5+ v2.1")
+#endif
     POWERPC_DEF("power7_v2.3",   CPU_POWERPC_POWER7_v23,             POWER7,
                 "POWER7 v2.3")
     POWERPC_DEF("power7+_v2.1",  CPU_POWERPC_POWER7P_v21,            POWER7,
@@ -784,6 +788,7 @@
 /* PowerPC CPU aliases                                                     */
 
 PowerPCCPUAlias ppc_cpu_aliases[] = {
+#if 0  /* Embedded and 32-bit CPUs disabled for Red Hat Enterprise Linux */
     { "403", "403gc" },
     { "405", "405d4" },
     { "405cr", "405crc" },
@@ -942,12 +947,15 @@ PowerPCCPUAlias ppc_cpu_aliases[] = {
     { "7447a", "7447a_v1.2" },
     { "7457a", "7457a_v1.2" },
     { "apollo7pm", "7457a_v1.0" },
+#endif
 #if defined(TARGET_PPC64)
+#if 0  /* Disabled for Red Hat Enterprise Linux */
     { "970", "970_v2.2" },
     { "970fx", "970fx_v3.1" },
     { "970mp", "970mp_v1.1" },
     { "power5+", "power5+_v2.1" },
     { "power5gs", "power5+_v2.1" },
+#endif
     { "power7", "power7_v2.3" },
     { "power7+", "power7+_v2.1" },
     { "power8e", "power8e_v2.1" },
@@ -957,6 +965,7 @@ PowerPCCPUAlias ppc_cpu_aliases[] = {
     { "power10", "power10_v2.0" },
 #endif
 
+#if 0  /* Disabled for Red Hat Enterprise Linux */
     /* Generic PowerPCs */
 #if defined(TARGET_PPC64)
     { "ppc64", "970fx_v3.1" },
@@ -964,5 +973,6 @@ PowerPCCPUAlias ppc_cpu_aliases[] = {
     { "ppc32", "604" },
     { "ppc", "604" },
     { "default", "604" },
+#endif
     { NULL, NULL }
 };
diff --git a/target/s390x/cpu_models_sysemu.c b/target/s390x/cpu_models_sysemu.c
index 05c3ccaaff2..6a04ccab1b4 100644
--- a/target/s390x/cpu_models_sysemu.c
+++ b/target/s390x/cpu_models_sysemu.c
@@ -36,6 +36,9 @@ static void check_unavailable_features(const S390CPUModel *max_model,
         (max_model->def->gen == model->def->gen &&
          max_model->def->ec_ga < model->def->ec_ga)) {
         list_add_feat("type", unavailable);
+    } else if (model->def->gen < 11 && kvm_enabled()) {
+        /* Older CPU models are not supported on Red Hat Enterprise Linux */
+        list_add_feat("type", unavailable);
     }
 
     /* detect missing features if any to properly report them */
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 5b1fdb55c47..c52434985ba 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2508,6 +2508,14 @@ void kvm_s390_apply_cpu_model(const S390CPUModel *model, Error **errp)
         error_setg(errp, "KVM doesn't support CPU models");
         return;
     }
+
+    /* Older CPU models are not supported on Red Hat Enterprise Linux */
+    if (model->def->gen < 11) {
+        error_setg(errp, "KVM: Unsupported CPU type specified: %s",
+                   MACHINE(qdev_get_machine())->cpu_type);
+        return;
+    }
+
     prop.cpuid = s390_cpuid_from_cpu_model(model);
     prop.ibc = s390_ibc_from_cpu_model(model);
     /* configure cpu features indicated via STFL(e) */
-- 
Gitee


From 22e5cf0522a96c6f92e9697efae5b1aaf42f7540 Mon Sep 17 00:00:00 2001
From: Miroslav Rezanina <mrezanin@redhat.com>
Date: Fri, 11 Jan 2019 09:54:45 +0100
Subject: [PATCH 003/407] Machine type related general changes

This patch is the first part of the original "Add RHEL machine types" patch,
which we split to allow easier review. It contains changes not related to any
architecture.

Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>

Rebase changes (4.0.0):
- Remove e1000 device duplication changes to reflect upstream solution
- Rewrite machine compat properties to upstream solution

Rebase changes (4.1.0):
- Removed optional flag for machine compat properties (upstream)
- Remove c3e002cb chunk from hw/net/e1000.c
- Reorder compat structures
- Use one format for compat structures
- Added compat for virtio-balloon-pci.any_layout for rhel71

Rebase changes (weekly-210303):
- Added rhel 8.4.0 compat based on 5.2 compat

Rebase changes (weekly-211103):
- Do not duplicate minimum_version_id for piix4_pm

Merged patches (4.0.0):
- d4c0957 compat: Generic HW_COMPAT_RHEL7_6
- cbac773 virtio: Make disable-legacy/disable-modern compat properties optional

Merged patches (4.1.0):
- 479ad30 redhat: fix cut'n'paste garbage in hw_compat comments
- f19738e compat: Generic hw_compat_rhel_8_0

Merged patches (4.2.0):
- 9f2bfaa machine types: Update hw_compat_rhel_8_0 from hw_compat_4_0
- ca4a5e8 virtio: Make disable-legacy/disable-modern compat properties optional
- compat: Generic hw_compat_rhel_8_1 (patch 93040/92956)

Merged patches (5.1.0):
- e6c3fbf hw/smbios: set new default SMBIOS fields for Windows driver support (partially)
- 8f9f4d8 compat: disable 'edid' for virtio-gpu-ccw

Merged patches (5.2.0 rc0):
- 8348642 redhat: define hw_compat_8_2
- 45b8402 redhat: define hw_compat_8_2
- 4effa71 redhat: Update hw_compat_8_2
- 0e84dff virtio: skip legacy support check on machine types less than 5.1 (partially)

Merged patches (6.0.0):
- fa0063ba67 redhat: Define hw_compat_8_3
- d98e328c8d usb/hcd-xhci-pci: Fixup capabilities ordering (again)
- b8a2578117 virtio: move 'use-disabled-flag' property to hw_compat_4_2
- f7940b04c8 virtio-pci: compat page aligned ATS

Merged patches (weekly-210602):
- 26f25108c1 redhat: add missing entries in hw_compat_rhel_8_4

Merged patches (weekly-211006):
- 43c4b9bea6 redhat: Define hw_compat_rhel_8_5
---
 hw/acpi/ich9.c               |  15 ++
 hw/acpi/piix4.c              |   6 +-
 hw/arm/virt.c                |   2 +-
 hw/char/serial.c             |  16 +++
 hw/core/machine.c            | 272 +++++++++++++++++++++++++++++++++++
 hw/display/vga-isa.c         |   2 +-
 hw/i386/pc_piix.c            |   2 +
 hw/i386/pc_q35.c             |   2 +
 hw/net/e1000e.c              |  22 +++
 hw/net/rtl8139.c             |   4 +-
 hw/rtc/mc146818rtc.c         |   6 +
 hw/smbios/smbios.c           |  46 +++++-
 hw/timer/i8254_common.c      |   2 +-
 hw/usb/hcd-uhci.c            |   4 +-
 hw/usb/hcd-xhci-pci.c        |  59 ++++++--
 hw/usb/hcd-xhci-pci.h        |   1 +
 hw/usb/hcd-xhci.c            |  20 +++
 hw/usb/hcd-xhci.h            |   2 +
 include/hw/acpi/ich9.h       |   3 +
 include/hw/boards.h          |  36 +++++
 include/hw/firmware/smbios.h |   5 +-
 include/hw/i386/pc.h         |   3 +
 include/hw/usb.h             |   3 +
 migration/migration.c        |   2 +
 migration/migration.h        |   5 +
 25 files changed, 514 insertions(+), 26 deletions(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 381ef2ddcf4..82bd805b557 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -433,6 +433,18 @@ static void ich9_pm_set_keep_pci_slot_hpc(Object *obj, bool value, Error **errp)
     s->pm.keep_pci_slot_hpc = value;
 }
 
+static bool ich9_pm_get_force_rev1_fadt(Object *obj, Error **errp)
+{
+    ICH9LPCState *s = ICH9_LPC_DEVICE(obj);
+    return s->pm.force_rev1_fadt;
+}
+
+static void ich9_pm_set_force_rev1_fadt(Object *obj, bool value, Error **errp)
+{
+    ICH9LPCState *s = ICH9_LPC_DEVICE(obj);
+    s->pm.force_rev1_fadt = value;
+}
+
 void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm)
 {
     static const uint32_t gpe0_len = ICH9_PMIO_GPE0_LEN;
@@ -457,6 +469,9 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm)
     object_property_add_bool(obj, "cpu-hotplug-legacy",
                              ich9_pm_get_cpu_hotplug_legacy,
                              ich9_pm_set_cpu_hotplug_legacy);
+    object_property_add_bool(obj, "__com.redhat_force-rev1-fadt",
+                             ich9_pm_get_force_rev1_fadt,
+                             ich9_pm_set_force_rev1_fadt);
     object_property_add_uint8_ptr(obj, ACPI_PM_PROP_S3_DISABLED,
                                   &pm->disable_s3, OBJ_PROP_FLAG_READWRITE);
     object_property_add_uint8_ptr(obj, ACPI_PM_PROP_S4_DISABLED,
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index f0b5fac44a1..8d6011c0a3b 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -278,7 +278,7 @@ static bool piix4_vmstate_need_smbus(void *opaque, int version_id)
 static const VMStateDescription vmstate_acpi = {
     .name = "piix4_pm",
     .version_id = 3,
-    .minimum_version_id = 3,
+    .minimum_version_id = 2,
     .post_load = vmstate_acpi_post_load,
     .fields = (VMStateField[]) {
         VMSTATE_PCI_DEVICE(parent_obj, PIIX4PMState),
@@ -644,8 +644,8 @@ static void piix4_send_gpe(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
 
 static Property piix4_pm_properties[] = {
     DEFINE_PROP_UINT32("smb_io_base", PIIX4PMState, smb_io_base, 0),
-    DEFINE_PROP_UINT8(ACPI_PM_PROP_S3_DISABLED, PIIX4PMState, disable_s3, 0),
-    DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_DISABLED, PIIX4PMState, disable_s4, 0),
+    DEFINE_PROP_UINT8(ACPI_PM_PROP_S3_DISABLED, PIIX4PMState, disable_s3, 1),
+    DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_DISABLED, PIIX4PMState, disable_s4, 1),
     DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_VAL, PIIX4PMState, s4_val, 2),
     DEFINE_PROP_BOOL(ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, PIIX4PMState,
                      use_acpi_hotplug_bridge, true),
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 30da05dfe04..5de4d9d73bf 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1590,7 +1590,7 @@ static void virt_build_smbios(VirtMachineState *vms)
 
     smbios_set_defaults("QEMU", product,
                         vmc->smbios_old_sys_ver ? "1.0" : mc->name, false,
-                        true, SMBIOS_ENTRY_POINT_30);
+                        true, NULL, NULL, SMBIOS_ENTRY_POINT_30);
 
     smbios_get_tables(MACHINE(vms), NULL, 0,
                       &smbios_tables, &smbios_tables_len,
diff --git a/hw/char/serial.c b/hw/char/serial.c
index 7061aacbce9..fe8d0afbb0a 100644
--- a/hw/char/serial.c
+++ b/hw/char/serial.c
@@ -37,6 +37,7 @@
 #include "trace.h"
 #include "hw/qdev-properties.h"
 #include "hw/qdev-properties-system.h"
+#include "migration/migration.h"
 
 #define UART_LCR_DLAB	0x80	/* Divisor latch access bit */
 
@@ -689,6 +690,9 @@ static int serial_post_load(void *opaque, int version_id)
 static bool serial_thr_ipending_needed(void *opaque)
 {
     SerialState *s = opaque;
+    if (migrate_pre_2_2) {
+        return false;
+    }
 
     if (s->ier & UART_IER_THRI) {
         bool expected_value = ((s->iir & UART_IIR_ID) == UART_IIR_THRI);
@@ -770,6 +774,10 @@ static const VMStateDescription vmstate_serial_xmit_fifo = {
 static bool serial_fifo_timeout_timer_needed(void *opaque)
 {
     SerialState *s = (SerialState *)opaque;
+    if (migrate_pre_2_2) {
+        return false;
+    }
+
     return timer_pending(s->fifo_timeout_timer);
 }
 
@@ -787,6 +795,10 @@ static const VMStateDescription vmstate_serial_fifo_timeout_timer = {
 static bool serial_timeout_ipending_needed(void *opaque)
 {
     SerialState *s = (SerialState *)opaque;
+    if (migrate_pre_2_2) {
+        return false;
+    }
+
     return s->timeout_ipending != 0;
 }
 
@@ -804,6 +816,10 @@ static const VMStateDescription vmstate_serial_timeout_ipending = {
 static bool serial_poll_needed(void *opaque)
 {
     SerialState *s = (SerialState *)opaque;
+    if (migrate_pre_2_2) {
+        return false;
+    }
+
     return s->poll_msl >= 0;
 }
 
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 53a99abc560..be4f9864cd6 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -37,6 +37,278 @@
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-pci.h"
 
+/*
+ * Mostly the same as hw_compat_6_0
+ */
+GlobalProperty hw_compat_rhel_8_5[] = {
+    /* hw_compat_rhel_8_5 from hw_compat_6_0 */
+    { "gpex-pcihost", "allow-unmapped-accesses", "false" },
+    /* hw_compat_rhel_8_5 from hw_compat_6_0 */
+    { "i8042", "extended-state", "false"},
+    /* hw_compat_rhel_8_5 from hw_compat_6_0 */
+    { "nvme-ns", "eui64-default", "off"},
+    /* hw_compat_rhel_8_5 from hw_compat_6_0 */
+    { "e1000", "init-vet", "off" },
+    /* hw_compat_rhel_8_5 from hw_compat_6_0 */
+    { "e1000e", "init-vet", "off" },
+};
+const size_t hw_compat_rhel_8_5_len = G_N_ELEMENTS(hw_compat_rhel_8_5);
+
+/*
+ * Mostly the same as hw_compat_5_2
+ */
+GlobalProperty hw_compat_rhel_8_4[] = {
+    /* hw_compat_rhel_8_4 from hw_compat_5_2 */
+    { "ICH9-LPC", "smm-compat", "on"},
+    /* hw_compat_rhel_8_4 from hw_compat_5_2 */
+    { "PIIX4_PM", "smm-compat", "on"},
+    /* hw_compat_rhel_8_4 from hw_compat_5_2 */
+    { "virtio-blk-device", "report-discard-granularity", "off" },
+    /* hw_compat_rhel_8_4 from hw_compat_5_2 */
+    { "virtio-net-pci", "vectors", "3"},
+};
+const size_t hw_compat_rhel_8_4_len = G_N_ELEMENTS(hw_compat_rhel_8_4);
+
+/*
+ * Mostly the same as hw_compat_5_1
+ */
+GlobalProperty hw_compat_rhel_8_3[] = {
+    /* hw_compat_rhel_8_3 from hw_compat_5_1 */
+    { "vhost-scsi", "num_queues", "1"},
+    /* hw_compat_rhel_8_3 from hw_compat_5_1 */
+    { "vhost-user-blk", "num-queues", "1"},
+    /* hw_compat_rhel_8_3 from hw_compat_5_1 */
+    { "vhost-user-scsi", "num_queues", "1"},
+    /* hw_compat_rhel_8_3 from hw_compat_5_1 */
+    { "virtio-blk-device", "num-queues", "1"},
+    /* hw_compat_rhel_8_3 from hw_compat_5_1 */
+    { "virtio-scsi-device", "num_queues", "1"},
+    /* hw_compat_rhel_8_3 from hw_compat_5_1 */
+    { "nvme", "use-intel-id", "on"},
+    /* hw_compat_rhel_8_3 from hw_compat_5_1 */
+    { "pvpanic", "events", "1"}, /* PVPANIC_PANICKED */
+    /* hw_compat_rhel_8_3 bz 1912846 */
+    { "pci-xhci", "x-rh-late-msi-cap", "off" },
+    /* hw_compat_rhel_8_3 from hw_compat_5_1 */
+    { "virtio-pci", "x-ats-page-aligned", "off"},
+};
+const size_t hw_compat_rhel_8_3_len = G_N_ELEMENTS(hw_compat_rhel_8_3);
+
+/*
+ * The same as hw_compat_4_2 + hw_compat_5_0
+ */
+GlobalProperty hw_compat_rhel_8_2[] = {
+    /* hw_compat_rhel_8_2 from hw_compat_4_2 */
+    { "virtio-blk-device", "queue-size", "128"},
+    /* hw_compat_rhel_8_2 from hw_compat_4_2 */
+    { "virtio-scsi-device", "virtqueue_size", "128"},
+    /* hw_compat_rhel_8_2 from hw_compat_4_2 */
+    { "virtio-blk-device", "x-enable-wce-if-config-wce", "off" },
+    /* hw_compat_rhel_8_2 from hw_compat_4_2 */
+    { "virtio-blk-device", "seg-max-adjust", "off"},
+    /* hw_compat_rhel_8_2 from hw_compat_4_2 */
+    { "virtio-scsi-device", "seg_max_adjust", "off"},
+    /* hw_compat_rhel_8_2 from hw_compat_4_2 */
+    { "vhost-blk-device", "seg_max_adjust", "off"},
+    /* hw_compat_rhel_8_2 from hw_compat_4_2 */
+    { "usb-host", "suppress-remote-wake", "off" },
+    /* hw_compat_rhel_8_2 from hw_compat_4_2 */
+    { "usb-redir", "suppress-remote-wake", "off" },
+    /* hw_compat_rhel_8_2 from hw_compat_4_2 */
+    { "qxl", "revision", "4" },
+    /* hw_compat_rhel_8_2 from hw_compat_4_2 */
+    { "qxl-vga", "revision", "4" },
+    /* hw_compat_rhel_8_2 from hw_compat_4_2 */
+    { "fw_cfg", "acpi-mr-restore", "false" },
+    /* hw_compat_rhel_8_2 from hw_compat_4_2 */
+    { "virtio-device", "use-disabled-flag", "false" },
+    /* hw_compat_rhel_8_2 from hw_compat_5_0 */
+    { "pci-host-bridge", "x-config-reg-migration-enabled", "off" },
+    /* hw_compat_rhel_8_2 from hw_compat_5_0 */
+    { "virtio-balloon-device", "page-poison", "false" },
+    /* hw_compat_rhel_8_2 from hw_compat_5_0 */
+    { "vmport", "x-read-set-eax", "off" },
+    /* hw_compat_rhel_8_2 from hw_compat_5_0 */
+    { "vmport", "x-signal-unsupported-cmd", "off" },
+    /* hw_compat_rhel_8_2 from hw_compat_5_0 */
+    { "vmport", "x-report-vmx-type", "off" },
+    /* hw_compat_rhel_8_2 from hw_compat_5_0 */
+    { "vmport", "x-cmds-v2", "off" },
+    /* hw_compat_rhel_8_2 from hw_compat_5_0 */
+    { "virtio-device", "x-disable-legacy-check", "true" },
+};
+const size_t hw_compat_rhel_8_2_len = G_N_ELEMENTS(hw_compat_rhel_8_2);
+
+/*
+ * The same as hw_compat_4_1
+ */
+GlobalProperty hw_compat_rhel_8_1[] = {
+    /* hw_compat_rhel_8_1 from hw_compat_4_1 */
+    { "virtio-pci", "x-pcie-flr-init", "off" },
+};
+const size_t hw_compat_rhel_8_1_len = G_N_ELEMENTS(hw_compat_rhel_8_1);
+
+/* The same as hw_compat_3_1
+ * format of array has been changed by:
+ *     6c36bddf5340 ("machine: Use shorter format for GlobalProperty arrays")
+ */
+GlobalProperty hw_compat_rhel_8_0[] = {
+    /* hw_compat_rhel_8_0 from hw_compat_3_1 */
+    { "pcie-root-port", "x-speed", "2_5" },
+    /* hw_compat_rhel_8_0 from hw_compat_3_1 */
+    { "pcie-root-port", "x-width", "1" },
+    /* hw_compat_rhel_8_0 from hw_compat_3_1 */
+    { "memory-backend-file", "x-use-canonical-path-for-ramblock-id", "true" },
+    /* hw_compat_rhel_8_0 from hw_compat_3_1 */
+    { "memory-backend-memfd", "x-use-canonical-path-for-ramblock-id", "true" },
+    /* hw_compat_rhel_8_0 from hw_compat_3_1 */
+    { "tpm-crb", "ppi", "false" },
+    /* hw_compat_rhel_8_0 from hw_compat_3_1 */
+    { "tpm-tis", "ppi", "false" },
+    /* hw_compat_rhel_8_0 from hw_compat_3_1 */
+    { "usb-kbd", "serial", "42" },
+    /* hw_compat_rhel_8_0 from hw_compat_3_1 */
+    { "usb-mouse", "serial", "42" },
+    /* hw_compat_rhel_8_0 from hw_compat_3_1 */
+    { "usb-tablet", "serial", "42" },
+    /* hw_compat_rhel_8_0 from hw_compat_3_1 */
+    { "virtio-blk-device", "discard", "false" },
+    /* hw_compat_rhel_8_0 from hw_compat_3_1 */
+    { "virtio-blk-device", "write-zeroes", "false" },
+    /* hw_compat_rhel_8_0 from hw_compat_4_0 */
+    { "VGA",            "edid", "false" },
+    /* hw_compat_rhel_8_0 from hw_compat_4_0 */
+    { "secondary-vga",  "edid", "false" },
+    /* hw_compat_rhel_8_0 from hw_compat_4_0 */
+    { "bochs-display",  "edid", "false" },
+    /* hw_compat_rhel_8_0 from hw_compat_4_0 */
+    { "virtio-vga",     "edid", "false" },
+    /* hw_compat_rhel_8_0 from hw_compat_4_0 */
+    { "virtio-gpu-device", "edid", "false" },
+    /* hw_compat_rhel_8_0 from hw_compat_4_0 */
+    { "virtio-device", "use-started", "false" },
+    /* hw_compat_rhel_8_0 from hw_compat_3_1 - that was added in 4.1 */
+    { "pcie-root-port-base", "disable-acs", "true" },
+};
+const size_t hw_compat_rhel_8_0_len = G_N_ELEMENTS(hw_compat_rhel_8_0);
+
+/* The same as hw_compat_3_0 + hw_compat_2_12
+ * except that
+ *   there's nothing in 3_0
+ *   migration.decompress-error-check=off was in 7.5 from bz 1584139
+ */
+GlobalProperty hw_compat_rhel_7_6[] = {
+    /* hw_compat_rhel_7_6 from hw_compat_2_12 */
+    { "hda-audio", "use-timer", "false" },
+    /* hw_compat_rhel_7_6 from hw_compat_2_12 */
+    { "cirrus-vga", "global-vmstate", "true" },
+    /* hw_compat_rhel_7_6 from hw_compat_2_12 */
+    { "VGA", "global-vmstate", "true" },
+    /* hw_compat_rhel_7_6 from hw_compat_2_12 */
+    { "vmware-svga", "global-vmstate", "true" },
+    /* hw_compat_rhel_7_6 from hw_compat_2_12 */
+    { "qxl-vga", "global-vmstate",  "true" },
+};
+const size_t hw_compat_rhel_7_6_len = G_N_ELEMENTS(hw_compat_rhel_7_6);
+
+/* The same as hw_compat_2_11 + hw_compat_2_10 */
+GlobalProperty hw_compat_rhel_7_5[] = {
+    /* hw_compat_rhel_7_5 from hw_compat_2_11 */
+    { "hpet", "hpet-offset-saved", "false" },
+    /* hw_compat_rhel_7_5 from hw_compat_2_11 */
+    { "virtio-blk-pci", "vectors", "2" },
+    /* hw_compat_rhel_7_5 from hw_compat_2_11 */
+    { "vhost-user-blk-pci", "vectors", "2" },
+    /* hw_compat_rhel_7_5 from hw_compat_2_11
+       bz 1608778 modified for our naming */
+    { "e1000-82540em", "migrate_tso_props", "off" },
+    /* hw_compat_rhel_7_5 from hw_compat_2_10 */
+    { "virtio-mouse-device", "wheel-axis", "false" },
+    /* hw_compat_rhel_7_5 from hw_compat_2_10 */
+    { "virtio-tablet-device", "wheel-axis", "false" },
+    { "cirrus-vga", "vgamem_mb", "16" },
+    { "migration", "decompress-error-check", "off" },
+};
+const size_t hw_compat_rhel_7_5_len = G_N_ELEMENTS(hw_compat_rhel_7_5);
+
+/* Mostly like hw_compat_2_9 except
+ * x-mtu-bypass-backend, x-migrate-msix has already been
+ * backported to RHEL7.4. shpc was already on in 7.4.
+ */
+GlobalProperty hw_compat_rhel_7_4[] = {
+    { "intel-iommu", "pt", "off" },
+};
+const size_t hw_compat_rhel_7_4_len = G_N_ELEMENTS(hw_compat_rhel_7_4);
+
+/* Mostly like HW_COMPAT_2_6 + HW_COMPAT_2_7 + HW_COMPAT_2_8 except
+ * disable-modern, disable-legacy, page-per-vq have already been
+ * backported to RHEL7.3
+ */
+GlobalProperty hw_compat_rhel_7_3[] = {
+    { "virtio-mmio", "format_transport_address", "off" },
+    { "virtio-serial-device", "emergency-write", "off" },
+    { "ioapic", "version", "0x11" },
+    { "intel-iommu", "x-buggy-eim", "true" },
+    { "virtio-pci", "x-ignore-backend-features", "on" },
+    { "fw_cfg_mem", "x-file-slots", stringify(0x10) },
+    { "fw_cfg_io", "x-file-slots", stringify(0x10) },
+    { "pflash_cfi01", "old-multiple-chip-handling", "on" },
+    { TYPE_PCI_DEVICE, "x-pcie-extcap-init", "off" },
+    { "virtio-pci", "x-pcie-deverr-init", "off" },
+    { "virtio-pci", "x-pcie-lnkctl-init", "off" },
+    { "virtio-pci", "x-pcie-pm-init", "off" },
+    { "virtio-net-device", "x-mtu-bypass-backend", "off" },
+    { "e1000e", "__redhat_e1000e_7_3_intr_state", "on" },
+};
+const size_t hw_compat_rhel_7_3_len = G_N_ELEMENTS(hw_compat_rhel_7_3);
+
+/* Mostly like hw_compat_2_4 + 2_3 but:
+ *  we don't need "any_layout" as it has been backported to 7.2
+ */
+GlobalProperty hw_compat_rhel_7_2[] = {
+        { "virtio-blk-device", "scsi", "true" },
+        { "e1000-82540em", "extra_mac_registers", "off" },
+        { "virtio-pci", "x-disable-pcie", "on" },
+        { "virtio-pci", "migrate-extra", "off" },
+        { "fw_cfg_mem", "dma_enabled", "off" },
+        { "fw_cfg_io", "dma_enabled", "off" },
+        { "isa-fdc", "fallback", "144" },
+        /* Optional because not all virtio-pci devices support legacy mode */
+        { "virtio-pci", "disable-modern", "on", .optional = true },
+        { "virtio-pci", "disable-legacy", "off", .optional = true },
+        { TYPE_PCI_DEVICE, "x-pcie-lnksta-dllla", "off" },
+        { "virtio-pci", "page-per-vq", "on" },
+        /* hw_compat_rhel_7_2 - introduced with 2.10.0 */
+        { "migration", "send-section-footer", "off" },
+        /* hw_compat_rhel_7_2 - introduced with 2.10.0 */
+        { "migration", "store-global-state", "off",
+        },
+};
+const size_t hw_compat_rhel_7_2_len = G_N_ELEMENTS(hw_compat_rhel_7_2);
+
+/* Mostly like hw_compat_2_1 but:
+ *    we don't need virtio-scsi-pci since 7.0 already had that on
+ *
+ * RH: Note, qemu-extended-regs should have been enabled in the 7.1
+ * machine type, but was accidentally turned off in 7.2 onwards.
+ */
+GlobalProperty hw_compat_rhel_7_1[] = {
+    { "intel-hda-generic", "old_msi_addr", "on" },
+    { "VGA", "qemu-extended-regs", "off" },
+    { "secondary-vga", "qemu-extended-regs", "off" },
+    { "usb-mouse", "usb_version", stringify(1) },
+    { "usb-kbd", "usb_version", stringify(1) },
+    { "virtio-pci",  "virtio-pci-bus-master-bug-migration", "on" },
+    { "virtio-blk-pci", "any_layout", "off" },
+    { "virtio-balloon-pci", "any_layout", "off" },
+    { "virtio-serial-pci", "any_layout", "off" },
+    { "virtio-9p-pci", "any_layout", "off" },
+    { "virtio-rng-pci", "any_layout", "off" },
+    /* hw_compat_rhel_7_1 - introduced with 2.10.0 */
+    { "migration", "send-configuration", "off" },
+};
+const size_t hw_compat_rhel_7_1_len = G_N_ELEMENTS(hw_compat_rhel_7_1);
+
 GlobalProperty hw_compat_6_1[] = {
     { "vhost-user-vsock-device", "seqpacket", "off" },
     { "nvme-ns", "shared", "off" },
diff --git a/hw/display/vga-isa.c b/hw/display/vga-isa.c
index 90851e730bc..a91c5d74676 100644
--- a/hw/display/vga-isa.c
+++ b/hw/display/vga-isa.c
@@ -85,7 +85,7 @@ static void vga_isa_realizefn(DeviceState *dev, Error **errp)
 }
 
 static Property vga_isa_properties[] = {
-    DEFINE_PROP_UINT32("vgamem_mb", ISAVGAState, state.vram_size_mb, 8),
+    DEFINE_PROP_UINT32("vgamem_mb", ISAVGAState, state.vram_size_mb, 16),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 223dd3e05d1..dda3f64f19e 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -177,6 +177,8 @@ static void pc_init1(MachineState *machine,
         smbios_set_defaults("QEMU", "Standard PC (i440FX + PIIX, 1996)",
                             mc->name, pcmc->smbios_legacy_mode,
                             pcmc->smbios_uuid_encoded,
+                            pcmc->smbios_stream_product,
+                            pcmc->smbios_stream_version,
                             SMBIOS_ENTRY_POINT_21);
     }
 
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index e1e100316d9..235054a6435 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -200,6 +200,8 @@ static void pc_q35_init(MachineState *machine)
         smbios_set_defaults("QEMU", "Standard PC (Q35 + ICH9, 2009)",
                             mc->name, pcmc->smbios_legacy_mode,
                             pcmc->smbios_uuid_encoded,
+                            pcmc->smbios_stream_product,
+                            pcmc->smbios_stream_version,
                             SMBIOS_ENTRY_POINT_21);
     }
 
diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index ac96f7665af..d35bc1f0b09 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -81,6 +81,12 @@ struct E1000EState {
 
     E1000ECore core;
     bool init_vet;
+
+    /* 7.3 had the intr_state field that was in the original e1000e code
+     * but that was removed prior to 2.7's release
+     */
+    bool redhat_7_3_intr_state_enable;
+    uint32_t redhat_7_3_intr_state;
 };
 
 #define E1000E_MMIO_IDX     0
@@ -96,6 +102,10 @@ struct E1000EState {
 #define E1000E_MSIX_TABLE   (0x0000)
 #define E1000E_MSIX_PBA     (0x2000)
 
+/* Values as in RHEL 7.3 build and original upstream */
+#define RH_E1000E_USE_MSI     BIT(0)
+#define RH_E1000E_USE_MSIX    BIT(1)
+
 static uint64_t
 e1000e_mmio_read(void *opaque, hwaddr addr, unsigned size)
 {
@@ -307,6 +317,8 @@ e1000e_init_msix(E1000EState *s)
     } else {
         if (!e1000e_use_msix_vectors(s, E1000E_MSIX_VEC_NUM)) {
             msix_uninit(d, &s->msix, &s->msix);
+        } else {
+            s->redhat_7_3_intr_state |= RH_E1000E_USE_MSIX;
         }
     }
 }
@@ -478,6 +490,8 @@ static void e1000e_pci_realize(PCIDevice *pci_dev, Error **errp)
     ret = msi_init(PCI_DEVICE(s), 0xD0, 1, true, false, NULL);
     if (ret) {
         trace_e1000e_msi_init_fail(ret);
+    } else {
+        s->redhat_7_3_intr_state |= RH_E1000E_USE_MSI;
     }
 
     if (e1000e_add_pm_capability(pci_dev, e1000e_pmrb_offset,
@@ -605,6 +619,11 @@ static const VMStateDescription e1000e_vmstate_intr_timer = {
     VMSTATE_STRUCT_ARRAY(_f, _s, _num, 0,                           \
                          e1000e_vmstate_intr_timer, E1000IntrDelayTimer)
 
+static bool rhel_7_3_check(void *opaque, int version_id)
+{
+    return ((E1000EState *)opaque)->redhat_7_3_intr_state_enable;
+}
+
 static const VMStateDescription e1000e_vmstate = {
     .name = "e1000e",
     .version_id = 1,
@@ -616,6 +635,7 @@ static const VMStateDescription e1000e_vmstate = {
         VMSTATE_MSIX(parent_obj, E1000EState),
 
         VMSTATE_UINT32(ioaddr, E1000EState),
+        VMSTATE_UINT32_TEST(redhat_7_3_intr_state, E1000EState, rhel_7_3_check),
         VMSTATE_UINT32(core.rxbuf_min_shift, E1000EState),
         VMSTATE_UINT8(core.rx_desc_len, E1000EState),
         VMSTATE_UINT32_ARRAY(core.rxbuf_sizes, E1000EState,
@@ -664,6 +684,8 @@ static PropertyInfo e1000e_prop_disable_vnet,
 
 static Property e1000e_properties[] = {
     DEFINE_NIC_PROPERTIES(E1000EState, conf),
+    DEFINE_PROP_BOOL("__redhat_e1000e_7_3_intr_state", E1000EState,
+                        redhat_7_3_intr_state_enable, false),
     DEFINE_PROP_SIGNED("disable_vnet_hdr", E1000EState, disable_vnet, false,
                         e1000e_prop_disable_vnet, bool),
     DEFINE_PROP_SIGNED("subsys_ven", E1000EState, subsys_ven,
diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 90b4fc63ce6..3ffb9dd22c2 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -3179,7 +3179,7 @@ static int rtl8139_pre_save(void *opaque)
 
 static const VMStateDescription vmstate_rtl8139 = {
     .name = "rtl8139",
-    .version_id = 5,
+    .version_id = 4,
     .minimum_version_id = 3,
     .post_load = rtl8139_post_load,
     .pre_save  = rtl8139_pre_save,
@@ -3260,7 +3260,9 @@ static const VMStateDescription vmstate_rtl8139 = {
         VMSTATE_UINT32(tally_counters.TxMCol, RTL8139State),
         VMSTATE_UINT64(tally_counters.RxOkPhy, RTL8139State),
         VMSTATE_UINT64(tally_counters.RxOkBrd, RTL8139State),
+#if 0 /* Disabled for Red Hat Enterprise Linux bz 1420195 */
         VMSTATE_UINT32_V(tally_counters.RxOkMul, RTL8139State, 5),
+#endif
         VMSTATE_UINT16(tally_counters.TxAbt, RTL8139State),
         VMSTATE_UINT16(tally_counters.TxUndrn, RTL8139State),
 
diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index 4fbafddb226..2f120c6e70d 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -43,6 +43,7 @@
 #include "qapi/qapi-events-misc-target.h"
 #include "qapi/visitor.h"
 #include "hw/rtc/mc146818rtc_regs.h"
+#include "migration/migration.h"
 
 #ifdef TARGET_I386
 #include "qapi/qapi-commands-misc-target.h"
@@ -821,6 +822,11 @@ static int rtc_post_load(void *opaque, int version_id)
 static bool rtc_irq_reinject_on_ack_count_needed(void *opaque)
 {
     RTCState *s = (RTCState *)opaque;
+
+    if (migrate_pre_2_2) {
+        return false;
+    }
+
     return s->irq_reinject_on_ack_count != 0;
 }
 
diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index 7397e567373..3a4bb894bac 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -57,6 +57,9 @@ static bool smbios_legacy = true;
 static bool smbios_uuid_encoded = true;
 /* end: legacy structures & constants for <= 2.0 machines */
 
+/* Set to true for modern Windows 10 HardwareID-6 compat */
+static bool smbios_type2_required;
+
 
 uint8_t *smbios_tables;
 size_t smbios_tables_len;
@@ -619,7 +622,7 @@ static void smbios_build_type_1_table(void)
 
 static void smbios_build_type_2_table(void)
 {
-    SMBIOS_BUILD_TABLE_PRE(2, 0x200, false); /* optional */
+    SMBIOS_BUILD_TABLE_PRE(2, 0x200, smbios_type2_required);
 
     SMBIOS_TABLE_SET_STR(2, manufacturer_str, type2.manufacturer);
     SMBIOS_TABLE_SET_STR(2, product_str, type2.product);
@@ -888,7 +891,10 @@ void smbios_set_cpuid(uint32_t version, uint32_t features)
 
 void smbios_set_defaults(const char *manufacturer, const char *product,
                          const char *version, bool legacy_mode,
-                         bool uuid_encoded, SmbiosEntryPointType ep_type)
+                         bool uuid_encoded,
+                         const char *stream_product,
+                         const char *stream_version,
+                         SmbiosEntryPointType ep_type)
 {
     smbios_have_defaults = true;
     smbios_legacy = legacy_mode;
@@ -909,11 +915,45 @@ void smbios_set_defaults(const char *manufacturer, const char *product,
         g_free(smbios_entries);
     }
 
+    /*
+     * If @stream_product & @stream_version are non-NULL, then
+     * we're following rules for new Windows driver support.
+     * The data we have to report is defined in this doc:
+     *
+     * https://docs.microsoft.com/en-us/windows-hardware/drivers/install/specifying-hardware-ids-for-a-computer
+     *
+     * The Windows drivers are written to expect use of the
+     * scheme documented as "HardwareID-6" against Windows 10,
+     * which uses SMBIOS System (Type 1) and Base Board (Type 2)
+     * tables and will match on
+     *
+     *   System Manufacturer = Red Hat     (@manufacturer)
+     *   System SKU Number = 8.2.0         (@stream_version)
+     *   Baseboard Manufacturer = Red Hat  (@manufacturer)
+     *   Baseboard Product = RHEL-AV       (@stream_product)
+     *
+     * NB, SKU must be changed with each RHEL-AV release
+     *
+     * Other fields can be freely used by applications using
+     * QEMU. For example apps can use the "System product"
+     * and "System version" to identify themselves.
+     *
+     * 'System Manufacturer' and 'Baseboard Manufacturer' come from @manufacturer.
+     */
     SMBIOS_SET_DEFAULT(type1.manufacturer, manufacturer);
     SMBIOS_SET_DEFAULT(type1.product, product);
     SMBIOS_SET_DEFAULT(type1.version, version);
+    SMBIOS_SET_DEFAULT(type1.family, "Red Hat Enterprise Linux");
+    if (stream_version != NULL) {
+        SMBIOS_SET_DEFAULT(type1.sku, stream_version);
+    }
     SMBIOS_SET_DEFAULT(type2.manufacturer, manufacturer);
-    SMBIOS_SET_DEFAULT(type2.product, product);
+    if (stream_product != NULL) {
+        SMBIOS_SET_DEFAULT(type2.product, stream_product);
+        smbios_type2_required = true;
+    } else {
+        SMBIOS_SET_DEFAULT(type2.product, product);
+    }
     SMBIOS_SET_DEFAULT(type2.version, version);
     SMBIOS_SET_DEFAULT(type3.manufacturer, manufacturer);
     SMBIOS_SET_DEFAULT(type3.version, version);
diff --git a/hw/timer/i8254_common.c b/hw/timer/i8254_common.c
index 050875b4973..32935da46c3 100644
--- a/hw/timer/i8254_common.c
+++ b/hw/timer/i8254_common.c
@@ -231,7 +231,7 @@ static const VMStateDescription vmstate_pit_common = {
     .pre_save = pit_dispatch_pre_save,
     .post_load = pit_dispatch_post_load,
     .fields = (VMStateField[]) {
-        VMSTATE_UINT32_V(channels[0].irq_disabled, PITCommonState, 3),
+        VMSTATE_UINT32(channels[0].irq_disabled, PITCommonState), /* qemu-kvm's v2 had 'flags' here */
         VMSTATE_STRUCT_ARRAY(channels, PITCommonState, 3, 2,
                              vmstate_pit_channel, PITChannelState),
         VMSTATE_INT64(channels[0].next_transition_time,
diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c
index d1b5657d722..7930b868fa8 100644
--- a/hw/usb/hcd-uhci.c
+++ b/hw/usb/hcd-uhci.c
@@ -1166,11 +1166,13 @@ void usb_uhci_common_realize(PCIDevice *dev, Error **errp)
     UHCIState *s = UHCI(dev);
     uint8_t *pci_conf = s->dev.config;
     int i;
+    int irq_pin;
 
     pci_conf[PCI_CLASS_PROG] = 0x00;
     /* TODO: reset value should be 0. */
     pci_conf[USB_SBRN] = USB_RELEASE_1; /* release number */
-    pci_config_set_interrupt_pin(pci_conf, u->info.irq_pin + 1);
+    irq_pin = u->info.irq_pin;
+    pci_config_set_interrupt_pin(pci_conf, irq_pin + 1);
     s->irq = pci_allocate_irq(dev);
 
     if (s->masterbus) {
diff --git a/hw/usb/hcd-xhci-pci.c b/hw/usb/hcd-xhci-pci.c
index e934b1a5b1f..e18b05e5287 100644
--- a/hw/usb/hcd-xhci-pci.c
+++ b/hw/usb/hcd-xhci-pci.c
@@ -104,6 +104,33 @@ static int xhci_pci_vmstate_post_load(void *opaque, int version_id)
    return 0;
 }
 
+/* RH bz 1912846 */
+static bool usb_xhci_pci_add_msi(struct PCIDevice *dev, Error **errp)
+{
+    int ret;
+    Error *err = NULL;
+    XHCIPciState *s = XHCI_PCI(dev);
+
+    ret = msi_init(dev, 0x70, s->xhci.numintrs, true, false, &err);
+    /*
+     * Any error other than -ENOTSUP(board's MSI support is broken)
+     * is a programming error
+     */
+    assert(!ret || ret == -ENOTSUP);
+    if (ret && s->msi == ON_OFF_AUTO_ON) {
+        /* Can't satisfy user's explicit msi=on request, fail */
+        error_append_hint(&err, "You have to use msi=auto (default) or "
+                "msi=off with this machine type.\n");
+        error_propagate(errp, err);
+        return true;
+    }
+    assert(!err || s->msi == ON_OFF_AUTO_AUTO);
+    /* With msi=auto, we fall back to MSI off silently */
+    error_free(err);
+
+    return false;
+}
+
 static void usb_xhci_pci_realize(struct PCIDevice *dev, Error **errp)
 {
     int ret;
@@ -125,23 +152,12 @@ static void usb_xhci_pci_realize(struct PCIDevice *dev, Error **errp)
         s->xhci.nec_quirks = true;
     }
 
-    if (s->msi != ON_OFF_AUTO_OFF) {
-        ret = msi_init(dev, 0x70, s->xhci.numintrs, true, false, &err);
-        /*
-         * Any error other than -ENOTSUP(board's MSI support is broken)
-         * is a programming error
-         */
-        assert(!ret || ret == -ENOTSUP);
-        if (ret && s->msi == ON_OFF_AUTO_ON) {
-            /* Can't satisfy user's explicit msi=on request, fail */
-            error_append_hint(&err, "You have to use msi=auto (default) or "
-                    "msi=off with this machine type.\n");
+    if (s->msi != ON_OFF_AUTO_OFF && s->rh_late_msi_cap) {
+        /* This gives the behaviour from 5.2.0 onwards, lspci shows 90,a0,70 */
+        if (usb_xhci_pci_add_msi(dev, &err)) {
             error_propagate(errp, err);
             return;
         }
-        assert(!err || s->msi == ON_OFF_AUTO_AUTO);
-        /* With msi=auto, we fall back to MSI off silently */
-        error_free(err);
     }
     pci_register_bar(dev, 0,
                      PCI_BASE_ADDRESS_SPACE_MEMORY |
@@ -154,6 +170,14 @@ static void usb_xhci_pci_realize(struct PCIDevice *dev, Error **errp)
         assert(ret > 0);
     }
 
+    /* RH bz 1912846 */
+    if (s->msi != ON_OFF_AUTO_OFF && !s->rh_late_msi_cap) {
+        /* This gives the older RH machine behaviour, lspci shows 90,70,a0 */
+        if (usb_xhci_pci_add_msi(dev, &err)) {
+            error_propagate(errp, err);
+            return;
+        }
+    }
     if (s->msix != ON_OFF_AUTO_OFF) {
         /* TODO check for errors, and should fail when msix=on */
         msix_init(dev, s->xhci.numintrs,
@@ -198,11 +222,18 @@ static void xhci_instance_init(Object *obj)
     qdev_alias_all_properties(DEVICE(&s->xhci), obj);
 }
 
+static Property xhci_pci_properties[] = {
+    /* RH bz 1912846 */
+    DEFINE_PROP_BOOL("x-rh-late-msi-cap", XHCIPciState, rh_late_msi_cap, true),
+    DEFINE_PROP_END_OF_LIST()
+};
+
 static void xhci_class_init(ObjectClass *klass, void *data)
 {
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
     DeviceClass *dc = DEVICE_CLASS(klass);
 
+    device_class_set_props(dc, xhci_pci_properties);
     dc->reset   = xhci_pci_reset;
     dc->vmsd    = &vmstate_xhci_pci;
     set_bit(DEVICE_CATEGORY_USB, dc->categories);
diff --git a/hw/usb/hcd-xhci-pci.h b/hw/usb/hcd-xhci-pci.h
index c193f794439..086a1feb1ed 100644
--- a/hw/usb/hcd-xhci-pci.h
+++ b/hw/usb/hcd-xhci-pci.h
@@ -39,6 +39,7 @@ typedef struct XHCIPciState {
     XHCIState xhci;
     OnOffAuto msi;
     OnOffAuto msix;
+    bool      rh_late_msi_cap;  /* bz 1912846 */
 } XHCIPciState;
 
 #endif
diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index e01700039b1..d5ea13356c1 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -3494,9 +3494,27 @@ static const VMStateDescription vmstate_xhci_slot = {
     }
 };
 
+static int xhci_event_pre_save(void *opaque)
+{
+    XHCIEvent *s = opaque;
+
+    s->cve_2014_5263_a = ((uint8_t *)&s->type)[0];
+    s->cve_2014_5263_b = ((uint8_t *)&s->type)[1];
+
+    return 0;
+}
+
+bool migrate_cve_2014_5263_xhci_fields;
+
+static bool xhci_event_cve_2014_5263(void *opaque, int version_id)
+{
+    return migrate_cve_2014_5263_xhci_fields;
+}
+
 static const VMStateDescription vmstate_xhci_event = {
     .name = "xhci-event",
     .version_id = 1,
+    .pre_save = xhci_event_pre_save,
     .fields = (VMStateField[]) {
         VMSTATE_UINT32(type,   XHCIEvent),
         VMSTATE_UINT32(ccode,  XHCIEvent),
@@ -3505,6 +3523,8 @@ static const VMStateDescription vmstate_xhci_event = {
         VMSTATE_UINT32(flags,  XHCIEvent),
         VMSTATE_UINT8(slotid,  XHCIEvent),
         VMSTATE_UINT8(epid,    XHCIEvent),
+        VMSTATE_UINT8_TEST(cve_2014_5263_a, XHCIEvent, xhci_event_cve_2014_5263),
+        VMSTATE_UINT8_TEST(cve_2014_5263_b, XHCIEvent, xhci_event_cve_2014_5263),
         VMSTATE_END_OF_LIST()
     }
 };
diff --git a/hw/usb/hcd-xhci.h b/hw/usb/hcd-xhci.h
index 98f598382ad..50a7b6f6c4b 100644
--- a/hw/usb/hcd-xhci.h
+++ b/hw/usb/hcd-xhci.h
@@ -149,6 +149,8 @@ typedef struct XHCIEvent {
     uint32_t flags;
     uint8_t slotid;
     uint8_t epid;
+    uint8_t cve_2014_5263_a;
+    uint8_t cve_2014_5263_b;
 } XHCIEvent;
 
 typedef struct XHCIInterrupter {
diff --git a/include/hw/acpi/ich9.h b/include/hw/acpi/ich9.h
index 7ca92843c6b..21abfd84473 100644
--- a/include/hw/acpi/ich9.h
+++ b/include/hw/acpi/ich9.h
@@ -68,6 +68,9 @@ typedef struct ICH9LPCPMRegs {
     bool smm_compat;
     bool enable_tco;
     TCOIORegs tco_regs;
+
+    /* RH addition, see bz 1489800 */
+    bool force_rev1_fadt;
 } ICH9LPCPMRegs;
 
 #define ACPI_PM_PROP_TCO_ENABLED "enable_tco"
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 9c1c1901046..8bba96ef2b3 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -441,4 +441,40 @@ extern const size_t hw_compat_2_2_len;
 extern GlobalProperty hw_compat_2_1[];
 extern const size_t hw_compat_2_1_len;
 
+extern GlobalProperty hw_compat_rhel_8_5[];
+extern const size_t hw_compat_rhel_8_5_len;
+
+extern GlobalProperty hw_compat_rhel_8_4[];
+extern const size_t hw_compat_rhel_8_4_len;
+
+extern GlobalProperty hw_compat_rhel_8_3[];
+extern const size_t hw_compat_rhel_8_3_len;
+
+extern GlobalProperty hw_compat_rhel_8_2[];
+extern const size_t hw_compat_rhel_8_2_len;
+
+extern GlobalProperty hw_compat_rhel_8_1[];
+extern const size_t hw_compat_rhel_8_1_len;
+
+extern GlobalProperty hw_compat_rhel_8_0[];
+extern const size_t hw_compat_rhel_8_0_len;
+
+extern GlobalProperty hw_compat_rhel_7_6[];
+extern const size_t hw_compat_rhel_7_6_len;
+
+extern GlobalProperty hw_compat_rhel_7_5[];
+extern const size_t hw_compat_rhel_7_5_len;
+
+extern GlobalProperty hw_compat_rhel_7_4[];
+extern const size_t hw_compat_rhel_7_4_len;
+
+extern GlobalProperty hw_compat_rhel_7_3[];
+extern const size_t hw_compat_rhel_7_3_len;
+
+extern GlobalProperty hw_compat_rhel_7_2[];
+extern const size_t hw_compat_rhel_7_2_len;
+
+extern GlobalProperty hw_compat_rhel_7_1[];
+extern const size_t hw_compat_rhel_7_1_len;
+
 #endif
diff --git a/include/hw/firmware/smbios.h b/include/hw/firmware/smbios.h
index 5a0dd0c8cff..2cb1ec2bab1 100644
--- a/include/hw/firmware/smbios.h
+++ b/include/hw/firmware/smbios.h
@@ -278,7 +278,10 @@ void smbios_entry_add(QemuOpts *opts, Error **errp);
 void smbios_set_cpuid(uint32_t version, uint32_t features);
 void smbios_set_defaults(const char *manufacturer, const char *product,
                          const char *version, bool legacy_mode,
-                         bool uuid_encoded, SmbiosEntryPointType ep_type);
+                         bool uuid_encoded,
+                         const char *stream_product,
+                         const char *stream_version,
+                         SmbiosEntryPointType ep_type);
 uint8_t *smbios_get_table_legacy(MachineState *ms, size_t *length);
 void smbios_get_tables(MachineState *ms,
                        const struct smbios_phys_mem_area *mem_array,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 9ab39e428f8..7ccc9a1a078 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -107,6 +107,9 @@ struct PCMachineClass {
     bool smbios_defaults;
     bool smbios_legacy_mode;
     bool smbios_uuid_encoded;
+    /* New fields needed for Windows HardwareID-6 matching */
+    const char *smbios_stream_product;
+    const char *smbios_stream_version;
 
     /* RAM / address space compat: */
     bool gigabyte_align;
diff --git a/include/hw/usb.h b/include/hw/usb.h
index 33668dd0a99..e6b2fe72dac 100644
--- a/include/hw/usb.h
+++ b/include/hw/usb.h
@@ -582,4 +582,7 @@ void usb_pcap_init(FILE *fp);
 void usb_pcap_ctrl(USBPacket *p, bool setup);
 void usb_pcap_data(USBPacket *p, bool setup);
 
+/* hcd-xhci.c -- rhel7.0.0 machine type compatibility */
+extern bool migrate_cve_2014_5263_xhci_fields;
+
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index abaf6f9e3d7..a87ff01b819 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -164,6 +164,8 @@ INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot,
     MIGRATION_CAPABILITY_X_COLO,
     MIGRATION_CAPABILITY_VALIDATE_UUID);
 
+bool migrate_pre_2_2;
+
 /* When we add fault tolerance, we could have several
    migrations at once.  For now we don't need to add
    dynamic creation of migration */
diff --git a/migration/migration.h b/migration/migration.h
index 8130b703eb9..d016cedd9d2 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -381,6 +381,11 @@ bool check_dirty_bitmap_mig_alias_map(const BitmapMigrationNodeAliasList *bbm,
 void migrate_add_address(SocketAddress *address);
 
 int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
+/*
+ * Disables the subsections that were added in 2.2/rh7.2, for backwards
+ * migration compatibility.
+ */
+extern bool migrate_pre_2_2;
 
 #define qemu_ram_foreach_block \
   #warning "Use foreach_not_ignored_block in migration code"
-- 
Gitee


From 1c4d88b92c7f4a51abcd702cddab8bb4b2616bc5 Mon Sep 17 00:00:00 2001
From: Miroslav Rezanina <mrezanin@redhat.com>
Date: Fri, 19 Oct 2018 12:53:31 +0200
Subject: [PATCH 004/407] Add aarch64 machine types

Add RHEL machine types for the aarch64 architecture.

Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>

Rebase notes (4.0.0):
- Use upstream compat handling

Rebase notes (4.1.0-rc0):
- Removed a15memmap (upstream)
- Use virt_flash_create in rhel800_virt_instance_init

Rebase notes (4.2.0-rc0):
- Set numa_mem_supported

Rebase notes (4.2.0-rc3):
- aarch64: Add virt-rhel8.2.0 machine type for ARM (patch 92246)
- aarch64: virt: Allow more than 1TB of RAM (patch 92249)
- aarch64: virt: Allow PCDIMM instantiation (patch 92247)
- aarch64: virt: Enhance the comment related to gic-version (patch 92248)

Rebase notes (5.0.0):
- Set default_ram_id in rhel_machine_class_init
- Added setting acpi properties

Rebase notes (5.1.0):
- Added ras property
- Added virt_machine_device_unplug_cb to machine type (upstream)
- Added mte property (upstream)

Rebase notes (weekly-210210):
- Added support for oem fields to machine type

Rebase notes (weekly-210303):
- Use rhel-8.4.0 hw compat

Rebase notes (6.0.0-rc2):
- Renamed oem-id and oem-table-id to x-oem-id and x-oem-table-id

Rebase notes (210623):
- Protect TPM functions by CONFIG_TPM ifdef

Rebase notes (6.1.0-rc0):
- Add support for default_bus_bypass_iommu

Merged patches (4.0.0):
- 7bfdb4c aarch64: Add virt-rhel8.0.0 machine type for ARM
- 3433e69 aarch64: Set virt-rhel8.0.0 max_cpus to 512
- 4d20863 aarch64: Use 256MB ECAM region by default

Merged patches (4.1.0):
- c3e39ef aarch64: Add virt-rhel8.1.0 machine type for ARM
- 59a46d1 aarch64: Allow ARM VIRT iommu option in RHEL8.1 machine

Merged patches (5.2.0 rc0):
- 12990ad hw/arm: Changes to rhel820 machine
- 46d5a79 hw/arm: Introduce rhel_virt_instance_init() helper
- 098954a hw/arm: Add rhel830 machine type
- ee8e99d arm: Set correct max_cpus value on virt-rhel* machine types
- e5edd38 RHEL-only: arm/virt: Allow the TPM_TIS_SYSBUS device dynamic allocation in machvirt
- 6d7ba66 machine types/numa: set numa_mem_supported on old machine types (partially)
- 25c5644 machine_types/numa: compatibility for auto_enable_numa_with_memdev (partially)

Merged patches (6.0):
- 078fadb5da AArch64 machine types cleanup
- ea7b7425fa hw/arm/virt: Add 8.4 Machine type

Merged patches (weekly-210609):
- 73b1578882 hw/arm/virt: Add 8.5 machine type
- 5333038d11 hw/arm/virt: Disable PL011 clock migration through hw_compat_rhel_8_3
- 63adb8ae86 arm/virt: Register highmem and gic-version as class properties

Merged patches (weekly-211027):
- 86e3057c0a hw: arm: virt: Add hw_compat_rhel_8_5 to 8.5 machine type
---
 hw/arm/virt.c         | 226 +++++++++++++++++++++++++++++++++++++++++-
 hw/core/machine.c     |   2 +
 include/hw/arm/virt.h |   8 ++
 3 files changed, 235 insertions(+), 1 deletion(-)

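Note on the versioned options functions below: each older machine type
first calls the options function of the next newer type and then appends
its own compat list, and DEFINE_RHEL_MACHINE_LATEST builds the function
and type names by token pasting. The following is a minimal standalone
sketch of that pattern, not QEMU code. CompatList, compat_add(),
RHEL_OPTIONS and RHEL_TYPE_NAME are hypothetical stand-ins for
compat_props_add(), the hw_compat_rhel_* arrays and the real macros.

```c
#include <stddef.h>

/* Hypothetical stand-in for QEMU's compat property list. */
#define MAX_COMPAT 16

typedef struct {
    const char *names[MAX_COMPAT];
    size_t len;
} CompatList;

/* Append one named compat set; later entries are applied later. */
static void compat_add(CompatList *cl, const char *name)
{
    if (cl->len < MAX_COMPAT) {
        cl->names[cl->len++] = name;
    }
}

/* Newest machine type: only the RHEL-specific and 8.5 compat sets. */
static void rhel850_options(CompatList *cl)
{
    compat_add(cl, "arm_rhel_compat");
    compat_add(cl, "hw_compat_rhel_8_5");
}

/* Each older type inherits the newer type's list, then adds its own. */
static void rhel840_options(CompatList *cl)
{
    rhel850_options(cl);
    compat_add(cl, "hw_compat_rhel_8_4");
}

static void rhel830_options(CompatList *cl)
{
    rhel840_options(cl);
    compat_add(cl, "hw_compat_rhel_8_3");
}

static void rhel820_options(CompatList *cl)
{
    rhel830_options(cl);
    compat_add(cl, "hw_compat_rhel_8_2");
}

/* Token pasting and stringification, as in DEFINE_RHEL_MACHINE_LATEST. */
#define RHEL_OPTIONS(m, n, s)   rhel##m##n##s##_options
#define RHEL_TYPE_NAME(m, n, s) "virt-rhel" #m "." #n "." #s
```

In this sketch, calling RHEL_OPTIONS(8, 2, 0)(&cl) on an empty list leaves
five entries, with arm_rhel_compat first and hw_compat_rhel_8_2 last,
which mirrors the call order in rhel820_virt_options().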
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 5de4d9d73bf..c77d26ab138 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -79,6 +79,7 @@
 #include "hw/char/pl011.h"
 #include "qemu/guest-random.h"
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
     static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
                                                     void *data) \
@@ -105,7 +106,48 @@
     DEFINE_VIRT_MACHINE_LATEST(major, minor, true)
 #define DEFINE_VIRT_MACHINE(major, minor) \
     DEFINE_VIRT_MACHINE_LATEST(major, minor, false)
-
+#endif /* disabled for RHEL */
+
+#define DEFINE_RHEL_MACHINE_LATEST(m, n, s, latest)                     \
+    static void rhel##m##n##s##_virt_class_init(ObjectClass *oc,        \
+                                                void *data)             \
+    {                                                                   \
+        MachineClass *mc = MACHINE_CLASS(oc);                           \
+        rhel##m##n##s##_virt_options(mc);                               \
+        mc->desc = "RHEL " # m "." # n "." # s " ARM Virtual Machine";  \
+        if (latest) {                                                   \
+            mc->alias = "virt";                                         \
+            mc->is_default = 1;                                         \
+        }                                                               \
+    }                                                                   \
+    static const TypeInfo rhel##m##n##s##_machvirt_info = {             \
+        .name = MACHINE_TYPE_NAME("virt-rhel" # m "." # n "." # s),     \
+        .parent = TYPE_RHEL_MACHINE,                                    \
+        .class_init = rhel##m##n##s##_virt_class_init,                  \
+    };                                                                  \
+    static void rhel##m##n##s##_machvirt_init(void)                     \
+    {                                                                   \
+        type_register_static(&rhel##m##n##s##_machvirt_info);           \
+    }                                                                   \
+    type_init(rhel##m##n##s##_machvirt_init);
+
+#define DEFINE_RHEL_MACHINE_AS_LATEST(major, minor, subminor)   \
+    DEFINE_RHEL_MACHINE_LATEST(major, minor, subminor, true)
+#define DEFINE_RHEL_MACHINE(major, minor, subminor)             \
+    DEFINE_RHEL_MACHINE_LATEST(major, minor, subminor, false)
+
+/* This variable is for changes to properties that are RHEL specific,
+ * different to the current upstream and to be applied to the latest
+ * machine type.
+ */
+GlobalProperty arm_rhel_compat[] = {
+    {
+        .driver   = "virtio-net-pci",
+        .property = "romfile",
+        .value    = "",
+    },
+};
+const size_t arm_rhel_compat_len = G_N_ELEMENTS(arm_rhel_compat);
 
 /* Number of external interrupt lines to configure the GIC with */
 #define NUM_IRQS 256
@@ -2180,6 +2222,7 @@ static void machvirt_init(MachineState *machine)
     qemu_add_machine_init_done_notifier(&vms->machine_done);
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 static bool virt_get_secure(Object *obj, Error **errp)
 {
     VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2207,6 +2250,7 @@ static void virt_set_virt(Object *obj, bool value, Error **errp)
 
     vms->virt = value;
 }
+#endif /* disabled for RHEL */
 
 static bool virt_get_highmem(Object *obj, Error **errp)
 {
@@ -2304,6 +2348,7 @@ static void virt_set_acpi(Object *obj, Visitor *v, const char *name,
     visit_type_OnOffAuto(v, name, &vms->acpi, errp);
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 static bool virt_get_ras(Object *obj, Error **errp)
 {
     VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2331,6 +2376,7 @@ static void virt_set_mte(Object *obj, bool value, Error **errp)
 
     vms->mte = value;
 }
+#endif /* disabled for RHEL */
 
 static char *virt_get_gic_version(Object *obj, Error **errp)
 {
@@ -2666,6 +2712,7 @@ static int virt_kvm_type(MachineState *ms, const char *type_str)
     return fixed_ipa ? 0 : requested_pa_size;
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 static void virt_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -3031,3 +3078,180 @@ static void virt_machine_2_6_options(MachineClass *mc)
     vmc->no_pmu = true;
 }
 DEFINE_VIRT_MACHINE(2, 6)
+#endif /* disabled for RHEL */
+
+static void rhel_machine_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
+
+    mc->family = "virt-rhel-Z";
+    mc->init = machvirt_init;
+    /* Maximum supported VCPU count for all virt-rhel* machines */
+    mc->max_cpus = 384;
+#ifdef CONFIG_TPM
+    machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);
+#endif
+    mc->block_default_type = IF_VIRTIO;
+    mc->no_cdrom = 1;
+    mc->pci_allow_0_address = true;
+    /* We know we will never create a pre-ARMv7 CPU which needs 1K pages */
+    mc->minimum_page_bits = 12;
+    mc->possible_cpu_arch_ids = virt_possible_cpu_arch_ids;
+    mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
+    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a57");
+    mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
+    mc->kvm_type = virt_kvm_type;
+    assert(!mc->get_hotplug_handler);
+    mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
+    hc->pre_plug = virt_machine_device_pre_plug_cb;
+    hc->plug = virt_machine_device_plug_cb;
+    hc->unplug_request = virt_machine_device_unplug_request_cb;
+    hc->unplug = virt_machine_device_unplug_cb;
+    mc->nvdimm_supported = true;
+    mc->auto_enable_numa_with_memhp = true;
+    mc->auto_enable_numa_with_memdev = true;
+    mc->default_ram_id = "mach-virt.ram";
+
+    object_class_property_add(oc, "acpi", "OnOffAuto",
+        virt_get_acpi, virt_set_acpi,
+        NULL, NULL);
+    object_class_property_set_description(oc, "acpi",
+        "Enable ACPI");
+
+    object_class_property_add_bool(oc, "highmem", virt_get_highmem,
+                                   virt_set_highmem);
+    object_class_property_set_description(oc, "highmem",
+                                          "Set on/off to enable/disable using "
+                                          "physical address space above 32 bits");
+
+    object_class_property_add_str(oc, "gic-version", virt_get_gic_version,
+                                  virt_set_gic_version);
+    object_class_property_set_description(oc, "gic-version",
+                                          "Set GIC version. "
+                                          "Valid values are 2, 3, host and max");
+
+    object_class_property_add_str(oc, "x-oem-id",
+                                  virt_get_oem_id,
+                                  virt_set_oem_id);
+    object_class_property_set_description(oc, "x-oem-id",
+                                          "Override the default value of field OEMID "
+                                          "in ACPI table header. "
+                                          "The string may be up to 6 bytes in size");
+
+    object_class_property_add_str(oc, "x-oem-table-id",
+                                  virt_get_oem_table_id,
+                                  virt_set_oem_table_id);
+    object_class_property_set_description(oc, "x-oem-table-id",
+                                          "Override the default value of field OEM Table ID "
+                                          "in ACPI table header. "
+                                          "The string may be up to 8 bytes in size");
+    object_class_property_add_bool(oc, "default_bus_bypass_iommu",
+                                   virt_get_default_bus_bypass_iommu,
+                                   virt_set_default_bus_bypass_iommu);
+
+}
+
+static void rhel_virt_instance_init(Object *obj)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+    VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
+
+    /* EL3 is disabled by default and non-configurable for RHEL */
+    vms->secure = false;
+
+    /* EL2 is disabled by default and non-configurable for RHEL */
+    vms->virt = false;
+
+    /* High memory is enabled by default */
+    vms->highmem = true;
+    vms->gic_version = VIRT_GIC_VERSION_NOSEL;
+
+    vms->highmem_ecam = !vmc->no_highmem_ecam;
+
+    if (vmc->no_its) {
+        vms->its = false;
+    } else {
+        /* Default allows ITS instantiation */
+        vms->its = true;
+        object_property_add_bool(obj, "its", virt_get_its,
+                                 virt_set_its);
+        object_property_set_description(obj, "its",
+                                        "Set on/off to enable/disable "
+                                        "ITS instantiation");
+    }
+
+    /* Default disallows iommu instantiation */
+    vms->iommu = VIRT_IOMMU_NONE;
+    object_property_add_str(obj, "iommu", virt_get_iommu, virt_set_iommu);
+    object_property_set_description(obj, "iommu",
+                                    "Set the IOMMU type. "
+                                    "Valid values are none and smmuv3");
+
+    /* Default disallows RAS instantiation and is non-configurable for RHEL */
+    vms->ras = false;
+
+    /* MTE is disabled by default and non-configurable for RHEL */
+    vms->mte = false;
+
+    vms->default_bus_bypass_iommu = false;
+    vms->irqmap = a15irqmap;
+
+    virt_flash_create(vms);
+    vms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
+    vms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
+
+}
+
+static const TypeInfo rhel_machine_info = {
+    .name          = TYPE_RHEL_MACHINE,
+    .parent        = TYPE_MACHINE,
+    .abstract      = true,
+    .instance_size = sizeof(VirtMachineState),
+    .class_size    = sizeof(VirtMachineClass),
+    .class_init    = rhel_machine_class_init,
+    .instance_init = rhel_virt_instance_init,
+    .interfaces = (InterfaceInfo[]) {
+         { TYPE_HOTPLUG_HANDLER },
+         { }
+    },
+};
+
+static void rhel_machine_init(void)
+{
+    type_register_static(&rhel_machine_info);
+}
+type_init(rhel_machine_init);
+
+static void rhel850_virt_options(MachineClass *mc)
+{
+    compat_props_add(mc->compat_props, arm_rhel_compat, arm_rhel_compat_len);
+    compat_props_add(mc->compat_props, hw_compat_rhel_8_5, hw_compat_rhel_8_5_len);
+}
+DEFINE_RHEL_MACHINE_AS_LATEST(8, 5, 0)
+
+static void rhel840_virt_options(MachineClass *mc)
+{
+    rhel850_virt_options(mc);
+    compat_props_add(mc->compat_props, hw_compat_rhel_8_4, hw_compat_rhel_8_4_len);
+}
+DEFINE_RHEL_MACHINE(8, 4, 0)
+
+static void rhel830_virt_options(MachineClass *mc)
+{
+    VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
+    rhel840_virt_options(mc);
+    compat_props_add(mc->compat_props, hw_compat_rhel_8_3, hw_compat_rhel_8_3_len);
+    vmc->no_kvm_steal_time = true;
+}
+DEFINE_RHEL_MACHINE(8, 3, 0)
+
+static void rhel820_virt_options(MachineClass *mc)
+{
+    rhel830_virt_options(mc);
+    compat_props_add(mc->compat_props, hw_compat_rhel_8_2, hw_compat_rhel_8_2_len);
+    mc->numa_mem_supported = true;
+    mc->auto_enable_numa_with_memdev = false;
+}
+DEFINE_RHEL_MACHINE(8, 2, 0)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index be4f9864cd6..62febde5aa6 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -87,6 +87,8 @@ GlobalProperty hw_compat_rhel_8_3[] = {
     { "nvme", "use-intel-id", "on"},
     /* hw_compat_rhel_8_3 from hw_compat_5_1 */
     { "pvpanic", "events", "1"}, /* PVPANIC_PANICKED */
+    /* hw_compat_rhel_8_3 from hw_compat_5_1 */
+    { "pl011", "migrate-clk", "off" },
     /* hw_compat_rhel_8_3 bz 1912846 */
     { "pci-xhci", "x-rh-late-msi-cap", "off" },
     /* hw_compat_rhel_8_3 from hw_compat_5_1 */
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index dc6b66ffc8f..93646288473 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -175,9 +175,17 @@ struct VirtMachineState {
 
 #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
 
+#if 0 /* disabled for Red Hat Enterprise Linux */
 #define TYPE_VIRT_MACHINE   MACHINE_TYPE_NAME("virt")
 OBJECT_DECLARE_TYPE(VirtMachineState, VirtMachineClass, VIRT_MACHINE)
 
+#else
+#define TYPE_RHEL_MACHINE MACHINE_TYPE_NAME("virt-rhel")
+typedef struct VirtMachineClass VirtMachineClass;
+typedef struct VirtMachineState VirtMachineState;
+DECLARE_OBJ_CHECKERS(VirtMachineState, VirtMachineClass, VIRT_MACHINE, TYPE_RHEL_MACHINE)
+#endif
+
 void virt_acpi_setup(VirtMachineState *vms);
 bool virt_is_acpi_enabled(VirtMachineState *vms);
 
-- 
Gitee


From a801a76335ee433ea0ab1d0591d07e397e4600db Mon Sep 17 00:00:00 2001
From: Miroslav Rezanina <mrezanin@redhat.com>
Date: Fri, 19 Oct 2018 13:47:32 +0200
Subject: [PATCH 005/407] Add s390x machine types

Add RHEL machine types for the s390x architecture.

Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>

Rebase changes (weekly-4.1.0):
- Use upstream compat handling

Rebase notes (weekly-210303):
- Use rhel-8.4.0 hw compat

Merged patches (3.1.0):
- 29df663 s390x/cpumodel: default enable bpb and ppa15 for z196 and later

Merged patches (4.1.0):
- 6c200d665b hw/s390x/s390-virtio-ccw: Add machine types for RHEL8.0.0

Merged patches (4.2.0):
- fb192e5 redhat: s390x: Rename s390-ccw-virtio-rhel8.0.0 to s390-ccw-virtio-rhel8.1.0
- a9b22e8 redhat: s390x: Add proper compatibility options for the -rhel7.6.0 machine
- hw/s390x: Add the s390-ccw-virtio-rhel8.2.0 machine types (patch 92954)

Merged patches (weekly-201216):
- a6ae745cce redhat: s390x: add rhel-8.4.0 compat machine

Merged patches (weekly-210602):
- 50835d3429 redhat: s390x: add rhel-8.5.0 compat machine

Merged patches (weekly-211006):
- a3bcde27fe redhat: Add s390x machine type compatibility update for 6.1 rebase
---
 hw/s390x/s390-virtio-ccw.c | 99 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 98 insertions(+), 1 deletion(-)
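
Note on the chaining pattern used below: each older machine type's
class_options function first calls the next newer one and then appends
its own compat table, so an older machine accumulates every table
between itself and the latest machine. The following is a minimal,
self-contained sketch of that idea only. Prop, PropList, props_add(),
props_find() and the table contents are hypothetical stand-ins, not
QEMU's GlobalProperty or compat_props_add().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's GlobalProperty. */
typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} Prop;

/* Hypothetical fixed-size list standing in for mc->compat_props. */
#define MAX_PROPS 16
typedef struct {
    const Prop *items[MAX_PROPS];
    size_t len;
} PropList;

/* Append n entries in order (simplified stand-in for compat_props_add()). */
static void props_add(PropList *list, const Prop *props, size_t n)
{
    for (size_t i = 0; i < n && list->len < MAX_PROPS; i++) {
        list->items[list->len++] = &props[i];
    }
}

/* Return the value recorded for driver/property, or NULL if absent. */
static const char *props_find(const PropList *list, const char *driver,
                              const char *property)
{
    for (size_t i = 0; i < list->len; i++) {
        if (strcmp(list->items[i]->driver, driver) == 0 &&
            strcmp(list->items[i]->property, property) == 0) {
            return list->items[i]->value;
        }
    }
    return NULL;
}

/* Made-up compat tables; device and property names are illustrative. */
static const Prop compat_new[] = {
    { "dev-a", "feature-x", "off" },
};
static const Prop compat_old[] = {
    { "dev-b", "feature-y", "off" },
    { "dev-c", "feature-z", "1" },
};

/* Latest machine type: only its own table. */
static void newest_class_options(PropList *list)
{
    props_add(list, compat_new, sizeof(compat_new) / sizeof(compat_new[0]));
}

/* Older machine type: inherit the newer options, then add its own. */
static void older_class_options(PropList *list)
{
    newest_class_options(list);
    props_add(list, compat_old, sizeof(compat_old) / sizeof(compat_old[0]));
}
```

In this sketch, calling older_class_options() on an empty list leaves
three entries in it, while newest_class_options() alone leaves one.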

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 653587ea62f..181856e6cf5 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -767,7 +767,7 @@ bool css_migration_enabled(void)
     {                                                                         \
         MachineClass *mc = MACHINE_CLASS(oc);                                 \
         ccw_machine_##suffix##_class_options(mc);                             \
-        mc->desc = "VirtIO-ccw based S390 machine v" verstr;                  \
+        mc->desc = "VirtIO-ccw based S390 machine " verstr;                   \
         if (latest) {                                                         \
             mc->alias = "s390-ccw-virtio";                                    \
             mc->is_default = true;                                            \
@@ -791,6 +791,7 @@ bool css_migration_enabled(void)
     }                                                                         \
     type_init(ccw_machine_register_##suffix)
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 static void ccw_machine_6_2_instance_options(MachineState *machine)
 {
 }
@@ -1100,6 +1101,102 @@ static void ccw_machine_2_4_class_options(MachineClass *mc)
     compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat));
 }
 DEFINE_CCW_MACHINE(2_4, "2.4", false);
+#endif
+
+static void ccw_machine_rhel850_instance_options(MachineState *machine)
+{
+}
+
+static void ccw_machine_rhel850_class_options(MachineClass *mc)
+{
+    compat_props_add(mc->compat_props, hw_compat_rhel_8_5, hw_compat_rhel_8_5_len);
+}
+DEFINE_CCW_MACHINE(rhel850, "rhel8.5.0", true);
+
+static void ccw_machine_rhel840_instance_options(MachineState *machine)
+{
+    ccw_machine_rhel850_instance_options(machine);
+}
+
+static void ccw_machine_rhel840_class_options(MachineClass *mc)
+{
+    ccw_machine_rhel850_class_options(mc);
+    compat_props_add(mc->compat_props, hw_compat_rhel_8_4, hw_compat_rhel_8_4_len);
+}
+DEFINE_CCW_MACHINE(rhel840, "rhel8.4.0", false);
+
+static void ccw_machine_rhel820_instance_options(MachineState *machine)
+{
+    ccw_machine_rhel840_instance_options(machine);
+}
+
+static void ccw_machine_rhel820_class_options(MachineClass *mc)
+{
+    ccw_machine_rhel840_class_options(mc);
+    mc->fixup_ram_size = s390_fixup_ram_size;
+    /* we did not publish a rhel8.3.0 machine */
+    compat_props_add(mc->compat_props, hw_compat_rhel_8_3, hw_compat_rhel_8_3_len);
+    compat_props_add(mc->compat_props, hw_compat_rhel_8_2, hw_compat_rhel_8_2_len);
+}
+DEFINE_CCW_MACHINE(rhel820, "rhel8.2.0", false);
+
+static void ccw_machine_rhel760_instance_options(MachineState *machine)
+{
+    static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V3_1 };
+
+    ccw_machine_rhel820_instance_options(machine);
+
+    s390_set_qemu_cpu_model(0x2827, 12, 2, qemu_cpu_feat);
+
+    /* The multiple-epoch facility was not available with rhel7.6.0 on z14GA1 */
+    s390_cpudef_featoff(14, 1, S390_FEAT_MULTIPLE_EPOCH);
+    s390_cpudef_featoff(14, 1, S390_FEAT_PTFF_QSIE);
+    s390_cpudef_featoff(14, 1, S390_FEAT_PTFF_QTOUE);
+    s390_cpudef_featoff(14, 1, S390_FEAT_PTFF_STOE);
+    s390_cpudef_featoff(14, 1, S390_FEAT_PTFF_STOUE);
+}
+
+static void ccw_machine_rhel760_class_options(MachineClass *mc)
+{
+    ccw_machine_rhel820_class_options(mc);
+    /* We never published the s390x version of RHEL-AV 8.0 and 8.1, so add this here */
+    compat_props_add(mc->compat_props, hw_compat_rhel_8_1, hw_compat_rhel_8_1_len);
+    compat_props_add(mc->compat_props, hw_compat_rhel_8_0, hw_compat_rhel_8_0_len);
+    compat_props_add(mc->compat_props, hw_compat_rhel_7_6, hw_compat_rhel_7_6_len);
+}
+DEFINE_CCW_MACHINE(rhel760, "rhel7.6.0", false);
+
+static void ccw_machine_rhel750_instance_options(MachineState *machine)
+{
+    static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V2_11 };
+    ccw_machine_rhel760_instance_options(machine);
+
+    /* before 2.12 we emulated the very first z900, and RHEL 7.5 is
+       based on 2.10 */
+    s390_set_qemu_cpu_model(0x2064, 7, 1, qemu_cpu_feat);
+
+    /* bpb and ppa15 were only in the full model in RHEL 7.5 */
+    s390_cpudef_featoff_greater(11, 1, S390_FEAT_PPA15);
+    s390_cpudef_featoff_greater(11, 1, S390_FEAT_BPB);
+}
+
+GlobalProperty ccw_compat_rhel_7_5[] = {
+        {
+            .driver   = TYPE_SCLP_EVENT_FACILITY,
+            .property = "allow_all_mask_sizes",
+            .value    = "off",
+        },
+};
+const size_t ccw_compat_rhel_7_5_len = G_N_ELEMENTS(ccw_compat_rhel_7_5);
+
+static void ccw_machine_rhel750_class_options(MachineClass *mc)
+{
+    ccw_machine_rhel760_class_options(mc);
+    compat_props_add(mc->compat_props, hw_compat_rhel_7_5, hw_compat_rhel_7_5_len);
+    compat_props_add(mc->compat_props, ccw_compat_rhel_7_5, ccw_compat_rhel_7_5_len);
+    S390_CCW_MACHINE_CLASS(mc)->hpage_1m_allowed = false;
+}
+DEFINE_CCW_MACHINE(rhel750, "rhel7.5.0", false);
 
 static void ccw_machine_register_types(void)
 {
-- 
Gitee


From 92e18e07235616da5adb7f045264ca78d7f43989 Mon Sep 17 00:00:00 2001
From: Miroslav Rezanina <mrezanin@redhat.com>
Date: Fri, 19 Oct 2018 13:10:31 +0200
Subject: [PATCH 006/407] Add x86_64 machine types

Add RHEL machine types for the x86_64 architecture.

Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>

Rebase changes (qemu-4.0.0):
- Use upstream compat handling

Rebase notes (3.1.0):
- Removed xsave changes

Rebase notes (4.1.0):
- Updated format for compat structures

Rebase notes (4.2.0-rc2):
- Use X86MachineClass for save_tsc_khz (upstream change)

Rebase notes (weekly-210303):
- Use rhel-8.4.0 hw compat

Rebase notes (weekly-210519):
- kvm_default_props moved to new file (upstream)

Rebase notes (6.2.0-rc0):
- linuxboot_dma_enabled moved to X86MachineState

Merged patches (4.1.0):
- f4dc802 pc: 7.5 compat entries
- 456ed3e pc: PC_RHEL7_6_COMPAT
- 04119ee pc: Add compat for pc-i440fx-rhel7.6.0 machine type
- b3b3687 pc: Add pc-q35-8.0.0 machine type
- 8d46fc6 pc: Add x-migrate-smi-count=off to PC_RHEL7_6_COMPAT
- 1de7949 kvm: clear out KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT for older machine types
- 18cf0d7 target/i386: Disable MPX support on named CPU models (partially)
- 2660667 rhel: Set host-phys-bits-limit=48 on rhel machine-types

Merged patches (4.2.0):
- 7d5c2ef pc: Don't make die-id mandatory unless necessary
- e42808c x86 machine types: pc_rhel_8_0_compat
- 9de83a8 x86 machine types: q35: Fixup units_per_default_bus
- 6df1559 x86 machine types: Fixup dynamic sysbus entries
- 0784125 x86 machine types: add pc-q35-rhel8.1.0
- machines/x86: Add rhel 8.2 machine type (patch 92959)

Merged patches (5.1.0):
- 481357e RHEL: hw/i386: disable nested PERF_GLOBAL_CTRL MSR support
- e6c3fbf hw/smbios: set new default SMBIOS fields for Windows driver support (partially)

Merged patches (5.2.0 rc0):
- b02c9f5 x86: Add 8.3.0 x86_64 machine type
- f2edc4f q35: Set max_cpus to 512
- 6d7ba66 machine types/numa: set numa_mem_supported on old machine types (partially)
- 25c5644 machine_types/numa: compatibility for auto_enable_numa_with_memdev (partially)
- e2d3209 x86: lpc9: let firmware negotiate 'CPU hotplug with SMI' features (partially)

Merged patches (weekly-210120):
- d0afeaa0c4 RHEL: Switch pvpanic test to q35
- e19cdad83c 8.4 x86 machine type

Merged patches (weekly-210203):
- 96f8781bd6 q35: Increase max_cpus to 710 on pc-q35-rhel8* machine types

Merged patches (weekly-210224):
- 70d3924521 redhat: Add some devices for exporting upstream machine types
 - machine type chunks only

Merged patches (6.0.0 rc0):
- 031c690804 i386/acpi: restore device paths for pre-5.1 vms

Merged patches (weekly-210623):
- 64c350696f x86: Add x86 rhel8.5 machine types
- 1c8fe5e164 redhat: x86: Enable 'kvm-asyncpf-int' by default

Merged patches (weekly-210714):
- 618e2424ed redhat: Expose upstream machines pc-4.2 and pc-2.11
- c4d1aa8bf2 redhat: Enable FDC device for upstream machines too
- 66882f9a32 redhat: Add hw_compat_4_2_extra and apply to upstream machines

Fix machine type
---
 hw/block/fdc.c             |   5 +-
 hw/i386/acpi-build.c       |   3 +
 hw/i386/pc.c               | 298 ++++++++++++++++++++++++++++++++++++-
 hw/i386/pc_piix.c          | 274 +++++++++++++++++++++++++++++++++-
 hw/i386/pc_q35.c           | 234 ++++++++++++++++++++++++++++-
 include/hw/boards.h        |   2 +
 include/hw/i386/pc.h       |  45 ++++++
 target/i386/kvm/kvm-cpu.c  |   1 +
 target/i386/kvm/kvm.c      |   4 +
 tests/qtest/pvpanic-test.c |   5 +-
 10 files changed, 862 insertions(+), 9 deletions(-)
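
Note on the hw/block/fdc.c change below: the floppy controller is
rejected unless the machine name contains "-rhel7." or is one of the
two exported upstream types. A minimal sketch of that predicate,
written as a standalone function for illustration; the helper name
fdc_allowed_for_machine() is hypothetical and does not exist in the
patch, which tests the condition inline in fdctrl_realize_common().

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true when the FDC device would be
 * accepted for the given machine type name, following the condition
 * in the hunk below.
 */
static bool fdc_allowed_for_machine(const char *machine_name)
{
    /* Any RHEL 7 machine type, e.g. "pc-i440fx-rhel7.6.0". */
    if (strstr(machine_name, "-rhel7.") != NULL) {
        return true;
    }
    /* The two upstream machine types that are exported. */
    return strcmp(machine_name, "pc-i440fx-4.2") == 0 ||
           strcmp(machine_name, "pc-i440fx-2.11") == 0;
}
```

Note that strcmp() returns 0 on a match, which is why the patch can use
the bare strcmp() results as "name differs" terms in its condition.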

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 97fa6de4239..63042ef030b 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -2341,7 +2341,10 @@ void fdctrl_realize_common(DeviceState *dev, FDCtrl *fdctrl, Error **errp)
 
     /* Restricted for Red Hat Enterprise Linux: */
     MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
-    if (!strstr(mc->name, "-rhel7.")) {
+    if (!strstr(mc->name, "-rhel7.") &&
+        /* The two exported upstream machine types allow FDC too */
+        strcmp(mc->name, "pc-i440fx-4.2") &&
+        strcmp(mc->name, "pc-i440fx-2.11")) {
         error_setg(errp, "Device %s is not supported with machine type %s",
                    object_get_typename(OBJECT(dev)), mc->name);
         return;
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a99c6e4fe3f..447ea35275f 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -230,6 +230,9 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
         pm->fadt.reset_reg = r;
         pm->fadt.reset_val = 0xf;
         pm->fadt.flags |= 1 << ACPI_FADT_F_RESET_REG_SUP;
+        if (object_property_get_bool(lpc,
+                                     "__com.redhat_force-rev1-fadt", NULL))
+            pm->fadt.rev = 1;
         pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
         pm->smi_on_cpuhp =
             !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index a2ef40ecbc2..e8109954cab 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -371,6 +371,296 @@ GlobalProperty pc_compat_1_4[] = {
 };
 const size_t pc_compat_1_4_len = G_N_ELEMENTS(pc_compat_1_4);
 
+/* This array is for properties that are RHEL specific, differ from
+ * current upstream, and are to be applied to the latest machine
+ * type.
+ */
+GlobalProperty pc_rhel_compat[] = {
+    { TYPE_X86_CPU, "host-phys-bits", "on" },
+    { TYPE_X86_CPU, "host-phys-bits-limit", "48" },
+    { TYPE_X86_CPU, "vmx-entry-load-perf-global-ctrl", "off" },
+    { TYPE_X86_CPU, "vmx-exit-load-perf-global-ctrl", "off" },
+    /* bz 1508330 */ 
+    { "vfio-pci", "x-no-geforce-quirks", "on" },
+    /* bz 1941397 */
+    { TYPE_X86_CPU, "kvm-asyncpf-int", "on" },
+};
+const size_t pc_rhel_compat_len = G_N_ELEMENTS(pc_rhel_compat);
+
+GlobalProperty pc_rhel_8_4_compat[] = {
+    /* pc_rhel_8_4_compat from pc_compat_5_2 */
+    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
+    { TYPE_X86_CPU, "kvm-asyncpf-int", "off" },
+};
+const size_t pc_rhel_8_4_compat_len = G_N_ELEMENTS(pc_rhel_8_4_compat);
+
+GlobalProperty pc_rhel_8_3_compat[] = {
+    /* pc_rhel_8_3_compat from pc_compat_5_1 */
+    { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
+};
+const size_t pc_rhel_8_3_compat_len = G_N_ELEMENTS(pc_rhel_8_3_compat);
+
+GlobalProperty pc_rhel_8_2_compat[] = {
+    /* pc_rhel_8_2_compat from pc_compat_4_2 */
+    { "mch", "smbase-smram", "off" },
+};
+const size_t pc_rhel_8_2_compat_len = G_N_ELEMENTS(pc_rhel_8_2_compat);
+
+/* pc_rhel_8_1_compat is empty since pc_compat_4_1 is */
+GlobalProperty pc_rhel_8_1_compat[] = { };
+const size_t pc_rhel_8_1_compat_len = G_N_ELEMENTS(pc_rhel_8_1_compat);
+
+GlobalProperty pc_rhel_8_0_compat[] = {
+    /* pc_rhel_8_0_compat from pc_compat_3_1 */
+    { "intel-iommu", "dma-drain", "off" },
+    /* pc_rhel_8_0_compat from pc_compat_3_1 */
+    { "Opteron_G3" "-" TYPE_X86_CPU, "rdtscp", "off" },
+    /* pc_rhel_8_0_compat from pc_compat_3_1 */
+    { "Opteron_G4" "-" TYPE_X86_CPU, "rdtscp", "off" },
+    /* pc_rhel_8_0_compat from pc_compat_3_1 */
+    { "Opteron_G4" "-" TYPE_X86_CPU, "npt", "off" },
+    /* pc_rhel_8_0_compat from pc_compat_3_1 */
+    { "Opteron_G4" "-" TYPE_X86_CPU, "nrip-save", "off" },
+    /* pc_rhel_8_0_compat from pc_compat_3_1 */
+    { "Opteron_G5" "-" TYPE_X86_CPU, "rdtscp", "off" },
+    /* pc_rhel_8_0_compat from pc_compat_3_1 */
+    { "Opteron_G5" "-" TYPE_X86_CPU, "npt", "off" },
+    /* pc_rhel_8_0_compat from pc_compat_3_1 */
+    { "Opteron_G5" "-" TYPE_X86_CPU, "nrip-save", "off" },
+    /* pc_rhel_8_0_compat from pc_compat_3_1 */
+    { "EPYC" "-" TYPE_X86_CPU, "npt", "off" },
+    /* pc_rhel_8_0_compat from pc_compat_3_1 */
+    { "EPYC" "-" TYPE_X86_CPU, "nrip-save", "off" },
+    /* pc_rhel_8_0_compat from pc_compat_3_1 */
+    { "EPYC-IBPB" "-" TYPE_X86_CPU, "npt", "off" },
+    /* pc_rhel_8_0_compat from pc_compat_3_1 */
+    { "EPYC-IBPB" "-" TYPE_X86_CPU, "nrip-save", "off" },
+    /** The mpx=on entries from pc_compat_3_1 are in pc_rhel_7_6_compat **/
+    /* pc_rhel_8_0_compat from pc_compat_3_1 */
+    { "Cascadelake-Server" "-" TYPE_X86_CPU, "stepping", "5" },
+    /* pc_rhel_8_0_compat from pc_compat_3_1 */
+    { TYPE_X86_CPU, "x-intel-pt-auto-level", "off" },
+};
+const size_t pc_rhel_8_0_compat_len = G_N_ELEMENTS(pc_rhel_8_0_compat);
+
+/* Similar to PC_COMPAT_3_0 + PC_COMPAT_2_12, but:
+ * all of the 2_12 stuff was already in 7.6 from bz 1481253
+ * x-migrate-smi-count comes from PC_COMPAT_2_11 but
+ * is really tied to kernel version so keep it off on 7.x
+ * machine types irrespective of host.
+ */
+GlobalProperty pc_rhel_7_6_compat[] = {
+    /* pc_rhel_7_6_compat from pc_compat_3_0 */ 
+    { TYPE_X86_CPU, "x-hv-synic-kvm-only", "on" },
+    /* pc_rhel_7_6_compat from pc_compat_3_0 */ 
+    { "Skylake-Server" "-" TYPE_X86_CPU, "pku", "off" },
+    /* pc_rhel_7_6_compat from pc_compat_3_0 */ 
+    { "Skylake-Server-IBRS" "-" TYPE_X86_CPU, "pku", "off" },
+    /* pc_rhel_7_6_compat from pc_compat_2_11 */ 
+    { TYPE_X86_CPU, "x-migrate-smi-count", "off" },
+    /* pc_rhel_7_6_compat from pc_compat_2_11 */ 
+    { "Skylake-Client" "-" TYPE_X86_CPU, "mpx", "on" },
+    /* pc_rhel_7_6_compat from pc_compat_2_11 */ 
+    { "Skylake-Client-IBRS" "-" TYPE_X86_CPU, "mpx", "on" },
+    /* pc_rhel_7_6_compat from pc_compat_2_11 */ 
+    { "Skylake-Server" "-" TYPE_X86_CPU, "mpx", "on" },
+    /* pc_rhel_7_6_compat from pc_compat_2_11 */ 
+    { "Skylake-Server-IBRS" "-" TYPE_X86_CPU, "mpx", "on" },
+    /* pc_rhel_7_6_compat from pc_compat_2_11 */ 
+    { "Cascadelake-Server" "-" TYPE_X86_CPU, "mpx", "on" },
+    /* pc_rhel_7_6_compat from pc_compat_2_11 */ 
+    { "Icelake-Client" "-" TYPE_X86_CPU, "mpx", "on" },
+    /* pc_rhel_7_6_compat from pc_compat_2_11 */ 
+    { "Icelake-Server" "-" TYPE_X86_CPU, "mpx", "on" },
+};
+const size_t pc_rhel_7_6_compat_len = G_N_ELEMENTS(pc_rhel_7_6_compat);
+
+/* Similar to PC_COMPAT_2_11 + PC_COMPAT_2_10, but:
+ * - x-hv-max-vps was backported to 7.5
+ * - x-pci-hole64-fix was backported to 7.5
+ */
+GlobalProperty pc_rhel_7_5_compat[] = {
+    /* pc_rhel_7_5_compat from pc_compat_2_11 */  
+    { "Skylake-Server" "-" TYPE_X86_CPU, "clflushopt", "off" },
+    /* pc_rhel_7_5_compat from pc_compat_2_12 */  
+    { TYPE_X86_CPU, "legacy-cache", "on" },
+    /* pc_rhel_7_5_compat from pc_compat_2_12 */  
+    { TYPE_X86_CPU, "topoext", "off" },
+    /* pc_rhel_7_5_compat from pc_compat_2_12 */  
+    { "EPYC-" TYPE_X86_CPU, "xlevel", stringify(0x8000000a) },
+    /* pc_rhel_7_5_compat from pc_compat_2_12 */  
+    { "EPYC-IBPB-" TYPE_X86_CPU, "xlevel", stringify(0x8000000a) },
+};
+const size_t pc_rhel_7_5_compat_len = G_N_ELEMENTS(pc_rhel_7_5_compat);
+
+GlobalProperty pc_rhel_7_4_compat[] = {
+    /* pc_rhel_7_4_compat from pc_compat_2_9 */ 
+    { "mch", "extended-tseg-mbytes", stringify(0) },
+    /* bz 1489800 */ 
+    { "ICH9-LPC", "__com.redhat_force-rev1-fadt", "on" },
+    /* pc_rhel_7_4_compat from pc_compat_2_10 */ 
+    { "i440FX-pcihost", "x-pci-hole64-fix", "off" },
+    /* pc_rhel_7_4_compat from pc_compat_2_10 */ 
+    { "q35-pcihost", "x-pci-hole64-fix", "off" },
+    /* pc_rhel_7_4_compat from pc_compat_2_10 */ 
+    { TYPE_X86_CPU, "x-hv-max-vps", "0x40" },
+};
+const size_t pc_rhel_7_4_compat_len = G_N_ELEMENTS(pc_rhel_7_4_compat);
+
+GlobalProperty pc_rhel_7_3_compat[] = {
+    /* pc_rhel_7_3_compat from pc_compat_2_8 */ 
+    { "kvmclock", "x-mach-use-reliable-get-clock", "off" },
+    /* pc_rhel_7_3_compat from pc_compat_2_7 */ 
+    { TYPE_X86_CPU, "l3-cache", "off" },
+    /* pc_rhel_7_3_compat from pc_compat_2_7 */ 
+    { TYPE_X86_CPU, "full-cpuid-auto-level", "off" },
+    /* pc_rhel_7_3_compat from pc_compat_2_7 */ 
+    { "Opteron_G3" "-" TYPE_X86_CPU, "family", "15" },
+    /* pc_rhel_7_3_compat from pc_compat_2_7 */ 
+    { "Opteron_G3" "-" TYPE_X86_CPU, "model", "6" },
+    /* pc_rhel_7_3_compat from pc_compat_2_7 */ 
+    { "Opteron_G3" "-" TYPE_X86_CPU, "stepping", "1" },
+    /* pc_rhel_7_3_compat from pc_compat_2_7 */ 
+    { "isa-pcspk", "migrate", "off" },
+    /* pc_rhel_7_3_compat from pc_compat_2_6 */ 
+    { TYPE_X86_CPU, "cpuid-0xb", "off" },
+    /* pc_rhel_7_3_compat from pc_compat_2_8 */ 
+    { "ICH9-LPC", "x-smi-broadcast", "off" },
+    /* pc_rhel_7_3_compat from pc_compat_2_8 */ 
+    { TYPE_X86_CPU, "vmware-cpuid-freq", "off" },
+    /* pc_rhel_7_3_compat from pc_compat_2_8 */ 
+    { "Haswell-" TYPE_X86_CPU, "stepping", "1" },
+    /* pc_rhel_7_3_compat from pc_compat_2_3 added in 2.9 */
+    { TYPE_X86_CPU, "kvm-no-smi-migration", "on" },
+};
+const size_t pc_rhel_7_3_compat_len = G_N_ELEMENTS(pc_rhel_7_3_compat);
+
+GlobalProperty pc_rhel_7_2_compat[] = {
+    { "phenom" "-" TYPE_X86_CPU, "rdtscp", "off"},
+    { "qemu64" "-" TYPE_X86_CPU, "sse4a", "on" },
+    { "qemu64" "-" TYPE_X86_CPU, "abm", "on" },
+    { "Haswell-" TYPE_X86_CPU, "abm", "off" },
+    { "Haswell-IBRS" "-" TYPE_X86_CPU, "abm", "off" },
+    { "Haswell-noTSX-" TYPE_X86_CPU, "abm", "off" },
+    { "Haswell-noTSX-IBRS" "-" TYPE_X86_CPU, "abm", "off" },
+    { "Broadwell-" TYPE_X86_CPU, "abm", "off" },
+    { "Broadwell-IBRS" "-" TYPE_X86_CPU, "abm", "off" },
+    { "Broadwell-noTSX-" TYPE_X86_CPU, "abm", "off" },
+    { "Broadwell-noTSX-IBRS" "-" TYPE_X86_CPU, "abm", "off" },
+    { "host" "-" TYPE_X86_CPU, "host-cache-info", "on" },
+    { TYPE_X86_CPU, "check", "off" },
+    { "qemu32" "-" TYPE_X86_CPU, "popcnt", "on" },
+    { TYPE_X86_CPU, "arat", "off" },
+    { "usb-redir", "streams", "off" },
+    { TYPE_X86_CPU, "fill-mtrr-mask", "off" },
+    { "apic-common", "legacy-instance-id", "on" },
+};
+const size_t pc_rhel_7_2_compat_len = G_N_ELEMENTS(pc_rhel_7_2_compat);
+
+GlobalProperty pc_rhel_7_1_compat[] = {
+    { "kvm64" "-" TYPE_X86_CPU, "vme", "off" },
+    { "kvm32" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Conroe" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Penryn" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Nehalem" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Nehalem-IBRS" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Westmere" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Westmere-IBRS" "-" TYPE_X86_CPU, "vme", "off" },
+    { "SandyBridge" "-" TYPE_X86_CPU, "vme", "off" },
+    { "SandyBridge-IBRS" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Haswell" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Haswell-IBRS" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Broadwell" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Broadwell-IBRS" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Opteron_G1" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Opteron_G2" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Opteron_G3" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Opteron_G4" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Opteron_G5" "-" TYPE_X86_CPU, "vme", "off" },
+    { "Haswell" "-" TYPE_X86_CPU,  "f16c", "off" },
+    { "Haswell-IBRS" "-" TYPE_X86_CPU, "f16c", "off" },
+    { "Haswell" "-" TYPE_X86_CPU, "rdrand", "off" },
+    { "Haswell-IBRS" "-" TYPE_X86_CPU, "rdrand", "off" },
+    { "Broadwell" "-" TYPE_X86_CPU, "f16c", "off" },
+    { "Broadwell-IBRS" "-" TYPE_X86_CPU, "f16c", "off" },
+    { "Broadwell" "-" TYPE_X86_CPU, "rdrand", "off" },
+    { "Broadwell-IBRS" "-" TYPE_X86_CPU, "rdrand", "off" },
+    { "coreduo" "-" TYPE_X86_CPU, "vmx", "on" },
+    { "core2duo" "-" TYPE_X86_CPU, "vmx", "on" },
+    { "qemu64" "-" TYPE_X86_CPU, "min-level", stringify(4) },
+    { "kvm64" "-" TYPE_X86_CPU, "min-level", stringify(5) },
+    { "pentium3" "-" TYPE_X86_CPU, "min-level", stringify(2) },
+    { "n270" "-" TYPE_X86_CPU, "min-level", stringify(5) },
+    { "Conroe" "-" TYPE_X86_CPU, "min-level", stringify(4) },
+    { "Penryn" "-" TYPE_X86_CPU, "min-level", stringify(4) },
+    { "Nehalem" "-" TYPE_X86_CPU, "min-level", stringify(4) },
+    { "n270" "-" TYPE_X86_CPU, "min-xlevel", stringify(0x8000000a) },
+    { "Penryn" "-" TYPE_X86_CPU, "min-xlevel", stringify(0x8000000a) },
+    { "Conroe" "-" TYPE_X86_CPU, "min-xlevel", stringify(0x8000000a) },
+    { "Nehalem" "-" TYPE_X86_CPU, "min-xlevel", stringify(0x8000000a) },
+    { "Westmere" "-" TYPE_X86_CPU, "min-xlevel", stringify(0x8000000a) },
+    { "SandyBridge" "-" TYPE_X86_CPU, "min-xlevel", stringify(0x8000000a) },
+    { "IvyBridge" "-" TYPE_X86_CPU, "min-xlevel", stringify(0x8000000a) },
+    { "Haswell" "-" TYPE_X86_CPU, "min-xlevel", stringify(0x8000000a) },
+    { "Haswell-noTSX" "-" TYPE_X86_CPU, "min-xlevel", stringify(0x8000000a) },
+    { "Broadwell" "-" TYPE_X86_CPU, "min-xlevel", stringify(0x8000000a) },
+    { "Broadwell-noTSX" "-" TYPE_X86_CPU, "min-xlevel", stringify(0x8000000a) },
+};
+const size_t pc_rhel_7_1_compat_len = G_N_ELEMENTS(pc_rhel_7_1_compat);
+
+/*
+ * The PC_RHEL_*_COMPAT serve the same purpose for RHEL-7 machine
+ * types as the PC_COMPAT_* do for upstream types.
+ * PC_RHEL_7_*_COMPAT apply both to i440fx and q35 types.
+ */
+
+/*
+ * RHEL-7 is based on QEMU 1.5.3, so this needs the PC_COMPAT_*
+ * between our base and 1.5, less stuff backported to RHEL-7.0
+ * (usb-device.msos-desc), less stuff for devices we changed
+ * (qemu64-x86_64-cpu) or don't support (hpet, pci-serial-2x,
+ * pci-serial-4x) in 7.0.
+ */
+GlobalProperty pc_rhel_7_0_compat[] = {
+    { "virtio-scsi-pci", "any_layout", "off" },
+    { "PIIX4_PM", "memory-hotplug-support", "off" },
+    { "apic", "version", stringify(0x11) },
+    { "nec-usb-xhci", "superspeed-ports-first", "off" },
+    { "nec-usb-xhci", "force-pcie-endcap", "on" },
+    { "pci-serial", "prog_if", stringify(0) },
+    { "virtio-net-pci", "guest_announce", "off" },
+    { "ICH9-LPC", "memory-hotplug-support", "off" },
+    { "xio3130-downstream", COMPAT_PROP_PCP, "off" },
+    { "ioh3420", COMPAT_PROP_PCP, "off" },
+    { "PIIX4_PM", "acpi-pci-hotplug-with-bridge-support", "off" },
+    { "e1000", "mitigation", "off" },
+    { "virtio-net-pci", "ctrl_guest_offloads", "off" },
+    { "Conroe" "-" TYPE_X86_CPU, "x2apic", "on" },
+    { "Penryn" "-" TYPE_X86_CPU, "x2apic", "on" },
+    { "Nehalem" "-" TYPE_X86_CPU, "x2apic", "on" },
+    { "Nehalem-IBRS" "-" TYPE_X86_CPU, "x2apic", "on" },
+    { "Westmere" "-" TYPE_X86_CPU, "x2apic", "on" },
+    { "Westmere-IBRS" "-" TYPE_X86_CPU, "x2apic", "on" },
+    { "Opteron_G1" "-" TYPE_X86_CPU, "x2apic", "on" },
+    { "Opteron_G2" "-" TYPE_X86_CPU, "x2apic", "on" },
+    { "Opteron_G3" "-" TYPE_X86_CPU, "x2apic", "on" },
+    { "Opteron_G4" "-" TYPE_X86_CPU, "x2apic", "on" },
+    { "Opteron_G5" "-" TYPE_X86_CPU, "x2apic", "on" },
+};
+const size_t pc_rhel_7_0_compat_len = G_N_ELEMENTS(pc_rhel_7_0_compat);
+
+/*
+ * RHEL: These properties apply only to the upstream machine types that RHEL
+ * exports (pc-4.2 and pc-2.11), to provide limited support for upstream
+ * machines that can be migrated to RHEL.  Avoid touching hw_compat_4_2
+ * directly so that these changes stay isolated from the upstream code.
+ */
+GlobalProperty hw_compat_4_2_extra[] = {
+    /* Enlarge the default virtio-net-pci ROM to 512KB. */
+    { "virtio-net-pci", "romsize", "0x80000" },
+};
+const size_t hw_compat_4_2_extra_len = G_N_ELEMENTS(hw_compat_4_2_extra);
+
 GSIState *pc_gsi_create(qemu_irq **irqs, bool pci_enabled)
 {
     GSIState *s;
@@ -904,7 +1194,8 @@ void pc_memory_init(PCMachineState *pcms,
     option_rom_mr = g_malloc(sizeof(*option_rom_mr));
     memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
                            &error_fatal);
-    if (pcmc->pci_enabled) {
+    /* RH difference: See bz 1489800, explicitly make ROM ro */
+    if (pcmc->pc_rom_ro) {
         memory_region_set_readonly(option_rom_mr, true);
     }
     memory_region_add_subregion_overlap(rom_memory,
@@ -1694,6 +1985,8 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
     pcmc->pvh_enabled = true;
     pcmc->kvmclock_create_always = true;
     assert(!mc->get_hotplug_handler);
+    pcmc->pc_rom_ro = true;
+    mc->async_pf_vmexit_disable = false;
     mc->get_hotplug_handler = pc_get_hotplug_handler;
     mc->hotplug_allowed = pc_hotplug_allowed;
     mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
@@ -1704,7 +1997,8 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
     mc->has_hotpluggable_cpus = true;
     mc->default_boot_order = "cad";
     mc->block_default_type = IF_IDE;
-    mc->max_cpus = 255;
+    /* 240: max CPU count for RHEL */
+    mc->max_cpus = 240;
     mc->reset = pc_machine_reset;
     mc->wakeup = pc_machine_wakeup;
     hc->pre_plug = pc_machine_device_pre_plug_cb;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index dda3f64f19e..2885edffe91 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -50,6 +50,7 @@
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "sysemu/xen.h"
+#include "migration/migration.h"
 #ifdef CONFIG_XEN
 #include <xen/hvm/hvm_info_table.h>
 #include "hw/xen/xen_pt.h"
@@ -174,8 +175,8 @@ static void pc_init1(MachineState *machine,
     if (pcmc->smbios_defaults) {
         MachineClass *mc = MACHINE_GET_CLASS(machine);
         /* These values are guest ABI, do not change */
-        smbios_set_defaults("QEMU", "Standard PC (i440FX + PIIX, 1996)",
-                            mc->name, pcmc->smbios_legacy_mode,
+        smbios_set_defaults("Red Hat", "KVM",
+                            mc->desc, pcmc->smbios_legacy_mode,
                             pcmc->smbios_uuid_encoded,
                             pcmc->smbios_stream_product,
                             pcmc->smbios_stream_version,
@@ -314,6 +315,15 @@ static void pc_init1(MachineState *machine,
  * hw_compat_*, pc_compat_*, or * pc_*_machine_options().
  */
 
+/*
+ * NOTE!  Not all upstream machine types are disabled for RHEL.  To provide
+ * very limited support for upstream machine types, pc machines 2.11 and 4.2
+ * are exposed explicitly.  This makes the "#if" blocks below a bit messy, so
+ * read this comment first to get a rough understanding of what they are
+ * doing.
+ */
+
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 static void pc_compat_2_3_fn(MachineState *machine)
 {
     X86MachineState *x86ms = X86_MACHINE(machine);
@@ -389,6 +399,8 @@ static void pc_xen_hvm_init(MachineState *machine)
 }
 #endif
 
+#endif /* Disabled for Red Hat Enterprise Linux */
+
 #define DEFINE_I440FX_MACHINE(suffix, name, compatfn, optionfn) \
     static void pc_init_##suffix(MachineState *machine) \
     { \
@@ -424,8 +436,10 @@ static void pc_i440fx_6_2_machine_options(MachineClass *m)
     pcmc->default_cpu_version = 1;
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v6_2, "pc-i440fx-6.2", NULL,
                       pc_i440fx_6_2_machine_options);
+#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_6_1_machine_options(MachineClass *m)
 {
@@ -437,8 +451,10 @@ static void pc_i440fx_6_1_machine_options(MachineClass *m)
     m->smp_props.prefer_sockets = true;
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v6_1, "pc-i440fx-6.1", NULL,
                       pc_i440fx_6_1_machine_options);
+#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_6_0_machine_options(MachineClass *m)
 {
@@ -449,8 +465,10 @@ static void pc_i440fx_6_0_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_6_0, pc_compat_6_0_len);
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v6_0, "pc-i440fx-6.0", NULL,
                       pc_i440fx_6_0_machine_options);
+#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_5_2_machine_options(MachineClass *m)
 {
@@ -461,8 +479,10 @@ static void pc_i440fx_5_2_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len);
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v5_2, "pc-i440fx-5.2", NULL,
                       pc_i440fx_5_2_machine_options);
+#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_5_1_machine_options(MachineClass *m)
 {
@@ -477,8 +497,10 @@ static void pc_i440fx_5_1_machine_options(MachineClass *m)
     pcmc->pci_root_uid = 1;
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v5_1, "pc-i440fx-5.1", NULL,
                       pc_i440fx_5_1_machine_options);
+#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_5_0_machine_options(MachineClass *m)
 {
@@ -491,8 +513,10 @@ static void pc_i440fx_5_0_machine_options(MachineClass *m)
     m->auto_enable_numa_with_memdev = false;
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v5_0, "pc-i440fx-5.0", NULL,
                       pc_i440fx_5_0_machine_options);
+#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_4_2_machine_options(MachineClass *m)
 {
@@ -501,8 +525,21 @@ static void pc_i440fx_4_2_machine_options(MachineClass *m)
     m->is_default = false;
     compat_props_add(m->compat_props, hw_compat_4_2, hw_compat_4_2_len);
     compat_props_add(m->compat_props, pc_compat_4_2, pc_compat_4_2_len);
+
+    /*
+     * RHEL: Mark all upstream machines as deprecated because they're not
+     * supported by RHEL, even if exported.
+     */
+    m->deprecation_reason = "Not supported by RHEL";
+    /*
+     * RHEL: Specific compat properties to have limited support for upstream
+     * machines exported.
+     */
+    compat_props_add(m->compat_props, hw_compat_4_2_extra,
+                     hw_compat_4_2_extra_len);
 }
 
+/* RHEL: Export pc-4.2 */
 DEFINE_I440FX_MACHINE(v4_2, "pc-i440fx-4.2", NULL,
                       pc_i440fx_4_2_machine_options);
 
@@ -515,8 +552,10 @@ static void pc_i440fx_4_1_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_4_1, pc_compat_4_1_len);
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v4_1, "pc-i440fx-4.1", NULL,
                       pc_i440fx_4_1_machine_options);
+#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_4_0_machine_options(MachineClass *m)
 {
@@ -529,8 +568,10 @@ static void pc_i440fx_4_0_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_4_0, pc_compat_4_0_len);
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v4_0, "pc-i440fx-4.0", NULL,
                       pc_i440fx_4_0_machine_options);
+#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_3_1_machine_options(MachineClass *m)
 {
@@ -546,8 +587,10 @@ static void pc_i440fx_3_1_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_3_1, pc_compat_3_1_len);
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v3_1, "pc-i440fx-3.1", NULL,
                       pc_i440fx_3_1_machine_options);
+#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_3_0_machine_options(MachineClass *m)
 {
@@ -556,8 +599,10 @@ static void pc_i440fx_3_0_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_3_0, pc_compat_3_0_len);
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v3_0, "pc-i440fx-3.0", NULL,
                       pc_i440fx_3_0_machine_options);
+#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_2_12_machine_options(MachineClass *m)
 {
@@ -566,8 +611,10 @@ static void pc_i440fx_2_12_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_2_12, pc_compat_2_12_len);
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v2_12, "pc-i440fx-2.12", NULL,
                       pc_i440fx_2_12_machine_options);
+#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_2_11_machine_options(MachineClass *m)
 {
@@ -576,9 +623,11 @@ static void pc_i440fx_2_11_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_2_11, pc_compat_2_11_len);
 }
 
+/* RHEL: Export pc-2.11 */
 DEFINE_I440FX_MACHINE(v2_11, "pc-i440fx-2.11", NULL,
                       pc_i440fx_2_11_machine_options);
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 static void pc_i440fx_2_10_machine_options(MachineClass *m)
 {
     pc_i440fx_2_11_machine_options(m);
@@ -951,3 +1000,224 @@ static void xenfv_3_1_machine_options(MachineClass *m)
 DEFINE_PC_MACHINE(xenfv, "xenfv-3.1", pc_xen_hvm_init,
                   xenfv_3_1_machine_options);
 #endif
+#endif  /* Disabled for Red Hat Enterprise Linux */
+
+/* Red Hat Enterprise Linux machine types */
+
+/* Options for the latest rhel7 machine type */
+static void pc_machine_rhel7_options(MachineClass *m)
+{
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    m->family = "pc_piix_Y";
+    m->default_machine_opts = "firmware=bios-256k.bin,hpet=off";
+    pcmc->default_nic_model = "e1000";
+    pcmc->pci_root_uid = 0;
+    m->default_display = "std";
+    m->no_parallel = 1;
+    m->numa_mem_supported = true;
+    m->auto_enable_numa_with_memdev = false;
+    machine_class_allow_dynamic_sysbus_dev(m, TYPE_RAMFB_DEVICE);
+    compat_props_add(m->compat_props, pc_rhel_compat, pc_rhel_compat_len);
+    m->alias = "pc";
+    m->is_default = 1;
+}
+
+static void pc_init_rhel760(MachineState *machine)
+{
+    pc_init1(machine, TYPE_I440FX_PCI_HOST_BRIDGE, \
+             TYPE_I440FX_PCI_DEVICE);
+}
+
+static void pc_machine_rhel760_options(MachineClass *m)
+{
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    pc_machine_rhel7_options(m);
+    m->desc = "RHEL 7.6.0 PC (i440FX + PIIX, 1996)";
+    m->async_pf_vmexit_disable = true;
+    m->smbus_no_migration_support = true;
+    pcmc->pvh_enabled = false;
+    pcmc->default_cpu_version = CPU_VERSION_LEGACY;
+    pcmc->kvmclock_create_always = false;
+    /* From pc_i440fx_5_1_machine_options() */
+    pcmc->pci_root_uid = 1;
+    compat_props_add(m->compat_props, hw_compat_rhel_8_4,
+                     hw_compat_rhel_8_4_len);
+    compat_props_add(m->compat_props, pc_rhel_8_4_compat,
+                     pc_rhel_8_4_compat_len);
+    compat_props_add(m->compat_props, hw_compat_rhel_8_3,
+                     hw_compat_rhel_8_3_len);
+    compat_props_add(m->compat_props, pc_rhel_8_3_compat,
+                     pc_rhel_8_3_compat_len);
+    compat_props_add(m->compat_props, hw_compat_rhel_8_2,
+                     hw_compat_rhel_8_2_len);
+    compat_props_add(m->compat_props, pc_rhel_8_2_compat,
+                     pc_rhel_8_2_compat_len);
+    compat_props_add(m->compat_props, hw_compat_rhel_8_1, hw_compat_rhel_8_1_len);
+    compat_props_add(m->compat_props, pc_rhel_8_1_compat, pc_rhel_8_1_compat_len);
+    compat_props_add(m->compat_props, hw_compat_rhel_8_0, hw_compat_rhel_8_0_len);
+    compat_props_add(m->compat_props, pc_rhel_8_0_compat, pc_rhel_8_0_compat_len);
+    compat_props_add(m->compat_props, hw_compat_rhel_7_6, hw_compat_rhel_7_6_len);
+    compat_props_add(m->compat_props, pc_rhel_7_6_compat, pc_rhel_7_6_compat_len);
+}
+
+DEFINE_PC_MACHINE(rhel760, "pc-i440fx-rhel7.6.0", pc_init_rhel760,
+                  pc_machine_rhel760_options);
+
+static void pc_init_rhel750(MachineState *machine)
+{
+    pc_init1(machine, TYPE_I440FX_PCI_HOST_BRIDGE, \
+             TYPE_I440FX_PCI_DEVICE);
+}
+
+static void pc_machine_rhel750_options(MachineClass *m)
+{
+    pc_machine_rhel760_options(m);
+    m->alias = NULL;
+    m->is_default = 0;
+    m->desc = "RHEL 7.5.0 PC (i440FX + PIIX, 1996)";
+    m->auto_enable_numa_with_memhp = false;
+    compat_props_add(m->compat_props, hw_compat_rhel_7_5, hw_compat_rhel_7_5_len);
+    compat_props_add(m->compat_props, pc_rhel_7_5_compat, pc_rhel_7_5_compat_len);
+}
+
+DEFINE_PC_MACHINE(rhel750, "pc-i440fx-rhel7.5.0", pc_init_rhel750,
+                  pc_machine_rhel750_options);
+
+static void pc_init_rhel740(MachineState *machine)
+{
+    pc_init1(machine, TYPE_I440FX_PCI_HOST_BRIDGE, \
+             TYPE_I440FX_PCI_DEVICE);
+}
+
+static void pc_machine_rhel740_options(MachineClass *m)
+{
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    pc_machine_rhel750_options(m);
+    m->desc = "RHEL 7.4.0 PC (i440FX + PIIX, 1996)";
+    pcmc->pc_rom_ro = false;
+    compat_props_add(m->compat_props, hw_compat_rhel_7_4, hw_compat_rhel_7_4_len);
+    compat_props_add(m->compat_props, pc_rhel_7_4_compat, pc_rhel_7_4_compat_len);
+}
+
+DEFINE_PC_MACHINE(rhel740, "pc-i440fx-rhel7.4.0", pc_init_rhel740,
+                  pc_machine_rhel740_options);
+
+static void pc_init_rhel730(MachineState *machine)
+{
+    pc_init1(machine, TYPE_I440FX_PCI_HOST_BRIDGE, \
+             TYPE_I440FX_PCI_DEVICE);
+}
+
+static void pc_machine_rhel730_options(MachineClass *m)
+{
+    X86MachineClass *x86mc = X86_MACHINE_CLASS(m);
+    pc_machine_rhel740_options(m);
+    m->desc = "RHEL 7.3.0 PC (i440FX + PIIX, 1996)";
+    x86mc->fwcfg_dma_enabled = false;
+    compat_props_add(m->compat_props, hw_compat_rhel_7_3, hw_compat_rhel_7_3_len);
+    compat_props_add(m->compat_props, pc_rhel_7_3_compat, pc_rhel_7_3_compat_len);
+}
+
+DEFINE_PC_MACHINE(rhel730, "pc-i440fx-rhel7.3.0", pc_init_rhel730,
+                  pc_machine_rhel730_options);
+
+
+static void pc_init_rhel720(MachineState *machine)
+{
+    pc_init1(machine, TYPE_I440FX_PCI_HOST_BRIDGE, \
+             TYPE_I440FX_PCI_DEVICE);
+}
+
+static void pc_machine_rhel720_options(MachineClass *m)
+{
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    X86MachineClass *x86mc = X86_MACHINE_CLASS(m);
+    pc_machine_rhel730_options(m);
+    m->desc = "RHEL 7.2.0 PC (i440FX + PIIX, 1996)";
+    /* From pc_i440fx_2_5_machine_options */
+    x86mc->save_tsc_khz = false;
+    m->legacy_fw_cfg_order = 1;
+    /* Note: broken_reserved_end was already in 7.2 */
+    /* From pc_i440fx_2_6_machine_options */
+    pcmc->legacy_cpu_hotplug = true;
+    compat_props_add(m->compat_props, hw_compat_rhel_7_2, hw_compat_rhel_7_2_len);
+    compat_props_add(m->compat_props, pc_rhel_7_2_compat, pc_rhel_7_2_compat_len);
+}
+
+DEFINE_PC_MACHINE(rhel720, "pc-i440fx-rhel7.2.0", pc_init_rhel720,
+                  pc_machine_rhel720_options);
+
+static void pc_compat_rhel710(MachineState *machine)
+{
+    PCMachineState *pcms = PC_MACHINE(machine);
+    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+
+    /* From pc_compat_2_2 */
+    pcmc->rsdp_in_ram = false;
+    machine->suppress_vmdesc = true;
+
+    /* From pc_compat_2_1 */
+    pcmc->smbios_uuid_encoded = false;
+    x86_cpu_change_kvm_default("svm", NULL);
+    pcmc->enforce_aligned_dimm = false;
+
+    /* Disable all the extra subsections that were added in 2.2 */
+    migrate_pre_2_2 = true;
+
+    /* From pc_i440fx_2_4_machine_options */
+    pcmc->broken_reserved_end = true;
+}
+
+static void pc_init_rhel710(MachineState *machine)
+{
+    pc_compat_rhel710(machine);
+    pc_init1(machine, TYPE_I440FX_PCI_HOST_BRIDGE, \
+             TYPE_I440FX_PCI_DEVICE);
+}
+
+static void pc_machine_rhel710_options(MachineClass *m)
+{
+    pc_machine_rhel720_options(m);
+    m->family = "pc_piix_Y";
+    m->desc = "RHEL 7.1.0 PC (i440FX + PIIX, 1996)";
+    m->default_display = "cirrus";
+    compat_props_add(m->compat_props, hw_compat_rhel_7_1, hw_compat_rhel_7_1_len);
+    compat_props_add(m->compat_props, pc_rhel_7_1_compat, pc_rhel_7_1_compat_len);
+}
+
+DEFINE_PC_MACHINE(rhel710, "pc-i440fx-rhel7.1.0", pc_init_rhel710,
+                  pc_machine_rhel710_options);
+
+static void pc_compat_rhel700(MachineState *machine)
+{
+    PCMachineState *pcms = PC_MACHINE(machine);
+    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+
+    pc_compat_rhel710(machine);
+
+    /* Upstream enables it for everyone, we're a little more selective */
+    x86_cpu_change_kvm_default("x2apic", NULL);
+    x86_cpu_change_kvm_default("svm", NULL);
+    pcmc->legacy_acpi_table_size = 6418; /* see pc_compat_2_0() */
+    pcmc->smbios_legacy_mode = true;
+    pcmc->has_reserved_memory = false;
+    migrate_cve_2014_5263_xhci_fields = true;
+}
+
+static void pc_init_rhel700(MachineState *machine)
+{
+    pc_compat_rhel700(machine);
+    pc_init1(machine, TYPE_I440FX_PCI_HOST_BRIDGE, \
+             TYPE_I440FX_PCI_DEVICE);
+}
+
+static void pc_machine_rhel700_options(MachineClass *m)
+{
+    pc_machine_rhel710_options(m);
+    m->family = "pc_piix_Y";
+    m->desc = "RHEL 7.0.0 PC (i440FX + PIIX, 1996)";
+    compat_props_add(m->compat_props, pc_rhel_7_0_compat, pc_rhel_7_0_compat_len);
+}
+
+DEFINE_PC_MACHINE(rhel700, "pc-i440fx-rhel7.0.0", pc_init_rhel700,
+                  pc_machine_rhel700_options);
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 235054a6435..c67418b6a91 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -197,8 +197,8 @@ static void pc_q35_init(MachineState *machine)
 
     if (pcmc->smbios_defaults) {
         /* These values are guest ABI, do not change */
-        smbios_set_defaults("QEMU", "Standard PC (Q35 + ICH9, 2009)",
-                            mc->name, pcmc->smbios_legacy_mode,
+        smbios_set_defaults("Red Hat", "KVM",
+                            mc->desc, pcmc->smbios_legacy_mode,
                             pcmc->smbios_uuid_encoded,
                             pcmc->smbios_stream_product,
                             pcmc->smbios_stream_version,
@@ -342,6 +342,7 @@ static void pc_q35_init(MachineState *machine)
     DEFINE_PC_MACHINE(suffix, name, pc_init_##suffix, optionfn)
 
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 static void pc_q35_machine_options(MachineClass *m)
 {
     PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
@@ -620,3 +621,232 @@ static void pc_q35_2_4_machine_options(MachineClass *m)
 
 DEFINE_Q35_MACHINE(v2_4, "pc-q35-2.4", NULL,
                    pc_q35_2_4_machine_options);
+#endif  /* Disabled for Red Hat Enterprise Linux */
+
+/* Red Hat Enterprise Linux machine types */
+
+/* Options for the latest rhel q35 machine type */
+static void pc_q35_machine_rhel_options(MachineClass *m)
+{
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    pcmc->default_nic_model = "e1000e";
+    pcmc->pci_root_uid = 0;
+    m->family = "pc_q35_Z";
+    m->units_per_default_bus = 1;
+    m->default_machine_opts = "firmware=bios-256k.bin,hpet=off";
+    m->default_display = "std";
+    m->no_floppy = 1;
+    m->no_parallel = 1;
+    pcmc->default_cpu_version = 1;
+    machine_class_allow_dynamic_sysbus_dev(m, TYPE_AMD_IOMMU_DEVICE);
+    machine_class_allow_dynamic_sysbus_dev(m, TYPE_INTEL_IOMMU_DEVICE);
+    machine_class_allow_dynamic_sysbus_dev(m, TYPE_RAMFB_DEVICE);
+    m->alias = "q35";
+    m->max_cpus = 710;
+    compat_props_add(m->compat_props, pc_rhel_compat, pc_rhel_compat_len);
+}
+
+static void pc_q35_init_rhel850(MachineState *machine)
+{
+    pc_q35_init(machine);
+}
+
+static void pc_q35_machine_rhel850_options(MachineClass *m)
+{
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    pc_q35_machine_rhel_options(m);
+    m->desc = "RHEL-8.5.0 PC (Q35 + ICH9, 2009)";
+    pcmc->smbios_stream_product = "RHEL-AV";
+    pcmc->smbios_stream_version = "8.5.0";
+}
+
+DEFINE_PC_MACHINE(q35_rhel850, "pc-q35-rhel8.5.0", pc_q35_init_rhel850,
+                  pc_q35_machine_rhel850_options);
+
+
+static void pc_q35_init_rhel840(MachineState *machine)
+{
+    pc_q35_init(machine);
+}
+
+static void pc_q35_machine_rhel840_options(MachineClass *m)
+{
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    pc_q35_machine_rhel850_options(m);
+    m->desc = "RHEL-8.4.0 PC (Q35 + ICH9, 2009)";
+    m->alias = NULL;
+    pcmc->smbios_stream_product = "RHEL-AV";
+    pcmc->smbios_stream_version = "8.4.0";
+    compat_props_add(m->compat_props, hw_compat_rhel_8_4,
+                     hw_compat_rhel_8_4_len);
+    compat_props_add(m->compat_props, pc_rhel_8_4_compat,
+                     pc_rhel_8_4_compat_len);
+}
+
+DEFINE_PC_MACHINE(q35_rhel840, "pc-q35-rhel8.4.0", pc_q35_init_rhel840,
+                  pc_q35_machine_rhel840_options);
+
+
+static void pc_q35_init_rhel830(MachineState *machine)
+{
+    pc_q35_init(machine);
+}
+
+static void pc_q35_machine_rhel830_options(MachineClass *m)
+{
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    pc_q35_machine_rhel840_options(m);
+    m->desc = "RHEL-8.3.0 PC (Q35 + ICH9, 2009)";
+    pcmc->smbios_stream_product = "RHEL-AV";
+    pcmc->smbios_stream_version = "8.3.0";
+    compat_props_add(m->compat_props, hw_compat_rhel_8_3,
+                     hw_compat_rhel_8_3_len);
+    compat_props_add(m->compat_props, pc_rhel_8_3_compat,
+                     pc_rhel_8_3_compat_len);
+    /* From pc_q35_5_1_machine_options() */
+    pcmc->kvmclock_create_always = false;
+    /* From pc_q35_5_1_machine_options() */
+    pcmc->pci_root_uid = 1;
+}
+
+DEFINE_PC_MACHINE(q35_rhel830, "pc-q35-rhel8.3.0", pc_q35_init_rhel830,
+                  pc_q35_machine_rhel830_options);
+
+static void pc_q35_init_rhel820(MachineState *machine)
+{
+    pc_q35_init(machine);
+}
+
+static void pc_q35_machine_rhel820_options(MachineClass *m)
+{
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    pc_q35_machine_rhel830_options(m);
+    m->desc = "RHEL-8.2.0 PC (Q35 + ICH9, 2009)";
+    m->numa_mem_supported = true;
+    m->auto_enable_numa_with_memdev = false;
+    pcmc->smbios_stream_product = "RHEL-AV";
+    pcmc->smbios_stream_version = "8.2.0";
+    compat_props_add(m->compat_props, hw_compat_rhel_8_2,
+                     hw_compat_rhel_8_2_len);
+    compat_props_add(m->compat_props, pc_rhel_8_2_compat,
+                     pc_rhel_8_2_compat_len);
+}
+
+DEFINE_PC_MACHINE(q35_rhel820, "pc-q35-rhel8.2.0", pc_q35_init_rhel820,
+                  pc_q35_machine_rhel820_options);
+
+static void pc_q35_init_rhel810(MachineState *machine)
+{
+    pc_q35_init(machine);
+}
+
+static void pc_q35_machine_rhel810_options(MachineClass *m)
+{
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    pc_q35_machine_rhel820_options(m);
+    m->desc = "RHEL-8.1.0 PC (Q35 + ICH9, 2009)";
+    m->alias = NULL;
+    pcmc->smbios_stream_product = NULL;
+    pcmc->smbios_stream_version = NULL;
+    compat_props_add(m->compat_props, hw_compat_rhel_8_1, hw_compat_rhel_8_1_len);
+    compat_props_add(m->compat_props, pc_rhel_8_1_compat, pc_rhel_8_1_compat_len);
+}
+
+DEFINE_PC_MACHINE(q35_rhel810, "pc-q35-rhel8.1.0", pc_q35_init_rhel810,
+                  pc_q35_machine_rhel810_options);
+
+static void pc_q35_init_rhel800(MachineState *machine)
+{
+    pc_q35_init(machine);
+}
+
+static void pc_q35_machine_rhel800_options(MachineClass *m)
+{
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    pc_q35_machine_rhel810_options(m);
+    m->desc = "RHEL-8.0.0 PC (Q35 + ICH9, 2009)";
+    m->smbus_no_migration_support = true;
+    m->alias = NULL;
+    pcmc->pvh_enabled = false;
+    pcmc->default_cpu_version = CPU_VERSION_LEGACY;
+    compat_props_add(m->compat_props, hw_compat_rhel_8_0, hw_compat_rhel_8_0_len);
+    compat_props_add(m->compat_props, pc_rhel_8_0_compat, pc_rhel_8_0_compat_len);
+}
+
+DEFINE_PC_MACHINE(q35_rhel800, "pc-q35-rhel8.0.0", pc_q35_init_rhel800,
+                  pc_q35_machine_rhel800_options);
+
+static void pc_q35_init_rhel760(MachineState *machine)
+{
+    pc_q35_init(machine);
+}
+
+static void pc_q35_machine_rhel760_options(MachineClass *m)
+{
+    pc_q35_machine_rhel800_options(m);
+    m->alias = NULL;
+    m->desc = "RHEL-7.6.0 PC (Q35 + ICH9, 2009)";
+    m->async_pf_vmexit_disable = true;
+    compat_props_add(m->compat_props, hw_compat_rhel_7_6, hw_compat_rhel_7_6_len);
+    compat_props_add(m->compat_props, pc_rhel_7_6_compat, pc_rhel_7_6_compat_len);
+}
+
+DEFINE_PC_MACHINE(q35_rhel760, "pc-q35-rhel7.6.0", pc_q35_init_rhel760,
+                  pc_q35_machine_rhel760_options);
+
+static void pc_q35_init_rhel750(MachineState *machine)
+{
+    pc_q35_init(machine);
+}
+
+static void pc_q35_machine_rhel750_options(MachineClass *m)
+{
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    pc_q35_machine_rhel760_options(m);
+    m->alias = NULL;
+    m->desc = "RHEL-7.5.0 PC (Q35 + ICH9, 2009)";
+    m->auto_enable_numa_with_memhp = false;
+    pcmc->default_nic_model = "e1000";
+    compat_props_add(m->compat_props, hw_compat_rhel_7_5, hw_compat_rhel_7_5_len);
+    compat_props_add(m->compat_props, pc_rhel_7_5_compat, pc_rhel_7_5_compat_len);
+}
+
+DEFINE_PC_MACHINE(q35_rhel750, "pc-q35-rhel7.5.0", pc_q35_init_rhel750,
+                  pc_q35_machine_rhel750_options);
+
+static void pc_q35_init_rhel740(MachineState *machine)
+{
+    pc_q35_init(machine);
+}
+
+static void pc_q35_machine_rhel740_options(MachineClass *m)
+{
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    pc_q35_machine_rhel750_options(m);
+    m->desc = "RHEL-7.4.0 PC (Q35 + ICH9, 2009)";
+    pcmc->pc_rom_ro = false;
+    compat_props_add(m->compat_props, hw_compat_rhel_7_4, hw_compat_rhel_7_4_len);
+    compat_props_add(m->compat_props, pc_rhel_7_4_compat, pc_rhel_7_4_compat_len);
+}
+
+DEFINE_PC_MACHINE(q35_rhel740, "pc-q35-rhel7.4.0", pc_q35_init_rhel740,
+                  pc_q35_machine_rhel740_options);
+
+static void pc_q35_init_rhel730(MachineState *machine)
+{
+    pc_q35_init(machine);
+}
+
+static void pc_q35_machine_rhel730_options(MachineClass *m)
+{
+    X86MachineClass *x86mc = X86_MACHINE_CLASS(m);
+    pc_q35_machine_rhel740_options(m);
+    m->desc = "RHEL-7.3.0 PC (Q35 + ICH9, 2009)";
+    m->max_cpus = 255;
+    x86mc->fwcfg_dma_enabled = false;
+    compat_props_add(m->compat_props, hw_compat_rhel_7_3, hw_compat_rhel_7_3_len);
+    compat_props_add(m->compat_props, pc_rhel_7_3_compat, pc_rhel_7_3_compat_len);
+}
+
+DEFINE_PC_MACHINE(q35_rhel730, "pc-q35-rhel7.3.0", pc_q35_init_rhel730,
+                  pc_q35_machine_rhel730_options);
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 8bba96ef2b3..04e87598152 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -263,6 +263,8 @@ struct MachineClass {
     strList *allowed_dynamic_sysbus_devices;
     bool auto_enable_numa_with_memhp;
     bool auto_enable_numa_with_memdev;
+    /* RHEL only */
+    bool async_pf_vmexit_disable;
     bool ignore_boot_device_suffixes;
     bool smbus_no_migration_support;
     bool nvdimm_supported;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 7ccc9a1a078..d0544ee119c 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -125,6 +125,9 @@ struct PCMachineClass {
 
     /* create kvmclock device even when KVM PV features are not exposed */
     bool kvmclock_create_always;
+
+    /* RH only, see bz 1489800 */
+    bool pc_rom_ro;
 };
 
 #define TYPE_PC_MACHINE "generic-pc-machine"
@@ -280,6 +283,48 @@ extern const size_t pc_compat_1_5_len;
 extern GlobalProperty pc_compat_1_4[];
 extern const size_t pc_compat_1_4_len;
 
+extern GlobalProperty pc_rhel_compat[];
+extern const size_t pc_rhel_compat_len;
+
+extern GlobalProperty pc_rhel_8_4_compat[];
+extern const size_t pc_rhel_8_4_compat_len;
+
+extern GlobalProperty pc_rhel_8_3_compat[];
+extern const size_t pc_rhel_8_3_compat_len;
+
+extern GlobalProperty pc_rhel_8_2_compat[];
+extern const size_t pc_rhel_8_2_compat_len;
+
+extern GlobalProperty pc_rhel_8_1_compat[];
+extern const size_t pc_rhel_8_1_compat_len;
+
+extern GlobalProperty pc_rhel_8_0_compat[];
+extern const size_t pc_rhel_8_0_compat_len;
+
+extern GlobalProperty pc_rhel_7_6_compat[];
+extern const size_t pc_rhel_7_6_compat_len;
+
+extern GlobalProperty pc_rhel_7_5_compat[];
+extern const size_t pc_rhel_7_5_compat_len;
+
+extern GlobalProperty pc_rhel_7_4_compat[];
+extern const size_t pc_rhel_7_4_compat_len;
+
+extern GlobalProperty pc_rhel_7_3_compat[];
+extern const size_t pc_rhel_7_3_compat_len;
+
+extern GlobalProperty pc_rhel_7_2_compat[];
+extern const size_t pc_rhel_7_2_compat_len;
+
+extern GlobalProperty pc_rhel_7_1_compat[];
+extern const size_t pc_rhel_7_1_compat_len;
+
+extern GlobalProperty pc_rhel_7_0_compat[];
+extern const size_t pc_rhel_7_0_compat_len;
+
+extern GlobalProperty hw_compat_4_2_extra[];
+extern const size_t hw_compat_4_2_extra_len;
+
 /* Helper for setting model-id for CPU models that changed model-id
  * depending on QEMU versions up to QEMU 2.4.
  */
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index d95028018e8..7b004065aec 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -131,6 +131,7 @@ static PropValue kvm_default_props[] = {
     { "acpi", "off" },
     { "monitor", "off" },
     { "svm", "off" },
+    { "kvm-pv-unhalt", "on" },
     { NULL, NULL },
 };
 
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5a698bde19a..a668f521ac9 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3336,6 +3336,7 @@ static int kvm_get_msrs(X86CPU *cpu)
     struct kvm_msr_entry *msrs = cpu->kvm_msr_buf->entries;
     int ret, i;
     uint64_t mtrr_top_bits;
+    MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
 
     kvm_msr_buf_reset(cpu);
 
@@ -3665,6 +3666,9 @@ static int kvm_get_msrs(X86CPU *cpu)
             break;
         case MSR_KVM_ASYNC_PF_EN:
             env->async_pf_en_msr = msrs[i].data;
+            if (mc->async_pf_vmexit_disable) {
+                env->async_pf_en_msr &= ~(1ULL << 2);
+            }
             break;
         case MSR_KVM_ASYNC_PF_INT:
             env->async_pf_int_msr = msrs[i].data;
diff --git a/tests/qtest/pvpanic-test.c b/tests/qtest/pvpanic-test.c
index 6dcad2db498..580c2c43d20 100644
--- a/tests/qtest/pvpanic-test.c
+++ b/tests/qtest/pvpanic-test.c
@@ -17,7 +17,7 @@ static void test_panic_nopause(void)
     QDict *response, *data;
     QTestState *qts;
 
-    qts = qtest_init("-device pvpanic -action panic=none");
+    qts = qtest_init("-M q35 -device pvpanic -action panic=none");
 
     val = qtest_inb(qts, 0x505);
     g_assert_cmpuint(val, ==, 3);
@@ -40,7 +40,8 @@ static void test_panic(void)
     QDict *response, *data;
     QTestState *qts;
 
-    qts = qtest_init("-device pvpanic -action panic=pause");
+    /* RHEL: Use q35 */
+    qts = qtest_init("-M q35 -device pvpanic -action panic=pause");
 
     val = qtest_inb(qts, 0x505);
     g_assert_cmpuint(val, ==, 3);
-- 
Gitee


From b1eed7fde263974c8abff5ba2807b9f5f0dce924 Mon Sep 17 00:00:00 2001
From: Miroslav Rezanina <mrezanin@redhat.com>
Date: Wed, 2 Sep 2020 09:39:41 +0200
Subject: [PATCH 007/407] Enable make check

Fix tests after the device disabling and machine type changes, and enable
the make check run during the build.

Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>

Rebase changes (4.0.0):
- Remove testing for pseries-2.7 in endianness test
- Disable device-plug-test on s390x as it uses a disabled device
- Do not run cpu-plug-tests on 7.3 and older machine types

Rebase changes (4.1.0-rc0):
- removed iotests 068

Rebase changes (4.1.0-rc1):
- remove all 205 tests (unstable)

Rebase changes (4.2.0-rc0):
- partially disable hd-geo-test (requires lsi53c895a)

Rebase changes (5.1.0-rc1):
- Disable qtest/q35-test (uses upstream machine types)
- Do not run iotests on make check
- Enabled iotests 071 and 099

Rebase changes (5.2.0 rc0):
- Disable cdrom tests (unsupported devices) on x86_64
- disable fuzz test

Rebase changes (6.0.0):
- Disabled xlnx-can-test
- Disable pxb-pcie subtest for bios-table-test
- Replace qtest usage of upstream q35 machine type with pc-q35-rhel8.4.0
- Do not run cdrom-test on aarch64

Rebase changes (6.1.0):
- Remove unnecessary test disabling changes

Rebase changes (weekly-211006):
- New handling for bios-table-test (disabled downstream)

Merged patches (4.0.0):
- f7ffd13 Remove 7 qcow2 and luks iotests that are taking > 25 sec to run during the fast train build proce

Merged patches (4.1.0-rc0):
- 41288ff redhat: Remove raw iotest 205
---
 tests/qemu-iotests/051              |  8 ++++----
 tests/qtest/bios-tables-test.c      |  5 ++++-
 tests/qtest/boot-serial-test.c      |  6 +++++-
 tests/qtest/cdrom-test.c            |  4 ++++
 tests/qtest/cpu-plug-test.c         |  4 ++--
 tests/qtest/fuzz-e1000e-test.c      |  2 +-
 tests/qtest/fuzz-virtio-scsi-test.c |  2 +-
 tests/qtest/hd-geo-test.c           |  4 ++++
 tests/qtest/lpc-ich9-test.c         |  2 +-
 tests/qtest/meson.build             | 13 ++++---------
 tests/qtest/prom-env-test.c         |  4 ++++
 tests/qtest/test-x86-cpuid-compat.c |  2 ++
 tests/qtest/usb-hcd-xhci-test.c     |  4 ++++
 13 files changed, 40 insertions(+), 20 deletions(-)

diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
index 1d2fa93a115..c8a2815f54b 100755
--- a/tests/qemu-iotests/051
+++ b/tests/qemu-iotests/051
@@ -174,9 +174,9 @@ run_qemu -drive if=virtio
 case "$QEMU_DEFAULT_MACHINE" in
     pc)
         run_qemu -drive if=none,id=disk -device ide-cd,drive=disk
-        run_qemu -drive if=none,id=disk -device lsi53c895a -device scsi-cd,drive=disk
+#        run_qemu -drive if=none,id=disk -device lsi53c895a -device scsi-cd,drive=disk
         run_qemu -drive if=none,id=disk -device ide-hd,drive=disk
-        run_qemu -drive if=none,id=disk -device lsi53c895a -device scsi-hd,drive=disk
+#        run_qemu -drive if=none,id=disk -device lsi53c895a -device scsi-hd,drive=disk
         ;;
      *)
         ;;
@@ -225,9 +225,9 @@ run_qemu -drive file="$TEST_IMG",if=virtio,readonly=on
 case "$QEMU_DEFAULT_MACHINE" in
     pc)
         run_qemu -drive file="$TEST_IMG",if=none,id=disk,readonly=on -device ide-cd,drive=disk
-        run_qemu -drive file="$TEST_IMG",if=none,id=disk,readonly=on -device lsi53c895a -device scsi-cd,drive=disk
+#        run_qemu -drive file="$TEST_IMG",if=none,id=disk,readonly=on -device lsi53c895a -device scsi-cd,drive=disk
         run_qemu -drive file="$TEST_IMG",if=none,id=disk,readonly=on -device ide-hd,drive=disk
-        run_qemu -drive file="$TEST_IMG",if=none,id=disk,readonly=on -device lsi53c895a -device scsi-hd,drive=disk
+#        run_qemu -drive file="$TEST_IMG",if=none,id=disk,readonly=on -device lsi53c895a -device scsi-hd,drive=disk
         ;;
      *)
         ;;
diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index 258874167ef..16d8304cde1 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -1372,6 +1372,7 @@ static void test_acpi_virt_tcg_numamem(void)
 
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 static void test_acpi_virt_tcg_pxb(void)
 {
     test_data data = {
@@ -1403,6 +1404,7 @@ static void test_acpi_virt_tcg_pxb(void)
 
     free_test_data(&data);
 }
+#endif
 
 static void test_acpi_tcg_acpi_hmat(const char *machine)
 {
@@ -1644,7 +1646,8 @@ int main(int argc, char *argv[])
             qtest_add_func("acpi/virt", test_acpi_virt_tcg);
             qtest_add_func("acpi/virt/numamem", test_acpi_virt_tcg_numamem);
             qtest_add_func("acpi/virt/memhp", test_acpi_virt_tcg_memhp);
-            qtest_add_func("acpi/virt/pxb", test_acpi_virt_tcg_pxb);
+            /* Disabled for Red Hat Enterprise Linux
+            qtest_add_func("acpi/virt/pxb", test_acpi_virt_tcg_pxb); */
             qtest_add_func("acpi/virt/oem-fields", test_acpi_oem_fields_virt);
         }
     }
diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
index 83828ba2707..294476b9592 100644
--- a/tests/qtest/boot-serial-test.c
+++ b/tests/qtest/boot-serial-test.c
@@ -148,19 +148,23 @@ static testdef_t tests[] = {
     { "ppc", "g3beige", "", "PowerPC,750" },
     { "ppc", "mac99", "", "PowerPC,G4" },
     { "ppc", "sam460ex", "-m 256", "DRAM:  256 MiB" },
+#if 0 /* Disabled for Red Hat Enterprise Linux */
     { "ppc64", "ppce500", "", "U-Boot" },
     { "ppc64", "40p", "-m 192", "Memory: 192M" },
     { "ppc64", "mac99", "", "PowerPC,970FX" },
+#endif
     { "ppc64", "pseries",
       "-machine " PSERIES_DEFAULT_CAPABILITIES,
       "Open Firmware" },
+#if 0 /* Disabled for Red Hat Enterprise Linux */
     { "ppc64", "powernv8", "", "OPAL" },
     { "ppc64", "powernv9", "", "OPAL" },
     { "ppc64", "sam460ex", "-device e1000", "8086  100e" },
+#endif
     { "i386", "isapc", "-cpu qemu32 -device sga", "SGABIOS" },
     { "i386", "pc", "-device sga", "SGABIOS" },
     { "i386", "q35", "-device sga", "SGABIOS" },
-    { "x86_64", "isapc", "-cpu qemu32 -device sga", "SGABIOS" },
+    { "x86_64", "pc", "-cpu qemu32 -device sga", "SGABIOS" },
     { "x86_64", "q35", "-device sga", "SGABIOS" },
     { "sparc", "LX", "", "TMS390S10" },
     { "sparc", "SS-4", "", "MB86904" },
diff --git a/tests/qtest/cdrom-test.c b/tests/qtest/cdrom-test.c
index 5af944a5fb7..69d9bac38ae 100644
--- a/tests/qtest/cdrom-test.c
+++ b/tests/qtest/cdrom-test.c
@@ -140,6 +140,7 @@ static void add_x86_tests(void)
         qtest_add_data_func("cdrom/boot/isapc", "-M isapc "
                             "-drive if=ide,media=cdrom,file=", test_cdboot);
     }
+#if 0  /* Disabled for Red Hat Enterprise Linux */
     qtest_add_data_func("cdrom/boot/am53c974",
                         "-device am53c974 -device scsi-cd,drive=cd1 "
                         "-drive if=none,id=cd1,format=raw,file=", test_cdboot);
@@ -155,6 +156,7 @@ static void add_x86_tests(void)
     qtest_add_data_func("cdrom/boot/megasas-gen2", "-M q35 "
                         "-device megasas-gen2 -device scsi-cd,drive=cd1 "
                         "-blockdev file,node-name=cd1,filename=", test_cdboot);
+#endif
 }
 
 static void add_s390x_tests(void)
@@ -220,6 +222,7 @@ int main(int argc, char **argv)
             "magnum", "malta", "pica61", NULL
         };
         add_cdrom_param_tests(mips64machines);
+#if 0 /* Disabled for Red Hat Enterprise Linux */
     } else if (g_str_equal(arch, "arm") || g_str_equal(arch, "aarch64")) {
         const char *armmachines[] = {
             "realview-eb", "realview-eb-mpcore", "realview-pb-a8",
@@ -227,6 +230,7 @@ int main(int argc, char **argv)
             "vexpress-a9", "virt", NULL
         };
         add_cdrom_param_tests(armmachines);
+#endif
     } else {
         const char *nonemachine[] = { "none", NULL };
         add_cdrom_param_tests(nonemachine);
diff --git a/tests/qtest/cpu-plug-test.c b/tests/qtest/cpu-plug-test.c
index a1c689414be..a8f076711c7 100644
--- a/tests/qtest/cpu-plug-test.c
+++ b/tests/qtest/cpu-plug-test.c
@@ -110,8 +110,8 @@ static void add_pseries_test_case(const char *mname)
     char *path;
     PlugTestData *data;
 
-    if (!g_str_has_prefix(mname, "pseries-") ||
-        (g_str_has_prefix(mname, "pseries-2.") && atoi(&mname[10]) < 7)) {
+    if (!g_str_has_prefix(mname, "pseries-rhel") ||
+        (g_str_has_prefix(mname, "pseries-rhel7.") && atoi(&mname[14]) < 4)) {
         return;
     }
     data = g_new(PlugTestData, 1);
diff --git a/tests/qtest/fuzz-e1000e-test.c b/tests/qtest/fuzz-e1000e-test.c
index 66229e60964..947fba73b74 100644
--- a/tests/qtest/fuzz-e1000e-test.c
+++ b/tests/qtest/fuzz-e1000e-test.c
@@ -17,7 +17,7 @@ static void test_lp1879531_eth_get_rss_ex_dst_addr(void)
 {
     QTestState *s;
 
-    s = qtest_init("-nographic -monitor none -serial none -M pc-q35-5.0");
+    s = qtest_init("-nographic -monitor none -serial none -M pc-q35-rhel8.4.0");
 
     qtest_outl(s, 0xcf8, 0x80001010);
     qtest_outl(s, 0xcfc, 0xe1020000);
diff --git a/tests/qtest/fuzz-virtio-scsi-test.c b/tests/qtest/fuzz-virtio-scsi-test.c
index aaf6d10e189..43727d62ac0 100644
--- a/tests/qtest/fuzz-virtio-scsi-test.c
+++ b/tests/qtest/fuzz-virtio-scsi-test.c
@@ -19,7 +19,7 @@ static void test_mmio_oob_from_memory_region_cache(void)
 {
     QTestState *s;
 
-    s = qtest_init("-M pc-q35-5.2 -display none -m 512M "
+    s = qtest_init("-M pc-q35-rhel8.4.0 -display none -m 512M "
                    "-device virtio-scsi,num_queues=8,addr=03.0 ");
 
     qtest_outl(s, 0xcf8, 0x80001811);
diff --git a/tests/qtest/hd-geo-test.c b/tests/qtest/hd-geo-test.c
index 113126ae06c..999ef2aace6 100644
--- a/tests/qtest/hd-geo-test.c
+++ b/tests/qtest/hd-geo-test.c
@@ -737,6 +737,7 @@ static void test_override_ide(void)
     test_override(args, expected);
 }
 
+#if 0 /* Require lsi53c895a - not supported on RHEL */
 static void test_override_scsi(void)
 {
     TestArgs *args = create_args();
@@ -781,6 +782,7 @@ static void test_override_scsi_2_controllers(void)
     add_scsi_disk(args, 3, 1, 0, 1, 2, 0, 1, 0);
     test_override(args, expected);
 }
+#endif
 
 static void test_override_virtio_blk(void)
 {
@@ -960,9 +962,11 @@ int main(int argc, char **argv)
     qtest_add_func("hd-geo/ide/device/user/chst", test_ide_device_user_chst);
     if (have_qemu_img()) {
         qtest_add_func("hd-geo/override/ide", test_override_ide);
+#if 0 /* Require lsi53c895a - not supported on RHEL */
         qtest_add_func("hd-geo/override/scsi", test_override_scsi);
         qtest_add_func("hd-geo/override/scsi_2_controllers",
                        test_override_scsi_2_controllers);
+#endif
         qtest_add_func("hd-geo/override/virtio_blk", test_override_virtio_blk);
         qtest_add_func("hd-geo/override/zero_chs", test_override_zero_chs);
         qtest_add_func("hd-geo/override/scsi_hot_unplug",
diff --git a/tests/qtest/lpc-ich9-test.c b/tests/qtest/lpc-ich9-test.c
index fe0bef99807..7a9d51579b5 100644
--- a/tests/qtest/lpc-ich9-test.c
+++ b/tests/qtest/lpc-ich9-test.c
@@ -15,7 +15,7 @@ static void test_lp1878642_pci_bus_get_irq_level_assert(void)
 {
     QTestState *s;
 
-    s = qtest_init("-M pc-q35-5.0 "
+    s = qtest_init("-M pc-q35-rhel8.4.0 "
                    "-nographic -monitor none -serial none");
 
     qtest_outl(s, 0xcf8, 0x8000f840); /* PMBASE */
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index c9d8458062f..049e06c057a 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -68,7 +68,6 @@ qtests_i386 = \
   (config_all_devices.has_key('CONFIG_RTL8139_PCI') ? ['rtl8139-test'] : []) +              \
   (config_all_devices.has_key('CONFIG_E1000E_PCI_EXPRESS') ? ['fuzz-e1000e-test'] : []) +   \
   (config_all_devices.has_key('CONFIG_ESP_PCI') ? ['am53c974-test'] : []) +                 \
-  (unpack_edk2_blobs ? ['bios-tables-test'] : []) +                                         \
   qtests_pci +                                                                              \
   ['fdc-test',
    'ide-test',
@@ -81,7 +80,6 @@ qtests_i386 = \
    'drive_del-test',
    'tco-test',
    'cpu-plug-test',
-   'q35-test',
    'vmgenid-test',
    'migration-test',
    'test-x86-cpuid-compat',
@@ -130,17 +128,15 @@ qtests_mips64el = \
 
 qtests_ppc = \
   (config_all_devices.has_key('CONFIG_ISA_TESTDEV') ? ['endianness-test'] : []) +            \
-  (config_all_devices.has_key('CONFIG_M48T59') ? ['m48t59-test'] : []) +                     \
-  ['boot-order-test', 'prom-env-test', 'boot-serial-test']                 \
+  (config_all_devices.has_key('CONFIG_M48T59') ? ['m48t59-test'] : [])
 
 qtests_ppc64 = \
   (config_all_devices.has_key('CONFIG_PSERIES') ? ['device-plug-test'] : []) +               \
   (config_all_devices.has_key('CONFIG_POWERNV') ? ['pnv-xscom-test'] : []) +                 \
   (config_all_devices.has_key('CONFIG_PSERIES') ? ['rtas-test'] : []) +                      \
-  (slirp.found() ? ['pxe-test', 'test-netfilter'] : []) +              \
+  (slirp.found() ? ['pxe-test'] : []) +              \
   (config_all_devices.has_key('CONFIG_USB_UHCI') ? ['usb-hcd-uhci-test'] : []) +             \
   (config_all_devices.has_key('CONFIG_USB_XHCI_NEC') ? ['usb-hcd-xhci-test'] : []) +         \
-  (config_host.has_key('CONFIG_POSIX') ? ['test-filter-mirror'] : []) +                      \
   qtests_pci + ['migration-test', 'numa-test', 'cpu-plug-test', 'drive_del-test']
 
 qtests_sh4 = (config_all_devices.has_key('CONFIG_ISA_TESTDEV') ? ['endianness-test'] : [])
@@ -186,8 +182,8 @@ qtests_aarch64 = \
   ['arm-cpu-features',
    'numa-test',
    'boot-serial-test',
-   'xlnx-can-test',
-   'fuzz-xlnx-dp-test',
+#   'xlnx-can-test',
+#   'fuzz-xlnx-dp-test',
    'migration-test']
 
 qtests_s390x = \
@@ -196,7 +192,6 @@ qtests_s390x = \
   (config_host.has_key('CONFIG_POSIX') ? ['test-filter-redirector'] : []) +                     \
   ['boot-serial-test',
    'drive_del-test',
-   'device-plug-test',
    'virtio-ccw-test',
    'cpu-plug-test',
    'migration-test']
diff --git a/tests/qtest/prom-env-test.c b/tests/qtest/prom-env-test.c
index f41d80154a5..f8dc478ce81 100644
--- a/tests/qtest/prom-env-test.c
+++ b/tests/qtest/prom-env-test.c
@@ -89,10 +89,14 @@ int main(int argc, char *argv[])
     if (!strcmp(arch, "ppc")) {
         add_tests(ppc_machines);
     } else if (!strcmp(arch, "ppc64")) {
+#if 0 /* Disabled for Red Hat Enterprise Linux */
         add_tests(ppc_machines);
         if (g_test_slow()) {
+#endif
             qtest_add_data_func("prom-env/pseries", "pseries", test_machine);
+#if 0 /* Disabled for Red Hat Enterprise Linux */
         }
+#endif
     } else if (!strcmp(arch, "sparc")) {
         add_tests(sparc_machines);
     } else if (!strcmp(arch, "sparc64")) {
diff --git a/tests/qtest/test-x86-cpuid-compat.c b/tests/qtest/test-x86-cpuid-compat.c
index f28848e06e8..6b2fd398a2d 100644
--- a/tests/qtest/test-x86-cpuid-compat.c
+++ b/tests/qtest/test-x86-cpuid-compat.c
@@ -300,6 +300,7 @@ int main(int argc, char **argv)
                    "-cpu 486,xlevel2=0xC0000002,xstore=on",
                    "xlevel2", 0xC0000002);
 
+#if 0 /* Disabled in Red Hat Enterprise Linux */
     /* Check compatibility of old machine-types that didn't
      * auto-increase level/xlevel/xlevel2: */
 
@@ -350,6 +351,7 @@ int main(int argc, char **argv)
     add_cpuid_test("x86/cpuid/xlevel-compat/pc-i440fx-2.4/npt-on",
                    "-machine pc-i440fx-2.4 -cpu SandyBridge,svm=on,npt=on",
                    "xlevel", 0x80000008);
+#endif
 
     /* Test feature parsing */
     add_feature_test("x86/cpuid/features/plus",
diff --git a/tests/qtest/usb-hcd-xhci-test.c b/tests/qtest/usb-hcd-xhci-test.c
index 10ef9d2a91a..38558730506 100644
--- a/tests/qtest/usb-hcd-xhci-test.c
+++ b/tests/qtest/usb-hcd-xhci-test.c
@@ -21,6 +21,7 @@ static void test_xhci_hotplug(void)
     usb_test_hotplug(global_qtest, "xhci", "1", NULL);
 }
 
+#if 0 /* Disabled for Red Hat Enterprise Linux */
 static void test_usb_uas_hotplug(void)
 {
     QTestState *qts = global_qtest;
@@ -36,6 +37,7 @@ static void test_usb_uas_hotplug(void)
     qtest_qmp_device_del(qts, "scsihd");
     qtest_qmp_device_del(qts, "uas");
 }
+#endif
 
 static void test_usb_ccid_hotplug(void)
 {
@@ -56,7 +58,9 @@ int main(int argc, char **argv)
 
     qtest_add_func("/xhci/pci/init", test_xhci_init);
     qtest_add_func("/xhci/pci/hotplug", test_xhci_hotplug);
+#if 0 /* Disabled for Red Hat Enterprise Linux */
     qtest_add_func("/xhci/pci/hotplug/usb-uas", test_usb_uas_hotplug);
+#endif
     qtest_add_func("/xhci/pci/hotplug/usb-ccid", test_usb_ccid_hotplug);
 
     qtest_start("-device nec-usb-xhci,id=xhci"
-- 
Gitee


From 97e833c32883089e9b5231cb02eb211ab4f9a361 Mon Sep 17 00:00:00 2001
From: Bandan Das <bsd@redhat.com>
Date: Tue, 3 Dec 2013 20:05:13 +0100
Subject: [PATCH 008/407] vfio: cap number of devices that can be assigned

RH-Author: Bandan Das <bsd@redhat.com>
Message-id: <1386101113-31560-3-git-send-email-bsd@redhat.com>
Patchwork-id: 55984
O-Subject: [PATCH RHEL7 qemu-kvm v2 2/2] vfio: cap number of devices that can be assigned
Bugzilla: 678368
RH-Acked-by: Alex Williamson <alex.williamson@redhat.com>
RH-Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
RH-Acked-by: Michael S. Tsirkin <mst@redhat.com>

Go through all groups to count the total number of active devices
and enforce the limit.

Reasoning from Alex for the limit (32): assuming 3 slots per
device, with 125 slots (the number of memory slots for RHEL 7),
we can support almost 40 devices and still have a few slots left
for other uses. Stepping down a bit, the number 32 arbitrarily
matches the number of slots on a PCI bus and is also a nice power
of two.
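
The slot arithmetic above can be checked with a small standalone sketch.
The constant and function names below are illustrative only and are not
QEMU symbols; the numbers are the ones quoted in this message.

```c
#include <stdbool.h>

/* Illustrative constants taken from the reasoning above; not QEMU symbols. */
enum {
    RHEL7_MEMSLOTS   = 125, /* memory slots available on RHEL 7 */
    SLOTS_PER_DEVICE = 3,   /* assumed slots consumed per assigned device */
    DEVICE_LIMIT     = 32   /* cap chosen in the original patch */
};

/* Upper bound on devices if every memory slot went to assigned devices. */
static int max_devices_by_slots(void)
{
    return RHEL7_MEMSLOTS / SLOTS_PER_DEVICE;
}

/* Memory slots left for other uses once 'ndev' devices are assigned. */
static int slots_left(int ndev)
{
    return RHEL7_MEMSLOTS - ndev * SLOTS_PER_DEVICE;
}

/* Same shape as the "count >= limit" test: refuse once the cap is reached. */
static bool can_assign_another(int active_devices, int limit)
{
    return active_devices < limit;
}
```

With these numbers, a cap of 32 devices leaves 29 of the 125 slots free.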

Signed-off-by: Bandan Das <bsd@redhat.com>

Rebase notes (2.8.0):
- removed return value for vfio_realize (commit 1a22aca)

Merged patches (2.9.0):
- 17eb774 vfio: Use error_setg when reporting max assigned device overshoot

Merged patches (4.1.0-rc3):
- 2b89558 vfio: increase the cap on number of assigned devices to 64
---
 hw/vfio/pci.c | 29 ++++++++++++++++++++++++++++-
 hw/vfio/pci.h |  1 +
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 7b45353ce27..eb725a3aeec 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -45,6 +45,9 @@
 
 #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
 
+/* RHEL only: Set once for the first assigned dev */
+static uint16_t device_limit;
+
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
 
@@ -2807,9 +2810,30 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     ssize_t len;
     struct stat st;
     int groupid;
-    int i, ret;
+    int ret, i = 0;
     bool is_mdev;
 
+    if (device_limit && device_limit != vdev->assigned_device_limit) {
+            error_setg(errp, "Assigned device limit has been redefined. "
+                       "Old:%d, New:%d",
+                       device_limit, vdev->assigned_device_limit);
+            return;
+    } else {
+        device_limit = vdev->assigned_device_limit;
+    }
+
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+            i++;
+        }
+    }
+
+    if (i >= vdev->assigned_device_limit) {
+        error_setg(errp, "Maximum supported vfio devices (%d) "
+                     "already attached", vdev->assigned_device_limit);
+        return;
+    }
+
     if (!vdev->vbasedev.sysfsdev) {
         if (!(~vdev->host.domain || ~vdev->host.bus ||
               ~vdev->host.slot || ~vdev->host.function)) {
@@ -3246,6 +3270,9 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_BOOL("x-no-kvm-msix", VFIOPCIDevice, no_kvm_msix, false),
     DEFINE_PROP_BOOL("x-no-geforce-quirks", VFIOPCIDevice,
                      no_geforce_quirks, false),
+    /* RHEL only */
+    DEFINE_PROP_UINT16("x-assigned-device-limit", VFIOPCIDevice,
+                       assigned_device_limit, 64),
     DEFINE_PROP_BOOL("x-no-kvm-ioeventfd", VFIOPCIDevice, no_kvm_ioeventfd,
                      false),
     DEFINE_PROP_BOOL("x-no-vfio-ioeventfd", VFIOPCIDevice, no_vfio_ioeventfd,
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 64777516d16..e0fe6ca97e3 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -139,6 +139,7 @@ struct VFIOPCIDevice {
     EventNotifier err_notifier;
     EventNotifier req_notifier;
     int (*resetfn)(struct VFIOPCIDevice *);
+    uint16_t assigned_device_limit;
     uint32_t vendor_id;
     uint32_t device_id;
     uint32_t sub_vendor_id;
-- 
Gitee


From 16dce76f649dcfcf789dca7b718600924b553e64 Mon Sep 17 00:00:00 2001
From: Eduardo Habkost <ehabkost@redhat.com>
Date: Wed, 4 Dec 2013 18:53:17 +0100
Subject: [PATCH 009/407] Add support statement to -help output

RH-Author: Eduardo Habkost <ehabkost@redhat.com>
Message-id: <1386183197-27761-1-git-send-email-ehabkost@redhat.com>
Patchwork-id: 55994
O-Subject: [qemu-kvm RHEL7 PATCH] Add support statement to -help output
Bugzilla: 972773
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: knoel@redhat.com
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Add support statement to -help output, reporting direct qemu-kvm usage
as unsupported by Red Hat, and advising users to use libvirt instead.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
 softmmu/vl.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index 620a1f1367e..d46b8fb4ab6 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -827,9 +827,17 @@ static void version(void)
            QEMU_COPYRIGHT "\n");
 }
 
+static void print_rh_warning(void)
+{
+    printf("\nWARNING: Direct use of qemu-kvm from the command line is not supported by Red Hat.\n"
+             "WARNING: Use libvirt as the stable management interface.\n"
+             "WARNING: Some command line options listed here may not be available in future releases.\n\n");
+}
+
 static void help(int exitcode)
 {
     version();
+    print_rh_warning();
     printf("usage: %s [options] [disk_image]\n\n"
            "'disk_image' is a raw hard disk image for IDE hard disk 0\n\n",
             error_get_progname());
@@ -855,6 +863,7 @@ static void help(int exitcode)
            "\n"
            QEMU_HELP_BOTTOM "\n");
 
+    print_rh_warning();
     exit(exitcode);
 }
 
-- 
Gitee


From d53bb2586ebf2eb929ab942912a9c3c4dc72623f Mon Sep 17 00:00:00 2001
From: Andrew Jones <drjones@redhat.com>
Date: Tue, 21 Jan 2014 10:46:52 +0100
Subject: [PATCH 010/407] globally limit the maximum number of CPUs

We now globally limit the number of VCPUs.
In particular, there is no way to specify more than
max_cpus VCPUs for a VM.

This allows us to restore the ppc max_cpus limitation to the upstream
default and minimize the ppc hack in kvm-all.c.
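
A minimal sketch of the clamping idea follows. It assumes the caller
warns above the soft limit and rejects above the hard limit; the type
and function names are hypothetical and the figures used with it are
made up for illustration, not values reported by KVM.

```c
/* Outcome of comparing a requested VCPU count against the limits. */
enum vcpu_verdict { VCPU_OK, VCPU_WARN, VCPU_REJECT };

struct vcpu_limits {
    int soft; /* recommended maximum */
    int hard; /* absolute maximum */
};

/*
 * Build the effective limits: everywhere except POWER hosts the hard
 * limit is lowered to the soft limit, so exceeding the recommended
 * count is rejected rather than merely warned about.
 */
static struct vcpu_limits effective_limits(int recommended, int max,
                                           int host_is_ppc64)
{
    struct vcpu_limits l = { recommended, max };

    if (!host_is_ppc64) {
        l.hard = l.soft;
    }
    return l;
}

/* Assumed policy: reject above hard, warn above soft, accept otherwise. */
static enum vcpu_verdict check_vcpus(int requested, struct vcpu_limits l)
{
    if (requested > l.hard) {
        return VCPU_REJECT;
    }
    if (requested > l.soft) {
        return VCPU_WARN;
    }
    return VCPU_OK;
}
```

On a POWER host the same request above the soft limit would only yield
a warning, which matches the intent of allowing it for testing.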

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Signed-off-by: Danilo Cesar Lemes de Paula <ddepaula@redhat.com>

Rebase notes (2.11.0):
- Removed CONFIG_RHV reference
- Update commit log

Merged patches (2.11.0):
- 92fef14623 redhat: remove manual max_cpus limitations for ppc
- bb722e9eff redhat: globally limit the maximum number of CPUs
- fdeef3c1c7 RHEL: Set vcpus hard limit to 240 for Power
- 0584216921 Match POWER max cpus to x86

Signed-off-by: Andrew Jones <drjones@redhat.com>

Merged patches (5.1.0):
- redhat: globally limit the maximum number of CPUs
- redhat: remove manual max_cpus limitations for ppc
- use recommended max vcpu count

Merged patches (5.2.0 rc0):
- f8a4123 vl: Remove downstream-only MAX_RHEL_CPUS code
---
 accel/kvm/kvm-all.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index eecd8031cf6..8f2a53438f6 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2423,6 +2423,18 @@ static int kvm_init(MachineState *ms)
     soft_vcpus_limit = kvm_recommended_vcpus(s);
     hard_vcpus_limit = kvm_max_vcpus(s);
 
+#ifdef HOST_PPC64
+    /*
+     * On POWER, the kernel advertises a soft limit based on the
+     * number of CPU threads on the host.  We want to allow exceeding
+     * this for testing purposes, so we don't want to set hard limit
+     * to soft limit as on x86.
+     */
+#else
+    /* RHEL doesn't support nr_vcpus > soft_vcpus_limit */
+    hard_vcpus_limit = soft_vcpus_limit;
+#endif
+
     while (nc->name) {
         if (nc->num > soft_vcpus_limit) {
             warn_report("Number of %s cpus requested (%d) exceeds "
-- 
Gitee


From 36d5e67ffc2509d920eefe2c673a23c94e98b6f7 Mon Sep 17 00:00:00 2001
From: Miroslav Rezanina <mrezanin@redhat.com>
Date: Wed, 8 Jul 2020 08:35:50 +0200
Subject: [PATCH 011/407] Use qemu-kvm in documentation instead of
 qemu-system-<arch>

Patchwork-id: 62380
O-Subject: [RHEV-7.1 qemu-kvm-rhev PATCHv4] Use qemu-kvm in documentation instead of qemu-system-i386
Bugzilla: 1140620
RH-Acked-by: Laszlo Ersek <lersek@redhat.com>
RH-Acked-by: Markus Armbruster <armbru@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

From: Miroslav Rezanina <mrezanin@redhat.com>

We change the name and location of qemu-kvm binaries. Update documentation
to reflect this change. Only architectures available in RHEL are updated.

Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>

Rebase notes (5.1.0 rc0):
 - qemu-block-drivers.texi converted to qemu-block-drivers.rst (upstream)

Rebase notes (5.2.0 rc0):
 - rewrite patch to new docs structure
---
 docs/defs.rst.inc              |  4 ++--
 docs/tools/qemu-trace-stap.rst | 14 +++++++-------
 qemu-options.hx                | 10 +++++-----
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/docs/defs.rst.inc b/docs/defs.rst.inc
index 52d6454b939..d74dbdeca98 100644
--- a/docs/defs.rst.inc
+++ b/docs/defs.rst.inc
@@ -9,7 +9,7 @@
    but the manpages will end up misrendered with following normal text
    incorrectly in boldface.
 
-.. |qemu_system| replace:: qemu-system-x86_64
-.. |qemu_system_x86| replace:: qemu-system-x86_64
+.. |qemu_system| replace:: qemu-kvm
+.. |qemu_system_x86| replace:: qemu-kvm
 .. |I2C| replace:: I\ :sup:`2`\ C
 .. |I2S| replace:: I\ :sup:`2`\ S
diff --git a/docs/tools/qemu-trace-stap.rst b/docs/tools/qemu-trace-stap.rst
index d53073b52b6..9e93df084f3 100644
--- a/docs/tools/qemu-trace-stap.rst
+++ b/docs/tools/qemu-trace-stap.rst
@@ -46,19 +46,19 @@ The following commands are valid:
   any of the listed names. If no *PATTERN* is given, the all possible
   probes will be listed.
 
-  For example, to list all probes available in the ``qemu-system-x86_64``
+  For example, to list all probes available in the ``qemu-kvm``
   binary:
 
   ::
 
-    $ qemu-trace-stap list qemu-system-x86_64
+    $ qemu-trace-stap list qemu-kvm
 
   To filter the list to only cover probes related to QEMU's cryptographic
   subsystem, in a binary outside ``$PATH``
 
   ::
 
-    $ qemu-trace-stap list /opt/qemu/4.0.0/bin/qemu-system-x86_64 'qcrypto*'
+    $ qemu-trace-stap list /opt/qemu/4.0.0/bin/qemu-kvm 'qcrypto*'
 
 .. option:: run OPTIONS BINARY PATTERN...
 
@@ -90,18 +90,18 @@ The following commands are valid:
     Restrict the tracing session so that it only triggers for the process
     identified by *PID*.
 
-  For example, to monitor all processes executing ``qemu-system-x86_64``
+  For example, to monitor all processes executing ``qemu-kvm``
   as found on ``$PATH``, displaying all I/O related probes:
 
   ::
 
-    $ qemu-trace-stap run qemu-system-x86_64 'qio*'
+    $ qemu-trace-stap run qemu-kvm 'qio*'
 
   To monitor only the QEMU process with PID 1732
 
   ::
 
-    $ qemu-trace-stap run --pid=1732 qemu-system-x86_64 'qio*'
+    $ qemu-trace-stap run --pid=1732 qemu-kvm 'qio*'
 
   To monitor QEMU processes running an alternative binary outside of
   ``$PATH``, displaying verbose information about setup of the
@@ -109,7 +109,7 @@ The following commands are valid:
 
   ::
 
-    $ qemu-trace-stap -v run /opt/qemu/4.0.0/qemu-system-x86_64 'qio*'
+    $ qemu-trace-stap -v run /opt/qemu/4.0.0/qemu-kvm 'qio*'
 
 See also
 --------
diff --git a/qemu-options.hx b/qemu-options.hx
index ae2c6dbbfc0..94c4a8dbaff 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3150,11 +3150,11 @@ SRST
 
     ::
 
-        qemu -m 512 -object memory-backend-file,id=mem,size=512M,mem-path=/hugetlbfs,share=on \
-             -numa node,memdev=mem \
-             -chardev socket,id=chr0,path=/path/to/socket \
-             -netdev type=vhost-user,id=net0,chardev=chr0 \
-             -device virtio-net-pci,netdev=net0
+        qemu-kvm -m 512 -object memory-backend-file,id=mem,size=512M,mem-path=/hugetlbfs,share=on \
+                 -numa node,memdev=mem \
+                 -chardev socket,id=chr0,path=/path/to/socket \
+                 -netdev type=vhost-user,id=net0,chardev=chr0 \
+                 -device virtio-net-pci,netdev=net0
 
 ``-netdev vhost-vdpa,vhostdev=/path/to/dev``
     Establish a vhost-vdpa netdev.
-- 
Gitee


From e66e12dd03dce3867a2198b9a7e192aa333047c6 Mon Sep 17 00:00:00 2001
From: Fam Zheng <famz@redhat.com>
Date: Wed, 14 Jun 2017 15:37:01 +0200
Subject: [PATCH 012/407] virtio-scsi: Reject scsi-cd if data plane enabled
 [RHEL only]

RH-Author: Fam Zheng <famz@redhat.com>
Message-id: <20170614153701.14757-1-famz@redhat.com>
Patchwork-id: 75613
O-Subject: [RHV-7.4 qemu-kvm-rhev PATCH v3] virtio-scsi: Reject scsi-cd if data plane enabled [RHEL only]
Bugzilla: 1378816
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Max Reitz <mreitz@redhat.com>

We need a fix for RHEL 7.4 and 7.3.z, but unfortunately upstream isn't
ready. Even if it were, the changes would be too invasive. To give an idea:

https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg05400.html

is an incomplete attempt to fix part of the issue, and the remaining
work unfortunately involves even more complex changes.

As a band-aid, this partially reverts the effect of ef8875b
(virtio-scsi: Remove op blocker for dataplane, since v2.7). We cannot
simply revert that commit as a whole because we already shipped it in
qemu-kvm-rhev 7.3, and block jobs have been possible since then.  We
should only block what has been broken.  Also, faithfully reverting the
above commit means adding back the removed op blocker, but that is not
enough, because it still crashes when inserting media into an initially
empty scsi-cd.

All in all, scsi-cd on virtio-scsi-dataplane has basically been unusable
unless the scsi-cd never enters an empty state, so, disable it
altogether.  Otherwise it would be much more difficult to avoid
crashing.
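
The rejection condition can be sketched on its own as a pure predicate.
The two placeholder objects and the function name are hypothetical; only
the logic (a non-NULL context that differs from the main loop context,
combined with a scsi-cd device) is taken from this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholders standing in for two distinct AioContext objects. */
static int main_loop_ctx;
static int iothread_ctx;

/*
 * Data plane is in use only when a context is set and it is not the
 * main loop context. A NULL context (ioeventfd off) and the main loop
 * context are both safe for scsi-cd.
 */
static bool reject_scsi_cd(const void *ctx, const void *main_ctx,
                           const char *dev_type)
{
    bool dataplane = ctx != NULL && ctx != main_ctx;

    return dataplane && strcmp(dev_type, "scsi-cd") == 0;
}
```

Other device types are unaffected even when data plane is active.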

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Signed-off-by: Danilo C. L. de Paula <ddepaula@redhat.com>
---
 hw/scsi/virtio-scsi.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 51fd09522ac..a35257c35a8 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -896,6 +896,15 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
     AioContext *old_context;
     int ret;
 
+    /* XXX: Remove this check once block backend is capable of handling
+     * AioContext change upon eject/insert.
+     * s->ctx is NULL if ioeventfd is off, s->ctx is qemu_get_aio_context() if
+     * data plane is not used, both cases are safe for scsi-cd. */
+    if (s->ctx && s->ctx != qemu_get_aio_context() &&
+        object_dynamic_cast(OBJECT(dev), "scsi-cd")) {
+        error_setg(errp, "scsi-cd is not supported by data plane");
+        return;
+    }
     if (s->ctx && !s->dataplane_fenced) {
         if (blk_op_is_blocked(sd->conf.blk, BLOCK_OP_TYPE_DATAPLANE, errp)) {
             return;
-- 
Gitee


From 96c4249b76a8ff5553d3c0f2b49444bd6c73f188 Mon Sep 17 00:00:00 2001
From: David Gibson <dgibson@redhat.com>
Date: Wed, 6 Feb 2019 03:58:56 +0000
Subject: [PATCH 013/407] BZ1653590: Require at least 64kiB pages for
 downstream guests & hosts

RH-Author: David Gibson <dgibson@redhat.com>
Message-id: <20190206035856.19058-1-dgibson@redhat.com>
Patchwork-id: 84246
O-Subject: [RHELAV-8.0/rhel qemu-kvm PATCH] BZ1653590: Require at least 64kiB pages for downstream guests & hosts
Bugzilla: 1653590
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>
RH-Acked-by: Serhii Popovych <spopovyc@redhat.com>
RH-Acked-by: Thomas Huth <thuth@redhat.com>

Most current POWER guests require 64kiB page support, so that's the default
for the cap-hpt-max-page-size option in qemu, which limits available guest
page sizes.  We warn if the value is set smaller than that, but don't
outright fail upstream, because we need to allow for the possibility of
guest (and/or host) kernels configured for 4kiB page sizes.

Downstream, however, we simply don't support 4kiB pagesize configured
kernels in guest or host, so we can have qemu simply error out in this
situation.
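
The capability value is compared against 12 and 16, which reads as a
page shift (log2 of the page size in bytes). Under that assumption, a
small sketch of the downstream rule looks like this; the helper names
are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a page shift into a page size in bytes. */
static uint64_t pagesize_from_shift(uint8_t shift)
{
    return (uint64_t)1 << shift;
}

/* Downstream rule: anything below 64kiB (shift 16) is an error. */
static bool rhel_hpt_max_pagesize_ok(uint8_t shift)
{
    return shift >= 16;
}
```

A shift of 12 (4kiB) is therefore rejected downstream, whereas upstream
only warns about it.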

Testing: Attempted to start a guest with cap-hpt-max-page-size=4k and verified
         it failed immediately with a qemu error

Signed-off-by: David Gibson <dgibson@redhat.com>
Signed-off-by: Danilo C. L. de Paula <ddepaula@redhat.com>
---
 hw/ppc/spapr_caps.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index ed7c077a0d9..48a8efe6788 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -332,12 +332,19 @@ bool spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
 static void cap_hpt_maxpagesize_apply(SpaprMachineState *spapr,
                                       uint8_t val, Error **errp)
 {
+#if 0 /* disabled for RHEL */
     if (val < 12) {
         error_setg(errp, "Require at least 4kiB hpt-max-page-size");
         return;
     } else if (val < 16) {
         warn_report("Many guests require at least 64kiB hpt-max-page-size");
     }
+#else /* Only page sizes >=64kiB supported for RHEL */
+    if (val < 16) {
+        error_setg(errp, "Require at least 64kiB hpt-max-page-size");
+        return;
+    }
+#endif
 
     spapr_check_pagesize(spapr, qemu_minrampagesize(), errp);
 }
-- 
Gitee


From 6e9bcd211e641684f3452334ed12bd9557aa3446 Mon Sep 17 00:00:00 2001
From: Laurent Vivier <lvivier@redhat.com>
Date: Mon, 15 Nov 2021 14:22:29 +0100
Subject: [PATCH 014/407] compat: Update hw_compat_rhel_8_5

RH-Author: Laurent Vivier <lvivier@redhat.com>
RH-MergeRequest: 66: redhat: Update pseries-rhel8.5.0 machine type
RH-Commit: [1/2] 232f2ad2b29d250fbdb8fcea9d814704c575ba2b
RH-Bugzilla: 2022608
RH-Acked-by: Eric Auger <eric.auger@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Greg Kurz <gkurz@redhat.com>

Add properties from hw_compat_6_1 as it already includes the ones from
hw_compat_6_0. Also add a property that was added late to hw_compat_6_0.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
--
Rebase notes (6.2.0 rc3):
- Included compat changes introduced in RC2
---
 hw/core/machine.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 62febde5aa6..736c765c30c 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -38,7 +38,7 @@
 #include "hw/virtio/virtio-pci.h"
 
 /*
- * Mostly the same as hw_compat_6_0
+ * Mostly the same as hw_compat_6_0 and hw_compat_6_1
  */
 GlobalProperty hw_compat_rhel_8_5[] = {
     /* hw_compat_rhel_8_5 from hw_compat_6_0 */
@@ -51,6 +51,12 @@ GlobalProperty hw_compat_rhel_8_5[] = {
     { "e1000", "init-vet", "off" },
     /* hw_compat_rhel_8_5 from hw_compat_6_0 */
     { "e1000e", "init-vet", "off" },
+    /* hw_compat_rhel_8_5 from hw_compat_6_0 */
+    { "vhost-vsock-device", "seqpacket", "off" },
+    /* hw_compat_rhel_8_5 from hw_compat_6_1 */
+    { "vhost-user-vsock-device", "seqpacket", "off" },
+    /* hw_compat_rhel_8_5 from hw_compat_6_1 */
+    { "nvme-ns", "shared", "off" },
 };
 const size_t hw_compat_rhel_8_5_len = G_N_ELEMENTS(hw_compat_rhel_8_5);
 
-- 
Gitee


From d03cbd3f8ea4f6ed71ca6bcaaf31d88a62ee4220 Mon Sep 17 00:00:00 2001
From: Eric Auger <eric.auger@redhat.com>
Date: Tue, 16 Nov 2021 17:03:07 +0100
Subject: [PATCH 015/407] redhat: virt-rhel8.5.0: Update machine type
 compatibility for QEMU 6.2.0 update

RH-Author: Eric Auger <eric.auger@redhat.com>
RH-MergeRequest: 75: redhat: virt-rhel8.5.0: Update machine type compatibility for QEMU 6.2.0 update
RH-Commit: [21/21] f027d13654944e3d34e3356affe7af952eec2bed
RH-Bugzilla: 2022607
RH-Acked-by: Gavin Shan <gshan@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Andrew Jones <drjones@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>

To keep compatibility with the 8.5-AV machine type, we need to
turn a few new options on by default:
smp_props.prefer_sockets, no_cpu_topology, no_tcg_its

TESTED: migrate from rhel-av-8.5.0 to rhel-8.6.0 and vice-versa
with upstream fix: 33a0c404fb  hw/intc/arm_gicv3_its: Revert version
increments in vmstate_its

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
---
 hw/arm/virt.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c77d26ab138..e8941afd018 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3225,8 +3225,13 @@ type_init(rhel_machine_init);
 
 static void rhel850_virt_options(MachineClass *mc)
 {
+    VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
     compat_props_add(mc->compat_props, arm_rhel_compat, arm_rhel_compat_len);
     compat_props_add(mc->compat_props, hw_compat_rhel_8_5, hw_compat_rhel_8_5_len);
+    mc->smp_props.prefer_sockets = true;
+    vmc->no_cpu_topology = true;
+    vmc->no_tcg_its = true;
 }
 DEFINE_RHEL_MACHINE_AS_LATEST(8, 5, 0)
 
-- 
Gitee


From a51ae30ced748986d9365fcd1ec46bd22d05b0d0 Mon Sep 17 00:00:00 2001
From: Eduardo Habkost <ehabkost@redhat.com>
Date: Tue, 19 Oct 2021 13:17:06 -0400
Subject: [PATCH 016/407] Fix virtio-net-pci* "vectors" compat

RH-Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-MergeRequest: 77: 8.6/6.2 mt fixes
RH-Commit: [21/23] 8ad581932275d2698a99f31bec40b14f1dbd3d2e
RH-Bugzilla: 2026443
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>

hw_compat_rhel_8_4 has an issue: it affects only "virtio-net-pci"
but not "virtio-net-pci-transitional" and
"virtio-net-pci-non-transitional".  The solution is to use the
"virtio-net-pci-base" type in compat_props.
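
Why naming the base type helps can be shown with a toy model, assuming
a compat property applies to the named type and to every type derived
from it. The structure and function below are illustrative and are not
the QOM API.

```c
#include <stddef.h>
#include <string.h>

/* Toy type node: a name plus a pointer to its parent type. */
struct toy_type {
    const char *name;
    const struct toy_type *parent;
};

static const struct toy_type net_base = { "virtio-net-pci-base", NULL };
static const struct toy_type net_generic =
    { "virtio-net-pci", &net_base };
static const struct toy_type net_transitional =
    { "virtio-net-pci-transitional", &net_base };
static const struct toy_type net_non_transitional =
    { "virtio-net-pci-non-transitional", &net_base };

/* A property keyed on 'driver' applies if 'driver' is t or an ancestor. */
static int compat_applies(const struct toy_type *t, const char *driver)
{
    for (; t != NULL; t = t->parent) {
        if (strcmp(t->name, driver) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Keyed on "virtio-net-pci", the property misses both sibling types;
keyed on "virtio-net-pci-base", it reaches all three.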

An equivalent fix will be submitted for hw_compat_5_2 upstream.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit d45823ab0d0138b2fbaf2ed1e1896d2052f3ccb3)
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
---
 hw/core/machine.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 736c765c30c..024b025fc2a 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -71,7 +71,11 @@ GlobalProperty hw_compat_rhel_8_4[] = {
     /* hw_compat_rhel_8_4 from hw_compat_5_2 */
     { "virtio-blk-device", "report-discard-granularity", "off" },
     /* hw_compat_rhel_8_4 from hw_compat_5_2 */
-    { "virtio-net-pci", "vectors", "3"},
+    /*
+     * Upstream incorrectly had "virtio-net-pci" instead of "virtio-net-pci-base",
+     * (https://bugzilla.redhat.com/show_bug.cgi?id=1999141)
+     */
+    { "virtio-net-pci-base", "vectors", "3"},
 };
 const size_t hw_compat_rhel_8_4_len = G_N_ELEMENTS(hw_compat_rhel_8_4);
 
-- 
Gitee


From fd337f58c89de3f051c310963067b0663594db97 Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Date: Tue, 23 Nov 2021 17:57:42 +0000
Subject: [PATCH 017/407] x86/rhel machine types: Add pc_rhel_8_5_compat

RH-Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-MergeRequest: 77: 8.6/6.2 mt fixes
RH-Commit: [22/23] 8bf555c5d78f344b97ffd5c888c7a7bed592d9d0
RH-Bugzilla: 2026443
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>

Add pc_rhel_8_5_compat as the merge of pc_compat_6_1 and pc_compat_6_0
(since 8.5 was based on 6.0).

Note: x-keep-pci-slot-hpc flipped back and forth, so leaving it out
looks like it leaves us with the original value.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
---
 hw/i386/pc.c         | 21 +++++++++++++++++++++
 include/hw/i386/pc.h |  3 +++
 2 files changed, 24 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e8109954cab..4c08a1971ca 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -387,6 +387,27 @@ GlobalProperty pc_rhel_compat[] = {
 };
 const size_t pc_rhel_compat_len = G_N_ELEMENTS(pc_rhel_compat);
 
+GlobalProperty pc_rhel_8_5_compat[] = {
+    /* pc_rhel_8_5_compat from pc_compat_6_0 */
+    { "qemu64" "-" TYPE_X86_CPU, "family", "6" },
+    /* pc_rhel_8_5_compat from pc_compat_6_0 */
+    { "qemu64" "-" TYPE_X86_CPU, "model", "6" },
+    /* pc_rhel_8_5_compat from pc_compat_6_0 */
+    { "qemu64" "-" TYPE_X86_CPU, "stepping", "3" },
+    /* pc_rhel_8_5_compat from pc_compat_6_0 */
+    { TYPE_X86_CPU, "x-vendor-cpuid-only", "off" },
+    /* pc_rhel_8_5_compat from pc_compat_6_0 */
+    { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" },
+
+    /* pc_rhel_8_5_compat from pc_compat_6_1 */
+    { TYPE_X86_CPU, "hv-version-id-build", "0x1bbc" },
+    /* pc_rhel_8_5_compat from pc_compat_6_1 */
+    { TYPE_X86_CPU, "hv-version-id-major", "0x0006" },
+    /* pc_rhel_8_5_compat from pc_compat_6_1 */
+    { TYPE_X86_CPU, "hv-version-id-minor", "0x0001" },
+};
+const size_t pc_rhel_8_5_compat_len = G_N_ELEMENTS(pc_rhel_8_5_compat);
+
 GlobalProperty pc_rhel_8_4_compat[] = {
     /* pc_rhel_8_4_compat from pc_compat_5_2 */
     { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index d0544ee119c..9e8bfb69f80 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -286,6 +286,9 @@ extern const size_t pc_compat_1_4_len;
 extern GlobalProperty pc_rhel_compat[];
 extern const size_t pc_rhel_compat_len;
 
+extern GlobalProperty pc_rhel_8_5_compat[];
+extern const size_t pc_rhel_8_5_compat_len;
+
 extern GlobalProperty pc_rhel_8_4_compat[];
 extern const size_t pc_rhel_8_4_compat_len;
 
-- 
Gitee


From 41defb5361e0f8bb2029198d215e45f1bf79fdc6 Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Date: Tue, 23 Nov 2021 18:07:49 +0000
Subject: [PATCH 018/407] x86/rhel machine types: Wire compat into q35 and
 i440fx

RH-Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-MergeRequest: 77: 8.6/6.2 mt fixes
RH-Commit: [23/23] fc3861aeccc943b434231193ef45ffbc0b3cf6c6
RH-Bugzilla: 2026443
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>

Wire the pc_rhel_8_5 compat data into both piix and q35
to keep the existing machine types compatible.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
---
 hw/i386/pc_piix.c | 4 ++++
 hw/i386/pc_q35.c  | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 2885edffe91..37fab007331 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -1040,6 +1040,10 @@ static void pc_machine_rhel760_options(MachineClass *m)
     pcmc->kvmclock_create_always = false;
     /* From pc_i440fx_5_1_machine_options() */
     pcmc->pci_root_uid = 1;
+    compat_props_add(m->compat_props, hw_compat_rhel_8_5,
+                     hw_compat_rhel_8_5_len);
+    compat_props_add(m->compat_props, pc_rhel_8_5_compat,
+                     pc_rhel_8_5_compat_len);
     compat_props_add(m->compat_props, hw_compat_rhel_8_4,
                      hw_compat_rhel_8_4_len);
     compat_props_add(m->compat_props, pc_rhel_8_4_compat,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index c67418b6a91..78876e1101c 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -658,6 +658,10 @@ static void pc_q35_machine_rhel850_options(MachineClass *m)
     m->desc = "RHEL-8.5.0 PC (Q35 + ICH9, 2009)";
     pcmc->smbios_stream_product = "RHEL-AV";
     pcmc->smbios_stream_version = "8.5.0";
+    compat_props_add(m->compat_props, hw_compat_rhel_8_5,
+                     hw_compat_rhel_8_5_len);
+    compat_props_add(m->compat_props, pc_rhel_8_5_compat,
+                     pc_rhel_8_5_compat_len);
 }
 
 DEFINE_PC_MACHINE(q35_rhel850, "pc-q35-rhel8.5.0", pc_q35_init_rhel850,
-- 
Gitee


From a52befcea95bb78098beb92037808f394f5ba7a4 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 26 Nov 2021 09:37:11 +0100
Subject: [PATCH 019/407] redhat: Add s390x machine type compatibility handling
 for the rebase to v6.2

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 80: Add s390x machine type compatibility handling for the rebase to v6.2
RH-Commit: [26/26] c45cf594604f6dd23954696b9c84d2025e328d11
RH-Bugzilla: 2022602
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

Add compatibility handling for the rhel8.5.0 machine type (and
recursively older, of course).
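
The "recursively older" part can be pictured with a minimal sketch,
assuming each older machine type first calls the options function of
the next newer one and then adds its own compat data. ToyMachineClass
and the toy_* functions are hypothetical and only model the call
chain; they are not the QEMU definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tags standing in for the hw_compat_rhel_* arrays. */
enum ToyCompat { TOY_COMPAT_RHEL_8_5, TOY_COMPAT_RHEL_8_4 };

#define TOY_MAX_COMPAT 8

/* Hypothetical, reduced model of a machine class. */
typedef struct ToyMachineClass {
    enum ToyCompat compat[TOY_MAX_COMPAT];
    size_t compat_len;
    bool prefer_sockets;
} ToyMachineClass;

static void toy_compat_add(ToyMachineClass *mc, enum ToyCompat c)
{
    if (mc->compat_len < TOY_MAX_COMPAT) {
        mc->compat[mc->compat_len++] = c;
    }
}

/* Settings introduced for the 8.5.0 machine type. */
static void toy_rhel850_class_options(ToyMachineClass *mc)
{
    toy_compat_add(mc, TOY_COMPAT_RHEL_8_5);
    mc->prefer_sockets = true;
}

/* The older type inherits everything from 8.5.0, then adds its own. */
static void toy_rhel840_class_options(ToyMachineClass *mc)
{
    toy_rhel850_class_options(mc);
    toy_compat_add(mc, TOY_COMPAT_RHEL_8_4);
}
```

With this layout a setting added for 8.5.0 reaches every older type
without touching their functions.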

Based on the following upstream commits:

 463e50da8b - s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z14 GA2
 30e398f796 - s390x/cpumodel: Add more feature to gen16 default model
 4a0af2930a - machine: Prefer cores over sockets in smp parsing since 6.2
 2b52619994 - machine: Move smp_prefer_sockets to struct SMPCompatProps

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 hw/s390x/s390-virtio-ccw.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 181856e6cf5..cf13c457d6b 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -1105,11 +1105,21 @@ DEFINE_CCW_MACHINE(2_4, "2.4", false);
 
 static void ccw_machine_rhel850_instance_options(MachineState *machine)
 {
+    static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V6_0 };
+
+    s390_set_qemu_cpu_model(0x2964, 13, 2, qemu_cpu_feat);
+
+    s390_cpudef_featoff_greater(16, 1, S390_FEAT_NNPA);
+    s390_cpudef_featoff_greater(16, 1, S390_FEAT_VECTOR_PACKED_DECIMAL_ENH2);
+    s390_cpudef_featoff_greater(16, 1, S390_FEAT_BEAR_ENH);
+    s390_cpudef_featoff_greater(16, 1, S390_FEAT_RDP);
+    s390_cpudef_featoff_greater(16, 1, S390_FEAT_PAI);
 }
 
 static void ccw_machine_rhel850_class_options(MachineClass *mc)
 {
     compat_props_add(mc->compat_props, hw_compat_rhel_8_5, hw_compat_rhel_8_5_len);
+    mc->smp_props.prefer_sockets = true;
 }
 DEFINE_CCW_MACHINE(rhel850, "rhel8.5.0", true);
 
-- 
Gitee


From f1b1018082732cdcc6fb7df279150886437afc25 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 10 Dec 2021 10:07:40 +0100
Subject: [PATCH 020/407] redhat: Add rhel8.6.0 machine type for s390x
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 90: Add rhel8.6.0 machine type for s390x
RH-Commit: [1/1] 91961fc52d708e6b30d7361fbab3572c5b5c1859
RH-Bugzilla: 2005325
RH-Acked-by: Greg Kurz <gkurz@redhat.com>
RH-Acked-by: Philippe Mathieu-Daudé <philmd@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2005325

The new machine type has better default values for the upcoming
"generation 16" mainframe.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 hw/s390x/s390-virtio-ccw.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index cf13c457d6b..9795eb94069 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -1103,10 +1103,21 @@ static void ccw_machine_2_4_class_options(MachineClass *mc)
 DEFINE_CCW_MACHINE(2_4, "2.4", false);
 #endif
 
+static void ccw_machine_rhel860_instance_options(MachineState *machine)
+{
+}
+
+static void ccw_machine_rhel860_class_options(MachineClass *mc)
+{
+}
+DEFINE_CCW_MACHINE(rhel860, "rhel8.6.0", true);
+
 static void ccw_machine_rhel850_instance_options(MachineState *machine)
 {
     static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V6_0 };
 
+    ccw_machine_rhel860_instance_options(machine);
+
     s390_set_qemu_cpu_model(0x2964, 13, 2, qemu_cpu_feat);
 
     s390_cpudef_featoff_greater(16, 1, S390_FEAT_NNPA);
@@ -1118,10 +1129,11 @@ static void ccw_machine_rhel850_instance_options(MachineState *machine)
 
 static void ccw_machine_rhel850_class_options(MachineClass *mc)
 {
+    ccw_machine_rhel860_class_options(mc);
     compat_props_add(mc->compat_props, hw_compat_rhel_8_5, hw_compat_rhel_8_5_len);
     mc->smp_props.prefer_sockets = true;
 }
-DEFINE_CCW_MACHINE(rhel850, "rhel8.5.0", true);
+DEFINE_CCW_MACHINE(rhel850, "rhel8.5.0", false);
 
 static void ccw_machine_rhel840_instance_options(MachineState *machine)
 {
-- 
Gitee


From 99598aaf8502fbf141b4af1d407b1632fe08c055 Mon Sep 17 00:00:00 2001
From: Eric Auger <eric.auger@redhat.com>
Date: Mon, 20 Dec 2021 15:50:44 +0100
Subject: [PATCH 021/407] hw/arm/virt: Register "iommu" as a class property

RH-Author: Eric Auger <eric.auger@redhat.com>
RH-MergeRequest: 95: hw/arm/virt: Add virt-rhel8.6.0 machine type
RH-Commit: [1/5] 74b01bb90213493db700d5bdf81dd99892571972
RH-Bugzilla: 2031039
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Andrew Jones <drjones@redhat.com>
RH-Acked-by: Gavin Shan <gshan@redhat.com>

branch: rhel-8.6.0
Brew: 42212069
Upstream: no

Register the "iommu" option as a class property. This mirrors what
was done in upstream commit b91def7b ("arm/virt: Register
most properties as class properties").

While we are at it we also move the "x-oem-id" and "x-oem-table-id"
registrations to the very end of the rhel_machine_class_init()
function. This makes our life easier when comparing with upstream.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/virt.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index e8941afd018..684ffce52e0 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3131,6 +3131,18 @@ static void rhel_machine_class_init(ObjectClass *oc, void *data)
                                           "Set GIC version. "
                                           "Valid values are 2, 3, host and max");
 
+    object_class_property_add_str(oc, "iommu", virt_get_iommu, virt_set_iommu);
+    object_class_property_set_description(oc, "iommu",
+                                          "Set the IOMMU type. "
+                                          "Valid values are none and smmuv3");
+
+    object_class_property_add_bool(oc, "default_bus_bypass_iommu",
+                                   virt_get_default_bus_bypass_iommu,
+                                   virt_set_default_bus_bypass_iommu);
+    object_class_property_set_description(oc, "default_bus_bypass_iommu",
+                                          "Set on/off to enable/disable "
+                                          "bypass_iommu for default root bus");
+
     object_class_property_add_str(oc, "x-oem-id",
                                   virt_get_oem_id,
                                   virt_set_oem_id);
@@ -3146,10 +3158,6 @@ static void rhel_machine_class_init(ObjectClass *oc, void *data)
                                           "Override the default value of field OEM Table ID "
                                           "in ACPI table header."
                                           "The string may be up to 8 bytes in size");
-    object_class_property_add_bool(oc, "default_bus_bypass_iommu",
-                                   virt_get_default_bus_bypass_iommu,
-                                   virt_set_default_bus_bypass_iommu);
-
 }
 
 static void rhel_virt_instance_init(Object *obj)
@@ -3183,10 +3191,6 @@ static void rhel_virt_instance_init(Object *obj)
 
     /* Default disallows iommu instantiation */
     vms->iommu = VIRT_IOMMU_NONE;
-    object_property_add_str(obj, "iommu", virt_get_iommu, virt_set_iommu);
-    object_property_set_description(obj, "iommu",
-                                    "Set the IOMMU type. "
-                                    "Valid values are none and smmuv3");
 
     /* Default disallows RAS instantiation and is non-configurable for RHEL */
     vms->ras = false;
-- 
Gitee


From 223a2daba1f7ee8b6815ae8f65cb2e219dc5a025 Mon Sep 17 00:00:00 2001
From: Eric Auger <eric.auger@redhat.com>
Date: Mon, 20 Dec 2021 16:04:59 +0100
Subject: [PATCH 022/407] hw/arm/virt: Register "its" as a class property

RH-Author: Eric Auger <eric.auger@redhat.com>
RH-MergeRequest: 95: hw/arm/virt: Add virt-rhel8.6.0 machine type
RH-Commit: [2/5] 4ddfa57495578127770f93689c4d9f111a12b91c
RH-Bugzilla: 2031039
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Andrew Jones <drjones@redhat.com>
RH-Acked-by: Gavin Shan <gshan@redhat.com>

branch: rhel-8.6.0
Brew: 42212069
Upstream: no

Register "its" as a class property.  This mirrors what was done
in commit 27edeeaafe43 ("virt: Register "its" as class property").

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/virt.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 684ffce52e0..d679391eb07 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3143,6 +3143,12 @@ static void rhel_machine_class_init(ObjectClass *oc, void *data)
                                           "Set on/off to enable/disable "
                                           "bypass_iommu for default root bus");
 
+    object_class_property_add_bool(oc, "its", virt_get_its,
+                                   virt_set_its);
+    object_class_property_set_description(oc, "its",
+                                          "Set on/off to enable/disable "
+                                          "ITS instantiation");
+
     object_class_property_add_str(oc, "x-oem-id",
                                   virt_get_oem_id,
                                   virt_set_oem_id);
@@ -3182,11 +3188,6 @@ static void rhel_virt_instance_init(Object *obj)
     } else {
         /* Default allows ITS instantiation */
         vms->its = true;
-        object_property_add_bool(obj, "its", virt_get_its,
-                                 virt_set_its);
-        object_property_set_description(obj, "its",
-                                        "Set on/off to enable/disable "
-                                        "ITS instantiation");
     }
 
     /* Default disallows iommu instantiation */
-- 
Gitee


From 96e8ee59ea2eb1e48aade04334066c2804275ec2 Mon Sep 17 00:00:00 2001
From: Eric Auger <eric.auger@redhat.com>
Date: Mon, 20 Dec 2021 15:58:38 +0100
Subject: [PATCH 023/407] hw/arm/virt: Rename default_bus_bypass_iommu

RH-Author: Eric Auger <eric.auger@redhat.com>
RH-MergeRequest: 95: hw/arm/virt: Add virt-rhel8.6.0 machine type
RH-Commit: [3/5] 3ed0425391dab7cf14c6e66fc1b2430be1152d6c
RH-Bugzilla: 2031039
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Andrew Jones <drjones@redhat.com>
RH-Acked-by: Gavin Shan <gshan@redhat.com>

branch: rhel-8.6.0
Brew: 42212069
Upstream: no

Rename "default_bus_bypass_iommu" into "default-bus-bypass-iommu".
This mirrors what was done in upstream commit:
9dad363a223 ("hw/arm/virt: Rename default_bus_bypass_iommu")

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/virt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d679391eb07..6a4173b6c38 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3136,10 +3136,10 @@ static void rhel_machine_class_init(ObjectClass *oc, void *data)
                                           "Set the IOMMU type. "
                                           "Valid values are none and smmuv3");
 
-    object_class_property_add_bool(oc, "default_bus_bypass_iommu",
+    object_class_property_add_bool(oc, "default-bus-bypass-iommu",
                                    virt_get_default_bus_bypass_iommu,
                                    virt_set_default_bus_bypass_iommu);
-    object_class_property_set_description(oc, "default_bus_bypass_iommu",
+    object_class_property_set_description(oc, "default-bus-bypass-iommu",
                                           "Set on/off to enable/disable "
                                           "bypass_iommu for default root bus");
 
-- 
Gitee


From 0b83d69b7324a5a4099b83a2ada2453ad99b1721 Mon Sep 17 00:00:00 2001
From: Eric Auger <eric.auger@redhat.com>
Date: Mon, 20 Dec 2021 15:24:24 +0100
Subject: [PATCH 024/407] hw/arm/virt: Add 8.6 machine type

RH-Author: Eric Auger <eric.auger@redhat.com>
RH-MergeRequest: 95: hw/arm/virt: Add virt-rhel8.6.0 machine type
RH-Commit: [4/5] d0df3e796d3e9a6ca2af1e3b33fc6021bcac5d09
RH-Bugzilla: 2031039
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Andrew Jones <drjones@redhat.com>
RH-Acked-by: Gavin Shan <gshan@redhat.com>

branch: rhel-8.6.0
Brew: 42212069
Upstream: no

Add 8.6 machine type.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/virt.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 6a4173b6c38..c9c17b9d45d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3228,17 +3228,23 @@ static void rhel_machine_init(void)
 }
 type_init(rhel_machine_init);
 
+static void rhel860_virt_options(MachineClass *mc)
+{
+    compat_props_add(mc->compat_props, arm_rhel_compat, arm_rhel_compat_len);
+}
+DEFINE_RHEL_MACHINE_AS_LATEST(8, 6, 0)
+
 static void rhel850_virt_options(MachineClass *mc)
 {
     VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
 
-    compat_props_add(mc->compat_props, arm_rhel_compat, arm_rhel_compat_len);
+    rhel860_virt_options(mc);
     compat_props_add(mc->compat_props, hw_compat_rhel_8_5, hw_compat_rhel_8_5_len);
     mc->smp_props.prefer_sockets = true;
     vmc->no_cpu_topology = true;
     vmc->no_tcg_its = true;
 }
-DEFINE_RHEL_MACHINE_AS_LATEST(8, 5, 0)
+DEFINE_RHEL_MACHINE(8, 5, 0)
 
 static void rhel840_virt_options(MachineClass *mc)
 {
-- 
Gitee


From cfdd39abe42d8907858748bd14e7405f9bd8b69c Mon Sep 17 00:00:00 2001
From: Eric Auger <eric.auger@redhat.com>
Date: Wed, 5 Jan 2022 16:17:10 +0100
Subject: [PATCH 025/407] hw/arm/virt: Check no_tcg_its and minor style changes

RH-Author: Eric Auger <eric.auger@redhat.com>
RH-MergeRequest: 95: hw/arm/virt: Add virt-rhel8.6.0 machine type
RH-Commit: [5/5] 57e77446ff5a1a7efe152b2c907c0a0ca5487ab7
RH-Bugzilla: 2031039
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Andrew Jones <drjones@redhat.com>
RH-Acked-by: Gavin Shan <gshan@redhat.com>

branch: rhel-8.6.0
Brew: 42212069
Upstream: no

Truly allow TCG ITS instantiation according to the no_tcg_its
class flag. Otherwise tcg_its is always false.

We also take advantage of this patch to make some minor
non-functional style changes to be closer to the upstream code.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/virt.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c9c17b9d45d..dbf0a6d62fe 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3157,6 +3157,7 @@ static void rhel_machine_class_init(ObjectClass *oc, void *data)
                                           "in ACPI table header."
                                           "The string may be up to 6 bytes in size");
 
+
     object_class_property_add_str(oc, "x-oem-table-id",
                                   virt_get_oem_table_id,
                                   virt_set_oem_table_id);
@@ -3164,6 +3165,7 @@ static void rhel_machine_class_init(ObjectClass *oc, void *data)
                                           "Override the default value of field OEM Table ID "
                                           "in ACPI table header."
                                           "The string may be up to 8 bytes in size");
+
 }
 
 static void rhel_virt_instance_init(Object *obj)
@@ -3188,24 +3190,32 @@ static void rhel_virt_instance_init(Object *obj)
     } else {
         /* Default allows ITS instantiation */
         vms->its = true;
+
+        if (vmc->no_tcg_its) {
+            vms->tcg_its = false;
+        } else {
+            vms->tcg_its = true;
+        }
     }
 
     /* Default disallows iommu instantiation */
     vms->iommu = VIRT_IOMMU_NONE;
 
+    /* The default root bus is attached to iommu by default */
+    vms->default_bus_bypass_iommu = false;
+
     /* Default disallows RAS instantiation and is non-configurable for RHEL */
     vms->ras = false;
 
     /* MTE is disabled by default and non-configurable for RHEL */
     vms->mte = false;
 
-    vms->default_bus_bypass_iommu = false;
     vms->irqmap = a15irqmap;
 
     virt_flash_create(vms);
+
     vms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
     vms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
-
 }
 
 static const TypeInfo rhel_machine_info = {
-- 
Gitee


From bc3739e08213b468ab3e3e682637e4eb85008bd7 Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Date: Tue, 7 Dec 2021 18:39:47 +0000
Subject: [PATCH 026/407] rhel machine types/x86: set prefer_sockets

RH-Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-MergeRequest: 96: Fixup x86 prefer_sockets
RH-Commit: [1/1] 29578bcc2f5d3408c155c155cdfa10b7a12faf4d
RH-Bugzilla: 2029582
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: quintela1 <quintela@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

When I fixed up the machine types for 8.5 I missed the
  prefer_sockets = true

Add them in; it looks like Power and ARM already have them, and I see
them in thuth's s390 patch.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/i386/pc_piix.c | 1 +
 hw/i386/pc_q35.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 37fab007331..c30057c4431 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -1020,6 +1020,7 @@ static void pc_machine_rhel7_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_rhel_compat, pc_rhel_compat_len);
     m->alias = "pc";
     m->is_default = 1;
+    m->smp_props.prefer_sockets = true;
 }
 
 static void pc_init_rhel760(MachineState *machine)
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 78876e1101c..f6e77bca0ea 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -662,6 +662,7 @@ static void pc_q35_machine_rhel850_options(MachineClass *m)
                      hw_compat_rhel_8_5_len);
     compat_props_add(m->compat_props, pc_rhel_8_5_compat,
                      pc_rhel_8_5_compat_len);
+    m->smp_props.prefer_sockets = true;
 }
 
 DEFINE_PC_MACHINE(q35_rhel850, "pc-q35-rhel8.5.0", pc_q35_init_rhel850,
-- 
Gitee


From 00905270f01c45e364364c14a1ab42bea71f88a5 Mon Sep 17 00:00:00 2001
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 21 Dec 2021 09:45:44 -0500
Subject: [PATCH 027/407] acpi: validate hotplug selector on access
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 97: acpi: validate hotplug selector on access
RH-Commit: [1/1] 79bcfb0df0091e2b716d2e1c545f047b3409c26c (jmaloy/qemu-kvm)
RH-Bugzilla: 2036580
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>

When the bus is looked up on a PCI write, we didn't
validate that the lookup succeeded.
Fuzzers can thus trigger a QEMU crash by dereferencing the NULL
bus pointer.
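
The shape of the fix can be sketched as follows, assuming a lookup
that returns NULL for an out-of-range selector. ToyBus and the toy_*
functions are hypothetical and stand in for the real hotplug code.

```c
#include <stddef.h>

/* Hypothetical bus model: only tracks how many children are attached. */
typedef struct ToyBus {
    int num_children;
} ToyBus;

#define TOY_NUM_BUSES 2

static ToyBus toy_buses[TOY_NUM_BUSES] = { { 3 }, { 1 } };

/* Returns NULL when the guest-supplied selector names no bus. */
static ToyBus *toy_find_bus(unsigned int select)
{
    return select < TOY_NUM_BUSES ? &toy_buses[select] : NULL;
}

/* Ejects all children of the selected bus; returns how many were ejected. */
static int toy_eject_all(unsigned int select)
{
    ToyBus *bus = toy_find_bus(select);
    int ejected;

    if (!bus) {
        /* Invalid selector: ignore the write instead of dereferencing NULL. */
        return 0;
    }

    ejected = bus->num_children;
    bus->num_children = 0;
    return ejected;
}
```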

Fixes: b32bd763a1 ("pci: introduce acpi-index property for PCI device")
Fixes: CVE-2021-4158
Cc: "Igor Mammedov" <imammedo@redhat.com>
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/770
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
(cherry picked from commit 9bd6565ccee68f72d5012e24646e12a1c662827e)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/acpi/pcihp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index 30405b5113d..a5e182dd3a3 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -491,6 +491,9 @@ static void pci_write(void *opaque, hwaddr addr, uint64_t data,
         }
 
         bus = acpi_pcihp_find_hotplug_bus(s, s->hotplug_select);
+        if (!bus) {
+            break;
+        }
         QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
             Object *o = OBJECT(kid->child);
             PCIDevice *dev = PCI_DEVICE(o);
-- 
Gitee


From a386221871603c72279e7a4d6e762e5832ce0dd3 Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Date: Tue, 11 Jan 2022 18:29:31 +0000
Subject: [PATCH 028/407] x86: Add q35 RHEL 8.6.0 machine type
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-MergeRequest: 99: x86: Add q35 RHEL 8.6.0 machine type
RH-Commit: [1/1] a694724b6fa972e312bb76b5569bc979d6c596ef
RH-Bugzilla: 2031035
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

Add the new 8.6.0 machine type. Note that while the -AV
notation has gone from the product naming, we keep the smbios
definitions the same for consistency.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/i386/pc_q35.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index f6e77bca0ea..5559261d9e3 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -646,6 +646,24 @@ static void pc_q35_machine_rhel_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_rhel_compat, pc_rhel_compat_len);
 }
 
+static void pc_q35_init_rhel860(MachineState *machine)
+{
+    pc_q35_init(machine);
+}
+
+static void pc_q35_machine_rhel860_options(MachineClass *m)
+{
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    pc_q35_machine_rhel_options(m);
+    m->desc = "RHEL-8.6.0 PC (Q35 + ICH9, 2009)";
+    pcmc->smbios_stream_product = "RHEL-AV";
+    pcmc->smbios_stream_version = "8.6.0";
+}
+
+DEFINE_PC_MACHINE(q35_rhel860, "pc-q35-rhel8.6.0", pc_q35_init_rhel860,
+                  pc_q35_machine_rhel860_options);
+
+
 static void pc_q35_init_rhel850(MachineState *machine)
 {
     pc_q35_init(machine);
@@ -654,8 +672,9 @@ static void pc_q35_init_rhel850(MachineState *machine)
 static void pc_q35_machine_rhel850_options(MachineClass *m)
 {
     PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
-    pc_q35_machine_rhel_options(m);
+    pc_q35_machine_rhel860_options(m);
     m->desc = "RHEL-8.5.0 PC (Q35 + ICH9, 2009)";
+    m->alias = NULL;
     pcmc->smbios_stream_product = "RHEL-AV";
     pcmc->smbios_stream_version = "8.5.0";
     compat_props_add(m->compat_props, hw_compat_rhel_8_5,
-- 
Gitee


From ede68bc6f3a999eebe6f6fca3b93ef307dc4758e Mon Sep 17 00:00:00 2001
From: Vivek Goyal <vgoyal@redhat.com>
Date: Tue, 25 Jan 2022 13:51:14 -0500
Subject: [PATCH 029/407] virtiofsd: Drop membership of all supplementary
 groups (CVE-2022-0358)

RH-Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-MergeRequest: 102: virtiofsd: Drop membership of all supplementary groups (CVE-2022-0358)
RH-Commit: [1/1] 93e56c88277fec8e42559a899d32b80fac4a923f
RH-Bugzilla: 2046198
RH-Acked-by: Greg Kurz <gkurz@redhat.com>
RH-Acked-by: Sergio Lopez <None>
RH-Acked-by: Laszlo Ersek <lersek@redhat.com>

At the start, drop membership of all supplementary groups. They are
not required.

If we have membership of the "root" supplementary group, then when we
switch uid/gid using setresuid/setresgid we still retain membership of
the existing supplementary groups. That can allow some operations which
are not normally allowed.

For example, suppose root in the guest creates a dir as follows.

$ mkdir -m 03777 test_dir

This sets SGID on the dir and also allows unprivileged users to write
into it.

Now, as an unprivileged user, open a file as follows.

$ su test
$ fd = open("test_dir/priviledge_id", O_RDWR|O_CREAT|O_EXCL, 02755);

This will create an SGID executable in test_dir/.

That's a problem because now an unprivileged user can execute it,
get egid=0 and get access to resources owned by the "root" group. This is
privilege escalation.
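
The two octal modes used in the example can be decoded with a short
sketch. The mode_* helper names are made up for illustration; the
S_* constants are the standard ones from <sys/stat.h>. In 03777 the
leading 3 is setgid (02000) plus sticky (01000); in 02755 the leading
2 is setgid alone.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* True if the set-group-ID bit (02000) is set. */
static bool mode_has_setgid(mode_t mode)
{
    return (mode & S_ISGID) != 0;
}

/* True if "other" users may write (0002). */
static bool mode_world_writable(mode_t mode)
{
    return (mode & S_IWOTH) != 0;
}

/* True if the owning group may execute (0010). */
static bool mode_group_executable(mode_t mode)
{
    return (mode & S_IXGRP) != 0;
}
```

So test_dir (03777) is both setgid and world-writable, and the file
requested with 02755 is a group-executable setgid file.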

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2044863
Fixes: CVE-2022-0358
Reported-by: JIETAO XIAO <shawtao1125@gmail.com>
Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <YfBGoriS38eBQrAb@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Fixed missing {}'s style nit
(cherry picked from commit 449e8171f96a6a944d1f3b7d3627ae059eae21ca)
---
 tools/virtiofsd/passthrough_ll.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 64b5b4fbb18..b3d0674f6d2 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -54,6 +54,7 @@
 #include <sys/wait.h>
 #include <sys/xattr.h>
 #include <syslog.h>
+#include <grp.h>
 
 #include "qemu/cutils.h"
 #include "passthrough_helpers.h"
@@ -1161,6 +1162,30 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
 #define OURSYS_setresuid SYS_setresuid
 #endif
 
+static void drop_supplementary_groups(void)
+{
+    int ret;
+
+    ret = getgroups(0, NULL);
+    if (ret == -1) {
+        fuse_log(FUSE_LOG_ERR, "getgroups() failed with error=%d:%s\n",
+                 errno, strerror(errno));
+        exit(1);
+    }
+
+    if (!ret) {
+        return;
+    }
+
+    /* Drop all supplementary groups. We should not need it */
+    ret = setgroups(0, NULL);
+    if (ret == -1) {
+        fuse_log(FUSE_LOG_ERR, "setgroups() failed with error=%d:%s\n",
+                 errno, strerror(errno));
+        exit(1);
+    }
+}
+
 /*
  * Change to uid/gid of caller so that file is created with
  * ownership of caller.
@@ -3926,6 +3951,8 @@ int main(int argc, char *argv[])
 
     qemu_init_exec_dir(argv[0]);
 
+    drop_supplementary_groups();
+
     pthread_mutex_init(&lo.mutex, NULL);
     lo.inodes = g_hash_table_new(lo_key_hash, lo_key_equal);
     lo.root.fd = -1;
-- 
Gitee


From 801103575244f4250ac875865aaa6fd1e99a87d3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20P=2E=20Berrang=C3=A9?= <berrange@redhat.com>
Date: Wed, 5 Jan 2022 12:38:47 +0000
Subject: [PATCH 030/407] softmmu: fix device deletion events with -device JSON
 syntax
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Kevin Wolf <kwolf@redhat.com>
RH-MergeRequest: 103: Fix hot unplug of devices created with -device JSON syntax
RH-Commit: [1/1] 64cbc78bcb46bdb24d5f589ceb5ad598c388e447
RH-Bugzilla: 2033279
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Jano Tomko <None>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>

The -device JSON syntax impl leaks a reference on the created
DeviceState instance. As a result when you hot-unplug the
device, the device_finalize method won't be called and thus
it will fail to emit the required DEVICE_DELETED event.

A 'json-cli' feature was previously added to the 'device_add' QMP
command in the QAPI schema to indicate to mgmt apps that -device
supports JSON syntax. Given the hotplug bug, that feature flag is
not usable for its purpose, so we add a new 'json-cli-hotplug'
feature to indicate that -device supports JSON without breaking
hotplug.
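
As a rough illustration of how a management application might probe
for the new flag, the sketch below scans a reply shaped like the
output of query-qmp-schema. The helper name command_has_feature and
the sample data are hypothetical and hand-written for this note; a
real reply contains many more entries and fields.

```python
def command_has_feature(schema, command, feature):
    """Return True if `command` in a schema reply lists `feature`."""
    for entry in schema:
        if (entry.get("meta-type") == "command"
                and entry.get("name") == command):
            # 'features' is optional, so treat a missing key as empty.
            return feature in entry.get("features", [])
    return False


# Hand-written excerpt (illustrative only, not captured from QEMU).
sample_schema = [
    {"name": "device_add", "meta-type": "command",
     "features": ["json-cli", "json-cli-hotplug"]},
    {"name": "device_del", "meta-type": "command"},
]

# Only use JSON syntax for -device when the hotplug-safe flag is there.
use_json_device = command_has_feature(
    sample_schema, "device_add", "json-cli-hotplug")
```

Checking for 'json-cli-hotplug' rather than 'json-cli' means a build
that still has the reference leak is treated as not supporting JSON
syntax for -device.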

Fixes: 5dacda5167560b3af8eadbce5814f60ba44b467e
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/802
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220105123847.4047954-2-berrange@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 64b4529a432507ee84a924be69a03432639e87ba)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qapi/qdev.json                 |  5 ++++-
 softmmu/vl.c                   |  4 +++-
 tests/qtest/device-plug-test.c | 19 +++++++++++++++++++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/qapi/qdev.json b/qapi/qdev.json
index 69656b14df2..26cd10106b2 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -44,6 +44,9 @@
 # @json-cli: If present, the "-device" command line option supports JSON
 #            syntax with a structure identical to the arguments of this
 #            command.
+# @json-cli-hotplug: If present, the "-device" command line option supports JSON
+#                    syntax without the reference counting leak that broke
+#                    hot-unplug
 #
 # Notes:
 #
@@ -74,7 +77,7 @@
 { 'command': 'device_add',
   'data': {'driver': 'str', '*bus': 'str', '*id': 'str'},
   'gen': false, # so we can get the additional arguments
-  'features': ['json-cli'] }
+  'features': ['json-cli', 'json-cli-hotplug'] }
 
 ##
 # @device_del:
diff --git a/softmmu/vl.c b/softmmu/vl.c
index d46b8fb4ab6..b3829e2edd1 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2690,6 +2690,7 @@ static void qemu_create_cli_devices(void)
     qemu_opts_foreach(qemu_find_opts("device"),
                       device_init_func, NULL, &error_fatal);
     QTAILQ_FOREACH(opt, &device_opts, next) {
+        DeviceState *dev;
         loc_push_restore(&opt->loc);
         /*
          * TODO Eventually we should call qmp_device_add() here to make sure it
@@ -2698,7 +2699,8 @@ static void qemu_create_cli_devices(void)
          * from the start, so call qdev_device_add_from_qdict() directly for
          * now.
          */
-        qdev_device_add_from_qdict(opt->opts, true, &error_fatal);
+        dev = qdev_device_add_from_qdict(opt->opts, true, &error_fatal);
+        object_unref(OBJECT(dev));
         loc_pop(&opt->loc);
     }
     rom_reset_order_override();
diff --git a/tests/qtest/device-plug-test.c b/tests/qtest/device-plug-test.c
index 559d47727af..ad79bd4c143 100644
--- a/tests/qtest/device-plug-test.c
+++ b/tests/qtest/device-plug-test.c
@@ -77,6 +77,23 @@ static void test_pci_unplug_request(void)
     qtest_quit(qtest);
 }
 
+static void test_pci_unplug_json_request(void)
+{
+    QTestState *qtest = qtest_initf(
+        "-device '{\"driver\": \"virtio-mouse-pci\", \"id\": \"dev0\"}'");
+
+    /*
+     * Request device removal. As the guest is not running, the request won't
+     * be processed. However during system reset, the removal will be
+     * handled, removing the device.
+     */
+    device_del(qtest, "dev0");
+    system_reset(qtest);
+    wait_device_deleted_event(qtest, "dev0");
+
+    qtest_quit(qtest);
+}
+
 static void test_ccw_unplug(void)
 {
     QTestState *qtest = qtest_initf("-device virtio-balloon-ccw,id=dev0");
@@ -145,6 +162,8 @@ int main(int argc, char **argv)
      */
     qtest_add_func("/device-plug/pci-unplug-request",
                    test_pci_unplug_request);
+    qtest_add_func("/device-plug/pci-unplug-json-request",
+                   test_pci_unplug_json_request);
 
     if (!strcmp(arch, "s390x")) {
         qtest_add_func("/device-plug/ccw-unplug",
-- 
Gitee


From 3c92c24582927a7063f6423a797694ad0cb15e91 Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Tue, 11 Jan 2022 15:36:12 +0000
Subject: [PATCH 031/407] block-backend: prevent dangling BDS pointers across
 aio_poll()

RH-Author: Stefan Hajnoczi <stefanha@redhat.com>
RH-MergeRequest: 109: block-backend: prevent dangling BDS pointers across aio_poll()
RH-Commit: [1/2] da5a59eddff0dc10be7de8e291fa675143d11d73
RH-Bugzilla: 2021778 2036178
RH-Acked-by: Hanna Reitz <hreitz@redhat.com>
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>

The BlockBackend root child can change when aio_poll() is invoked. This
happens when a temporary filter node is removed upon blockjob
completion, for example.

Functions in block/block-backend.c must be aware of this when using a
blk_bs() pointer across aio_poll() because the BlockDriverState refcnt
may reach 0, resulting in a stale pointer.

One example is scsi_device_purge_requests(), which calls blk_drain() to
wait for in-flight requests to cancel. If the backup blockjob is active,
then the BlockBackend root child is a temporary filter BDS owned by the
blockjob. The blockjob can complete during bdrv_drained_begin() and the
last reference to the BDS is released when the temporary filter node is
removed. This results in a use-after-free when blk_drain() calls
bdrv_drained_end(bs) on the dangling pointer.

Explicitly hold a reference to bs across block APIs that invoke
aio_poll().
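
The pattern can be pictured with a toy model. The Python sketch below
is not QEMU code: Node, Backend and drain are hypothetical stand-ins
for a refcounted BlockDriverState, a BlockBackend whose root changes
during a poll, and a blk_drain()-like caller. It only shows why taking
an extra reference before polling keeps the pointer valid.

```python
class Node:
    """Toy stand-in for a refcounted BlockDriverState."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1       # reference held by the owner (e.g. a job)
        self.freed = False

    def ref(self):
        assert not self.freed
        self.refcnt += 1

    def unref(self):
        assert not self.freed and self.refcnt > 0
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True


class Backend:
    """Toy stand-in for a BlockBackend whose root can change in a poll."""

    def __init__(self, root):
        self.root = root

    def poll(self):
        # Models a job completing inside the poll: the temporary node is
        # detached and the owner's reference is dropped.
        old, self.root = self.root, None
        old.unref()


def drain(backend, hold_ref):
    """Return True if the saved node was freed while still in use."""
    node = backend.root
    if hold_ref:
        node.ref()
    backend.poll()            # the root may be replaced here
    dangling = node.freed     # in C this would be a use-after-free
    if hold_ref:
        node.unref()
    return dangling
```

Without the extra reference the saved node is already freed when the
caller touches it again; with it, the node is released only by the
caller's final unref.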

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2021778
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2036178
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220111153613.25453-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 1e3552dbd28359d35967b7c28dc86cde1bc29205)
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/block-backend.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 12ef80ea170..23e727199b9 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -822,16 +822,22 @@ BlockBackend *blk_by_public(BlockBackendPublic *public)
 void blk_remove_bs(BlockBackend *blk)
 {
     ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
-    BlockDriverState *bs;
     BdrvChild *root;
 
     notifier_list_notify(&blk->remove_bs_notifiers, blk);
     if (tgm->throttle_state) {
-        bs = blk_bs(blk);
+        BlockDriverState *bs = blk_bs(blk);
+
+        /*
+         * Take a ref in case blk_bs() changes across bdrv_drained_begin(), for
+         * example, if a temporary filter node is removed by a blockjob.
+         */
+        bdrv_ref(bs);
         bdrv_drained_begin(bs);
         throttle_group_detach_aio_context(tgm);
         throttle_group_attach_aio_context(tgm, qemu_get_aio_context());
         bdrv_drained_end(bs);
+        bdrv_unref(bs);
     }
 
     blk_update_root_state(blk);
@@ -1705,6 +1711,7 @@ void blk_drain(BlockBackend *blk)
     BlockDriverState *bs = blk_bs(blk);
 
     if (bs) {
+        bdrv_ref(bs);
         bdrv_drained_begin(bs);
     }
 
@@ -1714,6 +1721,7 @@ void blk_drain(BlockBackend *blk)
 
     if (bs) {
         bdrv_drained_end(bs);
+        bdrv_unref(bs);
     }
 }
 
@@ -2044,10 +2052,13 @@ static int blk_do_set_aio_context(BlockBackend *blk, AioContext *new_context,
     int ret;
 
     if (bs) {
+        bdrv_ref(bs);
+
         if (update_root_node) {
             ret = bdrv_child_try_set_aio_context(bs, new_context, blk->root,
                                                  errp);
             if (ret < 0) {
+                bdrv_unref(bs);
                 return ret;
             }
         }
@@ -2057,6 +2068,8 @@ static int blk_do_set_aio_context(BlockBackend *blk, AioContext *new_context,
             throttle_group_attach_aio_context(tgm, new_context);
             bdrv_drained_end(bs);
         }
+
+        bdrv_unref(bs);
     }
 
     blk->ctx = new_context;
@@ -2326,11 +2339,13 @@ void blk_io_limits_disable(BlockBackend *blk)
     ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
     assert(tgm->throttle_state);
     if (bs) {
+        bdrv_ref(bs);
         bdrv_drained_begin(bs);
     }
     throttle_group_unregister_tgm(tgm);
     if (bs) {
         bdrv_drained_end(bs);
+        bdrv_unref(bs);
     }
 }
 
-- 
Gitee


From ab181b53883d1b745e2a58f7eb5d1758b98ffb26 Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Tue, 11 Jan 2022 15:36:13 +0000
Subject: [PATCH 032/407] iotests/stream-error-on-reset: New test

RH-Author: Stefan Hajnoczi <stefanha@redhat.com>
RH-MergeRequest: 109: block-backend: prevent dangling BDS pointers across aio_poll()
RH-Commit: [2/2] 0ecb7010d9c121398e7ee22ee47dd85d89bcd941
RH-Bugzilla: 2021778 2036178
RH-Acked-by: Hanna Reitz <hreitz@redhat.com>
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>

Author: Hanna Reitz <hreitz@redhat.com>

Test the following scenario:
- Simple stream block in two-layer backing chain (base and top)
- The job is drained via blk_drain(), then an error occurs while the job
  settles the ongoing request
- And so the job completes while in blk_drain()

This was reported as a segfault, but is fixed by "block-backend: prevent
dangling BDS pointers across aio_poll()".

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2036178
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220111153613.25453-3-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 2ca1d5d6b91f8a52a5c651f660b2f58c94bf97ba)
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 .../qemu-iotests/tests/stream-error-on-reset  | 140 ++++++++++++++++++
 .../tests/stream-error-on-reset.out           |   5 +
 2 files changed, 145 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/stream-error-on-reset
 create mode 100644 tests/qemu-iotests/tests/stream-error-on-reset.out

diff --git a/tests/qemu-iotests/tests/stream-error-on-reset b/tests/qemu-iotests/tests/stream-error-on-reset
new file mode 100755
index 00000000000..7eaedb24d7b
--- /dev/null
+++ b/tests/qemu-iotests/tests/stream-error-on-reset
@@ -0,0 +1,140 @@
+#!/usr/bin/env python3
+# group: rw quick
+#
+# Test what happens when a stream job completes in a blk_drain().
+#
+# Copyright (C) 2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import iotests
+from iotests import imgfmt, qemu_img_create, qemu_io_silent, QMPTestCase
+
+
+image_size = 1 * 1024 * 1024
+data_size = 64 * 1024
+base = os.path.join(iotests.test_dir, 'base.img')
+top = os.path.join(iotests.test_dir, 'top.img')
+
+
+# We want to test completing a stream job in a blk_drain().
+#
+# The blk_drain() we are going to use is a virtio-scsi device resetting,
+# which we can trigger by resetting the system.
+#
+# In order to have the block job complete on drain, we (1) throttle its
+# base image so we can start the drain after it has begun, but before it
+# completes, and (2) make it encounter an I/O error on the ensuing write.
+# (If it completes regularly, the completion happens after the drain for
+# some reason.)
+
+class TestStreamErrorOnReset(QMPTestCase):
+    def setUp(self) -> None:
+        """
+        Create two images:
+        - base image {base} with {data_size} bytes allocated
+        - top image {top} without any data allocated
+
+        And the following VM configuration:
+        - base image throttled to {data_size}
+        - top image with a blkdebug configuration so the first write access
+          to it will result in an error
+        - top image is attached to a virtio-scsi device
+        """
+        assert qemu_img_create('-f', imgfmt, base, str(image_size)) == 0
+        assert qemu_io_silent('-c', f'write 0 {data_size}', base) == 0
+        assert qemu_img_create('-f', imgfmt, top, str(image_size)) == 0
+
+        self.vm = iotests.VM()
+        self.vm.add_args('-accel', 'tcg') # Make throttling work properly
+        self.vm.add_object(self.vm.qmp_to_opts({
+            'qom-type': 'throttle-group',
+            'id': 'thrgr',
+            'x-bps-total': str(data_size)
+        }))
+        self.vm.add_blockdev(self.vm.qmp_to_opts({
+            'driver': imgfmt,
+            'node-name': 'base',
+            'file': {
+                'driver': 'throttle',
+                'throttle-group': 'thrgr',
+                'file': {
+                    'driver': 'file',
+                    'filename': base
+                }
+            }
+        }))
+        self.vm.add_blockdev(self.vm.qmp_to_opts({
+            'driver': imgfmt,
+            'node-name': 'top',
+            'file': {
+                'driver': 'blkdebug',
+                'node-name': 'top-blkdebug',
+                'inject-error': [{
+                    'event': 'pwritev',
+                    'immediately': 'true',
+                    'once': 'true'
+                }],
+                'image': {
+                    'driver': 'file',
+                    'filename': top
+                }
+            },
+            'backing': 'base'
+        }))
+        self.vm.add_device(self.vm.qmp_to_opts({
+            'driver': 'virtio-scsi',
+            'id': 'vscsi'
+        }))
+        self.vm.add_device(self.vm.qmp_to_opts({
+            'driver': 'scsi-hd',
+            'bus': 'vscsi.0',
+            'drive': 'top'
+        }))
+        self.vm.launch()
+
+    def tearDown(self) -> None:
+        self.vm.shutdown()
+        os.remove(top)
+        os.remove(base)
+
+    def test_stream_error_on_reset(self) -> None:
+        # Launch a stream job, which will take at least a second to
+        # complete, because the base image is throttled (so we can
+        # get in between it having started and it having completed)
+        res = self.vm.qmp('block-stream', job_id='stream', device='top')
+        self.assert_qmp(res, 'return', {})
+
+        while True:
+            ev = self.vm.event_wait('JOB_STATUS_CHANGE')
+            if ev['data']['status'] == 'running':
+                # Once the stream job is running, reset the system, which
+                # forces the virtio-scsi device to be reset, thus draining
+                # the stream job, and making it complete.  Completing
+                # inside of that drain should not result in a segfault.
+                res = self.vm.qmp('system_reset')
+                self.assert_qmp(res, 'return', {})
+            elif ev['data']['status'] == 'null':
+                # The test is done once the job is gone
+                break
+
+
+if __name__ == '__main__':
+    # Passes with any format with backing file support, but qed and
+    # qcow1 do not seem to exercise the used-to-be problematic code
+    # path, so there is no point in having them in this list
+    iotests.main(supported_fmts=['qcow2', 'vmdk'],
+                 supported_protocols=['file'])
diff --git a/tests/qemu-iotests/tests/stream-error-on-reset.out b/tests/qemu-iotests/tests/stream-error-on-reset.out
new file mode 100644
index 00000000000..ae1213e6f86
--- /dev/null
+++ b/tests/qemu-iotests/tests/stream-error-on-reset.out
@@ -0,0 +1,5 @@
+.
+----------------------------------------------------------------------
+Ran 1 tests
+
+OK
-- 
Gitee


From 848ac0f16f1aaffae46ac1f1b4a66e990c306b43 Mon Sep 17 00:00:00 2001
From: Peter Lieven <pl@kamp.de>
Date: Thu, 13 Jan 2022 15:44:25 +0100
Subject: [PATCH 033/407] block/rbd: fix handling of holes in
 .bdrv_co_block_status

RH-Author: Stefano Garzarella <sgarzare@redhat.com>
RH-MergeRequest: 110: block/rbd: fix handling of holes in .bdrv_co_block_status
RH-Commit: [1/2] 352656a5c77cc7855b476c3559a10c6aa64a4f58
RH-Bugzilla: 2037135
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Hanna Reitz <hreitz@redhat.com>

The assumption that we can't hit a hole if we do not diff against a snapshot was wrong.

We can see a hole in an image if we diff against base if there exists an older snapshot
of the image and we have discarded blocks in the image where the snapshot has data.

Fix this by simply handling a hole like an unallocated area. There are no callbacks
for unallocated areas so just bail out if we hit a hole.

Fixes: 0347a8fd4c3faaedf119be04c197804be40a384b
Suggested-by: Ilya Dryomov <idryomov@gmail.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20220113144426.4036493-2-pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 9e302f64bb407a9bb097b626da97228c2654cfee)
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 block/rbd.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index def96292e0e..20bb896c4af 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1279,11 +1279,11 @@ static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
     RBDDiffIterateReq *req = opaque;
 
     assert(req->offs + req->bytes <= offs);
-    /*
-     * we do not diff against a snapshot so we should never receive a callback
-     * for a hole.
-     */
-    assert(exists);
+
+    /* treat a hole like an unallocated area and bail out */
+    if (!exists) {
+        return 0;
+    }
 
     if (!req->exists && offs > req->offs) {
         /*
-- 
Gitee


From 78a49d53fa4803c2ecd72b8327dba4d58405b531 Mon Sep 17 00:00:00 2001
From: Peter Lieven <pl@kamp.de>
Date: Thu, 13 Jan 2022 15:44:26 +0100
Subject: [PATCH 034/407] block/rbd: workaround for ceph issue #53784

RH-Author: Stefano Garzarella <sgarzare@redhat.com>
RH-MergeRequest: 110: block/rbd: fix handling of holes in .bdrv_co_block_status
RH-Commit: [2/2] 1384557462e89bb539d0d25a1a471ad738fb9e89
RH-Bugzilla: 2037135
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Hanna Reitz <hreitz@redhat.com>

librbd had a bug until early 2022 that affected all versions of ceph that
supported fast-diff. This bug results in reporting of incorrect offsets
if the offset parameter to rbd_diff_iterate2 is not object aligned.

This patch works around this bug for pre-Quincy versions of librbd.
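
The offset arithmetic of the workaround can be sketched in Python as
below. The function names are hypothetical, and the sketch assumes a
power-of-two object size, which the mask-based rounding relies on. It
mirrors the rounding only and leaves out the striping and clone checks.

```python
def align_down_request(offset, length, object_size):
    """Round a request down to an object boundary.

    Returns (aligned_offset, widened_length, head), where head is the
    number of bytes added in front of the original offset.
    """
    # The mask trick only works for power-of-two object sizes.
    assert object_size > 0 and object_size & (object_size - 1) == 0
    head = offset & (object_size - 1)
    return offset - head, length + head, head


def trim_reported_length(reported, head):
    """Drop the extra head bytes from the length reported back."""
    assert reported > head
    return reported - head


MiB = 1024 * 1024
# A 1 MiB request at offset 5 MiB with 4 MiB objects starts 1 MiB into
# the second object, so it is widened to begin at 4 MiB.
aligned, widened, head = align_down_request(5 * MiB, 1 * MiB, 4 * MiB)
```

The length handed back to the caller has to be reduced by head again,
since the caller asked about the original, unaligned offset.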

Fixes: 0347a8fd4c3faaedf119be04c197804be40a384b
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20220113144426.4036493-3-pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit fc176116cdea816ceb8dd969080b2b95f58edbc0)
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 block/rbd.c | 42 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 20bb896c4af..8f183eba2a4 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1320,6 +1320,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
     int status, r;
     RBDDiffIterateReq req = { .offs = offset };
     uint64_t features, flags;
+    uint64_t head = 0;
 
     assert(offset + bytes <= s->image_size);
 
@@ -1347,7 +1348,43 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
         return status;
     }
 
-    r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
+#if LIBRBD_VERSION_CODE < LIBRBD_VERSION(1, 17, 0)
+    /*
+     * librbd had a bug until early 2022 that affected all versions of ceph that
+     * supported fast-diff. This bug results in reporting of incorrect offsets
+     * if the offset parameter to rbd_diff_iterate2 is not object aligned.
+     * Work around this bug by rounding down the offset to object boundaries.
+     * This is OK because we call rbd_diff_iterate2 with whole_object = true.
+     * However, this workaround only works for non cloned images with default
+     * striping.
+     *
+     * See: https://tracker.ceph.com/issues/53784
+     */
+
+    /* check if RBD image has non-default striping enabled */
+    if (features & RBD_FEATURE_STRIPINGV2) {
+        return status;
+    }
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
+    /*
+     * check if RBD image is a clone (= has a parent).
+     *
+     * rbd_get_parent_info is deprecated from Nautilus onwards, but the
+     * replacement rbd_get_parent is not present in Luminous and Mimic.
+     */
+    if (rbd_get_parent_info(s->image, NULL, 0, NULL, 0, NULL, 0) != -ENOENT) {
+        return status;
+    }
+#pragma GCC diagnostic pop
+
+    head = req.offs & (s->object_size - 1);
+    req.offs -= head;
+    bytes += head;
+#endif
+
+    r = rbd_diff_iterate2(s->image, NULL, req.offs, bytes, true, true,
                           qemu_rbd_diff_iterate_cb, &req);
     if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
         return status;
@@ -1366,7 +1403,8 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
         status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
     }
 
-    *pnum = req.bytes;
+    assert(req.bytes > head);
+    *pnum = req.bytes - head;
     return status;
 }
 
-- 
Gitee


From af6ec350adf69b1c7596ef27ecdb4de708789f25 Mon Sep 17 00:00:00 2001
From: Yang Zhong <yang.zhong@intel.com>
Date: Mon, 1 Nov 2021 12:20:05 -0400
Subject: [PATCH 035/407] numa: Enable numa for SGX EPC sections

RH-Author: Paul Lai <None>
RH-MergeRequest: 111: numa: Enable numa for SGX EPC sections
RH-Commit: [1/5] c29297cbacc4cb65c9ac125db349a767aa2574af
RH-Bugzilla: 1518984
RH-Acked-by: Paolo Bonzini <None>
RH-Acked-by: Bandan Das <None>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

The basic SGX support did not enable NUMA for SGX EPC sections, which
resulted in all EPC sections being located in NUMA node 0. This patch
enables the SGX NUMA function in the guest, so that an EPC section can
work with RAM as one NUMA node.

The Guest kernel related log:
[    0.009981] ACPI: SRAT: Node 0 PXM 0 [mem 0x180000000-0x183ffffff]
[    0.009982] ACPI: SRAT: Node 1 PXM 1 [mem 0x184000000-0x185bfffff]
The SRAT table can now show the SGX EPC sections' memory info in
different NUMA nodes.

The SGX EPC numa related command:
 ......
 -m 4G,maxmem=20G \
 -smp sockets=2,cores=2 \
 -cpu host,+sgx-provisionkey \
 -object memory-backend-ram,size=2G,host-nodes=0,policy=bind,id=node0 \
 -object memory-backend-epc,id=mem0,size=64M,prealloc=on,host-nodes=0,policy=bind \
 -numa node,nodeid=0,cpus=0-1,memdev=node0 \
 -object memory-backend-ram,size=2G,host-nodes=1,policy=bind,id=node1 \
 -object memory-backend-epc,id=mem1,size=28M,prealloc=on,host-nodes=1,policy=bind \
 -numa node,nodeid=1,cpus=2-3,memdev=node1 \
 -M sgx-epc.0.memdev=mem0,sgx-epc.0.node=0,sgx-epc.1.memdev=mem1,sgx-epc.1.node=1 \
 ......
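
As a quick sanity check of the per-node accounting for this command
line, the sketch below sums EPC section sizes by node. The helper
epc_size_per_node is hypothetical; the (node, size) pairs are taken
from the memory-backend-epc and sgx-epc options above.

```python
from collections import defaultdict

MiB = 1024 * 1024


def epc_size_per_node(sections):
    """Sum EPC section sizes per NUMA node from (node, size) pairs."""
    totals = defaultdict(int)
    for node, size in sections:
        totals[node] += size
    return dict(totals)


# mem0: size=64M on node 0, mem1: size=28M on node 1
sections = [(0, 64 * MiB), (1, 28 * MiB)]
per_node = epc_size_per_node(sections)
```

Before this change every section would have been accounted to node 0.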

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20211101162009.62161-2-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 1105812382e1126d86dddc16b3700f8c79dc93d1)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 hw/core/numa.c            |  5 ++---
 hw/i386/acpi-build.c      |  2 ++
 hw/i386/sgx-epc.c         |  3 +++
 hw/i386/sgx-stub.c        |  4 ++++
 hw/i386/sgx.c             | 44 +++++++++++++++++++++++++++++++++++++++
 include/hw/i386/sgx-epc.h |  3 +++
 monitor/hmp-cmds.c        |  1 +
 qapi/machine.json         | 10 ++++++++-
 qemu-options.hx           |  4 ++--
 9 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index e6050b22739..1aa05dcf425 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -784,9 +784,8 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
                 break;
             case MEMORY_DEVICE_INFO_KIND_SGX_EPC:
                 se = value->u.sgx_epc.data;
-                /* TODO: once we support numa, assign to right node */
-                node_mem[0].node_mem += se->size;
-                node_mem[0].node_plugged_mem += se->size;
+                node_mem[se->node].node_mem += se->size;
+                node_mem[se->node].node_plugged_mem = 0;
                 break;
             default:
                 g_assert_not_reached();
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 447ea35275f..a4478e77b73 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2071,6 +2071,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
         nvdimm_build_srat(table_data);
     }
 
+    sgx_epc_build_srat(table_data);
+
     /*
      * TODO: this part is not in ACPI spec and current linux kernel boots fine
      * without these entries. But I recall there were issues the last time I
diff --git a/hw/i386/sgx-epc.c b/hw/i386/sgx-epc.c
index e508827e787..96b2940d75e 100644
--- a/hw/i386/sgx-epc.c
+++ b/hw/i386/sgx-epc.c
@@ -21,6 +21,7 @@
 
 static Property sgx_epc_properties[] = {
     DEFINE_PROP_UINT64(SGX_EPC_ADDR_PROP, SGXEPCDevice, addr, 0),
+    DEFINE_PROP_UINT32(SGX_EPC_NUMA_NODE_PROP, SGXEPCDevice, node, 0),
     DEFINE_PROP_LINK(SGX_EPC_MEMDEV_PROP, SGXEPCDevice, hostmem,
                      TYPE_MEMORY_BACKEND_EPC, HostMemoryBackendEpc *),
     DEFINE_PROP_END_OF_LIST(),
@@ -139,6 +140,8 @@ static void sgx_epc_md_fill_device_info(const MemoryDeviceState *md,
     se->memaddr = epc->addr;
     se->size = object_property_get_uint(OBJECT(epc), SGX_EPC_SIZE_PROP,
                                         NULL);
+    se->node = object_property_get_uint(OBJECT(epc), SGX_EPC_NUMA_NODE_PROP,
+                                        NULL);
     se->memdev = object_get_canonical_path(OBJECT(epc->hostmem));
 
     info->u.sgx_epc.data = se;
diff --git a/hw/i386/sgx-stub.c b/hw/i386/sgx-stub.c
index c9b379e6651..26833eb233c 100644
--- a/hw/i386/sgx-stub.c
+++ b/hw/i386/sgx-stub.c
@@ -6,6 +6,10 @@
 #include "qapi/error.h"
 #include "qapi/qapi-commands-misc-target.h"
 
+void sgx_epc_build_srat(GArray *table_data)
+{
+}
+
 SGXInfo *qmp_query_sgx(Error **errp)
 {
     error_setg(errp, "SGX support is not compiled in");
diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index 8fef3dd8fad..d04299904a2 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -23,6 +23,7 @@
 #include "sysemu/hw_accel.h"
 #include "sysemu/reset.h"
 #include <sys/ioctl.h>
+#include "hw/acpi/aml-build.h"
 
 #define SGX_MAX_EPC_SECTIONS            8
 #define SGX_CPUID_EPC_INVALID           0x0
@@ -36,6 +37,46 @@
 
 #define RETRY_NUM                       2
 
+static int sgx_epc_device_list(Object *obj, void *opaque)
+{
+    GSList **list = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_SGX_EPC)) {
+        *list = g_slist_append(*list, DEVICE(obj));
+    }
+
+    object_child_foreach(obj, sgx_epc_device_list, opaque);
+    return 0;
+}
+
+static GSList *sgx_epc_get_device_list(void)
+{
+    GSList *list = NULL;
+
+    object_child_foreach(qdev_get_machine(), sgx_epc_device_list, &list);
+    return list;
+}
+
+void sgx_epc_build_srat(GArray *table_data)
+{
+    GSList *device_list = sgx_epc_get_device_list();
+
+    for (; device_list; device_list = device_list->next) {
+        DeviceState *dev = device_list->data;
+        Object *obj = OBJECT(dev);
+        uint64_t addr, size;
+        int node;
+
+        node = object_property_get_uint(obj, SGX_EPC_NUMA_NODE_PROP,
+                                        &error_abort);
+        addr = object_property_get_uint(obj, SGX_EPC_ADDR_PROP, &error_abort);
+        size = object_property_get_uint(obj, SGX_EPC_SIZE_PROP, &error_abort);
+
+        build_srat_memory(table_data, addr, size, node, MEM_AFFINITY_ENABLED);
+    }
+    g_slist_free(device_list);
+}
+
 static uint64_t sgx_calc_section_metric(uint64_t low, uint64_t high)
 {
     return (low & MAKE_64BIT_MASK(12, 20)) +
@@ -226,6 +267,9 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
         /* set the memdev link with memory backend */
         object_property_parse(obj, SGX_EPC_MEMDEV_PROP, list->value->memdev,
                               &error_fatal);
+        /* set the numa node property for sgx epc object */
+        object_property_set_uint(obj, SGX_EPC_NUMA_NODE_PROP, list->value->node,
+                             &error_fatal);
         object_property_set_bool(obj, "realized", true, &error_fatal);
         object_unref(obj);
     }
diff --git a/include/hw/i386/sgx-epc.h b/include/hw/i386/sgx-epc.h
index a6a65be854f..581fac389a6 100644
--- a/include/hw/i386/sgx-epc.h
+++ b/include/hw/i386/sgx-epc.h
@@ -25,6 +25,7 @@
 #define SGX_EPC_ADDR_PROP "addr"
 #define SGX_EPC_SIZE_PROP "size"
 #define SGX_EPC_MEMDEV_PROP "memdev"
+#define SGX_EPC_NUMA_NODE_PROP "node"
 
 /**
  * SGXEPCDevice:
@@ -38,6 +39,7 @@ typedef struct SGXEPCDevice {
 
     /* public */
     uint64_t addr;
+    uint32_t node;
     HostMemoryBackendEpc *hostmem;
 } SGXEPCDevice;
 
@@ -56,6 +58,7 @@ typedef struct SGXEPCState {
 } SGXEPCState;
 
 bool sgx_epc_get_section(int section_nr, uint64_t *addr, uint64_t *size);
+void sgx_epc_build_srat(GArray *table_data);
 
 static inline uint64_t sgx_epc_above_4g_end(SGXEPCState *sgx_epc)
 {
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 9c91bf93e94..2669156b284 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1810,6 +1810,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
                                se->id ? se->id : "");
                 monitor_printf(mon, "  memaddr: 0x%" PRIx64 "\n", se->memaddr);
                 monitor_printf(mon, "  size: %" PRIu64 "\n", se->size);
+                monitor_printf(mon, "  node: %" PRId64 "\n", se->node);
                 monitor_printf(mon, "  memdev: %s\n", se->memdev);
                 break;
             default:
diff --git a/qapi/machine.json b/qapi/machine.json
index 067e3f53787..16e771affcf 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1207,12 +1207,15 @@
 #
 # @memdev: memory backend linked with device
 #
+# @node: the numa node
+#
 # Since: 6.2
 ##
 { 'struct': 'SgxEPCDeviceInfo',
   'data': { '*id': 'str',
             'memaddr': 'size',
             'size': 'size',
+            'node': 'int',
             'memdev': 'str'
           }
 }
@@ -1285,10 +1288,15 @@
 #
 # @memdev: memory backend linked with device
 #
+# @node: the numa node
+#
 # Since: 6.2
 ##
 { 'struct': 'SgxEPC',
-  'data': { 'memdev': 'str' } }
+  'data': { 'memdev': 'str',
+            'node': 'int'
+          }
+}
 
 ##
 # @SgxEPCProperties:
diff --git a/qemu-options.hx b/qemu-options.hx
index 94c4a8dbaff..4b7798088b0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -127,11 +127,11 @@ SRST
 ERST
 
 DEF("M", HAS_ARG, QEMU_OPTION_M,
-    "                sgx-epc.0.memdev=memid\n",
+    "                sgx-epc.0.memdev=memid,sgx-epc.0.node=numaid\n",
     QEMU_ARCH_ALL)
 
 SRST
-``sgx-epc.0.memdev=@var{memid}``
+``sgx-epc.0.memdev=@var{memid},sgx-epc.0.node=@var{numaid}``
     Define an SGX EPC section.
 ERST
 
-- 
Gitee


From 8190d352c6a00af1d89f32998b8de95d44762bc9 Mon Sep 17 00:00:00 2001
From: Yang Zhong <yang.zhong@intel.com>
Date: Mon, 1 Nov 2021 12:20:07 -0400
Subject: [PATCH 036/407] numa: Support SGX numa in the monitor and Libvirt
 interfaces

RH-Author: Paul Lai <None>
RH-MergeRequest: 111: numa: Enable numa for SGX EPC sections
RH-Commit: [2/5] 403c4f98dccd023293cd3246081ae12f4782bed0
RH-Bugzilla: 1518984
RH-Acked-by: Paolo Bonzini <None>
RH-Acked-by: Bandan Das <None>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

Add an SGXEPCSection list to SGXInfo so that it reports each SGX EPC
section in detail instead of only the total size. This enables NUMA
support in the 'info sgx' command and the QMP interfaces, which now
show each EPC section together with its NUMA node. Libvirt can use the
QMP interface to obtain the detailed host SGX EPC capabilities and
decide how to allocate host EPC sections to the guest.

(qemu) info sgx
 SGX support: enabled
 SGX1 support: enabled
 SGX2 support: enabled
 FLC support: enabled
 NUMA node #0: size=67108864
 NUMA node #1: size=29360128

The QMP interface shows:
(QEMU) query-sgx
{"return": {"sgx": true, "sgx2": true, "sgx1": true, "sections": \
[{"node": 0, "size": 67108864}, {"node": 1, "size": 29360128}], "flc": true}}

(QEMU) query-sgx-capabilities
{"return": {"sgx": true, "sgx2": true, "sgx1": true, "sections": \
[{"node": 0, "size": 17070817280}, {"node": 1, "size": 17079205888}], "flc": true}}
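
A management client could consume the per-section list roughly as
follows. This is only a minimal sketch that parses the query-sgx reply
shown above; the helper name epc_size_by_node is hypothetical and is
not part of QEMU or Libvirt.

```python
import json
from collections import defaultdict

# Reply copied from the query-sgx example above.
REPLY = ('{"return": {"sgx": true, "sgx2": true, "sgx1": true, "sections": '
         '[{"node": 0, "size": 67108864}, {"node": 1, "size": 29360128}], '
         '"flc": true}}')


def epc_size_by_node(reply: str) -> dict:
    """Sum EPC section sizes per NUMA node (hypothetical helper)."""
    totals = defaultdict(int)
    for section in json.loads(reply)["return"]["sections"]:
        totals[section["node"]] += section["size"]
    return dict(totals)


if __name__ == "__main__":
    # Print one line per node, mirroring the 'info sgx' layout.
    for node, size in sorted(epc_size_by_node(REPLY).items()):
        print(f"NUMA node #{node}: size={size}")
```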

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20211101162009.62161-4-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 4755927ae12547c2e7cb22c5fa1b39038c6c11b1)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 hw/i386/sgx.c         | 51 +++++++++++++++++++++++++++++++++++--------
 qapi/misc-target.json | 19 ++++++++++++++--
 2 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index d04299904a2..5de5dd08936 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -83,11 +83,13 @@ static uint64_t sgx_calc_section_metric(uint64_t low, uint64_t high)
            ((high & MAKE_64BIT_MASK(0, 20)) << 32);
 }
 
-static uint64_t sgx_calc_host_epc_section_size(void)
+static SGXEPCSectionList *sgx_calc_host_epc_sections(void)
 {
+    SGXEPCSectionList *head = NULL, **tail = &head;
+    SGXEPCSection *section;
     uint32_t i, type;
     uint32_t eax, ebx, ecx, edx;
-    uint64_t size = 0;
+    uint32_t j = 0;
 
     for (i = 0; i < SGX_MAX_EPC_SECTIONS; i++) {
         host_cpuid(0x12, i + 2, &eax, &ebx, &ecx, &edx);
@@ -101,10 +103,13 @@ static uint64_t sgx_calc_host_epc_section_size(void)
             break;
         }
 
-        size += sgx_calc_section_metric(ecx, edx);
+        section = g_new0(SGXEPCSection, 1);
+        section->node = j++;
+        section->size = sgx_calc_section_metric(ecx, edx);
+        QAPI_LIST_APPEND(tail, section);
     }
 
-    return size;
+    return head;
 }
 
 static void sgx_epc_reset(void *opaque)
@@ -168,13 +173,35 @@ SGXInfo *qmp_query_sgx_capabilities(Error **errp)
     info->sgx1 = eax & (1U << 0) ? true : false;
     info->sgx2 = eax & (1U << 1) ? true : false;
 
-    info->section_size = sgx_calc_host_epc_section_size();
+    info->sections = sgx_calc_host_epc_sections();
 
     close(fd);
 
     return info;
 }
 
+static SGXEPCSectionList *sgx_get_epc_sections_list(void)
+{
+    GSList *device_list = sgx_epc_get_device_list();
+    SGXEPCSectionList *head = NULL, **tail = &head;
+    SGXEPCSection *section;
+
+    for (; device_list; device_list = device_list->next) {
+        DeviceState *dev = device_list->data;
+        Object *obj = OBJECT(dev);
+
+        section = g_new0(SGXEPCSection, 1);
+        section->node = object_property_get_uint(obj, SGX_EPC_NUMA_NODE_PROP,
+                                                 &error_abort);
+        section->size = object_property_get_uint(obj, SGX_EPC_SIZE_PROP,
+                                                 &error_abort);
+        QAPI_LIST_APPEND(tail, section);
+    }
+    g_slist_free(device_list);
+
+    return head;
+}
+
 SGXInfo *qmp_query_sgx(Error **errp)
 {
     SGXInfo *info = NULL;
@@ -193,14 +220,13 @@ SGXInfo *qmp_query_sgx(Error **errp)
         return NULL;
     }
 
-    SGXEPCState *sgx_epc = &pcms->sgx_epc;
     info = g_new0(SGXInfo, 1);
 
     info->sgx = true;
     info->sgx1 = true;
     info->sgx2 = true;
     info->flc = true;
-    info->section_size = sgx_epc->size;
+    info->sections = sgx_get_epc_sections_list();
 
     return info;
 }
@@ -208,6 +234,7 @@ SGXInfo *qmp_query_sgx(Error **errp)
 void hmp_info_sgx(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
+    SGXEPCSectionList *section_list, *section;
     g_autoptr(SGXInfo) info = qmp_query_sgx(&err);
 
     if (err) {
@@ -222,8 +249,14 @@ void hmp_info_sgx(Monitor *mon, const QDict *qdict)
                    info->sgx2 ? "enabled" : "disabled");
     monitor_printf(mon, "FLC support: %s\n",
                    info->flc ? "enabled" : "disabled");
-    monitor_printf(mon, "size: %" PRIu64 "\n",
-                   info->section_size);
+
+    section_list = info->sections;
+    for (section = section_list; section; section = section->next) {
+        monitor_printf(mon, "NUMA node #%" PRId64 ": ",
+                       section->value->node);
+        monitor_printf(mon, "size=%" PRIu64 "\n",
+                       section->value->size);
+    }
 }
 
 bool sgx_epc_get_section(int section_nr, uint64_t *addr, uint64_t *size)
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 5aa2b95b7d4..1022aa0184c 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -337,6 +337,21 @@
   'if': 'TARGET_ARM' }
 
 
+##
+# @SGXEPCSection:
+#
+# Information about intel SGX EPC section info
+#
+# @node: the numa node
+#
+# @size: the size of epc section
+#
+# Since: 6.2
+##
+{ 'struct': 'SGXEPCSection',
+  'data': { 'node': 'int',
+            'size': 'uint64'}}
+
 ##
 # @SGXInfo:
 #
@@ -350,7 +365,7 @@
 #
 # @flc: true if FLC is supported
 #
-# @section-size: The EPC section size for guest
+# @sections: The EPC sections info for guest
 #
 # Since: 6.2
 ##
@@ -359,7 +374,7 @@
             'sgx1': 'bool',
             'sgx2': 'bool',
             'flc': 'bool',
-            'section-size': 'uint64'},
+            'sections': ['SGXEPCSection']},
    'if': 'TARGET_I386' }
 
 ##
-- 
Gitee


From de4e615c2dd81cbd2583ae8317918bda0c6a4935 Mon Sep 17 00:00:00 2001
From: Yang Zhong <yang.zhong@intel.com>
Date: Mon, 1 Nov 2021 12:20:08 -0400
Subject: [PATCH 037/407] doc: Add the SGX numa description

RH-Author: Paul Lai <None>
RH-MergeRequest: 111: numa: Enable numa for SGX EPC sections
RH-Commit: [3/5] 41c74688c9662b966c243566a837135ff52341c4
RH-Bugzilla: 1518984
RH-Acked-by: Paolo Bonzini <None>
RH-Acked-by: Bandan Das <None>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

Add the SGX NUMA reference command line and describe how to check
whether SGX NUMA is supported with multiple EPC sections.

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20211101162009.62161-5-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit d1889b36098c79e2e6ac90faf3d0dc5ec0057677)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 docs/system/i386/sgx.rst | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/docs/system/i386/sgx.rst b/docs/system/i386/sgx.rst
index f8fade5ac2d..0f0a73f7587 100644
--- a/docs/system/i386/sgx.rst
+++ b/docs/system/i386/sgx.rst
@@ -141,8 +141,7 @@ To launch a SGX guest:
   |qemu_system_x86| \\
    -cpu host,+sgx-provisionkey \\
    -object memory-backend-epc,id=mem1,size=64M,prealloc=on \\
-   -object memory-backend-epc,id=mem2,size=28M \\
-   -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2
+   -M sgx-epc.0.memdev=mem1,sgx-epc.0.node=0
 
 Utilizing SGX in the guest requires a kernel/OS with SGX support.
 The support can be determined in guest by::
@@ -152,8 +151,32 @@ The support can be determined in guest by::
 and SGX epc info by::
 
   $ dmesg | grep sgx
-  [    1.242142] sgx: EPC section 0x180000000-0x181bfffff
-  [    1.242319] sgx: EPC section 0x181c00000-0x1837fffff
+  [    0.182807] sgx: EPC section 0x140000000-0x143ffffff
+  [    0.183695] sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0.
+
+To launch a SGX numa guest:
+
+.. parsed-literal::
+
+  |qemu_system_x86| \\
+   -cpu host,+sgx-provisionkey \\
+   -object memory-backend-ram,size=2G,host-nodes=0,policy=bind,id=node0 \\
+   -object memory-backend-epc,id=mem0,size=64M,prealloc=on,host-nodes=0,policy=bind \\
+   -numa node,nodeid=0,cpus=0-1,memdev=node0 \\
+   -object memory-backend-ram,size=2G,host-nodes=1,policy=bind,id=node1 \\
+   -object memory-backend-epc,id=mem1,size=28M,prealloc=on,host-nodes=1,policy=bind \\
+   -numa node,nodeid=1,cpus=2-3,memdev=node1 \\
+   -M sgx-epc.0.memdev=mem0,sgx-epc.0.node=0,sgx-epc.1.memdev=mem1,sgx-epc.1.node=1
+
+and SGX epc numa info by::
+
+  $ dmesg | grep sgx
+  [    0.369937] sgx: EPC section 0x180000000-0x183ffffff
+  [    0.370259] sgx: EPC section 0x184000000-0x185bfffff
+
+  $ dmesg | grep SRAT
+  [    0.009981] ACPI: SRAT: Node 0 PXM 0 [mem 0x180000000-0x183ffffff]
+  [    0.009982] ACPI: SRAT: Node 1 PXM 1 [mem 0x184000000-0x185bfffff]
 
 References
 ----------
-- 
Gitee


From 5557725f262b48a3229cad653942fc48ea002d1e Mon Sep 17 00:00:00 2001
From: Paul Lai <plai@redhat.com>
Date: Tue, 25 Jan 2022 15:16:22 -0500
Subject: [PATCH 038/407] Enable SGX -- RH Only

RH-Author: Paul Lai <None>
RH-MergeRequest: 111: numa: Enable numa for SGX EPC sections
RH-Commit: [4/5] cea874f29984897ef1232fb7749c13203c888034
RH-Bugzilla: 1518984
RH-Acked-by: Paolo Bonzini <None>
RH-Acked-by: Bandan Das <None>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
---
 configs/devices/x86_64-softmmu/x86_64-rh-devices.mak | 1 +
 1 file changed, 1 insertion(+)

diff --git a/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak b/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
index ddf036f0423..fdbbdf97426 100644
--- a/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
+++ b/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
@@ -102,3 +102,4 @@ CONFIG_TPM_CRB=y
 CONFIG_TPM_TIS_ISA=y
 CONFIG_TPM_EMULATOR=y
 CONFIG_TPM_PASSTHROUGH=y
+CONFIG_SGX=y
-- 
Gitee


From d49cf18c7e3695de0d1cb1a84cb6dd6c009568a1 Mon Sep 17 00:00:00 2001
From: Yang Zhong <yang.zhong@intel.com>
Date: Thu, 20 Jan 2022 17:31:04 -0500
Subject: [PATCH 039/407] qapi: Cleanup SGX related comments and restore
 @section-size
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Paul Lai <None>
RH-MergeRequest: 111: numa: Enable numa for SGX EPC sections
RH-Commit: [5/5] 497dbeaebb7b8f99f5f8a7de58000dcab0d0c22d
RH-Bugzilla: 1518984
RH-Acked-by: Paolo Bonzini <None>
RH-Acked-by: Bandan Das <None>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

The SGX NUMA patches were merged in the QEMU 7.0 release, so we need to
clarify the version history information and clean up some related
comments to make them clearer.

The QMP command schema promises backwards compatibility as standard.
We temporarily restore "@section-size", which can avoid incompatible
API breakage. The "@section-size" will be deprecated in 7.2 version.
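
For illustration, a client that wants to work with both old and new
replies might prefer "sections" and fall back to "section-size". This
is a sketch under that assumption; total_epc_size and the two sample
dictionaries are hypothetical, with values taken from the schema
example in this patch.

```python
def total_epc_size(info: dict) -> int:
    """Return the total EPC size from an SGXInfo-like dict.

    Prefer the per-node "sections" list; fall back to the deprecated
    "section-size" member when "sections" is absent (older QEMU).
    """
    if "sections" in info:
        return sum(section["size"] for section in info["sections"])
    return info["section-size"]


# Reply shape with both members present.
new_reply = {"sgx": True, "sgx1": True, "sgx2": True, "flc": True,
             "section-size": 96468992,
             "sections": [{"node": 0, "size": 67108864},
                          {"node": 1, "size": 29360128}]}

# Reply shape with only the old member.
old_reply = {"sgx": True, "sgx1": True, "sgx2": True, "flc": True,
             "section-size": 96468992}

if __name__ == "__main__":
    print(total_epc_size(new_reply), total_epc_size(old_reply))
```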

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220120223104.437161-1-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a66bd91f030827742778a9e0da19fe55716b4a60)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 docs/about/deprecated.rst | 13 +++++++++++++
 hw/i386/sgx.c             | 11 +++++++++--
 qapi/machine.json         |  4 ++--
 qapi/misc-target.json     | 22 +++++++++++++++++-----
 4 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index ff7488cb63b..33925edf450 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -270,6 +270,19 @@ accepted incorrect commands will return an error. Users should make sure that
 all arguments passed to ``device_add`` are consistent with the documented
 property types.
 
+``query-sgx`` return value member ``section-size`` (since 7.0)
+''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+Member ``section-size`` in return value elements with meta-type ``uint64`` is
+deprecated.  Use ``sections`` instead.
+
+
+``query-sgx-capabilities`` return value member ``section-size`` (since 7.0)
+'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+Member ``section-size`` in return value elements with meta-type ``uint64`` is
+deprecated.  Use ``sections`` instead.
+
 System accelerators
 -------------------
 
diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index 5de5dd08936..a2b318dd938 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -83,7 +83,7 @@ static uint64_t sgx_calc_section_metric(uint64_t low, uint64_t high)
            ((high & MAKE_64BIT_MASK(0, 20)) << 32);
 }
 
-static SGXEPCSectionList *sgx_calc_host_epc_sections(void)
+static SGXEPCSectionList *sgx_calc_host_epc_sections(uint64_t *size)
 {
     SGXEPCSectionList *head = NULL, **tail = &head;
     SGXEPCSection *section;
@@ -106,6 +106,7 @@ static SGXEPCSectionList *sgx_calc_host_epc_sections(void)
         section = g_new0(SGXEPCSection, 1);
         section->node = j++;
         section->size = sgx_calc_section_metric(ecx, edx);
+        *size += section->size;
         QAPI_LIST_APPEND(tail, section);
     }
 
@@ -156,6 +157,7 @@ SGXInfo *qmp_query_sgx_capabilities(Error **errp)
 {
     SGXInfo *info = NULL;
     uint32_t eax, ebx, ecx, edx;
+    uint64_t size = 0;
 
     int fd = qemu_open_old("/dev/sgx_vepc", O_RDWR);
     if (fd < 0) {
@@ -173,7 +175,8 @@ SGXInfo *qmp_query_sgx_capabilities(Error **errp)
     info->sgx1 = eax & (1U << 0) ? true : false;
     info->sgx2 = eax & (1U << 1) ? true : false;
 
-    info->sections = sgx_calc_host_epc_sections();
+    info->sections = sgx_calc_host_epc_sections(&size);
+    info->section_size = size;
 
     close(fd);
 
@@ -220,12 +223,14 @@ SGXInfo *qmp_query_sgx(Error **errp)
         return NULL;
     }
 
+    SGXEPCState *sgx_epc = &pcms->sgx_epc;
     info = g_new0(SGXInfo, 1);
 
     info->sgx = true;
     info->sgx1 = true;
     info->sgx2 = true;
     info->flc = true;
+    info->section_size = sgx_epc->size;
     info->sections = sgx_get_epc_sections_list();
 
     return info;
@@ -249,6 +254,8 @@ void hmp_info_sgx(Monitor *mon, const QDict *qdict)
                    info->sgx2 ? "enabled" : "disabled");
     monitor_printf(mon, "FLC support: %s\n",
                    info->flc ? "enabled" : "disabled");
+    monitor_printf(mon, "size: %" PRIu64 "\n",
+                   info->section_size);
 
     section_list = info->sections;
     for (section = section_list; section; section = section->next) {
diff --git a/qapi/machine.json b/qapi/machine.json
index 16e771affcf..a9f33d0f27d 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1207,7 +1207,7 @@
 #
 # @memdev: memory backend linked with device
 #
-# @node: the numa node
+# @node: the numa node (Since: 7.0)
 #
 # Since: 6.2
 ##
@@ -1288,7 +1288,7 @@
 #
 # @memdev: memory backend linked with device
 #
-# @node: the numa node
+# @node: the numa node (Since: 7.0)
 #
 # Since: 6.2
 ##
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 1022aa0184c..4bc45d24741 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -344,9 +344,9 @@
 #
 # @node: the numa node
 #
-# @size: the size of epc section
+# @size: the size of EPC section
 #
-# Since: 6.2
+# Since: 7.0
 ##
 { 'struct': 'SGXEPCSection',
   'data': { 'node': 'int',
@@ -365,7 +365,13 @@
 #
 # @flc: true if FLC is supported
 #
-# @sections: The EPC sections info for guest
+# @section-size: The EPC section size for guest
+#                Redundant with @sections.  Just for backward compatibility.
+#
+# @sections: The EPC sections info for guest (Since: 7.0)
+#
+# Features:
+# @deprecated: Member @section-size is deprecated.  Use @sections instead.
 #
 # Since: 6.2
 ##
@@ -374,6 +380,8 @@
             'sgx1': 'bool',
             'sgx2': 'bool',
             'flc': 'bool',
+            'section-size': { 'type': 'uint64',
+                    'features': [ 'deprecated' ] },
             'sections': ['SGXEPCSection']},
    'if': 'TARGET_I386' }
 
@@ -390,7 +398,9 @@
 #
 # -> { "execute": "query-sgx" }
 # <- { "return": { "sgx": true, "sgx1" : true, "sgx2" : true,
-#                  "flc": true, "section-size" : 0 } }
+#                  "flc": true,  "section-size" : 96468992,
+#                  "sections": [{"node": 0, "size": 67108864},
+#                  {"node": 1, "size": 29360128}]} }
 #
 ##
 { 'command': 'query-sgx', 'returns': 'SGXInfo', 'if': 'TARGET_I386' }
@@ -408,7 +418,9 @@
 #
 # -> { "execute": "query-sgx-capabilities" }
 # <- { "return": { "sgx": true, "sgx1" : true, "sgx2" : true,
-#                  "flc": true, "section-size" : 0 } }
+#                  "flc": true, "section-size" : 96468992,
+#                  "sections" : [{"node": 0, "size": 67108864},
+#                  {"node": 1, "size": 29360128}]} }
 #
 ##
 { 'command': 'query-sgx-capabilities', 'returns': 'SGXInfo', 'if': 'TARGET_I386' }
-- 
Gitee


From a80475939b18bc0140d30c3c2b0ed3c7bc622923 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Tue, 18 Jan 2022 17:59:59 +0100
Subject: [PATCH 040/407] block/io: Update BSC only if want_zero is true

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 112: block/io: Update BSC only if want_zero is true
RH-Commit: [1/2] a202de1f52110d1e871c3b5b58f2d9e9b5d17570
RH-Bugzilla: 2041480
RH-Acked-by: Eric Blake <eblake@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>

We update the block-status cache whenever we get new information from a
bdrv_co_block_status() call to the block driver.  However, if we have
passed want_zero=false to that call, it may flag areas containing zeroes
as data, and so we would update the block-status cache with wrong
information.

Therefore, we should not update the cache with want_zero=false.
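
The effect can be sketched with a toy model. This is not QEMU code:
ToyDriver, ToyNode, full_map and run are hypothetical names, the image
is 8 "sectors" long, and the cache holds a single data range. It only
assumes, as described above, that a want_zero=false query may report
zeroed areas as data.

```python
class ToyDriver:
    """Toy image: first sector is data, the rest reads as zeroes."""
    size = 8

    def block_status(self, offset, want_zero):
        # Without want_zero, skip hole detection and call it all data.
        if not want_zero:
            return ("data", self.size - offset)
        if offset == 0:
            return ("data", 1)
        return ("zero", self.size - offset)


class ToyNode:
    def __init__(self, driver, fixed):
        self.driver = driver
        self.fixed = fixed   # True: only cache want_zero=true results
        self.cache = None    # (start, end) of one cached data range

    def block_status(self, offset, want_zero):
        if self.cache and self.cache[0] <= offset < self.cache[1]:
            return ("data", self.cache[1] - offset)
        kind, length = self.driver.block_status(offset, want_zero)
        if kind == "data" and (want_zero or not self.fixed):
            self.cache = (offset, offset + length)
        return (kind, length)


def full_map(node, want_zero):
    """Walk the whole image and collect (offset, length, kind)."""
    out, offset = [], 0
    while offset < node.driver.size:
        kind, length = node.block_status(offset, want_zero)
        out.append((offset, length, kind))
        offset += length
    return out


def run(fixed):
    """Return True if the map is unchanged by want_zero=false queries."""
    node = ToyNode(ToyDriver(), fixed)
    map_pre = full_map(node, want_zero=True)
    for _ in range(2):
        full_map(node, want_zero=False)
    return map_pre == full_map(node, want_zero=True)
```

In this model, run(False) ends with the cache claiming the whole image
is data, while run(True) keeps returning the original map.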

Reported-by: Nir Soffer <nsoffer@redhat.com>
Fixes: 0bc329fbb00 ("block: block-status cache for data regions")
Reviewed-by: Nir Soffer <nsoffer@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220118170000.49423-2-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 113b727ce788335cf76f65355d670c9bc130fd75)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block/io.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index bb0a254def1..4e4cb556c58 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2497,8 +2497,12 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
              * non-protocol nodes, and then it is never used.  However, filling
              * the cache requires an RCU update, so double check here to avoid
              * such an update if possible.
+             *
+             * Check want_zero, because we only want to update the cache when we
+             * have accurate information about what is zero and what is data.
              */
-            if (ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
+            if (want_zero &&
+                ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
                 QLIST_EMPTY(&bs->children))
             {
                 /*
-- 
Gitee


From 6f5f1985826d9b5f5053ee1d34df9ac1f70c74dd Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Tue, 18 Jan 2022 18:00:00 +0100
Subject: [PATCH 041/407] iotests/block-status-cache: New test

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 112: block/io: Update BSC only if want_zero is true
RH-Commit: [2/2] ba86b4db32c33e17a85f476d445ef0523cf8f60e
RH-Bugzilla: 2041480
RH-Acked-by: Eric Blake <eblake@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>

Add a new test to verify that want_zero=false block-status calls do not
pollute the block-status cache for want_zero=true calls.

We check want_zero=true calls and their results using `qemu-img map`
(over NBD), and want_zero=false calls also using `qemu-img map` over
NBD, but using the qemu:allocation-depth context.

(This test case cannot be integrated into nbd-qemu-allocation, because
that is a qcow2 test, and this is a raw test.)

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220118170000.49423-3-hreitz@redhat.com>
Reviewed-by: Nir Soffer <nsoffer@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 6384dd534d742123d26c008d9794b20bc41359d5)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tests/qemu-iotests/tests/block-status-cache   | 139 ++++++++++++++++++
 .../qemu-iotests/tests/block-status-cache.out |   5 +
 2 files changed, 144 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/block-status-cache
 create mode 100644 tests/qemu-iotests/tests/block-status-cache.out

diff --git a/tests/qemu-iotests/tests/block-status-cache b/tests/qemu-iotests/tests/block-status-cache
new file mode 100755
index 00000000000..6fa10bb8f8a
--- /dev/null
+++ b/tests/qemu-iotests/tests/block-status-cache
@@ -0,0 +1,139 @@
+#!/usr/bin/env python3
+# group: rw quick
+#
+# Test cases for the block-status cache.
+#
+# Copyright (C) 2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import signal
+import iotests
+from iotests import qemu_img_create, qemu_img_pipe, qemu_nbd
+
+
+image_size = 1 * 1024 * 1024
+test_img = os.path.join(iotests.test_dir, 'test.img')
+
+nbd_pidfile = os.path.join(iotests.test_dir, 'nbd.pid')
+nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock')
+
+
+class TestBscWithNbd(iotests.QMPTestCase):
+    def setUp(self) -> None:
+        """Just create an empty image with a read-only NBD server on it"""
+        assert qemu_img_create('-f', iotests.imgfmt, test_img,
+                               str(image_size)) == 0
+
+        # Pass --allocation-depth to enable the qemu:allocation-depth context,
+        # which we are going to query to provoke a block-status inquiry with
+        # want_zero=false.
+        assert qemu_nbd(f'--socket={nbd_sock}',
+                        f'--format={iotests.imgfmt}',
+                        '--persistent',
+                        '--allocation-depth',
+                        '--read-only',
+                        f'--pid-file={nbd_pidfile}',
+                        test_img) \
+            == 0
+
+    def tearDown(self) -> None:
+        with open(nbd_pidfile, encoding='utf-8') as f:
+            pid = int(f.read())
+        os.kill(pid, signal.SIGTERM)
+        os.remove(nbd_pidfile)
+        os.remove(test_img)
+
+    def test_with_zero_bug(self) -> None:
+        """
+        Verify that the block-status cache is not corrupted by a
+        want_zero=false call.
+        We can provoke a want_zero=false call with `qemu-img map` over NBD with
+        x-dirty-bitmap=qemu:allocation-depth, so we first run a normal `map`
+        (which results in want_zero=true), then using said
+        qemu:allocation-depth context, and finally another normal `map` to
+        verify that the cache has not been corrupted.
+        """
+
+        nbd_img_opts = f'driver=nbd,server.type=unix,server.path={nbd_sock}'
+        nbd_img_opts_alloc_depth = nbd_img_opts + \
+            ',x-dirty-bitmap=qemu:allocation-depth'
+
+        # Normal map, results in want_zero=true.
+        # This will probably detect an allocated data sector first (qemu likes
+        # to allocate the first sector to facilitate alignment probing), and
+        # then the rest to be zero.  The BSC will thus contain (if anything)
+        # one range covering the first sector.
+        map_pre = qemu_img_pipe('map', '--output=json', '--image-opts',
+                                nbd_img_opts)
+
+        # qemu:allocation-depth maps for want_zero=false.
+        # want_zero=false should (with the file driver, which the server is
+        # using) report everything as data.  While this is sufficient for
+        # want_zero=false, this is nothing that should end up in the
+        # block-status cache.
+        # Due to a bug, this information did end up in the cache, though, and
+        # this would lead to wrong information being returned on subsequent
+        # want_zero=true calls.
+        #
+        # We need to run this map twice: On the first call, we probably still
+        # have the first sector in the cache, and so this will be served from
+        # the cache; and only the subsequent range will be queried from the
+        # block driver.  This subsequent range will then be entered into the
+        # cache.
+        # If we did a want_zero=true call at this point, we would thus get
+        # correct information: The first sector is not covered by the cache, so
+        # we would get fresh block-status information from the driver, which
+        # would return a data range, and this would then go into the cache,
+        # evicting the wrong range from the want_zero=false call before.
+        #
+        # Therefore, we need a second want_zero=false map to reproduce:
+        # Since the first sector is not in the cache, the query for its status
+        # will go to the driver, which will return a result that reports the
+        # whole image to be a single data area.  This result will then go into
+        # the cache, and so the cache will then report the whole image to
+        # contain data.
+        #
+        # Note that once the cache reports the whole image to contain data, any
+        # subsequent map operation will be served from the cache, and so we can
+        # never loop too many times here.
+        for _ in range(2):
+            # (Ignore the result, this is just to contaminate the cache)
+            qemu_img_pipe('map', '--output=json', '--image-opts',
+                          nbd_img_opts_alloc_depth)
+
+        # Now let's see whether the cache reports everything as data, or
+        # whether we get correct information (i.e. the same as we got on our
+        # first attempt).
+        map_post = qemu_img_pipe('map', '--output=json', '--image-opts',
+                                 nbd_img_opts)
+
+        if map_pre != map_post:
+            print('ERROR: Map information differs before and after querying ' +
+                  'qemu:allocation-depth')
+            print('Before:')
+            print(map_pre)
+            print('After:')
+            print(map_post)
+
+            self.fail("Map information differs")
+
+
+if __name__ == '__main__':
+    # The block-status cache only works on the protocol layer, so to test it,
+    # we can only use the raw format
+    iotests.main(supported_fmts=['raw'],
+                 supported_protocols=['file'])
diff --git a/tests/qemu-iotests/tests/block-status-cache.out b/tests/qemu-iotests/tests/block-status-cache.out
new file mode 100644
index 00000000000..ae1213e6f86
--- /dev/null
+++ b/tests/qemu-iotests/tests/block-status-cache.out
@@ -0,0 +1,5 @@
+.
+----------------------------------------------------------------------
+Ran 1 tests
+
+OK
-- 
Gitee


From 5260b19e073c58d481633ece4753b7427eab9df8 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Fri, 4 Feb 2022 12:10:06 +0100
Subject: [PATCH 042/407] block/nbd: Delete reconnect delay timer when done

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 117: block/nbd: Handle AioContext changes
RH-Commit: [1/6] 70814602a8a43a7c14857d76266d82b1aa5174a9
RH-Bugzilla: 2035185
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Eric Blake <eblake@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

We start the reconnect delay timer to cancel the reconnection attempt
after a while.  Once nbd_co_do_establish_connection() has returned, this
attempt is over, and we no longer need the timer.

Delete it before returning from nbd_reconnect_attempt(), so that it does
not persist beyond the I/O request that was paused for reconnecting; we
do not want it to fire in a drained section, because all sorts of things
can happen in such a section (e.g. the AioContext might be changed, and
we do not want the timer to fire in the wrong context; or the BDS might
even be deleted, and so the timer CB would access already-freed data).

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
(cherry picked from commit 3ce1fc16bad9c3f8b7b10b451a224d6d76e5c551)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block/nbd.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 5ef462db1b7..b8e5a9b4cc8 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -353,6 +353,13 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
     }
 
     nbd_co_do_establish_connection(s->bs, NULL);
+
+    /*
+     * The reconnect attempt is done (maybe successfully, maybe not), so
+     * we no longer need this timer.  Delete it so it will not outlive
+     * this I/O request (so draining removes all timers).
+     */
+    reconnect_delay_timer_del(s);
 }
 
 static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)
-- 
Gitee


From bb9a92083a15d3299f8a58f08e541271c78b0a05 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Fri, 4 Feb 2022 12:10:08 +0100
Subject: [PATCH 043/407] block/nbd: Assert there are no timers when closed

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 117: block/nbd: Handle AioContext changes
RH-Commit: [2/6] 995795ae9844a7d2b28cb1e57fd7fe81482d0205
RH-Bugzilla: 2035185
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Eric Blake <eblake@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

Our two timers must not remain armed beyond nbd_clear_bdrvstate(), or
they will access freed data when they fire.

This patch is separate from the patches that actually fix the issue
(HEAD^^ and HEAD^) so that you can run the associated regression iotest
(281) on a configuration that reproducibly exposes the bug.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
(cherry picked from commit 8a39c381e5e407d2fe5500324323f90a8540fa90)

Conflict:
- block/nbd.c: open_timer was introduced after the 6.2 release (for
  nbd's @open-timeout parameter), and has not been backported, so drop
  the assertion that it is NULL

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block/nbd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index b8e5a9b4cc8..aab20125d88 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -108,6 +108,9 @@ static void nbd_clear_bdrvstate(BlockDriverState *bs)
 
     yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
 
+    /* Must not leave timers behind that would access freed data */
+    assert(!s->reconnect_delay_timer);
+
     object_unref(OBJECT(s->tlscreds));
     qapi_free_SocketAddress(s->saddr);
     s->saddr = NULL;
-- 
Gitee


From 0d459470b7193977ea37240066d091cd5310ac44 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Fri, 4 Feb 2022 12:10:09 +0100
Subject: [PATCH 044/407] iotests.py: Add QemuStorageDaemon class

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 117: block/nbd: Handle AioContext changes
RH-Commit: [3/6] 754fe76bc5e8be57f4b78f176531014c4a12b044
RH-Bugzilla: 2035185
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Eric Blake <eblake@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

This is a rather simple class that allows creating a QSD instance
running in the background and stopping it when no longer needed.

The __del__ handler is a safety net for when something goes so wrong in
a test that the tearDown() method is not called (e.g. setUp() launches
the QSD, but then launching a VM fails).  We do not want the QSD to
continue running after the test has failed, so __del__() will take care
to kill it.
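
The safety-net idea can be sketched in isolation roughly as follows.
This is a minimal illustration, not the class added by this patch: the
BackgroundProcess name is hypothetical, a sleeping Python child stands
in for the QSD, and it assumes a POSIX host (SIGKILL) and CPython's
reference counting, which runs __del__ as soon as the last reference
is dropped.

```python
import signal
import subprocess
import sys


class BackgroundProcess:
    """Hypothetical helper: keep a child running until stop() or __del__."""

    def __init__(self, *args: str):
        # Set the attribute first so __del__ is safe even if Popen raises
        self._p = None
        self._p = subprocess.Popen(args)

    def stop(self, kill_signal=signal.SIGTERM):
        self._p.send_signal(kill_signal)
        self._p.wait()
        self._p = None

    def __del__(self):
        # Safety net: tearDown() may never run if setUp() fails midway
        if self._p is not None:
            self.stop(kill_signal=signal.SIGKILL)


# A sleeping Python child stands in for the storage daemon
proc = BackgroundProcess(sys.executable, '-c', 'import time; time.sleep(60)')
child = proc._p   # keep a handle so the exit status can be inspected
del proc          # last reference dropped: __del__ kills the child
exit_code = child.returncode
```

Initialising self._p before calling Popen is a choice made for this
sketch only, so that __del__ never sees a missing attribute.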

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
(cherry picked from commit 091dc7b2b5553a529bff9a7bf9ad3bc85bc5bdcd)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tests/qemu-iotests/iotests.py | 40 +++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 83bfedb9020..a51b5ce8cd8 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -72,6 +72,8 @@
 qemu_prog = os.environ.get('QEMU_PROG', 'qemu')
 qemu_opts = os.environ.get('QEMU_OPTIONS', '').strip().split(' ')
 
+qsd_prog = os.environ.get('QSD_PROG', 'qemu-storage-daemon')
+
 gdb_qemu_env = os.environ.get('GDB_OPTIONS')
 qemu_gdb = []
 if gdb_qemu_env:
@@ -312,6 +314,44 @@ def cmd(self, cmd):
         return self._read_output()
 
 
+class QemuStorageDaemon:
+    def __init__(self, *args: str, instance_id: str = 'a'):
+        assert '--pidfile' not in args
+        self.pidfile = os.path.join(test_dir, f'qsd-{instance_id}-pid')
+        all_args = [qsd_prog] + list(args) + ['--pidfile', self.pidfile]
+
+        # Cannot use with here, we want the subprocess to stay around
+        # pylint: disable=consider-using-with
+        self._p = subprocess.Popen(all_args)
+        while not os.path.exists(self.pidfile):
+            if self._p.poll() is not None:
+                cmd = ' '.join(all_args)
+                raise RuntimeError(
+                    'qemu-storage-daemon terminated with exit code ' +
+                    f'{self._p.returncode}: {cmd}')
+
+            time.sleep(0.01)
+
+        with open(self.pidfile, encoding='utf-8') as f:
+            self._pid = int(f.read().strip())
+
+        assert self._pid == self._p.pid
+
+    def stop(self, kill_signal=15):
+        self._p.send_signal(kill_signal)
+        self._p.wait()
+        self._p = None
+
+        try:
+            os.remove(self.pidfile)
+        except OSError:
+            pass
+
+    def __del__(self):
+        if self._p is not None:
+            self.stop(kill_signal=9)
+
+
 def qemu_nbd(*args):
     '''Run qemu-nbd in daemon mode and return the parent's exit code'''
     return subprocess.call(qemu_nbd_args + ['--fork'] + list(args))
-- 
Gitee


From 4dbad240e6290fd82aac69c4c7504ba652c4c00f Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Fri, 4 Feb 2022 12:10:10 +0100
Subject: [PATCH 045/407] iotests/281: Test lingering timers

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 117: block/nbd: Handle AioContext changes
RH-Commit: [4/6] aaad466941637a34224dc037bbea37d128b5676b
RH-Bugzilla: 2035185
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Eric Blake <eblake@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

Prior to "block/nbd: Delete reconnect delay timer when done" and
"block/nbd: Delete open timer when done", both of those timers would
remain scheduled even after successfully (re-)connecting to the server,
and they would not even be deleted when the BDS is deleted.

This test constructs exactly this situation:
(1) Configure an @open-timeout, so the open timer is armed, and
(2) Configure a @reconnect-delay and trigger a reconnect situation
    (which succeeds immediately), so the reconnect delay timer is armed.
Then we immediately delete the BDS, and sleep for longer than the
@open-timeout and @reconnect-delay.  Prior to said patches, this caused
one (or both) of the timer CBs to access already-freed data.
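
The failure mode can be illustrated outside of QEMU with a toy Python
analogy.  This is only a sketch under stated assumptions: ToyNode and
its methods are hypothetical names, threading.Timer stands in for the
QEMU timer, and a counter stands in for the use-after-free that the
real timer CB would cause.

```python
import threading


class ToyNode:
    """Hypothetical stand-in for a node that owns a reconnect delay timer."""

    def __init__(self, delay, delete_timer_when_done):
        self.alive = True
        self.late_fires = 0
        self.delete_timer_when_done = delete_timer_when_done
        self.timer = threading.Timer(delay, self.timer_cb)
        self.timer.start()

    def timer_cb(self):
        if not self.alive:
            # In C, this would be an access to already-freed data
            self.late_fires += 1

    def reconnect_done(self):
        # The fix: delete the timer once the reconnect attempt is over
        if self.delete_timer_when_done:
            self.timer.cancel()

    def delete(self):
        self.alive = False


def run(delete_timer_when_done):
    node = ToyNode(0.05, delete_timer_when_done)
    node.reconnect_done()
    node.delete()
    node.timer.join()  # wait until the timer has fired or been cancelled
    return node.late_fires


buggy_fires = run(False)  # timer outlives the node and fires late
fixed_fires = run(True)   # timer is gone before the node is deleted
```

Unlike this toy, a real use-after-free gives no reliable signal, which
is why the test can pass spuriously without the assertion patch.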

Accessing freed data may or may not crash, so this test can produce
false successes, but I do not know how to show the problem in a better
or more reliable way.  If you run this test on "block/nbd: Assert there
are no timers when closed" and without the fix patches mentioned above,
you should reliably see an assertion failure.
(But all other tests that use the reconnect delay timer (264 and 277)
will fail in that configuration, too; as will nbd-reconnect-on-open,
which uses the open timer.)

Remove this test from the quick group because of the two second sleep
this patch introduces.

(I decided to put this test case into 281, because the main bug this
series addresses is in the interaction of the NBD block driver and I/O
threads, which is precisely the scope of 281.  The test case for that
other bug will also be put into the test class added here.

Also, excuse the test class's name, I couldn't come up with anything
better.  The "yield" part will make sense two patches from now.)

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
(cherry picked from commit eaf1e85d4ddefdbd197f393fa9c5acc7ba8133b0)

Conflict:
- @open-timeout was introduced after the 6.2 release, and has not been
  backported.  Consequently, there is no open_timer, and we can (and
  must) drop the respective parts of the test here.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tests/qemu-iotests/281     | 73 ++++++++++++++++++++++++++++++++++++--
 tests/qemu-iotests/281.out |  4 +--
 2 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/281 b/tests/qemu-iotests/281
index 956698083f0..13c588be757 100755
--- a/tests/qemu-iotests/281
+++ b/tests/qemu-iotests/281
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-# group: rw quick
+# group: rw
 #
 # Test cases for blockdev + IOThread interactions
 #
@@ -20,8 +20,9 @@
 #
 
 import os
+import time
 import iotests
-from iotests import qemu_img
+from iotests import qemu_img, QemuStorageDaemon
 
 image_len = 64 * 1024 * 1024
 
@@ -243,6 +244,74 @@ class TestBlockdevBackupAbort(iotests.QMPTestCase):
         # Hangs on failure, we expect this error.
         self.assert_qmp(result, 'error/class', 'GenericError')
 
+# Test for RHBZ#2033626
+class TestYieldingAndTimers(iotests.QMPTestCase):
+    sock = os.path.join(iotests.sock_dir, 'nbd.sock')
+    qsd = None
+
+    def setUp(self):
+        self.create_nbd_export()
+
+        # Simple VM with an NBD block device connected to the NBD export
+        # provided by the QSD
+        self.vm = iotests.VM()
+        self.vm.add_blockdev('nbd,node-name=nbd,server.type=unix,' +
+                             f'server.path={self.sock},export=exp,' +
+                             'reconnect-delay=1')
+
+        self.vm.launch()
+
+    def tearDown(self):
+        self.stop_nbd_export()
+        self.vm.shutdown()
+
+    def test_timers_with_blockdev_del(self):
+        # Stop and restart the NBD server, and do some I/O on the client to
+        # trigger a reconnect and start the reconnect delay timer
+        self.stop_nbd_export()
+        self.create_nbd_export()
+
+        result = self.vm.qmp('human-monitor-command',
+                             command_line='qemu-io nbd "write 0 512"')
+        self.assert_qmp(result, 'return', '')
+
+        # Reconnect is done, so the reconnect delay timer should be gone.
+        # (But there used to be a bug where it remained active, for which this
+        # is a regression test.)
+
+        # Delete the BDS to see whether the timer is gone.  If it is not,
+        # it will remain active, fire later, and then access freed data.
+        # (Or, with "block/nbd: Assert there are no timers when closed"
+        # applied, the assertion added in that patch will fail.)
+        result = self.vm.qmp('blockdev-del', node_name='nbd')
+        self.assert_qmp(result, 'return', {})
+
+        # Give the timer some time to fire (it has a timeout of 1 s).
+        # (Sleeping in an iotest may ring some alarm bells, but note that if
+        # the timing is off here, the test will just always pass.  If we kill
+        # the VM too early, then we just kill the timer before it can fire,
+        # thus not see the error, and so the test will pass.)
+        time.sleep(2)
+
+    def create_nbd_export(self):
+        assert self.qsd is None
+
+        # Simple NBD export of a null-co BDS
+        self.qsd = QemuStorageDaemon(
+            '--blockdev',
+            'null-co,node-name=null,read-zeroes=true',
+
+            '--nbd-server',
+            f'addr.type=unix,addr.path={self.sock}',
+
+            '--export',
+            'nbd,id=exp,node-name=null,name=exp,writable=true'
+        )
+
+    def stop_nbd_export(self):
+        self.qsd.stop()
+        self.qsd = None
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2'],
                  supported_protocols=['file'])
diff --git a/tests/qemu-iotests/281.out b/tests/qemu-iotests/281.out
index 89968f35d78..914e3737bd7 100644
--- a/tests/qemu-iotests/281.out
+++ b/tests/qemu-iotests/281.out
@@ -1,5 +1,5 @@
-....
+.....
 ----------------------------------------------------------------------
-Ran 4 tests
+Ran 5 tests
 
 OK
-- 
Gitee


From a9d3d347cdeeab6bc34414041c3e0861643ef487 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Fri, 4 Feb 2022 12:10:11 +0100
Subject: [PATCH 046/407] block/nbd: Move s->ioc on AioContext change

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 117: block/nbd: Handle AioContext changes
RH-Commit: [5/6] 107757b9fbadfb832c75521317108525daa4174e
RH-Bugzilla: 2035185
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Eric Blake <eblake@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

s->ioc must always be attached to the NBD node's AioContext.  If that
context changes, s->ioc must be attached to the new context.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2033626
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
(cherry picked from commit e15f3a66c830e3fce99c9d56c493c2f7078a1225)

Conflict:
- block/nbd.c: open_timer was added after the 6.2 release, so we need
  not (and cannot) assert it is NULL here.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block/nbd.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index aab20125d88..a3896c7f5f4 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -2003,6 +2003,38 @@ static void nbd_cancel_in_flight(BlockDriverState *bs)
     nbd_co_establish_connection_cancel(s->conn);
 }
 
+static void nbd_attach_aio_context(BlockDriverState *bs,
+                                   AioContext *new_context)
+{
+    BDRVNBDState *s = bs->opaque;
+
+    /*
+     * The reconnect_delay_timer is scheduled in I/O paths when the
+     * connection is lost, to cancel the reconnection attempt after a
+     * given time.  Once this attempt is done (successfully or not),
+     * nbd_reconnect_attempt() ensures the timer is deleted before the
+     * respective I/O request is resumed.
+     * Since the AioContext can only be changed when a node is drained,
+     * the reconnect_delay_timer cannot be active here.
+     */
+    assert(!s->reconnect_delay_timer);
+
+    if (s->ioc) {
+        qio_channel_attach_aio_context(s->ioc, new_context);
+    }
+}
+
+static void nbd_detach_aio_context(BlockDriverState *bs)
+{
+    BDRVNBDState *s = bs->opaque;
+
+    assert(!s->reconnect_delay_timer);
+
+    if (s->ioc) {
+        qio_channel_detach_aio_context(s->ioc);
+    }
+}
+
 static BlockDriver bdrv_nbd = {
     .format_name                = "nbd",
     .protocol_name              = "nbd",
@@ -2026,6 +2058,9 @@ static BlockDriver bdrv_nbd = {
     .bdrv_dirname               = nbd_dirname,
     .strong_runtime_opts        = nbd_strong_runtime_opts,
     .bdrv_cancel_in_flight      = nbd_cancel_in_flight,
+
+    .bdrv_attach_aio_context    = nbd_attach_aio_context,
+    .bdrv_detach_aio_context    = nbd_detach_aio_context,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
@@ -2051,6 +2086,9 @@ static BlockDriver bdrv_nbd_tcp = {
     .bdrv_dirname               = nbd_dirname,
     .strong_runtime_opts        = nbd_strong_runtime_opts,
     .bdrv_cancel_in_flight      = nbd_cancel_in_flight,
+
+    .bdrv_attach_aio_context    = nbd_attach_aio_context,
+    .bdrv_detach_aio_context    = nbd_detach_aio_context,
 };
 
 static BlockDriver bdrv_nbd_unix = {
@@ -2076,6 +2114,9 @@ static BlockDriver bdrv_nbd_unix = {
     .bdrv_dirname               = nbd_dirname,
     .strong_runtime_opts        = nbd_strong_runtime_opts,
     .bdrv_cancel_in_flight      = nbd_cancel_in_flight,
+
+    .bdrv_attach_aio_context    = nbd_attach_aio_context,
+    .bdrv_detach_aio_context    = nbd_detach_aio_context,
 };
 
 static void bdrv_nbd_init(void)
-- 
Gitee


From 31112f00b85619618ee510addeb002e3ba298c47 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Fri, 4 Feb 2022 12:10:12 +0100
Subject: [PATCH 047/407] iotests/281: Let NBD connection yield in iothread

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 117: block/nbd: Handle AioContext changes
RH-Commit: [6/6] a23706f34022d301eb7ffc84fc0d0a77d72b9844
RH-Bugzilla: 2035185
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Eric Blake <eblake@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

Put an NBD block device into an I/O thread, and then read data from it,
hoping that the NBD connection will yield during that read.  When it
does, the coroutine must be reentered in the block device's I/O thread,
which will only happen if the NBD block driver attaches the connection's
QIOChannel to the new AioContext.  It did not do that after 4ddb5d2fde
("block/nbd: drop connection_co") and prior to "block/nbd: Move s->ioc
on AioContext change", which would cause an assertion failure.

To improve our chances of yielding, the NBD server is throttled to
reading 64 kB/s, and the NBD client reads 128 kB, so it should yield at
some point.
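
As a back-of-the-envelope check of those numbers (a sketch that only
looks at the steady-state rate; how long the read really takes depends
on the throttle's burst handling and on protocol overhead, neither of
which is modelled here):

```python
# Values taken from the test setup described above
throttle_bps = 65536        # server-side read limit, 64 KiB per second
read_size = 128 * 1024      # size of the client's read request in bytes

# Assumption: steady-state rate only, burst allowance ignored
estimated_seconds = read_size / throttle_bps
print(f'{estimated_seconds:.1f} s')
```

A request that takes on the order of seconds cannot complete in one
go, so the client coroutine has to yield while waiting for data.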

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
(cherry picked from commit 8cfbe929e8c26050f0a4580a1606a370a947d4ce)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tests/qemu-iotests/281     | 28 +++++++++++++++++++++++++---
 tests/qemu-iotests/281.out |  4 ++--
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/281 b/tests/qemu-iotests/281
index 13c588be757..b2ead7f3882 100755
--- a/tests/qemu-iotests/281
+++ b/tests/qemu-iotests/281
@@ -253,8 +253,9 @@ class TestYieldingAndTimers(iotests.QMPTestCase):
         self.create_nbd_export()
 
         # Simple VM with an NBD block device connected to the NBD export
-        # provided by the QSD
+        # provided by the QSD, and an (initially unused) iothread
         self.vm = iotests.VM()
+        self.vm.add_object('iothread,id=iothr')
         self.vm.add_blockdev('nbd,node-name=nbd,server.type=unix,' +
                              f'server.path={self.sock},export=exp,' +
                              'reconnect-delay=1')
@@ -293,19 +294,40 @@ class TestYieldingAndTimers(iotests.QMPTestCase):
         # thus not see the error, and so the test will pass.)
         time.sleep(2)
 
+    def test_yield_in_iothread(self):
+        # Move the NBD node to the I/O thread; the NBD block driver should
+        # attach the connection's QIOChannel to that thread's AioContext, too
+        result = self.vm.qmp('x-blockdev-set-iothread',
+                             node_name='nbd', iothread='iothr')
+        self.assert_qmp(result, 'return', {})
+
+        # Do some I/O that will be throttled by the QSD, so that the network
+        # connection hopefully will yield here.  When it is resumed, it must
+        # then be resumed in the I/O thread's AioContext.
+        result = self.vm.qmp('human-monitor-command',
+                             command_line='qemu-io nbd "read 0 128K"')
+        self.assert_qmp(result, 'return', '')
+
     def create_nbd_export(self):
         assert self.qsd is None
 
-        # Simple NBD export of a null-co BDS
+        # Export a throttled null-co BDS: Reads are throttled (max 64 kB/s),
+        # writes are not.
         self.qsd = QemuStorageDaemon(
+            '--object',
+            'throttle-group,id=thrgr,x-bps-read=65536,x-bps-read-max=65536',
+
             '--blockdev',
             'null-co,node-name=null,read-zeroes=true',
 
+            '--blockdev',
+            'throttle,node-name=thr,file=null,throttle-group=thrgr',
+
             '--nbd-server',
             f'addr.type=unix,addr.path={self.sock}',
 
             '--export',
-            'nbd,id=exp,node-name=null,name=exp,writable=true'
+            'nbd,id=exp,node-name=thr,name=exp,writable=true'
         )
 
     def stop_nbd_export(self):
diff --git a/tests/qemu-iotests/281.out b/tests/qemu-iotests/281.out
index 914e3737bd7..3f8a935a082 100644
--- a/tests/qemu-iotests/281.out
+++ b/tests/qemu-iotests/281.out
@@ -1,5 +1,5 @@
-.....
+......
 ----------------------------------------------------------------------
-Ran 5 tests
+Ran 6 tests
 
 OK
-- 
Gitee


From ced94530c020e5d13b79cc033169c702cf9ab66d Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 22 Mar 2022 19:23:36 -0400
Subject: [PATCH 048/407] Revert "redhat: Add hw_compat_4_2_extra and apply to
 upstream machines"

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 131: Revert "redhat: Add hw_compat_4_2_extra and apply to upstream machines"
RH-Commit: [1/3] 47b7d9e5062f5e215d5ed1a3ecdc1a87ac3fa630 (jmaloy/qemu-kvm)
RH-Bugzilla: 2062613
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

BZ: https://bugzilla.redhat.com/2062613
UPSTREAM: no
BREW: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=44038000

commit dc2e9ec1e014950c7918e23a3e9b0096b34a4a92
Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
Date:   Wed Mar 9 10:31:53 2022 +0000

    Revert "redhat: Add hw_compat_4_2_extra and apply to upstream machines"

    This reverts commit 66882f9a3230246409f3918424aca26add5c034a.
    We no longer need the compat machines it was added for.

    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

(cherry picked from commit dc2e9ec1e014950c7918e23a3e9b0096b34a4a92)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/i386/pc.c         | 12 ------------
 hw/i386/pc_piix.c    |  6 ------
 include/hw/i386/pc.h |  3 ---
 3 files changed, 21 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 4c08a1971ca..357257349be 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -670,18 +670,6 @@ GlobalProperty pc_rhel_7_0_compat[] = {
 };
 const size_t pc_rhel_7_0_compat_len = G_N_ELEMENTS(pc_rhel_7_0_compat);
 
-/*
- * RHEL: These properties only apply to the RHEL exported machine types
- * pc-4.2/2.11 for the purpose to have a limited upstream machines support
- * which can be migrated to RHEL.  Let's avoid touching hw_compat_4_2 directly
- * so that we can have some isolation against the upstream code.
- */
-GlobalProperty hw_compat_4_2_extra[] = {
-    /* By default enlarge the default virtio-net-pci ROM to 512KB. */
-    { "virtio-net-pci", "romsize", "0x80000" },
-};
-const size_t hw_compat_4_2_extra_len = G_N_ELEMENTS(hw_compat_4_2_extra);
-
 GSIState *pc_gsi_create(qemu_irq **irqs, bool pci_enabled)
 {
     GSIState *s;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index c30057c4431..7b7076cbc7e 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -531,12 +531,6 @@ static void pc_i440fx_4_2_machine_options(MachineClass *m)
      * supported by RHEL, even if exported.
      */
     m->deprecation_reason = "Not supported by RHEL";
-    /*
-     * RHEL: Specific compat properties to have limited support for upstream
-     * machines exported.
-     */
-    compat_props_add(m->compat_props, hw_compat_4_2_extra,
-                     hw_compat_4_2_extra_len);
 }
 
 /* RHEL: Export pc-4.2 */
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 9e8bfb69f80..4a593acb50f 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -325,9 +325,6 @@ extern const size_t pc_rhel_7_1_compat_len;
 extern GlobalProperty pc_rhel_7_0_compat[];
 extern const size_t pc_rhel_7_0_compat_len;
 
-extern GlobalProperty hw_compat_4_2_extra[];
-extern const size_t hw_compat_4_2_extra_len;
-
 /* Helper for setting model-id for CPU models that changed model-id
  * depending on QEMU versions up to QEMU 2.4.
  */
-- 
Gitee


From 4435b7d6ec78a0f51a2c83aedde591d45de457f2 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 22 Mar 2022 19:23:36 -0400
Subject: [PATCH 049/407] Revert "redhat: Enable FDC device for upstream
 machines too"

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 131: Revert "redhat: Add hw_compat_4_2_extra and apply to upstream machines"
RH-Commit: [2/3] 4e3c945e3de9bb9d9a6d24115f0719168c9669fe (jmaloy/qemu-kvm)
RH-Bugzilla: 2062613
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

BZ: https://bugzilla.redhat.com/2062613
UPSTREAM: no
BREW: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=44038000

commit 597cb6ca1da4a3eea77c1e4928f55203a1d5c70c
Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
Date:   Wed Mar 9 10:32:39 2022 +0000

    Revert "redhat: Enable FDC device for upstream machines too"

    This reverts commit c4d1aa8bf21fe98da94a9cff30b7c25bed12c17f.
    We no longer need the compat machines it was added for.

    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

(cherry picked from commit 597cb6ca1da4a3eea77c1e4928f55203a1d5c70c)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/block/fdc.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 63042ef030b..97fa6de4239 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -2341,10 +2341,7 @@ void fdctrl_realize_common(DeviceState *dev, FDCtrl *fdctrl, Error **errp)
 
     /* Restricted for Red Hat Enterprise Linux: */
     MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
-    if (!strstr(mc->name, "-rhel7.") &&
-        /* Exported two upstream machine types allows FDC too */
-        strcmp(mc->name, "pc-i440fx-4.2") &&
-        strcmp(mc->name, "pc-i440fx-2.11")) {
+    if (!strstr(mc->name, "-rhel7.")) {
         error_setg(errp, "Device %s is not supported with machine type %s",
                    object_get_typename(OBJECT(dev)), mc->name);
         return;
-- 
Gitee


From 2075027e75fc2c98cf0dcc87e626447ffc51763a Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 22 Mar 2022 19:23:36 -0400
Subject: [PATCH 050/407] Revert "redhat: Expose upstream machines pc-4.2 and
 pc-2.11"

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 131: Revert "redhat: Add hw_compat_4_2_extra and apply to upstream machines"
RH-Commit: [3/3] 35cee68034580f81b3aa916921eecd2fdfa7dd15 (jmaloy/qemu-kvm)
RH-Bugzilla: 2062613
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

BZ: https://bugzilla.redhat.com/2062613
UPSTREAM: no
BREW: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=44038000

commit f3b50d6d4ae0be9e64aafe6a15f5423bab4899e9
Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
Date:   Wed Mar 9 10:34:58 2022 +0000

    Revert "redhat: Expose upstream machines pc-4.2 and pc-2.11"

    This reverts commit 618e2424edba499d52cd26cf8363bc2dd85ef149.
    We no longer need these compat machines.

    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

(cherry picked from commit f3b50d6d4ae0be9e64aafe6a15f5423bab4899e9)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/i386/pc_piix.c | 37 -------------------------------------
 1 file changed, 37 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 7b7076cbc7e..f03a8f0db8e 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -315,14 +315,6 @@ static void pc_init1(MachineState *machine,
  * hw_compat_*, pc_compat_*, or * pc_*_machine_options().
  */
 
-/*
- * NOTE!  Not all the upstream machine types are disabled for RHEL.  For
- * providing a very limited support for upstream machine types, pc machines
- * 2.11 and 4.2 are exposed explicitly.  This will make the below "#if" macros
- * a bit messed up, but please read this comment first so that we can have a
- * rough understanding of what we're going to do.
- */
-
 #if 0 /* Disabled for Red Hat Enterprise Linux */
 static void pc_compat_2_3_fn(MachineState *machine)
 {
@@ -399,8 +391,6 @@ static void pc_xen_hvm_init(MachineState *machine)
 }
 #endif
 
-#endif /* Disabled for Red Hat Enterprise Linux */
-
 #define DEFINE_I440FX_MACHINE(suffix, name, compatfn, optionfn) \
     static void pc_init_##suffix(MachineState *machine) \
     { \
@@ -465,10 +455,8 @@ static void pc_i440fx_6_0_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_6_0, pc_compat_6_0_len);
 }
 
-#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v6_0, "pc-i440fx-6.0", NULL,
                       pc_i440fx_6_0_machine_options);
-#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_5_2_machine_options(MachineClass *m)
 {
@@ -479,10 +467,8 @@ static void pc_i440fx_5_2_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len);
 }
 
-#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v5_2, "pc-i440fx-5.2", NULL,
                       pc_i440fx_5_2_machine_options);
-#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_5_1_machine_options(MachineClass *m)
 {
@@ -497,10 +483,8 @@ static void pc_i440fx_5_1_machine_options(MachineClass *m)
     pcmc->pci_root_uid = 1;
 }
 
-#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v5_1, "pc-i440fx-5.1", NULL,
                       pc_i440fx_5_1_machine_options);
-#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_5_0_machine_options(MachineClass *m)
 {
@@ -513,10 +497,8 @@ static void pc_i440fx_5_0_machine_options(MachineClass *m)
     m->auto_enable_numa_with_memdev = false;
 }
 
-#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v5_0, "pc-i440fx-5.0", NULL,
                       pc_i440fx_5_0_machine_options);
-#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_4_2_machine_options(MachineClass *m)
 {
@@ -525,15 +507,8 @@ static void pc_i440fx_4_2_machine_options(MachineClass *m)
     m->is_default = false;
     compat_props_add(m->compat_props, hw_compat_4_2, hw_compat_4_2_len);
     compat_props_add(m->compat_props, pc_compat_4_2, pc_compat_4_2_len);
-
-    /*
-     * RHEL: Mark all upstream machines as deprecated because they're not
-     * supported by RHEL, even if exported.
-     */
-    m->deprecation_reason = "Not supported by RHEL";
 }
 
-/* RHEL: Export pc-4.2 */
 DEFINE_I440FX_MACHINE(v4_2, "pc-i440fx-4.2", NULL,
                       pc_i440fx_4_2_machine_options);
 
@@ -546,10 +521,8 @@ static void pc_i440fx_4_1_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_4_1, pc_compat_4_1_len);
 }
 
-#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v4_1, "pc-i440fx-4.1", NULL,
                       pc_i440fx_4_1_machine_options);
-#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_4_0_machine_options(MachineClass *m)
 {
@@ -562,10 +535,8 @@ static void pc_i440fx_4_0_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_4_0, pc_compat_4_0_len);
 }
 
-#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v4_0, "pc-i440fx-4.0", NULL,
                       pc_i440fx_4_0_machine_options);
-#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_3_1_machine_options(MachineClass *m)
 {
@@ -581,10 +552,8 @@ static void pc_i440fx_3_1_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_3_1, pc_compat_3_1_len);
 }
 
-#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v3_1, "pc-i440fx-3.1", NULL,
                       pc_i440fx_3_1_machine_options);
-#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_3_0_machine_options(MachineClass *m)
 {
@@ -593,10 +562,8 @@ static void pc_i440fx_3_0_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_3_0, pc_compat_3_0_len);
 }
 
-#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v3_0, "pc-i440fx-3.0", NULL,
                       pc_i440fx_3_0_machine_options);
-#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_2_12_machine_options(MachineClass *m)
 {
@@ -605,10 +572,8 @@ static void pc_i440fx_2_12_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_2_12, pc_compat_2_12_len);
 }
 
-#if 0 /* Disabled for Red Hat Enterprise Linux */
 DEFINE_I440FX_MACHINE(v2_12, "pc-i440fx-2.12", NULL,
                       pc_i440fx_2_12_machine_options);
-#endif /* Disabled for Red Hat Enterprise Linux */
 
 static void pc_i440fx_2_11_machine_options(MachineClass *m)
 {
@@ -617,11 +582,9 @@ static void pc_i440fx_2_11_machine_options(MachineClass *m)
     compat_props_add(m->compat_props, pc_compat_2_11, pc_compat_2_11_len);
 }
 
-/* RHEL: Export pc-2.11 */
 DEFINE_I440FX_MACHINE(v2_11, "pc-i440fx-2.11", NULL,
                       pc_i440fx_2_11_machine_options);
 
-#if 0 /* Disabled for Red Hat Enterprise Linux */
 static void pc_i440fx_2_10_machine_options(MachineClass *m)
 {
     pc_i440fx_2_11_machine_options(m);
-- 
Gitee


From bc9c801dad00e191784e56d78c18d0d4ee07101b Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 22 Mar 2022 19:23:36 -0400
Subject: [PATCH 051/407] hw/virtio: vdpa: Fix leak of host-notifier
 memory-region

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 132: hw/virtio: vdpa: Fix leak of host-notifier memory-region
RH-Commit: [1/1] b3cec35d185e3b9844a458f5c51c5d5ef7e3d8f1 (jmaloy/qemu-kvm)
RH-Bugzilla: 2060843
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>

BZ: https://bugzilla.redhat.com/2060843
UPSTREAM: no
BREW: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=44038138

commit 98f7607ecda00dea3cbb2ed7b4427c96846efb83
Author: Laurent Vivier <lvivier@redhat.com>
Date:   Fri Feb 11 18:02:59 2022 +0100

    hw/virtio: vdpa: Fix leak of host-notifier memory-region

    If the call to virtio_queue_set_host_notifier_mr() fails, the
    host-notifier memory region should be freed.

    This problem can trigger a coredump with some vDPA drivers (mlx5,
    but not with the vdpasim), if we unplug the virtio-net card from
    the guest after a stop/start.

    The same fix has been done for vhost-user:
      1f89d3b91e3e ("hw/virtio: Fix leak of host-notifier memory-region")

    Fixes: d0416d487bd5 ("vhost-vdpa: map virtqueue notification area if possible")
    Cc: jasowang@redhat.com
    Resolves: https://bugzilla.redhat.com/2027208
    Signed-off-by: Laurent Vivier <lvivier@redhat.com>
    Message-Id: <20220211170259.1388734-1-lvivier@redhat.com>
    Cc: qemu-stable@nongnu.org
    Acked-by: Jason Wang <jasowang@redhat.com>
    Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit 98f7607ecda00dea3cbb2ed7b4427c96846efb83)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index bcaf00e09f3..78da48a333f 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -415,6 +415,7 @@ static int vhost_vdpa_host_notifier_init(struct vhost_dev *dev, int queue_index)
     g_free(name);
 
     if (virtio_queue_set_host_notifier_mr(vdev, queue_index, &n->mr, true)) {
+        object_unparent(OBJECT(&n->mr));
         munmap(addr, page_size);
         goto err;
     }
-- 
Gitee


From b75f5f504b85ed118772da6c53d0b9381c1c7e94 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 23 Mar 2022 13:21:40 -0400
Subject: [PATCH 052/407] pci: expose TYPE_XIO3130_DOWNSTREAM name

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 134: pci: expose TYPE_XIO3130_DOWNSTREAM name
RH-Commit: [1/2] f09ddcaf686f22b545bf269f87787ebfc33fccda (jmaloy/qemu-kvm)
RH-Bugzilla: 2062610
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Gerd Hoffmann <kraxel@redhat.com>

BZ: https://bugzilla.redhat.com/2062610
UPSTREAM: merged
BREW: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=44038138

commit c41481af9a5d0d463607cc45b45c510875570817
Author: Igor Mammedov <imammedo@redhat.com>
Date:   Tue Mar 1 10:11:58 2022 -0500

    pci: expose TYPE_XIO3130_DOWNSTREAM name

    The type name will be used in a follow-up patch for a cast check
    in the pcihp code.

    Signed-off-by: Igor Mammedov <imammedo@redhat.com>
    Message-Id: <20220301151200.3507298-2-imammedo@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit c41481af9a5d0d463607cc45b45c510875570817)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/pci-bridge/xio3130_downstream.c         |  3 ++-
 include/hw/pci-bridge/xio3130_downstream.h | 15 +++++++++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 include/hw/pci-bridge/xio3130_downstream.h

diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
index 04aae72cd61..b17cafd359b 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -28,6 +28,7 @@
 #include "migration/vmstate.h"
 #include "qapi/error.h"
 #include "qemu/module.h"
+#include "hw/pci-bridge/xio3130_downstream.h"
 
 #define PCI_DEVICE_ID_TI_XIO3130D       0x8233  /* downstream port */
 #define XIO3130_REVISION                0x1
@@ -173,7 +174,7 @@ static void xio3130_downstream_class_init(ObjectClass *klass, void *data)
 }
 
 static const TypeInfo xio3130_downstream_info = {
-    .name          = "xio3130-downstream",
+    .name          = TYPE_XIO3130_DOWNSTREAM,
     .parent        = TYPE_PCIE_SLOT,
     .class_init    = xio3130_downstream_class_init,
     .interfaces = (InterfaceInfo[]) {
diff --git a/include/hw/pci-bridge/xio3130_downstream.h b/include/hw/pci-bridge/xio3130_downstream.h
new file mode 100644
index 00000000000..1d10139aeab
--- /dev/null
+++ b/include/hw/pci-bridge/xio3130_downstream.h
@@ -0,0 +1,15 @@
+/*
+ * TI X3130 pci express downstream port switch
+ *
+ * Copyright (C) 2022 Igor Mammedov <imammedo@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_PCI_BRIDGE_XIO3130_DOWNSTREAM_H
+#define HW_PCI_BRIDGE_XIO3130_DOWNSTREAM_H
+
+#define TYPE_XIO3130_DOWNSTREAM "xio3130-downstream"
+
+#endif
+
-- 
Gitee


From a654bf94ece81eabd53b2ee45d713bc4ed3fa630 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 23 Mar 2022 13:21:40 -0400
Subject: [PATCH 053/407] acpi: pcihp: pcie: set power on cap on parent slot

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 134: pci: expose TYPE_XIO3130_DOWNSTREAM name
RH-Commit: [2/2] d883872647a6e90ec573140b2c171f3f53b600ab (jmaloy/qemu-kvm)
RH-Bugzilla: 2062610
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Gerd Hoffmann <kraxel@redhat.com>

BZ: https://bugzilla.redhat.com/2062610
UPSTREAM: merged
BREW: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=44038138

commit 6b0969f1ec825984cd74619f0730be421b0c46fb
Author: Igor Mammedov <imammedo@redhat.com>
Date:   Tue Mar 1 10:11:59 2022 -0500

    acpi: pcihp: pcie: set power on cap on parent slot

    On creation, a PCIDevice has power turned on at the end of
    pci_qdev_realize(). However, if the PCIe slot isn't populated with any
    children later on, its power is turned off. That is fine if native
    hotplug is used, as the plug callback powers the slot on, among other
    things. However, when ACPI hotplug is enabled, it replaces the native
    PCIe plug callbacks with ACPI-specific ones (acpi_pcihp_device_*plug_cb)
    and, as a result, the slot stays powered off. That works fine, as ACPI
    hotplug on the guest side takes care of enumerating/initializing the
    hotplugged device. But when the guest is later migrated, the call chain
    introduced by commit d5daff7d312 (pcie: implement slot power control
    for pcie root ports)

       pcie_cap_slot_post_load()
           -> pcie_cap_update_power()
               -> pcie_set_power_device()
                   -> pci_set_power()
                       -> pci_update_mappings()

    will disable the previously initialized BARs of the hotplugged device
    in the powered-off slot, due to commit 23786d13441 (pci: implement
    power state), which disables BARs if power is off.

    Fix it by setting PCI_EXP_SLTCTL_PCC to PCI_EXP_SLTCTL_PWR_ON
    on the slot (root port/downstream port) at the time a device is
    hotplugged into it. As a result, PCI_EXP_SLTCTL_PWR_ON is migrated
    to the target, and the above call chain keeps the device plugged into
    the slot powered on.

    Fixes: d5daff7d312 ("pcie: implement slot power control for pcie root ports")
    Fixes: 23786d13441 ("pci: implement power state")
    Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2053584
    Suggested-by: "Michael S. Tsirkin" <mst@redhat.com>
    Signed-off-by: Igor Mammedov <imammedo@redhat.com>
    Message-Id: <20220301151200.3507298-3-imammedo@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
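
For illustration only (not part of the upstream change): a minimal sketch
of the masked Slot Control update described above. The macro values are
assumed to match the PCI_EXP_SLT* definitions in pci_regs.h, and the helper
name slot_ctl_power_on() is hypothetical; the real code operates on the
device's config space via pci_set_word_by_mask().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the PCI_EXP_SLT* values from pci_regs.h. */
#define SLTCAP_PCP     0x00000002u  /* Power Controller Present */
#define SLTCTL_PCC     0x0400u      /* Power Controller Control field */
#define SLTCTL_PWR_ON  0x0000u      /* PCC value: power on */
#define SLTCTL_PWR_OFF 0x0400u      /* PCC value: power off */

/*
 * Hypothetical helper: return the new Slot Control value with the PCC
 * field set to "power on", leaving every other bit untouched. Slots
 * without a power controller are returned unchanged.
 */
static uint16_t slot_ctl_power_on(uint32_t sltcap, uint16_t sltctl)
{
    if (!(sltcap & SLTCAP_PCP)) {
        return sltctl;
    }
    return (uint16_t)((sltctl & ~SLTCTL_PCC) | SLTCTL_PWR_ON);
}
```

Because the "power on" value is what ends up in the Slot Control register,
it is that register content which travels with the migration stream.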

(cherry picked from commit 6b0969f1ec825984cd74619f0730be421b0c46fb)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/acpi/pcihp.c       | 12 +++++++++++-
 hw/pci/pcie.c         | 11 +++++++++++
 include/hw/pci/pcie.h |  1 +
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index a5e182dd3a3..be0e846b343 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -32,6 +32,7 @@
 #include "hw/pci/pci_bridge.h"
 #include "hw/pci/pci_host.h"
 #include "hw/pci/pcie_port.h"
+#include "hw/pci-bridge/xio3130_downstream.h"
 #include "hw/i386/acpi-build.h"
 #include "hw/acpi/acpi.h"
 #include "hw/pci/pci_bus.h"
@@ -341,6 +342,8 @@ void acpi_pcihp_device_plug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState *s,
 {
     PCIDevice *pdev = PCI_DEVICE(dev);
     int slot = PCI_SLOT(pdev->devfn);
+    PCIDevice *bridge;
+    PCIBus *bus;
     int bsel;
 
     /* Don't send event when device is enabled during qemu machine creation:
@@ -370,7 +373,14 @@ void acpi_pcihp_device_plug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState *s,
         return;
     }
 
-    bsel = acpi_pcihp_get_bsel(pci_get_bus(pdev));
+    bus = pci_get_bus(pdev);
+    bridge = pci_bridge_get_device(bus);
+    if (object_dynamic_cast(OBJECT(bridge), TYPE_PCIE_ROOT_PORT) ||
+        object_dynamic_cast(OBJECT(bridge), TYPE_XIO3130_DOWNSTREAM)) {
+        pcie_cap_slot_enable_power(bridge);
+    }
+
+    bsel = acpi_pcihp_get_bsel(bus);
     g_assert(bsel >= 0);
     s->acpi_pcihp_pci_status[bsel].up |= (1U << slot);
     acpi_send_event(DEVICE(hotplug_dev), ACPI_PCI_HOTPLUG_STATUS);
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index d7d73a31e4c..996f0e24fe0 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -366,6 +366,17 @@ static void hotplug_event_clear(PCIDevice *dev)
     }
 }
 
+void pcie_cap_slot_enable_power(PCIDevice *dev)
+{
+    uint8_t *exp_cap = dev->config + dev->exp.exp_cap;
+    uint32_t sltcap = pci_get_long(exp_cap + PCI_EXP_SLTCAP);
+
+    if (sltcap & PCI_EXP_SLTCAP_PCP) {
+        pci_set_word_by_mask(exp_cap + PCI_EXP_SLTCTL,
+                             PCI_EXP_SLTCTL_PCC, PCI_EXP_SLTCTL_PWR_ON);
+    }
+}
+
 static void pcie_set_power_device(PCIBus *bus, PCIDevice *dev, void *opaque)
 {
     bool *power = opaque;
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 6063bee0ec6..c27368d0778 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -112,6 +112,7 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
                                 uint32_t addr, uint32_t val, int len);
 int pcie_cap_slot_post_load(void *opaque, int version_id);
 void pcie_cap_slot_push_attention_button(PCIDevice *dev);
+void pcie_cap_slot_enable_power(PCIDevice *dev);
 
 void pcie_cap_root_init(PCIDevice *dev);
 void pcie_cap_root_reset(PCIDevice *dev);
-- 
Gitee


From ddd68250f7cef12afb4380f8bae18ceca83e0865 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 30 Mar 2022 14:52:34 -0400
Subject: [PATCH 054/407] vmxcap: Add 5-level EPT bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 139: vmxcap: Add 5-level EPT bit
RH-Commit: [1/2] 4c098f551f1ed8e2a5582f466afda35b28d97055 (jmaloy/qemu-kvm)
RH-Bugzilla: 2065207
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2065207
UPSTREAM: Merged

commit d312378e59658473aa91aa15c67ec6200d92e5ff
Author: Vitaly Kuznetsov <vkuznets@redhat.com>
Date:   Mon Feb 21 15:53:16 2022 +0100

    vmxcap: Add 5-level EPT bit

    5-level EPT is present in Icelake Server CPUs and is supported by QEMU
    ('vmx-page-walk-5').

    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Message-Id: <20220221145316.576138-2-vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

(cherry picked from commit d312378e59658473aa91aa15c67ec6200d92e5ff)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 scripts/kvm/vmxcap | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/kvm/vmxcap b/scripts/kvm/vmxcap
index 6fe66d5f575..f140040104b 100755
--- a/scripts/kvm/vmxcap
+++ b/scripts/kvm/vmxcap
@@ -249,6 +249,7 @@ controls = [
         bits = {
             0: 'Execute-only EPT translations',
             6: 'Page-walk length 4',
+            7: 'Page-walk length 5',
             8: 'Paging-structure memory type UC',
             14: 'Paging-structure memory type WB',
             16: '2MB EPT pages',
-- 
Gitee


From e1d83b1f02046b1d0c1e7a7c5803769bb0e4fe8f Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 30 Mar 2022 14:52:34 -0400
Subject: [PATCH 055/407] i386: Add Icelake-Server-v6 CPU model with 5-level
 EPT support

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 139: vmxcap: Add 5-level EPT bit
RH-Commit: [2/2] e913746b2df9cbd0308014ab5cc72577458857fa (jmaloy/qemu-kvm)
RH-Bugzilla: 2065207
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2065207
UPSTREAM: Merged

commit 12cab535db6440af41ed8dfefe908a594321b6ce
Author: Vitaly Kuznetsov <vkuznets@redhat.com>
Date:   Mon Feb 21 15:53:15 2022 +0100

    i386: Add Icelake-Server-v6 CPU model with 5-level EPT support

    Windows 11 with WSL2 enabled (Hyper-V) fails to boot with the
    Icelake-Server{-v5} CPU model but boots well with '-cpu host'.
    Apparently, it expects 5-level paging and 5-level EPT support to come
    as a pair, but QEMU's Icelake-Server CPU model lacks the latter.
    Introduce the 'Icelake-Server-v6' CPU model with 'vmx-page-walk-5'
    enabled by default.

    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Message-Id: <20220221145316.576138-1-vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

(cherry picked from commit 12cab535db6440af41ed8dfefe908a594321b6ce)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 target/i386/cpu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index aa9e6368004..6e25d133397 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3505,6 +3505,14 @@ static const X86CPUDefinition builtin_x86_defs[] = {
                     { /* end of list */ }
                 },
             },
+            {
+                .version = 6,
+                .note = "5-level EPT",
+                .props = (PropValue[]) {
+                    { "vmx-page-walk-5", "on" },
+                    { /* end of list */ }
+                },
+            },
             { /* end of list */ }
         }
     },
-- 
Gitee


From 92be6f0856c7000cfb36fdc37eb6c736281af5a0 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 30 Mar 2022 14:52:34 -0400
Subject: [PATCH 056/407] acpi: fix QEMU crash when started with SLIC table
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 141: acpi: fix QEMU crash when started with SLIC table
RH-Commit: [1/10] 0c34e80346c33da4f220d9c486b120c35005144e (jmaloy/qemu-kvm)
RH-Bugzilla: 2062611
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2062611
Upstream: Merged

commit 8cdb99af45365727ac17f45239a9b8c1d5155c6d
Author: Igor Mammedov <imammedo@redhat.com>
Date:   Mon Dec 27 14:31:17 2021 -0500

    acpi: fix QEMU crash when started with SLIC table

    If QEMU is started with a user-provided SLIC table blob,

      -acpitable sig=SLIC,oem_id='CRASH ',oem_table_id="ME",oem_rev=00002210,asl_compiler_id="",asl_compiler_rev=00000000,data=/dev/null

    it will assert with:

      hw/acpi/aml-build.c:61:build_append_padded_str: assertion failed: (len <= maxlen)

    and following backtrace:

      ...
      build_append_padded_str (array=0x555556afe320, str=0x555556afdb2e "CRASH ME", maxlen=0x6, pad=0x20) at hw/acpi/aml-build.c:61
      acpi_table_begin (desc=0x7fffffffd1b0, array=0x555556afe320) at hw/acpi/aml-build.c:1727
      build_fadt (tbl=0x555556afe320, linker=0x555557ca3830, f=0x7fffffffd318, oem_id=0x555556afdb2e "CRASH ME", oem_table_id=0x555556afdb34 "ME") at hw/acpi/aml-build.c:2064
      ...

    which happens because acpi_table_begin() expects NUL-terminated
    oem_id and oem_table_id strings. That is normally the case, but
    with a user-provided SLIC table, oem_id points directly into the
    table's blob and, as a result, oem_id becomes longer than expected.

    Fix the issue by handling oem_id consistently and making
    acpi_get_slic_oem() return NUL-terminated strings.

    PS:
    After the [1] refactoring, oem_id semantics became inconsistent: a
    NUL-terminated string came from the machine, while the old-style
    pointer into a byte array came from the -acpitable option. That used
    to work since build_header() didn't expect a NUL-terminated string and
    blindly copied only the first 6 bytes.

    However, commit [2] broke that by replacing build_header() with
    acpi_table_begin(), which expects a NUL-terminated string
    and checks the oem_id size.

    1) 602b45820 ("acpi: Permit OEM ID and OEM table ID fields to be changed")
    2)
    Fixes: 4b56e1e4eb08 ("acpi: build_fadt: use acpi_table_begin()/acpi_table_end() instead of build_header()")
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/786
    Signed-off-by: Igor Mammedov <imammedo@redhat.com>
    Message-Id: <20211227193120.1084176-2-imammedo@redhat.com>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
    Tested-by: Denis Lisov <dennis.lissov@gmail.com>
    Tested-by: Alexander Tsoy <alexander@tsoy.me>
    Cc: qemu-stable@nongnu.org
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
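
For illustration only (not part of the upstream change): a minimal,
self-contained sketch of why a pointer into the raw table header cannot be
used as a C string. The struct hdr_ids layout and the helper name
copy_fixed_field() are hypothetical stand-ins; the actual fix uses
g_strndup() on the real ACPI header fields.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the ID fields of an ACPI table header:
 * fixed-width fields that sit back to back with no NUL terminator.
 */
struct hdr_ids {
    char oem_id[6];
    char oem_table_id[8];
};

/*
 * Copy at most 'width' bytes of a fixed-width field into a freshly
 * allocated, NUL-terminated string (the same idea as g_strndup()).
 * Returns NULL on allocation failure; the caller frees the result.
 */
static char *copy_fixed_field(const char *field, size_t width)
{
    size_t len = 0;
    char *s;

    assert(field != NULL);
    while (len < width && field[len] != '\0') {
        len++;
    }
    s = malloc(len + 1);
    if (s) {
        memcpy(s, field, len);
        s[len] = '\0';
    }
    return s;
}
```

With oem_id holding 'CRASH ' and oem_table_id starting with 'ME', reading
oem_id as a plain C string runs on into the next field and yields
"CRASH ME", the 8-character string seen in the backtrace, whereas the
bounded copy stops after 6 bytes.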

(cherry picked from commit 8cdb99af45365727ac17f45239a9b8c1d5155c6d)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/acpi/core.c       | 4 ++--
 hw/i386/acpi-build.c | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/core.c b/hw/acpi/core.c
index 1e004d0078d..3e811bf03ca 100644
--- a/hw/acpi/core.c
+++ b/hw/acpi/core.c
@@ -345,8 +345,8 @@ int acpi_get_slic_oem(AcpiSlicOem *oem)
         struct acpi_table_header *hdr = (void *)(u - sizeof(hdr->_length));
 
         if (memcmp(hdr->sig, "SLIC", 4) == 0) {
-            oem->id = hdr->oem_id;
-            oem->table_id = hdr->oem_table_id;
+            oem->id = g_strndup(hdr->oem_id, 6);
+            oem->table_id = g_strndup(hdr->oem_table_id, 8);
             return 0;
         }
     }
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a4478e77b73..acc4869db06 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2726,6 +2726,8 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
 
     /* Cleanup memory that's no longer used. */
     g_array_free(table_offsets, true);
+    g_free(slic_oem.id);
+    g_free(slic_oem.table_id);
 }
 
 static void acpi_ram_update(MemoryRegion *mr, GArray *data)
-- 
Gitee


From 3f2605af8883612b30aca456f048b2da422bde67 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 30 Mar 2022 14:52:34 -0400
Subject: [PATCH 057/407] tests: acpi: whitelist expected blobs before changing
 them

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 141: acpi: fix QEMU crash when started with SLIC table
RH-Commit: [2/10] c664ecad30ca9c13025a63bb31ae7b80fd63e4df (jmaloy/qemu-kvm)
RH-Bugzilla: 2062611
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2062611
Upstream: Merged

commit e71f6ab9d93a7d01e833647e7010c1079c4cef30
Author: Igor Mammedov <imammedo@redhat.com>
Date:   Mon Dec 27 14:31:18 2021 -0500

    tests: acpi: whitelist expected blobs before changing them

    Signed-off-by: Igor Mammedov <imammedo@redhat.com>
    Message-Id: <20211227193120.1084176-3-imammedo@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit e71f6ab9d93a7d01e833647e7010c1079c4cef30)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 tests/qtest/bios-tables-test-allowed-diff.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8bf..49dbf8fa3ee 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,3 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/q35/FACP.slic",
+"tests/data/acpi/q35/SLIC.slic",
-- 
Gitee


From 564db7776a453cb085cde4780df550d7e69690f7 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 30 Mar 2022 14:52:34 -0400
Subject: [PATCH 058/407] tests: acpi: add SLIC table test

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 141: acpi: fix QEMU crash when started with SLIC table
RH-Commit: [3/10] baac9b82c16a50eb4640fd7146775c9d507c7b21 (jmaloy/qemu-kvm)
RH-Bugzilla: 2062611
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2062611
Upstream: Merged

commit 11edfabee443b149468a82b5efc88c96d1d259ec
Author: Igor Mammedov <imammedo@redhat.com>
Date:   Mon Dec 27 14:31:19 2021 -0500

    tests: acpi: add SLIC table test

    When a user uses '-acpitable' to add a SLIC table, some ACPI
    tables (FADT) will change their 'Oem ID'/'Oem Table ID' fields to
    match those of the SLIC. The test makes sure that QEMU handles
    those fields correctly when a SLIC table is added with the
    '-acpitable' option.

    Conflicts: tests/qtest/bios-tables-test.c
     due to missing 39d7554b2009 ("tests/acpi: add test case for VIOT")

    Signed-off-by: Igor Mammedov <imammedo@redhat.com>
    Message-Id: <20211227193120.1084176-4-imammedo@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit 11edfabee443b149468a82b5efc88c96d1d259ec)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 tests/qtest/bios-tables-test.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index 16d8304cde1..e159b711365 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -1467,6 +1467,20 @@ static void test_acpi_virt_tcg(void)
     free_test_data(&data);
 }
 
+static void test_acpi_q35_slic(void)
+{
+    test_data data = {
+        .machine = MACHINE_Q35,
+        .variant = ".slic",
+    };
+
+    test_acpi_one("-acpitable sig=SLIC,oem_id='CRASH ',oem_table_id='ME',"
+                  "oem_rev=00002210,asl_compiler_id='qemu',"
+                  "asl_compiler_rev=00000000,data=/dev/null",
+                  &data);
+    free_test_data(&data);
+}
+
 static void test_oem_fields(test_data *data)
 {
     int i;
@@ -1641,6 +1655,7 @@ int main(int argc, char *argv[])
             qtest_add_func("acpi/q35/kvm/xapic", test_acpi_q35_kvm_xapic);
             qtest_add_func("acpi/q35/kvm/dmar", test_acpi_q35_kvm_dmar);
         }
+        qtest_add_func("acpi/q35/slic", test_acpi_q35_slic);
     } else if (strcmp(arch, "aarch64") == 0) {
         if (has_tcg) {
             qtest_add_func("acpi/virt", test_acpi_virt_tcg);
-- 
Gitee


From a7eb6101adf8338af2677c948a2947d42b4f1c1f Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 30 Mar 2022 14:52:34 -0400
Subject: [PATCH 059/407] tests: acpi: SLIC: update expected blobs

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 141: acpi: fix QEMU crash when started with SLIC table
RH-Commit: [4/10] ca28e5c57f9eb432e5ad6b1cb7ef646a86890dd5 (jmaloy/qemu-kvm)
RH-Bugzilla: 2062611
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2062611
Upstream: Merged

commit c8adb4d222c42951a9d0367e5f5d4e1f5e2c9ad7
Author: Igor Mammedov <imammedo@redhat.com>
Date:   Mon Dec 27 14:31:20 2021 -0500

    tests: acpi: SLIC: update expected blobs

    Signed-off-by: Igor Mammedov <imammedo@redhat.com>
    Message-Id: <20211227193120.1084176-5-imammedo@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit c8adb4d222c42951a9d0367e5f5d4e1f5e2c9ad7)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 tests/qtest/bios-tables-test-allowed-diff.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index 49dbf8fa3ee..dfb8523c8bf 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,3 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/q35/FACP.slic",
-"tests/data/acpi/q35/SLIC.slic",
-- 
Gitee


From 347baa472858b5f8d144484a7078125a70abd658 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 30 Mar 2022 14:52:34 -0400
Subject: [PATCH 060/407] tests: acpi: manually pad OEM_ID/OEM_TABLE_ID for
 test_oem_fields() test

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 141: acpi: fix QEMU crash when started with SLIC table
RH-Commit: [5/10] 4ec8c738acec178c2f005f189b0c2a77a7af4088 (jmaloy/qemu-kvm)
RH-Bugzilla: 2062611
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2062611
Upstream: Merged

commit a849522f726767022203ef2b6c395ea19facb866
Author: Igor Mammedov <imammedo@redhat.com>
Date:   Wed Jan 12 08:03:29 2022 -0500

    tests: acpi: manually pad OEM_ID/OEM_TABLE_ID for test_oem_fields() test

    The next commit will revert OEM field padding with whitespace to
    padding with '\0', as it was before [1]. As a result, test_oem_fields()
    will fail due to unexpectedly smaller ID sizes read from QEMU ACPI tables.

    Pad OEM_ID/OEM_TABLE_ID manually with spaces so that the values the test
    puts on the QEMU CLI and the expected values match.

    1) 602b458201 ("acpi: Permit OEM ID and OEM table ID fields to be changed")
    Signed-off-by: Igor Mammedov <imammedo@redhat.com>
    Message-Id: <20220112130332.1648664-2-imammedo@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit a849522f726767022203ef2b6c395ea19facb866)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 tests/qtest/bios-tables-test.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index e159b711365..348fdbd2023 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -71,9 +71,10 @@
 
 #define ACPI_REBUILD_EXPECTED_AML "TEST_ACPI_REBUILD_AML"
 
-#define OEM_ID             "TEST"
-#define OEM_TABLE_ID       "OEM"
-#define OEM_TEST_ARGS      "-machine x-oem-id="OEM_ID",x-oem-table-id="OEM_TABLE_ID
+#define OEM_ID             "TEST  "
+#define OEM_TABLE_ID       "OEM     "
+#define OEM_TEST_ARGS      "-machine x-oem-id='" OEM_ID "',x-oem-table-id='" \
+                           OEM_TABLE_ID "'"
 
 typedef struct {
     bool tcg_only;
@@ -1484,11 +1485,7 @@ static void test_acpi_q35_slic(void)
 static void test_oem_fields(test_data *data)
 {
     int i;
-    char oem_id[6];
-    char oem_table_id[8];
 
-    strpadcpy(oem_id, sizeof oem_id, OEM_ID, ' ');
-    strpadcpy(oem_table_id, sizeof oem_table_id, OEM_TABLE_ID, ' ');
     for (i = 0; i < data->tables->len; ++i) {
         AcpiSdtTable *sdt;
 
@@ -1498,8 +1495,8 @@ static void test_oem_fields(test_data *data)
             continue;
         }
 
-        g_assert(memcmp(sdt->aml + 10, oem_id, 6) == 0);
-        g_assert(memcmp(sdt->aml + 16, oem_table_id, 8) == 0);
+        g_assert(memcmp(sdt->aml + 10, OEM_ID, 6) == 0);
+        g_assert(memcmp(sdt->aml + 16, OEM_TABLE_ID, 8) == 0);
     }
 }
 
-- 
Gitee


From c5bfd9d48f31bdaef8830cc27a8985ad3b576d5b Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 30 Mar 2022 14:52:34 -0400
Subject: [PATCH 061/407] tests: acpi: whitelist nvdimm's SSDT and FACP.slic
 expected blobs

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 141: acpi: fix QEMU crash when started with SLIC table
RH-Commit: [6/10] 3f3a929cde82f228da1e4bc66e4c869467c0289c (jmaloy/qemu-kvm)
RH-Bugzilla: 2062611
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2062611
Upstream: Merged

commit d1e4a4654154925eddf0fc449fa9c92b806b9c8c
Author: Igor Mammedov <imammedo@redhat.com>
Date:   Wed Jan 12 08:03:30 2022 -0500

    tests: acpi: whitelist nvdimm's SSDT and FACP.slic expected blobs

    The next commit will revert OEM field whitespace padding to
    padding with '\0', as it was before [1]. That will change the OEM
    Table ID for:
      * SSDT.*: where it was padded from 6 characters to 8
      * FACP.slic: where it was padded from 2 characters to 8
    After reverting whitespace padding, it will be replaced with
    '\0', which will effectively shorten the OEM Table ID to 6 and 2
    characters, respectively.

    Whitelist the affected tables before introducing the change.

    1) 602b458201 ("acpi: Permit OEM ID and OEM table ID fields to be changed")
    Signed-off-by: Igor Mammedov <imammedo@redhat.com>
    Message-Id: <20220112130332.1648664-3-imammedo@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit d1e4a4654154925eddf0fc449fa9c92b806b9c8c)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 tests/qtest/bios-tables-test-allowed-diff.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8bf..7faa8f53bea 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,5 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/virt/SSDT.memhp",
+"tests/data/acpi/pc/SSDT.dimmpxm",
+"tests/data/acpi/q35/SSDT.dimmpxm",
+"tests/data/acpi/q35/FACP.slic",
-- 
Gitee


From f74722ae77abbf91b9e05c14d039d05d009a0bd4 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 30 Mar 2022 14:52:34 -0400
Subject: [PATCH 062/407] acpi: fix OEM ID/OEM Table ID padding

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 141: acpi: fix QEMU crash when started with SLIC table
RH-Commit: [7/10] 51ea859cbe12b5a902d529ab589d18757d98f71d (jmaloy/qemu-kvm)
RH-Bugzilla: 2062611
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2062611
Upstream: Merged

commit 748c030f360a940fe0c9382c8ca1649096c3a80d
Author: Igor Mammedov <imammedo@redhat.com>
Date:   Wed Jan 12 08:03:31 2022 -0500

    acpi: fix OEM ID/OEM Table ID padding

    Commit [2] broke the original '\0' padding of the OEM ID and OEM Table ID
    fields in the headers of ACPI tables. While it has no impact on the
    default values, since QEMU uses 6- and 8-character-long values
    respectively, it broke the use case where IDs are provided on the QEMU
    CLI. It shouldn't affect the guest (but may cause licensing verification
    issues in the guest OS).
    One of the broken use cases is a user-supplied SLIC table with IDs
    shorter than the maximum possible length, where [2] mangles the IDs with
    extra spaces in the RSDT and FADT tables, whereas the guest OS expects
    those to mirror the respective values of the SLIC table used.

    Fix it by replacing whitespace padding with '\0' padding, in
    accordance with [1] and the expectations of the guest OS.

    1) ACPI spec, v2.0b
           17.2 AML Grammar Definition
           ...
           //OEM ID of up to 6 characters. If the OEM ID is
           //shorter than 6 characters, it can be terminated
           //with a NULL character.

    2)
    Fixes: 602b458201 ("acpi: Permit OEM ID and OEM table ID fields to be changed")
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/707
    Reported-by: Dmitry V. Orekhov <dima.orekhov@gmail.com>
    Signed-off-by: Igor Mammedov <imammedo@redhat.com>
    Cc: qemu-stable@nongnu.org
    Message-Id: <20220112130332.1648664-4-imammedo@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Ani Sinha <ani@anisinha.ca>
    Tested-by: Dmitry V. Orekhov <dima.orekhov@gmail.com>

(cherry picked from commit 748c030f360a940fe0c9382c8ca1649096c3a80d)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/acpi/aml-build.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
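
A minimal sketch of the padding behaviour this change is about. pad_field()
is a hypothetical stand-in written for illustration only; it is not QEMU's
build_append_padded_str() and makes no claim about that function's
internals.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy at most maxlen bytes of str into out and
 * fill the remainder with pad. out must have room for maxlen bytes;
 * no terminator is written beyond the field.
 */
static void pad_field(char *out, const char *str, size_t maxlen, char pad)
{
    size_t len = strlen(str);

    if (len > maxlen) {
        len = maxlen;
    }
    memcpy(out, str, len);
    memset(out + len, pad, maxlen - len);
}
```

With pad set to ' ', an 8-byte OEM Table ID built from "ME" is "ME" followed
by six spaces; with '\0' it is "ME" followed by six zero bytes, which is the
form the commit message says the guest expects to match the SLIC table.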

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index b3b3310df32..65148d5b9d0 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1724,9 +1724,9 @@ void acpi_table_begin(AcpiTable *desc, GArray *array)
     build_append_int_noprefix(array, 0, 4); /* Length */
     build_append_int_noprefix(array, desc->rev, 1); /* Revision */
     build_append_int_noprefix(array, 0, 1); /* Checksum */
-    build_append_padded_str(array, desc->oem_id, 6, ' '); /* OEMID */
+    build_append_padded_str(array, desc->oem_id, 6, '\0'); /* OEMID */
     /* OEM Table ID */
-    build_append_padded_str(array, desc->oem_table_id, 8, ' ');
+    build_append_padded_str(array, desc->oem_table_id, 8, '\0');
     build_append_int_noprefix(array, 1, 4); /* OEM Revision */
     g_array_append_vals(array, ACPI_BUILD_APPNAME8, 4); /* Creator ID */
     build_append_int_noprefix(array, 1, 4); /* Creator Revision */
-- 
Gitee


From 85cf166ca403a367428713a24a4ed39cd9c6ecad Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 30 Mar 2022 14:52:34 -0400
Subject: [PATCH 063/407] tests: acpi: update expected blobs

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 141: acpi: fix QEMU crash when started with SLIC table
RH-Commit: [8/10] e069c5de88f34393d65d32b60380865832820302 (jmaloy/qemu-kvm)
RH-Bugzilla: 2062611
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2062611
Upstream: Merged

commit 5adc3aba875416b0e077d8a29ddd0357883746f4
Author: Igor Mammedov <imammedo@redhat.com>
Date:   Wed Jan 12 08:03:32 2022 -0500

    tests: acpi: update expected blobs

    Expected changes caused by previous commit:

    nvdimm ssdt (q35/pc/virt):
      - *     OEM Table ID     "NVDIMM  "
      + *     OEM Table ID     "NVDIMM"

    SLIC test FADT (tests/data/acpi/q35/FACP.slic):
      -[010h 0016   8]                 Oem Table ID : "ME      "
      +[010h 0016   8]                 Oem Table ID : "ME"

    Signed-off-by: Igor Mammedov <imammedo@redhat.com>
    Message-Id: <20220112130332.1648664-5-imammedo@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit 5adc3aba875416b0e077d8a29ddd0357883746f4)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 tests/qtest/bios-tables-test-allowed-diff.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index 7faa8f53bea..dfb8523c8bf 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,5 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/virt/SSDT.memhp",
-"tests/data/acpi/pc/SSDT.dimmpxm",
-"tests/data/acpi/q35/SSDT.dimmpxm",
-"tests/data/acpi/q35/FACP.slic",
-- 
Gitee


From 3c858aad31038fe2dd9b0eb2987310ccd1f7c2d6 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 30 Mar 2022 14:52:34 -0400
Subject: [PATCH 064/407] tests: acpi: test short OEM_ID/OEM_TABLE_ID values in
 test_oem_fields()

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 141: acpi: fix QEMU crash when started with SLIC table
RH-Commit: [9/10] 31339223fb6c6cc32185b9fdaac76f2709b17ad6 (jmaloy/qemu-kvm)
RH-Bugzilla: 2062611
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2062611
Upstream: Merged

commit 408ca92634770de5eac7965ed97c6260e770f2e7
Author: Igor Mammedov <imammedo@redhat.com>
Date:   Fri Jan 14 09:26:41 2022 -0500

    tests: acpi: test short OEM_ID/OEM_TABLE_ID values in test_oem_fields()

    Previous patch [1] added explicit whitespace padding to OEM_ID/OEM_TABLE_ID
    values used in test_oem_fields() testcase to avoid false positive and
    bisection issues when QEMU is switched to '\0' padding. As a result, the
    testcase ceased to test values that were shorter than the maximum
    possible length.

    Update testcase to make sure that it's testing shorter IDs like it
    used to before [2].

    1) "tests: acpi: manually pad OEM_ID/OEM_TABLE_ID for  test_oem_fields() test"
    2) 602b458201 ("acpi: Permit OEM ID and OEM table ID fields to be changed")

    Signed-off-by: Igor Mammedov <imammedo@redhat.com>
    Message-Id: <20220114142641.1727679-1-imammedo@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit 408ca92634770de5eac7965ed97c6260e770f2e7)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 tests/qtest/bios-tables-test.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
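
A small sketch of why the assertions move from memcmp() to strncmp() once
the macros lose their manual padding. oem_field_matches() is a hypothetical
helper used only for illustration. With a short literal such as "TEST",
memcmp() over 6 bytes would read past the end of the 5-byte literal, while
strncmp() stops at the terminating NUL.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not part of bios-tables-test.c: true if the
 * fixed-width field starts with id and, when id is shorter than width,
 * the field has a NUL byte right after it.
 */
static bool oem_field_matches(const char *field, const char *id, size_t width)
{
    return strncmp(field, id, width) == 0;
}
```

A NUL-padded field matches the short ID, whereas a space-padded field does
not, so the test now fails if whitespace padding is reintroduced.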

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index 348fdbd2023..515a6474905 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -71,10 +71,10 @@
 
 #define ACPI_REBUILD_EXPECTED_AML "TEST_ACPI_REBUILD_AML"
 
-#define OEM_ID             "TEST  "
-#define OEM_TABLE_ID       "OEM     "
-#define OEM_TEST_ARGS      "-machine x-oem-id='" OEM_ID "',x-oem-table-id='" \
-                           OEM_TABLE_ID "'"
+#define OEM_ID             "TEST"
+#define OEM_TABLE_ID       "OEM"
+#define OEM_TEST_ARGS      "-machine x-oem-id=" OEM_ID ",x-oem-table-id=" \
+                           OEM_TABLE_ID
 
 typedef struct {
     bool tcg_only;
@@ -1495,8 +1495,8 @@ static void test_oem_fields(test_data *data)
             continue;
         }
 
-        g_assert(memcmp(sdt->aml + 10, OEM_ID, 6) == 0);
-        g_assert(memcmp(sdt->aml + 16, OEM_TABLE_ID, 8) == 0);
+        g_assert(strncmp((char *)sdt->aml + 10, OEM_ID, 6) == 0);
+        g_assert(strncmp((char *)sdt->aml + 16, OEM_TABLE_ID, 8) == 0);
     }
 }
 
-- 
Gitee


From 6862046ef39c4e659251d6e1bb2614874e3a0928 Mon Sep 17 00:00:00 2001
From: Stefano Garzarella <sgarzare@redhat.com>
Date: Thu, 24 Mar 2022 16:04:57 +0100
Subject: [PATCH 065/407] RHEL: disable "seqpacket" for "vhost-vsock-device" in
 rhel8.6.0
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Stefano Garzarella <sgarzare@redhat.com>
RH-MergeRequest: 136: RHEL: disable "seqpacket" for "vhost-vsock-device" in rhel8.6.0 [rhel-8.7.0]
RH-Commit: [1/1] d82ea09e123679521503689f7d9af1c03dc71bfc
RH-Bugzilla: 2068202
RH-Acked-by: Jason Wang <None>
RH-Acked-by: Eugenio Pérez <None>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

The vhost-vsock device in RHEL 8 kernels doesn't support seqpacket.
To avoid problems when migrating a VM from a RHEL 9 host, we need to
disable it in the rhel8-* machine types.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 hw/core/machine.c          | 10 ++++++++++
 hw/i386/pc_piix.c          |  2 ++
 hw/i386/pc_q35.c           |  2 ++
 hw/s390x/s390-virtio-ccw.c |  1 +
 include/hw/boards.h        |  3 +++
 5 files changed, 18 insertions(+)
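
A toy model of how a compat table like the one added below can be read: a
list of { driver, property, value } overrides that a machine type layers on
top of device defaults. CompatProp and compat_lookup() are hypothetical
names for this sketch; QEMU's GlobalProperty handling is more involved and
is not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Toy mirror of the { driver, property, value } triples in the diff. */
typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} CompatProp;

static const CompatProp rhel_8_6_compat[] = {
    { "vhost-vsock-device", "seqpacket", "off" },
};

/*
 * Hypothetical lookup: return the overriding value for driver.property,
 * or NULL if the table leaves the device default alone.
 */
static const char *compat_lookup(const CompatProp *props, size_t n,
                                 const char *driver, const char *property)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(props[i].driver, driver) == 0 &&
            strcmp(props[i].property, property) == 0) {
            return props[i].value;
        }
    }
    return NULL;
}
```

In this model, every machine type that adds the table sees seqpacket forced
to "off", while properties not listed keep whatever default the device has.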

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 024b025fc2a..76fcabec7a2 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -37,6 +37,16 @@
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-pci.h"
 
+GlobalProperty hw_compat_rhel_8_6[] = {
+    /* hw_compat_rhel_8_6 bz 2068202 */
+    /*
+     * vhost-vsock device in RHEL 8 kernels doesn't support seqpacket, so
+     * we need to disable it downstream on the latest hw_compat_rhel_8.
+     */
+    { "vhost-vsock-device", "seqpacket", "off" },
+};
+const size_t hw_compat_rhel_8_6_len = G_N_ELEMENTS(hw_compat_rhel_8_6);
+
 /*
  * Mostly the same as hw_compat_6_0 and hw_compat_6_1
  */
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index f03a8f0db8e..ab6d03e07ab 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -998,6 +998,8 @@ static void pc_machine_rhel760_options(MachineClass *m)
     pcmc->kvmclock_create_always = false;
     /* From pc_i440fx_5_1_machine_options() */
     pcmc->pci_root_uid = 1;
+    compat_props_add(m->compat_props, hw_compat_rhel_8_6,
+                     hw_compat_rhel_8_6_len);
     compat_props_add(m->compat_props, hw_compat_rhel_8_5,
                      hw_compat_rhel_8_5_len);
     compat_props_add(m->compat_props, pc_rhel_8_5_compat,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 5559261d9e3..882fe7a68dc 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -658,6 +658,8 @@ static void pc_q35_machine_rhel860_options(MachineClass *m)
     m->desc = "RHEL-8.6.0 PC (Q35 + ICH9, 2009)";
     pcmc->smbios_stream_product = "RHEL-AV";
     pcmc->smbios_stream_version = "8.6.0";
+    compat_props_add(m->compat_props, hw_compat_rhel_8_6,
+                     hw_compat_rhel_8_6_len);
 }
 
 DEFINE_PC_MACHINE(q35_rhel860, "pc-q35-rhel8.6.0", pc_q35_init_rhel860,
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 9795eb94069..bec270598bc 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -1109,6 +1109,7 @@ static void ccw_machine_rhel860_instance_options(MachineState *machine)
 
 static void ccw_machine_rhel860_class_options(MachineClass *mc)
 {
+    compat_props_add(mc->compat_props, hw_compat_rhel_8_6, hw_compat_rhel_8_6_len);
 }
 DEFINE_CCW_MACHINE(rhel860, "rhel8.6.0", true);
 
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 04e87598152..4ddb7981444 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -443,6 +443,9 @@ extern const size_t hw_compat_2_2_len;
 extern GlobalProperty hw_compat_2_1[];
 extern const size_t hw_compat_2_1_len;
 
+extern GlobalProperty hw_compat_rhel_8_6[];
+extern const size_t hw_compat_rhel_8_6_len;
+
 extern GlobalProperty hw_compat_rhel_8_5[];
 extern const size_t hw_compat_rhel_8_5_len;
 
-- 
Gitee


From c1723b922587a1353b98bcb6405f6ce31b636b4e Mon Sep 17 00:00:00 2001
From: Kevin Wolf <kwolf@redhat.com>
Date: Thu, 3 Feb 2022 15:05:33 +0100
Subject: [PATCH 066/407] block: Lock AioContext for drain_end in
 blockdev-reopen

RH-Author: Kevin Wolf <kwolf@redhat.com>
RH-MergeRequest: 142: block: Lock AioContext for drain_end in blockdev-reopen
RH-Commit: [1/2] 98de3b5987f88ea6b4b503f623d6c4475574e037
RH-Bugzilla: 2067118
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Hanna Reitz <hreitz@redhat.com>

bdrv_subtree_drained_end() requires the caller to hold the AioContext
lock for the drained node. Not doing this for nodes outside of the main
AioContext leads to crashes when AIO_WAIT_WHILE() needs to wait and
tries to temporarily release the lock.

Fixes: 3908b7a8994fa5ef7a89aa58cd5a02fc58141592
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2046659
Reported-by: Qing Wang <qinwang@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220203140534.36522-2-kwolf@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit aba8205be0707b9d108e32254e186ba88107a869)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 blockdev.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
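
A self-contained sketch of the pattern the fix applies: instead of handing
the cleanup function to a generic "free every element" helper, walk the list
and take each node's own lock around the call. Ctx, Node and the functions
below are toy stand-ins invented for this sketch; they only model the rule
from the commit message that the drain-end call needs the node's context
lock held.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for AioContext and BlockDriverState. */
typedef struct {
    int held;               /* lock depth, for illustration only */
} Ctx;

typedef struct Node {
    Ctx *ctx;
    int drained;            /* outstanding drain sections */
    struct Node *next;
} Node;

static void ctx_acquire(Ctx *c)
{
    c->held++;
}

static void ctx_release(Ctx *c)
{
    c->held--;
}

/* Models the requirement: the caller must hold the node's context. */
static void drained_end(Node *n)
{
    assert(n->ctx->held > 0);
    n->drained--;
}

/* Walk the list and bracket each call with that node's own lock. */
static void end_all_drains(Node *list)
{
    for (Node *p = list; p; p = p->next) {
        ctx_acquire(p->ctx);
        drained_end(p);
        ctx_release(p->ctx);
    }
}
```

Calling drained_end() directly from a generic list-free callback would trip
the assertion in this model, which mirrors the crash described above for
nodes outside the main context.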

diff --git a/blockdev.c b/blockdev.c
index b35072644eb..565f6a81fd7 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3562,6 +3562,7 @@ void qmp_blockdev_reopen(BlockdevOptionsList *reopen_list, Error **errp)
 {
     BlockReopenQueue *queue = NULL;
     GSList *drained = NULL;
+    GSList *p;
 
     /* Add each one of the BDS that we want to reopen to the queue */
     for (; reopen_list != NULL; reopen_list = reopen_list->next) {
@@ -3611,7 +3612,15 @@ void qmp_blockdev_reopen(BlockdevOptionsList *reopen_list, Error **errp)
 
 fail:
     bdrv_reopen_queue_free(queue);
-    g_slist_free_full(drained, (GDestroyNotify) bdrv_subtree_drained_end);
+    for (p = drained; p; p = p->next) {
+        BlockDriverState *bs = p->data;
+        AioContext *ctx = bdrv_get_aio_context(bs);
+
+        aio_context_acquire(ctx);
+        bdrv_subtree_drained_end(bs);
+        aio_context_release(ctx);
+    }
+    g_slist_free(drained);
 }
 
 void qmp_blockdev_del(const char *node_name, Error **errp)
-- 
Gitee


From 72b6386d6fbe6a2f685ff24a427559deaa724344 Mon Sep 17 00:00:00 2001
From: Kevin Wolf <kwolf@redhat.com>
Date: Thu, 3 Feb 2022 15:05:34 +0100
Subject: [PATCH 067/407] iotests: Test blockdev-reopen with iothreads and
 throttling

RH-Author: Kevin Wolf <kwolf@redhat.com>
RH-MergeRequest: 142: block: Lock AioContext for drain_end in blockdev-reopen
RH-Commit: [2/2] 91d365864c391ca7db7db13260913fb61987b833
RH-Bugzilla: 2067118
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Hanna Reitz <hreitz@redhat.com>

The 'throttle' block driver implements .bdrv_co_drain_end, so
blockdev-reopen will have to wait for it to complete in the polling
loop at the end of qmp_blockdev_reopen(). This makes AIO_WAIT_WHILE()
release the AioContext lock, which causes a crash if the lock hasn't
correctly been taken.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220203140534.36522-3-kwolf@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit ee810602376125ca0e0afd6b7c715e13740978ea)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/245     | 36 +++++++++++++++++++++++++++++++++---
 tests/qemu-iotests/245.out |  4 ++--
 2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index 24ac43f70ee..8cbed7821b0 100755
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -1138,12 +1138,13 @@ class TestBlockdevReopen(iotests.QMPTestCase):
         self.assertEqual(self.get_node('hd1'), None)
         self.assert_qmp(self.get_node('hd2'), 'ro', True)
 
-    def run_test_iothreads(self, iothread_a, iothread_b, errmsg = None):
-        opts = hd_opts(0)
+    def run_test_iothreads(self, iothread_a, iothread_b, errmsg = None,
+                           opts_a = None, opts_b = None):
+        opts = opts_a or hd_opts(0)
         result = self.vm.qmp('blockdev-add', conv_keys = False, **opts)
         self.assert_qmp(result, 'return', {})
 
-        opts2 = hd_opts(2)
+        opts2 = opts_b or hd_opts(2)
         result = self.vm.qmp('blockdev-add', conv_keys = False, **opts2)
         self.assert_qmp(result, 'return', {})
 
@@ -1194,6 +1195,35 @@ class TestBlockdevReopen(iotests.QMPTestCase):
     def test_iothreads_switch_overlay(self):
         self.run_test_iothreads('', 'iothread0')
 
+    def test_iothreads_with_throttling(self):
+        # Create a throttle-group object
+        opts = { 'qom-type': 'throttle-group', 'id': 'group0',
+                 'limits': { 'iops-total': 1000 } }
+        result = self.vm.qmp('object-add', conv_keys = False, **opts)
+        self.assert_qmp(result, 'return', {})
+
+        # Options with a throttle filter between format and protocol
+        opts = [
+            {
+                'driver': iotests.imgfmt,
+                'node-name': f'hd{idx}',
+                'file' : {
+                    'node-name': f'hd{idx}-throttle',
+                    'driver': 'throttle',
+                    'throttle-group': 'group0',
+                    'file': {
+                        'driver': 'file',
+                        'node-name': f'hd{idx}-file',
+                        'filename': hd_path[idx],
+                    },
+                },
+            }
+            for idx in (0, 2)
+        ]
+
+        self.run_test_iothreads('iothread0', 'iothread0', None,
+                                opts[0], opts[1])
+
 if __name__ == '__main__':
     iotests.activate_logging()
     iotests.main(supported_fmts=["qcow2"],
diff --git a/tests/qemu-iotests/245.out b/tests/qemu-iotests/245.out
index 4eced192944..a4e04a3266b 100644
--- a/tests/qemu-iotests/245.out
+++ b/tests/qemu-iotests/245.out
@@ -17,8 +17,8 @@ read 1/1 bytes at offset 262152
 read 1/1 bytes at offset 262160
 1 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
-...............
+................
 ----------------------------------------------------------------------
-Ran 25 tests
+Ran 26 tests
 
 OK
-- 
Gitee


From 2470de2c262a8124ce30d9093b56aa8710e350a2 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Wed, 6 Apr 2022 10:56:26 +0200
Subject: [PATCH 068/407] s390x/css: fix PMCW invalid mask

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 145: s390x/css: fix PMCW invalid mask
RH-Commit: [1/1] fbf192f651aa668af56ca5c77455595fcdb19508
RH-Bugzilla: 2071070
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

Bugzilla: http://bugzilla.redhat.com/2071070

commit 2df59b73e0864f021f6179f32f7ed364f6d4f38d
Author: Nico Boehr <nrb@linux.ibm.com>
Date:   Thu Dec 16 14:16:57 2021 +0100

    s390x/css: fix PMCW invalid mask

    Previously, we required bits 5, 6 and 7 to be zero (0x07 == 0b111). But,
    as per the principles of operation, bit 5 is ignored in MSCH and bits 0,
    1, 6 and 7 need to be zero.

    As both PMCW_FLAGS_MASK_INVALID and ioinst_schib_valid() are only used
    by ioinst_handle_msch(), adjust the mask accordingly.

    Fixes: db1c8f53bfb1 ("s390: Channel I/O basic definitions.")
    Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
    Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
    Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
    Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
    Reviewed-by: Cornelia Huck <cohuck@redhat.com>
    Message-Id: <20211216131657.1057978-1-nrb@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 include/hw/s390x/ioinst.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
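
A worked version of the mask arithmetic, assuming the PMCW flags field is a
16-bit halfword numbered from the most significant bit (bit 0 is 0x8000).
That assumption is what makes the commit message's bit numbers line up with
the constants in the diff. The function names are made up for this sketch.

```c
#include <stdint.h>

/*
 * With MSB-0 numbering, bit 0 of a 16-bit field is 0x8000 and
 * bit 15 is 0x0001.
 */
static uint16_t be_bit16(unsigned int n)
{
    return (uint16_t)(0x8000u >> n);
}

/* Old mask: bits 5, 6 and 7 had to be zero. */
static uint16_t pmcw_invalid_old(void)
{
    return (uint16_t)(be_bit16(5) | be_bit16(6) | be_bit16(7));
}

/* New mask: bits 0, 1, 6 and 7 must be zero; bit 5 is ignored. */
static uint16_t pmcw_invalid_new(void)
{
    return (uint16_t)(be_bit16(0) | be_bit16(1) |
                      be_bit16(6) | be_bit16(7));
}
```

Under that numbering, the old set evaluates to 0x0700 and the new set to
0xc300, matching the removed and added lines below.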

diff --git a/include/hw/s390x/ioinst.h b/include/hw/s390x/ioinst.h
index 3771fff9d44..ea8d0f24449 100644
--- a/include/hw/s390x/ioinst.h
+++ b/include/hw/s390x/ioinst.h
@@ -107,7 +107,7 @@ QEMU_BUILD_BUG_MSG(sizeof(PMCW) != 28, "size of PMCW is wrong");
 #define PMCW_FLAGS_MASK_MP 0x0004
 #define PMCW_FLAGS_MASK_TF 0x0002
 #define PMCW_FLAGS_MASK_DNV 0x0001
-#define PMCW_FLAGS_MASK_INVALID 0x0700
+#define PMCW_FLAGS_MASK_INVALID 0xc300
 
 #define PMCW_CHARS_MASK_ST 0x00e00000
 #define PMCW_CHARS_MASK_MBFC 0x00000004
-- 
Gitee


From 9b5de646b66503ea433fbeb24b69a053132f5d01 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 13 Apr 2022 14:51:06 -0400
Subject: [PATCH 069/407] hw/intc/arm_gicv3: Check for !MEMTX_OK instead of
 MEMTX_ERROR
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 151: hw/intc/arm_gicv3: Check for !MEMTX_OK instead of MEMTX_ERROR
RH-Commit: [1/3] 561c9c2b1249f07d33013040b1c495ed1fbf825b (jmaloy/qemu-kvm)
RH-Bugzilla: 1999236
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Peter Xu <peterx@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999236
Upstream: Merged
CVE: CVE-2021-3750

commit b9d383ab797f54ae5fa8746117770709921dc529
Author: Philippe Mathieu-Daudé <philmd@redhat.com>
Date:   Wed Dec 15 19:24:19 2021 +0100

    hw/intc/arm_gicv3: Check for !MEMTX_OK instead of MEMTX_ERROR

    Quoting Peter Maydell:

     "These MEMTX_* aren't from the memory transaction
      API functions; they're just being used by gicd_readl() and
      friends as a way to indicate a success/failure so that the
      actual MemoryRegionOps read/write fns like gicv3_dist_read()
      can log a guest error."

    We are going to introduce more MemTxResult bits, so it is
    safer to check for !MEMTX_OK rather than MEMTX_ERROR.

    Reviewed-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

(cherry picked from commit b9d383ab797f54ae5fa8746117770709921dc529)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/intc/arm_gicv3_redist.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
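
A short sketch of why the comparison changes. MemTxResult is a bit mask, and
a later patch in this series adds MEMTX_ACCESS_ERROR, so a result can carry
bits other than MEMTX_ERROR. The constants are copied from the memattrs.h
hunk in this series; failed_old() and failed_new() are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as they appear in include/exec/memattrs.h in this series. */
#define MEMTX_OK            0
#define MEMTX_ERROR         (1U << 0) /* device returned an error */
#define MEMTX_DECODE_ERROR  (1U << 1) /* nothing at that address */
#define MEMTX_ACCESS_ERROR  (1U << 2) /* access denied */
typedef uint32_t MemTxResult;

/* Old test: only fires when the result is exactly MEMTX_ERROR. */
static bool failed_old(MemTxResult r)
{
    return r == MEMTX_ERROR;
}

/* New test: fires for any non-zero combination of error bits. */
static bool failed_new(MemTxResult r)
{
    return r != MEMTX_OK;
}
```

A result of MEMTX_ACCESS_ERROR alone, or MEMTX_ERROR combined with another
bit, slips past the old equality test but is caught by the new one.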

diff --git a/hw/intc/arm_gicv3_redist.c b/hw/intc/arm_gicv3_redist.c
index c8ff3eca085..99b11ca5eee 100644
--- a/hw/intc/arm_gicv3_redist.c
+++ b/hw/intc/arm_gicv3_redist.c
@@ -462,7 +462,7 @@ MemTxResult gicv3_redist_read(void *opaque, hwaddr offset, uint64_t *data,
         break;
     }
 
-    if (r == MEMTX_ERROR) {
+    if (r != MEMTX_OK) {
         qemu_log_mask(LOG_GUEST_ERROR,
                       "%s: invalid guest read at offset " TARGET_FMT_plx
                       " size %u\n", __func__, offset, size);
@@ -521,7 +521,7 @@ MemTxResult gicv3_redist_write(void *opaque, hwaddr offset, uint64_t data,
         break;
     }
 
-    if (r == MEMTX_ERROR) {
+    if (r != MEMTX_OK) {
         qemu_log_mask(LOG_GUEST_ERROR,
                       "%s: invalid guest write at offset " TARGET_FMT_plx
                       " size %u\n", __func__, offset, size);
-- 
Gitee


From c211d073288be08708d947bae853c9f82a4bd8d8 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 13 Apr 2022 14:51:06 -0400
Subject: [PATCH 070/407] softmmu/physmem: Simplify flatview_write and
 address_space_access_valid
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 151: hw/intc/arm_gicv3: Check for !MEMTX_OK instead of MEMTX_ERROR
RH-Commit: [2/3] daabe41eefd5c519def592e374fa368e32a680d3 (jmaloy/qemu-kvm)
RH-Bugzilla: 1999236
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Peter Xu <peterx@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999236
Upstream: Merged
CVE: CVE-2021-3750

commit 58e74682baf4e1ad26b064d8c02e5bc99c75c5d9
Author: Philippe Mathieu-Daudé <philmd@redhat.com>
Date:   Wed Dec 15 19:24:20 2021 +0100

    softmmu/physmem: Simplify flatview_write and address_space_access_valid

    Remove unneeded local 'result' variables.

    Reviewed-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Alexander Bulekov <alxndr@bu.edu>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
    Message-Id: <20211215182421.418374-3-philmd@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

(cherry picked from commit 58e74682baf4e1ad26b064d8c02e5bc99c75c5d9)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 softmmu/physmem.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 3524c04c2a1..483a31be816 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2815,14 +2815,11 @@ static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
     hwaddr l;
     hwaddr addr1;
     MemoryRegion *mr;
-    MemTxResult result = MEMTX_OK;
 
     l = len;
     mr = flatview_translate(fv, addr, &addr1, &l, true, attrs);
-    result = flatview_write_continue(fv, addr, attrs, buf, len,
-                                     addr1, l, mr);
-
-    return result;
+    return flatview_write_continue(fv, addr, attrs, buf, len,
+                                   addr1, l, mr);
 }
 
 /* Called within RCU critical section.  */
@@ -3119,12 +3116,10 @@ bool address_space_access_valid(AddressSpace *as, hwaddr addr,
                                 MemTxAttrs attrs)
 {
     FlatView *fv;
-    bool result;
 
     RCU_READ_LOCK_GUARD();
     fv = address_space_to_flatview(as);
-    result = flatview_access_valid(fv, addr, len, is_write, attrs);
-    return result;
+    return flatview_access_valid(fv, addr, len, is_write, attrs);
 }
 
 static hwaddr
-- 
Gitee


From e736dcbd2681bf89a352f6f156ecd94b791ece9a Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 13 Apr 2022 14:51:06 -0400
Subject: [PATCH 071/407] softmmu/physmem: Introduce MemTxAttrs::memory field
 and MEMTX_ACCESS_ERROR
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 151: hw/intc/arm_gicv3: Check for !MEMTX_OK instead of MEMTX_ERROR
RH-Commit: [3/3] b1ebc1e99f21ba0b9eccb284e260b56c7a8e64d8 (jmaloy/qemu-kvm)
RH-Bugzilla: 1999236
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Peter Xu <peterx@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999236
Upstream: Merged
CVE: CVE-2021-3750
Conflicts: memalign.h has not been introduced in this version. Instead,
           we include osdep.h where the function prototypes are to be
           found.

commit 3ab6fdc91b72e156da22848f0003ff4225690ced
Author: Philippe Mathieu-Daudé <philmd@redhat.com>
Date:   Wed Dec 15 19:24:21 2021 +0100

    softmmu/physmem: Introduce MemTxAttrs::memory field and MEMTX_ACCESS_ERROR

    Add the 'memory' bit to the memory attributes to restrict bus
    controller accesses to memories.

    Introduce flatview_access_allowed() to check bus permission
    before running any bus transaction.

    Have read/write accessors return MEMTX_ACCESS_ERROR if an access is
    restricted.

    There is no change for the default case where 'memory' is not set.

    Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
    Message-Id: <20211215182421.418374-4-philmd@redhat.com>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    [thuth: Replaced MEMTX_BUS_ERROR with MEMTX_ACCESS_ERROR, remove "inline"]
    Signed-off-by: Thomas Huth <thuth@redhat.com>

(cherry picked from commit 3ab6fdc91b72e156da22848f0003ff4225690ced)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 include/exec/memattrs.h |  9 +++++++++
 softmmu/physmem.c       | 45 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
index 95f2d20d55b..9fb98bc1efd 100644
--- a/include/exec/memattrs.h
+++ b/include/exec/memattrs.h
@@ -35,6 +35,14 @@ typedef struct MemTxAttrs {
     unsigned int secure:1;
     /* Memory access is usermode (unprivileged) */
     unsigned int user:1;
+    /*
+     * Bus interconnect and peripherals can access anything (memories,
+     * devices) by default. By setting the 'memory' bit, bus transactions
+     * are restricted to "normal" memories (per the AMBA documentation)
+     * versus devices. Access to devices will be logged and rejected
+     * (see MEMTX_ACCESS_ERROR).
+     */
+    unsigned int memory:1;
     /* Requester ID (for MSI for example) */
     unsigned int requester_id:16;
     /* Invert endianness for this page */
@@ -66,6 +74,7 @@ typedef struct MemTxAttrs {
 #define MEMTX_OK 0
 #define MEMTX_ERROR             (1U << 0) /* device returned an error */
 #define MEMTX_DECODE_ERROR      (1U << 1) /* nothing at that address */
+#define MEMTX_ACCESS_ERROR      (1U << 2) /* access denied */
 typedef uint32_t MemTxResult;
 
 #endif
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 483a31be816..4d0ef5f92fe 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -41,6 +41,8 @@
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
 #include "qemu/qemu-print.h"
+#include "qemu/log.h"
+#include "qemu/osdep.h"
 #include "exec/memory.h"
 #include "exec/ioport.h"
 #include "sysemu/dma.h"
@@ -2759,6 +2761,33 @@ static bool prepare_mmio_access(MemoryRegion *mr)
     return release_lock;
 }
 
+/**
+ * flatview_access_allowed
+ * @mr: #MemoryRegion to be accessed
+ * @attrs: memory transaction attributes
+ * @addr: address within that memory region
+ * @len: the number of bytes to access
+ *
+ * Check if a memory transaction is allowed.
+ *
+ * Returns: true if transaction is allowed, false if denied.
+ */
+static bool flatview_access_allowed(MemoryRegion *mr, MemTxAttrs attrs,
+                                    hwaddr addr, hwaddr len)
+{
+    if (likely(!attrs.memory)) {
+        return true;
+    }
+    if (memory_region_is_ram(mr)) {
+        return true;
+    }
+    qemu_log_mask(LOG_GUEST_ERROR,
+                  "Invalid access to non-RAM device at "
+                  "addr 0x%" HWADDR_PRIX ", size %" HWADDR_PRIu ", "
+                  "region '%s'\n", addr, len, memory_region_name(mr));
+    return false;
+}
+
 /* Called within RCU critical section.  */
 static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
                                            MemTxAttrs attrs,
@@ -2773,7 +2802,10 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
     const uint8_t *buf = ptr;
 
     for (;;) {
-        if (!memory_access_is_direct(mr, true)) {
+        if (!flatview_access_allowed(mr, attrs, addr1, l)) {
+            result |= MEMTX_ACCESS_ERROR;
+            /* Keep going. */
+        } else if (!memory_access_is_direct(mr, true)) {
             release_lock |= prepare_mmio_access(mr);
             l = memory_access_size(mr, l, addr1);
             /* XXX: could force current_cpu to NULL to avoid
@@ -2818,6 +2850,9 @@ static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
 
     l = len;
     mr = flatview_translate(fv, addr, &addr1, &l, true, attrs);
+    if (!flatview_access_allowed(mr, attrs, addr, len)) {
+        return MEMTX_ACCESS_ERROR;
+    }
     return flatview_write_continue(fv, addr, attrs, buf, len,
                                    addr1, l, mr);
 }
@@ -2836,7 +2871,10 @@ MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr,
 
     fuzz_dma_read_cb(addr, len, mr);
     for (;;) {
-        if (!memory_access_is_direct(mr, false)) {
+        if (!flatview_access_allowed(mr, attrs, addr1, l)) {
+            result |= MEMTX_ACCESS_ERROR;
+            /* Keep going. */
+        } else if (!memory_access_is_direct(mr, false)) {
             /* I/O case */
             release_lock |= prepare_mmio_access(mr);
             l = memory_access_size(mr, l, addr1);
@@ -2879,6 +2917,9 @@ static MemTxResult flatview_read(FlatView *fv, hwaddr addr,
 
     l = len;
     mr = flatview_translate(fv, addr, &addr1, &l, false, attrs);
+    if (!flatview_access_allowed(mr, attrs, addr, len)) {
+        return MEMTX_ACCESS_ERROR;
+    }
     return flatview_read_continue(fv, addr, attrs, buf, len,
                                   addr1, l, mr);
 }
-- 
Gitee


From 05a6b969de3d76a276c2c6470ad97cd0cd120cfd Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 13 Apr 2022 20:54:45 -0400
Subject: [PATCH 072/407] display/qxl-render: fix race condition in qxl_cursor
 (CVE-2021-4207)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 152: display/qxl-render: fix race condition in qxl_cursor (CVE-2021-4207)
RH-Commit: [1/1] f05b9a956f2e0ca522b5be127beff813d04b5588 (jmaloy/qemu-kvm)
RH-Bugzilla: 2040738
RH-Acked-by: Gerd Hoffmann <kraxel@redhat.com>
RH-Acked-by: Mauro Matteo Cascella <None>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2040738
Upstream: Merged
CVE: CVE-2021-4207

commit 9569f5cb5b4bffa9d3ebc8ba7da1e03830a9a895
Author: Mauro Matteo Cascella <mcascell@redhat.com>
Date:   Thu Apr 7 10:11:06 2022 +0200

    display/qxl-render: fix race condition in qxl_cursor (CVE-2021-4207)

    Avoid fetching 'width' and 'height' a second time to prevent possible
    race condition. Refer to security advisory
    https://starlabs.sg/advisories/22-4207/ for more information.

    Fixes: CVE-2021-4207
    Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
    Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Message-Id: <20220407081106.343235-1-mcascell@redhat.com>
    Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

(cherry picked from commit 9569f5cb5b4bffa9d3ebc8ba7da1e03830a9a895)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/display/qxl-render.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
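
Note (illustrative, not part of the upstream change): the race described
above is a double fetch. The cursor header lives in memory the guest can
still modify, so reading 'width' and 'height' a second time may return
values that differ from the ones used earlier. The sketch below is a
minimal model of that idea. All names in it (shared_header, host_cursor,
snapshot_cursor, cursor_payload_size) are hypothetical and do not exist
in QEMU. The guest-controlled fields are copied once, and every later
size computation uses only that copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cursor header in guest-writable memory. */
struct shared_header {
    volatile uint16_t width;
    volatile uint16_t height;
};

/* Hypothetical host-side cursor: dimensions are private to the host. */
struct host_cursor {
    uint16_t width;
    uint16_t height;
};

/* Fetch the guest-controlled fields exactly once. */
static void snapshot_cursor(struct host_cursor *c,
                            const struct shared_header *h)
{
    assert(c && h);
    c->width = h->width;
    c->height = h->height;
}

/* Derive the payload size from the snapshot, never from shared memory. */
static size_t cursor_payload_size(const struct host_cursor *c)
{
    assert(c);
    return sizeof(uint32_t) * (size_t)c->width * (size_t)c->height;
}
```

With this shape, a writer that changes the shared header after
snapshot_cursor() returns cannot influence the size used by later copy
routines.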

diff --git a/hw/display/qxl-render.c b/hw/display/qxl-render.c
index d28849b1217..237ed293baa 100644
--- a/hw/display/qxl-render.c
+++ b/hw/display/qxl-render.c
@@ -266,7 +266,7 @@ static QEMUCursor *qxl_cursor(PCIQXLDevice *qxl, QXLCursor *cursor,
         }
         break;
     case SPICE_CURSOR_TYPE_ALPHA:
-        size = sizeof(uint32_t) * cursor->header.width * cursor->header.height;
+        size = sizeof(uint32_t) * c->width * c->height;
         qxl_unpack_chunks(c->data, size, qxl, &cursor->chunk, group_id);
         if (qxl->debug > 2) {
             cursor_print_ascii_art(c, "qxl/alpha");
-- 
Gitee


From a3cf30ad0a3c4985a4fc4354d59926efb369eb75 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Thu, 14 Apr 2022 10:38:26 -0400
Subject: [PATCH 073/407] vhost-vsock: detach the virqueue element in case of
 error

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 153: vhost-vsock: detach the virqueue element in case of error
RH-Commit: [1/1] 024dbc9073fddbe89a8ae8eb201f5bc674bffb64 (jmaloy/qemu-kvm)
RH-Bugzilla: 2063262
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2063262
Upstream: Merged
CVE: CVE-2022-26354

commit 8d1b247f3748ac4078524130c6d7ae42b6140aaf
Author: Stefano Garzarella <sgarzare@redhat.com>
Date:   Mon Feb 28 10:50:58 2022 +0100

    vhost-vsock: detach the virqueue element in case of error

    In vhost_vsock_common_send_transport_reset(), if an element popped from
    the virtqueue is invalid, we should call virtqueue_detach_element() to
    detach it from the virtqueue before freeing its memory.

    Fixes: fc0b9b0e1c ("vhost-vsock: add virtio sockets device")
    Fixes: CVE-2022-26354
    Cc: qemu-stable@nongnu.org
    Reported-by: VictorV <vv474172261@gmail.com>
    Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
    Message-Id: <20220228095058.27899-1-sgarzare@redhat.com>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit 8d1b247f3748ac4078524130c6d7ae42b6140aaf)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/virtio/vhost-vsock-common.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
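
Note (illustrative, not part of the upstream change): the fix splits the
single exit label into a success path and an error path, so that an
element that was popped but never pushed is detached before it is freed.
The sketch below is a toy model of that control flow, assuming a
simplified queue that only tracks an in-use counter. The names
(toy_queue, toy_elem, toy_pop, toy_push, toy_detach, toy_send_event) are
hypothetical; the real virtqueue functions do more than adjust a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy queue: counts elements that were popped but not yet returned. */
struct toy_queue {
    unsigned inuse;
};

struct toy_elem {
    unsigned out_num;
};

/* Pop an element; the queue now considers it in use. */
static struct toy_elem *toy_pop(struct toy_queue *q, unsigned out_num)
{
    struct toy_elem *e = malloc(sizeof(*e));

    if (!e) {
        return NULL;
    }
    e->out_num = out_num;
    q->inuse++;
    return e;
}

/* Return a completed element to the queue. */
static void toy_push(struct toy_queue *q, struct toy_elem *e)
{
    (void)e;
    assert(q->inuse > 0);
    q->inuse--;
}

/* Give back an element that was popped but will not be completed. */
static void toy_detach(struct toy_queue *q, struct toy_elem *e)
{
    (void)e;
    assert(q->inuse > 0);
    q->inuse--;
}

/* Mirrors the two-exit structure: push on success, detach on error. */
static bool toy_send_event(struct toy_queue *q, unsigned out_num)
{
    struct toy_elem *e = toy_pop(q, out_num);

    if (!e) {
        return false;
    }
    if (e->out_num) {
        goto err;       /* invalid element: must not be pushed */
    }

    toy_push(q, e);
    free(e);
    return true;

err:
    toy_detach(q, e);
    free(e);
    return false;
}
```

In this model, skipping toy_detach() on the error path would leave the
in-use counter permanently off by one, which is the kind of bookkeeping
drift the patch avoids.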

diff --git a/hw/virtio/vhost-vsock-common.c b/hw/virtio/vhost-vsock-common.c
index 3f3771274e7..ed706681ace 100644
--- a/hw/virtio/vhost-vsock-common.c
+++ b/hw/virtio/vhost-vsock-common.c
@@ -153,19 +153,23 @@ static void vhost_vsock_common_send_transport_reset(VHostVSockCommon *vvc)
     if (elem->out_num) {
         error_report("invalid vhost-vsock event virtqueue element with "
                      "out buffers");
-        goto out;
+        goto err;
     }
 
     if (iov_from_buf(elem->in_sg, elem->in_num, 0,
                      &event, sizeof(event)) != sizeof(event)) {
         error_report("vhost-vsock event virtqueue element is too short");
-        goto out;
+        goto err;
     }
 
     virtqueue_push(vq, elem, sizeof(event));
     virtio_notify(VIRTIO_DEVICE(vvc), vq);
 
-out:
+    g_free(elem);
+    return;
+
+err:
+    virtqueue_detach_element(vq, elem, 0);
     g_free(elem);
 }
 
-- 
Gitee


From efedecbece02af1f0d48128b2078df128404320f Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Tue, 5 Apr 2022 10:20:59 +0200
Subject: [PATCH 074/407] s390x/ipl: support extended kernel command line size

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 144: s390x/ipl: support extended kernel command line size
RH-Commit: [1/1] be227e50af5dbe7802605f873db29ac5358aa196
RH-Bugzilla: 2043830
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>

Bugzilla: http://bugzilla.redhat.com/2043830

commit b2173046a64beed76715f310f98538f159276af1
Author: Marc Hartmayer <mhartmay@linux.ibm.com>
Date:   Mon Nov 22 12:29:09 2021 +0100

    s390x/ipl: support extended kernel command line size

    In the past s390 used a fixed command line length of 896 bytes. This has changed
    with the Linux commit 5ecb2da660ab ("s390: support command lines longer than 896
    bytes"). There is now a parm area indicating the maximum command line size. This
    parm area has always been initialized to zero, so with older kernels this field
    would read zero and we must then assume that only 896 bytes are available.

    Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Acked-by: Viktor Mihajlovski <mihajlov@de.ibm.com>
    Message-Id: <20211122112909.18138-1-mhartmay@linux.ibm.com>
    [thuth: Cosmetic fixes, and use PRIu64 instead of %lu]
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 hw/s390x/ipl.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)
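
Note (illustrative, not part of the upstream change): the lookup rule
described above is "read a big-endian 64-bit size field; if it is zero
or unavailable, fall back to 896 bytes". The sketch below shows that
rule on a plain byte buffer. The helper names (load_be64,
max_cmdline_size) and the constant name are hypothetical, and the buffer
stands in for the kernel image area that the real code obtains through
rom_ptr().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEGACY_PARM_AREA_SIZE 0x380u    /* 896 bytes */

/* Assemble a 64-bit value from eight big-endian bytes. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;

    assert(p);
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Return the advertised maximum command line size, or the legacy
 * default when the field is missing (NULL) or reads as zero.
 */
static uint64_t max_cmdline_size(const uint8_t *size_field)
{
    if (size_field) {
        uint64_t size = load_be64(size_field);

        if (size) {
            return size;
        }
    }
    return LEGACY_PARM_AREA_SIZE;
}
```

An all-zero field therefore behaves exactly like an older kernel that
never filled it in, which keeps the previous 896-byte limit in effect.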

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index 7ddca0127fc..eb7fc4c4ae8 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -37,8 +37,9 @@
 
 #define KERN_IMAGE_START                0x010000UL
 #define LINUX_MAGIC_ADDR                0x010008UL
+#define KERN_PARM_AREA_SIZE_ADDR        0x010430UL
 #define KERN_PARM_AREA                  0x010480UL
-#define KERN_PARM_AREA_SIZE             0x000380UL
+#define LEGACY_KERN_PARM_AREA_SIZE      0x000380UL
 #define INITRD_START                    0x800000UL
 #define INITRD_PARM_START               0x010408UL
 #define PARMFILE_START                  0x001000UL
@@ -110,6 +111,21 @@ static uint64_t bios_translate_addr(void *opaque, uint64_t srcaddr)
     return srcaddr + dstaddr;
 }
 
+static uint64_t get_max_kernel_cmdline_size(void)
+{
+    uint64_t *size_ptr = rom_ptr(KERN_PARM_AREA_SIZE_ADDR, sizeof(*size_ptr));
+
+    if (size_ptr) {
+        uint64_t size;
+
+        size = be64_to_cpu(*size_ptr);
+        if (size) {
+            return size;
+        }
+    }
+    return LEGACY_KERN_PARM_AREA_SIZE;
+}
+
 static void s390_ipl_realize(DeviceState *dev, Error **errp)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
@@ -197,10 +213,13 @@ static void s390_ipl_realize(DeviceState *dev, Error **errp)
             ipl->start_addr = KERN_IMAGE_START;
             /* Overwrite parameters in the kernel image, which are "rom" */
             if (parm_area) {
-                if (cmdline_size > KERN_PARM_AREA_SIZE) {
+                uint64_t max_cmdline_size = get_max_kernel_cmdline_size();
+
+                if (cmdline_size > max_cmdline_size) {
                     error_setg(errp,
-                               "kernel command line exceeds maximum size: %zu > %lu",
-                               cmdline_size, KERN_PARM_AREA_SIZE);
+                               "kernel command line exceeds maximum size:"
+                               " %zu > %" PRIu64,
+                               cmdline_size, max_cmdline_size);
                     return;
                 }
 
-- 
Gitee


From 7aa61dfe7edce1a4ccdb54edf979d438ac53e6f7 Mon Sep 17 00:00:00 2001
From: Jason Wang <jasowang@redhat.com>
Date: Tue, 8 Mar 2022 10:42:51 +0800
Subject: [PATCH 075/407] virtio-net: fix map leaking on error during receive

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 154: virtio-net: fix map leaking on error during receive
RH-Commit: [1/1] 7178b0cd5ce7c89fe476f2e199c9212c8b89327a (jmaloy/qemu-kvm)
RH-Bugzilla: 2063206
RH-Acked-by: Jason Wang <None>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2063206
Upstream: Merged
CVE: CVE-2022-26353

commit abe300d9d894f7138e1af7c8e9c88c04bfe98b37
Author: Jason Wang <jasowang@redhat.com>
Date:   Tue Mar 8 10:42:51 2022 +0800

    virtio-net: fix map leaking on error during receive

    Commit bedd7e93d0196 ("virtio-net: fix use after unmap/free for sg")
    tried to fix the use after free of the sg by caching the virtqueue
    elements in an array and unmapping them all at once after receiving
    the packets. But it forgot to unmap the cached elements on error,
    which leaks the mappings and leads to other unexpected results.

    Fix this by detaching the cached elements on error. This addresses
    CVE-2022-26353.

    Reported-by: Victor Tom <vv474172261@gmail.com>
    Cc: qemu-stable@nongnu.org
    Fixes: CVE-2022-26353
    Fixes: bedd7e93d0196 ("virtio-net: fix use after unmap/free for sg")
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Jason Wang <jasowang@redhat.com>

(cherry picked from commit abe300d9d894f7138e1af7c8e9c88c04bfe98b37)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/net/virtio-net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f2014d5ea0b..e1f4748831e 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1862,6 +1862,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
 
 err:
     for (j = 0; j < i; j++) {
+        virtqueue_detach_element(q->rx_vq, elems[j], lens[j]);
         g_free(elems[j]);
     }
 
-- 
Gitee


From 1b6eefba66afd87e958ced870f5beb57fe1b8354 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Tue, 5 Apr 2022 15:46:50 +0200
Subject: [PATCH 076/407] qcow2: Improve refcount structure rebuilding

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 171: qcow2: Improve refcount structure rebuilding
RH-Commit: [1/4] 0bb78f7735a0730204670ae5ec2e040ad1d23942
RH-Bugzilla: 1519071
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Eric Blake <eblake@redhat.com>

When rebuilding the refcount structures (when qemu-img check -r found
errors with refcount = 0, but reference count > 0), the new refcount
table defaults to being put at the image file end[1].  There is no good
reason for that except that it means we will not have to rewrite any
refblocks we already wrote to disk.

Changing the code to rewrite those refblocks is not too difficult,
though, so let us do that.  That is beneficial for images on block
devices, where we cannot really write beyond the end of the image file.

Use this opportunity to add extensive comments to the code, and refactor
it a bit, getting rid of the backwards-jumping goto.

[1] Unless there is something allocated in the area pointed to by the
    last refblock, so we have to write that refblock.  In that case, we
    try to put the reftable in there.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1519071
Closes: https://gitlab.com/qemu-project/qemu/-/issues/941
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220405134652.19278-2-hreitz@redhat.com>
(cherry picked from commit a8c07ec287554dcefd33733f0e5888a281ddc95e)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block/qcow2-refcount.c | 332 +++++++++++++++++++++++++++++------------
 1 file changed, 235 insertions(+), 97 deletions(-)
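
Note (illustrative, not part of the upstream change): two pieces of
arithmetic drive the rebuild loop. A cluster index is mapped to the
refblock that covers it by a right shift, and the in-memory reftable is
grown in whole-cluster steps. The sketch below reproduces both
calculations in isolation. The function names are hypothetical, and the
example values assume 64 KiB clusters with 16-bit refcounts, so one
refblock covers 32768 clusters (a shift of 15) and one reftable cluster
holds 8192 eight-byte entries.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_SIZE 8u   /* bytes per reftable entry */

/* Round n up to the next multiple of d. */
static uint64_t round_up(uint64_t n, uint64_t d)
{
    assert(d > 0);
    return (n + d - 1) / d * d;
}

/* Index of the refblock that holds the refcount of this cluster. */
static uint64_t refblock_index(uint64_t cluster, unsigned refcount_block_bits)
{
    return cluster >> refcount_block_bits;
}

/*
 * Number of reftable entries needed so that refblock_idx is a valid
 * index, rounded up so the table occupies whole clusters.
 */
static uint64_t reftable_entries_for(uint64_t refblock_idx,
                                     uint64_t cluster_size)
{
    return round_up((refblock_idx + 1) * ENTRY_SIZE, cluster_size)
           / ENTRY_SIZE;
}
```

Because the table grows a full cluster at a time, each growth step adds
room for thousands of further refblocks under these assumptions, which
is the intuition behind the termination argument in the code comments.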

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 46145722527..555d8ba5aca 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2435,111 +2435,140 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
 }
 
 /*
- * Creates a new refcount structure based solely on the in-memory information
- * given through *refcount_table. All necessary allocations will be reflected
- * in that array.
+ * Helper function for rebuild_refcount_structure().
  *
- * On success, the old refcount structure is leaked (it will be covered by the
- * new refcount structure).
+ * Scan the range of clusters [first_cluster, end_cluster) for allocated
+ * clusters and write all corresponding refblocks to disk.  The refblock
+ * and allocation data is taken from the in-memory refcount table
+ * *refcount_table[] (of size *nb_clusters), which is basically one big
+ * (unlimited size) refblock for the whole image.
+ *
+ * For these refblocks, clusters are allocated using said in-memory
+ * refcount table.  Care is taken that these allocations are reflected
+ * in the refblocks written to disk.
+ *
+ * The refblocks' offsets are written into a reftable, which is
+ * *on_disk_reftable_ptr[] (of size *on_disk_reftable_entries_ptr).  If
+ * that reftable is of insufficient size, it will be resized to fit.
+ * This reftable is not written to disk.
+ *
+ * (If *on_disk_reftable_ptr is not NULL, the entries within are assumed
+ * to point to existing valid refblocks that do not need to be allocated
+ * again.)
+ *
+ * Return whether the on-disk reftable array was resized (true/false),
+ * or -errno on error.
  */
-static int rebuild_refcount_structure(BlockDriverState *bs,
-                                      BdrvCheckResult *res,
-                                      void **refcount_table,
-                                      int64_t *nb_clusters)
+static int rebuild_refcounts_write_refblocks(
+        BlockDriverState *bs, void **refcount_table, int64_t *nb_clusters,
+        int64_t first_cluster, int64_t end_cluster,
+        uint64_t **on_disk_reftable_ptr, uint32_t *on_disk_reftable_entries_ptr
+    )
 {
     BDRVQcow2State *s = bs->opaque;
-    int64_t first_free_cluster = 0, reftable_offset = -1, cluster = 0;
+    int64_t cluster;
     int64_t refblock_offset, refblock_start, refblock_index;
-    uint32_t reftable_size = 0;
-    uint64_t *on_disk_reftable = NULL;
+    int64_t first_free_cluster = 0;
+    uint64_t *on_disk_reftable = *on_disk_reftable_ptr;
+    uint32_t on_disk_reftable_entries = *on_disk_reftable_entries_ptr;
     void *on_disk_refblock;
-    int ret = 0;
-    struct {
-        uint64_t reftable_offset;
-        uint32_t reftable_clusters;
-    } QEMU_PACKED reftable_offset_and_clusters;
-
-    qcow2_cache_empty(bs, s->refcount_block_cache);
+    bool reftable_grown = false;
+    int ret;
 
-write_refblocks:
-    for (; cluster < *nb_clusters; cluster++) {
+    for (cluster = first_cluster; cluster < end_cluster; cluster++) {
+        /* Check all clusters to find refblocks that contain non-zero entries */
         if (!s->get_refcount(*refcount_table, cluster)) {
             continue;
         }
 
+        /*
+         * This cluster is allocated, so we need to create a refblock
+         * for it.  The data we will write to disk is just the
+         * respective slice from *refcount_table, so it will contain
+         * accurate refcounts for all clusters belonging to this
+         * refblock.  After we have written it, we will therefore skip
+         * all remaining clusters in this refblock.
+         */
+
         refblock_index = cluster >> s->refcount_block_bits;
         refblock_start = refblock_index << s->refcount_block_bits;
 
-        /* Don't allocate a cluster in a refblock already written to disk */
-        if (first_free_cluster < refblock_start) {
-            first_free_cluster = refblock_start;
-        }
-        refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
-                                              nb_clusters, &first_free_cluster);
-        if (refblock_offset < 0) {
-            fprintf(stderr, "ERROR allocating refblock: %s\n",
-                    strerror(-refblock_offset));
-            res->check_errors++;
-            ret = refblock_offset;
-            goto fail;
-        }
+        if (on_disk_reftable_entries > refblock_index &&
+            on_disk_reftable[refblock_index])
+        {
+            /*
+             * We can get here after a `goto write_refblocks`: We have a
+             * reftable from a previous run, and the refblock is already
+             * allocated.  No need to allocate it again.
+             */
+            refblock_offset = on_disk_reftable[refblock_index];
+        } else {
+            int64_t refblock_cluster_index;
 
-        if (reftable_size <= refblock_index) {
-            uint32_t old_reftable_size = reftable_size;
-            uint64_t *new_on_disk_reftable;
+            /* Don't allocate a cluster in a refblock already written to disk */
+            if (first_free_cluster < refblock_start) {
+                first_free_cluster = refblock_start;
+            }
+            refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
+                                                  nb_clusters,
+                                                  &first_free_cluster);
+            if (refblock_offset < 0) {
+                fprintf(stderr, "ERROR allocating refblock: %s\n",
+                        strerror(-refblock_offset));
+                return refblock_offset;
+            }
 
-            reftable_size = ROUND_UP((refblock_index + 1) * REFTABLE_ENTRY_SIZE,
-                                     s->cluster_size) / REFTABLE_ENTRY_SIZE;
-            new_on_disk_reftable = g_try_realloc(on_disk_reftable,
-                                                 reftable_size *
-                                                 REFTABLE_ENTRY_SIZE);
-            if (!new_on_disk_reftable) {
-                res->check_errors++;
-                ret = -ENOMEM;
-                goto fail;
+            refblock_cluster_index = refblock_offset / s->cluster_size;
+            if (refblock_cluster_index >= end_cluster) {
+                /*
+                 * We must write the refblock that holds this refblock's
+                 * refcount
+                 */
+                end_cluster = refblock_cluster_index + 1;
             }
-            on_disk_reftable = new_on_disk_reftable;
 
-            memset(on_disk_reftable + old_reftable_size, 0,
-                   (reftable_size - old_reftable_size) * REFTABLE_ENTRY_SIZE);
+            if (on_disk_reftable_entries <= refblock_index) {
+                on_disk_reftable_entries =
+                    ROUND_UP((refblock_index + 1) * REFTABLE_ENTRY_SIZE,
+                             s->cluster_size) / REFTABLE_ENTRY_SIZE;
+                on_disk_reftable =
+                    g_try_realloc(on_disk_reftable,
+                                  on_disk_reftable_entries *
+                                  REFTABLE_ENTRY_SIZE);
+                if (!on_disk_reftable) {
+                    return -ENOMEM;
+                }
 
-            /* The offset we have for the reftable is now no longer valid;
-             * this will leak that range, but we can easily fix that by running
-             * a leak-fixing check after this rebuild operation */
-            reftable_offset = -1;
-        } else {
-            assert(on_disk_reftable);
-        }
-        on_disk_reftable[refblock_index] = refblock_offset;
+                memset(on_disk_reftable + *on_disk_reftable_entries_ptr, 0,
+                       (on_disk_reftable_entries -
+                        *on_disk_reftable_entries_ptr) *
+                       REFTABLE_ENTRY_SIZE);
 
-        /* If this is apparently the last refblock (for now), try to squeeze the
-         * reftable in */
-        if (refblock_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
-            reftable_offset < 0)
-        {
-            uint64_t reftable_clusters = size_to_clusters(s, reftable_size *
-                                                          REFTABLE_ENTRY_SIZE);
-            reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
-                                                  refcount_table, nb_clusters,
-                                                  &first_free_cluster);
-            if (reftable_offset < 0) {
-                fprintf(stderr, "ERROR allocating reftable: %s\n",
-                        strerror(-reftable_offset));
-                res->check_errors++;
-                ret = reftable_offset;
-                goto fail;
+                *on_disk_reftable_ptr = on_disk_reftable;
+                *on_disk_reftable_entries_ptr = on_disk_reftable_entries;
+
+                reftable_grown = true;
+            } else {
+                assert(on_disk_reftable);
             }
+            on_disk_reftable[refblock_index] = refblock_offset;
         }
 
+        /* Refblock is allocated, write it to disk */
+
         ret = qcow2_pre_write_overlap_check(bs, 0, refblock_offset,
                                             s->cluster_size, false);
         if (ret < 0) {
             fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
-            goto fail;
+            return ret;
         }
 
-        /* The size of *refcount_table is always cluster-aligned, therefore the
-         * write operation will not overflow */
+        /*
+         * The refblock is simply a slice of *refcount_table.
+         * Note that the size of *refcount_table is always aligned to
+         * whole clusters, so the write operation will not result in
+         * out-of-bounds accesses.
+         */
         on_disk_refblock = (void *)((char *) *refcount_table +
                                     refblock_index * s->cluster_size);
 
@@ -2547,23 +2576,99 @@ write_refblocks:
                           s->cluster_size);
         if (ret < 0) {
             fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
-            goto fail;
+            return ret;
         }
 
-        /* Go to the end of this refblock */
+        /* This refblock is done, skip to its end */
         cluster = refblock_start + s->refcount_block_size - 1;
     }
 
-    if (reftable_offset < 0) {
-        uint64_t post_refblock_start, reftable_clusters;
+    return reftable_grown;
+}
+
+/*
+ * Creates a new refcount structure based solely on the in-memory information
+ * given through *refcount_table (this in-memory information is basically just
+ * the concatenation of all refblocks).  All necessary allocations will be
+ * reflected in that array.
+ *
+ * On success, the old refcount structure is leaked (it will be covered by the
+ * new refcount structure).
+ */
+static int rebuild_refcount_structure(BlockDriverState *bs,
+                                      BdrvCheckResult *res,
+                                      void **refcount_table,
+                                      int64_t *nb_clusters)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int64_t reftable_offset = -1;
+    int64_t reftable_length = 0;
+    int64_t reftable_clusters;
+    int64_t refblock_index;
+    uint32_t on_disk_reftable_entries = 0;
+    uint64_t *on_disk_reftable = NULL;
+    int ret = 0;
+    int reftable_size_changed = 0;
+    struct {
+        uint64_t reftable_offset;
+        uint32_t reftable_clusters;
+    } QEMU_PACKED reftable_offset_and_clusters;
+
+    qcow2_cache_empty(bs, s->refcount_block_cache);
+
+    /*
+     * For each refblock containing entries, we try to allocate a
+     * cluster (in the in-memory refcount table) and write its offset
+     * into on_disk_reftable[].  We then write the whole refblock to
+     * disk (as a slice of the in-memory refcount table).
+     * This is done by rebuild_refcounts_write_refblocks().
+     *
+     * Once we have scanned all clusters, we try to find space for the
+     * reftable.  This will dirty the in-memory refcount table (i.e.
+     * make it differ from the refblocks we have already written), so we
+     * need to run rebuild_refcounts_write_refblocks() again for the
+     * range of clusters where the reftable has been allocated.
+     *
+     * This second run might make the reftable grow again, in which case
+     * we will need to allocate another space for it, which is why we
+     * repeat all this until the reftable stops growing.
+     *
+     * (This loop will terminate, because with every cluster the
+     * reftable grows, it can accommodate a multitude of more refcounts,
+     * so that at some point this must be able to cover the reftable
+     * and all refblocks describing it.)
+     *
+     * We then convert the reftable to big-endian and write it to disk.
+     *
+     * Note that we never free any reftable allocations.  Doing so would
+     * needlessly complicate the algorithm: The eventual second check
+     * run we do will clean up all leaks we have caused.
+     */
+
+    reftable_size_changed =
+        rebuild_refcounts_write_refblocks(bs, refcount_table, nb_clusters,
+                                          0, *nb_clusters,
+                                          &on_disk_reftable,
+                                          &on_disk_reftable_entries);
+    if (reftable_size_changed < 0) {
+        res->check_errors++;
+        ret = reftable_size_changed;
+        goto fail;
+    }
+
+    /*
+     * There was no reftable before, so rebuild_refcounts_write_refblocks()
+     * must have increased its size (from 0 to something).
+     */
+    assert(reftable_size_changed);
+
+    do {
+        int64_t reftable_start_cluster, reftable_end_cluster;
+        int64_t first_free_cluster = 0;
+
+        reftable_length = on_disk_reftable_entries * REFTABLE_ENTRY_SIZE;
+        reftable_clusters = size_to_clusters(s, reftable_length);
 
-        post_refblock_start = ROUND_UP(*nb_clusters, s->refcount_block_size);
-        reftable_clusters =
-            size_to_clusters(s, reftable_size * REFTABLE_ENTRY_SIZE);
-        /* Not pretty but simple */
-        if (first_free_cluster < post_refblock_start) {
-            first_free_cluster = post_refblock_start;
-        }
         reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
                                               refcount_table, nb_clusters,
                                               &first_free_cluster);
@@ -2575,24 +2680,55 @@ write_refblocks:
             goto fail;
         }
 
-        goto write_refblocks;
-    }
+        /*
+         * We need to update the affected refblocks, so re-run the
+         * write_refblocks loop for the reftable's range of clusters.
+         */
+        assert(offset_into_cluster(s, reftable_offset) == 0);
+        reftable_start_cluster = reftable_offset / s->cluster_size;
+        reftable_end_cluster = reftable_start_cluster + reftable_clusters;
+        reftable_size_changed =
+            rebuild_refcounts_write_refblocks(bs, refcount_table, nb_clusters,
+                                              reftable_start_cluster,
+                                              reftable_end_cluster,
+                                              &on_disk_reftable,
+                                              &on_disk_reftable_entries);
+        if (reftable_size_changed < 0) {
+            res->check_errors++;
+            ret = reftable_size_changed;
+            goto fail;
+        }
+
+        /*
+         * If the reftable size has changed, we will need to find a new
+         * allocation, repeating the loop.
+         */
+    } while (reftable_size_changed);
 
-    for (refblock_index = 0; refblock_index < reftable_size; refblock_index++) {
+    /* The above loop must have run at least once */
+    assert(reftable_offset >= 0);
+
+    /*
+     * All allocations are done, all refblocks are written, convert the
+     * reftable to big-endian and write it to disk.
+     */
+
+    for (refblock_index = 0; refblock_index < on_disk_reftable_entries;
+         refblock_index++)
+    {
         cpu_to_be64s(&on_disk_reftable[refblock_index]);
     }
 
-    ret = qcow2_pre_write_overlap_check(bs, 0, reftable_offset,
-                                        reftable_size * REFTABLE_ENTRY_SIZE,
+    ret = qcow2_pre_write_overlap_check(bs, 0, reftable_offset, reftable_length,
                                         false);
     if (ret < 0) {
         fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
         goto fail;
     }
 
-    assert(reftable_size < INT_MAX / REFTABLE_ENTRY_SIZE);
+    assert(reftable_length < INT_MAX);
     ret = bdrv_pwrite(bs->file, reftable_offset, on_disk_reftable,
-                      reftable_size * REFTABLE_ENTRY_SIZE);
+                      reftable_length);
     if (ret < 0) {
         fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
         goto fail;
@@ -2601,7 +2737,7 @@ write_refblocks:
     /* Enter new reftable into the image header */
     reftable_offset_and_clusters.reftable_offset = cpu_to_be64(reftable_offset);
     reftable_offset_and_clusters.reftable_clusters =
-        cpu_to_be32(size_to_clusters(s, reftable_size * REFTABLE_ENTRY_SIZE));
+        cpu_to_be32(reftable_clusters);
     ret = bdrv_pwrite_sync(bs->file,
                            offsetof(QCowHeader, refcount_table_offset),
                            &reftable_offset_and_clusters,
@@ -2611,12 +2747,14 @@ write_refblocks:
         goto fail;
     }
 
-    for (refblock_index = 0; refblock_index < reftable_size; refblock_index++) {
+    for (refblock_index = 0; refblock_index < on_disk_reftable_entries;
+         refblock_index++)
+    {
         be64_to_cpus(&on_disk_reftable[refblock_index]);
     }
     s->refcount_table = on_disk_reftable;
     s->refcount_table_offset = reftable_offset;
-    s->refcount_table_size = reftable_size;
+    s->refcount_table_size = on_disk_reftable_entries;
     update_max_refcount_table_index(s);
 
     return 0;
-- 
Gitee


From 44fafe10af843c930a42b0704a23dfd78b03e677 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Tue, 5 Apr 2022 15:46:51 +0200
Subject: [PATCH 077/407] iotests/108: Test new refcount rebuild algorithm

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 171: qcow2: Improve refcount structure rebuilding
RH-Commit: [2/4] 2aa8c383f0c88c414f10ade8bd2e8af07c35f35b
RH-Bugzilla: 1519071
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Eric Blake <eblake@redhat.com>

One clear problem with qcow2's refcount structure rebuild algorithm
before "qcow2: Improve refcount structure rebuilding" was that it was
prone to failure for qcow2 images on block devices: There is generally
unused space after the actual image, and if that exceeds what one
refblock covers, the old algorithm would invariably write the reftable
past the block device's end, which cannot work.  The new algorithm does
not have this problem.

Test it with three tests:
(1) Create an image with more empty space at the end than what one
    refblock covers, see whether rebuilding the refcount structures
    results in a change in the image file length.  (It should not.)

(2) Leave precisely enough space somewhere at the beginning of the image
    for the new reftable (and the refblock for that place), see whether
    the new algorithm puts the reftable there.  (It should.)

(3) Test the original problem: Create (something like) a block device
    with a fixed size, then create a qcow2 image in there, write some
    data, and then have qemu-img check rebuild the refcount structures.
    Before HEAD^, the reftable would have been written past the image
    file end, i.e. outside of what the block device provides, which
    cannot work.  HEAD^ should have fixed that.
    ("Something like a block device" means a loop device if we can use
    one ("sudo -n losetup" works), or a FUSE block export with
    growable=false otherwise.)

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220405134652.19278-3-hreitz@redhat.com>
(cherry picked from commit 9ffd6d646d1d5ee9087a8cbf0b7d2f96c5656162)

Conflicts:
- 108: The downstream qemu-storage-daemon does not support --daemonize,
  so this switch has been replaced by a loop waiting for the PID file to
  appear
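
  The waiting loop amounts to polling for the file.  A minimal sketch of
  that idea follows; unlike the loop in the test, it also gives up after
  a number of polls.  The function name and the poll limit are
  hypothetical and only serve the illustration.

```bash
# Wait until a file exists, polling every 0.1 s.
# $1: path to wait for; $2: maximum number of polls (hypothetical knob)
# Returns 0 once the file exists, 1 if it never appeared.
wait_for_file() {
    path=$1
    tries=$2
    while [ ! -f "$path" ]; do
        [ "$tries" -gt 0 ] || return 1
        tries=$((tries - 1))
        sleep 0.1
    done
    return 0
}

# Demo: a background job stands in for the storage daemon and creates
# the "PID file" after a short delay.
demo_dir=$(mktemp -d)
(sleep 0.3; touch "$demo_dir/demo.pid") &
if wait_for_file "$demo_dir/demo.pid" 50; then
    echo 'PID file appeared'
else
    echo 'timed out'
fi
wait
rm -rf "$demo_dir"
```

  The limit only guards against hanging forever if the process dies
  before it writes its PID file.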

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tests/qemu-iotests/108     | 263 ++++++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/108.out |  81 ++++++++++++
 2 files changed, 343 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/108 b/tests/qemu-iotests/108
index 8eaef0b8bf2..23abbeaff03 100755
--- a/tests/qemu-iotests/108
+++ b/tests/qemu-iotests/108
@@ -30,13 +30,20 @@ status=1	# failure is the default!
 
 _cleanup()
 {
-	_cleanup_test_img
+    _cleanup_test_img
+    if [ -f "$TEST_DIR/qsd.pid" ]; then
+        qsd_pid=$(cat "$TEST_DIR/qsd.pid")
+        kill -KILL "$qsd_pid"
+        fusermount -u "$TEST_DIR/fuse-export" &>/dev/null
+    fi
+    rm -f "$TEST_DIR/fuse-export"
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
 # get standard environment, filters and checks
 . ./common.rc
 . ./common.filter
+. ./common.qemu
 
 # This tests qcow2-specific low-level functionality
 _supported_fmt qcow2
@@ -47,6 +54,22 @@ _supported_os Linux
 # files
 _unsupported_imgopts 'refcount_bits=\([^1]\|.\([^6]\|$\)\)' data_file
 
+# This test either needs sudo -n losetup or FUSE exports to work
+if sudo -n losetup &>/dev/null; then
+    loopdev=true
+else
+    loopdev=false
+
+    # QSD --export fuse will either yield "Parameter 'id' is missing"
+    # or "Invalid parameter 'fuse'", depending on whether there is
+    # FUSE support or not.
+    error=$($QSD --export fuse 2>&1)
+    if [[ $error = *"'fuse'"* ]]; then
+        _notrun 'Passwordless sudo for losetup or FUSE support required, but' \
+                'neither is available'
+    fi
+fi
+
 echo
 echo '=== Repairing an image without any refcount table ==='
 echo
@@ -138,6 +161,244 @@ _make_test_img 64M
 poke_file "$TEST_IMG" $((0x10008)) "\xff\xff\xff\xff\xff\xff\x00\x00"
 _check_test_img -r all
 
+echo
+echo '=== Check rebuilt reftable location ==='
+
+# In an earlier version of the refcount rebuild algorithm, the
+# reftable was generally placed at the image end (unless something was
+# allocated in the area covered by the refblock right before the image
+# file end, then we would try to place the reftable in that refblock).
+# This was later changed so the reftable would be placed in the
+# earliest possible location.  Test this.
+
+echo
+echo '--- Does the image size increase? ---'
+echo
+
+# First test: Just create some image, write some data to it, and
+# resize it so there is free space at the end of the image (enough
+# that it spans at least one full refblock, which for cluster_size=512
+# images, spans 128k).  With the old algorithm, the reftable would
+# have then been placed at the end of the image file, but with the new
+# one, it will be put in that free space.
+# We want to check whether the size of the image file increases due to
+# rebuilding the refcount structures (it should not).
+
+_make_test_img -o 'cluster_size=512' 1M
+# Write something
+$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
+
+# Add free space
+file_len=$(stat -c '%s' "$TEST_IMG")
+truncate -s $((file_len + 256 * 1024)) "$TEST_IMG"
+
+# Corrupt the image by saying the image header was not allocated
+rt_offset=$(peek_file_be "$TEST_IMG" 48 8)
+rb_offset=$(peek_file_be "$TEST_IMG" $rt_offset 8)
+poke_file "$TEST_IMG" $rb_offset "\x00\x00"
+
+# Check whether rebuilding the refcount structures increases the image
+# file size
+file_len=$(stat -c '%s' "$TEST_IMG")
+echo
+# The only leaks there can be are the old refcount structures that are
+# leaked during rebuilding, no need to clutter the output with them
+_check_test_img -r all | grep -v '^Repairing cluster.*refcount=1 reference=0'
+echo
+post_repair_file_len=$(stat -c '%s' "$TEST_IMG")
+
+if [[ $file_len -eq $post_repair_file_len ]]; then
+    echo 'OK: Image size did not change'
+else
+    echo 'ERROR: Image size differs' \
+        "($file_len before, $post_repair_file_len after)"
+fi
+
+echo
+echo '--- Will the reftable occupy a hole specifically left for it?  ---'
+echo
+
+# Note: With cluster_size=512, every refblock covers 128k.
+# The reftable covers 8M per reftable cluster.
+
+# Create an image that requires two reftable clusters (just because
+# this is more interesting than a single-clustered reftable).
+_make_test_img -o 'cluster_size=512' 9M
+$QEMU_IO -c 'write 0 8M' "$TEST_IMG" | _filter_qemu_io
+
+# Writing 8M will have resized the reftable.  Unfortunately, doing so
+# will leave holes in the file, so we need to fill them up so we can
+# be sure the whole file is allocated.  Do that by writing
+# consecutively smaller chunks starting from 8 MB, until the file
+# length increases even with a chunk size of 512.  Then we must have
+# filled all holes.
+ofs=$((8 * 1024 * 1024))
+block_len=$((16 * 1024))
+while [[ $block_len -ge 512 ]]; do
+    file_len=$(stat -c '%s' "$TEST_IMG")
+    while [[ $(stat -c '%s' "$TEST_IMG") -eq $file_len ]]; do
+        # Do not include this in the reference output, it does not
+        # really matter which qemu-io calls we do here exactly
+        $QEMU_IO -c "write $ofs $block_len" "$TEST_IMG" >/dev/null
+        ofs=$((ofs + block_len))
+    done
+    block_len=$((block_len / 2))
+done
+
+# Fill up to 9M (do not include this in the reference output either,
+# $ofs is random for all we know)
+$QEMU_IO -c "write $ofs $((9 * 1024 * 1024 - ofs))" "$TEST_IMG" >/dev/null
+
+# Make space as follows:
+# - For the first refblock: Right at the beginning of the image (this
+#   refblock is placed in the first place possible),
+# - For the reftable somewhere soon afterwards, still near the
+#   beginning of the image (i.e. covered by the first refblock); the
+#   reftable too is placed in the first place possible, but only after
+#   all refblocks have been placed)
+# No space is needed for the other refblocks, because no refblock is
+# put before the space it covers.  In this test case, we do not mind
+# if they are placed at the image file's end.
+
+# Before we make that space, we have to find out the host offset of
+# the area that belonged to the two data clusters at guest offset 4k,
+# because we expect the reftable to be placed there, and we will have
+# to verify that it is.
+
+l1_offset=$(peek_file_be "$TEST_IMG" 40 8)
+l2_offset=$(peek_file_be "$TEST_IMG" $l1_offset 8)
+l2_offset=$((l2_offset & 0x00fffffffffffe00))
+data_4k_offset=$(peek_file_be "$TEST_IMG" \
+                 $((l2_offset + 4096 / 512 * 8)) 8)
+data_4k_offset=$((data_4k_offset & 0x00fffffffffffe00))
+
+$QEMU_IO -c "discard 0 512" -c "discard 4k 1k" "$TEST_IMG" | _filter_qemu_io
+
+# Corrupt the image by saying the image header was not allocated
+rt_offset=$(peek_file_be "$TEST_IMG" 48 8)
+rb_offset=$(peek_file_be "$TEST_IMG" $rt_offset 8)
+poke_file "$TEST_IMG" $rb_offset "\x00\x00"
+
+echo
+# The only leaks there can be are the old refcount structures that are
+# leaked during rebuilding, no need to clutter the output with them
+_check_test_img -r all | grep -v '^Repairing cluster.*refcount=1 reference=0'
+echo
+
+# Check whether the reftable was put where we expected
+rt_offset=$(peek_file_be "$TEST_IMG" 48 8)
+if [[ $rt_offset -eq $data_4k_offset ]]; then
+    echo 'OK: Reftable is where we expect it'
+else
+    echo "ERROR: Reftable is at $rt_offset, but was expected at $data_4k_offset"
+fi
+
+echo
+echo '--- Rebuilding refcount structures on block devices ---'
+echo
+
+# A block device cannot really grow, at least not during qemu-img
+# check.  As mentioned in the above cases, rebuilding the refcount
+# structure may lead to new refcount structures being written after
+# the end of the image, and in the past that happened even if there
+# was more than sufficient space in the image.  Such post-EOF writes
+# will not work on block devices, so test that the new algorithm
+# avoids it.
+
+# If we have passwordless sudo and losetup, we can use those to create
+# a block device.  Otherwise, we can resort to qemu's FUSE export to
+# create a file that isn't growable, which effectively tests the same
+# thing.
+
+_cleanup_test_img
+truncate -s $((64 * 1024 * 1024)) "$TEST_IMG"
+
+if $loopdev; then
+    export_mp=$(sudo -n losetup --show -f "$TEST_IMG")
+    export_mp_driver=host_device
+    sudo -n chmod go+rw "$export_mp"
+else
+    # Create non-growable FUSE export that is a bit like an empty
+    # block device
+    export_mp="$TEST_DIR/fuse-export"
+    export_mp_driver=file
+    touch "$export_mp"
+
+    $QSD \
+        --blockdev file,node-name=export-node,filename="$TEST_IMG" \
+        --export fuse,id=fuse-export,node-name=export-node,mountpoint="$export_mp",writable=on,growable=off \
+        --pidfile "$TEST_DIR/qsd.pid" \
+        &
+
+    while [ ! -f "$TEST_DIR/qsd.pid" ]; do
+        sleep 0.1
+    done
+fi
+
+# Now create a qcow2 image on the device -- unfortunately, qemu-img
+# create force-creates the file, so we have to resort to the
+# blockdev-create job.
+_launch_qemu \
+    --blockdev $export_mp_driver,node-name=file,filename="$export_mp"
+
+_send_qemu_cmd \
+    $QEMU_HANDLE \
+    '{ "execute": "qmp_capabilities" }' \
+    'return'
+
+# Small cluster size again, so the image needs multiple refblocks
+_send_qemu_cmd \
+    $QEMU_HANDLE \
+    '{ "execute": "blockdev-create",
+       "arguments": {
+           "job-id": "create",
+           "options": {
+               "driver": "qcow2",
+               "file": "file",
+               "size": '$((64 * 1024 * 1024))',
+               "cluster-size": 512
+           } } }' \
+    '"concluded"'
+
+_send_qemu_cmd \
+    $QEMU_HANDLE \
+    '{ "execute": "job-dismiss", "arguments": { "id": "create" } }' \
+    'return'
+
+_send_qemu_cmd \
+    $QEMU_HANDLE \
+    '{ "execute": "quit" }' \
+    'return'
+
+wait=y _cleanup_qemu
+echo
+
+# Write some data
+$QEMU_IO -c 'write 0 64k' "$export_mp" | _filter_qemu_io
+
+# Corrupt the image by saying the image header was not allocated
+rt_offset=$(peek_file_be "$export_mp" 48 8)
+rb_offset=$(peek_file_be "$export_mp" $rt_offset 8)
+poke_file "$export_mp" $rb_offset "\x00\x00"
+
+# Repairing such a simple case should just work
+# (We used to put the reftable at the end of the image file, which can
+# never work for non-growable devices.)
+echo
+TEST_IMG="$export_mp" _check_test_img -r all \
+    | grep -v '^Repairing cluster.*refcount=1 reference=0'
+
+if $loopdev; then
+    sudo -n losetup -d "$export_mp"
+else
+    qsd_pid=$(cat "$TEST_DIR/qsd.pid")
+    kill -TERM "$qsd_pid"
+    # Wait for process to exit (cannot `wait` because the QSD is daemonized)
+    while [ -f "$TEST_DIR/qsd.pid" ]; do
+        true
+    done
+fi
+
 # success, all done
 echo '*** done'
 rm -f $seq.full
diff --git a/tests/qemu-iotests/108.out b/tests/qemu-iotests/108.out
index 75bab8dc84a..b5401d788dc 100644
--- a/tests/qemu-iotests/108.out
+++ b/tests/qemu-iotests/108.out
@@ -105,6 +105,87 @@ The following inconsistencies were found and repaired:
     0 leaked clusters
     1 corruptions
 
+Double checking the fixed image now...
+No errors were found on the image.
+
+=== Check rebuilt reftable location ===
+
+--- Does the image size increase? ---
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+ERROR cluster 0 refcount=0 reference=1
+Rebuilding refcount structure
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+
+OK: Image size did not change
+
+--- Will the reftable occupy a hole specifically left for it?  ---
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=9437184
+wrote 8388608/8388608 bytes at offset 0
+8 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+discard 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+discard 1024/1024 bytes at offset 4096
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+ERROR cluster 0 refcount=0 reference=1
+Rebuilding refcount structure
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+
+OK: Reftable is where we expect it
+
+--- Rebuilding refcount structures on block devices ---
+
+{ "execute": "qmp_capabilities" }
+{"return": {}}
+{ "execute": "blockdev-create",
+       "arguments": {
+           "job-id": "create",
+           "options": {
+               "driver": "IMGFMT",
+               "file": "file",
+               "size": 67108864,
+               "cluster-size": 512
+           } } }
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "create"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "create"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "create"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "create"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "create"}}
+{ "execute": "job-dismiss", "arguments": { "id": "create" } }
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "create"}}
+{"return": {}}
+{ "execute": "quit" }
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+ERROR cluster 0 refcount=0 reference=1
+Rebuilding refcount structure
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    1 corruptions
+
 Double checking the fixed image now...
 No errors were found on the image.
 *** done
-- 
Gitee


From 18d6aaca81bd806a5f9b0484d06792de9c26bb59 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Tue, 5 Apr 2022 15:46:52 +0200
Subject: [PATCH 078/407] qcow2: Add errp to rebuild_refcount_structure()

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 171: qcow2: Improve refcount structure rebuilding
RH-Commit: [3/4] 9dddd1d21383c4cbd528e5a0d42b0c2a7d87c8f6
RH-Bugzilla: 1519071
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Eric Blake <eblake@redhat.com>

Instead of fprintf()-ing error messages in rebuild_refcount_structure()
and its rebuild_refcounts_write_refblocks() helper, pass them through an
Error object to qcow2_check_refcounts() (which will then print it).

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220405134652.19278-4-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 0423f75351ab83b844a31349218b0eadd830e07a)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block/qcow2-refcount.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 555d8ba5aca..09f8ef49272 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2462,7 +2462,8 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
 static int rebuild_refcounts_write_refblocks(
         BlockDriverState *bs, void **refcount_table, int64_t *nb_clusters,
         int64_t first_cluster, int64_t end_cluster,
-        uint64_t **on_disk_reftable_ptr, uint32_t *on_disk_reftable_entries_ptr
+        uint64_t **on_disk_reftable_ptr, uint32_t *on_disk_reftable_entries_ptr,
+        Error **errp
     )
 {
     BDRVQcow2State *s = bs->opaque;
@@ -2513,8 +2514,8 @@ static int rebuild_refcounts_write_refblocks(
                                                   nb_clusters,
                                                   &first_free_cluster);
             if (refblock_offset < 0) {
-                fprintf(stderr, "ERROR allocating refblock: %s\n",
-                        strerror(-refblock_offset));
+                error_setg_errno(errp, -refblock_offset,
+                                 "ERROR allocating refblock");
                 return refblock_offset;
             }
 
@@ -2536,6 +2537,7 @@ static int rebuild_refcounts_write_refblocks(
                                   on_disk_reftable_entries *
                                   REFTABLE_ENTRY_SIZE);
                 if (!on_disk_reftable) {
+                    error_setg(errp, "ERROR allocating reftable memory");
                     return -ENOMEM;
                 }
 
@@ -2559,7 +2561,7 @@ static int rebuild_refcounts_write_refblocks(
         ret = qcow2_pre_write_overlap_check(bs, 0, refblock_offset,
                                             s->cluster_size, false);
         if (ret < 0) {
-            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
+            error_setg_errno(errp, -ret, "ERROR writing refblock");
             return ret;
         }
 
@@ -2575,7 +2577,7 @@ static int rebuild_refcounts_write_refblocks(
         ret = bdrv_pwrite(bs->file, refblock_offset, on_disk_refblock,
                           s->cluster_size);
         if (ret < 0) {
-            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
+            error_setg_errno(errp, -ret, "ERROR writing refblock");
             return ret;
         }
 
@@ -2598,7 +2600,8 @@ static int rebuild_refcounts_write_refblocks(
 static int rebuild_refcount_structure(BlockDriverState *bs,
                                       BdrvCheckResult *res,
                                       void **refcount_table,
-                                      int64_t *nb_clusters)
+                                      int64_t *nb_clusters,
+                                      Error **errp)
 {
     BDRVQcow2State *s = bs->opaque;
     int64_t reftable_offset = -1;
@@ -2649,7 +2652,7 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
         rebuild_refcounts_write_refblocks(bs, refcount_table, nb_clusters,
                                           0, *nb_clusters,
                                           &on_disk_reftable,
-                                          &on_disk_reftable_entries);
+                                          &on_disk_reftable_entries, errp);
     if (reftable_size_changed < 0) {
         res->check_errors++;
         ret = reftable_size_changed;
@@ -2673,8 +2676,8 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
                                               refcount_table, nb_clusters,
                                               &first_free_cluster);
         if (reftable_offset < 0) {
-            fprintf(stderr, "ERROR allocating reftable: %s\n",
-                    strerror(-reftable_offset));
+            error_setg_errno(errp, -reftable_offset,
+                             "ERROR allocating reftable");
             res->check_errors++;
             ret = reftable_offset;
             goto fail;
@@ -2692,7 +2695,7 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
                                               reftable_start_cluster,
                                               reftable_end_cluster,
                                               &on_disk_reftable,
-                                              &on_disk_reftable_entries);
+                                              &on_disk_reftable_entries, errp);
         if (reftable_size_changed < 0) {
             res->check_errors++;
             ret = reftable_size_changed;
@@ -2722,7 +2725,7 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
     ret = qcow2_pre_write_overlap_check(bs, 0, reftable_offset, reftable_length,
                                         false);
     if (ret < 0) {
-        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
+        error_setg_errno(errp, -ret, "ERROR writing reftable");
         goto fail;
     }
 
@@ -2730,7 +2733,7 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
     ret = bdrv_pwrite(bs->file, reftable_offset, on_disk_reftable,
                       reftable_length);
     if (ret < 0) {
-        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
+        error_setg_errno(errp, -ret, "ERROR writing reftable");
         goto fail;
     }
 
@@ -2743,7 +2746,7 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
                            &reftable_offset_and_clusters,
                            sizeof(reftable_offset_and_clusters));
     if (ret < 0) {
-        fprintf(stderr, "ERROR setting reftable: %s\n", strerror(-ret));
+        error_setg_errno(errp, -ret, "ERROR setting reftable");
         goto fail;
     }
 
@@ -2811,11 +2814,13 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
     if (rebuild && (fix & BDRV_FIX_ERRORS)) {
         BdrvCheckResult old_res = *res;
         int fresh_leaks = 0;
+        Error *local_err = NULL;
 
         fprintf(stderr, "Rebuilding refcount structure\n");
         ret = rebuild_refcount_structure(bs, res, &refcount_table,
-                                         &nb_clusters);
+                                         &nb_clusters, &local_err);
         if (ret < 0) {
+            error_report_err(local_err);
             goto fail;
         }
 
-- 
Gitee


From 357c37f482578116b3616ad9f4eb69fd6d10831b Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Thu, 21 Apr 2022 16:24:35 +0200
Subject: [PATCH 079/407] iotests/108: Fix when missing user_allow_other

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 171: qcow2: Improve refcount structure rebuilding
RH-Commit: [4/4] 36b70b5378ae7c8084b9e847706f00003abe9c11
RH-Bugzilla: 1519071
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Eric Blake <eblake@redhat.com>

FUSE exports' allow-other option defaults to "auto", which means that it
will try passing allow_other as a mount option, and fall back to not
using it when an error occurs.  We make no effort to hide fusermount's
error message (because it would be difficult, and because users might
want to know about the fallback occurring), and so when allow_other does
not work (primarily when /etc/fuse.conf does not contain
user_allow_other), this error message will appear and break the
reference output.

We do not need allow_other here, though, so we can just pass
allow-other=off to fix that.

Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220421142435.569600-1-hreitz@redhat.com>
Tested-by: Markus Armbruster <armbru@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 348a0740afc5b313599533eb69bbb2b95d2f1bba)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tests/qemu-iotests/108 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/108 b/tests/qemu-iotests/108
index 23abbeaff03..775ff08eca4 100755
--- a/tests/qemu-iotests/108
+++ b/tests/qemu-iotests/108
@@ -326,7 +326,7 @@ else
 
     $QSD \
         --blockdev file,node-name=export-node,filename="$TEST_IMG" \
-        --export fuse,id=fuse-export,node-name=export-node,mountpoint="$export_mp",writable=on,growable=off \
+        --export fuse,id=fuse-export,node-name=export-node,mountpoint="$export_mp",writable=on,growable=off,allow-other=off \
         --pidfile "$TEST_DIR/qsd.pid" \
         &
 
-- 
Gitee


From 37167db585af7c8f68662a71b77da56df9bf46a7 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Thu, 14 Apr 2022 16:45:30 -0400
Subject: [PATCH 080/407] Revert redhat: Add some devices for exporting
 upstream machine types

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 156: Revert redhat: Add some devices for exporting upstream machine types
RH-Commit: [1/1] f25d0da3a181136917ead82f5a5c59efe3fa445a (jmaloy/qemu-kvm)
RH-Bugzilla: 2065043
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Peter Xu <peterx@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2065043
Upstream: no

Manual revert of commit 70d3924521c9bfd912bcf1a1fc76f49eb377de46, since
the directory structure differs from that of rhel-av-8.4.0.z, where
this commit was taken from. Besides, x86_64-softmmu.mak looks totally
different and should not be affected by this reversal.

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 configs/devices/x86_64-softmmu/x86_64-rh-devices.mak     | 1 -
 .../devices/x86_64-softmmu/x86_64-upstream-devices.mak   | 4 ----
 hw/char/parallel.c                                       | 9 ---------
 hw/i386/pc_piix.c                                        | 2 +-
 hw/i386/pc_q35.c                                         | 2 +-
 hw/timer/hpet.c                                          | 8 --------
 6 files changed, 2 insertions(+), 24 deletions(-)
 delete mode 100644 configs/devices/x86_64-softmmu/x86_64-upstream-devices.mak

diff --git a/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak b/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
index fdbbdf97426..31ce08edaba 100644
--- a/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
+++ b/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
@@ -1,5 +1,4 @@
 include ../rh-virtio.mak
-include x86_64-upstream-devices.mak
 
 CONFIG_AC97=y
 CONFIG_ACPI=y
diff --git a/configs/devices/x86_64-softmmu/x86_64-upstream-devices.mak b/configs/devices/x86_64-softmmu/x86_64-upstream-devices.mak
deleted file mode 100644
index 2cd20f54d2f..00000000000
--- a/configs/devices/x86_64-softmmu/x86_64-upstream-devices.mak
+++ /dev/null
@@ -1,4 +0,0 @@
-# We need "isa-parallel"
-CONFIG_PARALLEL=y
-# We need "hpet"
-CONFIG_HPET=y
diff --git a/hw/char/parallel.c b/hw/char/parallel.c
index e5f108211b8..b45e67bfbb9 100644
--- a/hw/char/parallel.c
+++ b/hw/char/parallel.c
@@ -29,7 +29,6 @@
 #include "chardev/char-parallel.h"
 #include "chardev/char-fe.h"
 #include "hw/acpi/aml-build.h"
-#include "hw/boards.h"
 #include "hw/irq.h"
 #include "hw/isa/isa.h"
 #include "hw/qdev-properties.h"
@@ -535,14 +534,6 @@ static void parallel_isa_realizefn(DeviceState *dev, Error **errp)
     int base;
     uint8_t dummy;
 
-    /* Restricted for Red Hat Enterprise Linux */
-    MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
-    if (strstr(mc->name, "rhel")) {
-        error_setg(errp, "Device %s is not supported with machine type %s",
-                   object_get_typename(OBJECT(dev)), mc->name);
-        return;
-    }
-
     if (!qemu_chr_fe_backend_connected(&s->chr)) {
         error_setg(errp, "Can't create parallel device, empty char device");
         return;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index ab6d03e07ab..5f101c8748f 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -966,7 +966,7 @@ static void pc_machine_rhel7_options(MachineClass *m)
 {
     PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
     m->family = "pc_piix_Y";
-    m->default_machine_opts = "firmware=bios-256k.bin,hpet=off";
+    m->default_machine_opts = "firmware=bios-256k.bin";
     pcmc->default_nic_model = "e1000";
     pcmc->pci_root_uid = 0;
     m->default_display = "std";
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 882fe7a68dc..73b0d0d3174 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -633,7 +633,7 @@ static void pc_q35_machine_rhel_options(MachineClass *m)
     pcmc->pci_root_uid = 0;
     m->family = "pc_q35_Z";
     m->units_per_default_bus = 1;
-    m->default_machine_opts = "firmware=bios-256k.bin,hpet=off";
+    m->default_machine_opts = "firmware=bios-256k.bin";
     m->default_display = "std";
     m->no_floppy = 1;
     m->no_parallel = 1;
diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index 202e0325240..9520471be2c 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -733,14 +733,6 @@ static void hpet_realize(DeviceState *dev, Error **errp)
     int i;
     HPETTimer *timer;
 
-    /* Restricted for Red Hat Enterprise Linux */
-    MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
-    if (strstr(mc->name, "rhel")) {
-        error_setg(errp, "Device %s is not supported with machine type %s",
-                   object_get_typename(OBJECT(dev)), mc->name);
-        return;
-    }
-
     if (!s->intcap) {
         warn_report("Hpet's intcap not initialized");
     }
-- 
Gitee


From abe750120daca36812857c05aa9752442bdf9678 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 24 Mar 2022 09:21:41 +0100
Subject: [PATCH 081/407] target/i386: properly reset TSC on reset

RH-Author: Paolo Bonzini <pbonzini@redhat.com>
RH-MergeRequest: 172: target/i386: properly reset TSC on reset
RH-Commit: [1/1] 7008bc5d02ad0a2d8b78259459d22d8f0986c989
RH-Bugzilla: 2070417
RH-Acked-by: Marcelo Tosatti <None>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>

Some versions of Windows hang on reboot if their TSC value is greater
than 2^54.  The calibration of the Hyper-V reference time overflows
and fails; as a result the processors' clock sources are out of sync.
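
To get a feeling for the 2^54 threshold, the sketch below estimates how
long a counter takes to reach it.  It assumes the guest TSC started near
zero and ticks at a constant rate; the 3 GHz figure is only an example,
not something taken from the bug report.

```bash
# Seconds a counter ticking at $1 Hz needs to reach 2^54.
tsc_seconds_to_2_54() {
    hz=$1
    echo $(( (1 << 54) / hz ))
}

secs=$(tsc_seconds_to_2_54 3000000000)
echo "about $((secs / 86400)) days at 3 GHz"
```

Under those assumptions the threshold is reached after roughly ten
weeks, so the hang would be expected on reboots of long-running guests
rather than freshly started ones.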

The issue is that the TSC _should_ be reset to 0 on CPU reset and
QEMU tries to do that.  However, KVM special-cases writing 0 to the
TSC and thinks that QEMU is trying to hot-plug a CPU, which is
correct the first time through but not later.  Thwart this valiant
effort and reset the TSC to 1 instead, but only if the CPU has been
run once.

For this to work, env->tsc has to be moved to the part of CPUArchState
that is not zeroed at the beginning of x86_cpu_reset.

Reported-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Supersedes: <20220324082346.72180-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5286c3662294119dc2dd1e9296757337211451f6)
---
 target/i386/cpu.c | 13 +++++++++++++
 target/i386/cpu.h |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6e25d133397..dd6935b1dd5 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5871,6 +5871,19 @@ static void x86_cpu_reset(DeviceState *dev)
     env->xstate_bv = 0;
 
     env->pat = 0x0007040600070406ULL;
+
+    if (kvm_enabled()) {
+        /*
+         * KVM handles TSC = 0 specially and thinks we are hot-plugging
+         * a new CPU, use 1 instead to force a reset.
+         */
+        if (env->tsc != 0) {
+            env->tsc = 1;
+        }
+    } else {
+        env->tsc = 0;
+    }
+
     env->msr_ia32_misc_enable = MSR_IA32_MISC_ENABLE_DEFAULT;
     if (env->features[FEAT_1_ECX] & CPUID_EXT_MONITOR) {
         env->msr_ia32_misc_enable |= MSR_IA32_MISC_ENABLE_MWAIT;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 04f2b790c9f..c6a6c871f16 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1510,7 +1510,6 @@ typedef struct CPUX86State {
     target_ulong kernelgsbase;
 #endif
 
-    uint64_t tsc;
     uint64_t tsc_adjust;
     uint64_t tsc_deadline;
     uint64_t tsc_aux;
@@ -1660,6 +1659,7 @@ typedef struct CPUX86State {
     int64_t tsc_khz;
     int64_t user_tsc_khz; /* for sanity check only */
     uint64_t apic_bus_freq;
+    uint64_t tsc;
 #if defined(CONFIG_KVM) || defined(CONFIG_HVF)
     void *xsave_buf;
     uint32_t xsave_buf_len;
-- 
Gitee


From 04e960e240c15ba009f23c46e632af0cab5c8cef Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 4 May 2022 10:35:17 -0400
Subject: [PATCH 082/407] ui/cursor: fix integer overflow in cursor_alloc
 (CVE-2021-4206)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 180: ui/cursor: fix integer overflow in cursor_alloc (CVE-2021-4206)
RH-Commit: [1/1] 7ad711347bc6248dc5aefa45401ca74448dee5e5 (jmaloy/qemu-kvm)
RH-Bugzilla: 2040734
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Acked-by: Mauro Matteo Cascella <None>
RH-Acked-by: Gerd Hoffmann <kraxel@redhat.com>

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2040734
Upstream: Merged
CVE: CVE-2021-4206

commit fa892e9abb728e76afcf27323ab29c57fb0fe7aa
Author: Mauro Matteo Cascella <mcascell@redhat.com>
Date:   Thu Apr 7 10:17:12 2022 +0200

    ui/cursor: fix integer overflow in cursor_alloc (CVE-2021-4206)

    Prevent potential integer overflow by limiting 'width' and 'height' to
    512x512. Also change 'datasize' type to size_t. Refer to security
    advisory https://starlabs.sg/advisories/22-4206/ for more information.

    Fixes: CVE-2021-4206
    Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
    Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Message-Id: <20220407081712.345609-1-mcascell@redhat.com>
    Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
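
A minimal sketch of the bounds-check idea, for illustration only.
model_cursor_datasize() and MODEL_CURSOR_MAX_DIM are hypothetical
names, not QEMU API.  Unlike the patch, the sketch validates the
dimensions before multiplying and also rejects non-positive values;
both are assumptions of this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CURSOR_MAX_DIM 512 /* limit taken from the commit message */

/*
 * Returns 1 and stores the pixel buffer size in *datasize when the
 * dimensions are acceptable, 0 otherwise.  The arithmetic is done in
 * size_t, so it cannot wrap for dimensions within the limit.
 */
static int model_cursor_datasize(int width, int height, size_t *datasize)
{
    assert(datasize != NULL);

    if (width <= 0 || height <= 0 ||
        width > MODEL_CURSOR_MAX_DIM || height > MODEL_CURSOR_MAX_DIM) {
        return 0;
    }

    *datasize = (size_t)width * (size_t)height * sizeof(uint32_t);
    return 1;
}
```

With the 512x512 limit the largest accepted buffer is 512 * 512 * 4
bytes = 1 MiB, far below the point where a 32-bit int product wraps.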

(cherry picked from commit fa892e9abb728e76afcf27323ab29c57fb0fe7aa)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/display/qxl-render.c | 7 +++++++
 hw/display/vmware_vga.c | 2 ++
 ui/cursor.c             | 8 +++++++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/hw/display/qxl-render.c b/hw/display/qxl-render.c
index 237ed293baa..ca217004bf7 100644
--- a/hw/display/qxl-render.c
+++ b/hw/display/qxl-render.c
@@ -247,6 +247,13 @@ static QEMUCursor *qxl_cursor(PCIQXLDevice *qxl, QXLCursor *cursor,
     size_t size;
 
     c = cursor_alloc(cursor->header.width, cursor->header.height);
+
+    if (!c) {
+        qxl_set_guest_bug(qxl, "%s: cursor %ux%u alloc error", __func__,
+                cursor->header.width, cursor->header.height);
+        goto fail;
+    }
+
     c->hot_x = cursor->header.hot_spot_x;
     c->hot_y = cursor->header.hot_spot_y;
     switch (cursor->header.type) {
diff --git a/hw/display/vmware_vga.c b/hw/display/vmware_vga.c
index e2969a6c81c..2b81d6122fc 100644
--- a/hw/display/vmware_vga.c
+++ b/hw/display/vmware_vga.c
@@ -509,6 +509,8 @@ static inline void vmsvga_cursor_define(struct vmsvga_state_s *s,
     int i, pixels;
 
     qc = cursor_alloc(c->width, c->height);
+    assert(qc != NULL);
+
     qc->hot_x = c->hot_x;
     qc->hot_y = c->hot_y;
     switch (c->bpp) {
diff --git a/ui/cursor.c b/ui/cursor.c
index 1d62ddd4d07..835f0802f95 100644
--- a/ui/cursor.c
+++ b/ui/cursor.c
@@ -46,6 +46,8 @@ static QEMUCursor *cursor_parse_xpm(const char *xpm[])
 
     /* parse pixel data */
     c = cursor_alloc(width, height);
+    assert(c != NULL);
+
     for (pixel = 0, y = 0; y < height; y++, line++) {
         for (x = 0; x < height; x++, pixel++) {
             idx = xpm[line][x];
@@ -91,7 +93,11 @@ QEMUCursor *cursor_builtin_left_ptr(void)
 QEMUCursor *cursor_alloc(int width, int height)
 {
     QEMUCursor *c;
-    int datasize = width * height * sizeof(uint32_t);
+    size_t datasize = width * height * sizeof(uint32_t);
+
+    if (width > 512 || height > 512) {
+        return NULL;
+    }
 
     c = g_malloc0(sizeof(QEMUCursor) + datasize);
     c->width  = width;
-- 
Gitee


From 7cd8097d2e3936071ae0d322e147547f8c140f70 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 12:06:00 +0100
Subject: [PATCH 083/407] virtio-gpu: do not byteswap padding
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Paul Lai <plai@redhat.com>
RH-MergeRequest: 176: Enable KVM AMX support
RH-Commit: [1/13] 12714f53820b7632e7fc0a8a3bf8eb4a64f41750
RH-Bugzilla: 1916415
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

In Linux 5.16, the padding of struct virtio_gpu_ctrl_hdr has become a
single-byte field followed by a uint8_t[3] array of padding bytes,
and virtio_gpu_ctrl_hdr_bswap does not compile anymore.
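
The layout change can be sketched as below.  The two struct names are
hypothetical and exist only for this illustration; the field lists
follow the old and new definitions of struct virtio_gpu_ctrl_hdr.
The build failure presumably comes from passing the address of a
uint8_t[3] array where a uint32_t pointer is expected, and
single-byte fields need no byte swapping in any case:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout before Linux 5.16: one 32-bit padding word. */
struct model_ctrl_hdr_old {
    uint32_t type;
    uint32_t flags;
    uint64_t fence_id;
    uint32_t ctx_id;
    uint32_t padding;
};

/* Layout in Linux 5.16: ring_idx plus three padding bytes. */
struct model_ctrl_hdr_new {
    uint32_t type;
    uint32_t flags;
    uint64_t fence_id;
    uint32_t ctx_id;
    uint8_t ring_idx;
    uint8_t padding[3];
};

/* The split keeps the header size and the offset of the tail unchanged. */
static_assert(sizeof(struct model_ctrl_hdr_old) ==
              sizeof(struct model_ctrl_hdr_new), "header size changed");
static_assert(offsetof(struct model_ctrl_hdr_old, padding) ==
              offsetof(struct model_ctrl_hdr_new, ring_idx), "tail moved");
```

Because the size and offsets are unchanged, dropping the swap of the
old padding word does not affect how the remaining fields are read.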

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211111110604.207376-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a4663f1a5506626175fc64c86e52135587c36872)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 include/hw/virtio/virtio-gpu-bswap.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/hw/virtio/virtio-gpu-bswap.h b/include/hw/virtio/virtio-gpu-bswap.h
index e2bee8f5955..5faac0d8d5f 100644
--- a/include/hw/virtio/virtio-gpu-bswap.h
+++ b/include/hw/virtio/virtio-gpu-bswap.h
@@ -24,7 +24,6 @@ virtio_gpu_ctrl_hdr_bswap(struct virtio_gpu_ctrl_hdr *hdr)
     le32_to_cpus(&hdr->flags);
     le64_to_cpus(&hdr->fence_id);
     le32_to_cpus(&hdr->ctx_id);
-    le32_to_cpus(&hdr->padding);
 }
 
 static inline void
-- 
Gitee


From bda07816123479e70d2737f555d18317fbb2c6e3 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 12:06:01 +0100
Subject: [PATCH 084/407] linux-headers: update to 5.16-rc1
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Paul Lai <plai@redhat.com>
RH-MergeRequest: 176: Enable KVM AMX support
RH-Commit: [2/13] 4af2f4942db029b81890e3862793fb54b62791cc
RH-Bugzilla: 1916415
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20211111110604.207376-3-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 43709a0ca3b09e952bde3f38112f1d7fbf7c65b1)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 include/standard-headers/drm/drm_fourcc.h     | 121 +++++++++++++++++-
 include/standard-headers/linux/ethtool.h      |  31 +++++
 include/standard-headers/linux/fuse.h         |  10 +-
 include/standard-headers/linux/pci_regs.h     |   6 +
 include/standard-headers/linux/virtio_gpu.h   |  18 ++-
 include/standard-headers/linux/virtio_ids.h   |  24 ++++
 include/standard-headers/linux/virtio_vsock.h |   3 +-
 linux-headers/asm-arm64/unistd.h              |   1 +
 linux-headers/asm-generic/unistd.h            |  22 +++-
 linux-headers/asm-mips/unistd_n32.h           |   1 +
 linux-headers/asm-mips/unistd_n64.h           |   1 +
 linux-headers/asm-mips/unistd_o32.h           |   1 +
 linux-headers/asm-powerpc/unistd_32.h         |   1 +
 linux-headers/asm-powerpc/unistd_64.h         |   1 +
 linux-headers/asm-s390/unistd_32.h            |   1 +
 linux-headers/asm-s390/unistd_64.h            |   1 +
 linux-headers/asm-x86/kvm.h                   |   5 +
 linux-headers/asm-x86/unistd_32.h             |   3 +
 linux-headers/asm-x86/unistd_64.h             |   3 +
 linux-headers/asm-x86/unistd_x32.h            |   3 +
 linux-headers/linux/kvm.h                     |  40 +++++-
 21 files changed, 276 insertions(+), 21 deletions(-)

diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
index 352b51fd0ac..2c025cb4fe8 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -103,6 +103,12 @@ extern "C" {
 /* 8 bpp Red */
 #define DRM_FORMAT_R8		fourcc_code('R', '8', ' ', ' ') /* [7:0] R */
 
+/* 10 bpp Red */
+#define DRM_FORMAT_R10		fourcc_code('R', '1', '0', ' ') /* [15:0] x:R 6:10 little endian */
+
+/* 12 bpp Red */
+#define DRM_FORMAT_R12		fourcc_code('R', '1', '2', ' ') /* [15:0] x:R 4:12 little endian */
+
 /* 16 bpp Red */
 #define DRM_FORMAT_R16		fourcc_code('R', '1', '6', ' ') /* [15:0] R little endian */
 
@@ -372,6 +378,12 @@ extern "C" {
 
 #define DRM_FORMAT_RESERVED	      ((1ULL << 56) - 1)
 
+#define fourcc_mod_get_vendor(modifier) \
+	(((modifier) >> 56) & 0xff)
+
+#define fourcc_mod_is_vendor(modifier, vendor) \
+	(fourcc_mod_get_vendor(modifier) == DRM_FORMAT_MOD_VENDOR_## vendor)
+
 #define fourcc_mod_code(vendor, val) \
 	((((uint64_t)DRM_FORMAT_MOD_VENDOR_## vendor) << 56) | ((val) & 0x00ffffffffffffffULL))
 
@@ -899,9 +911,9 @@ drm_fourcc_canonicalize_nvidia_format_mod(uint64_t modifier)
 
 /*
  * The top 4 bits (out of the 56 bits alloted for specifying vendor specific
- * modifiers) denote the category for modifiers. Currently we have only two
- * categories of modifiers ie AFBC and MISC. We can have a maximum of sixteen
- * different categories.
+ * modifiers) denote the category for modifiers. Currently we have three
+ * categories of modifiers ie AFBC, MISC and AFRC. We can have a maximum of
+ * sixteen different categories.
  */
 #define DRM_FORMAT_MOD_ARM_CODE(__type, __val) \
 	fourcc_mod_code(ARM, ((uint64_t)(__type) << 52) | ((__val) & 0x000fffffffffffffULL))
@@ -1016,6 +1028,109 @@ drm_fourcc_canonicalize_nvidia_format_mod(uint64_t modifier)
  */
 #define AFBC_FORMAT_MOD_USM	(1ULL << 12)
 
+/*
+ * Arm Fixed-Rate Compression (AFRC) modifiers
+ *
+ * AFRC is a proprietary fixed rate image compression protocol and format,
+ * designed to provide guaranteed bandwidth and memory footprint
+ * reductions in graphics and media use-cases.
+ *
+ * AFRC buffers consist of one or more planes, with the same components
+ * and meaning as an uncompressed buffer using the same pixel format.
+ *
+ * Within each plane, the pixel/luma/chroma values are grouped into
+ * "coding unit" blocks which are individually compressed to a
+ * fixed size (in bytes). All coding units within a given plane of a buffer
+ * store the same number of values, and have the same compressed size.
+ *
+ * The coding unit size is configurable, allowing different rates of compression.
+ *
+ * The start of each AFRC buffer plane must be aligned to an alignment granule which
+ * depends on the coding unit size.
+ *
+ * Coding Unit Size   Plane Alignment
+ * ----------------   ---------------
+ * 16 bytes           1024 bytes
+ * 24 bytes           512  bytes
+ * 32 bytes           2048 bytes
+ *
+ * Coding units are grouped into paging tiles. AFRC buffer dimensions must be aligned
+ * to a multiple of the paging tile dimensions.
+ * The dimensions of each paging tile depend on whether the buffer is optimised for
+ * scanline (SCAN layout) or rotated (ROT layout) access.
+ *
+ * Layout   Paging Tile Width   Paging Tile Height
+ * ------   -----------------   ------------------
+ * SCAN     16 coding units     4 coding units
+ * ROT      8  coding units     8 coding units
+ *
+ * The dimensions of each coding unit depend on the number of components
+ * in the compressed plane and whether the buffer is optimised for
+ * scanline (SCAN layout) or rotated (ROT layout) access.
+ *
+ * Number of Components in Plane   Layout      Coding Unit Width   Coding Unit Height
+ * -----------------------------   ---------   -----------------   ------------------
+ * 1                               SCAN        16 samples          4 samples
+ * Example: 16x4 luma samples in a 'Y' plane
+ *          16x4 chroma 'V' values, in the 'V' plane of a fully-planar YUV buffer
+ * -----------------------------   ---------   -----------------   ------------------
+ * 1                               ROT         8 samples           8 samples
+ * Example: 8x8 luma samples in a 'Y' plane
+ *          8x8 chroma 'V' values, in the 'V' plane of a fully-planar YUV buffer
+ * -----------------------------   ---------   -----------------   ------------------
+ * 2                               DONT CARE   8 samples           4 samples
+ * Example: 8x4 chroma pairs in the 'UV' plane of a semi-planar YUV buffer
+ * -----------------------------   ---------   -----------------   ------------------
+ * 3                               DONT CARE   4 samples           4 samples
+ * Example: 4x4 pixels in an RGB buffer without alpha
+ * -----------------------------   ---------   -----------------   ------------------
+ * 4                               DONT CARE   4 samples           4 samples
+ * Example: 4x4 pixels in an RGB buffer with alpha
+ */
+
+#define DRM_FORMAT_MOD_ARM_TYPE_AFRC 0x02
+
+#define DRM_FORMAT_MOD_ARM_AFRC(__afrc_mode) \
+	DRM_FORMAT_MOD_ARM_CODE(DRM_FORMAT_MOD_ARM_TYPE_AFRC, __afrc_mode)
+
+/*
+ * AFRC coding unit size modifier.
+ *
+ * Indicates the number of bytes used to store each compressed coding unit for
+ * one or more planes in an AFRC encoded buffer. The coding unit size for chrominance
+ * is the same for both Cb and Cr, which may be stored in separate planes.
+ *
+ * AFRC_FORMAT_MOD_CU_SIZE_P0 indicates the number of bytes used to store
+ * each compressed coding unit in the first plane of the buffer. For RGBA buffers
+ * this is the only plane, while for semi-planar and fully-planar YUV buffers,
+ * this corresponds to the luma plane.
+ *
+ * AFRC_FORMAT_MOD_CU_SIZE_P12 indicates the number of bytes used to store
+ * each compressed coding unit in the second and third planes in the buffer.
+ * For semi-planar and fully-planar YUV buffers, this corresponds to the chroma plane(s).
+ *
+ * For single-plane buffers, AFRC_FORMAT_MOD_CU_SIZE_P0 must be specified
+ * and AFRC_FORMAT_MOD_CU_SIZE_P12 must be zero.
+ * For semi-planar and fully-planar buffers, both AFRC_FORMAT_MOD_CU_SIZE_P0 and
+ * AFRC_FORMAT_MOD_CU_SIZE_P12 must be specified.
+ */
+#define AFRC_FORMAT_MOD_CU_SIZE_MASK 0xf
+#define AFRC_FORMAT_MOD_CU_SIZE_16 (1ULL)
+#define AFRC_FORMAT_MOD_CU_SIZE_24 (2ULL)
+#define AFRC_FORMAT_MOD_CU_SIZE_32 (3ULL)
+
+#define AFRC_FORMAT_MOD_CU_SIZE_P0(__afrc_cu_size) (__afrc_cu_size)
+#define AFRC_FORMAT_MOD_CU_SIZE_P12(__afrc_cu_size) ((__afrc_cu_size) << 4)
+
+/*
+ * AFRC scanline memory layout.
+ *
+ * Indicates if the buffer uses the scanline-optimised layout
+ * for an AFRC encoded buffer, otherwise, it uses the rotation-optimised layout.
+ * The memory layout is the same for all planes.
+ */
+#define AFRC_FORMAT_MOD_LAYOUT_SCAN (1ULL << 8)
+
 /*
  * Arm 16x16 Block U-Interleaved modifier
  *
diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
index 053d3fafdf3..688eb8dc39c 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -603,6 +603,7 @@ enum ethtool_link_ext_state {
 	ETHTOOL_LINK_EXT_STATE_CALIBRATION_FAILURE,
 	ETHTOOL_LINK_EXT_STATE_POWER_BUDGET_EXCEEDED,
 	ETHTOOL_LINK_EXT_STATE_OVERHEAT,
+	ETHTOOL_LINK_EXT_STATE_MODULE,
 };
 
 /* More information in addition to ETHTOOL_LINK_EXT_STATE_AUTONEG. */
@@ -639,6 +640,8 @@ enum ethtool_link_ext_substate_link_logical_mismatch {
 enum ethtool_link_ext_substate_bad_signal_integrity {
 	ETHTOOL_LINK_EXT_SUBSTATE_BSI_LARGE_NUMBER_OF_PHYSICAL_ERRORS = 1,
 	ETHTOOL_LINK_EXT_SUBSTATE_BSI_UNSUPPORTED_RATE,
+	ETHTOOL_LINK_EXT_SUBSTATE_BSI_SERDES_REFERENCE_CLOCK_LOST,
+	ETHTOOL_LINK_EXT_SUBSTATE_BSI_SERDES_ALOS,
 };
 
 /* More information in addition to ETHTOOL_LINK_EXT_STATE_CABLE_ISSUE. */
@@ -647,6 +650,11 @@ enum ethtool_link_ext_substate_cable_issue {
 	ETHTOOL_LINK_EXT_SUBSTATE_CI_CABLE_TEST_FAILURE,
 };
 
+/* More information in addition to ETHTOOL_LINK_EXT_STATE_MODULE. */
+enum ethtool_link_ext_substate_module {
+	ETHTOOL_LINK_EXT_SUBSTATE_MODULE_CMIS_NOT_READY = 1,
+};
+
 #define ETH_GSTRING_LEN		32
 
 /**
@@ -704,6 +712,29 @@ enum ethtool_stringset {
 	ETH_SS_COUNT
 };
 
+/**
+ * enum ethtool_module_power_mode_policy - plug-in module power mode policy
+ * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode.
+ * @ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO: Module is transitioned by the host
+ *	to high power mode when the first port using it is put administratively
+ *	up and to low power mode when the last port using it is put
+ *	administratively down.
+ */
+enum ethtool_module_power_mode_policy {
+	ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH = 1,
+	ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO,
+};
+
+/**
+ * enum ethtool_module_power_mode - plug-in module power mode
+ * @ETHTOOL_MODULE_POWER_MODE_LOW: Module is in low power mode.
+ * @ETHTOOL_MODULE_POWER_MODE_HIGH: Module is in high power mode.
+ */
+enum ethtool_module_power_mode {
+	ETHTOOL_MODULE_POWER_MODE_LOW = 1,
+	ETHTOOL_MODULE_POWER_MODE_HIGH,
+};
+
 /**
  * struct ethtool_gstrings - string set for data tagging
  * @cmd: Command number = %ETHTOOL_GSTRINGS
diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
index cce105bfbab..23ea31708bb 100644
--- a/include/standard-headers/linux/fuse.h
+++ b/include/standard-headers/linux/fuse.h
@@ -181,6 +181,9 @@
  *  - add FUSE_OPEN_KILL_SUIDGID
  *  - extend fuse_setxattr_in, add FUSE_SETXATTR_EXT
  *  - add FUSE_SETXATTR_ACL_KILL_SGID
+ *
+ *  7.34
+ *  - add FUSE_SYNCFS
  */
 
 #ifndef _LINUX_FUSE_H
@@ -212,7 +215,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 33
+#define FUSE_KERNEL_MINOR_VERSION 34
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -505,6 +508,7 @@ enum fuse_opcode {
 	FUSE_COPY_FILE_RANGE	= 47,
 	FUSE_SETUPMAPPING	= 48,
 	FUSE_REMOVEMAPPING	= 49,
+	FUSE_SYNCFS		= 50,
 
 	/* CUSE specific operations */
 	CUSE_INIT		= 4096,
@@ -967,4 +971,8 @@ struct fuse_removemapping_one {
 #define FUSE_REMOVEMAPPING_MAX_ENTRY   \
 		(PAGE_SIZE / sizeof(struct fuse_removemapping_one))
 
+struct fuse_syncfs_in {
+	uint64_t	padding;
+};
+
 #endif /* _LINUX_FUSE_H */
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index e709ae8235e..ff6ccbc6efe 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -504,6 +504,12 @@
 #define  PCI_EXP_DEVCTL_URRE	0x0008	/* Unsupported Request Reporting En. */
 #define  PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed ordering */
 #define  PCI_EXP_DEVCTL_PAYLOAD	0x00e0	/* Max_Payload_Size */
+#define  PCI_EXP_DEVCTL_PAYLOAD_128B 0x0000 /* 128 Bytes */
+#define  PCI_EXP_DEVCTL_PAYLOAD_256B 0x0020 /* 256 Bytes */
+#define  PCI_EXP_DEVCTL_PAYLOAD_512B 0x0040 /* 512 Bytes */
+#define  PCI_EXP_DEVCTL_PAYLOAD_1024B 0x0060 /* 1024 Bytes */
+#define  PCI_EXP_DEVCTL_PAYLOAD_2048B 0x0080 /* 2048 Bytes */
+#define  PCI_EXP_DEVCTL_PAYLOAD_4096B 0x00a0 /* 4096 Bytes */
 #define  PCI_EXP_DEVCTL_EXT_TAG	0x0100	/* Extended Tag Field Enable */
 #define  PCI_EXP_DEVCTL_PHANTOM	0x0200	/* Phantom Functions Enable */
 #define  PCI_EXP_DEVCTL_AUX_PME	0x0400	/* Auxiliary Power PM Enable */
diff --git a/include/standard-headers/linux/virtio_gpu.h b/include/standard-headers/linux/virtio_gpu.h
index 1357e4774ea..2da48d3d4c2 100644
--- a/include/standard-headers/linux/virtio_gpu.h
+++ b/include/standard-headers/linux/virtio_gpu.h
@@ -59,6 +59,11 @@
  * VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB
  */
 #define VIRTIO_GPU_F_RESOURCE_BLOB       3
+/*
+ * VIRTIO_GPU_CMD_CREATE_CONTEXT with
+ * context_init and multiple timelines
+ */
+#define VIRTIO_GPU_F_CONTEXT_INIT        4
 
 enum virtio_gpu_ctrl_type {
 	VIRTIO_GPU_UNDEFINED = 0,
@@ -122,14 +127,20 @@ enum virtio_gpu_shm_id {
 	VIRTIO_GPU_SHM_ID_HOST_VISIBLE = 1
 };
 
-#define VIRTIO_GPU_FLAG_FENCE (1 << 0)
+#define VIRTIO_GPU_FLAG_FENCE         (1 << 0)
+/*
+ * If the following flag is set, then ring_idx contains the index
+ * of the command ring that needs to used when creating the fence
+ */
+#define VIRTIO_GPU_FLAG_INFO_RING_IDX (1 << 1)
 
 struct virtio_gpu_ctrl_hdr {
 	uint32_t type;
 	uint32_t flags;
 	uint64_t fence_id;
 	uint32_t ctx_id;
-	uint32_t padding;
+	uint8_t ring_idx;
+	uint8_t padding[3];
 };
 
 /* data passed in the cursor vq */
@@ -269,10 +280,11 @@ struct virtio_gpu_resource_create_3d {
 };
 
 /* VIRTIO_GPU_CMD_CTX_CREATE */
+#define VIRTIO_GPU_CONTEXT_INIT_CAPSET_ID_MASK 0x000000ff
 struct virtio_gpu_ctx_create {
 	struct virtio_gpu_ctrl_hdr hdr;
 	uint32_t nlen;
-	uint32_t padding;
+	uint32_t context_init;
 	char debug_name[64];
 };
 
diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h
index 4fe842c3a3a..80d76b75bcc 100644
--- a/include/standard-headers/linux/virtio_ids.h
+++ b/include/standard-headers/linux/virtio_ids.h
@@ -54,7 +54,31 @@
 #define VIRTIO_ID_SOUND			25 /* virtio sound */
 #define VIRTIO_ID_FS			26 /* virtio filesystem */
 #define VIRTIO_ID_PMEM			27 /* virtio pmem */
+#define VIRTIO_ID_RPMB			28 /* virtio rpmb */
 #define VIRTIO_ID_MAC80211_HWSIM	29 /* virtio mac80211-hwsim */
+#define VIRTIO_ID_VIDEO_ENCODER		30 /* virtio video encoder */
+#define VIRTIO_ID_VIDEO_DECODER		31 /* virtio video decoder */
+#define VIRTIO_ID_SCMI			32 /* virtio SCMI */
+#define VIRTIO_ID_NITRO_SEC_MOD		33 /* virtio nitro secure module*/
+#define VIRTIO_ID_I2C_ADAPTER		34 /* virtio i2c adapter */
+#define VIRTIO_ID_WATCHDOG		35 /* virtio watchdog */
+#define VIRTIO_ID_CAN			36 /* virtio can */
+#define VIRTIO_ID_DMABUF		37 /* virtio dmabuf */
+#define VIRTIO_ID_PARAM_SERV		38 /* virtio parameter server */
+#define VIRTIO_ID_AUDIO_POLICY		39 /* virtio audio policy */
 #define VIRTIO_ID_BT			40 /* virtio bluetooth */
+#define VIRTIO_ID_GPIO			41 /* virtio gpio */
+
+/*
+ * Virtio Transitional IDs
+ */
+
+#define VIRTIO_TRANS_ID_NET		1000 /* transitional virtio net */
+#define VIRTIO_TRANS_ID_BLOCK		1001 /* transitional virtio block */
+#define VIRTIO_TRANS_ID_BALLOON		1002 /* transitional virtio balloon */
+#define VIRTIO_TRANS_ID_CONSOLE		1003 /* transitional virtio console */
+#define VIRTIO_TRANS_ID_SCSI		1004 /* transitional virtio SCSI */
+#define VIRTIO_TRANS_ID_RNG		1005 /* transitional virtio rng */
+#define VIRTIO_TRANS_ID_9P		1009 /* transitional virtio 9p console */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/standard-headers/linux/virtio_vsock.h b/include/standard-headers/linux/virtio_vsock.h
index 3a23488e421..467e751b178 100644
--- a/include/standard-headers/linux/virtio_vsock.h
+++ b/include/standard-headers/linux/virtio_vsock.h
@@ -97,7 +97,8 @@ enum virtio_vsock_shutdown {
 
 /* VIRTIO_VSOCK_OP_RW flags values */
 enum virtio_vsock_rw {
-	VIRTIO_VSOCK_SEQ_EOR = 1,
+	VIRTIO_VSOCK_SEQ_EOM = 1,
+	VIRTIO_VSOCK_SEQ_EOR = 2,
 };
 
 #endif /* _LINUX_VIRTIO_VSOCK_H */
diff --git a/linux-headers/asm-arm64/unistd.h b/linux-headers/asm-arm64/unistd.h
index f83a70e07df..ce2ee8f1e36 100644
--- a/linux-headers/asm-arm64/unistd.h
+++ b/linux-headers/asm-arm64/unistd.h
@@ -20,5 +20,6 @@
 #define __ARCH_WANT_SET_GET_RLIMIT
 #define __ARCH_WANT_TIME32_SYSCALLS
 #define __ARCH_WANT_SYS_CLONE3
+#define __ARCH_WANT_MEMFD_SECRET
 
 #include <asm-generic/unistd.h>
diff --git a/linux-headers/asm-generic/unistd.h b/linux-headers/asm-generic/unistd.h
index f211961ce1d..4557a8b6086 100644
--- a/linux-headers/asm-generic/unistd.h
+++ b/linux-headers/asm-generic/unistd.h
@@ -673,15 +673,15 @@ __SYSCALL(__NR_madvise, sys_madvise)
 #define __NR_remap_file_pages 234
 __SYSCALL(__NR_remap_file_pages, sys_remap_file_pages)
 #define __NR_mbind 235
-__SC_COMP(__NR_mbind, sys_mbind, compat_sys_mbind)
+__SYSCALL(__NR_mbind, sys_mbind)
 #define __NR_get_mempolicy 236
-__SC_COMP(__NR_get_mempolicy, sys_get_mempolicy, compat_sys_get_mempolicy)
+__SYSCALL(__NR_get_mempolicy, sys_get_mempolicy)
 #define __NR_set_mempolicy 237
-__SC_COMP(__NR_set_mempolicy, sys_set_mempolicy, compat_sys_set_mempolicy)
+__SYSCALL(__NR_set_mempolicy, sys_set_mempolicy)
 #define __NR_migrate_pages 238
-__SC_COMP(__NR_migrate_pages, sys_migrate_pages, compat_sys_migrate_pages)
+__SYSCALL(__NR_migrate_pages, sys_migrate_pages)
 #define __NR_move_pages 239
-__SC_COMP(__NR_move_pages, sys_move_pages, compat_sys_move_pages)
+__SYSCALL(__NR_move_pages, sys_move_pages)
 #endif
 
 #define __NR_rt_tgsigqueueinfo 240
@@ -873,8 +873,18 @@ __SYSCALL(__NR_landlock_add_rule, sys_landlock_add_rule)
 #define __NR_landlock_restrict_self 446
 __SYSCALL(__NR_landlock_restrict_self, sys_landlock_restrict_self)
 
+#ifdef __ARCH_WANT_MEMFD_SECRET
+#define __NR_memfd_secret 447
+__SYSCALL(__NR_memfd_secret, sys_memfd_secret)
+#endif
+#define __NR_process_mrelease 448
+__SYSCALL(__NR_process_mrelease, sys_process_mrelease)
+
+#define __NR_futex_waitv 449
+__SYSCALL(__NR_futex_waitv, sys_futex_waitv)
+
 #undef __NR_syscalls
-#define __NR_syscalls 447
+#define __NR_syscalls 450
 
 /*
  * 32 bit systems traditionally used different
diff --git a/linux-headers/asm-mips/unistd_n32.h b/linux-headers/asm-mips/unistd_n32.h
index 09cd297698e..4b3e7ad1ecc 100644
--- a/linux-headers/asm-mips/unistd_n32.h
+++ b/linux-headers/asm-mips/unistd_n32.h
@@ -376,5 +376,6 @@
 #define __NR_landlock_create_ruleset (__NR_Linux + 444)
 #define __NR_landlock_add_rule (__NR_Linux + 445)
 #define __NR_landlock_restrict_self (__NR_Linux + 446)
+#define __NR_process_mrelease (__NR_Linux + 448)
 
 #endif /* _ASM_UNISTD_N32_H */
diff --git a/linux-headers/asm-mips/unistd_n64.h b/linux-headers/asm-mips/unistd_n64.h
index 780e0cead66..488d9298d9b 100644
--- a/linux-headers/asm-mips/unistd_n64.h
+++ b/linux-headers/asm-mips/unistd_n64.h
@@ -352,5 +352,6 @@
 #define __NR_landlock_create_ruleset (__NR_Linux + 444)
 #define __NR_landlock_add_rule (__NR_Linux + 445)
 #define __NR_landlock_restrict_self (__NR_Linux + 446)
+#define __NR_process_mrelease (__NR_Linux + 448)
 
 #endif /* _ASM_UNISTD_N64_H */
diff --git a/linux-headers/asm-mips/unistd_o32.h b/linux-headers/asm-mips/unistd_o32.h
index 06a2b3b55e6..f47399870a8 100644
--- a/linux-headers/asm-mips/unistd_o32.h
+++ b/linux-headers/asm-mips/unistd_o32.h
@@ -422,5 +422,6 @@
 #define __NR_landlock_create_ruleset (__NR_Linux + 444)
 #define __NR_landlock_add_rule (__NR_Linux + 445)
 #define __NR_landlock_restrict_self (__NR_Linux + 446)
+#define __NR_process_mrelease (__NR_Linux + 448)
 
 #endif /* _ASM_UNISTD_O32_H */
diff --git a/linux-headers/asm-powerpc/unistd_32.h b/linux-headers/asm-powerpc/unistd_32.h
index cd5a8a41b26..11d54696dc6 100644
--- a/linux-headers/asm-powerpc/unistd_32.h
+++ b/linux-headers/asm-powerpc/unistd_32.h
@@ -429,6 +429,7 @@
 #define __NR_landlock_create_ruleset 444
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
+#define __NR_process_mrelease 448
 
 
 #endif /* _ASM_UNISTD_32_H */
diff --git a/linux-headers/asm-powerpc/unistd_64.h b/linux-headers/asm-powerpc/unistd_64.h
index 8458effa8d8..cf740bab136 100644
--- a/linux-headers/asm-powerpc/unistd_64.h
+++ b/linux-headers/asm-powerpc/unistd_64.h
@@ -401,6 +401,7 @@
 #define __NR_landlock_create_ruleset 444
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
+#define __NR_process_mrelease 448
 
 
 #endif /* _ASM_UNISTD_64_H */
diff --git a/linux-headers/asm-s390/unistd_32.h b/linux-headers/asm-s390/unistd_32.h
index 0c3cd299e42..8f97d98128f 100644
--- a/linux-headers/asm-s390/unistd_32.h
+++ b/linux-headers/asm-s390/unistd_32.h
@@ -419,5 +419,6 @@
 #define __NR_landlock_create_ruleset 444
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
+#define __NR_process_mrelease 448
 
 #endif /* _ASM_S390_UNISTD_32_H */
diff --git a/linux-headers/asm-s390/unistd_64.h b/linux-headers/asm-s390/unistd_64.h
index 8dfc08b5e62..021ffc30e66 100644
--- a/linux-headers/asm-s390/unistd_64.h
+++ b/linux-headers/asm-s390/unistd_64.h
@@ -367,5 +367,6 @@
 #define __NR_landlock_create_ruleset 444
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
+#define __NR_process_mrelease 448
 
 #endif /* _ASM_S390_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index a6c327f8ad9..5a776a08f78 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -295,6 +295,7 @@ struct kvm_debug_exit_arch {
 #define KVM_GUESTDBG_USE_HW_BP		0x00020000
 #define KVM_GUESTDBG_INJECT_DB		0x00040000
 #define KVM_GUESTDBG_INJECT_BP		0x00080000
+#define KVM_GUESTDBG_BLOCKIRQ		0x00100000
 
 /* for KVM_SET_GUEST_DEBUG */
 struct kvm_guest_debug_arch {
@@ -503,4 +504,8 @@ struct kvm_pmu_event_filter {
 #define KVM_PMU_EVENT_ALLOW 0
 #define KVM_PMU_EVENT_DENY 1
 
+/* for KVM_{GET,SET,HAS}_DEVICE_ATTR */
+#define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
+#define   KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/linux-headers/asm-x86/unistd_32.h b/linux-headers/asm-x86/unistd_32.h
index 66e96c0c685..9c9ffe312b6 100644
--- a/linux-headers/asm-x86/unistd_32.h
+++ b/linux-headers/asm-x86/unistd_32.h
@@ -437,6 +437,9 @@
 #define __NR_landlock_create_ruleset 444
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
+#define __NR_memfd_secret 447
+#define __NR_process_mrelease 448
+#define __NR_futex_waitv 449
 
 
 #endif /* _ASM_UNISTD_32_H */
diff --git a/linux-headers/asm-x86/unistd_64.h b/linux-headers/asm-x86/unistd_64.h
index b8ff6f14ee8..084f1eef9c5 100644
--- a/linux-headers/asm-x86/unistd_64.h
+++ b/linux-headers/asm-x86/unistd_64.h
@@ -359,6 +359,9 @@
 #define __NR_landlock_create_ruleset 444
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
+#define __NR_memfd_secret 447
+#define __NR_process_mrelease 448
+#define __NR_futex_waitv 449
 
 
 #endif /* _ASM_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/unistd_x32.h b/linux-headers/asm-x86/unistd_x32.h
index 06a1097c15e..a2441affc24 100644
--- a/linux-headers/asm-x86/unistd_x32.h
+++ b/linux-headers/asm-x86/unistd_x32.h
@@ -312,6 +312,9 @@
 #define __NR_landlock_create_ruleset (__X32_SYSCALL_BIT + 444)
 #define __NR_landlock_add_rule (__X32_SYSCALL_BIT + 445)
 #define __NR_landlock_restrict_self (__X32_SYSCALL_BIT + 446)
+#define __NR_memfd_secret (__X32_SYSCALL_BIT + 447)
+#define __NR_process_mrelease (__X32_SYSCALL_BIT + 448)
+#define __NR_futex_waitv (__X32_SYSCALL_BIT + 449)
 #define __NR_rt_sigaction (__X32_SYSCALL_BIT + 512)
 #define __NR_rt_sigreturn (__X32_SYSCALL_BIT + 513)
 #define __NR_ioctl (__X32_SYSCALL_BIT + 514)
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index bcaf66cc4d2..02c5e7b7bbb 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -269,6 +269,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_AP_RESET_HOLD    32
 #define KVM_EXIT_X86_BUS_LOCK     33
 #define KVM_EXIT_XEN              34
+#define KVM_EXIT_RISCV_SBI        35
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -397,13 +398,23 @@ struct kvm_run {
 		 * "ndata" is correct, that new fields are enumerated in "flags",
 		 * and that each flag enumerates fields that are 64-bit aligned
 		 * and sized (so that ndata+internal.data[] is valid/accurate).
+		 *
+		 * Space beyond the defined fields may be used to store arbitrary
+		 * debug information relating to the emulation failure. It is
+		 * accounted for in "ndata" but the format is unspecified and is
+		 * not represented in "flags". Any such information is *not* ABI!
 		 */
 		struct {
 			__u32 suberror;
 			__u32 ndata;
 			__u64 flags;
-			__u8  insn_size;
-			__u8  insn_bytes[15];
+			union {
+				struct {
+					__u8  insn_size;
+					__u8  insn_bytes[15];
+				};
+			};
+			/* Arbitrary debug data may follow. */
 		} emulation_failure;
 		/* KVM_EXIT_OSI */
 		struct {
@@ -469,6 +480,13 @@ struct kvm_run {
 		} msr;
 		/* KVM_EXIT_XEN */
 		struct kvm_xen_exit xen;
+		/* KVM_EXIT_RISCV_SBI */
+		struct {
+			unsigned long extension_id;
+			unsigned long function_id;
+			unsigned long args[6];
+			unsigned long ret[2];
+		} riscv_sbi;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -1223,11 +1241,16 @@ struct kvm_irqfd {
 
 /* Do not use 1, KVM_CHECK_EXTENSION returned it before we had flags.  */
 #define KVM_CLOCK_TSC_STABLE		2
+#define KVM_CLOCK_REALTIME		(1 << 2)
+#define KVM_CLOCK_HOST_TSC		(1 << 3)
 
 struct kvm_clock_data {
 	__u64 clock;
 	__u32 flags;
-	__u32 pad[9];
+	__u32 pad0;
+	__u64 realtime;
+	__u64 host_tsc;
+	__u32 pad[4];
 };
 
 /* For KVM_CAP_SW_TLB */
@@ -1965,7 +1988,9 @@ struct kvm_stats_header {
 #define KVM_STATS_TYPE_CUMULATIVE	(0x0 << KVM_STATS_TYPE_SHIFT)
 #define KVM_STATS_TYPE_INSTANT		(0x1 << KVM_STATS_TYPE_SHIFT)
 #define KVM_STATS_TYPE_PEAK		(0x2 << KVM_STATS_TYPE_SHIFT)
-#define KVM_STATS_TYPE_MAX		KVM_STATS_TYPE_PEAK
+#define KVM_STATS_TYPE_LINEAR_HIST	(0x3 << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_LOG_HIST		(0x4 << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_MAX		KVM_STATS_TYPE_LOG_HIST
 
 #define KVM_STATS_UNIT_SHIFT		4
 #define KVM_STATS_UNIT_MASK		(0xF << KVM_STATS_UNIT_SHIFT)
@@ -1988,8 +2013,9 @@ struct kvm_stats_header {
  * @size: The number of data items for this stats.
  *        Every data item is of type __u64.
  * @offset: The offset of the stats to the start of stat structure in
- *          struture kvm or kvm_vcpu.
- * @unused: Unused field for future usage. Always 0 for now.
+ *          structure kvm or kvm_vcpu.
+ * @bucket_size: A parameter value used for histogram stats. It is only used
+ *		for linear histogram stats, specifying the size of the bucket.
  * @name: The name string for the stats. Its size is indicated by the
  *        &kvm_stats_header->name_size.
  */
@@ -1998,7 +2024,7 @@ struct kvm_stats_desc {
 	__s16 exponent;
 	__u16 size;
 	__u32 offset;
-	__u32 unused;
+	__u32 bucket_size;
 	char name[];
 };
 
-- 
Gitee


From 698a44f94f0cad8fda374641678db9f70e3ce73f Mon Sep 17 00:00:00 2001
From: Vivek Goyal <vgoyal@redhat.com>
Date: Tue, 8 Feb 2022 15:48:05 -0500
Subject: [PATCH 085/407] linux-headers: Update headers to v5.17-rc1

RH-Author: Paul Lai <plai@redhat.com>
RH-MergeRequest: 176: Enable KVM AMX support
RH-Commit: [3/13] 63593c2431eabf02222f37467736b580022b94c8
RH-Bugzilla: 1916415
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Update headers to v5.17-rc1 to pick up the latest FUSE changes.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-3-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit ef17dd6a8e6b6e3aeb29233996d44dfcb736d515)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 include/standard-headers/asm-x86/kvm_para.h   |   1 +
 include/standard-headers/drm/drm_fourcc.h     |  11 ++
 include/standard-headers/linux/ethtool.h      |   1 +
 include/standard-headers/linux/fuse.h         |  60 +++++++-
 include/standard-headers/linux/pci_regs.h     | 142 +++++++++---------
 include/standard-headers/linux/virtio_gpio.h  |  72 +++++++++
 include/standard-headers/linux/virtio_i2c.h   |  47 ++++++
 include/standard-headers/linux/virtio_iommu.h |   8 +-
 .../standard-headers/linux/virtio_pcidev.h    |  65 ++++++++
 include/standard-headers/linux/virtio_scmi.h  |  24 +++
 linux-headers/asm-generic/unistd.h            |   5 +-
 linux-headers/asm-mips/unistd_n32.h           |   2 +
 linux-headers/asm-mips/unistd_n64.h           |   2 +
 linux-headers/asm-mips/unistd_o32.h           |   2 +
 linux-headers/asm-powerpc/unistd_32.h         |   2 +
 linux-headers/asm-powerpc/unistd_64.h         |   2 +
 linux-headers/asm-riscv/bitsperlong.h         |  14 ++
 linux-headers/asm-riscv/mman.h                |   1 +
 linux-headers/asm-riscv/unistd.h              |  44 ++++++
 linux-headers/asm-s390/unistd_32.h            |   2 +
 linux-headers/asm-s390/unistd_64.h            |   2 +
 linux-headers/asm-x86/kvm.h                   |  16 +-
 linux-headers/asm-x86/unistd_32.h             |   1 +
 linux-headers/asm-x86/unistd_64.h             |   1 +
 linux-headers/asm-x86/unistd_x32.h            |   1 +
 linux-headers/linux/kvm.h                     |  17 +++
 26 files changed, 469 insertions(+), 76 deletions(-)
 create mode 100644 include/standard-headers/linux/virtio_gpio.h
 create mode 100644 include/standard-headers/linux/virtio_i2c.h
 create mode 100644 include/standard-headers/linux/virtio_pcidev.h
 create mode 100644 include/standard-headers/linux/virtio_scmi.h
 create mode 100644 linux-headers/asm-riscv/bitsperlong.h
 create mode 100644 linux-headers/asm-riscv/mman.h
 create mode 100644 linux-headers/asm-riscv/unistd.h
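
Note on the FUSE changes: the fuse.h hunk below says that INIT flag bits
32..63 "get shifted down 32 bits into the flags2 field", and it introduces
FUSE_REC_ALIGN for 64-bit record alignment. The following is a minimal sketch
of both, not code from QEMU or the kernel. The macro values are copied from
the hunk; the helper names split_init_flags() and join_init_flags() are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the fuse.h hunk in this patch. */
#define FUSE_INIT_EXT		(1 << 30)
#define FUSE_SECURITY_CTX	(1ULL << 32)
#define FUSE_HAS_INODE_DAX	(1ULL << 33)

/* Align variable length records to 64bit boundary (as in the hunk). */
#define FUSE_REC_ALIGN(x) \
	(((x) + sizeof(uint64_t) - 1) & ~(sizeof(uint64_t) - 1))

/*
 * Hypothetical helper: split a 64-bit set of INIT flags into the two
 * 32-bit wire fields. Bits 0..31 go to "flags", bits 32..63 are shifted
 * down by 32 and go to "flags2".
 */
static void split_init_flags(uint64_t all, uint32_t *flags, uint32_t *flags2)
{
	assert(flags != NULL && flags2 != NULL);
	*flags = (uint32_t)(all & 0xffffffffu);
	*flags2 = (uint32_t)(all >> 32);
}

/* Hypothetical helper: rebuild the 64-bit flag set from both wire fields. */
static uint64_t join_init_flags(uint32_t flags, uint32_t flags2)
{
	return (uint64_t)flags | ((uint64_t)flags2 << 32);
}
```

With these helpers, FUSE_SECURITY_CTX would appear as bit 0 of flags2 and
FUSE_HAS_INODE_DAX as bit 1, while FUSE_INIT_EXT stays in flags.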

diff --git a/include/standard-headers/asm-x86/kvm_para.h b/include/standard-headers/asm-x86/kvm_para.h
index 204cfb8640d..f0235e58a1d 100644
--- a/include/standard-headers/asm-x86/kvm_para.h
+++ b/include/standard-headers/asm-x86/kvm_para.h
@@ -8,6 +8,7 @@
  * should be used to determine that a VM is running under KVM.
  */
 #define KVM_CPUID_SIGNATURE	0x40000000
+#define KVM_SIGNATURE "KVMKVMKVM\0\0\0"
 
 /* This CPUID returns two feature bitmaps in eax, edx. Before enabling
  * a particular paravirtualization, the appropriate feature bit should
diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
index 2c025cb4fe8..4888f85f695 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -313,6 +313,13 @@ extern "C" {
  */
 #define DRM_FORMAT_P016		fourcc_code('P', '0', '1', '6') /* 2x2 subsampled Cr:Cb plane 16 bits per channel */
 
+/* 2 plane YCbCr420.
+ * 3 10 bit components and 2 padding bits packed into 4 bytes.
+ * index 0 = Y plane, [31:0] x:Y2:Y1:Y0 2:10:10:10 little endian
+ * index 1 = Cr:Cb plane, [63:0] x:Cr2:Cb2:Cr1:x:Cb1:Cr0:Cb0 [2:10:10:10:2:10:10:10] little endian
+ */
+#define DRM_FORMAT_P030		fourcc_code('P', '0', '3', '0') /* 2x2 subsampled Cr:Cb plane 10 bits per channel packed */
+
 /* 3 plane non-subsampled (444) YCbCr
  * 16 bits per component, but only 10 bits are used and 6 bits are padded
  * index 0: Y plane, [15:0] Y:x [10:6] little endian
@@ -853,6 +860,10 @@ drm_fourcc_canonicalize_nvidia_format_mod(uint64_t modifier)
  * and UV.  Some SAND-using hardware stores UV in a separate tiled
  * image from Y to reduce the column height, which is not supported
  * with these modifiers.
+ *
+ * The DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT modifier is also
+ * supported for DRM_FORMAT_P030 where the columns remain as 128 bytes
+ * wide, but as this is a 10 bpp format that translates to 96 pixels.
  */
 
 #define DRM_FORMAT_MOD_BROADCOM_SAND32_COL_HEIGHT(v) \
diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
index 688eb8dc39c..38d5a4cd6ed 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -231,6 +231,7 @@ enum tunable_id {
 	ETHTOOL_RX_COPYBREAK,
 	ETHTOOL_TX_COPYBREAK,
 	ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */
+	ETHTOOL_TX_COPYBREAK_BUF_SIZE,
 	/*
 	 * Add your fresh new tunable attribute above and remember to update
 	 * tunable_strings[] in net/ethtool/common.c
diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
index 23ea31708bb..bda06258be5 100644
--- a/include/standard-headers/linux/fuse.h
+++ b/include/standard-headers/linux/fuse.h
@@ -184,6 +184,16 @@
  *
  *  7.34
  *  - add FUSE_SYNCFS
+ *
+ *  7.35
+ *  - add FOPEN_NOFLUSH
+ *
+ *  7.36
+ *  - extend fuse_init_in with reserved fields, add FUSE_INIT_EXT init flag
+ *  - add flags2 to fuse_init_in and fuse_init_out
+ *  - add FUSE_SECURITY_CTX init flag
+ *  - add security context to create, mkdir, symlink, and mknod requests
+ *  - add FUSE_HAS_INODE_DAX, FUSE_ATTR_DAX
  */
 
 #ifndef _LINUX_FUSE_H
@@ -215,7 +225,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 34
+#define FUSE_KERNEL_MINOR_VERSION 36
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -286,12 +296,14 @@ struct fuse_file_lock {
  * FOPEN_NONSEEKABLE: the file is not seekable
  * FOPEN_CACHE_DIR: allow caching this directory
  * FOPEN_STREAM: the file is stream-like (no file position at all)
+ * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
  */
 #define FOPEN_DIRECT_IO		(1 << 0)
 #define FOPEN_KEEP_CACHE	(1 << 1)
 #define FOPEN_NONSEEKABLE	(1 << 2)
 #define FOPEN_CACHE_DIR		(1 << 3)
 #define FOPEN_STREAM		(1 << 4)
+#define FOPEN_NOFLUSH		(1 << 5)
 
 /**
  * INIT request/reply flags
@@ -332,6 +344,11 @@ struct fuse_file_lock {
  *			write/truncate sgid is killed only if file has group
  *			execute permission. (Same as Linux VFS behavior).
  * FUSE_SETXATTR_EXT:	Server supports extended struct fuse_setxattr_in
+ * FUSE_INIT_EXT: extended fuse_init_in request
+ * FUSE_INIT_RESERVED: reserved, do not use
+ * FUSE_SECURITY_CTX:	add security context to create, mkdir, symlink, and
+ *			mknod
+ * FUSE_HAS_INODE_DAX:  use per inode DAX
  */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
@@ -363,6 +380,11 @@ struct fuse_file_lock {
 #define FUSE_SUBMOUNTS		(1 << 27)
 #define FUSE_HANDLE_KILLPRIV_V2	(1 << 28)
 #define FUSE_SETXATTR_EXT	(1 << 29)
+#define FUSE_INIT_EXT		(1 << 30)
+#define FUSE_INIT_RESERVED	(1 << 31)
+/* bits 32..63 get shifted down 32 bits into the flags2 field */
+#define FUSE_SECURITY_CTX	(1ULL << 32)
+#define FUSE_HAS_INODE_DAX	(1ULL << 33)
 
 /**
  * CUSE INIT request/reply flags
@@ -445,8 +467,10 @@ struct fuse_file_lock {
  * fuse_attr flags
  *
  * FUSE_ATTR_SUBMOUNT: Object is a submount root
+ * FUSE_ATTR_DAX: Enable DAX for this file in per inode DAX mode
  */
 #define FUSE_ATTR_SUBMOUNT      (1 << 0)
+#define FUSE_ATTR_DAX		(1 << 1)
 
 /**
  * Open flags
@@ -732,6 +756,8 @@ struct fuse_init_in {
 	uint32_t	minor;
 	uint32_t	max_readahead;
 	uint32_t	flags;
+	uint32_t	flags2;
+	uint32_t	unused[11];
 };
 
 #define FUSE_COMPAT_INIT_OUT_SIZE 8
@@ -748,7 +774,8 @@ struct fuse_init_out {
 	uint32_t	time_gran;
 	uint16_t	max_pages;
 	uint16_t	map_alignment;
-	uint32_t	unused[8];
+	uint32_t	flags2;
+	uint32_t	unused[7];
 };
 
 #define CUSE_INIT_INFO_MAX 4096
@@ -856,9 +883,12 @@ struct fuse_dirent {
 	char name[];
 };
 
-#define FUSE_NAME_OFFSET offsetof(struct fuse_dirent, name)
-#define FUSE_DIRENT_ALIGN(x) \
+/* Align variable length records to 64bit boundary */
+#define FUSE_REC_ALIGN(x) \
 	(((x) + sizeof(uint64_t) - 1) & ~(sizeof(uint64_t) - 1))
+
+#define FUSE_NAME_OFFSET offsetof(struct fuse_dirent, name)
+#define FUSE_DIRENT_ALIGN(x) FUSE_REC_ALIGN(x)
 #define FUSE_DIRENT_SIZE(d) \
 	FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET + (d)->namelen)
 
@@ -975,4 +1005,26 @@ struct fuse_syncfs_in {
 	uint64_t	padding;
 };
 
+/*
+ * For each security context, send a fuse_secctx holding the size of that
+ * security context. fuse_secctx is followed by the security context name,
+ * which in turn is followed by the actual context label:
+ * fuse_secctx, name, context
+ */
+struct fuse_secctx {
+	uint32_t	size;
+	uint32_t	padding;
+};
+
+/*
+ * Contains the information about how many fuse_secctx structures are being
+ * sent and what's the total size of all security contexts (including
+ * size of fuse_secctx_header).
+ *
+ */
+struct fuse_secctx_header {
+	uint32_t	size;
+	uint32_t	nr_secctx;
+};
+
 #endif /* _LINUX_FUSE_H */
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index ff6ccbc6efe..bee1a9ed6e6 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -301,23 +301,23 @@
 #define  PCI_SID_ESR_FIC	0x20	/* First In Chassis Flag */
 #define PCI_SID_CHASSIS_NR	3	/* Chassis Number */
 
-/* Message Signalled Interrupt registers */
+/* Message Signaled Interrupt registers */
 
-#define PCI_MSI_FLAGS		2	/* Message Control */
+#define PCI_MSI_FLAGS		0x02	/* Message Control */
 #define  PCI_MSI_FLAGS_ENABLE	0x0001	/* MSI feature enabled */
 #define  PCI_MSI_FLAGS_QMASK	0x000e	/* Maximum queue size available */
 #define  PCI_MSI_FLAGS_QSIZE	0x0070	/* Message queue size configured */
 #define  PCI_MSI_FLAGS_64BIT	0x0080	/* 64-bit addresses allowed */
 #define  PCI_MSI_FLAGS_MASKBIT	0x0100	/* Per-vector masking capable */
 #define PCI_MSI_RFU		3	/* Rest of capability flags */
-#define PCI_MSI_ADDRESS_LO	4	/* Lower 32 bits */
-#define PCI_MSI_ADDRESS_HI	8	/* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */
-#define PCI_MSI_DATA_32		8	/* 16 bits of data for 32-bit devices */
-#define PCI_MSI_MASK_32		12	/* Mask bits register for 32-bit devices */
-#define PCI_MSI_PENDING_32	16	/* Pending intrs for 32-bit devices */
-#define PCI_MSI_DATA_64		12	/* 16 bits of data for 64-bit devices */
-#define PCI_MSI_MASK_64		16	/* Mask bits register for 64-bit devices */
-#define PCI_MSI_PENDING_64	20	/* Pending intrs for 64-bit devices */
+#define PCI_MSI_ADDRESS_LO	0x04	/* Lower 32 bits */
+#define PCI_MSI_ADDRESS_HI	0x08	/* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */
+#define PCI_MSI_DATA_32		0x08	/* 16 bits of data for 32-bit devices */
+#define PCI_MSI_MASK_32		0x0c	/* Mask bits register for 32-bit devices */
+#define PCI_MSI_PENDING_32	0x10	/* Pending intrs for 32-bit devices */
+#define PCI_MSI_DATA_64		0x0c	/* 16 bits of data for 64-bit devices */
+#define PCI_MSI_MASK_64		0x10	/* Mask bits register for 64-bit devices */
+#define PCI_MSI_PENDING_64	0x14	/* Pending intrs for 64-bit devices */
 
 /* MSI-X registers (in MSI-X capability) */
 #define PCI_MSIX_FLAGS		2	/* Message Control */
@@ -335,10 +335,10 @@
 
 /* MSI-X Table entry format (in memory mapped by a BAR) */
 #define PCI_MSIX_ENTRY_SIZE		16
-#define PCI_MSIX_ENTRY_LOWER_ADDR	0  /* Message Address */
-#define PCI_MSIX_ENTRY_UPPER_ADDR	4  /* Message Upper Address */
-#define PCI_MSIX_ENTRY_DATA		8  /* Message Data */
-#define PCI_MSIX_ENTRY_VECTOR_CTRL	12 /* Vector Control */
+#define PCI_MSIX_ENTRY_LOWER_ADDR	0x0  /* Message Address */
+#define PCI_MSIX_ENTRY_UPPER_ADDR	0x4  /* Message Upper Address */
+#define PCI_MSIX_ENTRY_DATA		0x8  /* Message Data */
+#define PCI_MSIX_ENTRY_VECTOR_CTRL	0xc  /* Vector Control */
 #define  PCI_MSIX_ENTRY_CTRL_MASKBIT	0x00000001
 
 /* CompactPCI Hotswap Register */
@@ -470,7 +470,7 @@
 
 /* PCI Express capability registers */
 
-#define PCI_EXP_FLAGS		2	/* Capabilities register */
+#define PCI_EXP_FLAGS		0x02	/* Capabilities register */
 #define  PCI_EXP_FLAGS_VERS	0x000f	/* Capability version */
 #define  PCI_EXP_FLAGS_TYPE	0x00f0	/* Device/Port type */
 #define   PCI_EXP_TYPE_ENDPOINT	   0x0	/* Express Endpoint */
@@ -484,7 +484,7 @@
 #define   PCI_EXP_TYPE_RC_EC	   0xa	/* Root Complex Event Collector */
 #define  PCI_EXP_FLAGS_SLOT	0x0100	/* Slot implemented */
 #define  PCI_EXP_FLAGS_IRQ	0x3e00	/* Interrupt message number */
-#define PCI_EXP_DEVCAP		4	/* Device capabilities */
+#define PCI_EXP_DEVCAP		0x04	/* Device capabilities */
 #define  PCI_EXP_DEVCAP_PAYLOAD	0x00000007 /* Max_Payload_Size */
 #define  PCI_EXP_DEVCAP_PHANTOM	0x00000018 /* Phantom functions */
 #define  PCI_EXP_DEVCAP_EXT_TAG	0x00000020 /* Extended tags */
@@ -497,7 +497,7 @@
 #define  PCI_EXP_DEVCAP_PWR_VAL	0x03fc0000 /* Slot Power Limit Value */
 #define  PCI_EXP_DEVCAP_PWR_SCL	0x0c000000 /* Slot Power Limit Scale */
 #define  PCI_EXP_DEVCAP_FLR     0x10000000 /* Function Level Reset */
-#define PCI_EXP_DEVCTL		8	/* Device Control */
+#define PCI_EXP_DEVCTL		0x08	/* Device Control */
 #define  PCI_EXP_DEVCTL_CERE	0x0001	/* Correctable Error Reporting En. */
 #define  PCI_EXP_DEVCTL_NFERE	0x0002	/* Non-Fatal Error Reporting Enable */
 #define  PCI_EXP_DEVCTL_FERE	0x0004	/* Fatal Error Reporting Enable */
@@ -522,7 +522,7 @@
 #define  PCI_EXP_DEVCTL_READRQ_2048B 0x4000 /* 2048 Bytes */
 #define  PCI_EXP_DEVCTL_READRQ_4096B 0x5000 /* 4096 Bytes */
 #define  PCI_EXP_DEVCTL_BCR_FLR 0x8000  /* Bridge Configuration Retry / FLR */
-#define PCI_EXP_DEVSTA		10	/* Device Status */
+#define PCI_EXP_DEVSTA		0x0a	/* Device Status */
 #define  PCI_EXP_DEVSTA_CED	0x0001	/* Correctable Error Detected */
 #define  PCI_EXP_DEVSTA_NFED	0x0002	/* Non-Fatal Error Detected */
 #define  PCI_EXP_DEVSTA_FED	0x0004	/* Fatal Error Detected */
@@ -530,7 +530,7 @@
 #define  PCI_EXP_DEVSTA_AUXPD	0x0010	/* AUX Power Detected */
 #define  PCI_EXP_DEVSTA_TRPND	0x0020	/* Transactions Pending */
 #define PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1	12	/* v1 endpoints without link end here */
-#define PCI_EXP_LNKCAP		12	/* Link Capabilities */
+#define PCI_EXP_LNKCAP		0x0c	/* Link Capabilities */
 #define  PCI_EXP_LNKCAP_SLS	0x0000000f /* Supported Link Speeds */
 #define  PCI_EXP_LNKCAP_SLS_2_5GB 0x00000001 /* LNKCAP2 SLS Vector bit 0 */
 #define  PCI_EXP_LNKCAP_SLS_5_0GB 0x00000002 /* LNKCAP2 SLS Vector bit 1 */
@@ -549,7 +549,7 @@
 #define  PCI_EXP_LNKCAP_DLLLARC	0x00100000 /* Data Link Layer Link Active Reporting Capable */
 #define  PCI_EXP_LNKCAP_LBNC	0x00200000 /* Link Bandwidth Notification Capability */
 #define  PCI_EXP_LNKCAP_PN	0xff000000 /* Port Number */
-#define PCI_EXP_LNKCTL		16	/* Link Control */
+#define PCI_EXP_LNKCTL		0x10	/* Link Control */
 #define  PCI_EXP_LNKCTL_ASPMC	0x0003	/* ASPM Control */
 #define  PCI_EXP_LNKCTL_ASPM_L0S 0x0001	/* L0s Enable */
 #define  PCI_EXP_LNKCTL_ASPM_L1  0x0002	/* L1 Enable */
@@ -562,7 +562,7 @@
 #define  PCI_EXP_LNKCTL_HAWD	0x0200	/* Hardware Autonomous Width Disable */
 #define  PCI_EXP_LNKCTL_LBMIE	0x0400	/* Link Bandwidth Management Interrupt Enable */
 #define  PCI_EXP_LNKCTL_LABIE	0x0800	/* Link Autonomous Bandwidth Interrupt Enable */
-#define PCI_EXP_LNKSTA		18	/* Link Status */
+#define PCI_EXP_LNKSTA		0x12	/* Link Status */
 #define  PCI_EXP_LNKSTA_CLS	0x000f	/* Current Link Speed */
 #define  PCI_EXP_LNKSTA_CLS_2_5GB 0x0001 /* Current Link Speed 2.5GT/s */
 #define  PCI_EXP_LNKSTA_CLS_5_0GB 0x0002 /* Current Link Speed 5.0GT/s */
@@ -582,7 +582,7 @@
 #define  PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */
 #define  PCI_EXP_LNKSTA_LABS	0x8000	/* Link Autonomous Bandwidth Status */
 #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V1	20	/* v1 endpoints with link end here */
-#define PCI_EXP_SLTCAP		20	/* Slot Capabilities */
+#define PCI_EXP_SLTCAP		0x14	/* Slot Capabilities */
 #define  PCI_EXP_SLTCAP_ABP	0x00000001 /* Attention Button Present */
 #define  PCI_EXP_SLTCAP_PCP	0x00000002 /* Power Controller Present */
 #define  PCI_EXP_SLTCAP_MRLSP	0x00000004 /* MRL Sensor Present */
@@ -595,7 +595,7 @@
 #define  PCI_EXP_SLTCAP_EIP	0x00020000 /* Electromechanical Interlock Present */
 #define  PCI_EXP_SLTCAP_NCCS	0x00040000 /* No Command Completed Support */
 #define  PCI_EXP_SLTCAP_PSN	0xfff80000 /* Physical Slot Number */
-#define PCI_EXP_SLTCTL		24	/* Slot Control */
+#define PCI_EXP_SLTCTL		0x18	/* Slot Control */
 #define  PCI_EXP_SLTCTL_ABPE	0x0001	/* Attention Button Pressed Enable */
 #define  PCI_EXP_SLTCTL_PFDE	0x0002	/* Power Fault Detected Enable */
 #define  PCI_EXP_SLTCTL_MRLSCE	0x0004	/* MRL Sensor Changed Enable */
@@ -617,7 +617,7 @@
 #define  PCI_EXP_SLTCTL_EIC	0x0800	/* Electromechanical Interlock Control */
 #define  PCI_EXP_SLTCTL_DLLSCE	0x1000	/* Data Link Layer State Changed Enable */
 #define  PCI_EXP_SLTCTL_IBPD_DISABLE	0x4000 /* In-band PD disable */
-#define PCI_EXP_SLTSTA		26	/* Slot Status */
+#define PCI_EXP_SLTSTA		0x1a	/* Slot Status */
 #define  PCI_EXP_SLTSTA_ABP	0x0001	/* Attention Button Pressed */
 #define  PCI_EXP_SLTSTA_PFD	0x0002	/* Power Fault Detected */
 #define  PCI_EXP_SLTSTA_MRLSC	0x0004	/* MRL Sensor Changed */
@@ -627,15 +627,15 @@
 #define  PCI_EXP_SLTSTA_PDS	0x0040	/* Presence Detect State */
 #define  PCI_EXP_SLTSTA_EIS	0x0080	/* Electromechanical Interlock Status */
 #define  PCI_EXP_SLTSTA_DLLSC	0x0100	/* Data Link Layer State Changed */
-#define PCI_EXP_RTCTL		28	/* Root Control */
+#define PCI_EXP_RTCTL		0x1c	/* Root Control */
 #define  PCI_EXP_RTCTL_SECEE	0x0001	/* System Error on Correctable Error */
 #define  PCI_EXP_RTCTL_SENFEE	0x0002	/* System Error on Non-Fatal Error */
 #define  PCI_EXP_RTCTL_SEFEE	0x0004	/* System Error on Fatal Error */
 #define  PCI_EXP_RTCTL_PMEIE	0x0008	/* PME Interrupt Enable */
 #define  PCI_EXP_RTCTL_CRSSVE	0x0010	/* CRS Software Visibility Enable */
-#define PCI_EXP_RTCAP		30	/* Root Capabilities */
+#define PCI_EXP_RTCAP		0x1e	/* Root Capabilities */
 #define  PCI_EXP_RTCAP_CRSVIS	0x0001	/* CRS Software Visibility capability */
-#define PCI_EXP_RTSTA		32	/* Root Status */
+#define PCI_EXP_RTSTA		0x20	/* Root Status */
 #define  PCI_EXP_RTSTA_PME	0x00010000 /* PME status */
 #define  PCI_EXP_RTSTA_PENDING	0x00020000 /* PME pending */
 /*
@@ -646,7 +646,7 @@
  * Use pcie_capability_read_word() and similar interfaces to use them
  * safely.
  */
-#define PCI_EXP_DEVCAP2		36	/* Device Capabilities 2 */
+#define PCI_EXP_DEVCAP2		0x24	/* Device Capabilities 2 */
 #define  PCI_EXP_DEVCAP2_COMP_TMOUT_DIS	0x00000010 /* Completion Timeout Disable supported */
 #define  PCI_EXP_DEVCAP2_ARI		0x00000020 /* Alternative Routing-ID */
 #define  PCI_EXP_DEVCAP2_ATOMIC_ROUTE	0x00000040 /* Atomic Op routing */
@@ -658,7 +658,7 @@
 #define  PCI_EXP_DEVCAP2_OBFF_MSG	0x00040000 /* New message signaling */
 #define  PCI_EXP_DEVCAP2_OBFF_WAKE	0x00080000 /* Re-use WAKE# for OBFF */
 #define  PCI_EXP_DEVCAP2_EE_PREFIX	0x00200000 /* End-End TLP Prefix */
-#define PCI_EXP_DEVCTL2		40	/* Device Control 2 */
+#define PCI_EXP_DEVCTL2		0x28	/* Device Control 2 */
 #define  PCI_EXP_DEVCTL2_COMP_TIMEOUT	0x000f	/* Completion Timeout Value */
 #define  PCI_EXP_DEVCTL2_COMP_TMOUT_DIS	0x0010	/* Completion Timeout Disable */
 #define  PCI_EXP_DEVCTL2_ARI		0x0020	/* Alternative Routing-ID */
@@ -670,9 +670,9 @@
 #define  PCI_EXP_DEVCTL2_OBFF_MSGA_EN	0x2000	/* Enable OBFF Message type A */
 #define  PCI_EXP_DEVCTL2_OBFF_MSGB_EN	0x4000	/* Enable OBFF Message type B */
 #define  PCI_EXP_DEVCTL2_OBFF_WAKE_EN	0x6000	/* OBFF using WAKE# signaling */
-#define PCI_EXP_DEVSTA2		42	/* Device Status 2 */
-#define PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V2	44	/* v2 endpoints without link end here */
-#define PCI_EXP_LNKCAP2		44	/* Link Capabilities 2 */
+#define PCI_EXP_DEVSTA2		0x2a	/* Device Status 2 */
+#define PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V2 0x2c	/* end of v2 EPs w/o link */
+#define PCI_EXP_LNKCAP2		0x2c	/* Link Capabilities 2 */
 #define  PCI_EXP_LNKCAP2_SLS_2_5GB	0x00000002 /* Supported Speed 2.5GT/s */
 #define  PCI_EXP_LNKCAP2_SLS_5_0GB	0x00000004 /* Supported Speed 5GT/s */
 #define  PCI_EXP_LNKCAP2_SLS_8_0GB	0x00000008 /* Supported Speed 8GT/s */
@@ -680,7 +680,7 @@
 #define  PCI_EXP_LNKCAP2_SLS_32_0GB	0x00000020 /* Supported Speed 32GT/s */
 #define  PCI_EXP_LNKCAP2_SLS_64_0GB	0x00000040 /* Supported Speed 64GT/s */
 #define  PCI_EXP_LNKCAP2_CROSSLINK	0x00000100 /* Crosslink supported */
-#define PCI_EXP_LNKCTL2		48	/* Link Control 2 */
+#define PCI_EXP_LNKCTL2		0x30	/* Link Control 2 */
 #define  PCI_EXP_LNKCTL2_TLS		0x000f
 #define  PCI_EXP_LNKCTL2_TLS_2_5GT	0x0001 /* Supported Speed 2.5GT/s */
 #define  PCI_EXP_LNKCTL2_TLS_5_0GT	0x0002 /* Supported Speed 5GT/s */
@@ -691,12 +691,12 @@
 #define  PCI_EXP_LNKCTL2_ENTER_COMP	0x0010 /* Enter Compliance */
 #define  PCI_EXP_LNKCTL2_TX_MARGIN	0x0380 /* Transmit Margin */
 #define  PCI_EXP_LNKCTL2_HASD		0x0020 /* HW Autonomous Speed Disable */
-#define PCI_EXP_LNKSTA2		50	/* Link Status 2 */
-#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	52	/* v2 endpoints with link end here */
-#define PCI_EXP_SLTCAP2		52	/* Slot Capabilities 2 */
+#define PCI_EXP_LNKSTA2		0x32	/* Link Status 2 */
+#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	0x34	/* end of v2 EPs w/ link */
+#define PCI_EXP_SLTCAP2		0x34	/* Slot Capabilities 2 */
 #define  PCI_EXP_SLTCAP2_IBPD	0x00000001 /* In-band PD Disable Supported */
-#define PCI_EXP_SLTCTL2		56	/* Slot Control 2 */
-#define PCI_EXP_SLTSTA2		58	/* Slot Status 2 */
+#define PCI_EXP_SLTCTL2		0x38	/* Slot Control 2 */
+#define PCI_EXP_SLTSTA2		0x3a	/* Slot Status 2 */
 
 /* Extended Capabilities (PCI-X 2.0 and Express) */
 #define PCI_EXT_CAP_ID(header)		(header & 0x0000ffff)
@@ -742,7 +742,7 @@
 #define PCI_EXT_CAP_MCAST_ENDPOINT_SIZEOF 40
 
 /* Advanced Error Reporting */
-#define PCI_ERR_UNCOR_STATUS	4	/* Uncorrectable Error Status */
+#define PCI_ERR_UNCOR_STATUS	0x04	/* Uncorrectable Error Status */
 #define  PCI_ERR_UNC_UND	0x00000001	/* Undefined */
 #define  PCI_ERR_UNC_DLP	0x00000010	/* Data Link Protocol */
 #define  PCI_ERR_UNC_SURPDN	0x00000020	/* Surprise Down */
@@ -760,11 +760,11 @@
 #define  PCI_ERR_UNC_MCBTLP	0x00800000	/* MC blocked TLP */
 #define  PCI_ERR_UNC_ATOMEG	0x01000000	/* Atomic egress blocked */
 #define  PCI_ERR_UNC_TLPPRE	0x02000000	/* TLP prefix blocked */
-#define PCI_ERR_UNCOR_MASK	8	/* Uncorrectable Error Mask */
+#define PCI_ERR_UNCOR_MASK	0x08	/* Uncorrectable Error Mask */
 	/* Same bits as above */
-#define PCI_ERR_UNCOR_SEVER	12	/* Uncorrectable Error Severity */
+#define PCI_ERR_UNCOR_SEVER	0x0c	/* Uncorrectable Error Severity */
 	/* Same bits as above */
-#define PCI_ERR_COR_STATUS	16	/* Correctable Error Status */
+#define PCI_ERR_COR_STATUS	0x10	/* Correctable Error Status */
 #define  PCI_ERR_COR_RCVR	0x00000001	/* Receiver Error Status */
 #define  PCI_ERR_COR_BAD_TLP	0x00000040	/* Bad TLP Status */
 #define  PCI_ERR_COR_BAD_DLLP	0x00000080	/* Bad DLLP Status */
@@ -773,20 +773,20 @@
 #define  PCI_ERR_COR_ADV_NFAT	0x00002000	/* Advisory Non-Fatal */
 #define  PCI_ERR_COR_INTERNAL	0x00004000	/* Corrected Internal */
 #define  PCI_ERR_COR_LOG_OVER	0x00008000	/* Header Log Overflow */
-#define PCI_ERR_COR_MASK	20	/* Correctable Error Mask */
+#define PCI_ERR_COR_MASK	0x14	/* Correctable Error Mask */
 	/* Same bits as above */
-#define PCI_ERR_CAP		24	/* Advanced Error Capabilities */
-#define  PCI_ERR_CAP_FEP(x)	((x) & 31)	/* First Error Pointer */
+#define PCI_ERR_CAP		0x18	/* Advanced Error Capabilities & Ctrl*/
+#define  PCI_ERR_CAP_FEP(x)	((x) & 0x1f)	/* First Error Pointer */
 #define  PCI_ERR_CAP_ECRC_GENC	0x00000020	/* ECRC Generation Capable */
 #define  PCI_ERR_CAP_ECRC_GENE	0x00000040	/* ECRC Generation Enable */
 #define  PCI_ERR_CAP_ECRC_CHKC	0x00000080	/* ECRC Check Capable */
 #define  PCI_ERR_CAP_ECRC_CHKE	0x00000100	/* ECRC Check Enable */
-#define PCI_ERR_HEADER_LOG	28	/* Header Log Register (16 bytes) */
-#define PCI_ERR_ROOT_COMMAND	44	/* Root Error Command */
+#define PCI_ERR_HEADER_LOG	0x1c	/* Header Log Register (16 bytes) */
+#define PCI_ERR_ROOT_COMMAND	0x2c	/* Root Error Command */
 #define  PCI_ERR_ROOT_CMD_COR_EN	0x00000001 /* Correctable Err Reporting Enable */
 #define  PCI_ERR_ROOT_CMD_NONFATAL_EN	0x00000002 /* Non-Fatal Err Reporting Enable */
 #define  PCI_ERR_ROOT_CMD_FATAL_EN	0x00000004 /* Fatal Err Reporting Enable */
-#define PCI_ERR_ROOT_STATUS	48
+#define PCI_ERR_ROOT_STATUS	0x30
 #define  PCI_ERR_ROOT_COR_RCV		0x00000001 /* ERR_COR Received */
 #define  PCI_ERR_ROOT_MULTI_COR_RCV	0x00000002 /* Multiple ERR_COR */
 #define  PCI_ERR_ROOT_UNCOR_RCV		0x00000004 /* ERR_FATAL/NONFATAL */
@@ -795,52 +795,52 @@
 #define  PCI_ERR_ROOT_NONFATAL_RCV	0x00000020 /* Non-Fatal Received */
 #define  PCI_ERR_ROOT_FATAL_RCV		0x00000040 /* Fatal Received */
 #define  PCI_ERR_ROOT_AER_IRQ		0xf8000000 /* Advanced Error Interrupt Message Number */
-#define PCI_ERR_ROOT_ERR_SRC	52	/* Error Source Identification */
+#define PCI_ERR_ROOT_ERR_SRC	0x34	/* Error Source Identification */
 
 /* Virtual Channel */
-#define PCI_VC_PORT_CAP1	4
+#define PCI_VC_PORT_CAP1	0x04
 #define  PCI_VC_CAP1_EVCC	0x00000007	/* extended VC count */
 #define  PCI_VC_CAP1_LPEVCC	0x00000070	/* low prio extended VC count */
 #define  PCI_VC_CAP1_ARB_SIZE	0x00000c00
-#define PCI_VC_PORT_CAP2	8
+#define PCI_VC_PORT_CAP2	0x08
 #define  PCI_VC_CAP2_32_PHASE		0x00000002
 #define  PCI_VC_CAP2_64_PHASE		0x00000004
 #define  PCI_VC_CAP2_128_PHASE		0x00000008
 #define  PCI_VC_CAP2_ARB_OFF		0xff000000
-#define PCI_VC_PORT_CTRL	12
+#define PCI_VC_PORT_CTRL	0x0c
 #define  PCI_VC_PORT_CTRL_LOAD_TABLE	0x00000001
-#define PCI_VC_PORT_STATUS	14
+#define PCI_VC_PORT_STATUS	0x0e
 #define  PCI_VC_PORT_STATUS_TABLE	0x00000001
-#define PCI_VC_RES_CAP		16
+#define PCI_VC_RES_CAP		0x10
 #define  PCI_VC_RES_CAP_32_PHASE	0x00000002
 #define  PCI_VC_RES_CAP_64_PHASE	0x00000004
 #define  PCI_VC_RES_CAP_128_PHASE	0x00000008
 #define  PCI_VC_RES_CAP_128_PHASE_TB	0x00000010
 #define  PCI_VC_RES_CAP_256_PHASE	0x00000020
 #define  PCI_VC_RES_CAP_ARB_OFF		0xff000000
-#define PCI_VC_RES_CTRL		20
+#define PCI_VC_RES_CTRL		0x14
 #define  PCI_VC_RES_CTRL_LOAD_TABLE	0x00010000
 #define  PCI_VC_RES_CTRL_ARB_SELECT	0x000e0000
 #define  PCI_VC_RES_CTRL_ID		0x07000000
 #define  PCI_VC_RES_CTRL_ENABLE		0x80000000
-#define PCI_VC_RES_STATUS	26
+#define PCI_VC_RES_STATUS	0x1a
 #define  PCI_VC_RES_STATUS_TABLE	0x00000001
 #define  PCI_VC_RES_STATUS_NEGO		0x00000002
 #define PCI_CAP_VC_BASE_SIZEOF		0x10
-#define PCI_CAP_VC_PER_VC_SIZEOF	0x0C
+#define PCI_CAP_VC_PER_VC_SIZEOF	0x0c
 
 /* Power Budgeting */
-#define PCI_PWR_DSR		4	/* Data Select Register */
-#define PCI_PWR_DATA		8	/* Data Register */
+#define PCI_PWR_DSR		0x04	/* Data Select Register */
+#define PCI_PWR_DATA		0x08	/* Data Register */
 #define  PCI_PWR_DATA_BASE(x)	((x) & 0xff)	    /* Base Power */
 #define  PCI_PWR_DATA_SCALE(x)	(((x) >> 8) & 3)    /* Data Scale */
 #define  PCI_PWR_DATA_PM_SUB(x)	(((x) >> 10) & 7)   /* PM Sub State */
 #define  PCI_PWR_DATA_PM_STATE(x) (((x) >> 13) & 3) /* PM State */
 #define  PCI_PWR_DATA_TYPE(x)	(((x) >> 15) & 7)   /* Type */
 #define  PCI_PWR_DATA_RAIL(x)	(((x) >> 18) & 7)   /* Power Rail */
-#define PCI_PWR_CAP		12	/* Capability */
+#define PCI_PWR_CAP		0x0c	/* Capability */
 #define  PCI_PWR_CAP_BUDGET(x)	((x) & 1)	/* Included in system budget */
-#define PCI_EXT_CAP_PWR_SIZEOF	16
+#define PCI_EXT_CAP_PWR_SIZEOF	0x10
 
 /* Root Complex Event Collector Endpoint Association  */
 #define PCI_RCEC_RCIEP_BITMAP	4	/* Associated Bitmap for RCiEPs */
@@ -964,7 +964,7 @@
 #define  PCI_SRIOV_VFM_MI	0x1	/* Dormant.MigrateIn */
 #define  PCI_SRIOV_VFM_MO	0x2	/* Active.MigrateOut */
 #define  PCI_SRIOV_VFM_AV	0x3	/* Active.Available */
-#define PCI_EXT_CAP_SRIOV_SIZEOF 64
+#define PCI_EXT_CAP_SRIOV_SIZEOF 0x40
 
 #define PCI_LTR_MAX_SNOOP_LAT	0x4
 #define PCI_LTR_MAX_NOSNOOP_LAT	0x6
@@ -1017,12 +1017,12 @@
 #define   PCI_TPH_LOC_NONE	0x000	/* no location */
 #define   PCI_TPH_LOC_CAP	0x200	/* in capability */
 #define   PCI_TPH_LOC_MSIX	0x400	/* in MSI-X */
-#define PCI_TPH_CAP_ST_MASK	0x07FF0000	/* st table mask */
-#define PCI_TPH_CAP_ST_SHIFT	16	/* st table shift */
-#define PCI_TPH_BASE_SIZEOF	12	/* size with no st table */
+#define PCI_TPH_CAP_ST_MASK	0x07FF0000	/* ST table mask */
+#define PCI_TPH_CAP_ST_SHIFT	16	/* ST table shift */
+#define PCI_TPH_BASE_SIZEOF	0xc	/* size with no ST table */
 
 /* Downstream Port Containment */
-#define PCI_EXP_DPC_CAP			4	/* DPC Capability */
+#define PCI_EXP_DPC_CAP			0x04	/* DPC Capability */
 #define PCI_EXP_DPC_IRQ			0x001F	/* Interrupt Message Number */
 #define  PCI_EXP_DPC_CAP_RP_EXT		0x0020	/* Root Port Extensions */
 #define  PCI_EXP_DPC_CAP_POISONED_TLP	0x0040	/* Poisoned TLP Egress Blocking Supported */
@@ -1030,19 +1030,19 @@
 #define  PCI_EXP_DPC_RP_PIO_LOG_SIZE	0x0F00	/* RP PIO Log Size */
 #define  PCI_EXP_DPC_CAP_DL_ACTIVE	0x1000	/* ERR_COR signal on DL_Active supported */
 
-#define PCI_EXP_DPC_CTL			6	/* DPC control */
+#define PCI_EXP_DPC_CTL			0x06	/* DPC control */
 #define  PCI_EXP_DPC_CTL_EN_FATAL	0x0001	/* Enable trigger on ERR_FATAL message */
 #define  PCI_EXP_DPC_CTL_EN_NONFATAL	0x0002	/* Enable trigger on ERR_NONFATAL message */
 #define  PCI_EXP_DPC_CTL_INT_EN		0x0008	/* DPC Interrupt Enable */
 
-#define PCI_EXP_DPC_STATUS		8	/* DPC Status */
+#define PCI_EXP_DPC_STATUS		0x08	/* DPC Status */
 #define  PCI_EXP_DPC_STATUS_TRIGGER	    0x0001 /* Trigger Status */
 #define  PCI_EXP_DPC_STATUS_TRIGGER_RSN	    0x0006 /* Trigger Reason */
 #define  PCI_EXP_DPC_STATUS_INTERRUPT	    0x0008 /* Interrupt Status */
 #define  PCI_EXP_DPC_RP_BUSY		    0x0010 /* Root Port Busy */
 #define  PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT 0x0060 /* Trig Reason Extension */
 
-#define PCI_EXP_DPC_SOURCE_ID		10	/* DPC Source Identifier */
+#define PCI_EXP_DPC_SOURCE_ID		 0x0A	/* DPC Source Identifier */
 
 #define PCI_EXP_DPC_RP_PIO_STATUS	 0x0C	/* RP PIO Status */
 #define PCI_EXP_DPC_RP_PIO_MASK		 0x10	/* RP PIO Mask */
@@ -1086,7 +1086,11 @@
 
 /* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */
 #define PCI_DVSEC_HEADER1		0x4 /* Designated Vendor-Specific Header1 */
+#define  PCI_DVSEC_HEADER1_VID(x)	((x) & 0xffff)
+#define  PCI_DVSEC_HEADER1_REV(x)	(((x) >> 16) & 0xf)
+#define  PCI_DVSEC_HEADER1_LEN(x)	(((x) >> 20) & 0xfff)
 #define PCI_DVSEC_HEADER2		0x8 /* Designated Vendor-Specific Header2 */
+#define  PCI_DVSEC_HEADER2_ID(x)		((x) & 0xffff)
 
 /* Data Link Feature */
 #define PCI_DLF_CAP		0x04	/* Capabilities Register */
diff --git a/include/standard-headers/linux/virtio_gpio.h b/include/standard-headers/linux/virtio_gpio.h
new file mode 100644
index 00000000000..2b5cf06349a
--- /dev/null
+++ b/include/standard-headers/linux/virtio_gpio.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _LINUX_VIRTIO_GPIO_H
+#define _LINUX_VIRTIO_GPIO_H
+
+#include "standard-headers/linux/types.h"
+
+/* Virtio GPIO Feature bits */
+#define VIRTIO_GPIO_F_IRQ			0
+
+/* Virtio GPIO request types */
+#define VIRTIO_GPIO_MSG_GET_NAMES		0x0001
+#define VIRTIO_GPIO_MSG_GET_DIRECTION		0x0002
+#define VIRTIO_GPIO_MSG_SET_DIRECTION		0x0003
+#define VIRTIO_GPIO_MSG_GET_VALUE		0x0004
+#define VIRTIO_GPIO_MSG_SET_VALUE		0x0005
+#define VIRTIO_GPIO_MSG_IRQ_TYPE		0x0006
+
+/* Possible values of the status field */
+#define VIRTIO_GPIO_STATUS_OK			0x0
+#define VIRTIO_GPIO_STATUS_ERR			0x1
+
+/* Direction types */
+#define VIRTIO_GPIO_DIRECTION_NONE		0x00
+#define VIRTIO_GPIO_DIRECTION_OUT		0x01
+#define VIRTIO_GPIO_DIRECTION_IN		0x02
+
+/* Virtio GPIO IRQ types */
+#define VIRTIO_GPIO_IRQ_TYPE_NONE		0x00
+#define VIRTIO_GPIO_IRQ_TYPE_EDGE_RISING	0x01
+#define VIRTIO_GPIO_IRQ_TYPE_EDGE_FALLING	0x02
+#define VIRTIO_GPIO_IRQ_TYPE_EDGE_BOTH		0x03
+#define VIRTIO_GPIO_IRQ_TYPE_LEVEL_HIGH		0x04
+#define VIRTIO_GPIO_IRQ_TYPE_LEVEL_LOW		0x08
+
+struct virtio_gpio_config {
+	uint16_t ngpio;
+	uint8_t padding[2];
+	uint32_t gpio_names_size;
+};
+
+/* Virtio GPIO Request / Response */
+struct virtio_gpio_request {
+	uint16_t type;
+	uint16_t gpio;
+	uint32_t value;
+};
+
+struct virtio_gpio_response {
+	uint8_t status;
+	uint8_t value;
+};
+
+struct virtio_gpio_response_get_names {
+	uint8_t status;
+	uint8_t value[];
+};
+
+/* Virtio GPIO IRQ Request / Response */
+struct virtio_gpio_irq_request {
+	uint16_t gpio;
+};
+
+struct virtio_gpio_irq_response {
+	uint8_t status;
+};
+
+/* Possible values of the interrupt status field */
+#define VIRTIO_GPIO_IRQ_STATUS_INVALID		0x0
+#define VIRTIO_GPIO_IRQ_STATUS_VALID		0x1
+
+#endif /* _LINUX_VIRTIO_GPIO_H */
diff --git a/include/standard-headers/linux/virtio_i2c.h b/include/standard-headers/linux/virtio_i2c.h
new file mode 100644
index 00000000000..09fa907793c
--- /dev/null
+++ b/include/standard-headers/linux/virtio_i2c.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later WITH Linux-syscall-note */
+/*
+ * Definitions for virtio I2C Adapter
+ *
+ * Copyright (c) 2021 Intel Corporation. All rights reserved.
+ */
+
+#ifndef _LINUX_VIRTIO_I2C_H
+#define _LINUX_VIRTIO_I2C_H
+
+#include "standard-headers/linux/const.h"
+#include "standard-headers/linux/types.h"
+
+/* Virtio I2C Feature bits */
+#define VIRTIO_I2C_F_ZERO_LENGTH_REQUEST	0
+
+/* The bit 0 of the @virtio_i2c_out_hdr.@flags, used to group the requests */
+#define VIRTIO_I2C_FLAGS_FAIL_NEXT	_BITUL(0)
+
+/* The bit 1 of the @virtio_i2c_out_hdr.@flags, used to mark a buffer as read */
+#define VIRTIO_I2C_FLAGS_M_RD		_BITUL(1)
+
+/**
+ * struct virtio_i2c_out_hdr - the virtio I2C message OUT header
+ * @addr: the controlled device address
+ * @padding: used to pad to full dword
+ * @flags: used for feature extensibility
+ */
+struct virtio_i2c_out_hdr {
+	uint16_t addr;
+	uint16_t padding;
+	uint32_t flags;
+};
+
+/**
+ * struct virtio_i2c_in_hdr - the virtio I2C message IN header
+ * @status: the processing result from the backend
+ */
+struct virtio_i2c_in_hdr {
+	uint8_t status;
+};
+
+/* The final status written by the device */
+#define VIRTIO_I2C_MSG_OK	0
+#define VIRTIO_I2C_MSG_ERR	1
+
+#endif /* _LINUX_VIRTIO_I2C_H */
diff --git a/include/standard-headers/linux/virtio_iommu.h b/include/standard-headers/linux/virtio_iommu.h
index b9443b83a13..366379c2f07 100644
--- a/include/standard-headers/linux/virtio_iommu.h
+++ b/include/standard-headers/linux/virtio_iommu.h
@@ -16,6 +16,7 @@
 #define VIRTIO_IOMMU_F_BYPASS			3
 #define VIRTIO_IOMMU_F_PROBE			4
 #define VIRTIO_IOMMU_F_MMIO			5
+#define VIRTIO_IOMMU_F_BYPASS_CONFIG		6
 
 struct virtio_iommu_range_64 {
 	uint64_t					start;
@@ -36,6 +37,8 @@ struct virtio_iommu_config {
 	struct virtio_iommu_range_32		domain_range;
 	/* Probe buffer size */
 	uint32_t					probe_size;
+	uint8_t					bypass;
+	uint8_t					reserved[3];
 };
 
 /* Request types */
@@ -66,11 +69,14 @@ struct virtio_iommu_req_tail {
 	uint8_t					reserved[3];
 };
 
+#define VIRTIO_IOMMU_ATTACH_F_BYPASS		(1 << 0)
+
 struct virtio_iommu_req_attach {
 	struct virtio_iommu_req_head		head;
 	uint32_t					domain;
 	uint32_t					endpoint;
-	uint8_t					reserved[8];
+	uint32_t					flags;
+	uint8_t					reserved[4];
 	struct virtio_iommu_req_tail		tail;
 };
 
diff --git a/include/standard-headers/linux/virtio_pcidev.h b/include/standard-headers/linux/virtio_pcidev.h
new file mode 100644
index 00000000000..bdf1d062da2
--- /dev/null
+++ b/include/standard-headers/linux/virtio_pcidev.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/*
+ * Copyright (C) 2021 Intel Corporation
+ * Author: Johannes Berg <johannes@sipsolutions.net>
+ */
+#ifndef _LINUX_VIRTIO_PCIDEV_H
+#define _LINUX_VIRTIO_PCIDEV_H
+#include "standard-headers/linux/types.h"
+
+/**
+ * enum virtio_pcidev_ops - virtual PCI device operations
+ * @VIRTIO_PCIDEV_OP_RESERVED: reserved to catch errors
+ * @VIRTIO_PCIDEV_OP_CFG_READ: read config space, size is 1, 2, 4 or 8;
+ *	the @data field should be filled in by the device (in little endian).
+ * @VIRTIO_PCIDEV_OP_CFG_WRITE: write config space, size is 1, 2, 4 or 8;
+ *	the @data field contains the data to write (in little endian).
+ * @VIRTIO_PCIDEV_OP_MMIO_READ: read BAR mem/pio, size can be variable;
+ *	the @data field should be filled in by the device (in little endian).
+ * @VIRTIO_PCIDEV_OP_MMIO_WRITE: write BAR mem/pio, size can be variable;
+ *	the @data field contains the data to write (in little endian).
+ * @VIRTIO_PCIDEV_OP_MMIO_MEMSET: memset MMIO, size is variable but
+ *	the @data field only has one byte (unlike @VIRTIO_PCIDEV_OP_MMIO_WRITE)
+ * @VIRTIO_PCIDEV_OP_INT: legacy INTx# pin interrupt, the addr field is 1-4 for
+ *	the number
+ * @VIRTIO_PCIDEV_OP_MSI: MSI(-X) interrupt, this message basically transports
+ *	the 16- or 32-bit write that would otherwise be done into memory,
+ *	analogous to the write messages (@VIRTIO_PCIDEV_OP_MMIO_WRITE) above
+ * @VIRTIO_PCIDEV_OP_PME: Dummy message whose content is ignored (and should be
+ *	all zeroes) to signal the PME# pin.
+ */
+enum virtio_pcidev_ops {
+	VIRTIO_PCIDEV_OP_RESERVED = 0,
+	VIRTIO_PCIDEV_OP_CFG_READ,
+	VIRTIO_PCIDEV_OP_CFG_WRITE,
+	VIRTIO_PCIDEV_OP_MMIO_READ,
+	VIRTIO_PCIDEV_OP_MMIO_WRITE,
+	VIRTIO_PCIDEV_OP_MMIO_MEMSET,
+	VIRTIO_PCIDEV_OP_INT,
+	VIRTIO_PCIDEV_OP_MSI,
+	VIRTIO_PCIDEV_OP_PME,
+};
+
+/**
+ * struct virtio_pcidev_msg - virtio PCI device operation
+ * @op: the operation to do
+ * @bar: the bar (only with BAR read/write messages)
+ * @reserved: reserved
+ * @size: the size of the read/write (in bytes)
+ * @addr: the address to read/write
+ * @data: the data, normally @size long, but just one byte for
+ *	%VIRTIO_PCIDEV_OP_MMIO_MEMSET
+ *
+ * Note: the fields are all in native (CPU) endian, however, the
+ * @data values will often be in little endian (see the ops above.)
+ */
+struct virtio_pcidev_msg {
+	uint8_t op;
+	uint8_t bar;
+	uint16_t reserved;
+	uint32_t size;
+	uint64_t addr;
+	uint8_t data[];
+};
+
+#endif /* _LINUX_VIRTIO_PCIDEV_H */
diff --git a/include/standard-headers/linux/virtio_scmi.h b/include/standard-headers/linux/virtio_scmi.h
new file mode 100644
index 00000000000..8f2c305aea0
--- /dev/null
+++ b/include/standard-headers/linux/virtio_scmi.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/*
+ * Copyright (C) 2020-2021 OpenSynergy GmbH
+ * Copyright (C) 2021 ARM Ltd.
+ */
+
+#ifndef _LINUX_VIRTIO_SCMI_H
+#define _LINUX_VIRTIO_SCMI_H
+
+#include "standard-headers/linux/virtio_types.h"
+
+/* Device implements some SCMI notifications, or delayed responses. */
+#define VIRTIO_SCMI_F_P2A_CHANNELS 0
+
+/* Device implements any SCMI statistics shared memory region */
+#define VIRTIO_SCMI_F_SHARED_MEMORY 1
+
+/* Virtqueues */
+
+#define VIRTIO_SCMI_VQ_TX 0 /* cmdq */
+#define VIRTIO_SCMI_VQ_RX 1 /* eventq */
+#define VIRTIO_SCMI_VQ_MAX_CNT 2
+
+#endif /* _LINUX_VIRTIO_SCMI_H */
diff --git a/linux-headers/asm-generic/unistd.h b/linux-headers/asm-generic/unistd.h
index 4557a8b6086..1c48b0ae3ba 100644
--- a/linux-headers/asm-generic/unistd.h
+++ b/linux-headers/asm-generic/unistd.h
@@ -883,8 +883,11 @@ __SYSCALL(__NR_process_mrelease, sys_process_mrelease)
 #define __NR_futex_waitv 449
 __SYSCALL(__NR_futex_waitv, sys_futex_waitv)
 
+#define __NR_set_mempolicy_home_node 450
+__SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
+
 #undef __NR_syscalls
-#define __NR_syscalls 450
+#define __NR_syscalls 451
 
 /*
  * 32 bit systems traditionally used different
diff --git a/linux-headers/asm-mips/unistd_n32.h b/linux-headers/asm-mips/unistd_n32.h
index 4b3e7ad1ecc..1f14a6fad33 100644
--- a/linux-headers/asm-mips/unistd_n32.h
+++ b/linux-headers/asm-mips/unistd_n32.h
@@ -377,5 +377,7 @@
 #define __NR_landlock_add_rule (__NR_Linux + 445)
 #define __NR_landlock_restrict_self (__NR_Linux + 446)
 #define __NR_process_mrelease (__NR_Linux + 448)
+#define __NR_futex_waitv (__NR_Linux + 449)
+#define __NR_set_mempolicy_home_node (__NR_Linux + 450)
 
 #endif /* _ASM_UNISTD_N32_H */
diff --git a/linux-headers/asm-mips/unistd_n64.h b/linux-headers/asm-mips/unistd_n64.h
index 488d9298d9b..e5a8ebec789 100644
--- a/linux-headers/asm-mips/unistd_n64.h
+++ b/linux-headers/asm-mips/unistd_n64.h
@@ -353,5 +353,7 @@
 #define __NR_landlock_add_rule (__NR_Linux + 445)
 #define __NR_landlock_restrict_self (__NR_Linux + 446)
 #define __NR_process_mrelease (__NR_Linux + 448)
+#define __NR_futex_waitv (__NR_Linux + 449)
+#define __NR_set_mempolicy_home_node (__NR_Linux + 450)
 
 #endif /* _ASM_UNISTD_N64_H */
diff --git a/linux-headers/asm-mips/unistd_o32.h b/linux-headers/asm-mips/unistd_o32.h
index f47399870a8..871d57168fe 100644
--- a/linux-headers/asm-mips/unistd_o32.h
+++ b/linux-headers/asm-mips/unistd_o32.h
@@ -423,5 +423,7 @@
 #define __NR_landlock_add_rule (__NR_Linux + 445)
 #define __NR_landlock_restrict_self (__NR_Linux + 446)
 #define __NR_process_mrelease (__NR_Linux + 448)
+#define __NR_futex_waitv (__NR_Linux + 449)
+#define __NR_set_mempolicy_home_node (__NR_Linux + 450)
 
 #endif /* _ASM_UNISTD_O32_H */
diff --git a/linux-headers/asm-powerpc/unistd_32.h b/linux-headers/asm-powerpc/unistd_32.h
index 11d54696dc6..585c7fefbc2 100644
--- a/linux-headers/asm-powerpc/unistd_32.h
+++ b/linux-headers/asm-powerpc/unistd_32.h
@@ -430,6 +430,8 @@
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
 #define __NR_process_mrelease 448
+#define __NR_futex_waitv 449
+#define __NR_set_mempolicy_home_node 450
 
 
 #endif /* _ASM_UNISTD_32_H */
diff --git a/linux-headers/asm-powerpc/unistd_64.h b/linux-headers/asm-powerpc/unistd_64.h
index cf740bab136..350f7ec0ac3 100644
--- a/linux-headers/asm-powerpc/unistd_64.h
+++ b/linux-headers/asm-powerpc/unistd_64.h
@@ -402,6 +402,8 @@
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
 #define __NR_process_mrelease 448
+#define __NR_futex_waitv 449
+#define __NR_set_mempolicy_home_node 450
 
 
 #endif /* _ASM_UNISTD_64_H */
diff --git a/linux-headers/asm-riscv/bitsperlong.h b/linux-headers/asm-riscv/bitsperlong.h
new file mode 100644
index 00000000000..cc5c45a9ce9
--- /dev/null
+++ b/linux-headers/asm-riscv/bitsperlong.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ * Copyright (C) 2015 Regents of the University of California
+ */
+
+#ifndef _ASM_RISCV_BITSPERLONG_H
+#define _ASM_RISCV_BITSPERLONG_H
+
+#define __BITS_PER_LONG (__SIZEOF_POINTER__ * 8)
+
+#include <asm-generic/bitsperlong.h>
+
+#endif /* _ASM_RISCV_BITSPERLONG_H */
diff --git a/linux-headers/asm-riscv/mman.h b/linux-headers/asm-riscv/mman.h
new file mode 100644
index 00000000000..8eebf89f5ab
--- /dev/null
+++ b/linux-headers/asm-riscv/mman.h
@@ -0,0 +1 @@
+#include <asm-generic/mman.h>
diff --git a/linux-headers/asm-riscv/unistd.h b/linux-headers/asm-riscv/unistd.h
new file mode 100644
index 00000000000..8062996c2df
--- /dev/null
+++ b/linux-headers/asm-riscv/unistd.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (C) 2018 David Abdurachmanov <david.abdurachmanov@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <https://www.gnu.org/licenses/>.
+ */
+
+#ifdef __LP64__
+#define __ARCH_WANT_NEW_STAT
+#define __ARCH_WANT_SET_GET_RLIMIT
+#endif /* __LP64__ */
+
+#define __ARCH_WANT_SYS_CLONE3
+
+#include <asm-generic/unistd.h>
+
+/*
+ * Allows the instruction cache to be flushed from userspace.  Despite RISC-V
+ * having a direct 'fence.i' instruction available to userspace (which we
+ * can't trap!), that's not actually viable when running on Linux because the
+ * kernel might schedule a process on another hart.  There is no way for
+ * userspace to handle this without invoking the kernel (as it doesn't know the
+ * thread->hart mappings), so we've defined a RISC-V specific system call to
+ * flush the instruction cache.
+ *
+ * __NR_riscv_flush_icache is defined to flush the instruction cache over an
+ * address range, with the flush applying to either all threads or just the
+ * caller.  We don't currently do anything with the address range, that's just
+ * in there for forwards compatibility.
+ */
+#ifndef __NR_riscv_flush_icache
+#define __NR_riscv_flush_icache (__NR_arch_specific_syscall + 15)
+#endif
+__SYSCALL(__NR_riscv_flush_icache, sys_riscv_flush_icache)
diff --git a/linux-headers/asm-s390/unistd_32.h b/linux-headers/asm-s390/unistd_32.h
index 8f97d98128f..8e644d65f57 100644
--- a/linux-headers/asm-s390/unistd_32.h
+++ b/linux-headers/asm-s390/unistd_32.h
@@ -420,5 +420,7 @@
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
 #define __NR_process_mrelease 448
+#define __NR_futex_waitv 449
+#define __NR_set_mempolicy_home_node 450
 
 #endif /* _ASM_S390_UNISTD_32_H */
diff --git a/linux-headers/asm-s390/unistd_64.h b/linux-headers/asm-s390/unistd_64.h
index 021ffc30e66..51da542fec1 100644
--- a/linux-headers/asm-s390/unistd_64.h
+++ b/linux-headers/asm-s390/unistd_64.h
@@ -368,5 +368,7 @@
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
 #define __NR_process_mrelease 448
+#define __NR_futex_waitv 449
+#define __NR_set_mempolicy_home_node 450
 
 #endif /* _ASM_S390_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 5a776a08f78..2da3316bb55 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -373,9 +373,23 @@ struct kvm_debugregs {
 	__u64 reserved[9];
 };
 
-/* for KVM_CAP_XSAVE */
+/* for KVM_CAP_XSAVE and KVM_CAP_XSAVE2 */
 struct kvm_xsave {
+	/*
+	 * KVM_GET_XSAVE2 and KVM_SET_XSAVE write and read as many bytes
+	 * as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
+	 * respectively, when invoked on the vm file descriptor.
+	 *
+	 * The size value returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
+	 * will always be at least 4096. Currently, it is only greater
+	 * than 4096 if a dynamic feature has been enabled with
+	 * ``arch_prctl()``, but this may change in the future.
+	 *
+	 * The offsets of the state save areas in struct kvm_xsave follow
+	 * the contents of CPUID leaf 0xD on the host.
+	 */
 	__u32 region[1024];
+	__u32 extra[0];
 };
 
 #define KVM_MAX_XCRS	16
diff --git a/linux-headers/asm-x86/unistd_32.h b/linux-headers/asm-x86/unistd_32.h
index 9c9ffe312b6..87e1e977afd 100644
--- a/linux-headers/asm-x86/unistd_32.h
+++ b/linux-headers/asm-x86/unistd_32.h
@@ -440,6 +440,7 @@
 #define __NR_memfd_secret 447
 #define __NR_process_mrelease 448
 #define __NR_futex_waitv 449
+#define __NR_set_mempolicy_home_node 450
 
 
 #endif /* _ASM_UNISTD_32_H */
diff --git a/linux-headers/asm-x86/unistd_64.h b/linux-headers/asm-x86/unistd_64.h
index 084f1eef9c5..147a78d623e 100644
--- a/linux-headers/asm-x86/unistd_64.h
+++ b/linux-headers/asm-x86/unistd_64.h
@@ -362,6 +362,7 @@
 #define __NR_memfd_secret 447
 #define __NR_process_mrelease 448
 #define __NR_futex_waitv 449
+#define __NR_set_mempolicy_home_node 450
 
 
 #endif /* _ASM_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/unistd_x32.h b/linux-headers/asm-x86/unistd_x32.h
index a2441affc24..27098db7fbe 100644
--- a/linux-headers/asm-x86/unistd_x32.h
+++ b/linux-headers/asm-x86/unistd_x32.h
@@ -315,6 +315,7 @@
 #define __NR_memfd_secret (__X32_SYSCALL_BIT + 447)
 #define __NR_process_mrelease (__X32_SYSCALL_BIT + 448)
 #define __NR_futex_waitv (__X32_SYSCALL_BIT + 449)
+#define __NR_set_mempolicy_home_node (__X32_SYSCALL_BIT + 450)
 #define __NR_rt_sigaction (__X32_SYSCALL_BIT + 512)
 #define __NR_rt_sigreturn (__X32_SYSCALL_BIT + 513)
 #define __NR_ioctl (__X32_SYSCALL_BIT + 514)
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 02c5e7b7bbb..00af3bc333a 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1130,6 +1130,9 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
+#define KVM_CAP_VM_GPA_BITS 207
+#define KVM_CAP_XSAVE2 208
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1161,11 +1164,20 @@ struct kvm_irq_routing_hv_sint {
 	__u32 sint;
 };
 
+struct kvm_irq_routing_xen_evtchn {
+	__u32 port;
+	__u32 vcpu;
+	__u32 priority;
+};
+
+#define KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL ((__u32)(-1))
+
 /* gsi routing entry types */
 #define KVM_IRQ_ROUTING_IRQCHIP 1
 #define KVM_IRQ_ROUTING_MSI 2
 #define KVM_IRQ_ROUTING_S390_ADAPTER 3
 #define KVM_IRQ_ROUTING_HV_SINT 4
+#define KVM_IRQ_ROUTING_XEN_EVTCHN 5
 
 struct kvm_irq_routing_entry {
 	__u32 gsi;
@@ -1177,6 +1189,7 @@ struct kvm_irq_routing_entry {
 		struct kvm_irq_routing_msi msi;
 		struct kvm_irq_routing_s390_adapter adapter;
 		struct kvm_irq_routing_hv_sint hv_sint;
+		struct kvm_irq_routing_xen_evtchn xen_evtchn;
 		__u32 pad[8];
 	} u;
 };
@@ -1207,6 +1220,7 @@ struct kvm_x86_mce {
 #define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL	(1 << 1)
 #define KVM_XEN_HVM_CONFIG_SHARED_INFO		(1 << 2)
 #define KVM_XEN_HVM_CONFIG_RUNSTATE		(1 << 3)
+#define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL	(1 << 4)
 
 struct kvm_xen_hvm_config {
 	__u32 flags;
@@ -1609,6 +1623,9 @@ struct kvm_enc_region {
 #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
 #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
 
+/* Available with KVM_CAP_XSAVE2 */
+#define KVM_GET_XSAVE2		  _IOR(KVMIO,  0xcf, struct kvm_xsave)
+
 struct kvm_s390_pv_sec_parm {
 	__u64 origin;
 	__u64 length;
-- 
Gitee


From 72377d09a18397e4a5d6f258e085cb991b6933b9 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Tue, 22 Feb 2022 17:58:11 +0100
Subject: [PATCH 086/407] linux-headers: include missing changes from 5.17

RH-Author: Paul Lai <plai@redhat.com>
RH-MergeRequest: 176: Enable KVM AMX support
RH-Commit: [4/13] 2ed7cbc07e63d85cda916ef44d1e82b1fba7fdf4
RH-Bugzilla: 1916415
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 1ea5208febcc068449b63282d72bb719ab67a466)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 linux-headers/asm-x86/kvm.h | 3 +++
 linux-headers/linux/kvm.h   | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 2da3316bb55..bf6e96011df 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -452,6 +452,9 @@ struct kvm_sync_regs {
 
 #define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE	0x00000001
 
+/* attributes for system fd (group 0) */
+#define KVM_X86_XCOMP_GUEST_SUPP	0
+
 struct kvm_vmx_nested_state_data {
 	__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
 	__u8 shadow_vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 00af3bc333a..d232feaae97 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1133,6 +1133,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
 #define KVM_CAP_VM_GPA_BITS 207
 #define KVM_CAP_XSAVE2 208
+#define KVM_CAP_SYS_ATTRIBUTES 209
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -2047,4 +2048,7 @@ struct kvm_stats_desc {
 
 #define KVM_GET_STATS_FD  _IO(KVMIO,  0xce)
 
+/* Available with KVM_CAP_XSAVE2 */
+#define KVM_GET_XSAVE2		  _IOR(KVMIO,  0xcf, struct kvm_xsave)
+
 #endif /* __LINUX_KVM_H */
-- 
Gitee


From 4c2eba1df7571476863bf782b2276e12bf432b23 Mon Sep 17 00:00:00 2001
From: Jing Liu <jing2.liu@intel.com>
Date: Wed, 16 Feb 2022 22:04:27 -0800
Subject: [PATCH 087/407] x86: Fix the 64-byte boundary enumeration for
 extended state

RH-Author: Paul Lai <plai@redhat.com>
RH-MergeRequest: 176: Enable KVM AMX support
RH-Commit: [5/13] 64fc93e3b0ad0fc56da9d71b33d9eefd3cbba1d7
RH-Bugzilla: 1916415
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
indicate whether the extended state component is located
on the next 64-byte boundary following the preceding state
component when the compacted format of an XSAVE area is
used.

Right now, they are all zero because no supported component
needed the bit to be set, but the upcoming AMX feature will
use it.  Fix the subleaf values according to KVM's supported
cpuid.
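
As an illustration only (not part of this change), the effect of the
bit on the compacted layout can be sketched as below. The struct and
function names are hypothetical, and the 576-byte starting offset
assumes the usual 512-byte legacy region plus the 64-byte XSAVE header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ESA_FEATURE_ALIGN64_BIT  1
#define ESA_FEATURE_ALIGN64_MASK (1U << ESA_FEATURE_ALIGN64_BIT)

/* Hypothetical, simplified stand-in for QEMU's ExtSaveArea. */
struct esa_sketch {
    uint32_t size; /* CPUID.(EAX=0Dh,ECX=n).EAX */
    uint32_t ecx;  /* CPUID.(EAX=0Dh,ECX=n).ECX */
};

/*
 * Offset of component 'target' in a compacted XSAVE area that holds
 * the components selected by 'mask'. Returns 0 if 'target' is not
 * selected by 'mask'.
 */
static uint32_t compacted_offset(const struct esa_sketch *areas,
                                 size_t count, uint64_t mask,
                                 unsigned target)
{
    uint32_t off = 512 + 64; /* legacy region + XSAVE header */

    assert(target >= 2 && target < count);

    for (unsigned i = 2; i < count; i++) {
        if (!(mask & (1ULL << i))) {
            continue;
        }
        /* ECX[1] set: start on the next 64-byte boundary. */
        if (areas[i].ecx & ESA_FEATURE_ALIGN64_MASK) {
            off = (off + 63) & ~63U;
        }
        if (i == target) {
            return off;
        }
        off += areas[i].size;
    }
    return 0;
}
```

With ECX[1] clear, a component is packed directly after the previous
one; with it set, the gap up to the next 64-byte boundary is skipped.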

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-2-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 131266b7565bd437127bd231563572696bb27235)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 target/i386/cpu.c         | 1 +
 target/i386/cpu.h         | 6 ++++++
 target/i386/kvm/kvm-cpu.c | 1 +
 3 files changed, 8 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index dd6935b1dd5..f44fad3a2a2 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5495,6 +5495,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                 const ExtSaveArea *esa = &x86_ext_save_areas[count];
                 *eax = esa->size;
                 *ebx = esa->offset;
+                *ecx = esa->ecx & ESA_FEATURE_ALIGN64_MASK;
             }
         }
         break;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index c6a6c871f16..5d9702a991e 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -548,6 +548,11 @@ typedef enum X86Seg {
 #define XSTATE_Hi16_ZMM_MASK            (1ULL << XSTATE_Hi16_ZMM_BIT)
 #define XSTATE_PKRU_MASK                (1ULL << XSTATE_PKRU_BIT)
 
+#define ESA_FEATURE_ALIGN64_BIT         1
+
+#define ESA_FEATURE_ALIGN64_MASK        (1U << ESA_FEATURE_ALIGN64_BIT)
+
+
 /* CPUID feature words */
 typedef enum FeatureWord {
     FEAT_1_EDX,         /* CPUID[1].EDX */
@@ -1354,6 +1359,7 @@ QEMU_BUILD_BUG_ON(sizeof(XSavePKRU) != 0x8);
 typedef struct ExtSaveArea {
     uint32_t feature, bits;
     uint32_t offset, size;
+    uint32_t ecx;
 } ExtSaveArea;
 
 #define XSAVE_STATE_AREA_COUNT (XSTATE_PKRU_BIT + 1)
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index 7b004065aec..86ef7b27127 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -104,6 +104,7 @@ static void kvm_cpu_xsave_init(void)
             if (sz != 0) {
                 assert(esa->size == sz);
                 esa->offset = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EBX);
+                esa->ecx = kvm_arch_get_supported_cpuid(s, 0xd, i, R_ECX);
             }
         }
     }
-- 
Gitee


From 93f97c7ba70495d8f753602b80b4622bb5bda614 Mon Sep 17 00:00:00 2001
From: Jing Liu <jing2.liu@intel.com>
Date: Wed, 16 Feb 2022 22:04:28 -0800
Subject: [PATCH 088/407] x86: Add AMX XTILECFG and XTILEDATA components

RH-Author: Paul Lai <plai@redhat.com>
RH-MergeRequest: 176: Enable KVM AMX support
RH-Commit: [6/13] 95229f87b4494631d57232f374a174f7bc95843a
RH-Bugzilla: 1916415
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

The AMX TILECFG register and the TMMx tile data registers are
saved/restored via XSAVE, respectively in state component 17
(64 bytes) and state component 18 (8192 bytes).

Add AMX feature bits to the x86_ext_save_areas array to set
up AMX components. Add structs that define the layout of
AMX XSAVE areas and use QEMU_BUILD_BUG_ON to validate the
struct sizes.
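
A standalone sketch of the size checks, using C11 _Static_assert as a
stand-in for QEMU_BUILD_BUG_ON. The xtile_data_offset() helper is
hypothetical and only illustrates that component 18 is assumed to hold
eight 1 KiB tile registers back to back.

```c
#include <assert.h>
#include <stdint.h>

/* Ext. save area 17: AMX XTILECFG state (64 bytes) */
typedef struct XSaveXTILECFG {
    uint8_t xtilecfg[64];
} XSaveXTILECFG;

/* Ext. save area 18: AMX XTILEDATA state (8 tiles of 1 KiB each) */
typedef struct XSaveXTILEDATA {
    uint8_t xtiledata[8][1024];
} XSaveXTILEDATA;

/* Fail the build if a struct does not match the architectural size. */
_Static_assert(sizeof(XSaveXTILECFG) == 0x40,
               "XTILECFG must be 64 bytes");
_Static_assert(sizeof(XSaveXTILEDATA) == 0x2000,
               "XTILEDATA must be 8192 bytes");

/* Hypothetical helper: byte offset of one tile inside component 18. */
static inline uint32_t xtile_data_offset(unsigned tile)
{
    assert(tile < 8);
    return tile * (uint32_t)sizeof(((XSaveXTILEDATA *)0)->xtiledata[0]);
}
```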

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-3-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 1f16764f7d4515bfd5e4ae0aae814fa280a7d0c8)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 target/i386/cpu.c |  8 ++++++++
 target/i386/cpu.h | 18 +++++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index f44fad3a2a2..0453c27c9df 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1401,6 +1401,14 @@ ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT] = {
     [XSTATE_PKRU_BIT] =
           { .feature = FEAT_7_0_ECX, .bits = CPUID_7_0_ECX_PKU,
             .size = sizeof(XSavePKRU) },
+    [XSTATE_XTILE_CFG_BIT] = {
+        .feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_AMX_TILE,
+        .size = sizeof(XSaveXTILECFG),
+    },
+    [XSTATE_XTILE_DATA_BIT] = {
+        .feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_AMX_TILE,
+        .size = sizeof(XSaveXTILEDATA)
+    },
 };
 
 static uint32_t xsave_area_size(uint64_t mask)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 5d9702a991e..e1dd8b95551 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -537,6 +537,8 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_BIT            6
 #define XSTATE_Hi16_ZMM_BIT             7
 #define XSTATE_PKRU_BIT                 9
+#define XSTATE_XTILE_CFG_BIT            17
+#define XSTATE_XTILE_DATA_BIT           18
 
 #define XSTATE_FP_MASK                  (1ULL << XSTATE_FP_BIT)
 #define XSTATE_SSE_MASK                 (1ULL << XSTATE_SSE_BIT)
@@ -845,6 +847,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_0_EDX_TSX_LDTRK         (1U << 16)
 /* AVX512_FP16 instruction */
 #define CPUID_7_0_EDX_AVX512_FP16       (1U << 23)
+/* AMX tile (two-dimensional register) */
+#define CPUID_7_0_EDX_AMX_TILE          (1U << 24)
 /* Speculation Control */
 #define CPUID_7_0_EDX_SPEC_CTRL         (1U << 26)
 /* Single Thread Indirect Branch Predictors */
@@ -1348,6 +1352,16 @@ typedef struct XSavePKRU {
     uint32_t padding;
 } XSavePKRU;
 
+/* Ext. save area 17: AMX XTILECFG state */
+typedef struct XSaveXTILECFG {
+    uint8_t xtilecfg[64];
+} XSaveXTILECFG;
+
+/* Ext. save area 18: AMX XTILEDATA state */
+typedef struct XSaveXTILEDATA {
+    uint8_t xtiledata[8][1024];
+} XSaveXTILEDATA;
+
 QEMU_BUILD_BUG_ON(sizeof(XSaveAVX) != 0x100);
 QEMU_BUILD_BUG_ON(sizeof(XSaveBNDREG) != 0x40);
 QEMU_BUILD_BUG_ON(sizeof(XSaveBNDCSR) != 0x40);
@@ -1355,6 +1369,8 @@ QEMU_BUILD_BUG_ON(sizeof(XSaveOpmask) != 0x40);
 QEMU_BUILD_BUG_ON(sizeof(XSaveZMM_Hi256) != 0x200);
 QEMU_BUILD_BUG_ON(sizeof(XSaveHi16_ZMM) != 0x400);
 QEMU_BUILD_BUG_ON(sizeof(XSavePKRU) != 0x8);
+QEMU_BUILD_BUG_ON(sizeof(XSaveXTILECFG) != 0x40);
+QEMU_BUILD_BUG_ON(sizeof(XSaveXTILEDATA) != 0x2000);
 
 typedef struct ExtSaveArea {
     uint32_t feature, bits;
@@ -1362,7 +1378,7 @@ typedef struct ExtSaveArea {
     uint32_t ecx;
 } ExtSaveArea;
 
-#define XSAVE_STATE_AREA_COUNT (XSTATE_PKRU_BIT + 1)
+#define XSAVE_STATE_AREA_COUNT (XSTATE_XTILE_DATA_BIT + 1)
 
 extern ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT];
 
-- 
Gitee


From 9e80d7ca980d0ce7672922125513feba39a23c8a Mon Sep 17 00:00:00 2001
From: Yang Zhong <yang.zhong@intel.com>
Date: Wed, 16 Feb 2022 22:04:29 -0800
Subject: [PATCH 089/407] x86: Grant AMX permission for guest

RH-Author: Paul Lai <plai@redhat.com>
RH-MergeRequest: 176: Enable KVM AMX support
RH-Commit: [7/13] 437578191f61139ca710cc7045ab38eb0d05eae2
RH-Bugzilla: 1916415
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

The kernel allocates a 4K xstate buffer by default. For XSAVE features
that require a large state component (e.g. AMX), the Linux kernel
dynamically expands the xstate buffer only after the process has
acquired the necessary permissions. These are called dynamically
enabled XSAVE features (or dynamic xfeatures).

There are separate permissions for native tasks and guests.

QEMU should request the guest permissions for the dynamic xfeatures
that will be exposed to the guest. This only needs to be done
once, before the first vCPU is created.

KVM implements a new ARCH_GET_XCOMP_SUPP system attribute API to
get the host-side supported_xcr0, so QEMU can decide whether it can
request permission for dynamically enabled XSAVE features.
https://lore.kernel.org/all/20220126152210.3044876-1-pbonzini@redhat.com/
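
The request logic can be sketched roughly as follows. This is an
illustration under stated assumptions, not the code added by this
patch: request_dynamic_components(), request_perm_fn and
record_request() are hypothetical names, and the callback stands in
for the per-component kernel request (on Linux, presumably
arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM, bit)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XSTATE_XTILE_DATA_BIT   18
#define XSTATE_XTILE_DATA_MASK  (1ULL << XSTATE_XTILE_DATA_BIT)
#define XSTATE_DYNAMIC_MASK     (XSTATE_XTILE_DATA_MASK)

/* Hypothetical callback: request guest permission for one component. */
typedef int (*request_perm_fn)(int bit, void *opaque);

/*
 * Request permission for every dynamic component in 'mask'.
 * Returns the number of components requested, or -1 if a dynamic
 * component is not in 'supported' or a request fails.
 */
static int request_dynamic_components(uint64_t mask, uint64_t supported,
                                      request_perm_fn request,
                                      void *opaque)
{
    int requested = 0;

    assert(request != NULL);

    /* Only dynamically enabled components need a permission. */
    mask &= XSTATE_DYNAMIC_MASK;
    if (mask & ~supported) {
        return -1;
    }

    for (int bit = 0; bit < 64; bit++) {
        if (!(mask & (1ULL << bit))) {
            continue;
        }
        if (request(bit, opaque) != 0) {
            return -1;
        }
        requested++;
    }
    return requested;
}

/* Example callback that only records which bits were requested. */
static int record_request(int bit, void *opaque)
{
    *(uint64_t *)opaque |= 1ULL << bit;
    return 0;
}
```

Passing the request as a callback keeps the sketch independent of the
host kernel; a real caller would issue the system call there and would
guard the whole function so that it runs once per process.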

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Message-Id: <20220217060434.52460-4-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 19db68ca68a78fa033a21d419036b6e416554564)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 target/i386/cpu.c          |  7 +++++
 target/i386/cpu.h          |  4 +++
 target/i386/kvm/kvm-cpu.c  | 12 ++++----
 target/i386/kvm/kvm.c      | 57 ++++++++++++++++++++++++++++++++++++++
 target/i386/kvm/kvm_i386.h |  1 +
 5 files changed, 75 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 0453c27c9df..c19b51ea328 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6027,6 +6027,7 @@ static void x86_cpu_enable_xsave_components(X86CPU *cpu)
     CPUX86State *env = &cpu->env;
     int i;
     uint64_t mask;
+    static bool request_perm;
 
     if (!(env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE)) {
         env->features[FEAT_XSAVE_COMP_LO] = 0;
@@ -6042,6 +6043,12 @@ static void x86_cpu_enable_xsave_components(X86CPU *cpu)
         }
     }
 
+    /* Only request permission for first vcpu */
+    if (kvm_enabled() && !request_perm) {
+        kvm_request_xsave_components(cpu, mask);
+        request_perm = true;
+    }
+
     env->features[FEAT_XSAVE_COMP_LO] = mask;
     env->features[FEAT_XSAVE_COMP_HI] = mask >> 32;
 }
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index e1dd8b95551..58676390e62 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -549,6 +549,10 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_MASK           (1ULL << XSTATE_ZMM_Hi256_BIT)
 #define XSTATE_Hi16_ZMM_MASK            (1ULL << XSTATE_Hi16_ZMM_BIT)
 #define XSTATE_PKRU_MASK                (1ULL << XSTATE_PKRU_BIT)
+#define XSTATE_XTILE_CFG_MASK           (1ULL << XSTATE_XTILE_CFG_BIT)
+#define XSTATE_XTILE_DATA_MASK          (1ULL << XSTATE_XTILE_DATA_BIT)
+
+#define XSTATE_DYNAMIC_MASK             (XSTATE_XTILE_DATA_MASK)
 
 #define ESA_FEATURE_ALIGN64_BIT         1
 
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index 86ef7b27127..bdc967c484e 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -84,7 +84,7 @@ static void kvm_cpu_max_instance_init(X86CPU *cpu)
 static void kvm_cpu_xsave_init(void)
 {
     static bool first = true;
-    KVMState *s = kvm_state;
+    uint32_t eax, ebx, ecx, edx;
     int i;
 
     if (!first) {
@@ -100,11 +100,11 @@ static void kvm_cpu_xsave_init(void)
         ExtSaveArea *esa = &x86_ext_save_areas[i];
 
         if (esa->size) {
-            int sz = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EAX);
-            if (sz != 0) {
-                assert(esa->size == sz);
-                esa->offset = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EBX);
-                esa->ecx = kvm_arch_get_supported_cpuid(s, 0xd, i, R_ECX);
+            host_cpuid(0xd, i, &eax, &ebx, &ecx, &edx);
+            if (eax != 0) {
+                assert(esa->size == eax);
+                esa->offset = ebx;
+                esa->ecx = ecx;
             }
         }
     }
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a668f521ac9..b5d98c4361d 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -17,6 +17,7 @@
 #include "qapi/error.h"
 #include <sys/ioctl.h>
 #include <sys/utsname.h>
+#include <sys/syscall.h>
 
 #include <linux/kvm.h>
 #include "standard-headers/asm-x86/kvm_para.h"
@@ -347,6 +348,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
     struct kvm_cpuid2 *cpuid;
     uint32_t ret = 0;
     uint32_t cpuid_1_edx;
+    uint64_t bitmask;
 
     cpuid = get_supported_cpuid(s);
 
@@ -404,6 +406,25 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
         if (!has_msr_arch_capabs) {
             ret &= ~CPUID_7_0_EDX_ARCH_CAPABILITIES;
         }
+    } else if (function == 0xd && index == 0 &&
+               (reg == R_EAX || reg == R_EDX)) {
+        struct kvm_device_attr attr = {
+            .group = 0,
+            .attr = KVM_X86_XCOMP_GUEST_SUPP,
+            .addr = (unsigned long) &bitmask
+        };
+
+        bool sys_attr = kvm_check_extension(s, KVM_CAP_SYS_ATTRIBUTES);
+        if (!sys_attr) {
+            warn_report("cannot get sys attribute capabilities %d", sys_attr);
+        }
+
+        int rc = kvm_ioctl(s, KVM_GET_DEVICE_ATTR, &attr);
+        if (rc == -1 && (errno == ENXIO || errno == EINVAL)) {
+            warn_report("KVM_GET_DEVICE_ATTR(0, KVM_X86_XCOMP_GUEST_SUPP) "
+                        "error: %d", rc);
+        }
+        ret = (reg == R_EAX) ? bitmask : bitmask >> 32;
     } else if (function == 0x80000001 && reg == R_ECX) {
         /*
          * It's safe to enable TOPOEXT even if it's not returned by
@@ -5054,3 +5075,39 @@ bool kvm_arch_cpu_check_are_resettable(void)
 {
     return !sev_es_enabled();
 }
+
+#define ARCH_REQ_XCOMP_GUEST_PERM       0x1025
+
+void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask)
+{
+    KVMState *s = kvm_state;
+    uint64_t supported;
+
+    mask &= XSTATE_DYNAMIC_MASK;
+    if (!mask) {
+        return;
+    }
+    /*
+     * Just ignore bits that are not in CPUID[EAX=0xD,ECX=0].
+     * ARCH_REQ_XCOMP_GUEST_PERM would fail, and QEMU has warned
+     * about them already because they are not supported features.
+     */
+    supported = kvm_arch_get_supported_cpuid(s, 0xd, 0, R_EAX);
+    supported |= (uint64_t)kvm_arch_get_supported_cpuid(s, 0xd, 0, R_EDX) << 32;
+    mask &= supported;
+
+    while (mask) {
+        int bit = ctz64(mask);
+        int rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM, bit);
+        if (rc) {
+            /*
+             * Older kernel version (<5.17) do not support
+             * ARCH_REQ_XCOMP_GUEST_PERM, but also do not return
+             * any dynamic feature from kvm_arch_get_supported_cpuid.
+             */
+            warn_report("prctl(ARCH_REQ_XCOMP_GUEST_PERM) failure "
+                        "for feature bit %d", bit);
+        }
+        mask &= ~BIT_ULL(bit);
+    }
+}
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index a978509d507..4124912c202 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -52,5 +52,6 @@ bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp);
 uint64_t kvm_swizzle_msi_ext_dest_id(uint64_t address);
 
 bool kvm_enable_sgx_provisioning(KVMState *s);
+void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask);
 
 #endif
-- 
Gitee


From a833f6c6095564d257de6343ccb009a9887c88f4 Mon Sep 17 00:00:00 2001
From: Jing Liu <jing2.liu@intel.com>
Date: Wed, 16 Feb 2022 22:04:30 -0800
Subject: [PATCH 090/407] x86: Add XFD faulting bit for state components

RH-Author: Paul Lai <plai@redhat.com>
RH-MergeRequest: 176: Enable KVM AMX support
RH-Commit: [8/13] 0b1b46c5d075655ab94bc79e042b187c5dc55551
RH-Bugzilla: 1916415
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Intel introduces the XFD faulting mechanism for extended XSAVE
features, which allows those features to be enabled dynamically at
runtime. If CPUID.(EAX=0Dh, ECX=n, n>1).ECX[2] is set to 1, XFD
faulting is supported for that state component.

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-5-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 0f17f6b30f3b051f0f96ccc98c9f7f395713699f)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 target/i386/cpu.c | 3 ++-
 target/i386/cpu.h | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index c19b51ea328..cd27c0eb81d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5503,7 +5503,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                 const ExtSaveArea *esa = &x86_ext_save_areas[count];
                 *eax = esa->size;
                 *ebx = esa->offset;
-                *ecx = esa->ecx & ESA_FEATURE_ALIGN64_MASK;
+                *ecx = esa->ecx &
+                       (ESA_FEATURE_ALIGN64_MASK | ESA_FEATURE_XFD_MASK);
             }
         }
         break;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 58676390e62..f2bdef9c26c 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -555,8 +555,10 @@ typedef enum X86Seg {
 #define XSTATE_DYNAMIC_MASK             (XSTATE_XTILE_DATA_MASK)
 
 #define ESA_FEATURE_ALIGN64_BIT         1
+#define ESA_FEATURE_XFD_BIT             2
 
 #define ESA_FEATURE_ALIGN64_MASK        (1U << ESA_FEATURE_ALIGN64_BIT)
+#define ESA_FEATURE_XFD_MASK            (1U << ESA_FEATURE_XFD_BIT)
 
 
 /* CPUID feature words */
-- 
Gitee


From 65e4755f31e95b2f890fcc4229057fd9d334d2e2 Mon Sep 17 00:00:00 2001
From: Jing Liu <jing2.liu@intel.com>
Date: Wed, 16 Feb 2022 22:04:31 -0800
Subject: [PATCH 091/407] x86: Add AMX CPUIDs enumeration

RH-Author: Paul Lai <plai@redhat.com>
RH-MergeRequest: 176: Enable KVM AMX support
RH-Commit: [9/13] fab147992ad927c9538529f018f06e2f48546c5b
RH-Bugzilla: 1916415
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Add the primary AMX feature bits, XFD and AMX_TILE, to enumerate the
CPU's AMX capability. Also add the AMX TILE and TMUL CPUID leaves and
subleaves, which exist when AMX TILE is present and report the maximum
capabilities of TILE and TMUL.
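
The register packing for the two new leaves can be sketched as
follows. The constants mirror the INTEL_AMX_* values added by this
patch; struct cpuid_regs and the two helper functions are hypothetical
names used only for illustration.

```c
#include <stdint.h>

/* Tile palette 1 values (CPUID leaf 0x1D, subleaf 1). */
#define TOTAL_TILE_BYTES  0x2000
#define BYTES_PER_TILE    0x400
#define BYTES_PER_ROW     0x40
#define MAX_NAMES         0x8
#define MAX_ROWS          0x10
/* TMUL values (CPUID leaf 0x1E, subleaf 0). */
#define TMUL_MAX_K        0x10
#define TMUL_MAX_N        0x40

struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Leaf 0x1D, subleaf 1: sizes in the low halves, counts in the high. */
static struct cpuid_regs amx_tile_palette1(void)
{
    struct cpuid_regs r = {0};

    r.eax = TOTAL_TILE_BYTES | (BYTES_PER_TILE << 16);
    r.ebx = BYTES_PER_ROW | (MAX_NAMES << 16);
    r.ecx = MAX_ROWS;
    return r;
}

/* Leaf 0x1E, subleaf 0: EBX[7:0] is max K, EBX[23:8] is max N. */
static struct cpuid_regs amx_tmul_info(void)
{
    struct cpuid_regs r = {0};

    r.ebx = TMUL_MAX_K | (TMUL_MAX_N << 8);
    return r;
}
```

With these constants the palette subleaf yields EAX=0x04002000,
EBX=0x00080040 and ECX=0x10, and the TMUL leaf yields EBX=0x4010. The
values are self-consistent: 8 tiles of 0x400 bytes give 0x2000 bytes
in total, and 16 rows of 0x40 bytes give 0x400 bytes per tile.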

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-6-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit f21a48171cf3fa39532fc8553fd82e81b88b6474)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 target/i386/cpu.c     | 55 ++++++++++++++++++++++++++++++++++++++++---
 target/i386/kvm/kvm.c |  4 +++-
 2 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index cd27c0eb81d..09e08f7f388 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -574,6 +574,18 @@ static CPUCacheInfo legacy_l3_cache = {
 #define INTEL_PT_CYCLE_BITMAP    0x1fff         /* Support 0,2^(0~11) */
 #define INTEL_PT_PSB_BITMAP      (0x003f << 16) /* Support 2K,4K,8K,16K,32K,64K */
 
+/* CPUID Leaf 0x1D constants: */
+#define INTEL_AMX_TILE_MAX_SUBLEAF     0x1
+#define INTEL_AMX_TOTAL_TILE_BYTES     0x2000
+#define INTEL_AMX_BYTES_PER_TILE       0x400
+#define INTEL_AMX_BYTES_PER_ROW        0x40
+#define INTEL_AMX_TILE_MAX_NAMES       0x8
+#define INTEL_AMX_TILE_MAX_ROWS        0x10
+
+/* CPUID Leaf 0x1E constants: */
+#define INTEL_AMX_TMUL_MAX_K           0x10
+#define INTEL_AMX_TMUL_MAX_N           0x40
+
 void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
                               uint32_t vendor2, uint32_t vendor3)
 {
@@ -843,8 +855,8 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
             "avx512-vp2intersect", NULL, "md-clear", NULL,
             NULL, NULL, "serialize", NULL,
             "tsx-ldtrk", NULL, NULL /* pconfig */, NULL,
-            NULL, NULL, NULL, "avx512-fp16",
-            NULL, NULL, "spec-ctrl", "stibp",
+            NULL, NULL, "amx-bf16", "avx512-fp16",
+            "amx-tile", "amx-int8", "spec-ctrl", "stibp",
             NULL, "arch-capabilities", "core-capability", "ssbd",
         },
         .cpuid = {
@@ -909,7 +921,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         .type = CPUID_FEATURE_WORD,
         .feat_names = {
             "xsaveopt", "xsavec", "xgetbv1", "xsaves",
-            NULL, NULL, NULL, NULL,
+            "xfd", NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
@@ -5593,6 +5605,43 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         }
         break;
     }
+    case 0x1D: {
+        /* AMX TILE */
+        *eax = 0;
+        *ebx = 0;
+        *ecx = 0;
+        *edx = 0;
+        if (!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE)) {
+            break;
+        }
+
+        if (count == 0) {
+            /* Highest numbered palette subleaf */
+            *eax = INTEL_AMX_TILE_MAX_SUBLEAF;
+        } else if (count == 1) {
+            *eax = INTEL_AMX_TOTAL_TILE_BYTES |
+                   (INTEL_AMX_BYTES_PER_TILE << 16);
+            *ebx = INTEL_AMX_BYTES_PER_ROW | (INTEL_AMX_TILE_MAX_NAMES << 16);
+            *ecx = INTEL_AMX_TILE_MAX_ROWS;
+        }
+        break;
+    }
+    case 0x1E: {
+        /* AMX TMUL */
+        *eax = 0;
+        *ebx = 0;
+        *ecx = 0;
+        *edx = 0;
+        if (!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE)) {
+            break;
+        }
+
+        if (count == 0) {
+            /* Highest numbered palette subleaf */
+            *ebx = INTEL_AMX_TMUL_MAX_K | (INTEL_AMX_TMUL_MAX_N << 8);
+        }
+        break;
+    }
     case 0x40000000:
         /*
          * CPUID code in kvm_arch_init_vcpu() ignores stuff
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index b5d98c4361d..a64a79d8703 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1779,7 +1779,9 @@ int kvm_arch_init_vcpu(CPUState *cs)
                 c = &cpuid_data.entries[cpuid_i++];
             }
             break;
-        case 0x14: {
+        case 0x14:
+        case 0x1d:
+        case 0x1e: {
             uint32_t times;
 
             c->function = i;
-- 
Gitee


From 1053669021335e3ba7f203147bd3bbeaa33d69e0 Mon Sep 17 00:00:00 2001
From: Jing Liu <jing2.liu@intel.com>
Date: Wed, 16 Feb 2022 22:04:32 -0800
Subject: [PATCH 092/407] x86: add support for KVM_CAP_XSAVE2 and AMX state
 migration

RH-Author: Paul Lai <plai@redhat.com>
RH-MergeRequest: 176: Enable KVM AMX support
RH-Commit: [10/13] d584f455ba1ecd8a4a87f3470e6aac24ba9a1f5a
RH-Bugzilla: 1916415
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

When dynamic xfeatures (e.g. AMX) are used by the guest, the xsave
area can be larger than 4KB. KVM_GET_XSAVE2 and KVM_SET_XSAVE under
KVM_CAP_XSAVE2 work with an xsave buffer larger than 4KB. Always use
the new ioctls under KVM_CAP_XSAVE2 when KVM supports it.
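
The buffer sizing rule can be sketched as below. xsave_buf_len() is a
hypothetical helper, not a function in this patch; it assumes that the
KVM_CAP_XSAVE2 check returns the required size in bytes (0 when
unsupported) and that the legacy struct kvm_xsave is 4096 bytes.

```c
#include <stddef.h>

#define LEGACY_XSAVE_SIZE 4096   /* assumed size of struct kvm_xsave */
#define XSAVE_ALIGN       4096

/*
 * xsave2_size: result of checking KVM_CAP_XSAVE2 (0 if unsupported).
 * has_xsave:   nonzero if the legacy KVM_CAP_XSAVE is available.
 * Returns the buffer length to allocate, or 0 if no buffer is needed.
 */
static size_t xsave_buf_len(int xsave2_size, int has_xsave)
{
    if (xsave2_size > 0) {
        size_t len = (size_t)xsave2_size;

        /* Round up to the next multiple of XSAVE_ALIGN. */
        return (len + XSAVE_ALIGN - 1) / XSAVE_ALIGN * XSAVE_ALIGN;
    }
    return has_xsave ? LEGACY_XSAVE_SIZE : 0;
}
```

For example, a reported size of 11008 bytes rounds up to 12288, while
a host without KVM_CAP_XSAVE2 keeps the 4096-byte legacy buffer.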

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-7-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e56dd3c70abb31893c61ac834109fa7a38841330)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 target/i386/cpu.h          |  4 ++++
 target/i386/kvm/kvm.c      | 42 ++++++++++++++++++++++++--------------
 target/i386/xsave_helper.c | 28 +++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 15 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f2bdef9c26c..14a3501b87b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1522,6 +1522,10 @@ typedef struct CPUX86State {
     uint64_t opmask_regs[NB_OPMASK_REGS];
     YMMReg zmmh_regs[CPU_NB_REGS];
     ZMMReg hi16_zmm_regs[CPU_NB_REGS];
+#ifdef TARGET_X86_64
+    uint8_t xtilecfg[64];
+    uint8_t xtiledata[8192];
+#endif
 
     /* sysenter registers */
     uint32_t sysenter_cs;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a64a79d8703..d3d476df27b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -123,6 +123,7 @@ static uint32_t num_architectural_pmu_gp_counters;
 static uint32_t num_architectural_pmu_fixed_counters;
 
 static int has_xsave;
+static int has_xsave2;
 static int has_xcrs;
 static int has_pit_state2;
 static int has_exception_payload;
@@ -1585,6 +1586,26 @@ static Error *invtsc_mig_blocker;
 
 #define KVM_MAX_CPUID_ENTRIES  100
 
+static void kvm_init_xsave(CPUX86State *env)
+{
+    if (has_xsave2) {
+        env->xsave_buf_len = QEMU_ALIGN_UP(has_xsave2, 4096);
+    } else if (has_xsave) {
+        env->xsave_buf_len = sizeof(struct kvm_xsave);
+    } else {
+        return;
+    }
+
+    env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
+    memset(env->xsave_buf, 0, env->xsave_buf_len);
+    /*
+     * The allocated storage must be large enough for all of the
+     * possible XSAVE state components.
+     */
+    assert(kvm_arch_get_supported_cpuid(kvm_state, 0xd, 0, R_ECX) <=
+           env->xsave_buf_len);
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
@@ -1614,6 +1635,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
     cpuid_i = 0;
 
+    has_xsave2 = kvm_check_extension(cs->kvm_state, KVM_CAP_XSAVE2);
+
     r = kvm_arch_set_tsc_khz(cs);
     if (r < 0) {
         return r;
@@ -2003,19 +2026,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
     if (r) {
         goto fail;
     }
-
-    if (has_xsave) {
-        env->xsave_buf_len = sizeof(struct kvm_xsave);
-        env->xsave_buf = qemu_memalign(4096, env->xsave_buf_len);
-        memset(env->xsave_buf, 0, env->xsave_buf_len);
-
-        /*
-         * The allocated storage must be large enough for all of the
-         * possible XSAVE state components.
-         */
-        assert(kvm_arch_get_supported_cpuid(kvm_state, 0xd, 0, R_ECX)
-               <= env->xsave_buf_len);
-    }
+    kvm_init_xsave(env);
 
     max_nested_state_len = kvm_max_nested_state_length();
     if (max_nested_state_len > 0) {
@@ -3263,13 +3274,14 @@ static int kvm_get_xsave(X86CPU *cpu)
 {
     CPUX86State *env = &cpu->env;
     void *xsave = env->xsave_buf;
-    int ret;
+    int type, ret;
 
     if (!has_xsave) {
         return kvm_get_fpu(cpu);
     }
 
-    ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_XSAVE, xsave);
+    type = has_xsave2 ? KVM_GET_XSAVE2 : KVM_GET_XSAVE;
+    ret = kvm_vcpu_ioctl(CPU(cpu), type, xsave);
     if (ret < 0) {
         return ret;
     }
diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
index ac61a963440..996e9f3bfef 100644
--- a/target/i386/xsave_helper.c
+++ b/target/i386/xsave_helper.c
@@ -126,6 +126,20 @@ void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen)
 
         memcpy(pkru, &env->pkru, sizeof(env->pkru));
     }
+
+    e = &x86_ext_save_areas[XSTATE_XTILE_CFG_BIT];
+    if (e->size && e->offset) {
+        XSaveXTILECFG *tilecfg = buf + e->offset;
+
+        memcpy(tilecfg, &env->xtilecfg, sizeof(env->xtilecfg));
+    }
+
+    e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
+    if (e->size && e->offset && buflen >= e->size + e->offset) {
+        XSaveXTILEDATA *tiledata = buf + e->offset;
+
+        memcpy(tiledata, &env->xtiledata, sizeof(env->xtiledata));
+    }
 #endif
 }
 
@@ -247,5 +261,19 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen)
         pkru = buf + e->offset;
         memcpy(&env->pkru, pkru, sizeof(env->pkru));
     }
+
+    e = &x86_ext_save_areas[XSTATE_XTILE_CFG_BIT];
+    if (e->size && e->offset) {
+        const XSaveXTILECFG *tilecfg = buf + e->offset;
+
+        memcpy(&env->xtilecfg, tilecfg, sizeof(env->xtilecfg));
+    }
+
+    e = &x86_ext_save_areas[XSTATE_XTILE_DATA_BIT];
+    if (e->size && e->offset && buflen >= e->size + e->offset) {
+        const XSaveXTILEDATA *tiledata = buf + e->offset;
+
+        memcpy(&env->xtiledata, tiledata, sizeof(env->xtiledata));
+    }
 #endif
 }
-- 
Gitee


From 76ff5adb870f0822b65ab632fad00803c9b5dd3f Mon Sep 17 00:00:00 2001
From: Zeng Guang <guang.zeng@intel.com>
Date: Wed, 16 Feb 2022 22:04:33 -0800
Subject: [PATCH 093/407] x86: Support XFD and AMX xsave data migration

RH-Author: Paul Lai <plai@redhat.com>
RH-MergeRequest: 176: Enable KVM AMX support
RH-Commit: [11/13] 4ff6e5544ffdac4e6d2f568f7f63b937502ca6c5
RH-Bugzilla: 1916415
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

XFD (eXtended Feature Disable) allows a feature to be enabled in the
xsave state while preventing specific user threads from using it.

Save and restore the XFD MSRs if CPUID.D.1.EAX[4] enumerates XFD
support. Likewise, migrate the MSRs and the related xsave state when
needed.

Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-8-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit cdec2b753b487d9e8aab028231c35d87789ea083)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 target/i386/cpu.h     |  9 +++++++++
 target/i386/kvm/kvm.c | 18 +++++++++++++++++
 target/i386/machine.c | 46 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 14a3501b87b..8ab2a4042a8 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -505,6 +505,9 @@ typedef enum X86Seg {
 
 #define MSR_VM_HSAVE_PA                 0xc0010117
 
+#define MSR_IA32_XFD                    0x000001c4
+#define MSR_IA32_XFD_ERR                0x000001c5
+
 #define MSR_IA32_BNDCFGS                0x00000d90
 #define MSR_IA32_XSS                    0x00000da0
 #define MSR_IA32_UMWAIT_CONTROL         0xe1
@@ -870,6 +873,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_1_EAX_AVX_VNNI          (1U << 4)
 /* AVX512 BFloat16 Instruction */
 #define CPUID_7_1_EAX_AVX512_BF16       (1U << 5)
+/* XFD Extend Feature Disabled */
+#define CPUID_D_1_EAX_XFD               (1U << 4)
 
 /* Packets which contain IP payload have LIP values */
 #define CPUID_14_0_ECX_LIP              (1U << 31)
@@ -1610,6 +1615,10 @@ typedef struct CPUX86State {
     uint64_t msr_rtit_cr3_match;
     uint64_t msr_rtit_addrs[MAX_RTIT_ADDRS];
 
+    /* Per-VCPU XFD MSRs */
+    uint64_t msr_xfd;
+    uint64_t msr_xfd_err;
+
     /* exception/interrupt handling */
     int error_code;
     int exception_is_int;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index d3d476df27b..b1128b0e074 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3219,6 +3219,13 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
                               env->msr_ia32_sgxlepubkeyhash[3]);
         }
 
+        if (env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD) {
+            kvm_msr_entry_add(cpu, MSR_IA32_XFD,
+                              env->msr_xfd);
+            kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR,
+                              env->msr_xfd_err);
+        }
+
         /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
          *       kvm_put_msr_feature_control. */
     }
@@ -3571,6 +3578,11 @@ static int kvm_get_msrs(X86CPU *cpu)
         kvm_msr_entry_add(cpu, MSR_IA32_SGXLEPUBKEYHASH3, 0);
     }
 
+    if (env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD) {
+        kvm_msr_entry_add(cpu, MSR_IA32_XFD, 0);
+        kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR, 0);
+    }
+
     ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, cpu->kvm_msr_buf);
     if (ret < 0) {
         return ret;
@@ -3870,6 +3882,12 @@ static int kvm_get_msrs(X86CPU *cpu)
             env->msr_ia32_sgxlepubkeyhash[index - MSR_IA32_SGXLEPUBKEYHASH0] =
                            msrs[i].data;
             break;
+        case MSR_IA32_XFD:
+            env->msr_xfd = msrs[i].data;
+            break;
+        case MSR_IA32_XFD_ERR:
+            env->msr_xfd_err = msrs[i].data;
+            break;
         }
     }
 
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 83c2b91529b..3977e9d8f8d 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -1455,6 +1455,48 @@ static const VMStateDescription vmstate_msr_intel_sgx = {
     }
 };
 
+static bool xfd_msrs_needed(void *opaque)
+{
+    X86CPU *cpu = opaque;
+    CPUX86State *env = &cpu->env;
+
+    return !!(env->features[FEAT_XSAVE] & CPUID_D_1_EAX_XFD);
+}
+
+static const VMStateDescription vmstate_msr_xfd = {
+    .name = "cpu/msr_xfd",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = xfd_msrs_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(env.msr_xfd, X86CPU),
+        VMSTATE_UINT64(env.msr_xfd_err, X86CPU),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+#ifdef TARGET_X86_64
+static bool amx_xtile_needed(void *opaque)
+{
+    X86CPU *cpu = opaque;
+    CPUX86State *env = &cpu->env;
+
+    return !!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE);
+}
+
+static const VMStateDescription vmstate_amx_xtile = {
+    .name = "cpu/intel_amx_xtile",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = amx_xtile_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT8_ARRAY(env.xtilecfg, X86CPU, 64),
+        VMSTATE_UINT8_ARRAY(env.xtiledata, X86CPU, 8192),
+        VMSTATE_END_OF_LIST()
+    }
+};
+#endif
+
 const VMStateDescription vmstate_x86_cpu = {
     .name = "cpu",
     .version_id = 12,
@@ -1593,6 +1635,10 @@ const VMStateDescription vmstate_x86_cpu = {
 #endif
         &vmstate_msr_tsx_ctrl,
         &vmstate_msr_intel_sgx,
+        &vmstate_msr_xfd,
+#ifdef TARGET_X86_64
+        &vmstate_amx_xtile,
+#endif
         NULL
     }
 };
-- 
Gitee


From f03a10bb42a8a61e879925268691175476c2521c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Fri, 18 Mar 2022 16:23:47 +0100
Subject: [PATCH 094/407] target/i386: kvm: do not access uninitialized
 variable on older kernels

RH-Author: Paul Lai <plai@redhat.com>
RH-MergeRequest: 176: Enable KVM AMX support
RH-Commit: [12/13] 776fac1e7d1aa16ec5f4d99ddad3039eab8212af
RH-Bugzilla: 1916415
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

KVM support for AMX includes a new system attribute, KVM_X86_XCOMP_GUEST_SUPP.
Commit 19db68ca68 ("x86: Grant AMX permission for guest", 2022-03-15) however
did not fully consider the behavior on older kernels.  First, it warns
too aggressively.  Second, it invokes the KVM_GET_DEVICE_ATTR ioctl
unconditionally and then uses the "bitmask" variable, which remains
uninitialized if the ioctl fails.  Third, kvm_ioctl returns -errno rather
than -1 on errors.

While at it, explain why the ioctl is needed and KVM_GET_SUPPORTED_CPUID
is not enough.
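
The corrected error handling can be summarized with the sketch below.
xcomp_guest_supp_reg() and enum reg are hypothetical names; the
function only models the decision made after the ioctl returns,
assuming the -errno convention described above.

```c
#include <errno.h>
#include <stdint.h>

enum reg { REG_EAX, REG_EDX };

/*
 * rc:       result of the attribute query, 0 or -errno
 * bitmask:  64-bit value filled in by the query, valid only if rc == 0
 * fallback: value already obtained from KVM_GET_SUPPORTED_CPUID
 * warn:     set to 1 if the failure deserves a warning, else 0
 */
static uint32_t xcomp_guest_supp_reg(int rc, uint64_t bitmask,
                                     enum reg reg, uint32_t fallback,
                                     int *warn)
{
    *warn = 0;
    if (rc < 0) {
        /* -ENXIO means the kernel lacks the attribute: stay quiet. */
        *warn = (rc != -ENXIO);
        return fallback;
    }
    return reg == REG_EAX ? (uint32_t)bitmask
                          : (uint32_t)(bitmask >> 32);
}
```

The key point is that bitmask is never read on the failure path, so an
older kernel simply yields the value KVM already reported.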

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 3ec5ad40081b14af28496198b4d08dbe13386790)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 target/i386/kvm/kvm.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index b1128b0e074..bd439e56ad6 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -409,6 +409,12 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
         }
     } else if (function == 0xd && index == 0 &&
                (reg == R_EAX || reg == R_EDX)) {
+        /*
+         * The value returned by KVM_GET_SUPPORTED_CPUID does not include
+         * features that still have to be enabled with the arch_prctl
+         * system call.  QEMU needs the full value, which is retrieved
+         * with KVM_GET_DEVICE_ATTR.
+         */
         struct kvm_device_attr attr = {
             .group = 0,
             .attr = KVM_X86_XCOMP_GUEST_SUPP,
@@ -417,13 +423,16 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
 
         bool sys_attr = kvm_check_extension(s, KVM_CAP_SYS_ATTRIBUTES);
         if (!sys_attr) {
-            warn_report("cannot get sys attribute capabilities %d", sys_attr);
+            return ret;
         }
 
         int rc = kvm_ioctl(s, KVM_GET_DEVICE_ATTR, &attr);
-        if (rc == -1 && (errno == ENXIO || errno == EINVAL)) {
-            warn_report("KVM_GET_DEVICE_ATTR(0, KVM_X86_XCOMP_GUEST_SUPP) "
-                        "error: %d", rc);
+        if (rc < 0) {
+            if (rc != -ENXIO) {
+                warn_report("KVM_GET_DEVICE_ATTR(0, KVM_X86_XCOMP_GUEST_SUPP) "
+                            "error: %d", rc);
+            }
+            return ret;
         }
         ret = (reg == R_EAX) ? bitmask : bitmask >> 32;
     } else if (function == 0x80000001 && reg == R_ECX) {
-- 
Gitee


From e3354c0fd48b0478a47793dc4d79515565fba3fe Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Wed, 23 Mar 2022 12:33:25 +0100
Subject: [PATCH 095/407] KVM: x86: workaround invalid CPUID[0xD,9] info on
 some AMD processors
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Paul Lai <plai@redhat.com>
RH-MergeRequest: 176: Enable KVM AMX support
RH-Commit: [13/13] 38f147c911258e84e01336271ebd23a1c24371fc
RH-Bugzilla: 1916415
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Some AMD processors expose the PKRU extended save state even if they do not have
the related PKU feature in CPUID.  Worse, when they do they report a size of
64, whereas the expected size of the PKRU extended save state is 8, therefore
the esa->size == eax assertion does not hold.

The state is already ignored by KVM_GET_SUPPORTED_CPUID because it
was not enabled in the host XCR0.  However, QEMU kvm_cpu_xsave_init()
runs before QEMU invokes arch_prctl() to enable dynamically-enabled
save states such as XTILEDATA, and KVM_GET_SUPPORTED_CPUID hides save
states that have yet to be enabled.  Therefore, kvm_cpu_xsave_init()
needs to consult the host CPUID instead of KVM_GET_SUPPORTED_CPUID,
and dies with an assertion failure.

When setting up the ExtSaveArea array to match the host, ignore features that
KVM does not report as supported.  This will cause QEMU to skip the incorrect
CPUID leaf instead of tripping the assertion.
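
The filter amounts to an "all required bits present" test on the
feature word that owns the save area. A minimal sketch, with
save_area_usable() as a hypothetical name:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * supported_word: feature word as reported by KVM for this host
 * required_bits:  feature bits the save area depends on
 * True only when every required bit is supported.
 */
static bool save_area_usable(uint64_t supported_word,
                             uint64_t required_bits)
{
    return (supported_word & required_bits) == required_bits;
}
```

A save area whose feature is missing from the supported word is
skipped before the host CPUID size is ever compared.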

Closes: https://gitlab.com/qemu-project/qemu/-/issues/916
Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Analyzed-by: Yang Zhong <yang.zhong@intel.com>
Reported-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 58f7db26f21c690cf9a669c314cfd7371506084a)
Signed-off-by: Paul Lai <plai@redhat.com>
---
 target/i386/cpu.c         |  4 ++--
 target/i386/cpu.h         |  2 ++
 target/i386/kvm/kvm-cpu.c | 19 ++++++++++++-------
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 09e08f7f388..0543b846ff2 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4980,8 +4980,8 @@ CpuDefinitionInfoList *qmp_query_cpu_definitions(Error **errp)
     return cpu_list;
 }
 
-static uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
-                                                   bool migratable_only)
+uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
+                                            bool migratable_only)
 {
     FeatureWordInfo *wi = &feature_word_info[w];
     uint64_t r = 0;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 8ab2a4042a8..006b735fe4a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -604,6 +604,8 @@ typedef enum FeatureWord {
 } FeatureWord;
 
 typedef uint64_t FeatureWordArray[FEATURE_WORDS];
+uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
+                                            bool migratable_only);
 
 /* cpuid_features bits */
 #define CPUID_FP87 (1U << 0)
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index bdc967c484e..74c1396a93f 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -99,13 +99,18 @@ static void kvm_cpu_xsave_init(void)
     for (i = XSTATE_SSE_BIT + 1; i < XSAVE_STATE_AREA_COUNT; i++) {
         ExtSaveArea *esa = &x86_ext_save_areas[i];
 
-        if (esa->size) {
-            host_cpuid(0xd, i, &eax, &ebx, &ecx, &edx);
-            if (eax != 0) {
-                assert(esa->size == eax);
-                esa->offset = ebx;
-                esa->ecx = ecx;
-            }
+        if (!esa->size) {
+            continue;
+        }
+        if ((x86_cpu_get_supported_feature_word(esa->feature, false) & esa->bits)
+            != esa->bits) {
+            continue;
+        }
+        host_cpuid(0xd, i, &eax, &ebx, &ecx, &edx);
+        if (eax != 0) {
+            assert(esa->size == eax);
+            esa->offset = ebx;
+            esa->ecx = ecx;
         }
     }
 }
-- 
Gitee


From c93f35108dc8d6532a6db7caa3ea8684bcecde2e Mon Sep 17 00:00:00 2001
From: Si-Wei Liu <si-wei.liu@oracle.com>
Date: Fri, 6 May 2022 19:28:12 -0700
Subject: [PATCH 096/407] virtio-net: setup vhost_dev and notifiers for cvq
 only when feature is negotiated
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jason Wang <jasowang@redhat.com>
RH-MergeRequest: 187: Multiqueue fixes for vhost-vDPA
RH-Commit: [1/7] 38bcfaa661f437b3dfa6b6f152dffd60073dc054
RH-Bugzilla: 2069946
RH-Acked-by: Eugenio Pérez <eperezma@redhat.com>
RH-Acked-by: Cindy Lu <lulu@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>

When the control virtqueue feature is absent or not negotiated,
vhost_net_start() still tries to set up vhost_dev and install
vhost notifiers for the control virtqueue, which results in
erroneous ioctl calls with an incorrect queue index being sent
down to the driver. Do that only when needed.
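
The rule can be sketched in isolation as follows. This is only an
illustration, not QEMU code: cvq_count() and its plain-integer
parameters are hypothetical stand-ins for the fields read in
virtio_net_vhost_status().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: how many control virtqueues to pass on to
 * vhost_net_start(). Yields 0 unless CTRL_VQ was negotiated, so no
 * vhost_dev or notifier is set up for a queue the guest cannot use.
 */
static int cvq_count(bool ctrl_vq_negotiated, int max_ncs,
                     int max_queue_pairs)
{
    /* Assumed precondition: net clients cover at least the data pairs. */
    assert(max_ncs >= max_queue_pairs);

    return ctrl_vq_negotiated ? max_ncs - max_queue_pairs : 0;
}
```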

Fixes: 22288fe ("virtio-net: vhost control virtqueue support")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <1651890498-24478-2-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit aa8581945a13712ff3eed0ad3ba7a9664fc1604b)
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/virtio-net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e1f4748831e..ec045c3f41d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -244,7 +244,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
     VirtIODevice *vdev = VIRTIO_DEVICE(n);
     NetClientState *nc = qemu_get_queue(n->nic);
     int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
-    int cvq = n->max_ncs - n->max_queue_pairs;
+    int cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
+              n->max_ncs - n->max_queue_pairs : 0;
 
     if (!get_vhost_net(nc->peer)) {
         return;
-- 
Gitee


From 5bb3b48e728371c586acb1a660d6a2500bb566c4 Mon Sep 17 00:00:00 2001
From: Si-Wei Liu <si-wei.liu@oracle.com>
Date: Fri, 6 May 2022 19:28:13 -0700
Subject: [PATCH 097/407] virtio-net: align ctrl_vq index for non-mq guest for
 vhost_vdpa
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jason Wang <jasowang@redhat.com>
RH-MergeRequest: 187: Multiqueue fixes for vhost-vDPA
RH-Commit: [2/7] 2647cf59f3dd1e3d8af2d12c01e06ae26fbc1dc2
RH-Bugzilla: 2069946
RH-Acked-by: Eugenio Pérez <eperezma@redhat.com>
RH-Acked-by: Cindy Lu <lulu@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>

With an MQ-enabled vdpa device and a guest that does not support MQ,
e.g. booting vdpa with mq=on over OVMF with a single vqp, the following
assert failure is seen:

../hw/virtio/vhost-vdpa.c:560: vhost_vdpa_get_vq_index: Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.

0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
   at ../hw/virtio/virtio-pci.c:974
8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
   at ../hw/net/vhost_net.c:361
10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
   at ../softmmu/memory.c:492
15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
   at ../softmmu/memory.c:1504
17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at /home/opc/qemu-upstream/include/qemu/host-utils.h:165
18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
   at ../softmmu/physmem.c:2914
20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
   attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6

The assert fails because the vhost_dev index for the ctrl vq was
not aligned with the one actually in use by the guest.
Upon multiqueue feature negotiation in virtio_net_set_multiqueue(),
if the guest doesn't support multiqueue, the guest vq layout shrinks
to a single queue pair, consisting of 3 vqs in total (rx, tx and ctrl).
This results in ctrl_vq taking a different vhost_dev group index than
the default. We can map the vq to the correct vhost_dev group by checking
whether MQ is supported by the guest and successfully negotiated. Since the
MQ feature is only present along with CTRL_VQ, index 2 can only refer
to the control vq when MQ is not supported by the guest.
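
The mapping described above can be sketched standalone. The helper
names below are hypothetical, and the "vq index / 2" rule for data
queues is an assumption about how rx/tx pairs are numbered; the real
code works on VirtIONet and NetClientState rather than plain integers.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: data vqs come in rx/tx pairs, so pair = vq index / 2. */
static int vq_to_pair(int vq_idx)
{
    return vq_idx / 2;
}

/*
 * Hypothetical helper: net client (subqueue) index serving vq_idx,
 * or -1 when the index is bogus for the negotiated features.
 */
static int vq_to_nc_index(int vq_idx, bool mq, bool ctrl_vq,
                          int max_queue_pairs)
{
    assert(vq_idx >= 0 && max_queue_pairs >= 1);

    if (!mq && vq_idx == 2) {
        if (!ctrl_vq) {
            return -1;              /* index 2 without CTRL_VQ: ignore */
        }
        return max_queue_pairs;     /* ctrl vq client follows data pairs */
    }
    return vq_to_pair(vq_idx);
}
```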

Fixes: 22288fe ("virtio-net: vhost control virtqueue support")
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <1651890498-24478-3-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 68b0a6395f36a8f48f56f46d05f30be2067598b0)
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/virtio-net.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ec045c3f41d..f118379bb46 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -14,6 +14,7 @@
 #include "qemu/osdep.h"
 #include "qemu/atomic.h"
 #include "qemu/iov.h"
+#include "qemu/log.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
 #include "hw/virtio/virtio.h"
@@ -3163,8 +3164,22 @@ static NetClientInfo net_virtio_info = {
 static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
-    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
+    NetClientState *nc;
     assert(n->vhost_started);
+    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
+        /* Must guard against invalid features and bogus queue index
+         * from being set by malicious guest, or penetrated through
+         * buggy migration stream.
+         */
+        if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "%s: bogus vq index ignored\n", __func__);
+            return false;
+        }
+        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
+    } else {
+        nc = qemu_get_subqueue(n->nic, vq2q(idx));
+    }
     return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
 }
 
@@ -3172,8 +3187,22 @@ static void virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
                                            bool mask)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
-    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
+    NetClientState *nc;
     assert(n->vhost_started);
+    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
+        /* Must guard against invalid features and bogus queue index
+         * from being set by malicious guest, or penetrated through
+         * buggy migration stream.
+         */
+        if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "%s: bogus vq index ignored\n", __func__);
+            return;
+        }
+        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
+    } else {
+        nc = qemu_get_subqueue(n->nic, vq2q(idx));
+    }
     vhost_net_virtqueue_mask(get_vhost_net(nc->peer),
                              vdev, idx, mask);
 }
-- 
Gitee


From 111911f3a6d7b7530cf550e845b1cca1ac97b991 Mon Sep 17 00:00:00 2001
From: Si-Wei Liu <si-wei.liu@oracle.com>
Date: Fri, 6 May 2022 19:28:14 -0700
Subject: [PATCH 098/407] vhost-vdpa: fix improper cleanup in
 net_init_vhost_vdpa
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jason Wang <jasowang@redhat.com>
RH-MergeRequest: 187: Multiqueue fixes for vhost-vDPA
RH-Commit: [3/7] b3b658dcb4695defe1fdb199570fb984291e8e21
RH-Bugzilla: 2069946
RH-Acked-by: Eugenio Pérez <eperezma@redhat.com>
RH-Acked-by: Cindy Lu <lulu@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>

... so that no dangling net clients are leaked in case of
error.
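
The unwinding pattern can be sketched outside QEMU as below. All names
here (live[], client_del(), init_clients()) are hypothetical; the array
merely stands in for the net clients created before a failure.

```c
#include <assert.h>

#define MAX_CLIENTS 8

static int live[MAX_CLIENTS];   /* 1 while client i exists */

static void client_del(int i)
{
    live[i] = 0;
}

/* Create n clients; creation fails at index fail_at (-1: never). */
static int init_clients(int n, int fail_at)
{
    int i;

    assert(n >= 0 && n <= MAX_CLIENTS);
    for (i = 0; i < n; i++) {
        if (i == fail_at) {
            goto err;
        }
        live[i] = 1;
    }
    return 0;

err:
    /* Delete every client created so far, not only the first one. */
    for (i--; i >= 0; i--) {
        client_del(i);
    }
    return -1;
}

static int live_count(void)
{
    int i, n = 0;

    for (i = 0; i < MAX_CLIENTS; i++) {
        n += live[i];
    }
    return n;
}
```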

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <1651890498-24478-4-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 9bd055073e375c8a0d7ebce925e05d914d69fc7f)
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/vhost-vdpa.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 25dd6dd9754..814f7046875 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -306,7 +306,9 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 
 err:
     if (i) {
-        qemu_del_net_client(ncs[0]);
+        for (i--; i >= 0; i--) {
+            qemu_del_net_client(ncs[i]);
+        }
     }
     qemu_close(vdpa_device_fd);
     g_free(ncs);
-- 
Gitee


From 0c6e5700108efa4b0f767d32e1e714bf2d6a1077 Mon Sep 17 00:00:00 2001
From: Si-Wei Liu <si-wei.liu@oracle.com>
Date: Fri, 6 May 2022 19:28:15 -0700
Subject: [PATCH 099/407] vhost-net: fix improper cleanup in vhost_net_start
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jason Wang <jasowang@redhat.com>
RH-MergeRequest: 187: Multiqueue fixes for vhost-vDPA
RH-Commit: [4/7] bebe7990a12e901fbb84e5e4b7a62744d75c9d9e
RH-Bugzilla: 2069946
RH-Acked-by: Eugenio Pérez <eperezma@redhat.com>
RH-Acked-by: Cindy Lu <lulu@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>

vhost_net_start() missed a corresponding stop_one() upon error from
vhost_set_vring_enable(). While at it, make the error handling for
err_start more robust. No real issue was found due to this though.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <1651890498-24478-5-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 6f3910b5eee00b8cc959e94659c0d524c482a418)
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/vhost_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 30379d2ca41..d6d7c51f629 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -381,6 +381,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
             r = vhost_set_vring_enable(peer, peer->vring_enable);
 
             if (r < 0) {
+                vhost_net_stop_one(get_vhost_net(peer), dev);
                 goto err_start;
             }
         }
@@ -390,7 +391,8 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 
 err_start:
     while (--i >= 0) {
-        peer = qemu_get_peer(ncs , i);
+        peer = qemu_get_peer(ncs, i < data_queue_pairs ?
+                                  i : n->max_queue_pairs);
         vhost_net_stop_one(get_vhost_net(peer), dev);
     }
     e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
-- 
Gitee


From 4616fe6a6d703639925be0d69d2f066d70091ee9 Mon Sep 17 00:00:00 2001
From: Si-Wei Liu <si-wei.liu@oracle.com>
Date: Fri, 6 May 2022 19:28:16 -0700
Subject: [PATCH 100/407] vhost-vdpa: backend feature should set only once
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jason Wang <jasowang@redhat.com>
RH-MergeRequest: 187: Multiqueue fixes for vhost-vDPA
RH-Commit: [5/7] 0ab13542cf25c129dc403db95c7db12cdb012744
RH-Bugzilla: 2069946
RH-Acked-by: Eugenio Pérez <eperezma@redhat.com>
RH-Acked-by: Cindy Lu <lulu@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>

The vhost_vdpa_one_time_request() branch in
vhost_vdpa_set_backend_cap() incorrectly sends down
ioctls on vhost_dev with non-zero index. This may
end up with multiple VHOST_SET_BACKEND_FEATURES
ioctl calls sent down on the vhost-vdpa fd that is
shared between all these vhost_dev's.

To fix it, send down the ioctl only once via the first
vhost_dev with index 0. Toggling the polarity of the
vhost_vdpa_one_time_request() test should do the
trick.
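
A standalone sketch of the intended behaviour follows. struct dev,
first_dev(), set_backend_cap() and setup_all() are hypothetical
stand-ins, and a counter replaces the real ioctl on the shared fd.

```c
#include <assert.h>
#include <stdbool.h>

struct dev {
    int index;                  /* position among devs sharing one fd */
};

static int backend_feature_calls;   /* stands in for the ioctl */

static bool first_dev(const struct dev *d)
{
    return d->index == 0;
}

static void set_backend_cap(const struct dev *d)
{
    /* Only the dev with index 0 issues the one-time request. */
    if (first_dev(d)) {
        backend_feature_calls++;
    }
}

/* Run the setup for ndevs devices; return how many requests were sent. */
static int setup_all(int ndevs)
{
    int i;

    assert(ndevs >= 1);
    backend_feature_calls = 0;
    for (i = 0; i < ndevs; i++) {
        struct dev d = { .index = i };
        set_backend_cap(&d);
    }
    return backend_feature_calls;
}
```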

Fixes: 4d191cfdc7de ("vhost-vdpa: classify one time request")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <1651890498-24478-6-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 6aee7e4233f6467f69531fcd352adff028f3f5ea)
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 78da48a333f..a9be24776a7 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -525,7 +525,7 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
 
     features &= f;
 
-    if (vhost_vdpa_one_time_request(dev)) {
+    if (!vhost_vdpa_one_time_request(dev)) {
         r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
         if (r) {
             return -EFAULT;
-- 
Gitee


From ee98d5ee65af80bc622c65c430799edf09e0490a Mon Sep 17 00:00:00 2001
From: Si-Wei Liu <si-wei.liu@oracle.com>
Date: Fri, 6 May 2022 19:28:17 -0700
Subject: [PATCH 101/407] vhost-vdpa: change name and polarity for
 vhost_vdpa_one_time_request()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jason Wang <jasowang@redhat.com>
RH-MergeRequest: 187: Multiqueue fixes for vhost-vDPA
RH-Commit: [6/7] 727ab0bb813f073e8cd2f7e68a9acda60c2cb33d
RH-Bugzilla: 2069946
RH-Acked-by: Eugenio Pérez <eperezma@redhat.com>
RH-Acked-by: Cindy Lu <lulu@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>

The name vhost_vdpa_one_time_request() was confusing. Regardless
of what it returns, it has typically been used for requests that
only need to be applied once, and the name didn't suggest what it
actually checks for. Change it to vhost_vdpa_first_dev() with the
polarity flipped for better readability of the code. That way it
reflects what the check is really about.

This call is applicable to requests that perform an operation
only once, before queues are set up, and usually at the beginning
of the caller function. Document the requirement for it in place.

Conflicts: hw/virtio/vhost-vdpa.c since we don't have shadow virtqueue
support.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Message-Id: <1651890498-24478-7-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit d71b0609fc04217e28d17009f04d74b08be6f466)
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index a9be24776a7..38bbcb3c18e 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -319,11 +319,18 @@ static void vhost_vdpa_get_iova_range(struct vhost_vdpa *v)
                                     v->iova_range.last);
 }
 
-static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
+/*
+ * The use of this function is for requests that only need to be
+ * applied once. Typically such request occurs at the beginning
+ * of operation, and before setting up queues. It should not be
+ * used for request that performs operation until all queues are
+ * set, which would need to check dev->vq_index_end instead.
+ */
+static bool vhost_vdpa_first_dev(struct vhost_dev *dev)
 {
     struct vhost_vdpa *v = dev->opaque;
 
-    return v->index != 0;
+    return v->index == 0;
 }
 
 static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
@@ -351,7 +358,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
 
     vhost_vdpa_get_iova_range(v);
 
-    if (vhost_vdpa_one_time_request(dev)) {
+    if (!vhost_vdpa_first_dev(dev)) {
         return 0;
     }
 
@@ -468,7 +475,7 @@ static int vhost_vdpa_memslots_limit(struct vhost_dev *dev)
 static int vhost_vdpa_set_mem_table(struct vhost_dev *dev,
                                     struct vhost_memory *mem)
 {
-    if (vhost_vdpa_one_time_request(dev)) {
+    if (!vhost_vdpa_first_dev(dev)) {
         return 0;
     }
 
@@ -496,7 +503,7 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
 {
     int ret;
 
-    if (vhost_vdpa_one_time_request(dev)) {
+    if (!vhost_vdpa_first_dev(dev)) {
         return 0;
     }
 
@@ -525,7 +532,7 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
 
     features &= f;
 
-    if (!vhost_vdpa_one_time_request(dev)) {
+    if (vhost_vdpa_first_dev(dev)) {
         r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
         if (r) {
             return -EFAULT;
@@ -670,7 +677,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
                                      struct vhost_log *log)
 {
-    if (vhost_vdpa_one_time_request(dev)) {
+    if (!vhost_vdpa_first_dev(dev)) {
         return 0;
     }
 
@@ -739,7 +746,7 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev,
 
 static int vhost_vdpa_set_owner(struct vhost_dev *dev)
 {
-    if (vhost_vdpa_one_time_request(dev)) {
+    if (!vhost_vdpa_first_dev(dev)) {
         return 0;
     }
 
-- 
Gitee


From 077800a2e8942feafe09a5332b14195b3e792be2 Mon Sep 17 00:00:00 2001
From: Si-Wei Liu <si-wei.liu@oracle.com>
Date: Fri, 6 May 2022 19:28:18 -0700
Subject: [PATCH 102/407] virtio-net: don't handle mq request in userspace
 handler for vhost-vdpa
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jason Wang <jasowang@redhat.com>
RH-MergeRequest: 187: Multiqueue fixes for vhost-vDPA
RH-Commit: [7/7] 0e6684d12e42752deae8f5ebc56456fed174e0ed
RH-Bugzilla: 2069946
RH-Acked-by: Eugenio Pérez <eperezma@redhat.com>
RH-Acked-by: Cindy Lu <lulu@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>

virtio_queue_host_notifier_read() tends to read pending events
left behind on the ioeventfd in the vhost_net_stop() path, and
attempts to handle outstanding kicks from the userspace vq handler.
However, in the ctrl_vq handler, virtio_net_handle_mq() makes a
recursive call into virtio_net_set_status(), which may lead to a
segmentation fault as shown in the stack trace below:

0  0x000055f800df1780 in qdev_get_parent_bus (dev=0x0) at ../hw/core/qdev.c:376
1  0x000055f800c68ad8 in virtio_bus_device_iommu_enabled (vdev=vdev@entry=0x0) at ../hw/virtio/virtio-bus.c:331
2  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>) at ../hw/virtio/vhost.c:318
3  0x000055f800d70d7f in vhost_memory_unmap (dev=<optimized out>, buffer=0x7fc19bec5240, len=2052, is_write=1, access_len=2052) at ../hw/virtio/vhost.c:336
4  0x000055f800d71867 in vhost_virtqueue_stop (dev=dev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590, vq=0x55f8037cceb0, idx=0) at ../hw/virtio/vhost.c:1241
5  0x000055f800d7406c in vhost_dev_stop (hdev=hdev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590) at ../hw/virtio/vhost.c:1839
6  0x000055f800bf00a7 in vhost_net_stop_one (net=0x55f8037ccc30, dev=0x55f8044ec590) at ../hw/net/vhost_net.c:315
7  0x000055f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590, ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7, cvq=cvq@entry=1)
   at ../hw/net/vhost_net.c:423
8  0x000055f800d4e628 in virtio_net_set_status (status=<optimized out>, n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
9  0x000055f800d4e628 in virtio_net_set_status (vdev=vdev@entry=0x55f8044ec590, status=15 '\017') at ../hw/net/virtio-net.c:370
10 0x000055f800d534d8 in virtio_net_handle_ctrl (iov_cnt=<optimized out>, iov=<optimized out>, cmd=0 '\000', n=0x55f8044ec590) at ../hw/net/virtio-net.c:1408
11 0x000055f800d534d8 in virtio_net_handle_ctrl (vdev=0x55f8044ec590, vq=0x7fc1a7e888d0) at ../hw/net/virtio-net.c:1452
12 0x000055f800d69f37 in virtio_queue_host_notifier_read (vq=0x7fc1a7e888d0) at ../hw/virtio/virtio.c:2331
13 0x000055f800d69f37 in virtio_queue_host_notifier_read (n=n@entry=0x7fc1a7e8894c) at ../hw/virtio/virtio.c:3575
14 0x000055f800c688e6 in virtio_bus_cleanup_host_notifier (bus=<optimized out>, n=n@entry=14) at ../hw/virtio/virtio-bus.c:312
15 0x000055f800d73106 in vhost_dev_disable_notifiers (hdev=hdev@entry=0x55f8035b51b0, vdev=vdev@entry=0x55f8044ec590)
   at ../../../include/hw/virtio/virtio-bus.h:35
16 0x000055f800bf00b2 in vhost_net_stop_one (net=0x55f8035b51b0, dev=0x55f8044ec590) at ../hw/net/vhost_net.c:316
17 0x000055f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590, ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7, cvq=cvq@entry=1)
   at ../hw/net/vhost_net.c:423
18 0x000055f800d4e628 in virtio_net_set_status (status=<optimized out>, n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
19 0x000055f800d4e628 in virtio_net_set_status (vdev=0x55f8044ec590, status=15 '\017') at ../hw/net/virtio-net.c:370
20 0x000055f800d6c4b2 in virtio_set_status (vdev=0x55f8044ec590, val=<optimized out>) at ../hw/virtio/virtio.c:1945
21 0x000055f800d11d9d in vm_state_notify (running=running@entry=false, state=state@entry=RUN_STATE_SHUTDOWN) at ../softmmu/runstate.c:333
22 0x000055f800d04e7a in do_vm_stop (state=state@entry=RUN_STATE_SHUTDOWN, send_stop=send_stop@entry=false) at ../softmmu/cpus.c:262
23 0x000055f800d04e99 in vm_shutdown () at ../softmmu/cpus.c:280
24 0x000055f800d126af in qemu_cleanup () at ../softmmu/runstate.c:812
25 0x000055f800ad5b13 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:51

For now, temporarily disable handling of MQ requests from the ctrl_vq
userspace handler to avoid the recursive virtio_net_set_status()
call. Some rework is needed to allow changing the number of
queues without going through a full virtio_net_set_status cycle,
particularly for the vhost-vdpa backend.

This patch will need to be reverted as soon as future patches that
handle the change of #queues in userspace are merged.

Fixes: 402378407db ("vhost-vdpa: multiqueue support")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <1651890498-24478-8-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 2a7888cc3aa31faee839fa5dddad354ff8941f4c)
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/virtio-net.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f118379bb46..7e172ef8297 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1373,6 +1373,7 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(n);
     uint16_t queue_pairs;
+    NetClientState *nc = qemu_get_queue(n->nic);
 
     virtio_net_disable_rss(n);
     if (cmd == VIRTIO_NET_CTRL_MQ_HASH_CONFIG) {
@@ -1404,6 +1405,18 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
         return VIRTIO_NET_ERR;
     }
 
+    /* Avoid changing the number of queue_pairs for vdpa device in
+     * userspace handler. A future fix is needed to handle the mq
+     * change in userspace handler with vhost-vdpa. Let's disable
+     * the mq handling from userspace for now and only allow get
+     * done through the kernel. Ripples may be seen when falling
+     * back to userspace, but without doing it qemu process would
+     * crash on a recursive entry to virtio_net_set_status().
+     */
+    if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+        return VIRTIO_NET_ERR;
+    }
+
     n->curr_queue_pairs = queue_pairs;
     /* stop the backend before changing the number of queue_pairs to avoid handling a
      * disabled queue */
-- 
Gitee


From bca5cf1cc30db554c1d38a79380fdb64b36eb468 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Thu, 20 Jan 2022 15:22:59 +0100
Subject: [PATCH 103/407] ide: Increment BB in-flight counter for TRIM BH
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 188: ide: Increment BB in-flight counter for TRIM BH
RH-Commit: [1/1] 1e702e735ff63f2b8b69c20cac1b309dd085cd62
RH-Bugzilla: 2029980
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

When we still have an AIOCB registered for DMA operations, we try to
settle the respective operation by draining the BlockBackend associated
with the IDE device.

However, this assumes that every DMA operation is associated with an
increment of the BlockBackend’s in-flight counter (e.g. through some
ongoing I/O operation), so that draining the BB until its in-flight
counter reaches 0 will settle all DMA operations.  That is not the case:
For TRIM, the guest can issue a zero-length operation that will not
result in any I/O operation forwarded to the BlockBackend, and also not
increment the in-flight counter in any other way.  In such a case,
blk_drain() will be a no-op if no other operations are in flight.

It is clear that if blk_drain() is a no-op, the value of
s->bus->dma->aiocb will not change between checking it in the `if`
condition and asserting that it is NULL after blk_drain().

The particular problem is that ide_issue_trim() creates a BH
(ide_trim_bh_cb()) to settle the TRIM request: iocb->common.cb() is
ide_dma_cb(), which will either create a new request, or find the
transfer to be done and call ide_set_inactive(), which clears
s->bus->dma->aiocb.  Therefore, the blk_drain() must wait for
ide_trim_bh_cb() to run, which currently it will not always do.

To fix this issue, we increment the BlockBackend's in-flight counter
when the TRIM operation begins (in ide_issue_trim(), when the
ide_trim_bh_cb() BH is created) and decrement it when ide_trim_bh_cb()
is done.
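
The pairing can be modelled with a toy state machine. This is a
simplified sketch, not the block layer: struct backend and the
functions below are hypothetical, and drain() simply runs the pending
BH until the in-flight counter reaches 0.

```c
#include <assert.h>
#include <stdbool.h>

struct backend {
    int in_flight;      /* models the BlockBackend in-flight counter */
    bool bh_pending;    /* models the scheduled TRIM BH */
    bool aiocb_set;     /* models s->bus->dma->aiocb != NULL */
};

static void issue_trim(struct backend *b)
{
    b->in_flight++;         /* paired with the decrement in trim_bh() */
    b->aiocb_set = true;
    b->bh_pending = true;
}

static void trim_bh(struct backend *b)
{
    b->aiocb_set = false;   /* the completion callback settles the DMA */
    b->bh_pending = false;
    b->in_flight--;         /* paired with the increment in issue_trim() */
}

static void drain(struct backend *b)
{
    while (b->in_flight > 0) {
        assert(b->bh_pending);  /* in this model only the BH is pending */
        trim_bh(b);
    }
}

/* A zero-length TRIM followed by a drain leaves no aiocb behind. */
static bool trim_then_drain_settles(void)
{
    struct backend b = { 0, false, false };

    issue_trim(&b);
    drain(&b);
    return !b.aiocb_set && b.in_flight == 0;
}
```

Without the increment in issue_trim(), drain() in this model would
return immediately and aiocb_set would still be true afterwards.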

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2029980
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220120142259.120189-1-hreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Tested-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 7e5cdb345f77d76cb4877fe6230c4e17a7d0d0ca)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 hw/ide/core.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index e28f8aad611..15138225be4 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -433,12 +433,16 @@ static const AIOCBInfo trim_aiocb_info = {
 static void ide_trim_bh_cb(void *opaque)
 {
     TrimAIOCB *iocb = opaque;
+    BlockBackend *blk = iocb->s->blk;
 
     iocb->common.cb(iocb->common.opaque, iocb->ret);
 
     qemu_bh_delete(iocb->bh);
     iocb->bh = NULL;
     qemu_aio_unref(iocb);
+
+    /* Paired with an increment in ide_issue_trim() */
+    blk_dec_in_flight(blk);
 }
 
 static void ide_issue_trim_cb(void *opaque, int ret)
@@ -508,6 +512,9 @@ BlockAIOCB *ide_issue_trim(
     IDEState *s = opaque;
     TrimAIOCB *iocb;
 
+    /* Paired with a decrement in ide_trim_bh_cb() */
+    blk_inc_in_flight(s->blk);
+
     iocb = blk_aio_get(&trim_aiocb_info, s->blk, cb, cb_opaque);
     iocb->s = s;
     iocb->bh = qemu_bh_new(ide_trim_bh_cb, iocb);
-- 
Gitee


From 3f1c0fc3b6b79ba41abb8f3f263db982e5fd1b74 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Wed, 16 Feb 2022 11:53:53 +0100
Subject: [PATCH 104/407] block: Make bdrv_refresh_limits() non-recursive
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 189: block: Make bdrv_refresh_limits() non-recursive
RH-Commit: [1/3] 1a1fe37f8d8f0344dd8639d6cc9d884d1aff9096
RH-Bugzilla: 2072932
RH-Acked-by: Eric Blake <eblake@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>

bdrv_refresh_limits() recurses down to the node's children.  That does
not seem necessary: When we refresh limits on some node, and then
recurse down and were to change one of its children's BlockLimits, then
that would mean we noticed the changed limits by pure chance.  The fact
that we refresh the parent's limits has nothing to do with it, so the
reason for the change probably happened before this point in time, and
we should have refreshed the limits then.

Consequently, we should actually propagate block limits changes upwards,
not downwards.  That is a separate and pre-existing issue, though, and
so will not be addressed in this patch.

The problem with recursing is that bdrv_refresh_limits() is not atomic.
It begins with zeroing BDS.bl, and only then sets proper, valid limits.
If we do not drain all nodes whose limits are refreshed, then concurrent
I/O requests can encounter invalid request_alignment values and crash
qemu.  Therefore, a recursing bdrv_refresh_limits() requires the whole
subtree to be drained, which is currently not ensured by most callers.

A non-recursive bdrv_refresh_limits() only requires the node in question
to not receive I/O requests, and this is done by most callers in some
way or another:
- bdrv_open_driver() deals with a new node with no parents yet
- bdrv_set_file_or_backing_noperm() acts on a drained node
- bdrv_reopen_commit() acts only on drained nodes
- bdrv_append() should in theory require the node to be drained; in
  practice most callers just lock the AioContext, which should at least
  be enough to prevent concurrent I/O requests from accessing invalid
  limits

So we can resolve the bug by making bdrv_refresh_limits() non-recursive.
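
The window described above can be modelled in a few lines. What follows
is only an illustrative Python sketch, not QEMU code: the `Node` class,
`refresh_limits_steps()` and `align_down()` are hypothetical stand-ins
for a BDS, for bdrv_refresh_limits(), and for an I/O path that relies on
request_alignment.

```python
from typing import Iterator, List, Tuple, Union


class Node:
    """Hypothetical stand-in for a block node and its limits."""

    def __init__(self, request_alignment: int) -> None:
        self.request_alignment = request_alignment


def refresh_limits_steps(node: Node, new_alignment: int) -> Iterator[str]:
    """Model a non-atomic refresh: zero the limits, then set them.

    Each yield marks a point where concurrent I/O could run if the
    node is not drained.
    """
    node.request_alignment = 0          # limits are zeroed first
    yield "zeroed"
    node.request_alignment = new_alignment
    yield "valid"


def align_down(node: Node, offset: int) -> int:
    """Model an I/O path that relies on a non-zero alignment."""
    return offset - (offset % node.request_alignment)


def io_during_refresh(node: Node) -> List[Tuple[str, Union[int, str]]]:
    """Issue one request at every point of the refresh and record it."""
    results: List[Tuple[str, Union[int, str]]] = []
    for state in refresh_limits_steps(node, 512):
        try:
            results.append((state, align_down(node, 1000)))
        except ZeroDivisionError:
            results.append((state, "crash"))
    return results
```

In this model, a request that lands in the "zeroed" window divides by
zero, while one issued after the refresh completes sees valid limits.
Keeping I/O away from the node for the duration of the refresh closes
that window.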

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1879437
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20220216105355.30729-2-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 4d378bbd831bdd2f6e6adcd4ea5b77b6effaa627)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block/io.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index 4e4cb556c58..c3e7301613a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -189,10 +189,6 @@ void bdrv_refresh_limits(BlockDriverState *bs, Transaction *tran, Error **errp)
     QLIST_FOREACH(c, &bs->children, next) {
         if (c->role & (BDRV_CHILD_DATA | BDRV_CHILD_FILTERED | BDRV_CHILD_COW))
         {
-            bdrv_refresh_limits(c->bs, tran, errp);
-            if (*errp) {
-                return;
-            }
             bdrv_merge_limits(&bs->bl, &c->bs->bl);
             have_limits = true;
         }
-- 
Gitee


From 7861e27ba53ea978f4d4659866378dd78f38ed8c Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Wed, 16 Feb 2022 11:53:54 +0100
Subject: [PATCH 105/407] iotests: Allow using QMP with the QSD

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 189: block: Make bdrv_refresh_limits() non-recursive
RH-Commit: [2/3] 55bee4690a2e02d3be9f2bd68f2d244d0a36743b
RH-Bugzilla: 2072932
RH-Acked-by: Eric Blake <eblake@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>

Add a parameter to optionally open a QMP connection when creating a
QemuStorageDaemon instance.
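
The ordering that matters is that the test harness listens on the
socket before the daemon is started, and only accepts the connection
afterwards. A minimal Python sketch of that pattern with plain sockets
follows. It does not use the iotests classes; `fake_daemon_connect()`,
the socket file name and the empty greeting payload are hypothetical
stand-ins for the storage daemon connecting to its chardev socket.

```python
import json
import os
import socket
import tempfile


def fake_daemon_connect(path: str) -> socket.socket:
    """Hypothetical stand-in for the daemon: connect and send a greeting."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)
    # Placeholder payload; a real greeting carries more fields.
    client.sendall(json.dumps({"QMP": {}}).encode() + b"\n")
    return client


def listen_then_accept() -> dict:
    """Listen first, start the peer, then accept and read one message."""
    with tempfile.TemporaryDirectory() as sock_dir:
        path = os.path.join(sock_dir, "qsd-a.sock")
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)                     # must exist before the peer starts
        client = fake_daemon_connect(path)   # waits in the listen backlog
        conn, _ = server.accept()
        with conn, client, server, conn.makefile("rb") as reader:
            line = reader.readline()
        return json.loads(line)
```

If the listener were created only after the peer had been started, the
peer's connect could fail because nothing is listening yet, which is why
the accept is deferred but the listen is not.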

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220216105355.30729-3-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit ec88eed8d14088b36a3495710368b8d1a3c33420)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tests/qemu-iotests/iotests.py | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index a51b5ce8cd8..2ef493755c6 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -38,6 +38,7 @@
 
 from qemu.machine import qtest
 from qemu.qmp import QMPMessage
+from qemu.aqmp.legacy import QEMUMonitorProtocol
 
 # Use this logger for logging messages directly from the iotests module
 logger = logging.getLogger('qemu.iotests')
@@ -315,14 +316,30 @@ def cmd(self, cmd):
 
 
 class QemuStorageDaemon:
-    def __init__(self, *args: str, instance_id: str = 'a'):
+    _qmp: Optional[QEMUMonitorProtocol] = None
+    _qmpsock: Optional[str] = None
+    # Python < 3.8 would complain if this type were not a string literal
+    # (importing `annotations` from `__future__` would work; but not on <= 3.6)
+    _p: 'Optional[subprocess.Popen[bytes]]' = None
+
+    def __init__(self, *args: str, instance_id: str = 'a', qmp: bool = False):
         assert '--pidfile' not in args
         self.pidfile = os.path.join(test_dir, f'qsd-{instance_id}-pid')
         all_args = [qsd_prog] + list(args) + ['--pidfile', self.pidfile]
 
+        if qmp:
+            self._qmpsock = os.path.join(sock_dir, f'qsd-{instance_id}.sock')
+            all_args += ['--chardev',
+                         f'socket,id=qmp-sock,path={self._qmpsock}',
+                         '--monitor', 'qmp-sock']
+
+            self._qmp = QEMUMonitorProtocol(self._qmpsock, server=True)
+
         # Cannot use with here, we want the subprocess to stay around
         # pylint: disable=consider-using-with
         self._p = subprocess.Popen(all_args)
+        if self._qmp is not None:
+            self._qmp.accept()
         while not os.path.exists(self.pidfile):
             if self._p.poll() is not None:
                 cmd = ' '.join(all_args)
@@ -337,11 +354,24 @@ def __init__(self, *args: str, instance_id: str = 'a'):
 
         assert self._pid == self._p.pid
 
+    def qmp(self, cmd: str, args: Optional[Dict[str, object]] = None) \
+            -> QMPMessage:
+        assert self._qmp is not None
+        return self._qmp.cmd(cmd, args)
+
     def stop(self, kill_signal=15):
         self._p.send_signal(kill_signal)
         self._p.wait()
         self._p = None
 
+        if self._qmp:
+            self._qmp.close()
+
+        if self._qmpsock is not None:
+            try:
+                os.remove(self._qmpsock)
+            except OSError:
+                pass
         try:
             os.remove(self.pidfile)
         except OSError:
-- 
Gitee


From 58e86bda4789baad4f5cb7abec466e68d9b79e1a Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Wed, 16 Feb 2022 11:53:55 +0100
Subject: [PATCH 106/407] iotests/graph-changes-while-io: New test

RH-Author: Hanna Reitz <hreitz@redhat.com>
RH-MergeRequest: 189: block: Make bdrv_refresh_limits() non-recursive
RH-Commit: [3/3] b9dffe09bef6cf9b2f0aad69b327ea1df92e847a
RH-Bugzilla: 2072932
RH-Acked-by: Eric Blake <eblake@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>

Test the following scenario:
1. Some block node (null-co) attached to a user (here: NBD server) that
   performs I/O and keeps the node in an I/O thread
2. Repeatedly run blockdev-add/blockdev-del to add/remove an overlay
   to/from that node

Each blockdev-add triggers bdrv_refresh_limits(), and because
blockdev-add runs in the main thread, it does not stop the I/O requests.
I/O can thus happen while the limits are refreshed, and when such a
request sees a temporarily invalid block limit (e.g. alignment is 0),
this may easily crash qemu (or the storage daemon in this case).

The block layer needs to ensure that I/O requests to a node are paused
while that node's BlockLimits are refreshed.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20220216105355.30729-4-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 971bea8089531af56b1bbd9ce62e756bdf006711)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 .../qemu-iotests/tests/graph-changes-while-io | 91 +++++++++++++++++++
 .../tests/graph-changes-while-io.out          |  5 +
 2 files changed, 96 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/graph-changes-while-io
 create mode 100644 tests/qemu-iotests/tests/graph-changes-while-io.out

diff --git a/tests/qemu-iotests/tests/graph-changes-while-io b/tests/qemu-iotests/tests/graph-changes-while-io
new file mode 100755
index 00000000000..567e8cf21e9
--- /dev/null
+++ b/tests/qemu-iotests/tests/graph-changes-while-io
@@ -0,0 +1,91 @@
+#!/usr/bin/env python3
+# group: rw
+#
+# Test graph changes while I/O is happening
+#
+# Copyright (C) 2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+from threading import Thread
+import iotests
+from iotests import imgfmt, qemu_img, qemu_img_create, QMPTestCase, \
+        QemuStorageDaemon
+
+
+top = os.path.join(iotests.test_dir, 'top.img')
+nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock')
+
+
+def do_qemu_img_bench() -> None:
+    """
+    Do some I/O requests on `nbd_sock`.
+    """
+    assert qemu_img('bench', '-f', 'raw', '-c', '2000000',
+                    f'nbd+unix:///node0?socket={nbd_sock}') == 0
+
+
+class TestGraphChangesWhileIO(QMPTestCase):
+    def setUp(self) -> None:
+        # Create an overlay that can be added at runtime on top of the
+        # null-co block node that will receive I/O
+        assert qemu_img_create('-f', imgfmt, '-F', 'raw', '-b', 'null-co://',
+                               top) == 0
+
+        # QSD instance with a null-co block node in an I/O thread,
+        # exported over NBD (on `nbd_sock`, export name "node0")
+        self.qsd = QemuStorageDaemon(
+            '--object', 'iothread,id=iothread0',
+            '--blockdev', 'null-co,node-name=node0,read-zeroes=true',
+            '--nbd-server', f'addr.type=unix,addr.path={nbd_sock}',
+            '--export', 'nbd,id=exp0,node-name=node0,iothread=iothread0,' +
+                        'fixed-iothread=true,writable=true',
+            qmp=True
+        )
+
+    def tearDown(self) -> None:
+        self.qsd.stop()
+
+    def test_blockdev_add_while_io(self) -> None:
+        # Run qemu-img bench in the background
+        bench_thr = Thread(target=do_qemu_img_bench)
+        bench_thr.start()
+
+        # While qemu-img bench is running, repeatedly add and remove an
+        # overlay to/from node0
+        while bench_thr.is_alive():
+            result = self.qsd.qmp('blockdev-add', {
+                'driver': imgfmt,
+                'node-name': 'overlay',
+                'backing': 'node0',
+                'file': {
+                    'driver': 'file',
+                    'filename': top
+                }
+            })
+            self.assert_qmp(result, 'return', {})
+
+            result = self.qsd.qmp('blockdev-del', {
+                'node-name': 'overlay'
+            })
+            self.assert_qmp(result, 'return', {})
+
+        bench_thr.join()
+
+if __name__ == '__main__':
+    # Format must support raw backing files
+    iotests.main(supported_fmts=['qcow', 'qcow2', 'qed'],
+                 supported_protocols=['file'])
diff --git a/tests/qemu-iotests/tests/graph-changes-while-io.out b/tests/qemu-iotests/tests/graph-changes-while-io.out
new file mode 100644
index 00000000000..ae1213e6f86
--- /dev/null
+++ b/tests/qemu-iotests/tests/graph-changes-while-io.out
@@ -0,0 +1,5 @@
+.
+----------------------------------------------------------------------
+Ran 1 tests
+
+OK
-- 
Gitee


From 2fc49b1471908a53d7057ec4f578124715b35245 Mon Sep 17 00:00:00 2001
From: Vivek Goyal <vgoyal@redhat.com>
Date: Tue, 8 Feb 2022 15:48:04 -0500
Subject: [PATCH 107/407] virtiofsd: Fix breakage due to fuse_init_in size
 change

RH-Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-MergeRequest: 193: virtiofsd: Fix breakage due to fuse_init_in size change
RH-Commit: [1/1] 5809db034f9361fb462181d71e7cdde1324f8e54
RH-Bugzilla: 2097209
RH-Acked-by: German Maglione <None>
RH-Acked-by: Laszlo Ersek <lersek@redhat.com>
RH-Acked-by: Vivek Goyal <None>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

Kernel version 5.17 has increased the size of "struct fuse_init_in".
Previously this struct was 16 bytes and now it has been extended to
64 bytes in size.

Once qemu headers are updated to latest, virtiofsd will expect to receive
a 64 byte struct (for protocol version major 7 and minor >= 6). But if the
guest is booting an older kernel (older than 5.17), then it still sends the
older fuse_init_in of size 16 bytes. do_init() then fails because it expects
a 64 byte struct, and as a result mounting virtiofs fails.

Fix this by parsing only 16 bytes for now. Separate patches will be
posted which will parse the rest of the bytes and enable new functionality.
Right now we don't support any of the new functionality, so we don't
lose anything by not parsing bytes beyond 16.
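
The size arithmetic can be checked with a short Python sketch. The field
layout below (four 32-bit fields in the old struct, followed by `flags2`
and eleven unused 32-bit words in the extended one) is stated as an
assumption that matches the 16 and 64 byte sizes given above, and
`parse_init_in()` is a hypothetical helper, not virtiofsd code.

```python
import struct

# Assumed layouts: 32-bit fields, native byte order, no padding.
OLD_INIT_IN = struct.Struct("=4I")    # major, minor, max_readahead, flags
NEW_INIT_IN = struct.Struct("=16I")   # ... plus flags2 and unused[11]

COMPAT_SIZE = 2 * 4                   # offsetof(max_readahead)
COMPAT2_SIZE = 3 * 4 + 4              # offsetof(flags) + sizeof(uint32_t)


def parse_init_in(buf: bytes) -> dict:
    """Parse FUSE_INIT input, consuming at most COMPAT2_SIZE bytes."""
    if len(buf) < COMPAT_SIZE:
        raise ValueError("short FUSE_INIT message")
    major, minor = struct.unpack_from("=2I", buf, 0)
    result = {"major": major, "minor": minor}
    if major == 7 and minor >= 6:
        if len(buf) < COMPAT2_SIZE:
            raise ValueError("short FUSE_INIT message")
        readahead, flags = struct.unpack_from("=2I", buf, COMPAT_SIZE)
        result.update(max_readahead=readahead, flags=flags)
    return result
```

Because only the first 16 bytes are consumed, a 16 byte message from an
older guest and a 64 byte message from a newer one parse to the same
result in this model.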

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-2-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit a086d54c6ffa38f7e71f182b63a25315304a3392)
---
 tools/virtiofsd/fuse_lowlevel.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index e4679c73abc..5d431a7038c 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1880,6 +1880,8 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
                     struct fuse_mbuf_iter *iter)
 {
     size_t compat_size = offsetof(struct fuse_init_in, max_readahead);
+    size_t compat2_size = offsetof(struct fuse_init_in, flags) +
+                              sizeof(uint32_t);
     struct fuse_init_in *arg;
     struct fuse_init_out outarg;
     struct fuse_session *se = req->se;
@@ -1897,7 +1899,7 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
 
     /* ...and now consume the new fields. */
     if (arg->major == 7 && arg->minor >= 6) {
-        if (!fuse_mbuf_iter_advance(iter, sizeof(*arg) - compat_size)) {
+        if (!fuse_mbuf_iter_advance(iter, compat2_size - compat_size)) {
             fuse_reply_err(req, EINVAL);
             return;
         }
-- 
Gitee


From 2640bf431c7006b3dfd757b5b98732b808c75dbe Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= <philmd@redhat.com>
Date: Thu, 18 Nov 2021 12:57:32 +0100
Subject: [PATCH 108/407] hw/block/fdc: Prevent end-of-track overrun
 (CVE-2021-3507)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 194: hw/block/fdc: Prevent end-of-track overrun (CVE-2021-3507)
RH-Commit: [1/2] 31fa0351382b4ca5bd989b09e4d811ae73040673 (jmaloy/qemu-kvm)
RH-Bugzilla: 1951521
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Hanna Reitz <hreitz@redhat.com>

Per the 82078 datasheet, if the end-of-track (EOT byte in
the FIFO) is more than the number of sectors per side, the
command is terminated unsuccessfully:

* 5.2.5 DATA TRANSFER TERMINATION

  The 82078 supports terminal count explicitly through
  the TC pin and implicitly through the underrun/overrun
  and end-of-track (EOT) functions. For full sector
  transfers, the EOT parameter can define the last
  sector to be transferred in a single or multisector
  transfer. If the last sector to be transferred is a partial
  sector, the host can stop transferring the data in
  mid-sector, and the 82078 will continue to complete
  the sector as if a hardware TC was received. The
  only difference between these implicit functions and
  TC is that they return "abnormal termination" result
  status. Such status indications can be ignored if they
  were expected.

* 6.1.3 READ TRACK

  This command terminates when the EOT specified
  number of sectors have been read. If the 82078
  does not find an ID Address Mark on the diskette
  after the second occurrence of a pulse on the
  INDX# pin, then it sets the IC code in Status Register 0
  to "01" (Abnormal termination), sets the MA bit
  in Status Register 1 to "1", and terminates the command.

* 6.1.6 VERIFY

  Refer to Table 6-6 and Table 6-7 for information
  concerning the values of MT and EC versus SC and
  EOT value.

* Table 6-6. Result Phase Table

* Table 6-7. Verify Command Result Phase Table

Fix by aborting the transfer when EOT > # Sectors Per Side.
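
As an illustration of why a negative sector count has to abort the
transfer, here is a small Python model of the length computation. It is
a sketch, not the device code: `transfer_len()` is a hypothetical
helper, the multi-track adjustment is left out, and treating the
resulting length as an unsigned 32-bit value is an assumption.

```python
from typing import Optional


def transfer_len(size_code: int, eot: int, ks: int,
                 guard: bool = True) -> Optional[int]:
    """Model the transfer length for a given EOT and start sector.

    size_code -- sector size code (sector size is 128 << code, capped at 7)
    eot       -- end-of-track value supplied by the guest
    ks        -- sector at which the transfer starts
    Returns None when the guarded path aborts the command.
    """
    sector_size = 128 << min(size_code, 7)
    sectors = eot - ks + 1
    if guard and sectors < 0:
        return None                       # abnormal termination
    # Without the guard, the negative count wraps in unsigned arithmetic.
    return (sector_size * sectors) & 0xFFFFFFFF
```

With the guard, a command whose EOT lies before the start sector is
rejected. Without it, the model produces a length close to 4 GiB, far
larger than the 512-byte FIFO buffer that the data is copied from.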

Cc: qemu-stable@nongnu.org
Cc: Hervé Poussineau <hpoussin@reactos.org>
Fixes: baca51faff0 ("floppy driver: disk geometry auto detect")
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/339
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211118115733.4038610-2-philmd@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit defac5e2fbddf8423a354ff0454283a2115e1367)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/block/fdc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 97fa6de4239..755a26c114f 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -1531,6 +1531,14 @@ static void fdctrl_start_transfer(FDCtrl *fdctrl, int direction)
         int tmp;
         fdctrl->data_len = 128 << (fdctrl->fifo[5] > 7 ? 7 : fdctrl->fifo[5]);
         tmp = (fdctrl->fifo[6] - ks + 1);
+        if (tmp < 0) {
+            FLOPPY_DPRINTF("invalid EOT: %d\n", tmp);
+            fdctrl_stop_transfer(fdctrl, FD_SR0_ABNTERM, FD_SR1_MA, 0x00);
+            fdctrl->fifo[3] = kt;
+            fdctrl->fifo[4] = kh;
+            fdctrl->fifo[5] = ks;
+            return;
+        }
         if (fdctrl->fifo[0] & 0x80)
             tmp += fdctrl->fifo[6];
         fdctrl->data_len *= tmp;
-- 
Gitee


From 3e23d7d0964f08a36064b3bc0c6ed37e8644c5fd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= <philmd@redhat.com>
Date: Thu, 18 Nov 2021 12:57:33 +0100
Subject: [PATCH 109/407] tests/qtest/fdc-test: Add a regression test for
 CVE-2021-3507
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 194: hw/block/fdc: Prevent end-of-track overrun (CVE-2021-3507)
RH-Commit: [2/2] 31ec71276b521b06d4142fffa88a3fa4d1494d92 (jmaloy/qemu-kvm)
RH-Bugzilla: 1951521
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Hanna Reitz <hreitz@redhat.com>

Add the reproducer from https://gitlab.com/qemu-project/qemu/-/issues/339

Without the previous commit, when running 'make check-qtest-i386'
with QEMU configured with '--enable-sanitizers' we get:

  ==4028352==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x619000062a00 at pc 0x5626d03c491a bp 0x7ffdb4199410 sp 0x7ffdb4198bc0
  READ of size 786432 at 0x619000062a00 thread T0
      #0 0x5626d03c4919 in __asan_memcpy (qemu-system-i386+0x1e65919)
      #1 0x5626d1c023cc in flatview_write_continue softmmu/physmem.c:2787:13
      #2 0x5626d1bf0c0f in flatview_write softmmu/physmem.c:2822:14
      #3 0x5626d1bf0798 in address_space_write softmmu/physmem.c:2914:18
      #4 0x5626d1bf0f37 in address_space_rw softmmu/physmem.c:2924:16
      #5 0x5626d1bf14c8 in cpu_physical_memory_rw softmmu/physmem.c:2933:5
      #6 0x5626d0bd5649 in cpu_physical_memory_write include/exec/cpu-common.h:82:5
      #7 0x5626d0bd0a07 in i8257_dma_write_memory hw/dma/i8257.c:452:9
      #8 0x5626d09f825d in fdctrl_transfer_handler hw/block/fdc.c:1616:13
      #9 0x5626d0a048b4 in fdctrl_start_transfer hw/block/fdc.c:1539:13
      #10 0x5626d09f4c3e in fdctrl_write_data hw/block/fdc.c:2266:13
      #11 0x5626d09f22f7 in fdctrl_write hw/block/fdc.c:829:9
      #12 0x5626d1c20bc5 in portio_write softmmu/ioport.c:207:17

  0x619000062a00 is located 0 bytes to the right of 512-byte region [0x619000062800,0x619000062a00)
  allocated by thread T0 here:
      #0 0x5626d03c66ec in posix_memalign (qemu-system-i386+0x1e676ec)
      #1 0x5626d2b988d4 in qemu_try_memalign util/oslib-posix.c:210:11
      #2 0x5626d2b98b0c in qemu_memalign util/oslib-posix.c:226:27
      #3 0x5626d09fbaf0 in fdctrl_realize_common hw/block/fdc.c:2341:20
      #4 0x5626d0a150ed in isabus_fdc_realize hw/block/fdc-isa.c:113:5
      #5 0x5626d2367935 in device_set_realized hw/core/qdev.c:531:13

  SUMMARY: AddressSanitizer: heap-buffer-overflow (qemu-system-i386+0x1e65919) in __asan_memcpy
  Shadow bytes around the buggy address:
    0x0c32800044f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c3280004510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c3280004520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c3280004530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  =>0x0c3280004540:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004550: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004560: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004570: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004590: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Heap left redzone:       fa
    Freed heap region:       fd
  ==4028352==ABORTING

[ kwolf: Added snapshot=on to prevent write file lock failure ]

Reported-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 46609b90d9e3a6304def11038a76b58ff43f77bc)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 tests/qtest/fdc-test.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tests/qtest/fdc-test.c b/tests/qtest/fdc-test.c
index 8f6eee84a47..6f5850354fd 100644
--- a/tests/qtest/fdc-test.c
+++ b/tests/qtest/fdc-test.c
@@ -583,6 +583,26 @@ static void test_cve_2021_20196(void)
     qtest_quit(s);
 }
 
+static void test_cve_2021_3507(void)
+{
+    QTestState *s;
+
+    s = qtest_initf("-nographic -m 32M -nodefaults "
+                    "-drive file=%s,format=raw,if=floppy,snapshot=on",
+                    test_image);
+    qtest_outl(s, 0x9, 0x0a0206);
+    qtest_outw(s, 0x3f4, 0x1600);
+    qtest_outw(s, 0x3f4, 0x0000);
+    qtest_outw(s, 0x3f4, 0x0000);
+    qtest_outw(s, 0x3f4, 0x0000);
+    qtest_outw(s, 0x3f4, 0x0200);
+    qtest_outw(s, 0x3f4, 0x0200);
+    qtest_outw(s, 0x3f4, 0x0000);
+    qtest_outw(s, 0x3f4, 0x0000);
+    qtest_outw(s, 0x3f4, 0x0000);
+    qtest_quit(s);
+}
+
 int main(int argc, char **argv)
 {
     int fd;
@@ -614,6 +634,7 @@ int main(int argc, char **argv)
     qtest_add_func("/fdc/read_no_dma_19", test_read_no_dma_19);
     qtest_add_func("/fdc/fuzz-registers", fuzz_registers);
     qtest_add_func("/fdc/fuzz/cve_2021_20196", test_cve_2021_20196);
+    qtest_add_func("/fdc/fuzz/cve_2021_3507", test_cve_2021_3507);
 
     ret = g_test_run();
 
-- 
Gitee


From 752b29baeba70a7424caa48d88b4da11494a9f16 Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:22 -0300
Subject: [PATCH 110/407] migration: Never call twice qemu_target_page_size()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [1/26] 809ca84dec80bafc1959df8c9e57f482ee752a97
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 144fa06b3431e806057ce1438338395b35a3e544)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/migration.c | 7 ++++---
 migration/multifd.c   | 7 ++++---
 migration/savevm.c    | 5 +++--
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index a87ff01b819..8a13294da69 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -992,6 +992,8 @@ static void populate_time_info(MigrationInfo *info, MigrationState *s)
 
 static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 {
+    size_t page_size = qemu_target_page_size();
+
     info->has_ram = true;
     info->ram = g_malloc0(sizeof(*info->ram));
     info->ram->transferred = ram_counters.transferred;
@@ -1000,12 +1002,11 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
     /* legacy value.  It is not used anymore */
     info->ram->skipped = 0;
     info->ram->normal = ram_counters.normal;
-    info->ram->normal_bytes = ram_counters.normal *
-        qemu_target_page_size();
+    info->ram->normal_bytes = ram_counters.normal * page_size;
     info->ram->mbps = s->mbps;
     info->ram->dirty_sync_count = ram_counters.dirty_sync_count;
     info->ram->postcopy_requests = ram_counters.postcopy_requests;
-    info->ram->page_size = qemu_target_page_size();
+    info->ram->page_size = page_size;
     info->ram->multifd_bytes = ram_counters.multifd_bytes;
     info->ram->pages_per_second = s->pages_per_second;
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 7c9deb1921d..8125d0015c4 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -289,7 +289,8 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
 {
     MultiFDPacket_t *packet = p->packet;
-    uint32_t pages_max = MULTIFD_PACKET_SIZE / qemu_target_page_size();
+    size_t page_size = qemu_target_page_size();
+    uint32_t pages_max = MULTIFD_PACKET_SIZE / page_size;
     RAMBlock *block;
     int i;
 
@@ -358,14 +359,14 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
     for (i = 0; i < p->pages->used; i++) {
         uint64_t offset = be64_to_cpu(packet->offset[i]);
 
-        if (offset > (block->used_length - qemu_target_page_size())) {
+        if (offset > (block->used_length - page_size)) {
             error_setg(errp, "multifd: offset too long %" PRIu64
                        " (max " RAM_ADDR_FMT ")",
                        offset, block->used_length);
             return -1;
         }
         p->pages->iov[i].iov_base = block->host + offset;
-        p->pages->iov[i].iov_len = qemu_target_page_size();
+        p->pages->iov[i].iov_len = page_size;
     }
 
     return 0;
diff --git a/migration/savevm.c b/migration/savevm.c
index d59e976d50e..0bef031acb3 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1685,6 +1685,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
 {
     PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE);
     uint64_t remote_pagesize_summary, local_pagesize_summary, remote_tps;
+    size_t page_size = qemu_target_page_size();
     Error *local_err = NULL;
 
     trace_loadvm_postcopy_handle_advise();
@@ -1741,13 +1742,13 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
     }
 
     remote_tps = qemu_get_be64(mis->from_src_file);
-    if (remote_tps != qemu_target_page_size()) {
+    if (remote_tps != page_size) {
         /*
          * Again, some differences could be dealt with, but for now keep it
          * simple.
          */
         error_report("Postcopy needs matching target page sizes (s=%d d=%zd)",
-                     (int)remote_tps, qemu_target_page_size());
+                     (int)remote_tps, page_size);
         return -1;
     }
 
-- 
Gitee


From 746f6267a46222dd8b532da36ab986a013ce4a10 Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:22 -0300
Subject: [PATCH 111/407] multifd: Rename used field to num
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [2/26] 952283197ef89be4d61c7690bb6c3194e5c67217
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

We will need to split it later into zero_num (number of zero pages) and
normal_num (number of normal pages), so num is a better name than used.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 90a3d2f9d5f729147b2827c177932603ae6e2d55)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd.c | 38 +++++++++++++++++++-------------------
 migration/multifd.h |  2 +-
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 8125d0015c4..8ea86d81dcd 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -252,7 +252,7 @@ static MultiFDPages_t *multifd_pages_init(size_t size)
 
 static void multifd_pages_clear(MultiFDPages_t *pages)
 {
-    pages->used = 0;
+    pages->num = 0;
     pages->allocated = 0;
     pages->packet_num = 0;
     pages->block = NULL;
@@ -270,7 +270,7 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 
     packet->flags = cpu_to_be32(p->flags);
     packet->pages_alloc = cpu_to_be32(p->pages->allocated);
-    packet->pages_used = cpu_to_be32(p->pages->used);
+    packet->pages_used = cpu_to_be32(p->pages->num);
     packet->next_packet_size = cpu_to_be32(p->next_packet_size);
     packet->packet_num = cpu_to_be64(p->packet_num);
 
@@ -278,7 +278,7 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
         strncpy(packet->ramblock, p->pages->block->idstr, 256);
     }
 
-    for (i = 0; i < p->pages->used; i++) {
+    for (i = 0; i < p->pages->num; i++) {
         /* there are architectures where ram_addr_t is 32 bit */
         uint64_t temp = p->pages->offset[i];
 
@@ -332,18 +332,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
         p->pages = multifd_pages_init(packet->pages_alloc);
     }
 
-    p->pages->used = be32_to_cpu(packet->pages_used);
-    if (p->pages->used > packet->pages_alloc) {
+    p->pages->num = be32_to_cpu(packet->pages_used);
+    if (p->pages->num > packet->pages_alloc) {
         error_setg(errp, "multifd: received packet "
                    "with %d pages and expected maximum pages are %d",
-                   p->pages->used, packet->pages_alloc) ;
+                   p->pages->num, packet->pages_alloc) ;
         return -1;
     }
 
     p->next_packet_size = be32_to_cpu(packet->next_packet_size);
     p->packet_num = be64_to_cpu(packet->packet_num);
 
-    if (p->pages->used == 0) {
+    if (p->pages->num == 0) {
         return 0;
     }
 
@@ -356,7 +356,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
         return -1;
     }
 
-    for (i = 0; i < p->pages->used; i++) {
+    for (i = 0; i < p->pages->num; i++) {
         uint64_t offset = be64_to_cpu(packet->offset[i]);
 
         if (offset > (block->used_length - page_size)) {
@@ -443,13 +443,13 @@ static int multifd_send_pages(QEMUFile *f)
         }
         qemu_mutex_unlock(&p->mutex);
     }
-    assert(!p->pages->used);
+    assert(!p->pages->num);
     assert(!p->pages->block);
 
     p->packet_num = multifd_send_state->packet_num++;
     multifd_send_state->pages = p->pages;
     p->pages = pages;
-    transferred = ((uint64_t) pages->used) * qemu_target_page_size()
+    transferred = ((uint64_t) pages->num) * qemu_target_page_size()
                 + p->packet_len;
     qemu_file_update_transfer(f, transferred);
     ram_counters.multifd_bytes += transferred;
@@ -469,12 +469,12 @@ int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
     }
 
     if (pages->block == block) {
-        pages->offset[pages->used] = offset;
-        pages->iov[pages->used].iov_base = block->host + offset;
-        pages->iov[pages->used].iov_len = qemu_target_page_size();
-        pages->used++;
+        pages->offset[pages->num] = offset;
+        pages->iov[pages->num].iov_base = block->host + offset;
+        pages->iov[pages->num].iov_len = qemu_target_page_size();
+        pages->num++;
 
-        if (pages->used < pages->allocated) {
+        if (pages->num < pages->allocated) {
             return 1;
         }
     }
@@ -586,7 +586,7 @@ void multifd_send_sync_main(QEMUFile *f)
     if (!migrate_use_multifd()) {
         return;
     }
-    if (multifd_send_state->pages->used) {
+    if (multifd_send_state->pages->num) {
         if (multifd_send_pages(f) < 0) {
             error_report("%s: multifd_send_pages fail", __func__);
             return;
@@ -649,7 +649,7 @@ static void *multifd_send_thread(void *opaque)
         qemu_mutex_lock(&p->mutex);
 
         if (p->pending_job) {
-            uint32_t used = p->pages->used;
+            uint32_t used = p->pages->num;
             uint64_t packet_num = p->packet_num;
             flags = p->flags;
 
@@ -665,7 +665,7 @@ static void *multifd_send_thread(void *opaque)
             p->flags = 0;
             p->num_packets++;
             p->num_pages += used;
-            p->pages->used = 0;
+            p->pages->num = 0;
             p->pages->block = NULL;
             qemu_mutex_unlock(&p->mutex);
 
@@ -1091,7 +1091,7 @@ static void *multifd_recv_thread(void *opaque)
             break;
         }
 
-        used = p->pages->used;
+        used = p->pages->num;
         flags = p->flags;
         /* recv methods don't know how to handle the SYNC flag */
         p->flags &= ~MULTIFD_FLAG_SYNC;
diff --git a/migration/multifd.h b/migration/multifd.h
index 15c50ca0b24..86820dd028a 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -55,7 +55,7 @@ typedef struct {
 
 typedef struct {
     /* number of used pages */
-    uint32_t used;
+    uint32_t num;
     /* number of allocated pages */
     uint32_t allocated;
     /* global number of generated multifd packets */
-- 
Gitee


From 726dd797623dd6674fa2cc14e69d413ef278c65d Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:23 -0300
Subject: [PATCH 112/407] multifd: Add missing documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [3/26] 924fca4305ebd8669955d456fc1c515f509e6026
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 18ede636bc29fd8bda628fe3e5c593f8c1b734f4)
(fixed typo in commit message)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd-zlib.c | 2 ++
 migration/multifd-zstd.c | 2 ++
 migration/multifd.c      | 1 +
 3 files changed, 5 insertions(+)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index ab4ba75d752..f403d2f0310 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -74,6 +74,7 @@ static int zlib_send_setup(MultiFDSendParams *p, Error **errp)
  * Close the channel and return memory.
  *
  * @p: Params for the channel that we are using
+ * @errp: pointer to an error
  */
 static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
 {
@@ -96,6 +97,7 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
  *
  * @p: Params for the channel that we are using
  * @used: number of pages used
+ * @errp: pointer to an error
  */
 static int zlib_send_prepare(MultiFDSendParams *p, uint32_t used, Error **errp)
 {
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 693bddf8c9e..8d657f88604 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -86,6 +86,7 @@ static int zstd_send_setup(MultiFDSendParams *p, Error **errp)
  * Close the channel and return memory.
  *
  * @p: Params for the channel that we are using
+ * @errp: pointer to an error
  */
 static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp)
 {
@@ -109,6 +110,7 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp)
  *
  * @p: Params for the channel that we are using
  * @used: number of pages used
+ * @errp: pointer to an error
  */
 static int zstd_send_prepare(MultiFDSendParams *p, uint32_t used, Error **errp)
 {
diff --git a/migration/multifd.c b/migration/multifd.c
index 8ea86d81dcd..cdeffdc4c59 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -66,6 +66,7 @@ static int nocomp_send_setup(MultiFDSendParams *p, Error **errp)
  * For no compression this function does nothing.
  *
  * @p: Params for the channel that we are using
+ * @errp: pointer to an error
  */
 static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
 {
-- 
Gitee


From 9dcd0d97298bd8a50af6ca78e7bc2bf18f233e67 Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:23 -0300
Subject: [PATCH 113/407] multifd: The variable is only used inside the loop
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [4/26] 45d8bbde75ebbef6329c41ddb56db4526739f94f
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 1943c11a62bd0741e5d9fbba78404fe47ebea820)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index cdeffdc4c59..ce7101cf9de 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -629,7 +629,6 @@ static void *multifd_send_thread(void *opaque)
     MultiFDSendParams *p = opaque;
     Error *local_err = NULL;
     int ret = 0;
-    uint32_t flags = 0;
 
     trace_multifd_send_thread_start(p->id);
     rcu_register_thread();
@@ -652,7 +651,7 @@ static void *multifd_send_thread(void *opaque)
         if (p->pending_job) {
             uint32_t used = p->pages->num;
             uint64_t packet_num = p->packet_num;
-            flags = p->flags;
+            uint32_t flags = p->flags;
 
             if (used) {
                 ret = multifd_send_state->ops->send_prepare(p, used,
-- 
Gitee


From 1d5f62ea0202a0472744e9776637c0a7bc6fd769 Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:23 -0300
Subject: [PATCH 114/407] multifd: remove used parameter from send_prepare()
 method
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [5/26] ad6360d19d65e8c332dcdc3d3234478639e03db8
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

It is already there as p->pages->num.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 02fb81043ecee338e4aeb8f5be09a46325dc5e43)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd-zlib.c | 7 +++----
 migration/multifd-zstd.c | 7 +++----
 migration/multifd.c      | 9 +++------
 migration/multifd.h      | 2 +-
 4 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index f403d2f0310..0c70a2dc785 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -96,10 +96,9 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
  * Returns 0 for success or -1 for error
  *
  * @p: Params for the channel that we are using
- * @used: number of pages used
  * @errp: pointer to an error
  */
-static int zlib_send_prepare(MultiFDSendParams *p, uint32_t used, Error **errp)
+static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
 {
     struct iovec *iov = p->pages->iov;
     struct zlib_data *z = p->data;
@@ -108,11 +107,11 @@ static int zlib_send_prepare(MultiFDSendParams *p, uint32_t used, Error **errp)
     int ret;
     uint32_t i;
 
-    for (i = 0; i < used; i++) {
+    for (i = 0; i < p->pages->num; i++) {
         uint32_t available = z->zbuff_len - out_size;
         int flush = Z_NO_FLUSH;
 
-        if (i == used - 1) {
+        if (i == p->pages->num - 1) {
             flush = Z_SYNC_FLUSH;
         }
 
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 8d657f88604..466b370cad2 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -109,10 +109,9 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp)
  * Returns 0 for success or -1 for error
  *
  * @p: Params for the channel that we are using
- * @used: number of pages used
  * @errp: pointer to an error
  */
-static int zstd_send_prepare(MultiFDSendParams *p, uint32_t used, Error **errp)
+static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
 {
     struct iovec *iov = p->pages->iov;
     struct zstd_data *z = p->data;
@@ -123,10 +122,10 @@ static int zstd_send_prepare(MultiFDSendParams *p, uint32_t used, Error **errp)
     z->out.size = z->zbuff_len;
     z->out.pos = 0;
 
-    for (i = 0; i < used; i++) {
+    for (i = 0; i < p->pages->num; i++) {
         ZSTD_EndDirective flush = ZSTD_e_continue;
 
-        if (i == used - 1) {
+        if (i == p->pages->num - 1) {
             flush = ZSTD_e_flush;
         }
         z->in.src = iov[i].iov_base;
diff --git a/migration/multifd.c b/migration/multifd.c
index ce7101cf9de..098ef8842c9 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -82,13 +82,11 @@ static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
  * Returns 0 for success or -1 for error
  *
  * @p: Params for the channel that we are using
- * @used: number of pages used
  * @errp: pointer to an error
  */
-static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used,
-                               Error **errp)
+static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
 {
-    p->next_packet_size = used * qemu_target_page_size();
+    p->next_packet_size = p->pages->num * qemu_target_page_size();
     p->flags |= MULTIFD_FLAG_NOCOMP;
     return 0;
 }
@@ -654,8 +652,7 @@ static void *multifd_send_thread(void *opaque)
             uint32_t flags = p->flags;
 
             if (used) {
-                ret = multifd_send_state->ops->send_prepare(p, used,
-                                                            &local_err);
+                ret = multifd_send_state->ops->send_prepare(p, &local_err);
                 if (ret != 0) {
                     qemu_mutex_unlock(&p->mutex);
                     break;
diff --git a/migration/multifd.h b/migration/multifd.h
index 86820dd028a..7968cc5c206 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -159,7 +159,7 @@ typedef struct {
     /* Cleanup for sending side */
     void (*send_cleanup)(MultiFDSendParams *p, Error **errp);
     /* Prepare the send packet */
-    int (*send_prepare)(MultiFDSendParams *p, uint32_t used, Error **errp);
+    int (*send_prepare)(MultiFDSendParams *p, Error **errp);
     /* Write the send packet */
     int (*send_write)(MultiFDSendParams *p, uint32_t used, Error **errp);
     /* Setup for receiving side */
-- 
Gitee


From c586f73578161489d8df7726760fb4cec25382a2 Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:23 -0300
Subject: [PATCH 115/407] multifd: remove used parameter from send_recv_pages()
 method
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [6/26] 5c1a506e4178501a0894ea4e7ac919e1d4d4cc32
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

It is already there as p->pages->num.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 40a4bfe9d3f8ad35a9c3ffb4cbf7367e2777054b)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd-zlib.c | 9 ++++-----
 migration/multifd-zstd.c | 7 +++----
 migration/multifd.c      | 7 +++----
 migration/multifd.h      | 2 +-
 4 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 0c70a2dc785..330fc021c57 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -235,17 +235,16 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p)
  * Returns 0 for success or -1 for error
  *
  * @p: Params for the channel that we are using
- * @used: number of pages used
  * @errp: pointer to an error
  */
-static int zlib_recv_pages(MultiFDRecvParams *p, uint32_t used, Error **errp)
+static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
     struct zlib_data *z = p->data;
     z_stream *zs = &z->zs;
     uint32_t in_size = p->next_packet_size;
     /* we measure the change of total_out */
     uint32_t out_size = zs->total_out;
-    uint32_t expected_size = used * qemu_target_page_size();
+    uint32_t expected_size = p->pages->num * qemu_target_page_size();
     uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
     int ret;
     int i;
@@ -264,12 +263,12 @@ static int zlib_recv_pages(MultiFDRecvParams *p, uint32_t used, Error **errp)
     zs->avail_in = in_size;
     zs->next_in = z->zbuff;
 
-    for (i = 0; i < used; i++) {
+    for (i = 0; i < p->pages->num; i++) {
         struct iovec *iov = &p->pages->iov[i];
         int flush = Z_NO_FLUSH;
         unsigned long start = zs->total_out;
 
-        if (i == used - 1) {
+        if (i == p->pages->num - 1) {
             flush = Z_SYNC_FLUSH;
         }
 
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 466b370cad2..f0d11057923 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -255,14 +255,13 @@ static void zstd_recv_cleanup(MultiFDRecvParams *p)
  * Returns 0 for success or -1 for error
  *
  * @p: Params for the channel that we are using
- * @used: number of pages used
  * @errp: pointer to an error
  */
-static int zstd_recv_pages(MultiFDRecvParams *p, uint32_t used, Error **errp)
+static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
     uint32_t in_size = p->next_packet_size;
     uint32_t out_size = 0;
-    uint32_t expected_size = used * qemu_target_page_size();
+    uint32_t expected_size = p->pages->num * qemu_target_page_size();
     uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
     struct zstd_data *z = p->data;
     int ret;
@@ -283,7 +282,7 @@ static int zstd_recv_pages(MultiFDRecvParams *p, uint32_t used, Error **errp)
     z->in.size = in_size;
     z->in.pos = 0;
 
-    for (i = 0; i < used; i++) {
+    for (i = 0; i < p->pages->num; i++) {
         struct iovec *iov = &p->pages->iov[i];
 
         z->out.dst = iov->iov_base;
diff --git a/migration/multifd.c b/migration/multifd.c
index 098ef8842c9..55d99a82325 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -141,10 +141,9 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p)
  * Returns 0 for success or -1 for error
  *
  * @p: Params for the channel that we are using
- * @used: number of pages used
  * @errp: pointer to an error
  */
-static int nocomp_recv_pages(MultiFDRecvParams *p, uint32_t used, Error **errp)
+static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
     uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
 
@@ -153,7 +152,7 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, uint32_t used, Error **errp)
                    p->id, flags, MULTIFD_FLAG_NOCOMP);
         return -1;
     }
-    return qio_channel_readv_all(p->c, p->pages->iov, used, errp);
+    return qio_channel_readv_all(p->c, p->pages->iov, p->pages->num, errp);
 }
 
 static MultiFDMethods multifd_nocomp_ops = {
@@ -1099,7 +1098,7 @@ static void *multifd_recv_thread(void *opaque)
         qemu_mutex_unlock(&p->mutex);
 
         if (used) {
-            ret = multifd_recv_state->ops->recv_pages(p, used, &local_err);
+            ret = multifd_recv_state->ops->recv_pages(p, &local_err);
             if (ret != 0) {
                 break;
             }
diff --git a/migration/multifd.h b/migration/multifd.h
index 7968cc5c206..e57adc783b4 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -167,7 +167,7 @@ typedef struct {
     /* Cleanup for receiving side */
     void (*recv_cleanup)(MultiFDRecvParams *p);
     /* Read all pages */
-    int (*recv_pages)(MultiFDRecvParams *p, uint32_t used, Error **errp);
+    int (*recv_pages)(MultiFDRecvParams *p, Error **errp);
 } MultiFDMethods;
 
 void multifd_register_ops(int method, MultiFDMethods *ops);
-- 
Gitee


From dbe1d660452507fb50c4feb16c77ff9fc0e8c319 Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:23 -0300
Subject: [PATCH 116/407] multifd: Fill offset and block for reception
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [7/26] 51a9e6b76af956d63fc735172211d9bf6f0f6f80
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

We were using the iov directly, but we will need this information in
the following patch.
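
As an illustration only (not part of the upstream change), the sketch
below shows why recording the block and the per-page offsets is enough
to recover each page address without consulting the iov. The DemoBlock
and DemoPages types and both helper functions are hypothetical,
simplified stand-ins for the real QEMU structures, and the fixed
four-entry arrays are an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical, simplified stand-in for a RAM block: only the host pointer */
typedef struct {
    uint8_t *host;
} DemoBlock;

/* Hypothetical, simplified stand-in for the multifd pages structure */
typedef struct {
    uint32_t num;           /* number of pages in use (at most 4 here) */
    DemoBlock *block;       /* block that every page belongs to */
    uint64_t offset[4];     /* offset of each page inside the block */
    struct iovec iov[4];    /* the same pages expressed as iovecs */
} DemoPages;

/* Fill block, offsets and iovecs together, as the receive path now does */
static void demo_fill(DemoPages *pages, DemoBlock *block,
                      const uint64_t *offsets, uint32_t num,
                      size_t page_size)
{
    pages->block = block;
    pages->num = num;
    for (uint32_t i = 0; i < num; i++) {
        pages->offset[i] = offsets[i];
        pages->iov[i].iov_base = block->host + offsets[i];
        pages->iov[i].iov_len = page_size;
    }
}

/* Recover the address of page i from block and offset alone */
static uint8_t *demo_page_addr(const DemoPages *pages, uint32_t i)
{
    return pages->block->host + pages->offset[i];
}
```

With both fields filled, demo_page_addr(pages, i) and
pages->iov[i].iov_base name the same byte, so a consumer can stop
depending on the iov array.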

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 01102a2ef6c97acc5cc8a2c3bb62b7665a20f51f)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/multifd.c b/migration/multifd.c
index 55d99a82325..0533da154a8 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -354,6 +354,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
         return -1;
     }
 
+    p->pages->block = block;
     for (i = 0; i < p->pages->num; i++) {
         uint64_t offset = be64_to_cpu(packet->offset[i]);
 
@@ -363,6 +364,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
                        offset, block->used_length);
             return -1;
         }
+        p->pages->offset[i] = offset;
         p->pages->iov[i].iov_base = block->host + offset;
         p->pages->iov[i].iov_len = page_size;
     }
-- 
Gitee


From f72b1152fbcc618d3f45810be0cf2ddbfd4ccd80 Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:23 -0300
Subject: [PATCH 117/407] multifd: Make zstd compression method not use iovs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [8/26] 010579fa73b5a4c6fd631dc9fbaf6f974974bc99
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit f5ff548774c22b34a0c0e2fef85f1be11160d774)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd-zstd.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index f0d11057923..d9ed42622bb 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include <zstd.h>
 #include "qemu/rcu.h"
+#include "exec/ramblock.h"
 #include "exec/target_page.h"
 #include "qapi/error.h"
 #include "migration.h"
@@ -113,8 +114,8 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp)
  */
 static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
 {
-    struct iovec *iov = p->pages->iov;
     struct zstd_data *z = p->data;
+    size_t page_size = qemu_target_page_size();
     int ret;
     uint32_t i;
 
@@ -128,8 +129,8 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
         if (i == p->pages->num - 1) {
             flush = ZSTD_e_flush;
         }
-        z->in.src = iov[i].iov_base;
-        z->in.size = iov[i].iov_len;
+        z->in.src = p->pages->block->host + p->pages->offset[i];
+        z->in.size = page_size;
         z->in.pos = 0;
 
         /*
@@ -261,7 +262,8 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
     uint32_t in_size = p->next_packet_size;
     uint32_t out_size = 0;
-    uint32_t expected_size = p->pages->num * qemu_target_page_size();
+    size_t page_size = qemu_target_page_size();
+    uint32_t expected_size = p->pages->num * page_size;
     uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
     struct zstd_data *z = p->data;
     int ret;
@@ -283,10 +285,8 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
     z->in.pos = 0;
 
     for (i = 0; i < p->pages->num; i++) {
-        struct iovec *iov = &p->pages->iov[i];
-
-        z->out.dst = iov->iov_base;
-        z->out.size = iov->iov_len;
+        z->out.dst = p->pages->block->host + p->pages->offset[i];
+        z->out.size = page_size;
         z->out.pos = 0;
 
         /*
@@ -300,8 +300,8 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
         do {
             ret = ZSTD_decompressStream(z->zds, &z->out, &z->in);
         } while (ret > 0 && (z->in.size - z->in.pos > 0)
-                         && (z->out.pos < iov->iov_len));
-        if (ret > 0 && (z->out.pos < iov->iov_len)) {
+                         && (z->out.pos < page_size));
+        if (ret > 0 && (z->out.pos < page_size)) {
             error_setg(errp, "multifd %d: decompressStream buffer too small",
                        p->id);
             return -1;
-- 
Gitee


From dc74ef56a20b52e07d180be72c74c88333739216 Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:23 -0300
Subject: [PATCH 118/407] multifd: Make zlib compression method not use iovs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [9/26] d33dd62b833d50fee989a195aebcc8d5e7d43181
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit a5ed22948873b50fcf1415d1ce15c71d61a9388d)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd-zlib.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 330fc021c57..a1950a4588d 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include <zlib.h>
 #include "qemu/rcu.h"
+#include "exec/ramblock.h"
 #include "exec/target_page.h"
 #include "qapi/error.h"
 #include "migration.h"
@@ -100,8 +101,8 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
  */
 static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
 {
-    struct iovec *iov = p->pages->iov;
     struct zlib_data *z = p->data;
+    size_t page_size = qemu_target_page_size();
     z_stream *zs = &z->zs;
     uint32_t out_size = 0;
     int ret;
@@ -115,8 +116,8 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
             flush = Z_SYNC_FLUSH;
         }
 
-        zs->avail_in = iov[i].iov_len;
-        zs->next_in = iov[i].iov_base;
+        zs->avail_in = page_size;
+        zs->next_in = p->pages->block->host + p->pages->offset[i];
 
         zs->avail_out = available;
         zs->next_out = z->zbuff + out_size;
@@ -240,6 +241,7 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p)
 static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
     struct zlib_data *z = p->data;
+    size_t page_size = qemu_target_page_size();
     z_stream *zs = &z->zs;
     uint32_t in_size = p->next_packet_size;
     /* we measure the change of total_out */
@@ -264,7 +266,6 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
     zs->next_in = z->zbuff;
 
     for (i = 0; i < p->pages->num; i++) {
-        struct iovec *iov = &p->pages->iov[i];
         int flush = Z_NO_FLUSH;
         unsigned long start = zs->total_out;
 
@@ -272,8 +273,8 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
             flush = Z_SYNC_FLUSH;
         }
 
-        zs->avail_out = iov->iov_len;
-        zs->next_out = iov->iov_base;
+        zs->avail_out = page_size;
+        zs->next_out = p->pages->block->host + p->pages->offset[i];
 
         /*
          * Welcome to inflate semantics
@@ -286,8 +287,8 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
         do {
             ret = inflate(zs, flush);
         } while (ret == Z_OK && zs->avail_in
-                             && (zs->total_out - start) < iov->iov_len);
-        if (ret == Z_OK && (zs->total_out - start) < iov->iov_len) {
+                             && (zs->total_out - start) < page_size);
+        if (ret == Z_OK && (zs->total_out - start) < page_size) {
             error_setg(errp, "multifd %d: inflate generated too few output",
                        p->id);
             return -1;
-- 
Gitee


From e2195bb735521e42c358e1c898d45d293ab39314 Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:23 -0300
Subject: [PATCH 119/407] migration: All this fields are unsigned
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [10/26] 2c3ee27aae334db3b283ab7ef580f58e396e569d
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

So printing them as %d is wrong.  Note that the channel id is a
uint8_t, but I changed it anyway for consistency.
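
A minimal sketch of the point, for illustration only: the helper
format_packet_error() below is hypothetical and not part of QEMU. It
formats unsigned fields with unsigned conversions, so a 32-bit value
above INT_MAX is printed as the number it holds. With %d the same
argument would be a type mismatch and would usually show up as a
negative number.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build an error string similar to the ones this
 * patch touches, using unsigned conversions for unsigned fields.
 */
static int format_packet_error(char *buf, size_t len, uint8_t id,
                               uint32_t received, uint32_t expected)
{
    /*
     * The uint8_t id is promoted when passed through varargs, so it is
     * cast explicitly to match %u; the 32-bit sizes use PRIu32.
     */
    return snprintf(buf, len,
                    "multifd %u: packet size received %" PRIu32
                    " size expected %" PRIu32,
                    (unsigned int)id, received, expected);
}
```

For example, calling it with id 3, received 3000000000 and expected
4096 yields a message containing "3000000000" rather than a negative
value.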

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit 04e114049406dbb69fc9043c795ddd28fdba31a6)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd-zlib.c | 20 ++++++++++----------
 migration/multifd-zstd.c | 24 ++++++++++++------------
 migration/multifd.c      | 16 ++++++++--------
 migration/trace-events   | 26 +++++++++++++-------------
 4 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index a1950a4588d..a987e4a26cb 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -52,7 +52,7 @@ static int zlib_send_setup(MultiFDSendParams *p, Error **errp)
     zs->opaque = Z_NULL;
     if (deflateInit(zs, migrate_multifd_zlib_level()) != Z_OK) {
         g_free(z);
-        error_setg(errp, "multifd %d: deflate init failed", p->id);
+        error_setg(errp, "multifd %u: deflate init failed", p->id);
         return -1;
     }
     /* We will never have more than page_count pages */
@@ -62,7 +62,7 @@ static int zlib_send_setup(MultiFDSendParams *p, Error **errp)
     if (!z->zbuff) {
         deflateEnd(&z->zs);
         g_free(z);
-        error_setg(errp, "multifd %d: out of memory for zbuff", p->id);
+        error_setg(errp, "multifd %u: out of memory for zbuff", p->id);
         return -1;
     }
     p->data = z;
@@ -134,12 +134,12 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
             ret = deflate(zs, flush);
         } while (ret == Z_OK && zs->avail_in && zs->avail_out);
         if (ret == Z_OK && zs->avail_in) {
-            error_setg(errp, "multifd %d: deflate failed to compress all input",
+            error_setg(errp, "multifd %u: deflate failed to compress all input",
                        p->id);
             return -1;
         }
         if (ret != Z_OK) {
-            error_setg(errp, "multifd %d: deflate returned %d instead of Z_OK",
+            error_setg(errp, "multifd %u: deflate returned %d instead of Z_OK",
                        p->id, ret);
             return -1;
         }
@@ -193,7 +193,7 @@ static int zlib_recv_setup(MultiFDRecvParams *p, Error **errp)
     zs->avail_in = 0;
     zs->next_in = Z_NULL;
     if (inflateInit(zs) != Z_OK) {
-        error_setg(errp, "multifd %d: inflate init failed", p->id);
+        error_setg(errp, "multifd %u: inflate init failed", p->id);
         return -1;
     }
     /* We will never have more than page_count pages */
@@ -203,7 +203,7 @@ static int zlib_recv_setup(MultiFDRecvParams *p, Error **errp)
     z->zbuff = g_try_malloc(z->zbuff_len);
     if (!z->zbuff) {
         inflateEnd(zs);
-        error_setg(errp, "multifd %d: out of memory for zbuff", p->id);
+        error_setg(errp, "multifd %u: out of memory for zbuff", p->id);
         return -1;
     }
     return 0;
@@ -252,7 +252,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
     int i;
 
     if (flags != MULTIFD_FLAG_ZLIB) {
-        error_setg(errp, "multifd %d: flags received %x flags expected %x",
+        error_setg(errp, "multifd %u: flags received %x flags expected %x",
                    p->id, flags, MULTIFD_FLAG_ZLIB);
         return -1;
     }
@@ -289,19 +289,19 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
         } while (ret == Z_OK && zs->avail_in
                              && (zs->total_out - start) < page_size);
         if (ret == Z_OK && (zs->total_out - start) < page_size) {
-            error_setg(errp, "multifd %d: inflate generated too few output",
+            error_setg(errp, "multifd %u: inflate generated too few output",
                        p->id);
             return -1;
         }
         if (ret != Z_OK) {
-            error_setg(errp, "multifd %d: inflate returned %d instead of Z_OK",
+            error_setg(errp, "multifd %u: inflate returned %d instead of Z_OK",
                        p->id, ret);
             return -1;
         }
     }
     out_size = zs->total_out - out_size;
     if (out_size != expected_size) {
-        error_setg(errp, "multifd %d: packet size received %d size expected %d",
+        error_setg(errp, "multifd %u: packet size received %u size expected %u",
                    p->id, out_size, expected_size);
         return -1;
     }
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index d9ed42622bb..2185a83eacb 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -56,7 +56,7 @@ static int zstd_send_setup(MultiFDSendParams *p, Error **errp)
     z->zcs = ZSTD_createCStream();
     if (!z->zcs) {
         g_free(z);
-        error_setg(errp, "multifd %d: zstd createCStream failed", p->id);
+        error_setg(errp, "multifd %u: zstd createCStream failed", p->id);
         return -1;
     }
 
@@ -64,7 +64,7 @@ static int zstd_send_setup(MultiFDSendParams *p, Error **errp)
     if (ZSTD_isError(res)) {
         ZSTD_freeCStream(z->zcs);
         g_free(z);
-        error_setg(errp, "multifd %d: initCStream failed with error %s",
+        error_setg(errp, "multifd %u: initCStream failed with error %s",
                    p->id, ZSTD_getErrorName(res));
         return -1;
     }
@@ -75,7 +75,7 @@ static int zstd_send_setup(MultiFDSendParams *p, Error **errp)
     if (!z->zbuff) {
         ZSTD_freeCStream(z->zcs);
         g_free(z);
-        error_setg(errp, "multifd %d: out of memory for zbuff", p->id);
+        error_setg(errp, "multifd %u: out of memory for zbuff", p->id);
         return -1;
     }
     return 0;
@@ -146,12 +146,12 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
         } while (ret > 0 && (z->in.size - z->in.pos > 0)
                          && (z->out.size - z->out.pos > 0));
         if (ret > 0 && (z->in.size - z->in.pos > 0)) {
-            error_setg(errp, "multifd %d: compressStream buffer too small",
+            error_setg(errp, "multifd %u: compressStream buffer too small",
                        p->id);
             return -1;
         }
         if (ZSTD_isError(ret)) {
-            error_setg(errp, "multifd %d: compressStream error %s",
+            error_setg(errp, "multifd %u: compressStream error %s",
                        p->id, ZSTD_getErrorName(ret));
             return -1;
         }
@@ -201,7 +201,7 @@ static int zstd_recv_setup(MultiFDRecvParams *p, Error **errp)
     z->zds = ZSTD_createDStream();
     if (!z->zds) {
         g_free(z);
-        error_setg(errp, "multifd %d: zstd createDStream failed", p->id);
+        error_setg(errp, "multifd %u: zstd createDStream failed", p->id);
         return -1;
     }
 
@@ -209,7 +209,7 @@ static int zstd_recv_setup(MultiFDRecvParams *p, Error **errp)
     if (ZSTD_isError(ret)) {
         ZSTD_freeDStream(z->zds);
         g_free(z);
-        error_setg(errp, "multifd %d: initDStream failed with error %s",
+        error_setg(errp, "multifd %u: initDStream failed with error %s",
                    p->id, ZSTD_getErrorName(ret));
         return -1;
     }
@@ -222,7 +222,7 @@ static int zstd_recv_setup(MultiFDRecvParams *p, Error **errp)
     if (!z->zbuff) {
         ZSTD_freeDStream(z->zds);
         g_free(z);
-        error_setg(errp, "multifd %d: out of memory for zbuff", p->id);
+        error_setg(errp, "multifd %u: out of memory for zbuff", p->id);
         return -1;
     }
     return 0;
@@ -270,7 +270,7 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
     int i;
 
     if (flags != MULTIFD_FLAG_ZSTD) {
-        error_setg(errp, "multifd %d: flags received %x flags expected %x",
+        error_setg(errp, "multifd %u: flags received %x flags expected %x",
                    p->id, flags, MULTIFD_FLAG_ZSTD);
         return -1;
     }
@@ -302,19 +302,19 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
         } while (ret > 0 && (z->in.size - z->in.pos > 0)
                          && (z->out.pos < page_size));
         if (ret > 0 && (z->out.pos < page_size)) {
-            error_setg(errp, "multifd %d: decompressStream buffer too small",
+            error_setg(errp, "multifd %u: decompressStream buffer too small",
                        p->id);
             return -1;
         }
         if (ZSTD_isError(ret)) {
-            error_setg(errp, "multifd %d: decompressStream returned %s",
+            error_setg(errp, "multifd %u: decompressStream returned %s",
                        p->id, ZSTD_getErrorName(ret));
             return ret;
         }
         out_size += z->out.pos;
     }
     if (out_size != expected_size) {
-        error_setg(errp, "multifd %d: packet size received %d size expected %d",
+        error_setg(errp, "multifd %u: packet size received %u size expected %u",
                    p->id, out_size, expected_size);
         return -1;
     }
diff --git a/migration/multifd.c b/migration/multifd.c
index 0533da154a8..d0d19470f9f 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -148,7 +148,7 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
     uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
 
     if (flags != MULTIFD_FLAG_NOCOMP) {
-        error_setg(errp, "multifd %d: flags received %x flags expected %x",
+        error_setg(errp, "multifd %u: flags received %x flags expected %x",
                    p->id, flags, MULTIFD_FLAG_NOCOMP);
         return -1;
     }
@@ -212,8 +212,8 @@ static int multifd_recv_initial_packet(QIOChannel *c, Error **errp)
     }
 
     if (msg.version != MULTIFD_VERSION) {
-        error_setg(errp, "multifd: received packet version %d "
-                   "expected %d", msg.version, MULTIFD_VERSION);
+        error_setg(errp, "multifd: received packet version %u "
+                   "expected %u", msg.version, MULTIFD_VERSION);
         return -1;
     }
 
@@ -229,8 +229,8 @@ static int multifd_recv_initial_packet(QIOChannel *c, Error **errp)
     }
 
     if (msg.id > migrate_multifd_channels()) {
-        error_setg(errp, "multifd: received channel version %d "
-                   "expected %d", msg.version, MULTIFD_VERSION);
+        error_setg(errp, "multifd: received channel version %u "
+                   "expected %u", msg.version, MULTIFD_VERSION);
         return -1;
     }
 
@@ -303,7 +303,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
     packet->version = be32_to_cpu(packet->version);
     if (packet->version != MULTIFD_VERSION) {
         error_setg(errp, "multifd: received packet "
-                   "version %d and expected version %d",
+                   "version %u and expected version %u",
                    packet->version, MULTIFD_VERSION);
         return -1;
     }
@@ -317,7 +317,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
      */
     if (packet->pages_alloc > pages_max * 100) {
         error_setg(errp, "multifd: received packet "
-                   "with size %d and expected a maximum size of %d",
+                   "with size %u and expected a maximum size of %u",
                    packet->pages_alloc, pages_max * 100) ;
         return -1;
     }
@@ -333,7 +333,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
     p->pages->num = be32_to_cpu(packet->pages_used);
     if (p->pages->num > packet->pages_alloc) {
         error_setg(errp, "multifd: received packet "
-                   "with %d pages and expected maximum pages are %d",
+                   "with %u pages and expected maximum pages are %u",
                    p->pages->num, packet->pages_alloc) ;
         return -1;
     }
diff --git a/migration/trace-events b/migration/trace-events
index b48d873b8a0..5172cb3b3da 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -115,23 +115,23 @@ ram_write_tracking_ramblock_start(const char *block_id, size_t page_size, void *
 ram_write_tracking_ramblock_stop(const char *block_id, size_t page_size, void *addr, size_t length) "%s: page_size: %zu addr: %p length: %zu"
 
 # multifd.c
-multifd_new_send_channel_async(uint8_t id) "channel %d"
-multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %d packet_num %" PRIu64 " pages %d flags 0x%x next packet size %d"
-multifd_recv_new_channel(uint8_t id) "channel %d"
+multifd_new_send_channel_async(uint8_t id) "channel %u"
+multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " pages %u flags 0x%x next packet size %u"
+multifd_recv_new_channel(uint8_t id) "channel %u"
 multifd_recv_sync_main(long packet_num) "packet num %ld"
-multifd_recv_sync_main_signal(uint8_t id) "channel %d"
-multifd_recv_sync_main_wait(uint8_t id) "channel %d"
+multifd_recv_sync_main_signal(uint8_t id) "channel %u"
+multifd_recv_sync_main_wait(uint8_t id) "channel %u"
 multifd_recv_terminate_threads(bool error) "error %d"
-multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %d packets %" PRIu64 " pages %" PRIu64
-multifd_recv_thread_start(uint8_t id) "%d"
-multifd_send(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %d packet_num %" PRIu64 " pages %d flags 0x%x next packet size %d"
-multifd_send_error(uint8_t id) "channel %d"
+multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %u packets %" PRIu64 " pages %" PRIu64
+multifd_recv_thread_start(uint8_t id) "%u"
+multifd_send(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " pages %u flags 0x%x next packet size %u"
+multifd_send_error(uint8_t id) "channel %u"
 multifd_send_sync_main(long packet_num) "packet num %ld"
-multifd_send_sync_main_signal(uint8_t id) "channel %d"
-multifd_send_sync_main_wait(uint8_t id) "channel %d"
+multifd_send_sync_main_signal(uint8_t id) "channel %u"
+multifd_send_sync_main_wait(uint8_t id) "channel %u"
 multifd_send_terminate_threads(bool error) "error %d"
-multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %d packets %" PRIu64 " pages %"  PRIu64
-multifd_send_thread_start(uint8_t id) "%d"
+multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %u packets %" PRIu64 " pages %"  PRIu64
+multifd_send_thread_start(uint8_t id) "%u"
 multifd_tls_outgoing_handshake_start(void *ioc, void *tioc, const char *hostname) "ioc=%p tioc=%p hostname=%s"
 multifd_tls_outgoing_handshake_error(void *ioc, const char *err) "ioc=%p err=%s"
 multifd_tls_outgoing_handshake_complete(void *ioc) "ioc=%p"
-- 
Gitee


From bff965662a4f028de5fbb645d2c81be154f9ae27 Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:24 -0300
Subject: [PATCH 120/407] multifd: Move iov from pages to params
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [11/26] 24dff3ef68cf3327811242193502319ed3e3940a
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

This will allow us to reduce the number of system calls in the next patch.

Signed-off-by: Juan Quintela <quintela@redhat.com>
(cherry picked from commit 226468ba3dea950ab4bb0b729878dde25812da1c)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd.c | 34 ++++++++++++++++++++++++----------
 migration/multifd.h |  8 ++++++--
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index d0d19470f9f..5004f394aa1 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -86,7 +86,16 @@ static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
  */
 static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
 {
-    p->next_packet_size = p->pages->num * qemu_target_page_size();
+    MultiFDPages_t *pages = p->pages;
+    size_t page_size = qemu_target_page_size();
+
+    for (int i = 0; i < p->pages->num; i++) {
+        p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
+        p->iov[p->iovs_num].iov_len = page_size;
+        p->iovs_num++;
+    }
+
+    p->next_packet_size = p->pages->num * page_size;
     p->flags |= MULTIFD_FLAG_NOCOMP;
     return 0;
 }
@@ -104,7 +113,7 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
  */
 static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
 {
-    return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
+    return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
 }
 
 /**
@@ -146,13 +155,18 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p)
 static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
     uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
+    size_t page_size = qemu_target_page_size();
 
     if (flags != MULTIFD_FLAG_NOCOMP) {
         error_setg(errp, "multifd %u: flags received %x flags expected %x",
                    p->id, flags, MULTIFD_FLAG_NOCOMP);
         return -1;
     }
-    return qio_channel_readv_all(p->c, p->pages->iov, p->pages->num, errp);
+    for (int i = 0; i < p->pages->num; i++) {
+        p->iov[i].iov_base = p->pages->block->host + p->pages->offset[i];
+        p->iov[i].iov_len = page_size;
+    }
+    return qio_channel_readv_all(p->c, p->iov, p->pages->num, errp);
 }
 
 static MultiFDMethods multifd_nocomp_ops = {
@@ -242,7 +256,6 @@ static MultiFDPages_t *multifd_pages_init(size_t size)
     MultiFDPages_t *pages = g_new0(MultiFDPages_t, 1);
 
     pages->allocated = size;
-    pages->iov = g_new0(struct iovec, size);
     pages->offset = g_new0(ram_addr_t, size);
 
     return pages;
@@ -254,8 +267,6 @@ static void multifd_pages_clear(MultiFDPages_t *pages)
     pages->allocated = 0;
     pages->packet_num = 0;
     pages->block = NULL;
-    g_free(pages->iov);
-    pages->iov = NULL;
     g_free(pages->offset);
     pages->offset = NULL;
     g_free(pages);
@@ -365,8 +376,6 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
             return -1;
         }
         p->pages->offset[i] = offset;
-        p->pages->iov[i].iov_base = block->host + offset;
-        p->pages->iov[i].iov_len = page_size;
     }
 
     return 0;
@@ -470,8 +479,6 @@ int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
 
     if (pages->block == block) {
         pages->offset[pages->num] = offset;
-        pages->iov[pages->num].iov_base = block->host + offset;
-        pages->iov[pages->num].iov_len = qemu_target_page_size();
         pages->num++;
 
         if (pages->num < pages->allocated) {
@@ -564,6 +571,8 @@ void multifd_save_cleanup(void)
         p->packet_len = 0;
         g_free(p->packet);
         p->packet = NULL;
+        g_free(p->iov);
+        p->iov = NULL;
         multifd_send_state->ops->send_cleanup(p, &local_err);
         if (local_err) {
             migrate_set_error(migrate_get_current(), local_err);
@@ -651,6 +660,7 @@ static void *multifd_send_thread(void *opaque)
             uint32_t used = p->pages->num;
             uint64_t packet_num = p->packet_num;
             uint32_t flags = p->flags;
+            p->iovs_num = 0;
 
             if (used) {
                 ret = multifd_send_state->ops->send_prepare(p, &local_err);
@@ -919,6 +929,7 @@ int multifd_save_setup(Error **errp)
         p->packet->version = cpu_to_be32(MULTIFD_VERSION);
         p->name = g_strdup_printf("multifdsend_%d", i);
         p->tls_hostname = g_strdup(s->hostname);
+        p->iov = g_new0(struct iovec, page_count);
         socket_send_channel_create(multifd_new_send_channel_async, p);
     }
 
@@ -1018,6 +1029,8 @@ int multifd_load_cleanup(Error **errp)
         p->packet_len = 0;
         g_free(p->packet);
         p->packet = NULL;
+        g_free(p->iov);
+        p->iov = NULL;
         multifd_recv_state->ops->recv_cleanup(p);
     }
     qemu_sem_destroy(&multifd_recv_state->sem_sync);
@@ -1158,6 +1171,7 @@ int multifd_load_setup(Error **errp)
                       + sizeof(uint64_t) * page_count;
         p->packet = g_malloc0(p->packet_len);
         p->name = g_strdup_printf("multifdrecv_%d", i);
+        p->iov = g_new0(struct iovec, page_count);
     }
 
     for (i = 0; i < thread_count; i++) {
diff --git a/migration/multifd.h b/migration/multifd.h
index e57adc783b4..c3f18af3640 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -62,8 +62,6 @@ typedef struct {
     uint64_t packet_num;
     /* offset of each page */
     ram_addr_t *offset;
-    /* pointer to each page */
-    struct iovec *iov;
     RAMBlock *block;
 } MultiFDPages_t;
 
@@ -110,6 +108,10 @@ typedef struct {
     uint64_t num_pages;
     /* syncs main thread and channels */
     QemuSemaphore sem_sync;
+    /* buffers to send */
+    struct iovec *iov;
+    /* number of iovs used */
+    uint32_t iovs_num;
     /* used for compression methods */
     void *data;
 }  MultiFDSendParams;
@@ -149,6 +151,8 @@ typedef struct {
     uint64_t num_pages;
     /* syncs main thread and channels */
     QemuSemaphore sem_sync;
+    /* buffers to recv */
+    struct iovec *iov;
     /* used for de-compression methods */
     void *data;
 } MultiFDRecvParams;
-- 
Gitee


From d37bbe6b16ad1fb5506708fe7bd4149f77703ce1 Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:24 -0300
Subject: [PATCH 121/407] multifd: Make zlib use iov's
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [12/26] 58630452e14802e71a9eadb17cfe4964ebf8e091
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 48a4a44c1cde382c6b8e7792d01fe7d9b0a59c69)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd-zlib.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index a987e4a26cb..96475e096ee 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -145,6 +145,9 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
         }
         out_size += available - zs->avail_out;
     }
+    p->iov[p->iovs_num].iov_base = z->zbuff;
+    p->iov[p->iovs_num].iov_len = out_size;
+    p->iovs_num++;
     p->next_packet_size = out_size;
     p->flags |= MULTIFD_FLAG_ZLIB;
 
@@ -164,10 +167,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
  */
 static int zlib_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
 {
-    struct zlib_data *z = p->data;
-
-    return qio_channel_write_all(p->c, (void *)z->zbuff, p->next_packet_size,
-                                 errp);
+    return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
 }
 
 /**
-- 
Gitee


From 7ea6ca645e0cc6c7da80586c3587a4ba29904b7b Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:24 -0300
Subject: [PATCH 122/407] multifd: Make zstd use iov's
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [13/26] 4d7036fb32efdf088d23737b9710e6ad1a4654aa
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 0a818b89eb8eaf79ae651405907d8110a0935cfd)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd-zstd.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 2185a83eacb..4e60cdbc546 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -156,6 +156,9 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
             return -1;
         }
     }
+    p->iov[p->iovs_num].iov_base = z->zbuff;
+    p->iov[p->iovs_num].iov_len = z->out.pos;
+    p->iovs_num++;
     p->next_packet_size = z->out.pos;
     p->flags |= MULTIFD_FLAG_ZSTD;
 
@@ -175,10 +178,7 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
  */
 static int zstd_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
 {
-    struct zstd_data *z = p->data;
-
-    return qio_channel_write_all(p->c, (void *)z->zbuff, p->next_packet_size,
-                                 errp);
+    return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
 }
 
 /**
-- 
Gitee


From 08f50ada8234f91ac0405bec9eab5440c108bedd Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:24 -0300
Subject: [PATCH 123/407] multifd: Remove send_write() method
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [14/26] 5fa59ffa09099fbc6da84e9a192ca71af52cc98f
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Everything uses iov's now.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 468fcb5dd0c965e1af0da9efab09b1462631da18)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd-zlib.c | 17 -----------------
 migration/multifd-zstd.c | 17 -----------------
 migration/multifd.c      | 20 ++------------------
 migration/multifd.h      |  2 --
 4 files changed, 2 insertions(+), 54 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 96475e096ee..8ed29b96337 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -154,22 +154,6 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
     return 0;
 }
 
-/**
- * zlib_send_write: do the actual write of the data
- *
- * Do the actual write of the comprresed buffer.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @used: number of pages used
- * @errp: pointer to an error
- */
-static int zlib_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
-{
-    return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
-}
-
 /**
  * zlib_recv_setup: setup receive side
  *
@@ -312,7 +296,6 @@ static MultiFDMethods multifd_zlib_ops = {
     .send_setup = zlib_send_setup,
     .send_cleanup = zlib_send_cleanup,
     .send_prepare = zlib_send_prepare,
-    .send_write = zlib_send_write,
     .recv_setup = zlib_recv_setup,
     .recv_cleanup = zlib_recv_cleanup,
     .recv_pages = zlib_recv_pages
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 4e60cdbc546..25e1f517b5f 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -165,22 +165,6 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
     return 0;
 }
 
-/**
- * zstd_send_write: do the actual write of the data
- *
- * Do the actual write of the comprresed buffer.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @used: number of pages used
- * @errp: pointer to an error
- */
-static int zstd_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
-{
-    return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
-}
-
 /**
  * zstd_recv_setup: setup receive side
  *
@@ -325,7 +309,6 @@ static MultiFDMethods multifd_zstd_ops = {
     .send_setup = zstd_send_setup,
     .send_cleanup = zstd_send_cleanup,
     .send_prepare = zstd_send_prepare,
-    .send_write = zstd_send_write,
     .recv_setup = zstd_recv_setup,
     .recv_cleanup = zstd_recv_cleanup,
     .recv_pages = zstd_recv_pages
diff --git a/migration/multifd.c b/migration/multifd.c
index 5004f394aa1..1e1551d78bd 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -100,22 +100,6 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
     return 0;
 }
 
-/**
- * nocomp_send_write: do the actual write of the data
- *
- * For no compression we just have to write the data.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @used: number of pages used
- * @errp: pointer to an error
- */
-static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
-{
-    return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
-}
-
 /**
  * nocomp_recv_setup: setup receive side
  *
@@ -173,7 +157,6 @@ static MultiFDMethods multifd_nocomp_ops = {
     .send_setup = nocomp_send_setup,
     .send_cleanup = nocomp_send_cleanup,
     .send_prepare = nocomp_send_prepare,
-    .send_write = nocomp_send_write,
     .recv_setup = nocomp_recv_setup,
     .recv_cleanup = nocomp_recv_cleanup,
     .recv_pages = nocomp_recv_pages
@@ -687,7 +670,8 @@ static void *multifd_send_thread(void *opaque)
             }
 
             if (used) {
-                ret = multifd_send_state->ops->send_write(p, used, &local_err);
+                ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
+                                             &local_err);
                 if (ret != 0) {
                     break;
                 }
diff --git a/migration/multifd.h b/migration/multifd.h
index c3f18af3640..7496f951a79 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -164,8 +164,6 @@ typedef struct {
     void (*send_cleanup)(MultiFDSendParams *p, Error **errp);
     /* Prepare the send packet */
     int (*send_prepare)(MultiFDSendParams *p, Error **errp);
-    /* Write the send packet */
-    int (*send_write)(MultiFDSendParams *p, uint32_t used, Error **errp);
     /* Setup for receiving side */
     int (*recv_setup)(MultiFDRecvParams *p, Error **errp);
     /* Cleanup for receiving side */
-- 
Gitee


From 704332a22e385ad80c487923684139a36e5c5d19 Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:24 -0300
Subject: [PATCH 124/407] multifd: Use a single writev on the send side
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [15/26] c37063c813fc0ba695072117f272360e5c413803
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Until now, we wrote the packet header with write(), and the rest of the
pages with writev().  Just increase the size of the iovec and do a
single writev().
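
As a rough illustration of the idea (a standalone sketch, not the QEMU
code), slot 0 of the iovec can carry the packet header and the remaining
slots the pages, so one writev() replaces the write() + writev() pair.
The helper name send_packet_single_writev() and its parameters are
hypothetical, and unlike qio_channel_writev_all() it does not retry on a
short write:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU code: slot 0 carries the header, slots
 * 1..npages carry the pages, and everything goes out in one writev().
 */
static ssize_t send_packet_single_writev(int fd, void *hdr, size_t hdr_len,
                                         void **pages, size_t npages,
                                         size_t page_size)
{
    /* One extra slot for the packet header, as in this patch. */
    struct iovec *iov = calloc(npages + 1, sizeof(*iov));
    size_t iovs_num = 1;        /* slot 0 is reserved for the header */
    ssize_t ret;

    if (!iov) {
        return -1;
    }
    for (size_t i = 0; i < npages; i++) {
        iov[iovs_num].iov_base = pages[i];
        iov[iovs_num].iov_len = page_size;
        iovs_num++;
    }
    iov[0].iov_base = hdr;
    iov[0].iov_len = hdr_len;

    /* A real caller must loop on short writes; this sketch does not. */
    ret = writev(fd, iov, (int)iovs_num);
    free(iov);
    return ret;
}
```

Reserving slot 0 up front is why the send thread starts iovs_num at 1
before send_prepare() appends the page entries.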

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit d48c3a044537689866fe44e65d24c7d39a68868a)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 1e1551d78bd..d0f86542b1c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -643,7 +643,7 @@ static void *multifd_send_thread(void *opaque)
             uint32_t used = p->pages->num;
             uint64_t packet_num = p->packet_num;
             uint32_t flags = p->flags;
-            p->iovs_num = 0;
+            p->iovs_num = 1;
 
             if (used) {
                 ret = multifd_send_state->ops->send_prepare(p, &local_err);
@@ -663,20 +663,15 @@ static void *multifd_send_thread(void *opaque)
             trace_multifd_send(p->id, packet_num, used, flags,
                                p->next_packet_size);
 
-            ret = qio_channel_write_all(p->c, (void *)p->packet,
-                                        p->packet_len, &local_err);
+            p->iov[0].iov_len = p->packet_len;
+            p->iov[0].iov_base = p->packet;
+
+            ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
+                                         &local_err);
             if (ret != 0) {
                 break;
             }
 
-            if (used) {
-                ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
-                                             &local_err);
-                if (ret != 0) {
-                    break;
-                }
-            }
-
             qemu_mutex_lock(&p->mutex);
             p->pending_job--;
             qemu_mutex_unlock(&p->mutex);
@@ -913,7 +908,8 @@ int multifd_save_setup(Error **errp)
         p->packet->version = cpu_to_be32(MULTIFD_VERSION);
         p->name = g_strdup_printf("multifdsend_%d", i);
         p->tls_hostname = g_strdup(s->hostname);
-        p->iov = g_new0(struct iovec, page_count);
+        /* We need one extra place for the packet header */
+        p->iov = g_new0(struct iovec, page_count + 1);
         socket_send_channel_create(multifd_new_send_channel_async, p);
     }
 
-- 
Gitee


From 15282aaa3c2bf0f0a0fc20e9e8e9b4a909c540d7 Mon Sep 17 00:00:00 2001
From: Juan Quintela <quintela@redhat.com>
Date: Wed, 18 May 2022 02:52:24 -0300
Subject: [PATCH 125/407] multifd: Use normal pages array on the send side
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [16/26] 1c48806474daf48fe93920ac361311af95c6a6f3
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

We are only sending normal pages through multifd channels.
Later in this series, we are going to also send zero pages.
We are going to detect if a page is zero or non zero in the multifd
channel thread, not on the main thread.

So we receive an array of pages, pages->offset[N],

and we will end up with:

p->normal[N - zero_pages]
p->zero[zero_pages]

In this patch, we just copy all the pages in offset to normal:

for (i = 0; i < pages->num; i++) {
    p->normal[p->normal_num] = pages->offset[i];
    p->normal_num++;
}

Later in the series this becomes:

for (i = 0; i < pages->num; i++) {
    if (buffer_is_zero(pages->offset[i])) {
        p->zero[p->zero_num] = pages->offset[i];
        p->zero_num++;
    } else {
        p->normal[p->normal_num] = pages->offset[i];
        p->normal_num++;
    }
    }
}
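
The loop above is shorthand: the zero check has to look at the page
contents (host base plus offset, one page long), not at the offset
itself. A self-contained sketch of that split follows. It is an
assumption about the eventual shape, not code from this series:
page_is_zero() stands in for QEMU's buffer_is_zero(), uint64_t stands in
for ram_addr_t, and struct split_pages and split_offsets() are
hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU's buffer_is_zero(): a plain byte loop. */
static bool page_is_zero(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i]) {
            return false;
        }
    }
    return true;
}

/* Hypothetical container; only the field names mirror the text above. */
struct split_pages {
    uint64_t *normal;       /* offsets of non-zero pages */
    uint32_t normal_num;
    uint64_t *zero;         /* offsets of all-zero pages */
    uint32_t zero_num;
};

/* Sort each offset into p->zero or p->normal based on the page contents. */
static void split_offsets(struct split_pages *p, const uint8_t *host,
                          const uint64_t *offset, uint32_t num,
                          size_t page_size)
{
    p->normal_num = 0;
    p->zero_num = 0;
    for (uint32_t i = 0; i < num; i++) {
        if (page_is_zero(host + offset[i], page_size)) {
            p->zero[p->zero_num++] = offset[i];
        } else {
            p->normal[p->normal_num++] = offset[i];
        }
    }
}
```

Both output arrays need room for num entries, since in the worst case
every page lands in the same one.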

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/multifd-zlib.c |  6 +++---
 migration/multifd-zstd.c |  6 +++---
 migration/multifd.c      | 30 +++++++++++++++++++-----------
 migration/multifd.h      |  8 ++++++--
 migration/trace-events   |  4 ++--
 5 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 8ed29b96337..8508f26adf1 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -108,16 +108,16 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
     int ret;
     uint32_t i;
 
-    for (i = 0; i < p->pages->num; i++) {
+    for (i = 0; i < p->normal_num; i++) {
         uint32_t available = z->zbuff_len - out_size;
         int flush = Z_NO_FLUSH;
 
-        if (i == p->pages->num - 1) {
+        if (i == p->normal_num - 1) {
             flush = Z_SYNC_FLUSH;
         }
 
         zs->avail_in = page_size;
-        zs->next_in = p->pages->block->host + p->pages->offset[i];
+        zs->next_in = p->pages->block->host + p->normal[i];
 
         zs->avail_out = available;
         zs->next_out = z->zbuff + out_size;
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 25e1f517b5f..693af3a1406 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -123,13 +123,13 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
     z->out.size = z->zbuff_len;
     z->out.pos = 0;
 
-    for (i = 0; i < p->pages->num; i++) {
+    for (i = 0; i < p->normal_num; i++) {
         ZSTD_EndDirective flush = ZSTD_e_continue;
 
-        if (i == p->pages->num - 1) {
+        if (i == p->normal_num - 1) {
             flush = ZSTD_e_flush;
         }
-        z->in.src = p->pages->block->host + p->pages->offset[i];
+        z->in.src = p->pages->block->host + p->normal[i];
         z->in.size = page_size;
         z->in.pos = 0;
 
diff --git a/migration/multifd.c b/migration/multifd.c
index d0f86542b1c..37252264002 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -89,13 +89,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
     MultiFDPages_t *pages = p->pages;
     size_t page_size = qemu_target_page_size();
 
-    for (int i = 0; i < p->pages->num; i++) {
-        p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
+    for (int i = 0; i < p->normal_num; i++) {
+        p->iov[p->iovs_num].iov_base = pages->block->host + p->normal[i];
         p->iov[p->iovs_num].iov_len = page_size;
         p->iovs_num++;
     }
 
-    p->next_packet_size = p->pages->num * page_size;
+    p->next_packet_size = p->normal_num * page_size;
     p->flags |= MULTIFD_FLAG_NOCOMP;
     return 0;
 }
@@ -262,7 +262,7 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 
     packet->flags = cpu_to_be32(p->flags);
     packet->pages_alloc = cpu_to_be32(p->pages->allocated);
-    packet->pages_used = cpu_to_be32(p->pages->num);
+    packet->pages_used = cpu_to_be32(p->normal_num);
     packet->next_packet_size = cpu_to_be32(p->next_packet_size);
     packet->packet_num = cpu_to_be64(p->packet_num);
 
@@ -270,9 +270,9 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
         strncpy(packet->ramblock, p->pages->block->idstr, 256);
     }
 
-    for (i = 0; i < p->pages->num; i++) {
+    for (i = 0; i < p->normal_num; i++) {
         /* there are architectures where ram_addr_t is 32 bit */
-        uint64_t temp = p->pages->offset[i];
+        uint64_t temp = p->normal[i];
 
         packet->offset[i] = cpu_to_be64(temp);
     }
@@ -556,6 +556,8 @@ void multifd_save_cleanup(void)
         p->packet = NULL;
         g_free(p->iov);
         p->iov = NULL;
+        g_free(p->normal);
+        p->normal = NULL;
         multifd_send_state->ops->send_cleanup(p, &local_err);
         if (local_err) {
             migrate_set_error(migrate_get_current(), local_err);
@@ -640,12 +642,17 @@ static void *multifd_send_thread(void *opaque)
         qemu_mutex_lock(&p->mutex);
 
         if (p->pending_job) {
-            uint32_t used = p->pages->num;
             uint64_t packet_num = p->packet_num;
             uint32_t flags = p->flags;
             p->iovs_num = 1;
+            p->normal_num = 0;
+
+            for (int i = 0; i < p->pages->num; i++) {
+                p->normal[p->normal_num] = p->pages->offset[i];
+                p->normal_num++;
+            }
 
-            if (used) {
+            if (p->normal_num) {
                 ret = multifd_send_state->ops->send_prepare(p, &local_err);
                 if (ret != 0) {
                     qemu_mutex_unlock(&p->mutex);
@@ -655,12 +662,12 @@ static void *multifd_send_thread(void *opaque)
             multifd_send_fill_packet(p);
             p->flags = 0;
             p->num_packets++;
-            p->num_pages += used;
+            p->total_normal_pages += p->normal_num;
             p->pages->num = 0;
             p->pages->block = NULL;
             qemu_mutex_unlock(&p->mutex);
 
-            trace_multifd_send(p->id, packet_num, used, flags,
+            trace_multifd_send(p->id, packet_num, p->normal_num, flags,
                                p->next_packet_size);
 
             p->iov[0].iov_len = p->packet_len;
@@ -710,7 +717,7 @@ out:
     qemu_mutex_unlock(&p->mutex);
 
     rcu_unregister_thread();
-    trace_multifd_send_thread_end(p->id, p->num_packets, p->num_pages);
+    trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages);
 
     return NULL;
 }
@@ -910,6 +917,7 @@ int multifd_save_setup(Error **errp)
         p->tls_hostname = g_strdup(s->hostname);
         /* We need one extra place for the packet header */
         p->iov = g_new0(struct iovec, page_count + 1);
+        p->normal = g_new0(ram_addr_t, page_count);
         socket_send_channel_create(multifd_new_send_channel_async, p);
     }
 
diff --git a/migration/multifd.h b/migration/multifd.h
index 7496f951a79..7823199dbe6 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -104,14 +104,18 @@ typedef struct {
     /* thread local variables */
     /* packets sent through this channel */
     uint64_t num_packets;
-    /* pages sent through this channel */
-    uint64_t num_pages;
+    /* non zero pages sent through this channel */
+    uint64_t total_normal_pages;
     /* syncs main thread and channels */
     QemuSemaphore sem_sync;
     /* buffers to send */
     struct iovec *iov;
     /* number of iovs used */
     uint32_t iovs_num;
+    /* Pages that are not zero */
+    ram_addr_t *normal;
+    /* num of non zero pages */
+    uint32_t normal_num;
     /* used for compression methods */
     void *data;
 }  MultiFDSendParams;
diff --git a/migration/trace-events b/migration/trace-events
index 5172cb3b3da..171a83a55da 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -124,13 +124,13 @@ multifd_recv_sync_main_wait(uint8_t id) "channel %u"
 multifd_recv_terminate_threads(bool error) "error %d"
 multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %u packets %" PRIu64 " pages %" PRIu64
 multifd_recv_thread_start(uint8_t id) "%u"
-multifd_send(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " pages %u flags 0x%x next packet size %u"
+multifd_send(uint8_t id, uint64_t packet_num, uint32_t normal, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u flags 0x%x next packet size %u"
 multifd_send_error(uint8_t id) "channel %u"
 multifd_send_sync_main(long packet_num) "packet num %ld"
 multifd_send_sync_main_signal(uint8_t id) "channel %u"
 multifd_send_sync_main_wait(uint8_t id) "channel %u"
 multifd_send_terminate_threads(bool error) "error %d"
-multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %u packets %" PRIu64 " pages %"  PRIu64
+multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages) "channel %u packets %" PRIu64 " normal pages %"  PRIu64
 multifd_send_thread_start(uint8_t id) "%u"
 multifd_tls_outgoing_handshake_start(void *ioc, void *tioc, const char *hostname) "ioc=%p tioc=%p hostname=%s"
 multifd_tls_outgoing_handshake_error(void *ioc, const char *err) "ioc=%p err=%s"
-- 
Gitee


From 9d01b3de5711b2af5650f8d7a19caa220ef178f5 Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Wed, 18 May 2022 02:52:24 -0300
Subject: [PATCH 126/407] QIOChannel: Add flags on io_writev and introduce
 io_flush callback
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [17/26] 7bde4e79fd3f76a6cc84d9cacf50420584ddd35c
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Add flags to io_writev and introduce io_flush as optional callback to
QIOChannelClass, allowing the implementation of zero copy writes by
subclasses.

How to use them:
- Write data using qio_channel_writev*(...,QIO_CHANNEL_WRITE_FLAG_ZERO_COPY),
- Wait for write completion with qio_channel_flush().

Notes:
As some zero copy write implementations work asynchronously, it's
recommended to keep the write buffer untouched until the return of
qio_channel_flush(), to avoid the risk of sending an updated buffer
instead of the buffer state during write.

As the io_flush callback is optional, if a subclass does not implement it, then:
- qio_channel_flush() will return 0 without changing anything.
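
A minimal sketch of the optional-callback rule above, using a miniature
class structure. All sketch_* types and fields are hypothetical and only
mimic the shape of the dispatch; they are not the QIOChannel API.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_channel;

/* Hypothetical miniature of a channel class with an optional flush hook. */
struct sketch_channel_class {
    int (*io_flush)(struct sketch_channel *ch);   /* may be NULL */
};

struct sketch_channel {
    const struct sketch_channel_class *klass;
    int has_zero_copy;   /* stands in for the zero copy feature bit */
    int pending;         /* writes queued but not yet confirmed */
};

/*
 * Generic entry point: if the subclass has no flush hook, or the channel
 * does not advertise zero copy, return 0 and leave the channel untouched.
 */
static int sketch_channel_flush(struct sketch_channel *ch)
{
    if (!ch->klass->io_flush || !ch->has_zero_copy) {
        return 0;
    }
    return ch->klass->io_flush(ch);
}

/* Example subclass hook: pretend every pending write has completed. */
static int sketch_socket_flush(struct sketch_channel *ch)
{
    ch->pending = 0;
    return 0;
}
```

Callers can therefore invoke the generic flush unconditionally; channels
without the hook simply report success.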

Also, some functions like qio_channel_writev_full_all() were adapted to
receive a flag parameter. That allows shared code between zero copy and
non-zero copy writev, and also an easier implementation of new flags.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20220513062836.965425-3-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit b88651cb4d4fa416fdbb6afaf5b26ec8c035eaad)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 chardev/char-io.c                   |  2 +-
 hw/remote/mpqemu-link.c             |  2 +-
 include/io/channel.h                | 38 +++++++++++++++++++++-
 io/channel-buffer.c                 |  1 +
 io/channel-command.c                |  1 +
 io/channel-file.c                   |  1 +
 io/channel-socket.c                 |  2 ++
 io/channel-tls.c                    |  1 +
 io/channel-websock.c                |  1 +
 io/channel.c                        | 49 +++++++++++++++++++++++------
 migration/rdma.c                    |  1 +
 scsi/pr-manager-helper.c            |  2 +-
 tests/unit/test-io-channel-socket.c |  1 +
 13 files changed, 88 insertions(+), 14 deletions(-)

diff --git a/chardev/char-io.c b/chardev/char-io.c
index 8ced1841609..4451128cba5 100644
--- a/chardev/char-io.c
+++ b/chardev/char-io.c
@@ -122,7 +122,7 @@ int io_channel_send_full(QIOChannel *ioc,
 
         ret = qio_channel_writev_full(
             ioc, &iov, 1,
-            fds, nfds, NULL);
+            fds, nfds, 0, NULL);
         if (ret == QIO_CHANNEL_ERR_BLOCK) {
             if (offset) {
                 return offset;
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
index 7e841820e52..e8f556bd277 100644
--- a/hw/remote/mpqemu-link.c
+++ b/hw/remote/mpqemu-link.c
@@ -69,7 +69,7 @@ bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
     }
 
     if (!qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send),
-                                    fds, nfds, errp)) {
+                                    fds, nfds, 0, errp)) {
         ret = true;
     } else {
         trace_mpqemu_send_io_error(msg->cmd, msg->size, nfds);
diff --git a/include/io/channel.h b/include/io/channel.h
index 88988979f88..c680ee74802 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -32,12 +32,15 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
 
 #define QIO_CHANNEL_ERR_BLOCK -2
 
+#define QIO_CHANNEL_WRITE_FLAG_ZERO_COPY 0x1
+
 typedef enum QIOChannelFeature QIOChannelFeature;
 
 enum QIOChannelFeature {
     QIO_CHANNEL_FEATURE_FD_PASS,
     QIO_CHANNEL_FEATURE_SHUTDOWN,
     QIO_CHANNEL_FEATURE_LISTEN,
+    QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
 };
 
 
@@ -104,6 +107,7 @@ struct QIOChannelClass {
                          size_t niov,
                          int *fds,
                          size_t nfds,
+                         int flags,
                          Error **errp);
     ssize_t (*io_readv)(QIOChannel *ioc,
                         const struct iovec *iov,
@@ -136,6 +140,8 @@ struct QIOChannelClass {
                                   IOHandler *io_read,
                                   IOHandler *io_write,
                                   void *opaque);
+    int (*io_flush)(QIOChannel *ioc,
+                    Error **errp);
 };
 
 /* General I/O handling functions */
@@ -228,6 +234,7 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
  * @niov: the length of the @iov array
  * @fds: an array of file handles to send
  * @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  * Write data to the IO channel, reading it from the
@@ -260,6 +267,7 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc,
                                 size_t niov,
                                 int *fds,
                                 size_t nfds,
+                                int flags,
                                 Error **errp);
 
 /**
@@ -837,6 +845,7 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
  * @niov: the length of the @iov array
  * @fds: an array of file handles to send
  * @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  *
@@ -846,6 +855,14 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
  * to be written, yielding from the current coroutine
  * if required.
  *
+ * If QIO_CHANNEL_WRITE_FLAG_ZERO_COPY is passed in flags,
+ * instead of waiting for all requested data to be written,
+ * this function will wait until it's all queued for writing.
+ * In this case, if the buffer gets changed between queueing and
+ * sending, the updated buffer will be sent. If this is not a
+ * desired behavior, it's suggested to call qio_channel_flush()
+ * before reusing the buffer.
+ *
  * Returns: 0 if all bytes were written, or -1 on error
  */
 
@@ -853,6 +870,25 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
                                 const struct iovec *iov,
                                 size_t niov,
                                 int *fds, size_t nfds,
-                                Error **errp);
+                                int flags, Error **errp);
+
+/**
+ * qio_channel_flush:
+ * @ioc: the channel object
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Will block until every packet queued with
+ * qio_channel_writev_full() + QIO_CHANNEL_WRITE_FLAG_ZERO_COPY
+ * is sent, or return in case of any error.
+ *
+ * If not implemented, acts as a no-op, and returns 0.
+ *
+ * Returns -1 if any error is found,
+ *          1 if every send failed to use zero copy.
+ *          0 otherwise.
+ */
+
+int qio_channel_flush(QIOChannel *ioc,
+                      Error **errp);
 
 #endif /* QIO_CHANNEL_H */
diff --git a/io/channel-buffer.c b/io/channel-buffer.c
index baa4e2b089f..bf52011be2d 100644
--- a/io/channel-buffer.c
+++ b/io/channel-buffer.c
@@ -81,6 +81,7 @@ static ssize_t qio_channel_buffer_writev(QIOChannel *ioc,
                                          size_t niov,
                                          int *fds,
                                          size_t nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelBuffer *bioc = QIO_CHANNEL_BUFFER(ioc);
diff --git a/io/channel-command.c b/io/channel-command.c
index b2a9e271382..5ff1691badd 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -258,6 +258,7 @@ static ssize_t qio_channel_command_writev(QIOChannel *ioc,
                                           size_t niov,
                                           int *fds,
                                           size_t nfds,
+                                          int flags,
                                           Error **errp)
 {
     QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc);
diff --git a/io/channel-file.c b/io/channel-file.c
index c4bf799a802..348a48545eb 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -114,6 +114,7 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc,
                                        size_t niov,
                                        int *fds,
                                        size_t nfds,
+                                       int flags,
                                        Error **errp)
 {
     QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 606ec97cf7c..bfbd64787eb 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -525,6 +525,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
                                          size_t niov,
                                          int *fds,
                                          size_t nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
@@ -620,6 +621,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
                                          size_t niov,
                                          int *fds,
                                          size_t nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
diff --git a/io/channel-tls.c b/io/channel-tls.c
index 2ae1b92fc0a..4ce890a5382 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -301,6 +301,7 @@ static ssize_t qio_channel_tls_writev(QIOChannel *ioc,
                                       size_t niov,
                                       int *fds,
                                       size_t nfds,
+                                      int flags,
                                       Error **errp)
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
diff --git a/io/channel-websock.c b/io/channel-websock.c
index 70889bb54da..035dd6075b1 100644
--- a/io/channel-websock.c
+++ b/io/channel-websock.c
@@ -1127,6 +1127,7 @@ static ssize_t qio_channel_websock_writev(QIOChannel *ioc,
                                           size_t niov,
                                           int *fds,
                                           size_t nfds,
+                                          int flags,
                                           Error **errp)
 {
     QIOChannelWebsock *wioc = QIO_CHANNEL_WEBSOCK(ioc);
diff --git a/io/channel.c b/io/channel.c
index e8b019dc36e..0640941ac57 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -72,18 +72,32 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc,
                                 size_t niov,
                                 int *fds,
                                 size_t nfds,
+                                int flags,
                                 Error **errp)
 {
     QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
 
-    if ((fds || nfds) &&
-        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS)) {
+    if (fds || nfds) {
+        if (!qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS)) {
+            error_setg_errno(errp, EINVAL,
+                             "Channel does not support file descriptor passing");
+            return -1;
+        }
+        if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
+            error_setg_errno(errp, EINVAL,
+                             "Zero Copy does not support file descriptor passing");
+            return -1;
+        }
+    }
+
+    if ((flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) &&
+        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) {
         error_setg_errno(errp, EINVAL,
-                         "Channel does not support file descriptor passing");
+                         "Requested Zero Copy feature is not available");
         return -1;
     }
 
-    return klass->io_writev(ioc, iov, niov, fds, nfds, errp);
+    return klass->io_writev(ioc, iov, niov, fds, nfds, flags, errp);
 }
 
 
@@ -217,14 +231,14 @@ int qio_channel_writev_all(QIOChannel *ioc,
                            size_t niov,
                            Error **errp)
 {
-    return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, errp);
+    return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, 0, errp);
 }
 
 int qio_channel_writev_full_all(QIOChannel *ioc,
                                 const struct iovec *iov,
                                 size_t niov,
                                 int *fds, size_t nfds,
-                                Error **errp)
+                                int flags, Error **errp)
 {
     int ret = -1;
     struct iovec *local_iov = g_new(struct iovec, niov);
@@ -237,8 +251,10 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
 
     while (nlocal_iov > 0) {
         ssize_t len;
-        len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds,
-                                      errp);
+
+        len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds,
+                                            nfds, flags, errp);
+
         if (len == QIO_CHANNEL_ERR_BLOCK) {
             if (qemu_in_coroutine()) {
                 qio_channel_yield(ioc, G_IO_OUT);
@@ -277,7 +293,7 @@ ssize_t qio_channel_writev(QIOChannel *ioc,
                            size_t niov,
                            Error **errp)
 {
-    return qio_channel_writev_full(ioc, iov, niov, NULL, 0, errp);
+    return qio_channel_writev_full(ioc, iov, niov, NULL, 0, 0, errp);
 }
 
 
@@ -297,7 +313,7 @@ ssize_t qio_channel_write(QIOChannel *ioc,
                           Error **errp)
 {
     struct iovec iov = { .iov_base = (char *)buf, .iov_len = buflen };
-    return qio_channel_writev_full(ioc, &iov, 1, NULL, 0, errp);
+    return qio_channel_writev_full(ioc, &iov, 1, NULL, 0, 0, errp);
 }
 
 
@@ -473,6 +489,19 @@ off_t qio_channel_io_seek(QIOChannel *ioc,
     return klass->io_seek(ioc, offset, whence, errp);
 }
 
+int qio_channel_flush(QIOChannel *ioc,
+                                Error **errp)
+{
+    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+    if (!klass->io_flush ||
+        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) {
+        return 0;
+    }
+
+    return klass->io_flush(ioc, errp);
+}
+
 
 static void qio_channel_restart_read(void *opaque)
 {
diff --git a/migration/rdma.c b/migration/rdma.c
index f5d3bbe7e9c..54acd2000e0 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2833,6 +2833,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
                                        size_t niov,
                                        int *fds,
                                        size_t nfds,
+                                       int flags,
                                        Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
diff --git a/scsi/pr-manager-helper.c b/scsi/pr-manager-helper.c
index 451c7631b76..3be52a98d58 100644
--- a/scsi/pr-manager-helper.c
+++ b/scsi/pr-manager-helper.c
@@ -77,7 +77,7 @@ static int pr_manager_helper_write(PRManagerHelper *pr_mgr,
         iov.iov_base = (void *)buf;
         iov.iov_len = sz;
         n_written = qio_channel_writev_full(QIO_CHANNEL(pr_mgr->ioc), &iov, 1,
-                                            nfds ? &fd : NULL, nfds, errp);
+                                            nfds ? &fd : NULL, nfds, 0, errp);
 
         if (n_written <= 0) {
             assert(n_written != QIO_CHANNEL_ERR_BLOCK);
diff --git a/tests/unit/test-io-channel-socket.c b/tests/unit/test-io-channel-socket.c
index c49eec1f038..6713886d02a 100644
--- a/tests/unit/test-io-channel-socket.c
+++ b/tests/unit/test-io-channel-socket.c
@@ -444,6 +444,7 @@ static void test_io_channel_unix_fd_pass(void)
                             G_N_ELEMENTS(iosend),
                             fdsend,
                             G_N_ELEMENTS(fdsend),
+                            0,
                             &error_abort);
 
     qio_channel_readv_full(dst,
-- 
Gitee


From 389206199ba09860ca748309a33109ac52bc3047 Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Wed, 18 May 2022 02:52:25 -0300
Subject: [PATCH 127/407] QIOChannelSocket: Implement io_writev zero copy flag
 & io_flush for CONFIG_LINUX
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [18/26] 6f65c8c879a5df57213b541d58285b65178f8547
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

For CONFIG_LINUX, implement the new zero copy flag and the optional callback
io_flush on QIOChannelSocket, but enable them only when the MSG_ZEROCOPY
feature is available in the host kernel, which is checked in
qio_channel_socket_connect_sync().

qio_channel_socket_flush() was implemented by counting how many times
sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
socket's error queue, in order to find how many of them finished sending.
Flush will loop until those counters are the same, or until some error occurs.
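
The counter bookkeeping described above can be sketched without any
sockets. This assumes each error-queue notification reports an inclusive
range of completed sendmsg() calls, as the ee_info and ee_data fields do
in the patch below; the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One error-queue notification: calls lo..hi (inclusive) have completed.
 * The two fields mirror ee_info and ee_data respectively.
 */
struct sketch_notification {
    uint32_t lo;
    uint32_t hi;
};

/*
 * Add up completed sendmsg() calls until the "sent" counter reaches
 * "queued" or the notifications run out. Returns the new "sent" value.
 */
static int64_t sketch_drain(int64_t sent, int64_t queued,
                            const struct sketch_notification *n, size_t count)
{
    assert(n != NULL || count == 0);

    for (size_t i = 0; i < count && sent < queued; i++) {
        /* An inclusive range lo..hi covers hi - lo + 1 calls. */
        sent += (int64_t)(uint32_t)(n[i].hi - n[i].lo) + 1;
    }
    return sent;
}
```

The real flush blocks on the error queue instead of walking an array,
but the stop condition is the same: sent has caught up with queued.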

Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
1: Buffer
- As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
some caution is necessary to avoid overwriting any buffer before it's sent.
If this happens, a newer version of the buffer may be sent instead.
- If this is a problem, it's recommended to call qio_channel_flush() before freeing
or re-using the buffer.

2: Locked memory
- When using MSG_ZEROCOPY, the buffer memory will be locked after it is queued,
and unlocked after it's sent.
- Depending on the size of each buffer, and how often it's sent, it may require
a larger amount of locked memory than is usually available to a non-root user.
- If the required amount of locked memory is not available, a writev with
QIO_CHANNEL_WRITE_FLAG_ZERO_COPY will return an error, which can abort an
operation like migration.
- Because of this, when user code wants to add zero copy as a feature, it
requires a mechanism to disable it, so it can still be accessible to less
privileged users.
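
As a rough illustration of the locked-memory concern above, a caller
could compare an estimate of the bytes in flight against the process
soft limit before enabling zero copy. This is only a sketch:
sketch_memlock_allows() is a hypothetical helper, and the assumption
that RLIMIT_MEMLOCK is the relevant limit is the editor's, not something
this patch checks.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>

/*
 * Returns 1 if "needed" bytes fit within the soft RLIMIT_MEMLOCK,
 * 0 if they do not, and -1 if the limit could not be read.
 */
static int sketch_memlock_allows(uint64_t needed)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return 1;
    }
    return needed <= (uint64_t)rl.rlim_cur ? 1 : 0;
}
```

A result of 0 would be a hint to fall back to regular copying writes
rather than risk a failed send part-way through an operation.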

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20220513062836.965425-4-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 2bc58ffc2926a4efdd03edfb5909861fefc68c3d)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 include/io/channel-socket.h |   2 +
 io/channel-socket.c         | 116 ++++++++++++++++++++++++++++++++++--
 2 files changed, 114 insertions(+), 4 deletions(-)

diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index e747e635147..513c428fe48 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -47,6 +47,8 @@ struct QIOChannelSocket {
     socklen_t localAddrLen;
     struct sockaddr_storage remoteAddr;
     socklen_t remoteAddrLen;
+    ssize_t zero_copy_queued;
+    ssize_t zero_copy_sent;
 };
 
 
diff --git a/io/channel-socket.c b/io/channel-socket.c
index bfbd64787eb..38a46ba213a 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -26,6 +26,14 @@
 #include "io/channel-watch.h"
 #include "trace.h"
 #include "qapi/clone-visitor.h"
+#ifdef CONFIG_LINUX
+#include <linux/errqueue.h>
+#include <sys/socket.h>
+
+#if (defined(MSG_ZEROCOPY) && defined(SO_ZEROCOPY))
+#define QEMU_MSG_ZEROCOPY
+#endif
+#endif
 
 #define SOCKET_MAX_FDS 16
 
@@ -55,6 +63,8 @@ qio_channel_socket_new(void)
 
     sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
     sioc->fd = -1;
+    sioc->zero_copy_queued = 0;
+    sioc->zero_copy_sent = 0;
 
     ioc = QIO_CHANNEL(sioc);
     qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
@@ -154,6 +164,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
         return -1;
     }
 
+#ifdef QEMU_MSG_ZEROCOPY
+    int ret, v = 1;
+    ret = setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
+    if (ret == 0) {
+        /* Zero copy available on host */
+        qio_channel_set_feature(QIO_CHANNEL(ioc),
+                                QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY);
+    }
+#endif
+
     return 0;
 }
 
@@ -534,6 +554,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
     char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)];
     size_t fdsize = sizeof(int) * nfds;
     struct cmsghdr *cmsg;
+    int sflags = 0;
 
     memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));
 
@@ -558,15 +579,31 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
         memcpy(CMSG_DATA(cmsg), fds, fdsize);
     }
 
+#ifdef QEMU_MSG_ZEROCOPY
+    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
+        sflags = MSG_ZEROCOPY;
+    }
+#endif
+
  retry:
-    ret = sendmsg(sioc->fd, &msg, 0);
+    ret = sendmsg(sioc->fd, &msg, sflags);
     if (ret <= 0) {
-        if (errno == EAGAIN) {
+        switch (errno) {
+        case EAGAIN:
             return QIO_CHANNEL_ERR_BLOCK;
-        }
-        if (errno == EINTR) {
+        case EINTR:
             goto retry;
+#ifdef QEMU_MSG_ZEROCOPY
+        case ENOBUFS:
+            if (sflags & MSG_ZEROCOPY) {
+                error_setg_errno(errp, errno,
+                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
+                return -1;
+            }
+            break;
+#endif
         }
+
         error_setg_errno(errp, errno,
                          "Unable to write to socket");
         return -1;
@@ -660,6 +697,74 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
 }
 #endif /* WIN32 */
 
+
+#ifdef QEMU_MSG_ZEROCOPY
+static int qio_channel_socket_flush(QIOChannel *ioc,
+                                    Error **errp)
+{
+    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
+    struct msghdr msg = {};
+    struct sock_extended_err *serr;
+    struct cmsghdr *cm;
+    char control[CMSG_SPACE(sizeof(*serr))];
+    int received;
+    int ret = 1;
+
+    msg.msg_control = control;
+    msg.msg_controllen = sizeof(control);
+    memset(control, 0, sizeof(control));
+
+    while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
+        received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
+        if (received < 0) {
+            switch (errno) {
+            case EAGAIN:
+                /* Nothing on errqueue, wait until something is available */
+                qio_channel_wait(ioc, G_IO_ERR);
+                continue;
+            case EINTR:
+                continue;
+            default:
+                error_setg_errno(errp, errno,
+                                 "Unable to read errqueue");
+                return -1;
+            }
+        }
+
+        cm = CMSG_FIRSTHDR(&msg);
+        if (cm->cmsg_level != SOL_IP &&
+            cm->cmsg_type != IP_RECVERR) {
+            error_setg_errno(errp, EPROTOTYPE,
+                             "Wrong cmsg in errqueue");
+            return -1;
+        }
+
+        serr = (void *) CMSG_DATA(cm);
+        if (serr->ee_errno != SO_EE_ORIGIN_NONE) {
+            error_setg_errno(errp, serr->ee_errno,
+                             "Error on socket");
+            return -1;
+        }
+        if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
+            error_setg_errno(errp, serr->ee_origin,
+                             "Error not from zero copy");
+            return -1;
+        }
+
+        /* No errors, count successfully finished sendmsg()*/
+        sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
+
+        /* If any sendmsg() succeeded using zero copy, return 0 at the end */
+        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
+            ret = 0;
+        }
+    }
+
+    return ret;
+}
+
+#endif /* QEMU_MSG_ZEROCOPY */
+
 static int
 qio_channel_socket_set_blocking(QIOChannel *ioc,
                                 bool enabled,
@@ -789,6 +894,9 @@ static void qio_channel_socket_class_init(ObjectClass *klass,
     ioc_klass->io_set_delay = qio_channel_socket_set_delay;
     ioc_klass->io_create_watch = qio_channel_socket_create_watch;
     ioc_klass->io_set_aio_fd_handler = qio_channel_socket_set_aio_fd_handler;
+#ifdef QEMU_MSG_ZEROCOPY
+    ioc_klass->io_flush = qio_channel_socket_flush;
+#endif
 }
 
 static const TypeInfo qio_channel_socket_info = {
-- 
Gitee


From a67bd1637b6817d84b320f7529c17424ffdc863c Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Wed, 18 May 2022 02:52:25 -0300
Subject: [PATCH 128/407] migration: Add zero-copy-send parameter for QMP/HMP
 for Linux
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [19/26] 44ec703088cad75fd6e504958527e81d3261c9df
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Add a property that allows zero-copy migration of memory pages
on the sending side, and also include a helper function,
migrate_use_zero_copy_send(), to check if it's enabled.

No code is introduced to actually do the migration, but it allows
future implementations to enable/disable this feature.

On non-Linux builds this parameter is compiled out.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220513062836.965425-5-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit abb6295b3ace5d17c3a65936913fc346616dbf14)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/migration.c | 32 ++++++++++++++++++++++++++++++++
 migration/migration.h |  5 +++++
 migration/socket.c    | 11 +++++++++--
 monitor/hmp-cmds.c    |  6 ++++++
 qapi/migration.json   | 24 ++++++++++++++++++++++++
 5 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 8a13294da69..b0fc3f68bd1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -888,6 +888,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->multifd_zlib_level = s->parameters.multifd_zlib_level;
     params->has_multifd_zstd_level = true;
     params->multifd_zstd_level = s->parameters.multifd_zstd_level;
+#ifdef CONFIG_LINUX
+    params->has_zero_copy_send = true;
+    params->zero_copy_send = s->parameters.zero_copy_send;
+#endif
     params->has_xbzrle_cache_size = true;
     params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
     params->has_max_postcopy_bandwidth = true;
@@ -1541,6 +1545,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
     if (params->has_multifd_compression) {
         dest->multifd_compression = params->multifd_compression;
     }
+#ifdef CONFIG_LINUX
+    if (params->has_zero_copy_send) {
+        dest->zero_copy_send = params->zero_copy_send;
+    }
+#endif
     if (params->has_xbzrle_cache_size) {
         dest->xbzrle_cache_size = params->xbzrle_cache_size;
     }
@@ -1653,6 +1662,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
     if (params->has_multifd_compression) {
         s->parameters.multifd_compression = params->multifd_compression;
     }
+#ifdef CONFIG_LINUX
+    if (params->has_zero_copy_send) {
+        s->parameters.zero_copy_send = params->zero_copy_send;
+    }
+#endif
     if (params->has_xbzrle_cache_size) {
         s->parameters.xbzrle_cache_size = params->xbzrle_cache_size;
         xbzrle_cache_resize(params->xbzrle_cache_size, errp);
@@ -2543,6 +2557,17 @@ int migrate_multifd_zstd_level(void)
     return s->parameters.multifd_zstd_level;
 }
 
+#ifdef CONFIG_LINUX
+bool migrate_use_zero_copy_send(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->parameters.zero_copy_send;
+}
+#endif
+
 int migrate_use_xbzrle(void)
 {
     MigrationState *s;
@@ -4193,6 +4218,10 @@ static Property migration_properties[] = {
     DEFINE_PROP_UINT8("multifd-zstd-level", MigrationState,
                       parameters.multifd_zstd_level,
                       DEFAULT_MIGRATE_MULTIFD_ZSTD_LEVEL),
+#ifdef CONFIG_LINUX
+    DEFINE_PROP_BOOL("zero_copy_send", MigrationState,
+                      parameters.zero_copy_send, false),
+#endif
     DEFINE_PROP_SIZE("xbzrle-cache-size", MigrationState,
                       parameters.xbzrle_cache_size,
                       DEFAULT_MIGRATE_XBZRLE_CACHE_SIZE),
@@ -4290,6 +4319,9 @@ static void migration_instance_init(Object *obj)
     params->has_multifd_compression = true;
     params->has_multifd_zlib_level = true;
     params->has_multifd_zstd_level = true;
+#ifdef CONFIG_LINUX
+    params->has_zero_copy_send = true;
+#endif
     params->has_xbzrle_cache_size = true;
     params->has_max_postcopy_bandwidth = true;
     params->has_max_cpu_throttle = true;
diff --git a/migration/migration.h b/migration/migration.h
index d016cedd9d2..908098939fe 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -339,6 +339,11 @@ MultiFDCompression migrate_multifd_compression(void);
 int migrate_multifd_zlib_level(void);
 int migrate_multifd_zstd_level(void);
 
+#ifdef CONFIG_LINUX
+bool migrate_use_zero_copy_send(void);
+#else
+#define migrate_use_zero_copy_send() (false)
+#endif
 int migrate_use_xbzrle(void);
 uint64_t migrate_xbzrle_cache_size(void);
 bool migrate_colo_enabled(void);
diff --git a/migration/socket.c b/migration/socket.c
index 05705a32d8f..3754d8f72c5 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -74,9 +74,16 @@ static void socket_outgoing_migration(QIOTask *task,
 
     if (qio_task_propagate_error(task, &err)) {
         trace_migration_socket_outgoing_error(error_get_pretty(err));
-    } else {
-        trace_migration_socket_outgoing_connected(data->hostname);
+           goto out;
     }
+
+    trace_migration_socket_outgoing_connected(data->hostname);
+
+    if (migrate_use_zero_copy_send()) {
+        error_setg(&err, "Zero copy send not available in migration");
+    }
+
+out:
     migration_channel_connect(data->s, sioc, data->hostname, err);
     object_unref(OBJECT(sioc));
 }
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 2669156b284..e02da5008bf 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1297,6 +1297,12 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         p->has_multifd_zstd_level = true;
         visit_type_uint8(v, param, &p->multifd_zstd_level, &err);
         break;
+#ifdef CONFIG_LINUX
+    case MIGRATION_PARAMETER_ZERO_COPY_SEND:
+        p->has_zero_copy_send = true;
+        visit_type_bool(v, param, &p->zero_copy_send, &err);
+        break;
+#endif
     case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE:
         p->has_xbzrle_cache_size = true;
         if (!visit_type_size(v, param, &cache_size, &err)) {
diff --git a/qapi/migration.json b/qapi/migration.json
index bbfd48cf0b1..59b5c5780b3 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -730,6 +730,13 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @zero-copy-send: Controls behavior on sending memory pages on migration.
+#                  When true, enables a zero-copy mechanism for sending
+#                  memory pages, if host supports it.
+#                  Requires that QEMU be permitted to use locked memory
+#                  for guest RAM pages.
+#                  Defaults to false. (Since 7.1)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -769,6 +776,7 @@
            'xbzrle-cache-size', 'max-postcopy-bandwidth',
            'max-cpu-throttle', 'multifd-compression',
            'multifd-zlib-level' ,'multifd-zstd-level',
+           { 'name': 'zero-copy-send', 'if' : 'CONFIG_LINUX'},
            'block-bitmap-mapping' ] }
 
 ##
@@ -895,6 +903,13 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @zero-copy-send: Controls behavior on sending memory pages on migration.
+#                  When true, enables a zero-copy mechanism for sending
+#                  memory pages, if host supports it.
+#                  Requires that QEMU be permitted to use locked memory
+#                  for guest RAM pages.
+#                  Defaults to false. (Since 7.1)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -949,6 +964,7 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
+            '*zero-copy-send': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
@@ -1095,6 +1111,13 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @zero-copy-send: Controls behavior on sending memory pages on migration.
+#                  When true, enables a zero-copy mechanism for sending
+#                  memory pages, if host supports it.
+#                  Requires that QEMU be permitted to use locked memory
+#                  for guest RAM pages.
+#                  Defaults to false. (Since 7.1)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -1147,6 +1170,7 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
+            '*zero-copy-send': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
-- 
Gitee


From e62e44f837dc88a7cea3a8a3d9c52ae294b155e3 Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Wed, 18 May 2022 02:52:25 -0300
Subject: [PATCH 129/407] migration: Add migrate_use_tls() helper
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [20/26] 02afc2e60f1abbf6db45d83e54a18b66dad52426
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

A lot of places check parameters.tls_creds in order to evaluate if TLS is
in use, and sometimes call migrate_get_current() just for that test.

Add new helper function migrate_use_tls() in order to simplify testing
for TLS usage.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220513062836.965425-6-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit d2fafb6a6814a8998607d0baf691265032996a0f)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/channel.c   | 3 +--
 migration/migration.c | 9 +++++++++
 migration/migration.h | 1 +
 migration/multifd.c   | 5 +----
 4 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/migration/channel.c b/migration/channel.c
index c4fc000a1a0..086b5c0d8b1 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -38,8 +38,7 @@ void migration_channel_process_incoming(QIOChannel *ioc)
     trace_migration_set_incoming_channel(
         ioc, object_get_typename(OBJECT(ioc)));
 
-    if (s->parameters.tls_creds &&
-        *s->parameters.tls_creds &&
+    if (migrate_use_tls() &&
         !object_dynamic_cast(OBJECT(ioc),
                              TYPE_QIO_CHANNEL_TLS)) {
         migration_tls_channel_process_incoming(s, ioc, &local_err);
diff --git a/migration/migration.c b/migration/migration.c
index b0fc3f68bd1..8e28f2ee413 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2568,6 +2568,15 @@ bool migrate_use_zero_copy_send(void)
 }
 #endif
 
+int migrate_use_tls(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->parameters.tls_creds && *s->parameters.tls_creds;
+}
+
 int migrate_use_xbzrle(void)
 {
     MigrationState *s;
diff --git a/migration/migration.h b/migration/migration.h
index 908098939fe..9396b7e90a5 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -344,6 +344,7 @@ bool migrate_use_zero_copy_send(void);
 #else
 #define migrate_use_zero_copy_send() (false)
 #endif
+int migrate_use_tls(void);
 int migrate_use_xbzrle(void);
 uint64_t migrate_xbzrle_cache_size(void);
 bool migrate_colo_enabled(void);
diff --git a/migration/multifd.c b/migration/multifd.c
index 37252264002..e53811f04a9 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -789,14 +789,11 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
                                     QIOChannel *ioc,
                                     Error *error)
 {
-    MigrationState *s = migrate_get_current();
-
     trace_multifd_set_outgoing_channel(
         ioc, object_get_typename(OBJECT(ioc)), p->tls_hostname, error);
 
     if (!error) {
-        if (s->parameters.tls_creds &&
-            *s->parameters.tls_creds &&
+        if (migrate_use_tls() &&
             !object_dynamic_cast(OBJECT(ioc),
                                  TYPE_QIO_CHANNEL_TLS)) {
             multifd_tls_channel_connect(p, ioc, &error);
-- 
Gitee


From cb89360a372c9d90e5ffc24ec6afdde2fa2c2f6c Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Wed, 18 May 2022 02:52:25 -0300
Subject: [PATCH 130/407] multifd: multifd_send_sync_main now returns negative
 on error
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [21/26] b4e4f3663576aa87f3b2f66f1d38bad4f50bd4ac
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Even though multifd_send_sync_main() currently emits error_reports, its
callers don't really check it before continuing.

Change multifd_send_sync_main() to return -1 on error and 0 on success.
Also change all its callers to make use of this change and possibly fail
earlier.

(This change is important to the next patch on the multifd zero-copy
implementation, to make sure an error in zero-copy flush does not go
unnoticed.)

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220513062836.965425-7-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 33d70973a3a6e8c6b62bcbc64d9e488961981007)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd.c | 10 ++++++----
 migration/multifd.h |  2 +-
 migration/ram.c     | 29 ++++++++++++++++++++++-------
 3 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index e53811f04a9..1e34e01ebcd 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -573,17 +573,17 @@ void multifd_save_cleanup(void)
     multifd_send_state = NULL;
 }
 
-void multifd_send_sync_main(QEMUFile *f)
+int multifd_send_sync_main(QEMUFile *f)
 {
     int i;
 
     if (!migrate_use_multifd()) {
-        return;
+        return 0;
     }
     if (multifd_send_state->pages->num) {
         if (multifd_send_pages(f) < 0) {
             error_report("%s: multifd_send_pages fail", __func__);
-            return;
+            return -1;
         }
     }
     for (i = 0; i < migrate_multifd_channels(); i++) {
@@ -596,7 +596,7 @@ void multifd_send_sync_main(QEMUFile *f)
         if (p->quit) {
             error_report("%s: channel %d has already quit", __func__, i);
             qemu_mutex_unlock(&p->mutex);
-            return;
+            return -1;
         }
 
         p->packet_num = multifd_send_state->packet_num++;
@@ -615,6 +615,8 @@ void multifd_send_sync_main(QEMUFile *f)
         qemu_sem_wait(&p->sem_sync);
     }
     trace_multifd_send_sync_main(multifd_send_state->packet_num);
+
+    return 0;
 }
 
 static void *multifd_send_thread(void *opaque)
diff --git a/migration/multifd.h b/migration/multifd.h
index 7823199dbe6..92de8781554 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -22,7 +22,7 @@ int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
 bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
-void multifd_send_sync_main(QEMUFile *f);
+int multifd_send_sync_main(QEMUFile *f);
 int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 
 /* Multifd Compression flags */
diff --git a/migration/ram.c b/migration/ram.c
index 863035d2351..3e208efca74 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2992,6 +2992,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMState **rsp = opaque;
     RAMBlock *block;
+    int ret;
 
     if (compress_threads_save_setup()) {
         return -1;
@@ -3026,7 +3027,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_control_before_iterate(f, RAM_CONTROL_SETUP);
     ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
-    multifd_send_sync_main(f);
+    ret =  multifd_send_sync_main(f);
+    if (ret < 0) {
+        return ret;
+    }
+
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     qemu_fflush(f);
 
@@ -3135,7 +3140,11 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 out:
     if (ret >= 0
         && migration_is_setup_or_active(migrate_get_current()->state)) {
-        multifd_send_sync_main(rs->f);
+        ret = multifd_send_sync_main(rs->f);
+        if (ret < 0) {
+            return ret;
+        }
+
         qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
         qemu_fflush(f);
         ram_counters.transferred += 8;
@@ -3193,13 +3202,19 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         ram_control_after_iterate(f, RAM_CONTROL_FINISH);
     }
 
-    if (ret >= 0) {
-        multifd_send_sync_main(rs->f);
-        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-        qemu_fflush(f);
+    if (ret < 0) {
+        return ret;
     }
 
-    return ret;
+    ret = multifd_send_sync_main(rs->f);
+    if (ret < 0) {
+        return ret;
+    }
+
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+    qemu_fflush(f);
+
+    return 0;
 }
 
 static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
-- 
Gitee


From d2ad7f71004aa4a2455e66f24415181be299855e Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Wed, 18 May 2022 02:52:25 -0300
Subject: [PATCH 131/407] multifd: Send header packet without flags if
 zero-copy-send is enabled
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [22/26] 9abfee42b72f11911cf128519826d09cbd2f5bc3
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Since d48c3a0445 ("multifd: Use a single writev on the send side"),
sending the header packet and the memory pages happens in the same
writev, which can potentially make the migration faster.

Using channel-socket as an example, this works well with the default copying
mechanism of sendmsg(), but with zero-copy-send=true, it often causes
the migration to break.

This happens because the header packet buffer gets reused quite often,
and there is a high chance that by the time the MSG_ZEROCOPY mechanism gets
to send the buffer, it has already changed, sending the wrong data and
causing the migration to abort.

It means that, as it is, the buffer for the header packet is not suitable
for sending with MSG_ZEROCOPY.

In order to enable zero copy for multifd, send the header packet in an
individual write(), without any flags, and the remaining pages with a
writev(), as was happening before. This only changes how a migration
with zero-copy-send=true works, not changing any current behavior for
migrations with zero-copy-send=false.
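
The two send layouts described above can be sketched in plain POSIX C. This
is only an illustration under stated assumptions, not the QEMU code:
send_packet(), writev_all() and MAX_IOV are hypothetical names, and plain
writev() stands in for the flagged QIOChannel write, so no MSG_ZEROCOPY is
actually requested here.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_IOV 64 /* hypothetical limit for this sketch */

/* Hypothetical helper: write every byte described by iov, retrying on
 * short writes and EINTR. Returns 0 on success, -1 on error. */
static int writev_all(int fd, const struct iovec *iov, int iovcnt)
{
    struct iovec local[MAX_IOV];
    struct iovec *cur = local;

    if (iovcnt < 0 || iovcnt > MAX_IOV) {
        return -1;
    }
    memcpy(local, iov, sizeof(*iov) * (size_t)iovcnt);

    while (iovcnt > 0) {
        ssize_t n = writev(fd, cur, iovcnt);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        size_t left = (size_t)n;
        /* Skip the entries that were written completely */
        while (iovcnt > 0 && left >= cur->iov_len) {
            left -= cur->iov_len;
            cur++;
            iovcnt--;
        }
        /* Advance into a partially written entry */
        if (iovcnt > 0) {
            cur->iov_base = (char *)cur->iov_base + left;
            cur->iov_len -= left;
        }
    }
    return 0;
}

/* Hypothetical sender: with zero_copy set, the header goes out in its own
 * copying write and only the pages are left for the (would-be zero-copy)
 * vectored write. Otherwise the header occupies iov[0] of a single writev. */
static int send_packet(int fd, void *hdr, size_t hdr_len,
                       const struct iovec *pages, int npages, bool zero_copy)
{
    struct iovec iov[MAX_IOV];
    int n = 0;

    if (npages < 0 || npages + 1 > MAX_IOV) {
        return -1;
    }
    if (zero_copy) {
        struct iovec h = { .iov_base = hdr, .iov_len = hdr_len };
        if (writev_all(fd, &h, 1) < 0) {
            return -1;
        }
    } else {
        iov[n].iov_base = hdr;
        iov[n].iov_len = hdr_len;
        n++;
    }
    if (npages > 0) {
        memcpy(&iov[n], pages, sizeof(*pages) * (size_t)npages);
        n += npages;
    }
    return writev_all(fd, iov, n);
}
```

Both layouts put the same byte stream on the wire. The difference is that,
in the zero_copy case, the reusable header buffer has already been copied by
the time the call returns.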

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220513062836.965425-8-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit b7dbdd8e76cd03453c234dbb9578d20969859d74)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 1e34e01ebcd..193f70cdbac 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -624,6 +624,7 @@ static void *multifd_send_thread(void *opaque)
     MultiFDSendParams *p = opaque;
     Error *local_err = NULL;
     int ret = 0;
+    bool use_zero_copy_send = migrate_use_zero_copy_send();
 
     trace_multifd_send_thread_start(p->id);
     rcu_register_thread();
@@ -646,9 +647,14 @@ static void *multifd_send_thread(void *opaque)
         if (p->pending_job) {
             uint64_t packet_num = p->packet_num;
             uint32_t flags = p->flags;
-            p->iovs_num = 1;
             p->normal_num = 0;
 
+            if (use_zero_copy_send) {
+                p->iovs_num = 0;
+            } else {
+                p->iovs_num = 1;
+            }
+
             for (int i = 0; i < p->pages->num; i++) {
                 p->normal[p->normal_num] = p->pages->offset[i];
                 p->normal_num++;
@@ -672,8 +678,18 @@ static void *multifd_send_thread(void *opaque)
             trace_multifd_send(p->id, packet_num, p->normal_num, flags,
                                p->next_packet_size);
 
-            p->iov[0].iov_len = p->packet_len;
-            p->iov[0].iov_base = p->packet;
+            if (use_zero_copy_send) {
+                /* Send header first, without zerocopy */
+                ret = qio_channel_write_all(p->c, (void *)p->packet,
+                                            p->packet_len, &local_err);
+                if (ret != 0) {
+                    break;
+                }
+            } else {
+                /* Send header using the same writev call */
+                p->iov[0].iov_len = p->packet_len;
+                p->iov[0].iov_base = p->packet;
+            }
 
             ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
                                          &local_err);
-- 
Gitee


From 36b54bb7049748a92d5874a6db3ffc83c3f33815 Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Wed, 18 May 2022 02:52:25 -0300
Subject: [PATCH 132/407] multifd: Implement zero copy write in multifd
 migration (multifd-zero-copy)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [23/26] 904ce3909cfef62dd84cc7d3c6a3482e7e6f28e9
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Implement zero copy send on nocomp_send_write(), by making use of QIOChannel
writev + flags & flush interface.

Change multifd_send_sync_main() so flush_zero_copy() can be called
after each iteration in order to make sure all dirty pages are sent before
a new iteration is started. It will also flush at the beginning and at the
end of migration.

Also make it return -1 if flush_zero_copy() fails, in order to cancel
the migration process, and avoid resuming the guest in the target host
without receiving all current RAM.

This works fine for RAM migration because RAM pages are not usually freed,
and there is no problem with the page contents changing between
writev_zero_copy() and the actual sending of the buffer, because such a
change dirties the page and causes it to be re-sent on a later iteration
anyway.

A lot of locked memory may be needed in order to use multifd migration
with zero-copy enabled, so low-privileged users may need to disable the
feature when performing multifd migrations.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220513062836.965425-9-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 5b1d9bab2da4fca3a3caee97c430e5709cb32b7b)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/migration.c | 11 ++++++++++-
 migration/multifd.c   | 37 +++++++++++++++++++++++++++++++++++--
 migration/multifd.h   |  2 ++
 migration/socket.c    |  5 +++--
 4 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 8e28f2ee413..5357efd3481 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1471,7 +1471,16 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
         error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: ");
         return false;
     }
-
+#ifdef CONFIG_LINUX
+    if (params->zero_copy_send &&
+        (!migrate_use_multifd() ||
+         params->multifd_compression != MULTIFD_COMPRESSION_NONE ||
+         (params->tls_creds && *params->tls_creds))) {
+        error_setg(errp,
+                   "Zero copy only available for non-compressed non-TLS multifd migration");
+        return false;
+    }
+#endif
     return true;
 }
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 193f70cdbac..90ab4c4346d 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -576,6 +576,7 @@ void multifd_save_cleanup(void)
 int multifd_send_sync_main(QEMUFile *f)
 {
     int i;
+    bool flush_zero_copy;
 
     if (!migrate_use_multifd()) {
         return 0;
@@ -586,6 +587,20 @@ int multifd_send_sync_main(QEMUFile *f)
             return -1;
         }
     }
+
+    /*
+     * When using zero-copy, it's necessary to flush the pages before any of
+     * the pages can be sent again, so we'll make sure the new version of the
+     * pages will always arrive _later_ than the old pages.
+     *
+     * Currently we achieve this by flushing the zero-page requested writes
+     * per ram iteration, but in the future we could potentially optimize it
+     * to be less frequent, e.g. only after we finished one whole scanning of
+     * all the dirty bitmaps.
+     */
+
+    flush_zero_copy = migrate_use_zero_copy_send();
+
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -607,6 +622,17 @@ int multifd_send_sync_main(QEMUFile *f)
         ram_counters.transferred += p->packet_len;
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_post(&p->sem);
+
+        if (flush_zero_copy && p->c) {
+            int ret;
+            Error *err = NULL;
+
+            ret = qio_channel_flush(p->c, &err);
+            if (ret < 0) {
+                error_report_err(err);
+                return -1;
+            }
+        }
     }
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -691,8 +717,8 @@ static void *multifd_send_thread(void *opaque)
                 p->iov[0].iov_base = p->packet;
             }
 
-            ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
-                                         &local_err);
+            ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
+                                              0, p->write_flags, &local_err);
             if (ret != 0) {
                 break;
             }
@@ -933,6 +959,13 @@ int multifd_save_setup(Error **errp)
         /* We need one extra place for the packet header */
         p->iov = g_new0(struct iovec, page_count + 1);
         p->normal = g_new0(ram_addr_t, page_count);
+
+        if (migrate_use_zero_copy_send()) {
+            p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
+        } else {
+            p->write_flags = 0;
+        }
+
         socket_send_channel_create(multifd_new_send_channel_async, p);
     }
 
diff --git a/migration/multifd.h b/migration/multifd.h
index 92de8781554..11d5e273e61 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -95,6 +95,8 @@ typedef struct {
     uint32_t packet_len;
     /* pointer to the packet */
     MultiFDPacket_t *packet;
+    /* multifd flags for sending ram */
+    int write_flags;
     /* multifd flags for each packet */
     uint32_t flags;
     /* size of the next packet that contains pages */
diff --git a/migration/socket.c b/migration/socket.c
index 3754d8f72c5..4fd5e85f50e 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -79,8 +79,9 @@ static void socket_outgoing_migration(QIOTask *task,
 
     trace_migration_socket_outgoing_connected(data->hostname);
 
-    if (migrate_use_zero_copy_send()) {
-        error_setg(&err, "Zero copy send not available in migration");
+    if (migrate_use_zero_copy_send() &&
+        !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) {
+        error_setg(&err, "Zero copy send feature not detected in host kernel");
     }
 
 out:
-- 
Gitee


From 6945fc149d3d2341f22be38b6ead1ddcdd4145c2 Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Mon, 20 Jun 2022 02:39:42 -0300
Subject: [PATCH 133/407] QIOChannelSocket: Introduce assert and reduce ifdefs
 to improve readability
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [24/26] b50e2e65307149f247155a7f7a032dc99e57718d
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

During implementation of MSG_ZEROCOPY feature, a lot of #ifdefs were
introduced, particularly at qio_channel_socket_writev().

Rewrite some of those changes so it's easier to read.

Also, introduce an assert to help detect incorrect zero-copy usage when
it's disabled at build time.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Fixed up thinko'd g_assert_unreachable->g_assert_not_reached
(cherry picked from commit 803ca43e4c7fcf32f9f68c118301ccd0c83ece3f)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 io/channel-socket.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index 38a46ba213a..7d37b39de7c 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -579,11 +579,17 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
         memcpy(CMSG_DATA(cmsg), fds, fdsize);
     }
 
-#ifdef QEMU_MSG_ZEROCOPY
     if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
+#ifdef QEMU_MSG_ZEROCOPY
         sflags = MSG_ZEROCOPY;
-    }
+#else
+        /*
+         * We expect QIOChannel class entry point to have
+         * blocked this code path already
+         */
+        g_assert_not_reached();
 #endif
+    }
 
  retry:
     ret = sendmsg(sioc->fd, &msg, sflags);
@@ -593,15 +599,13 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
             return QIO_CHANNEL_ERR_BLOCK;
         case EINTR:
             goto retry;
-#ifdef QEMU_MSG_ZEROCOPY
         case ENOBUFS:
-            if (sflags & MSG_ZEROCOPY) {
+            if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
                 error_setg_errno(errp, errno,
                                  "Process can't lock enough memory for using MSG_ZEROCOPY");
                 return -1;
             }
             break;
-#endif
         }
 
         error_setg_errno(errp, errno,
-- 
Gitee


From 8c2ab2a976ad190a841ca45e21317c06ec451f41 Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Mon, 20 Jun 2022 02:39:43 -0300
Subject: [PATCH 134/407] QIOChannelSocket: Fix zero-copy send so socket flush
 works
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [25/26] 3ede94f3269e21c3ace073ed1a6f24696315bcbb
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Somewhere between v6 and v7 of the zero-copy-send patchset a crucial
part of the flushing mechanism went missing: incrementing zero_copy_queued.

Without that, the flushing interface becomes a no-op, and there is no
guarantee the buffer is really sent.

This can be as bad as causing RAM corruption during migration.
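
A simplified model of why the counter matters, assuming a flush waits
until every queued zero-copy write has been reported complete. The
struct and function names here are hypothetical and only loosely follow
the real channel code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a zero-copy channel; field names are illustrative. */
struct zc_channel {
    size_t queued;    /* zero-copy writes handed to the kernel */
    size_t completed; /* writes the kernel has reported as finished */
};

/* Record a successful write; the bug amounted to skipping this increment. */
void zc_note_write(struct zc_channel *c, bool zero_copy)
{
    if (zero_copy) {
        c->queued++;
    }
}

/* Number of completions a flush still has to wait for. */
size_t zc_pending(const struct zc_channel *c)
{
    return c->queued - c->completed;
}
```

If the queued count never grows, zc_pending() always returns 0 and a
flush built on it returns immediately, which is the no-op described
above.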

Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX")
Reported-by: 徐闯 <xuchuangxclwt@bytedance.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 4f5a09714c983a3471fd12e3c7f3196e95c650c1)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 io/channel-socket.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index 7d37b39de7c..df858da9247 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -612,6 +612,11 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
                          "Unable to write to socket");
         return -1;
     }
+
+    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
+        sioc->zero_copy_queued++;
+    }
+
     return ret;
 }
 #else /* WIN32 */
-- 
Gitee


From 27ae5906242518ea5b3fbd925fffdff6a623c545 Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Mon, 20 Jun 2022 02:39:45 -0300
Subject: [PATCH 135/407] migration: Change zero_copy_send from migration
 parameter to migration capability
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 191: MSG_ZEROCOPY + Multifd @ rhel8.7
RH-Commit: [26/26] ea61e6cbdbe47611bd22d18988e1c4c4e8357cc3
RH-Bugzilla: 2072049
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

When originally implemented, zero_copy_send was designed as a Migration
parameter.

But taking into account how it is supposed to work, and the difference
between a capability and a parameter, it only makes sense
that zero-copy-send would work better as a capability.

Taking into account how recently the change got merged, it was decided
that it's still time to make it right, and convert zero_copy_send into
a Migration capability.
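
As a capability, the constraint check works on the capability list
rather than on parameter values. A minimal standalone sketch of that
rule, with hypothetical names (enum cap, check_zero_copy) and with the
compression and TLS state passed in as plain booleans:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability indices; QEMU generates its own from QAPI. */
enum cap { CAP_MULTIFD, CAP_ZERO_COPY_SEND, CAP_COUNT };

/* Returns NULL when the combination is acceptable, else an error message. */
const char *check_zero_copy(const bool caps[CAP_COUNT],
                            bool compression, bool tls)
{
    if (caps[CAP_ZERO_COPY_SEND] &&
        (!caps[CAP_MULTIFD] || compression || tls)) {
        return "Zero copy only available for non-compressed "
               "non-TLS multifd migration";
    }
    return NULL;
}
```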

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: always define the capability, even on non-Linux but error if
set; avoids build problems with the capability
(cherry picked from commit 1abaec9a1b2c23f7aa94709a422128d9e42c3e0b)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/migration.c | 58 +++++++++++++++++++------------------------
 monitor/hmp-cmds.c    |  6 -----
 qapi/migration.json   | 33 +++++++-----------------
 3 files changed, 34 insertions(+), 63 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 5357efd3481..c8aa55d2fe7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -162,7 +162,8 @@ INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot,
     MIGRATION_CAPABILITY_COMPRESS,
     MIGRATION_CAPABILITY_XBZRLE,
     MIGRATION_CAPABILITY_X_COLO,
-    MIGRATION_CAPABILITY_VALIDATE_UUID);
+    MIGRATION_CAPABILITY_VALIDATE_UUID,
+    MIGRATION_CAPABILITY_ZERO_COPY_SEND);
 
 bool migrate_pre_2_2;
 
@@ -888,10 +889,6 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->multifd_zlib_level = s->parameters.multifd_zlib_level;
     params->has_multifd_zstd_level = true;
     params->multifd_zstd_level = s->parameters.multifd_zstd_level;
-#ifdef CONFIG_LINUX
-    params->has_zero_copy_send = true;
-    params->zero_copy_send = s->parameters.zero_copy_send;
-#endif
     params->has_xbzrle_cache_size = true;
     params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
     params->has_max_postcopy_bandwidth = true;
@@ -1249,6 +1246,24 @@ static bool migrate_caps_check(bool *cap_list,
         }
     }
 
+#ifdef CONFIG_LINUX
+    if (cap_list[MIGRATION_CAPABILITY_ZERO_COPY_SEND] &&
+        (!cap_list[MIGRATION_CAPABILITY_MULTIFD] ||
+         migrate_use_compression() ||
+         migrate_use_tls())) {
+        error_setg(errp,
+                   "Zero copy only available for non-compressed non-TLS multifd migration");
+        return false;
+    }
+#else
+    if (cap_list[MIGRATION_CAPABILITY_ZERO_COPY_SEND]) {
+        error_setg(errp,
+                   "Zero copy currently only available on Linux");
+        return false;
+    }
+#endif
+
+
     /* incoming side only */
     if (runstate_check(RUN_STATE_INMIGRATE) &&
         !migrate_multifd_is_allowed() &&
@@ -1471,16 +1486,6 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
         error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: ");
         return false;
     }
-#ifdef CONFIG_LINUX
-    if (params->zero_copy_send &&
-        (!migrate_use_multifd() ||
-         params->multifd_compression != MULTIFD_COMPRESSION_NONE ||
-         (params->tls_creds && *params->tls_creds))) {
-        error_setg(errp,
-                   "Zero copy only available for non-compressed non-TLS multifd migration");
-        return false;
-    }
-#endif
     return true;
 }
 
@@ -1554,11 +1559,6 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
     if (params->has_multifd_compression) {
         dest->multifd_compression = params->multifd_compression;
     }
-#ifdef CONFIG_LINUX
-    if (params->has_zero_copy_send) {
-        dest->zero_copy_send = params->zero_copy_send;
-    }
-#endif
     if (params->has_xbzrle_cache_size) {
         dest->xbzrle_cache_size = params->xbzrle_cache_size;
     }
@@ -1671,11 +1671,6 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
     if (params->has_multifd_compression) {
         s->parameters.multifd_compression = params->multifd_compression;
     }
-#ifdef CONFIG_LINUX
-    if (params->has_zero_copy_send) {
-        s->parameters.zero_copy_send = params->zero_copy_send;
-    }
-#endif
     if (params->has_xbzrle_cache_size) {
         s->parameters.xbzrle_cache_size = params->xbzrle_cache_size;
         xbzrle_cache_resize(params->xbzrle_cache_size, errp);
@@ -2573,7 +2568,7 @@ bool migrate_use_zero_copy_send(void)
 
     s = migrate_get_current();
 
-    return s->parameters.zero_copy_send;
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_ZERO_COPY_SEND];
 }
 #endif
 
@@ -4236,10 +4231,6 @@ static Property migration_properties[] = {
     DEFINE_PROP_UINT8("multifd-zstd-level", MigrationState,
                       parameters.multifd_zstd_level,
                       DEFAULT_MIGRATE_MULTIFD_ZSTD_LEVEL),
-#ifdef CONFIG_LINUX
-    DEFINE_PROP_BOOL("zero_copy_send", MigrationState,
-                      parameters.zero_copy_send, false),
-#endif
     DEFINE_PROP_SIZE("xbzrle-cache-size", MigrationState,
                       parameters.xbzrle_cache_size,
                       DEFAULT_MIGRATE_XBZRLE_CACHE_SIZE),
@@ -4277,6 +4268,10 @@ static Property migration_properties[] = {
     DEFINE_PROP_MIG_CAP("x-multifd", MIGRATION_CAPABILITY_MULTIFD),
     DEFINE_PROP_MIG_CAP("x-background-snapshot",
             MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT),
+#ifdef CONFIG_LINUX
+    DEFINE_PROP_MIG_CAP("x-zero-copy-send",
+            MIGRATION_CAPABILITY_ZERO_COPY_SEND),
+#endif
 
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -4337,9 +4332,6 @@ static void migration_instance_init(Object *obj)
     params->has_multifd_compression = true;
     params->has_multifd_zlib_level = true;
     params->has_multifd_zstd_level = true;
-#ifdef CONFIG_LINUX
-    params->has_zero_copy_send = true;
-#endif
     params->has_xbzrle_cache_size = true;
     params->has_max_postcopy_bandwidth = true;
     params->has_max_cpu_throttle = true;
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index e02da5008bf..2669156b284 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1297,12 +1297,6 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         p->has_multifd_zstd_level = true;
         visit_type_uint8(v, param, &p->multifd_zstd_level, &err);
         break;
-#ifdef CONFIG_LINUX
-    case MIGRATION_PARAMETER_ZERO_COPY_SEND:
-        p->has_zero_copy_send = true;
-        visit_type_bool(v, param, &p->zero_copy_send, &err);
-        break;
-#endif
     case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE:
         p->has_xbzrle_cache_size = true;
         if (!visit_type_size(v, param, &cache_size, &err)) {
diff --git a/qapi/migration.json b/qapi/migration.json
index 59b5c5780b3..fe70a0c4b2b 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -452,6 +452,13 @@
 #                       procedure starts. The VM RAM is saved with running VM.
 #                       (since 6.0)
 #
+# @zero-copy-send: Controls behavior on sending memory pages on migration.
+#                  When true, enables a zero-copy mechanism for sending
+#                  memory pages, if host supports it.
+#                  Requires that QEMU be permitted to use locked memory
+#                  for guest RAM pages.
+#                  (since 7.1)
+#
 # Features:
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
 #
@@ -465,7 +472,8 @@
            'block', 'return-path', 'pause-before-switchover', 'multifd',
            'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
            { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
-           'validate-uuid', 'background-snapshot'] }
+           'validate-uuid', 'background-snapshot',
+           'zero-copy-send'] }
 
 ##
 # @MigrationCapabilityStatus:
@@ -730,12 +738,6 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
-# @zero-copy-send: Controls behavior on sending memory pages on migration.
-#                  When true, enables a zero-copy mechanism for sending
-#                  memory pages, if host supports it.
-#                  Requires that QEMU be permitted to use locked memory
-#                  for guest RAM pages.
-#                  Defaults to false. (Since 7.1)
 #
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
@@ -776,7 +778,6 @@
            'xbzrle-cache-size', 'max-postcopy-bandwidth',
            'max-cpu-throttle', 'multifd-compression',
            'multifd-zlib-level' ,'multifd-zstd-level',
-           { 'name': 'zero-copy-send', 'if' : 'CONFIG_LINUX'},
            'block-bitmap-mapping' ] }
 
 ##
@@ -903,13 +904,6 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
-# @zero-copy-send: Controls behavior on sending memory pages on migration.
-#                  When true, enables a zero-copy mechanism for sending
-#                  memory pages, if host supports it.
-#                  Requires that QEMU be permitted to use locked memory
-#                  for guest RAM pages.
-#                  Defaults to false. (Since 7.1)
-#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -964,7 +958,6 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
-            '*zero-copy-send': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
@@ -1111,13 +1104,6 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
-# @zero-copy-send: Controls behavior on sending memory pages on migration.
-#                  When true, enables a zero-copy mechanism for sending
-#                  memory pages, if host supports it.
-#                  Requires that QEMU be permitted to use locked memory
-#                  for guest RAM pages.
-#                  Defaults to false. (Since 7.1)
-#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -1170,7 +1156,6 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
-            '*zero-copy-send': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
-- 
Gitee


From 9858958484319b2ef38d59c1b528cc6ee4839913 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Tue, 1 Mar 2022 16:39:14 +0800
Subject: [PATCH 136/407] migration: Add migration_incoming_transport_cleanup()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Peter Xu <peterx@redhat.com>
RH-MergeRequest: 195: migration: Allow migrate-recover to run multiple times
RH-Commit: [1/2] 57b2a9a165ee7cb2d01519bd54eb8dc4185815e0
RH-Bugzilla: 2097652
RH-Acked-by: Leonardo Brás <leobras@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Acked-by: Hanna Reitz <hreitz@redhat.com>

Add a helper to clean up the transport listener.

When doing so, we should also nullify the cleanup hook and the data, so
that it's even safe to call it multiple times.

Move the socket_address_list cleanup along with it, because that's a mirror of
the listener channels and only for the purpose of query-migrate.  Hence
whoever wants to clean up the listener transport should always want to clean
up the socket list too.

No functional change intended.
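
The "safe to call multiple times" property can be illustrated with a
small standalone sketch. struct incoming, incoming_transport_cleanup()
and count_call() are hypothetical stand-ins, not the QEMU types:

```c
#include <stddef.h>

/* Hypothetical state holding a cleanup hook and its argument. */
struct incoming {
    void (*cleanup)(void *data);
    void *data;
};

/* Run the hook at most once; later calls find NULL and do nothing. */
void incoming_transport_cleanup(struct incoming *in)
{
    if (in->cleanup) {
        in->cleanup(in->data);
        in->cleanup = NULL;
        in->data = NULL;
    }
}

/* Example hook that counts how often it ran. */
void count_call(void *data)
{
    ++*(int *)data;
}
```

Calling incoming_transport_cleanup() twice on the same state runs
count_call() only once, because the first call clears the hook.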

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-15-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit e031149c78489413038e934eec9f54ac699cf322)
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 22 ++++++++++++++--------
 migration/migration.h |  1 +
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c8aa55d2fe7..b787a367897 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -263,6 +263,19 @@ MigrationIncomingState *migration_incoming_get_current(void)
     return current_incoming;
 }
 
+void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
+{
+    if (mis->socket_address_list) {
+        qapi_free_SocketAddressList(mis->socket_address_list);
+        mis->socket_address_list = NULL;
+    }
+
+    if (mis->transport_cleanup) {
+        mis->transport_cleanup(mis->transport_data);
+        mis->transport_data = mis->transport_cleanup = NULL;
+    }
+}
+
 void migration_incoming_state_destroy(void)
 {
     struct MigrationIncomingState *mis = migration_incoming_get_current();
@@ -283,10 +296,8 @@ void migration_incoming_state_destroy(void)
         g_array_free(mis->postcopy_remote_fds, TRUE);
         mis->postcopy_remote_fds = NULL;
     }
-    if (mis->transport_cleanup) {
-        mis->transport_cleanup(mis->transport_data);
-    }
 
+    migration_incoming_transport_cleanup(mis);
     qemu_event_reset(&mis->main_thread_load_event);
 
     if (mis->page_requested) {
@@ -294,11 +305,6 @@ void migration_incoming_state_destroy(void)
         mis->page_requested = NULL;
     }
 
-    if (mis->socket_address_list) {
-        qapi_free_SocketAddressList(mis->socket_address_list);
-        mis->socket_address_list = NULL;
-    }
-
     yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
diff --git a/migration/migration.h b/migration/migration.h
index 9396b7e90a5..243898e3bec 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -130,6 +130,7 @@ struct MigrationIncomingState {
 
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
+void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
 /*
  * Functions to work with blocktime context
  */
-- 
Gitee


From a447ea4d43bd651c0a0a6863741499922887a6c2 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Thu, 31 Mar 2022 11:08:45 -0400
Subject: [PATCH 137/407] migration: Allow migrate-recover to run multiple
 times
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Peter Xu <peterx@redhat.com>
RH-MergeRequest: 195: migration: Allow migrate-recover to run multiple times
RH-Commit: [2/2] a2e6b02007a06c9c7f5237289095811c7d7ca1f1
RH-Bugzilla: 2097652
RH-Acked-by: Leonardo Brás <leobras@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Acked-by: Hanna Reitz <hreitz@redhat.com>

Previously migration didn't have an easy way to clean up the listening
transport, so migrate recovery could only be executed once.  That was done
with a trick flag, postcopy_recover_triggered.

Now the facility is already there.

Drop postcopy_recover_triggered and instead allow a new migrate-recover to
release the previous listener transport.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220331150857.74406-8-peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 08401c0426bc1a5ce4609afd1cda5dd39abbf9fa)
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 13 ++-----------
 migration/migration.h |  1 -
 migration/savevm.c    |  3 ---
 3 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index b787a367897..616c3ff32ee 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2158,11 +2158,8 @@ void qmp_migrate_recover(const char *uri, Error **errp)
         return;
     }
 
-    if (qatomic_cmpxchg(&mis->postcopy_recover_triggered,
-                       false, true) == true) {
-        error_setg(errp, "Migrate recovery is triggered already");
-        return;
-    }
+    /* If there's an existing transport, release it */
+    migration_incoming_transport_cleanup(mis);
 
     /*
      * Note that this call will never start a real migration; it will
@@ -2170,12 +2167,6 @@ void qmp_migrate_recover(const char *uri, Error **errp)
      * to continue using that newly established channel.
      */
     qemu_start_incoming_migration(uri, errp);
-
-    /* Safe to dereference with the assert above */
-    if (*errp) {
-        /* Reset the flag so user could still retry */
-        qatomic_set(&mis->postcopy_recover_triggered, false);
-    }
 }
 
 void qmp_migrate_pause(Error **errp)
diff --git a/migration/migration.h b/migration/migration.h
index 243898e3bec..0ae2133326b 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -103,7 +103,6 @@ struct MigrationIncomingState {
     struct PostcopyBlocktimeContext *blocktime_ctx;
 
     /* notify PAUSED postcopy incoming migrations to try to continue */
-    bool postcopy_recover_triggered;
     QemuSemaphore postcopy_pause_sem_dst;
     QemuSemaphore postcopy_pause_sem_fault;
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 0bef031acb3..b8382aaa644 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2568,9 +2568,6 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
 
     assert(migrate_postcopy_ram());
 
-    /* Clear the triggered bit to allow one recovery */
-    mis->postcopy_recover_triggered = false;
-
     /*
      * Unregister yank with either from/to src would work, since ioc behind it
      * is the same
-- 
Gitee


From b1cdf1dc2676e497798a9ef36143525d5982f9aa Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 8 Jul 2022 12:29:50 +0200
Subject: [PATCH 138/407] pc-bios/s390-ccw/virtio: Introduce a macro for the
 DASD block size

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 198: pc-bios/s390-ccw: Fix boot from disks with 4k sectors that do not have the typical DASD geometry
RH-Commit: [1/9] 1053101fd5fb591131c567ff98c7d92b63a9dfa9
RH-Bugzilla: 2098076
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

Bugzilla: http://bugzilla.redhat.com/2098076

commit 1f2c2ee48e87ea743f8e23cc7569dd26c4cf9623
Author: Thomas Huth <thuth@redhat.com>
Date:   Mon Jul 4 13:18:53 2022 +0200

    pc-bios/s390-ccw/virtio: Introduce a macro for the DASD block size

    Use VIRTIO_DASD_DEFAULT_BLOCK_SIZE instead of the magic value 4096.

    Message-Id: <20220704111903.62400-3-thuth@redhat.com>
    Reviewed-by: Eric Farman <farman@linux.ibm.com>
    Reviewed-by: Cornelia Huck <cohuck@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 pc-bios/s390-ccw/virtio-blkdev.c | 2 +-
 pc-bios/s390-ccw/virtio.h        | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/pc-bios/s390-ccw/virtio-blkdev.c b/pc-bios/s390-ccw/virtio-blkdev.c
index 7d35050292d..6483307630a 100644
--- a/pc-bios/s390-ccw/virtio-blkdev.c
+++ b/pc-bios/s390-ccw/virtio-blkdev.c
@@ -155,7 +155,7 @@ void virtio_assume_eckd(void)
     vdev->config.blk.physical_block_exp = 0;
     switch (vdev->senseid.cu_model) {
     case VIRTIO_ID_BLOCK:
-        vdev->config.blk.blk_size = 4096;
+        vdev->config.blk.blk_size = VIRTIO_DASD_DEFAULT_BLOCK_SIZE;
         break;
     case VIRTIO_ID_SCSI:
         vdev->config.blk.blk_size = vdev->scsi_block_size;
diff --git a/pc-bios/s390-ccw/virtio.h b/pc-bios/s390-ccw/virtio.h
index 19fceb6495f..9e410bde6f2 100644
--- a/pc-bios/s390-ccw/virtio.h
+++ b/pc-bios/s390-ccw/virtio.h
@@ -198,6 +198,7 @@ extern int virtio_read_many(ulong sector, void *load_addr, int sec_num);
 #define VIRTIO_SECTOR_SIZE 512
 #define VIRTIO_ISO_BLOCK_SIZE 2048
 #define VIRTIO_SCSI_BLOCK_SIZE 512
+#define VIRTIO_DASD_DEFAULT_BLOCK_SIZE 4096
 
 static inline ulong virtio_sector_adjust(ulong sector)
 {
-- 
Gitee


From 2e8dbc9e69201cf6a65514e700f4c9ce5d28d724 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 8 Jul 2022 12:29:50 +0200
Subject: [PATCH 139/407] pc-bios/s390-ccw/bootmap: Improve the guessing logic
 in zipl_load_vblk()

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 198: pc-bios/s390-ccw: Fix boot from disks with 4k sectors that do not have the typical DASD geometry
RH-Commit: [2/9] db1d2e7929352bec0e1a5d4cf3fb385bbe02304b
RH-Bugzilla: 2098076
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

Bugzilla: http://bugzilla.redhat.com/2098076

commit 422865f6672ee1482b98d18321b55c1ecfb06c82
Author: Thomas Huth <thuth@redhat.com>
Date:   Mon Jul 4 13:18:54 2022 +0200

    pc-bios/s390-ccw/bootmap: Improve the guessing logic in zipl_load_vblk()

    The logic of trying a final ISO or ECKD boot on virtio-block devices is
    very weird: Since the geometry hardly ever matches in virtio_disk_is_scsi(),
    virtio_blk_setup_device() always sets a "guessed" disk geometry via
    virtio_assume_scsi() (which is certainly also wrong in a lot of cases).

    zipl_load_vblk() then sees that there's been a "virtio_guessed_disk_nature"
    and tries to fix up the geometry again via virtio_assume_iso9660() before
    always trying to do ipl_iso_el_torito(). That's a very brain-twisting
    way of attempting to boot from ISO images, which won't work anymore after
    the following patches that will clean up the virtio_assume_scsi() mess
    (and thus get rid of the "virtio_guessed_disk_nature" here).

    Let's try a better approach instead: ISO files always have a magic
    string "CD001" at offset 0x8001 (see e.g. the ECMA-119 specification)
    which we can use to decide whether we should try to boot in ISO 9660
    mode (which we should also try if we see a sector size of 2048).

    And if we were not able to boot in ISO mode here, the final boot attempt
    before panicking is to boot in ECKD mode. Since this is our last boot
    attempt anyway, simply always assume the ECKD geometry here (if the sector
    size was not 4096 yet), so that we also do not depend on the guessed disk
    geometry from virtio_blk_setup_device() here anymore.
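
The signature test can be sketched on an in-memory image as follows.
This is a standalone illustration: image_has_iso_signature() and
ISO_SIG_OFFSET are hypothetical names, and the real bios code reads the
relevant block from the device instead of scanning a buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ISO_SIG_OFFSET 0x8001u

/* Returns true if an in-memory image carries the ISO 9660 signature. */
bool image_has_iso_signature(const unsigned char *img, size_t len)
{
    /* The image must be long enough to hold all 5 signature bytes. */
    if (len < ISO_SIG_OFFSET + 5) {
        return false;
    }
    return memcmp(img + ISO_SIG_OFFSET, "CD001", 5) == 0;
}
```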

    Message-Id: <20220704111903.62400-4-thuth@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 pc-bios/s390-ccw/bootmap.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/pc-bios/s390-ccw/bootmap.c b/pc-bios/s390-ccw/bootmap.c
index 56411ab3b67..994e59c0b02 100644
--- a/pc-bios/s390-ccw/bootmap.c
+++ b/pc-bios/s390-ccw/bootmap.c
@@ -780,18 +780,37 @@ static void ipl_iso_el_torito(void)
     }
 }
 
+/**
+ * Detect whether we're trying to boot from an .ISO image.
+ * These always have a signature string "CD001" at offset 0x8001.
+ */
+static bool has_iso_signature(void)
+{
+    int blksize = virtio_get_block_size();
+
+    if (!blksize || virtio_read(0x8000 / blksize, sec)) {
+        return false;
+    }
+
+    return !memcmp("CD001", &sec[1], 5);
+}
+
 /***********************************************************************
  * Bus specific IPL sequences
  */
 
 static void zipl_load_vblk(void)
 {
-    if (virtio_guessed_disk_nature()) {
-        virtio_assume_iso9660();
+    int blksize = virtio_get_block_size();
+
+    if (blksize == VIRTIO_ISO_BLOCK_SIZE || has_iso_signature()) {
+        if (blksize != VIRTIO_ISO_BLOCK_SIZE) {
+            virtio_assume_iso9660();
+        }
+        ipl_iso_el_torito();
     }
-    ipl_iso_el_torito();
 
-    if (virtio_guessed_disk_nature()) {
+    if (blksize != VIRTIO_DASD_DEFAULT_BLOCK_SIZE) {
         sclp_print("Using guessed DASD geometry.\n");
         virtio_assume_eckd();
     }
-- 
Gitee


From 0f0ebd5f54923d82b85e8f4ca6e0e2f01af39e80 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 8 Jul 2022 12:29:50 +0200
Subject: [PATCH 140/407] pc-bios/s390-ccw/virtio-blkdev: Simplify/fix
 virtio_ipl_disk_is_valid()

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 198: pc-bios/s390-ccw: Fix boot from disks with 4k sectors that do not have the typical DASD geometry
RH-Commit: [3/9] ca0b836a417ce5bbd26e489551f573d6b2fc9e94
RH-Bugzilla: 2098076
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

Bugzilla: http://bugzilla.redhat.com/2098076

commit bbf615f7b707f009ef8e757d170902ad33b90644
Author: Thomas Huth <thuth@redhat.com>
Date:   Mon Jul 4 13:18:55 2022 +0200

    pc-bios/s390-ccw/virtio-blkdev: Simplify/fix virtio_ipl_disk_is_valid()

    The s390-ccw bios fails to boot if the boot disk is a virtio-blk
    disk with a sector size of 4096. For example:

     dasdfmt -b 4096 -d cdl -y -p -M quick /dev/dasdX
     fdasd -a /dev/dasdX
     install a guest onto /dev/dasdX1 using virtio-blk
     qemu-system-s390x -nographic -hda /dev/dasdX1

    The bios then bails out with:

     ! Cannot read block 0 !

    Looking at virtio_ipl_disk_is_valid() and especially the function
    virtio_disk_is_scsi(), it does not really make sense that we expect
    only such a limited disk geometry (like a block size of 512) for
    our boot disks. Let's relax the check and allow everything that
    remotely looks like a sane disk.
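
A standalone sketch of the relaxed rule, assuming "sane" means a
virtio-blk or virtio-scsi device with a block size between 512 and 4096
bytes. enum dev_type and ipl_disk_looks_sane() are hypothetical names,
and the handling of a previously guessed disk nature is left out:

```c
#include <stdbool.h>

/* Hypothetical device-type tags; the bios uses its own VIRTIO_ID_* values. */
enum dev_type { DEV_BLOCK, DEV_SCSI, DEV_OTHER };

/* Accept block and SCSI devices whose block size lies in [512, 4096]. */
bool ipl_disk_looks_sane(enum dev_type type, int blksize)
{
    return (type == DEV_BLOCK || type == DEV_SCSI) &&
           blksize >= 512 && blksize <= 4096;
}
```

Under this rule a virtio-blk disk with 4096-byte sectors passes the
check regardless of its reported heads and sectors.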

    Message-Id: <20220704111903.62400-5-thuth@redhat.com>
    Reviewed-by: Eric Farman <farman@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 pc-bios/s390-ccw/virtio-blkdev.c | 41 ++++++--------------------------
 pc-bios/s390-ccw/virtio.h        |  2 --
 2 files changed, 7 insertions(+), 36 deletions(-)

diff --git a/pc-bios/s390-ccw/virtio-blkdev.c b/pc-bios/s390-ccw/virtio-blkdev.c
index 6483307630a..7e13155589c 100644
--- a/pc-bios/s390-ccw/virtio-blkdev.c
+++ b/pc-bios/s390-ccw/virtio-blkdev.c
@@ -166,46 +166,19 @@ void virtio_assume_eckd(void)
         virtio_eckd_sectors_for_block_size(vdev->config.blk.blk_size);
 }
 
-bool virtio_disk_is_scsi(void)
-{
-    VDev *vdev = virtio_get_device();
-
-    if (vdev->guessed_disk_nature == VIRTIO_GDN_SCSI) {
-        return true;
-    }
-    switch (vdev->senseid.cu_model) {
-    case VIRTIO_ID_BLOCK:
-        return (vdev->config.blk.geometry.heads == 255)
-            && (vdev->config.blk.geometry.sectors == 63)
-            && (virtio_get_block_size()  == VIRTIO_SCSI_BLOCK_SIZE);
-    case VIRTIO_ID_SCSI:
-        return true;
-    }
-    return false;
-}
-
-bool virtio_disk_is_eckd(void)
+bool virtio_ipl_disk_is_valid(void)
 {
+    int blksize = virtio_get_block_size();
     VDev *vdev = virtio_get_device();
-    const int block_size = virtio_get_block_size();
 
-    if (vdev->guessed_disk_nature == VIRTIO_GDN_DASD) {
+    if (vdev->guessed_disk_nature == VIRTIO_GDN_SCSI ||
+        vdev->guessed_disk_nature == VIRTIO_GDN_DASD) {
         return true;
     }
-    switch (vdev->senseid.cu_model) {
-    case VIRTIO_ID_BLOCK:
-        return (vdev->config.blk.geometry.heads == 15)
-            && (vdev->config.blk.geometry.sectors ==
-                virtio_eckd_sectors_for_block_size(block_size));
-    case VIRTIO_ID_SCSI:
-        return false;
-    }
-    return false;
-}
 
-bool virtio_ipl_disk_is_valid(void)
-{
-    return virtio_disk_is_scsi() || virtio_disk_is_eckd();
+    return (vdev->senseid.cu_model == VIRTIO_ID_BLOCK ||
+            vdev->senseid.cu_model == VIRTIO_ID_SCSI) &&
+           blksize >= 512 && blksize <= 4096;
 }
 
 int virtio_get_block_size(void)
diff --git a/pc-bios/s390-ccw/virtio.h b/pc-bios/s390-ccw/virtio.h
index 9e410bde6f2..241730effe4 100644
--- a/pc-bios/s390-ccw/virtio.h
+++ b/pc-bios/s390-ccw/virtio.h
@@ -186,8 +186,6 @@ void virtio_assume_scsi(void);
 void virtio_assume_eckd(void);
 void virtio_assume_iso9660(void);
 
-extern bool virtio_disk_is_scsi(void);
-extern bool virtio_disk_is_eckd(void);
 extern bool virtio_ipl_disk_is_valid(void);
 extern int virtio_get_block_size(void);
 extern uint8_t virtio_get_heads(void);
-- 
Gitee


From 89742f91f4f40f4abbfee96479427ead996680e1 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 8 Jul 2022 12:29:50 +0200
Subject: [PATCH 141/407] pc-bios/s390-ccw/virtio-blkdev: Remove
 virtio_assume_scsi()

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 198: pc-bios/s390-ccw: Fix boot from disks with 4k sectors that do not have the typical DASD geometry
RH-Commit: [4/9] 5256c4e6f4d5c5aedf1bad3fee30dd3ad230a3dd
RH-Bugzilla: 2098076
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

Bugzilla: http://bugzilla.redhat.com/2098076

commit 5447de2619050a0a4dd480b97f88a9b58da360d1
Author: Thomas Huth <thuth@redhat.com>
Date:   Mon Jul 4 13:18:56 2022 +0200

    pc-bios/s390-ccw/virtio-blkdev: Remove virtio_assume_scsi()

    The virtio_assume_scsi() function is very questionable: First, it
    is only called for virtio-blk, and not for virtio-scsi, so the naming
    is already quite confusing. Second, it is called if we detected an
    "invalid" IPL disk, trying to fix it by blindly setting a sector
    size of 512. This of course won't work in most cases since disks
    might have a different sector size for a reason.

    Thus let's remove this strange function now. The calling code can
    also be removed completely, since there is another spot in main.c
    that does "IPL_assert(virtio_ipl_disk_is_valid(), ...)" to make
    sure that we do not try to IPL from an invalid device.

    Message-Id: <20220704111903.62400-6-thuth@redhat.com>
    Reviewed-by: Eric Farman <farman@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 pc-bios/s390-ccw/virtio-blkdev.c | 24 ------------------------
 pc-bios/s390-ccw/virtio.h        |  1 -
 2 files changed, 25 deletions(-)

diff --git a/pc-bios/s390-ccw/virtio-blkdev.c b/pc-bios/s390-ccw/virtio-blkdev.c
index 7e13155589c..db1f7f44aab 100644
--- a/pc-bios/s390-ccw/virtio-blkdev.c
+++ b/pc-bios/s390-ccw/virtio-blkdev.c
@@ -112,23 +112,6 @@ VirtioGDN virtio_guessed_disk_nature(void)
     return virtio_get_device()->guessed_disk_nature;
 }
 
-void virtio_assume_scsi(void)
-{
-    VDev *vdev = virtio_get_device();
-
-    switch (vdev->senseid.cu_model) {
-    case VIRTIO_ID_BLOCK:
-        vdev->guessed_disk_nature = VIRTIO_GDN_SCSI;
-        vdev->config.blk.blk_size = VIRTIO_SCSI_BLOCK_SIZE;
-        vdev->config.blk.physical_block_exp = 0;
-        vdev->blk_factor = 1;
-        break;
-    case VIRTIO_ID_SCSI:
-        vdev->scsi_block_size = VIRTIO_SCSI_BLOCK_SIZE;
-        break;
-    }
-}
-
 void virtio_assume_iso9660(void)
 {
     VDev *vdev = virtio_get_device();
@@ -247,13 +230,6 @@ int virtio_blk_setup_device(SubChannelId schid)
     switch (vdev->senseid.cu_model) {
     case VIRTIO_ID_BLOCK:
         sclp_print("Using virtio-blk.\n");
-        if (!virtio_ipl_disk_is_valid()) {
-            /* make sure all getters but blocksize return 0 for
-             * invalid IPL disk
-             */
-            memset(&vdev->config.blk, 0, sizeof(vdev->config.blk));
-            virtio_assume_scsi();
-        }
         break;
     case VIRTIO_ID_SCSI:
         IPL_assert(vdev->config.scsi.sense_size == VIRTIO_SCSI_SENSE_SIZE,
diff --git a/pc-bios/s390-ccw/virtio.h b/pc-bios/s390-ccw/virtio.h
index 241730effe4..600ba5052b3 100644
--- a/pc-bios/s390-ccw/virtio.h
+++ b/pc-bios/s390-ccw/virtio.h
@@ -182,7 +182,6 @@ enum guessed_disk_nature_type {
 typedef enum guessed_disk_nature_type VirtioGDN;
 
 VirtioGDN virtio_guessed_disk_nature(void);
-void virtio_assume_scsi(void);
 void virtio_assume_eckd(void);
 void virtio_assume_iso9660(void);
 
-- 
Gitee


From 7668fbe35c6e5618b9f8134115ac240898cd8a0d Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 8 Jul 2022 12:29:50 +0200
Subject: [PATCH 142/407] pc-bios/s390-ccw/virtio: Set missing status bits
 while initializing

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 198: pc-bios/s390-ccw: Fix boot from disks with 4k sectors that do not have the typical DASD geometry
RH-Commit: [5/9] 6072245f49c229518246b4a0d1be360331305bfa
RH-Bugzilla: 2098076
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

Bugzilla: http://bugzilla.redhat.com/2098076

commit 175aa06a152ef6b58ba9b2e47a1296b024dea70c
Author: Thomas Huth <thuth@redhat.com>
Date:   Mon Jul 4 13:18:57 2022 +0200

    pc-bios/s390-ccw/virtio: Set missing status bits while initializing

    According to chapter "3.1.1 Driver Requirements: Device Initialization"
    of the Virtio specification (v1.1), a driver for a device has to set
    the ACKNOWLEDGE and DRIVER bits in the status field after resetting
    the device. The s390-ccw bios skipped these steps so far and it seems
    like QEMU never cared. Anyway, it's better to follow the spec, so
    let's set these bits now in the right spots, too.

    Message-Id: <20220704111903.62400-7-thuth@redhat.com>
    Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
    Reviewed-by: Cornelia Huck <cohuck@redhat.com>
    Reviewed-by: Eric Farman <farman@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>
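
The status sequence described above can be sketched standalone as follows.
This is only an illustration under stated assumptions: write_status() and
fake_device_status are hypothetical stand-ins for the run_ccw() call with
CCW_CMD_WRITE_STATUS and for the device's status field. The bit values are
the ones given in the Virtio specification.

```c
#include <assert.h>
#include <stdint.h>

/* Status bit values from the Virtio specification. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
#define VIRTIO_CONFIG_S_DRIVER      2
#define VIRTIO_CONFIG_S_DRIVER_OK   4

/* Hypothetical stand-in for the device's status field. */
uint8_t fake_device_status;

/* Hypothetical stand-in for run_ccw(vdev, CCW_CMD_WRITE_STATUS, ...). */
int write_status(uint8_t status)
{
    /* After a reset, bits are only ever added, never cleared. */
    assert((status & fake_device_status) == fake_device_status);
    fake_device_status = status;
    return 0;
}

/* Returns 0 on success, -1 if any status write fails. */
int init_sequence(void)
{
    uint8_t status;

    fake_device_status = 0;                /* device reset */

    status = VIRTIO_CONFIG_S_ACKNOWLEDGE;  /* guest has noticed the device */
    if (write_status(status)) {
        return -1;
    }

    status |= VIRTIO_CONFIG_S_DRIVER;      /* guest knows how to drive it */
    if (write_status(status)) {
        return -1;
    }

    /* Feature negotiation, config read and virtqueue setup happen here. */

    status |= VIRTIO_CONFIG_S_DRIVER_OK;   /* driver is ready */
    return write_status(status) ? -1 : 0;
}
```

Accumulating the bits in one variable mirrors the "status |= ..." pattern
used in the change, so each write is a superset of the previous one.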

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 pc-bios/s390-ccw/virtio.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
index 5d2c6e33816..4e85a2eb828 100644
--- a/pc-bios/s390-ccw/virtio.c
+++ b/pc-bios/s390-ccw/virtio.c
@@ -220,7 +220,7 @@ int virtio_run(VDev *vdev, int vqid, VirtioCmd *cmd)
 void virtio_setup_ccw(VDev *vdev)
 {
     int i, rc, cfg_size = 0;
-    unsigned char status = VIRTIO_CONFIG_S_DRIVER_OK;
+    uint8_t status;
     struct VirtioFeatureDesc {
         uint32_t features;
         uint8_t index;
@@ -234,6 +234,10 @@ void virtio_setup_ccw(VDev *vdev)
 
     run_ccw(vdev, CCW_CMD_VDEV_RESET, NULL, 0, false);
 
+    status = VIRTIO_CONFIG_S_ACKNOWLEDGE;
+    rc = run_ccw(vdev, CCW_CMD_WRITE_STATUS, &status, sizeof(status), false);
+    IPL_assert(rc == 0, "Could not write ACKNOWLEDGE status to host");
+
     switch (vdev->senseid.cu_model) {
     case VIRTIO_ID_NET:
         vdev->nr_vqs = 2;
@@ -253,6 +257,11 @@ void virtio_setup_ccw(VDev *vdev)
     default:
         panic("Unsupported virtio device\n");
     }
+
+    status |= VIRTIO_CONFIG_S_DRIVER;
+    rc = run_ccw(vdev, CCW_CMD_WRITE_STATUS, &status, sizeof(status), false);
+    IPL_assert(rc == 0, "Could not write DRIVER status to host");
+
     IPL_assert(
         run_ccw(vdev, CCW_CMD_READ_CONF, &vdev->config, cfg_size, false) == 0,
        "Could not get block device configuration");
@@ -291,9 +300,10 @@ void virtio_setup_ccw(VDev *vdev)
             run_ccw(vdev, CCW_CMD_SET_VQ, &info, sizeof(info), false) == 0,
             "Cannot set VQ info");
     }
-    IPL_assert(
-        run_ccw(vdev, CCW_CMD_WRITE_STATUS, &status, sizeof(status), false) == 0,
-        "Could not write status to host");
+
+    status |= VIRTIO_CONFIG_S_DRIVER_OK;
+    rc = run_ccw(vdev, CCW_CMD_WRITE_STATUS, &status, sizeof(status), false);
+    IPL_assert(rc == 0, "Could not write DRIVER_OK status to host");
 }
 
 bool virtio_is_supported(SubChannelId schid)
-- 
Gitee


From 7dfbcfec2a3a9a1de386aa47412f8dc1ddaa7f7a Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 8 Jul 2022 12:29:50 +0200
Subject: [PATCH 143/407] pc-bios/s390-ccw/virtio: Read device config after
 feature negotiation

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 198: pc-bios/s390-ccw: Fix boot from disks with 4k sectors that do not have the typical DASD geometry
RH-Commit: [6/9] 99ed8765d614207db19ded75d62c65171674d982
RH-Bugzilla: 2098076
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

Bugzilla: http://bugzilla.redhat.com/2098076

commit aa5c69ce99411c4886bcd051f288afc02b6d968d
Author: Thomas Huth <thuth@redhat.com>
Date:   Mon Jul 4 13:18:58 2022 +0200

    pc-bios/s390-ccw/virtio: Read device config after feature negotiation

    Feature negotiation should be done first, since some fields in the
    config area can depend on the negotiated features and thus should
    rather be read afterwards.

    While we're at it, also adjust the error message here a little bit
    (the code is nowadays used for non-block virtio devices, too).

    Message-Id: <20220704111903.62400-8-thuth@redhat.com>
    Reviewed-by: Eric Farman <farman@linux.ibm.com>
    Reviewed-by: Cornelia Huck <cohuck@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 pc-bios/s390-ccw/virtio.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
index 4e85a2eb828..d8c2b52710d 100644
--- a/pc-bios/s390-ccw/virtio.c
+++ b/pc-bios/s390-ccw/virtio.c
@@ -262,10 +262,6 @@ void virtio_setup_ccw(VDev *vdev)
     rc = run_ccw(vdev, CCW_CMD_WRITE_STATUS, &status, sizeof(status), false);
     IPL_assert(rc == 0, "Could not write DRIVER status to host");
 
-    IPL_assert(
-        run_ccw(vdev, CCW_CMD_READ_CONF, &vdev->config, cfg_size, false) == 0,
-       "Could not get block device configuration");
-
     /* Feature negotiation */
     for (i = 0; i < ARRAY_SIZE(vdev->guest_features); i++) {
         feats.features = 0;
@@ -278,6 +274,9 @@ void virtio_setup_ccw(VDev *vdev)
         IPL_assert(rc == 0, "Could not set features bits");
     }
 
+    rc = run_ccw(vdev, CCW_CMD_READ_CONF, &vdev->config, cfg_size, false);
+    IPL_assert(rc == 0, "Could not get virtio device configuration");
+
     for (i = 0; i < vdev->nr_vqs; i++) {
         VqInfo info = {
             .queue = (unsigned long long) ring_area + (i * VIRTIO_RING_SIZE),
-- 
Gitee


From c8c6b7f24ffb91db807eb0d78522b9fcd59137ae Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 8 Jul 2022 12:29:50 +0200
Subject: [PATCH 144/407] pc-bios/s390-ccw/virtio: Beautify the code for
 reading virtqueue configuration

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 198: pc-bios/s390-ccw: Fix boot from disks with 4k sectors that do not have the typical DASD geometry
RH-Commit: [7/9] 52fb7fee7d7c46397f32e35bd5f92f82616dfb5c
RH-Bugzilla: 2098076
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

Bugzilla: http://bugzilla.redhat.com/2098076

commit 070824885741f5d2a66626d3c4ecb2773c8e0552
Author: Thomas Huth <thuth@redhat.com>
Date:   Mon Jul 4 13:18:59 2022 +0200

    pc-bios/s390-ccw/virtio: Beautify the code for reading virtqueue configuration

    It looks nicer if we separate the run_ccw() from the IPL_assert()
    statement, and the error message should talk about "virtio device"
    instead of "block device", since this code is nowadays used for
    non-block (i.e. network) devices, too.

    Message-Id: <20220704111903.62400-9-thuth@redhat.com>
    Reviewed-by: Cornelia Huck <cohuck@redhat.com>
    Reviewed-by: Eric Farman <farman@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 pc-bios/s390-ccw/virtio.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
index d8c2b52710d..f37510f3124 100644
--- a/pc-bios/s390-ccw/virtio.c
+++ b/pc-bios/s390-ccw/virtio.c
@@ -289,9 +289,8 @@ void virtio_setup_ccw(VDev *vdev)
             .num = 0,
         };
 
-        IPL_assert(
-            run_ccw(vdev, CCW_CMD_READ_VQ_CONF, &config, sizeof(config), false) == 0,
-            "Could not get block device VQ configuration");
+        rc = run_ccw(vdev, CCW_CMD_READ_VQ_CONF, &config, sizeof(config), false);
+        IPL_assert(rc == 0, "Could not get virtio device VQ configuration");
         info.num = config.num;
         vring_init(&vdev->vrings[i], &info);
         vdev->vrings[i].schid = vdev->schid;
-- 
Gitee


From 32d146e4e8cc9b5a8b1a19ce38428844443ed975 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 8 Jul 2022 12:29:50 +0200
Subject: [PATCH 145/407] pc-bios/s390-ccw: Split virtio-scsi code from
 virtio_blk_setup_device()

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 198: pc-bios/s390-ccw: Fix boot from disks with 4k sectors that do not have the typical DASD geometry
RH-Commit: [8/9] 8e24806a91c91b2e3603da88e5a22d96a91e8686
RH-Bugzilla: 2098076
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

Bugzilla: http://bugzilla.redhat.com/2098076

commit cf30b7c4a9b2c64518be8037c2e6670aacdb00b9
Author: Thomas Huth <thuth@redhat.com>
Date:   Mon Jul 4 13:19:00 2022 +0200

    pc-bios/s390-ccw: Split virtio-scsi code from virtio_blk_setup_device()

    The next patch is going to add more virtio-block specific code to
    virtio_blk_setup_device(), and if the virtio-scsi code is also in
    there, this is more cumbersome. And the calling function virtio_setup()
    in main.c looks at the device type already anyway, so it's more
    logical to separate the virtio-scsi stuff into a new function in
    virtio-scsi.c instead.

    Message-Id: <20220704111903.62400-10-thuth@redhat.com>
    Reviewed-by: Eric Farman <farman@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 pc-bios/s390-ccw/main.c          | 24 +++++++++++++++++-------
 pc-bios/s390-ccw/virtio-blkdev.c | 20 ++------------------
 pc-bios/s390-ccw/virtio-scsi.c   | 19 ++++++++++++++++++-
 pc-bios/s390-ccw/virtio-scsi.h   |  2 +-
 4 files changed, 38 insertions(+), 27 deletions(-)

diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
index 5d2b7ba94de..13e1d8fdf79 100644
--- a/pc-bios/s390-ccw/main.c
+++ b/pc-bios/s390-ccw/main.c
@@ -14,6 +14,7 @@
 #include "s390-ccw.h"
 #include "cio.h"
 #include "virtio.h"
+#include "virtio-scsi.h"
 #include "dasd-ipl.h"
 
 char stack[PAGE_SIZE * 8] __attribute__((__aligned__(PAGE_SIZE)));
@@ -218,6 +219,7 @@ static int virtio_setup(void)
 {
     VDev *vdev = virtio_get_device();
     QemuIplParameters *early_qipl = (QemuIplParameters *)QIPL_ADDRESS;
+    int ret;
 
     memcpy(&qipl, early_qipl, sizeof(QemuIplParameters));
 
@@ -225,18 +227,26 @@ static int virtio_setup(void)
         menu_setup();
     }
 
-    if (virtio_get_device_type() == VIRTIO_ID_NET) {
+    switch (vdev->senseid.cu_model) {
+    case VIRTIO_ID_NET:
         sclp_print("Network boot device detected\n");
         vdev->netboot_start_addr = qipl.netboot_start_addr;
-    } else {
-        int ret = virtio_blk_setup_device(blk_schid);
-        if (ret) {
-            return ret;
-        }
+        return 0;
+    case VIRTIO_ID_BLOCK:
+        ret = virtio_blk_setup_device(blk_schid);
+        break;
+    case VIRTIO_ID_SCSI:
+        ret = virtio_scsi_setup_device(blk_schid);
+        break;
+    default:
+        panic("\n! No IPL device available !\n");
+    }
+
+    if (!ret) {
         IPL_assert(virtio_ipl_disk_is_valid(), "No valid IPL device detected");
     }
 
-    return 0;
+    return ret;
 }
 
 static void ipl_boot_device(void)
diff --git a/pc-bios/s390-ccw/virtio-blkdev.c b/pc-bios/s390-ccw/virtio-blkdev.c
index db1f7f44aab..c175b66a479 100644
--- a/pc-bios/s390-ccw/virtio-blkdev.c
+++ b/pc-bios/s390-ccw/virtio-blkdev.c
@@ -222,27 +222,11 @@ uint64_t virtio_get_blocks(void)
 int virtio_blk_setup_device(SubChannelId schid)
 {
     VDev *vdev = virtio_get_device();
-    int ret = 0;
 
     vdev->schid = schid;
     virtio_setup_ccw(vdev);
 
-    switch (vdev->senseid.cu_model) {
-    case VIRTIO_ID_BLOCK:
-        sclp_print("Using virtio-blk.\n");
-        break;
-    case VIRTIO_ID_SCSI:
-        IPL_assert(vdev->config.scsi.sense_size == VIRTIO_SCSI_SENSE_SIZE,
-            "Config: sense size mismatch");
-        IPL_assert(vdev->config.scsi.cdb_size == VIRTIO_SCSI_CDB_SIZE,
-            "Config: CDB size mismatch");
+    sclp_print("Using virtio-blk.\n");
 
-        sclp_print("Using virtio-scsi.\n");
-        ret = virtio_scsi_setup(vdev);
-        break;
-    default:
-        panic("\n! No IPL device available !\n");
-    }
-
-    return ret;
+    return 0;
 }
diff --git a/pc-bios/s390-ccw/virtio-scsi.c b/pc-bios/s390-ccw/virtio-scsi.c
index 2c8d0f30970..3b7069270ce 100644
--- a/pc-bios/s390-ccw/virtio-scsi.c
+++ b/pc-bios/s390-ccw/virtio-scsi.c
@@ -329,7 +329,7 @@ static void scsi_parse_capacity_report(void *data,
     }
 }
 
-int virtio_scsi_setup(VDev *vdev)
+static int virtio_scsi_setup(VDev *vdev)
 {
     int retry_test_unit_ready = 3;
     uint8_t data[256];
@@ -430,3 +430,20 @@ int virtio_scsi_setup(VDev *vdev)
 
     return 0;
 }
+
+int virtio_scsi_setup_device(SubChannelId schid)
+{
+    VDev *vdev = virtio_get_device();
+
+    vdev->schid = schid;
+    virtio_setup_ccw(vdev);
+
+    IPL_assert(vdev->config.scsi.sense_size == VIRTIO_SCSI_SENSE_SIZE,
+               "Config: sense size mismatch");
+    IPL_assert(vdev->config.scsi.cdb_size == VIRTIO_SCSI_CDB_SIZE,
+               "Config: CDB size mismatch");
+
+    sclp_print("Using virtio-scsi.\n");
+
+    return virtio_scsi_setup(vdev);
+}
diff --git a/pc-bios/s390-ccw/virtio-scsi.h b/pc-bios/s390-ccw/virtio-scsi.h
index 4b14c2c2f90..e6b6cd4815b 100644
--- a/pc-bios/s390-ccw/virtio-scsi.h
+++ b/pc-bios/s390-ccw/virtio-scsi.h
@@ -67,8 +67,8 @@ static inline bool virtio_scsi_response_ok(const VirtioScsiCmdResp *r)
         return r->response == VIRTIO_SCSI_S_OK && r->status == CDB_STATUS_GOOD;
 }
 
-int virtio_scsi_setup(VDev *vdev);
 int virtio_scsi_read_many(VDev *vdev,
                           ulong sector, void *load_addr, int sec_num);
+int virtio_scsi_setup_device(SubChannelId schid);
 
 #endif /* VIRTIO_SCSI_H */
-- 
Gitee


From edd07ddd9420f208ae71c2dbe92adab637c3162b Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 8 Jul 2022 12:29:50 +0200
Subject: [PATCH 146/407] pc-bios/s390-ccw/virtio-blkdev: Request the right
 feature bits

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 198: pc-bios/s390-ccw: Fix boot from disks with 4k sectors that do not have the typical DASD geometry
RH-Commit: [9/9] f04835423d648b04f2187ef9890f2d1689e2b57e
RH-Bugzilla: 2098076
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>

Bugzilla: http://bugzilla.redhat.com/2098076

commit 9125a314cca4a1838b09305a87d8efb98f80ab67
Author: Thomas Huth <thuth@redhat.com>
Date:   Mon Jul 4 13:19:01 2022 +0200

    pc-bios/s390-ccw/virtio-blkdev: Request the right feature bits

    The virtio-blk code uses the block size and geometry fields in the
    config area. According to the virtio-spec, these have to be negotiated
    with the right feature bits during initialization, otherwise they
    might not be available. QEMU is so far very forgiving and always
    provides them, but we should not rely on this behavior, so let's
    better request them properly via the VIRTIO_BLK_F_GEOMETRY and
    VIRTIO_BLK_F_BLK_SIZE feature bits.

    Message-Id: <20220704111903.62400-11-thuth@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>
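
Why requesting the bits matters can be shown with a small standalone
sketch. The helper names negotiate() and effective_block_size() are
hypothetical and not part of the bios, and the 512-byte fallback is an
assumption made for this example only. The idea is that a config field
should be trusted only when its feature bit survived negotiation.

```c
#include <stdint.h>

#define VIRTIO_BLK_F_GEOMETRY   (1u << 4)
#define VIRTIO_BLK_F_BLK_SIZE   (1u << 6)

/*
 * The negotiated set is the intersection of what the device offers and
 * what the driver requests.
 */
uint32_t negotiate(uint32_t device_features, uint32_t driver_features)
{
    return device_features & driver_features;
}

/*
 * Hypothetical policy: use blk_size from the config area only if
 * VIRTIO_BLK_F_BLK_SIZE was negotiated, otherwise assume 512 bytes.
 */
uint32_t effective_block_size(uint32_t negotiated, uint32_t config_blk_size)
{
    return (negotiated & VIRTIO_BLK_F_BLK_SIZE) ? config_blk_size : 512;
}
```

A driver that requests no feature bits at all would end up with an empty
negotiated set, so a strict device would be free to leave both fields
unpopulated.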

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 pc-bios/s390-ccw/virtio-blkdev.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/pc-bios/s390-ccw/virtio-blkdev.c b/pc-bios/s390-ccw/virtio-blkdev.c
index c175b66a479..8271c472968 100644
--- a/pc-bios/s390-ccw/virtio-blkdev.c
+++ b/pc-bios/s390-ccw/virtio-blkdev.c
@@ -13,6 +13,9 @@
 #include "virtio.h"
 #include "virtio-scsi.h"
 
+#define VIRTIO_BLK_F_GEOMETRY   (1 << 4)
+#define VIRTIO_BLK_F_BLK_SIZE   (1 << 6)
+
 static int virtio_blk_read_many(VDev *vdev, ulong sector, void *load_addr,
                                 int sec_num)
 {
@@ -223,6 +226,7 @@ int virtio_blk_setup_device(SubChannelId schid)
 {
     VDev *vdev = virtio_get_device();
 
+    vdev->guest_features[0] = VIRTIO_BLK_F_GEOMETRY | VIRTIO_BLK_F_BLK_SIZE;
     vdev->schid = schid;
     virtio_setup_ccw(vdev);
 
-- 
Gitee


From c1b032b81d0c77a52aed7323dc9ca07b62a01289 Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Thu, 9 Jun 2022 17:47:11 +0100
Subject: [PATCH 147/407] linux-aio: fix unbalanced plugged counter in
 laio_io_unplug()

RH-Author: Stefan Hajnoczi <stefanha@redhat.com>
RH-MergeRequest: 199: linux-aio: fix unbalanced plugged counter in laio_io_unplug()
RH-Commit: [1/2] f518df755090289905898a36922992288688e338
RH-Bugzilla: 2105410
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Hanna Reitz <hreitz@redhat.com>
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>

Every laio_io_plug() call has a matching laio_io_unplug() call. There is
a plugged counter that tracks the number of levels of plugging and
allows for nesting.

The plugged counter must reflect the balance between laio_io_plug() and
laio_io_unplug() calls accurately. Otherwise I/O stalls occur since
io_submit(2) calls are skipped while plugged.
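
A reduced standalone sketch of how the counter can become unbalanced:
when the decrement sits in the right-hand operand of "||", it is skipped
whenever the left-hand operand is already true. The struct and function
names below are hypothetical simplifications (the "blocked" flag is left
out), not the real LinuxAioState code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the I/O queue state. */
struct queue {
    int plugged;     /* nesting depth of plug calls */
    int in_queue;    /* requests currently queued */
    int max_batch;   /* batch limit that forces a submit */
    bool pending;    /* true if there are requests waiting */
    int submits;     /* number of submit calls made */
};

/* Old shape: the decrement only runs if the batch check is false. */
void unplug_old(struct queue *q)
{
    assert(q->plugged);
    if (q->in_queue >= q->max_batch ||
        (--q->plugged == 0 && q->pending)) {
        q->submits++;
    }
}

/* Fixed shape: decrement unconditionally, then test the counter. */
void unplug_fixed(struct queue *q)
{
    assert(q->plugged);
    q->plugged--;

    if (q->in_queue >= q->max_batch ||
        (!q->plugged && q->pending)) {
        q->submits++;
    }
}
```

With plugged == 1 and a full batch, unplug_old() submits but leaves
plugged at 1, so later requests are treated as still plugged, while
unplug_fixed() submits and returns the counter to 0.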

Reported-by: Nikolay Tenev <nt@storpool.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20220609164712.1539045-2-stefanha@redhat.com
Cc: Stefano Garzarella <sgarzare@redhat.com>
Fixes: 68d7946648 ("linux-aio: add `dev_max_batch` parameter to laio_io_unplug()")
[Stefano Garzarella suggested adding a Fixes tag.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit f387cac5af030a58ac5a0dacf64cab5e5a4fe5c7)
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/linux-aio.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index f53ae72e21f..77f17ad5965 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -360,8 +360,10 @@ void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s,
                     uint64_t dev_max_batch)
 {
     assert(s->io_q.plugged);
+    s->io_q.plugged--;
+
     if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) ||
-        (--s->io_q.plugged == 0 &&
+        (!s->io_q.plugged &&
          !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending))) {
         ioq_submit(s);
     }
-- 
Gitee


From ce144ba8c7a885f13ee5249bfbd5ca17c4ce9484 Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Thu, 9 Jun 2022 17:47:12 +0100
Subject: [PATCH 148/407] linux-aio: explain why max batch is checked in
 laio_io_unplug()

RH-Author: Stefan Hajnoczi <stefanha@redhat.com>
RH-MergeRequest: 199: linux-aio: fix unbalanced plugged counter in laio_io_unplug()
RH-Commit: [2/2] 8617870ed70e3a57269f06eeb242d0fab79a66fb
RH-Bugzilla: 2105410
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Hanna Reitz <hreitz@redhat.com>
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>

It may not be obvious why laio_io_unplug() checks max batch. I discussed
this with Stefano and have added a comment summarizing the reason.

Cc: Stefano Garzarella <sgarzare@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20220609164712.1539045-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 99b969fbe105117f5af6060d3afef40ca39cc9c1)
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/linux-aio.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 77f17ad5965..85650c4222c 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -362,6 +362,12 @@ void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s,
     assert(s->io_q.plugged);
     s->io_q.plugged--;
 
+    /*
+     * Why max batch checking is performed here:
+     * Another BDS may have queued requests with a higher dev_max_batch and
+     * therefore in_queue could now exceed our dev_max_batch. Re-check the max
+     * batch so we can honor our device's dev_max_batch.
+     */
     if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) ||
         (!s->io_q.plugged &&
          !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending))) {
-- 
Gitee


From 1b8641271a63221680edb6add06bd042e28f2e4d Mon Sep 17 00:00:00 2001
From: David Edmondson <david.edmondson@oracle.com>
Date: Tue, 21 Dec 2021 09:34:40 +0000
Subject: [PATCH 149/407] migration: Introduce ram_transferred_add()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 201: Zero-copy-send fixes + improvements
RH-Commit: [1/8] a6545760b0de13d533f6164be0545a6720bb42c7
RH-Bugzilla: 2110203
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>

Replace direct manipulation of ram_counters.transferred with a
function.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
(cherry picked from commit 4c2d0f6dca24f3396ab0718ad3f9f53cc53004df)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/ram.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 3e208efca74..3e82c4ff461 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -391,6 +391,11 @@ uint64_t ram_bytes_remaining(void)
 
 MigrationStats ram_counters;
 
+static void ram_transferred_add(uint64_t bytes)
+{
+    ram_counters.transferred += bytes;
+}
+
 /* used by the search for pages to send */
 struct PageSearchStatus {
     /* Current block being searched */
@@ -772,7 +777,7 @@ static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
      * RAM_SAVE_FLAG_CONTINUE.
      */
     xbzrle_counters.bytes += bytes_xbzrle - 8;
-    ram_counters.transferred += bytes_xbzrle;
+    ram_transferred_add(bytes_xbzrle);
 
     return 1;
 }
@@ -1203,7 +1208,7 @@ static int save_zero_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
 
     if (len) {
         ram_counters.duplicate++;
-        ram_counters.transferred += len;
+        ram_transferred_add(len);
         return 1;
     }
     return -1;
@@ -1239,7 +1244,7 @@ static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
     }
 
     if (bytes_xmit) {
-        ram_counters.transferred += bytes_xmit;
+        ram_transferred_add(bytes_xmit);
         *pages = 1;
     }
 
@@ -1270,8 +1275,8 @@ static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
 static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
                             uint8_t *buf, bool async)
 {
-    ram_counters.transferred += save_page_header(rs, rs->f, block,
-                                                 offset | RAM_SAVE_FLAG_PAGE);
+    ram_transferred_add(save_page_header(rs, rs->f, block,
+                                         offset | RAM_SAVE_FLAG_PAGE));
     if (async) {
         qemu_put_buffer_async(rs->f, buf, TARGET_PAGE_SIZE,
                               migrate_release_ram() &
@@ -1279,7 +1284,7 @@ static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
     } else {
         qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
     }
-    ram_counters.transferred += TARGET_PAGE_SIZE;
+    ram_transferred_add(TARGET_PAGE_SIZE);
     ram_counters.normal++;
     return 1;
 }
@@ -1378,7 +1383,7 @@ exit:
 static void
 update_compress_thread_counts(const CompressParam *param, int bytes_xmit)
 {
-    ram_counters.transferred += bytes_xmit;
+    ram_transferred_add(bytes_xmit);
 
     if (param->zero_page) {
         ram_counters.duplicate++;
@@ -2303,7 +2308,7 @@ void acct_update_position(QEMUFile *f, size_t size, bool zero)
         ram_counters.duplicate += pages;
     } else {
         ram_counters.normal += pages;
-        ram_counters.transferred += size;
+        ram_transferred_add(size);
         qemu_update_position(f, size);
     }
 }
@@ -3147,7 +3152,7 @@ out:
 
         qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
         qemu_fflush(f);
-        ram_counters.transferred += 8;
+        ram_transferred_add(8);
 
         ret = qemu_file_get_error(f);
     }
-- 
Gitee


From ff2511292806be80c4964a55937d3c80185a180c Mon Sep 17 00:00:00 2001
From: David Edmondson <david.edmondson@oracle.com>
Date: Tue, 21 Dec 2021 09:34:41 +0000
Subject: [PATCH 150/407] migration: Tally pre-copy, downtime and post-copy
 bytes independently
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 201: Zero-copy-send fixes + improvements
RH-Commit: [2/8] 7d1bf37a3d93da88da6525d70fc1fce1abb92b83
RH-Bugzilla: 2110203
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>

Provide information on the number of bytes copied in the pre-copy,
downtime and post-copy phases of migration.
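
The classification can be sketched standalone as follows. The two boolean
parameters are hypothetical stand-ins for runstate_is_running() and
migration_in_postcopy(), and struct stats is a reduced model of
MigrationStats, so this is an illustration rather than the QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reduced, hypothetical model of the relevant MigrationStats fields. */
struct stats {
    uint64_t precopy_bytes;
    uint64_t downtime_bytes;
    uint64_t postcopy_bytes;
    uint64_t transferred;
};

/*
 * Attribute bytes to exactly one phase, then add them to the total.
 * The running check comes first, as in ram_transferred_add().
 */
void transferred_add(struct stats *s, uint64_t bytes,
                     bool vm_running, bool in_postcopy)
{
    if (vm_running) {
        s->precopy_bytes += bytes;
    } else if (in_postcopy) {
        s->postcopy_bytes += bytes;
    } else {
        s->downtime_bytes += bytes;
    }
    s->transferred += bytes;
}
```

Because every call lands in exactly one branch, the three per-phase
counters always sum to the total in this model.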

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
(cherry picked from commit ae6806688016711bb9ec7541266d76ab511c5e3b)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/migration.c |  3 +++
 migration/ram.c       |  7 +++++++
 monitor/hmp-cmds.c    | 12 ++++++++++++
 qapi/migration.json   | 13 ++++++++++++-
 4 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 616c3ff32ee..e100b30f00a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1016,6 +1016,9 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
     info->ram->page_size = page_size;
     info->ram->multifd_bytes = ram_counters.multifd_bytes;
     info->ram->pages_per_second = s->pages_per_second;
+    info->ram->precopy_bytes = ram_counters.precopy_bytes;
+    info->ram->downtime_bytes = ram_counters.downtime_bytes;
+    info->ram->postcopy_bytes = ram_counters.postcopy_bytes;
 
     if (migrate_use_xbzrle()) {
         info->has_xbzrle_cache = true;
diff --git a/migration/ram.c b/migration/ram.c
index 3e82c4ff461..e7173da2177 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -393,6 +393,13 @@ MigrationStats ram_counters;
 
 static void ram_transferred_add(uint64_t bytes)
 {
+    if (runstate_is_running()) {
+        ram_counters.precopy_bytes += bytes;
+    } else if (migration_in_postcopy()) {
+        ram_counters.postcopy_bytes += bytes;
+    } else {
+        ram_counters.downtime_bytes += bytes;
+    }
     ram_counters.transferred += bytes;
 }
 
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 2669156b284..8c384dc1b21 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -293,6 +293,18 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
             monitor_printf(mon, "postcopy request count: %" PRIu64 "\n",
                            info->ram->postcopy_requests);
         }
+        if (info->ram->precopy_bytes) {
+            monitor_printf(mon, "precopy ram: %" PRIu64 " kbytes\n",
+                           info->ram->precopy_bytes >> 10);
+        }
+        if (info->ram->downtime_bytes) {
+            monitor_printf(mon, "downtime ram: %" PRIu64 " kbytes\n",
+                           info->ram->downtime_bytes >> 10);
+        }
+        if (info->ram->postcopy_bytes) {
+            monitor_printf(mon, "postcopy ram: %" PRIu64 " kbytes\n",
+                           info->ram->postcopy_bytes >> 10);
+        }
     }
 
     if (info->has_disk) {
diff --git a/qapi/migration.json b/qapi/migration.json
index fe70a0c4b2b..c8ec260ab01 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -46,6 +46,15 @@
 # @pages-per-second: the number of memory pages transferred per second
 #                    (Since 4.0)
 #
+# @precopy-bytes: The number of bytes sent in the pre-copy phase
+#                 (since 7.0).
+#
+# @downtime-bytes: The number of bytes sent while the guest is paused
+#                  (since 7.0).
+#
+# @postcopy-bytes: The number of bytes sent during the post-copy phase
+#                  (since 7.0).
+#
 # Since: 0.14
 ##
 { 'struct': 'MigrationStats',
@@ -54,7 +63,9 @@
            'normal-bytes': 'int', 'dirty-pages-rate' : 'int',
            'mbps' : 'number', 'dirty-sync-count' : 'int',
            'postcopy-requests' : 'int', 'page-size' : 'int',
-           'multifd-bytes' : 'uint64', 'pages-per-second' : 'uint64' } }
+           'multifd-bytes' : 'uint64', 'pages-per-second' : 'uint64',
+           'precopy-bytes' : 'uint64', 'downtime-bytes' : 'uint64',
+           'postcopy-bytes' : 'uint64' } }
 
 ##
 # @XBZRLECacheStats:
-- 
Gitee


From edb1c66b617c167a8f5a4305209d0c5ecc7461ce Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Mon, 11 Jul 2022 18:11:11 -0300
Subject: [PATCH 151/407] QIOChannelSocket: Fix zero-copy flush returning code
 1 when nothing sent
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 201: Zero-copy-send fixes + improvements
RH-Commit: [3/8] 1ad707702fa26cd4d0fa1870c21f5f26ae93ff97
RH-Bugzilla: 2110203
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>

If flush is called when no buffer was sent with MSG_ZEROCOPY, it currently
returns 1. This return code should be used only when Linux fails to use
MSG_ZEROCOPY on a lot of sendmsg() calls.

Fix this by returning early from flush if no sendmsg(...,MSG_ZEROCOPY)
was attempted.
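
The return convention described above can be sketched as follows. This is
an illustration only: struct zc_counters, zc_flush_result() and the
copied_fallbacks argument are hypothetical stand-ins, not the QIOChannel
API, and the "every pending send fell back" rule is taken from the
follow-up patch's description of the flush.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-channel zero-copy counters. */
struct zc_counters {
    uint64_t queued; /* sendmsg(..., MSG_ZEROCOPY) calls issued so far */
    uint64_t sent;   /* calls already confirmed through the error queue */
};

/*
 * Returns 0 when nothing is pending, or when at least one pending send
 * really used zero-copy. Returns 1 only when every pending send fell
 * back to copying. copied_fallbacks is an assumed input: the number of
 * pending sends the kernel reported as copied.
 */
static int zc_flush_result(struct zc_counters *c, uint64_t copied_fallbacks)
{
    uint64_t pending;

    assert(c->sent <= c->queued);

    /* Early return: no zero-copy send was attempted since the last flush. */
    if (c->queued == c->sent) {
        return 0;
    }

    pending = c->queued - c->sent;
    assert(copied_fallbacks <= pending);

    c->sent = c->queued; /* everything pending is now accounted for */
    return copied_fallbacks == pending ? 1 : 0;
}
```

Without the early return, an idle channel would be indistinguishable from
one where all sends fell back to copying.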

Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX")
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220711211112.18951-2-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 927f93e099c4f9184e60a1bc61624ac2d04d0223)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 io/channel-socket.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index df858da9247..cf0d67c51b0 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -717,12 +717,18 @@ static int qio_channel_socket_flush(QIOChannel *ioc,
     struct cmsghdr *cm;
     char control[CMSG_SPACE(sizeof(*serr))];
     int received;
-    int ret = 1;
+    int ret;
+
+    if (sioc->zero_copy_queued == sioc->zero_copy_sent) {
+        return 0;
+    }
 
     msg.msg_control = control;
     msg.msg_controllen = sizeof(control);
     memset(control, 0, sizeof(control));
 
+    ret = 1;
+
     while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
         received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
         if (received < 0) {
-- 
Gitee


From 4173deb83dd08fe32ab3080ead70dffab2451085 Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Mon, 11 Jul 2022 18:11:12 -0300
Subject: [PATCH 152/407] Add dirty-sync-missed-zero-copy migration stat
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 201: Zero-copy-send fixes + improvements
RH-Commit: [4/8] 56cce61cf95aafc8dafae7531b43c166084abfec
RH-Bugzilla: 2110203
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220711211112.18951-3-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit cf20c897338067ab4b70a4596fdccaf90c7e29a1)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/migration.c | 2 ++
 monitor/hmp-cmds.c    | 5 +++++
 qapi/migration.json   | 7 ++++++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index e100b30f00a..952a26c5c2e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1012,6 +1012,8 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
     info->ram->normal_bytes = ram_counters.normal * page_size;
     info->ram->mbps = s->mbps;
     info->ram->dirty_sync_count = ram_counters.dirty_sync_count;
+    info->ram->dirty_sync_missed_zero_copy =
+            ram_counters.dirty_sync_missed_zero_copy;
     info->ram->postcopy_requests = ram_counters.postcopy_requests;
     info->ram->page_size = page_size;
     info->ram->multifd_bytes = ram_counters.multifd_bytes;
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 8c384dc1b21..f7216ab5d02 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -305,6 +305,11 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
             monitor_printf(mon, "postcopy ram: %" PRIu64 " kbytes\n",
                            info->ram->postcopy_bytes >> 10);
         }
+        if (info->ram->dirty_sync_missed_zero_copy) {
+            monitor_printf(mon,
+                           "Zero-copy-send fallbacks happened: %" PRIu64 " times\n",
+                           info->ram->dirty_sync_missed_zero_copy);
+        }
     }
 
     if (info->has_disk) {
diff --git a/qapi/migration.json b/qapi/migration.json
index c8ec260ab01..94bc5c69dbb 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -55,6 +55,10 @@
 # @postcopy-bytes: The number of bytes sent during the post-copy phase
 #                  (since 7.0).
 #
+# @dirty-sync-missed-zero-copy: Number of times dirty RAM synchronization could
+#                               not avoid copying dirty pages. This is between
+#                               0 and @dirty-sync-count * @multifd-channels.
+#                               (since 7.1)
 # Since: 0.14
 ##
 { 'struct': 'MigrationStats',
@@ -65,7 +69,8 @@
            'postcopy-requests' : 'int', 'page-size' : 'int',
            'multifd-bytes' : 'uint64', 'pages-per-second' : 'uint64',
            'precopy-bytes' : 'uint64', 'downtime-bytes' : 'uint64',
-           'postcopy-bytes' : 'uint64' } }
+           'postcopy-bytes' : 'uint64',
+           'dirty-sync-missed-zero-copy' : 'uint64' } }
 
 ##
 # @XBZRLECacheStats:
-- 
Gitee


From 0438e6686da60d464a230b048958836349aae637 Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Mon, 11 Jul 2022 18:11:13 -0300
Subject: [PATCH 153/407] migration/multifd: Report to user when zerocopy not
 working
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 201: Zero-copy-send fixes + improvements
RH-Commit: [5/8] 0b2e23b7f8ae72936e11369cd44ba474ef3b9e8c
RH-Bugzilla: 2110203
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>

Some errors, like the lack of Scatter-Gather support by the network
interface (NETIF_F_SG), may cause sendmsg(...,MSG_ZEROCOPY) to fail to use
zero-copy, which causes it to fall back to the default copying mechanism.

After each full dirty-bitmap scan there should be a zero-copy flush
happening, which checks each of the previous calls to
sendmsg(...,MSG_ZEROCOPY) for errors. If all of them failed to use
zero-copy, then increment the dirty_sync_missed_zero_copy migration stat to
let the user know about it.
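
A minimal sketch of the accounting, assuming one flush result per multifd
channel. The names missed_zero_copy and sync_all_channels() are
hypothetical; only the meaning of the return codes (negative for error,
1 for "all sends fell back to copying") comes from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the dirty_sync_missed_zero_copy stat. */
static uint64_t missed_zero_copy;

/*
 * Walk the per-channel flush results of one sync point.
 * A negative result aborts the sync, 1 bumps the stat, 0 is a clean flush.
 */
static int sync_all_channels(const int *flush_ret, size_t nchannels)
{
    size_t i;

    assert(flush_ret != NULL || nchannels == 0);

    for (i = 0; i < nchannels; i++) {
        if (flush_ret[i] < 0) {
            return -1;
        }
        if (flush_ret[i] == 1) {
            missed_zero_copy++;
        }
    }
    return 0;
}
```

Since each channel can add at most one per sync, the counter stays within
the dirty-sync-count * multifd-channels bound documented for the stat.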

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220711211112.18951-4-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit d59c40cc483729f2e67c80e58df769ad19976fe9)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd.c | 2 ++
 migration/ram.c     | 5 +++++
 migration/ram.h     | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/migration/multifd.c b/migration/multifd.c
index 90ab4c4346d..7c16523e6ba 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -631,6 +631,8 @@ int multifd_send_sync_main(QEMUFile *f)
             if (ret < 0) {
                 error_report_err(err);
                 return -1;
+            } else if (ret == 1) {
+                dirty_sync_missed_zero_copy();
             }
         }
     }
diff --git a/migration/ram.c b/migration/ram.c
index e7173da2177..93cdb456ac8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -403,6 +403,11 @@ static void ram_transferred_add(uint64_t bytes)
     ram_counters.transferred += bytes;
 }
 
+void dirty_sync_missed_zero_copy(void)
+{
+    ram_counters.dirty_sync_missed_zero_copy++;
+}
+
 /* used by the search for pages to send */
 struct PageSearchStatus {
     /* Current block being searched */
diff --git a/migration/ram.h b/migration/ram.h
index c515396a9a3..69c3ccb26a6 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -88,4 +88,6 @@ void ram_write_tracking_prepare(void);
 int ram_write_tracking_start(void);
 void ram_write_tracking_stop(void);
 
+void dirty_sync_missed_zero_copy(void);
+
 #endif
-- 
Gitee


From 0a18e5997debc89655f3dcb22e514f1385e1390b Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Tue, 19 Jul 2022 09:23:45 -0300
Subject: [PATCH 154/407] migration: Avoid false-positive on non-supported
 scenarios for zero-copy-send
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 201: Zero-copy-send fixes + improvements
RH-Commit: [6/8] f23195f3ab4f6eba0463f38e5971ccaccdac2cfd
RH-Bugzilla: 2110203
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>

Migration with zero-copy-send currently has its limitations, as it can't
be used with TLS nor any kind of compression. In such scenarios, it should
output errors during parameter / capability setting.

But currently there are some ways of setting these unsupported scenarios
without printing the error message:

1) For the 'compression' capability, it works by enabling it together with
zero-copy-send. This happens because the validity test for zero-copy uses
the helper function migrate_use_compression(), which checks for compression
presence in s->enabled_capabilities[MIGRATION_CAPABILITY_COMPRESS].

The point here is: the validity test happens before the capability gets
enabled. If all of them get enabled together, this test will not return
an error.

In order to fix that, replace migrate_use_compression() by directly testing
the cap_list parameter of migrate_caps_check().

2) For features enabled by parameters such as TLS & 'multifd_compression',
there was also a possibility of setting non-supported scenarios: setting
zero-copy-send first, then setting the unsupported parameter.

In order to fix that, also add a check for parameters conflicting with
zero-copy-send on migrate_params_check().

3) XBZRLE is also a compression capability, so it makes sense to also add
it to the list of capabilities which are not supported with zero-copy-send.
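
The ordering problem in (1) can be sketched as below. The enum values and
function names are hypothetical simplifications; the point is only the
difference between testing the currently enabled state and testing the
requested capability list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced capability set for illustration. */
enum { CAP_XBZRLE, CAP_COMPRESS, CAP_MULTIFD, CAP_ZERO_COPY_SEND, CAP_COUNT };

/* Old behaviour: compression is looked up in the state enabled so far. */
static bool caps_ok_stale(const bool *cap_list, const bool *enabled_now)
{
    assert(cap_list != NULL && enabled_now != NULL);
    if (cap_list[CAP_ZERO_COPY_SEND] &&
        (!cap_list[CAP_MULTIFD] || enabled_now[CAP_COMPRESS])) {
        return false;
    }
    return true;
}

/* Fixed behaviour: every conflicting capability is read from cap_list. */
static bool caps_ok(const bool *cap_list)
{
    assert(cap_list != NULL);
    if (cap_list[CAP_ZERO_COPY_SEND] &&
        (!cap_list[CAP_MULTIFD] ||
         cap_list[CAP_COMPRESS] ||
         cap_list[CAP_XBZRLE])) {
        return false;
    }
    return true;
}
```

When compression and zero-copy-send are requested in the same call, the
stale variant accepts the request because nothing is enabled yet, while
the cap_list variant rejects it.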

Fixes: 1abaec9a1b2c ("migration: Change zero_copy_send from migration parameter to migration capability")
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220719122345.253713-1-leobras@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 90eb69e4f1a16b388d0483543bf6bfc69a9966e4)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/migration.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 952a26c5c2e..35b3197eff5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1260,7 +1260,9 @@ static bool migrate_caps_check(bool *cap_list,
 #ifdef CONFIG_LINUX
     if (cap_list[MIGRATION_CAPABILITY_ZERO_COPY_SEND] &&
         (!cap_list[MIGRATION_CAPABILITY_MULTIFD] ||
-         migrate_use_compression() ||
+         cap_list[MIGRATION_CAPABILITY_COMPRESS] ||
+         cap_list[MIGRATION_CAPABILITY_XBZRLE] ||
+         migrate_multifd_compression() ||
          migrate_use_tls())) {
         error_setg(errp,
                    "Zero copy only available for non-compressed non-TLS multifd migration");
@@ -1497,6 +1499,17 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
         error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: ");
         return false;
     }
+
+#ifdef CONFIG_LINUX
+    if (migrate_use_zero_copy_send() &&
+        ((params->has_multifd_compression && params->multifd_compression) ||
+         (params->has_tls_creds && params->tls_creds && *params->tls_creds))) {
+        error_setg(errp,
+                   "Zero copy only available for non-compressed non-TLS multifd migration");
+        return false;
+    }
+#endif
+
     return true;
 }
 
-- 
Gitee


From 3d1ace267c29c840c61d186812fc46557382454d Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Mon, 25 Jul 2022 22:02:35 -0300
Subject: [PATCH 155/407] migration: add remaining params->has_* = true in
 migration_instance_init()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 201: Zero-copy-send fixes + improvements
RH-Commit: [7/8] fb622e5b88e14eb859d4903d9c088ba6ca63fc81
RH-Bugzilla: 2110203
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>

Some of the params->has_* = true assignments are missing in
migration_instance_init(). This causes migrate_params_check() to skip some
tests, allowing some unsupported scenarios.

Fix this by adding all missing params->has_* = true in
migration_instance_init().
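
To illustrate why a missing has_* flag matters, here is a reduced sketch.
struct params and params_ok() are hypothetical; they only mirror the
"has_X && X" guard style used by the parameter checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced parameter struct for illustration. */
struct params {
    bool has_tls_creds;    /* must be true for tls_creds to be examined */
    const char *tls_creds;
};

/* Rejects TLS credentials when zero-copy-send is in use. */
static bool params_ok(const struct params *p, bool zero_copy_send)
{
    assert(p != NULL);
    if (zero_copy_send &&
        p->has_tls_creds && p->tls_creds && *p->tls_creds) {
        return false;
    }
    return true;
}
```

With has_tls_creds left false, the guard short-circuits and a non-empty
tls_creds string is never looked at, so the conflict goes unnoticed.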

Fixes: 69ef1f36b0 ("migration: define 'tls-creds' and 'tls-hostname' migration parameters")
Fixes: 1d58872a91 ("migration: do not wait for free thread")
Fixes: d2f1d29b95 ("migration: add support for a "tls-authz" migration parameter")
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220726010235.342927-1-leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit df67aa3e61e2c83459da7d815962d9706f1528fc)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/migration.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 35b3197eff5..51e6726dac6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -4334,6 +4334,7 @@ static void migration_instance_init(Object *obj)
     /* Set has_* up only for parameter checks */
     params->has_compress_level = true;
     params->has_compress_threads = true;
+    params->has_compress_wait_thread = true;
     params->has_decompress_threads = true;
     params->has_throttle_trigger_threshold = true;
     params->has_cpu_throttle_initial = true;
@@ -4354,6 +4355,9 @@ static void migration_instance_init(Object *obj)
     params->has_announce_max = true;
     params->has_announce_rounds = true;
     params->has_announce_step = true;
+    params->has_tls_creds = true;
+    params->has_tls_hostname = true;
+    params->has_tls_authz = true;
 
     qemu_sem_init(&ms->postcopy_pause_sem, 0);
     qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
-- 
Gitee


From c74460c3d63214854bba7f2768abe9f7c0179e85 Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Thu, 4 Aug 2022 04:10:43 -0300
Subject: [PATCH 156/407] QIOChannelSocket: Add support for MSG_ZEROCOPY + IPV6
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 201: Zero-copy-send fixes + improvements
RH-Commit: [8/8] 6e26ee7c9ebaedb07623313cb0678816867751dd
RH-Bugzilla: 2110203
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>

For using MSG_ZEROCOPY, there are two steps:
1 - io_writev() the packet, which enqueues the packet for sending, and
2 - io_flush(), which gets confirmation that all packets got correctly sent

Currently, if MSG_ZEROCOPY is used to send packets over IPV6, no error will
be reported in (1), but it will fail the first time (2) happens.

This happens because (2) currently checks for cmsg_level & cmsg_type
associated with IPV4 only, before reporting any error.

Add checks for cmsg_level & cmsg_type associated with IPV6, and thus enable
support for MSG_ZEROCOPY + IPV6.
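
As an illustration of which error-queue messages are expected, the sketch
below accepts the IPv4 and the IPv6 (level, type) pairs. It assumes Linux
headers, the helper name is hypothetical, and the strict pairwise match is
an assumption about the intent rather than the exact condition used in
the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>

static_assert(SOL_IP != SOL_IPV6, "IPv4 and IPv6 levels must differ");

/*
 * Hypothetical helper: true when a control message taken from the socket
 * error queue carries an IPv4 or an IPv6 extended error.
 */
static bool is_recverr_cmsg(int cmsg_level, int cmsg_type)
{
    bool ipv4 = cmsg_level == SOL_IP && cmsg_type == IP_RECVERR;
    bool ipv6 = cmsg_level == SOL_IPV6 && cmsg_type == IPV6_RECVERR;

    return ipv4 || ipv6;
}
```

A caller would treat any message for which this returns false as a wrong
cmsg in the error queue.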

Fixes: 2bc58ffc29 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX")
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
(cherry picked from commit 5258a7e2c0677d16e9e1d06845f60171adf0b290)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 io/channel-socket.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index cf0d67c51b0..6010ad7017e 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -747,8 +747,8 @@ static int qio_channel_socket_flush(QIOChannel *ioc,
         }
 
         cm = CMSG_FIRSTHDR(&msg);
-        if (cm->cmsg_level != SOL_IP &&
-            cm->cmsg_type != IP_RECVERR) {
+        if (cm->cmsg_level != SOL_IP   && cm->cmsg_type != IP_RECVERR &&
+            cm->cmsg_level != SOL_IPV6 && cm->cmsg_type != IPV6_RECVERR) {
             error_setg_errno(errp, EPROTOTYPE,
                              "Wrong cmsg in errqueue");
             return -1;
-- 
Gitee


From 77c68cd274544ef55fd45dd109fba3123ac3e2fe Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 5 Aug 2022 11:42:14 +0200
Subject: [PATCH 157/407] pc-bios/s390-ccw: Fix booting with logical block size
 < physical block size

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 207: pc-bios/s390-ccw: Fix booting with logical block size < physical block size
RH-Commit: [1/1] ab22832592e0a48277bf7aca1b941a1be79aeab6
RH-Bugzilla: 2112296
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Claudio Imbrenda <None>

For accessing single blocks during boot, it's the logical block size that
matters. (Physical block sizes are rather interesting e.g. for creating
file systems with the correct alignment for speed reasons etc.).
So the s390-ccw bios has to use the logical block size for calculating
sector numbers during the boot phase; the "physical_block_exp" shift
value must not be taken into account. This change fixes the boot process
when the guest has been installed on a disk where the logical block size
differs from the physical one, e.g. if the guest has been installed
like this:

 qemu-system-s390x -nographic -accel kvm -m 2G \
  -drive if=none,id=d1,file=fedora.iso,format=raw,media=cdrom \
  -device virtio-scsi -device scsi-cd,drive=d1 \
  -drive if=none,id=d2,file=test.qcow2,format=qcow2 \
  -device virtio-blk,drive=d2,physical_block_size=4096,logical_block_size=512

Linux correctly uses the logical block size of 512 for the installation,
but without this patch the s390-ccw bios tries to boot from a disk with a
block size of 4096. (It used to work by accident in the past due to the
virtio_assume_scsi() hack that enforced 512 byte sectors on all
virtio-block disks, but that hack has been removed in commit
5447de2619050a0a4d to fix other scenarios.)
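
The arithmetic behind the failure can be sketched as follows. The helper
names are hypothetical, and the exponent of 3 is an assumption derived
from the example above (4096 = 512 << 3).

```c
#include <assert.h>
#include <stdint.h>

/* Old calculation: logical size scaled up by the physical block exponent. */
static uint32_t block_size_old(uint32_t blk_size, uint8_t physical_block_exp)
{
    assert(physical_block_exp < 16); /* keep the shift well defined */
    return blk_size << physical_block_exp;
}

/* Fixed calculation: only the logical block size matters for booting. */
static uint32_t block_size_fixed(uint32_t blk_size)
{
    return blk_size;
}

/* Sector number that holds a given byte offset, for a given block size. */
static uint64_t sector_for_offset(uint64_t byte_offset, uint32_t block_size)
{
    assert(block_size > 0);
    return byte_offset / block_size;
}
```

For a boot record at byte offset 1 MiB, the fixed calculation yields
sector 2048, whereas the old one yields sector 256 and so addresses a
different location than the one the installer wrote.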

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2112296
Message-Id: <20220805094214.285223-1-thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 393296de19650e1400ca265914cfdeb313725363)
---
 pc-bios/s390-ccw/virtio-blkdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pc-bios/s390-ccw/virtio-blkdev.c b/pc-bios/s390-ccw/virtio-blkdev.c
index 8271c472968..794f99b42c2 100644
--- a/pc-bios/s390-ccw/virtio-blkdev.c
+++ b/pc-bios/s390-ccw/virtio-blkdev.c
@@ -173,7 +173,7 @@ int virtio_get_block_size(void)
 
     switch (vdev->senseid.cu_model) {
     case VIRTIO_ID_BLOCK:
-        return vdev->config.blk.blk_size << vdev->config.blk.physical_block_exp;
+        return vdev->config.blk.blk_size;
     case VIRTIO_ID_SCSI:
         return vdev->scsi_block_size;
     }
-- 
Gitee


From 4cb60988b1c00eb3ebfcae505eb0e64c4b8d8eab Mon Sep 17 00:00:00 2001
From: Kevin Wolf <kwolf@redhat.com>
Date: Mon, 22 Aug 2022 14:53:20 +0200
Subject: [PATCH 158/407] scsi-generic: Fix emulated block limits VPD page

RH-Author: Kevin Wolf <kwolf@redhat.com>
RH-MergeRequest: 212: scsi-generic: Fix emulated block limits VPD page
RH-Commit: [1/1] d3ba6b2e03039043716ddc6b7d4a424d92249081
RH-Bugzilla: 2120279
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Hanna Reitz <hreitz@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

Commits 01ef8185b80 and 24b36e9813e updated the way that the maximum
transfer length is calculated for patching the block limits VPD page in an
INQUIRY response.

The same updates also need to be made for the case where the host device
does not support the block limits VPD page at all and we emulate the
whole page.

Without this fix, on host block devices a maximum transfer length of
(INT_MAX - sector_size) bytes is advertised to the guest, resulting in
I/O errors when a request that exceeds the host limits is made by the
guest. (Prior to commit 24b36e9813e, this code path would use the
max_transfer value from the host instead of INT_MAX, but still miss the
fix from 01ef8185b80 where max_transfer is also capped to max_iov
host pages, so it would be less wrong, but still wrong.)
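
The capped calculation can be sketched in isolation like this. The
function names and the plain integer parameters are hypothetical; in the
patch the inputs come from the block backend and the host page size.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum of two values, where a zero value means "no limit". */
static uint64_t min_non_zero(uint64_t a, uint64_t b)
{
    if (a == 0) {
        return b;
    }
    if (b == 0) {
        return a;
    }
    return a < b ? a : b;
}

/*
 * Maximum transfer length in blocks: the byte limit is capped to
 * max_iov host pages, then converted to blocks.
 */
static uint64_t max_transfer_blocks(uint64_t max_transfer_bytes,
                                    uint32_t max_iov,
                                    uint64_t host_page_size,
                                    uint32_t blocksize)
{
    assert(max_transfer_bytes);
    assert(blocksize > 0);

    max_transfer_bytes = min_non_zero(max_transfer_bytes,
                                      (uint64_t)max_iov * host_page_size);
    return max_transfer_bytes / blocksize;
}
```

For example, a 2 MiB byte limit with 128 iovecs of 4 KiB pages is capped
to 512 KiB, i.e. 1024 blocks of 512 bytes.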

Cc: qemu-stable@nongnu.org
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2096251
Fixes: 01ef8185b809af9d287e1a03a3f9d8ea8231118a
Fixes: 24b36e9813ec15da7db62e3b3621730710c5f020
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220822125320.48257-1-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 51e15194b0a091e5c40aab2eb234a1d36c5c58ee)

Resolved conflict: qemu_real_host_page_size() is a getter function in
current upstream, but still just a public global variable downstream.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/scsi/scsi-generic.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 0306ccc7b1e..37428998396 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -147,6 +147,18 @@ static int execute_command(BlockBackend *blk,
     return 0;
 }
 
+static uint64_t calculate_max_transfer(SCSIDevice *s)
+{
+    uint64_t max_transfer = blk_get_max_hw_transfer(s->conf.blk);
+    uint32_t max_iov = blk_get_max_hw_iov(s->conf.blk);
+
+    assert(max_transfer);
+    max_transfer = MIN_NON_ZERO(max_transfer,
+                                max_iov * qemu_real_host_page_size);
+
+    return max_transfer / s->blocksize;
+}
+
 static int scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s, int len)
 {
     uint8_t page, page_idx;
@@ -179,12 +191,7 @@ static int scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s, int len)
         (r->req.cmd.buf[1] & 0x01)) {
         page = r->req.cmd.buf[2];
         if (page == 0xb0) {
-            uint64_t max_transfer = blk_get_max_hw_transfer(s->conf.blk);
-            uint32_t max_iov = blk_get_max_hw_iov(s->conf.blk);
-
-            assert(max_transfer);
-            max_transfer = MIN_NON_ZERO(max_transfer, max_iov * qemu_real_host_page_size)
-                / s->blocksize;
+            uint64_t max_transfer = calculate_max_transfer(s);
             stl_be_p(&r->buf[8], max_transfer);
             /* Also take care of the opt xfer len. */
             stl_be_p(&r->buf[12],
@@ -230,7 +237,7 @@ static int scsi_generic_emulate_block_limits(SCSIGenericReq *r, SCSIDevice *s)
     uint8_t buf[64];
 
     SCSIBlockLimits bl = {
-        .max_io_sectors = blk_get_max_transfer(s->conf.blk) / s->blocksize
+        .max_io_sectors = calculate_max_transfer(s),
     };
 
     memset(r->buf, 0, r->buflen);
-- 
Gitee


From ebe19f54f4b7d50d86248ccbf601ac51083ee97e Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Wed, 10 Aug 2022 14:57:18 +0200
Subject: [PATCH 159/407] backends/hostmem: Fix support of memory-backend-memfd
 in qemu_maxrampagesize()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <None>
RH-MergeRequest: 221: backends/hostmem: Fix support of memory-backend-memfd in qemu_maxrampagesize()
RH-Bugzilla: 2117149
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [1/1] b5a1047750af32c0a261b8385ea0e819eb16681a

It is currently not possible to use "memory-backend-memfd" on s390x
with hugepages enabled. This problem is caused by qemu_maxrampagesize()
not taking memory-backend-memfd objects into account, so the code
in s390_memory_init() fails to enable the huge page support there via
s390_set_max_pagesize(). Fix it by generalizing the code, so that it
looks at qemu_ram_pagesize(memdev->mr.ram_block) instead of re-trying
to get the information from the filesystem.

Suggested-by: David Hildenbrand <david@redhat.com>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2116496
Message-Id: <20220810125720.3849835-2-thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 8be934b70e923104da883b990dee18f02552d40e)
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2117149
[clg: Resolved conflict on qemu_real_host_page_size() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 backends/hostmem.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 4c05862ed5a..0c4654ea85d 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -305,22 +305,12 @@ bool host_memory_backend_is_mapped(HostMemoryBackend *backend)
     return backend->is_mapped;
 }
 
-#ifdef __linux__
 size_t host_memory_backend_pagesize(HostMemoryBackend *memdev)
 {
-    Object *obj = OBJECT(memdev);
-    char *path = object_property_get_str(obj, "mem-path", NULL);
-    size_t pagesize = qemu_mempath_getpagesize(path);
-
-    g_free(path);
+    size_t pagesize = qemu_ram_pagesize(memdev->mr.ram_block);
+    g_assert(pagesize >= qemu_real_host_page_size);
     return pagesize;
 }
-#else
-size_t host_memory_backend_pagesize(HostMemoryBackend *memdev)
-{
-    return qemu_real_host_page_size;
-}
-#endif
 
 static void
 host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
-- 
Gitee


From 12145990e43a1f3954955e5910962d8d400cd8c6 Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuznets@redhat.com>
Date: Thu, 18 Aug 2022 17:01:12 +0200
Subject: [PATCH 160/407] i386: reset KVM nested state upon CPU reset

RH-Author: Miroslav Rezanina <mrezanin@redhat.com>
RH-MergeRequest: 219: Synchronize qemu-6.2.0-20.el8.1 build from RHEL 8.7 to RHEL 8.8
RH-Bugzilla: 2125271
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [1/2] de4db7bceb6baaf69aec8b0ae9aa8887aa869e15

Make sure env->nested_state is cleaned up when a vCPU is reset. It may
be stale after an incoming migration, and kvm_arch_put_registers() may
then end up failing or putting the vCPU in a weird state.
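
The reset pattern can be sketched as below. struct nested_state and its
fields are a hypothetical, fixed-size simplification of the KVM nested
state buffer (the real code clears the whole variable-sized buffer); the
idea shown is "clear everything, keep the size, re-apply the defaults".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { FORMAT_VMX = 0, FORMAT_SVM = 1 }; /* hypothetical format tags */

/* Hypothetical, fixed-size stand-in for the nested state buffer. */
struct nested_state {
    uint16_t flags;
    uint16_t format;
    uint32_t size;      /* allocation size, must survive the reset */
    uint64_t vmxon_pa;
    uint64_t vmcs12_pa;
};

static void reset_nested_state(struct nested_state *ns, bool has_vmx)
{
    uint32_t size;

    if (!ns) {
        return; /* nested virtualization not in use */
    }

    size = ns->size;
    assert(size >= sizeof(*ns));

    memset(ns, 0, sizeof(*ns)); /* drop any stale state */
    ns->size = size;

    if (has_vmx) {
        ns->format = FORMAT_VMX;
        ns->vmxon_pa = UINT64_MAX;  /* "no VMXON region" marker */
        ns->vmcs12_pa = UINT64_MAX; /* "no current VMCS" marker */
    } else {
        ns->format = FORMAT_SVM;
    }
}
```

Calling the same helper from both the init and the reset path keeps the
two from drifting apart.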

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220818150113.479917-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 3cafdb67504a34a0305260f0c86a73d5a3fb000b)
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 target/i386/kvm/kvm.c | 37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index bd439e56ad6..81d729dc408 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1615,6 +1615,30 @@ static void kvm_init_xsave(CPUX86State *env)
            env->xsave_buf_len);
 }
 
+static void kvm_init_nested_state(CPUX86State *env)
+{
+    struct kvm_vmx_nested_state_hdr *vmx_hdr;
+    uint32_t size;
+
+    if (!env->nested_state) {
+        return;
+    }
+
+    size = env->nested_state->size;
+
+    memset(env->nested_state, 0, size);
+    env->nested_state->size = size;
+
+    if (cpu_has_vmx(env)) {
+        env->nested_state->format = KVM_STATE_NESTED_FORMAT_VMX;
+        vmx_hdr = &env->nested_state->hdr.vmx;
+        vmx_hdr->vmxon_pa = -1ull;
+        vmx_hdr->vmcs12_pa = -1ull;
+    } else if (cpu_has_svm(env)) {
+        env->nested_state->format = KVM_STATE_NESTED_FORMAT_SVM;
+    }
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
@@ -2042,19 +2066,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
         assert(max_nested_state_len >= offsetof(struct kvm_nested_state, data));
 
         if (cpu_has_vmx(env) || cpu_has_svm(env)) {
-            struct kvm_vmx_nested_state_hdr *vmx_hdr;
-
             env->nested_state = g_malloc0(max_nested_state_len);
             env->nested_state->size = max_nested_state_len;
 
-            if (cpu_has_vmx(env)) {
-                env->nested_state->format = KVM_STATE_NESTED_FORMAT_VMX;
-                vmx_hdr = &env->nested_state->hdr.vmx;
-                vmx_hdr->vmxon_pa = -1ull;
-                vmx_hdr->vmcs12_pa = -1ull;
-            } else {
-                env->nested_state->format = KVM_STATE_NESTED_FORMAT_SVM;
-            }
+            kvm_init_nested_state(env);
         }
     }
 
@@ -2117,6 +2132,8 @@ void kvm_arch_reset_vcpu(X86CPU *cpu)
     /* enabled by default */
     env->poll_control_msr = 1;
 
+    kvm_init_nested_state(env);
+
     sev_es_set_reset_vector(CPU(cpu));
 }
 
-- 
Gitee


From fd0d85a42587bee285f33a9a889fcd24aa700413 Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuznets@redhat.com>
Date: Thu, 18 Aug 2022 17:01:13 +0200
Subject: [PATCH 161/407] i386: do kvm_put_msr_feature_control() first thing
 when vCPU is reset

RH-Author: Miroslav Rezanina <mrezanin@redhat.com>
RH-MergeRequest: 219: Synchronize qemu-6.2.0-20.el8.1 build from RHEL 8.7 to RHEL 8.8
RH-Bugzilla: 2125271
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [2/2] 08e1e67db96801e4a35aa6b60a93b2c2f1641220

kvm_put_sregs2() fails to reset 'locked' CR4/CR0 bits upon vCPU reset when
it is in VMX root operation. Do kvm_put_msr_feature_control() before
kvm_put_sregs2() to (possibly) kick vCPU out of VMX root operation. It also
seems logical to do kvm_put_msr_feature_control() before
kvm_put_nested_state() and not after it, especially when 'real' nested
state is set.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220818150113.479917-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 45ed68a1a3a19754ade954d75a3c9d13ff560e5c)
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 target/i386/kvm/kvm.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 81d729dc408..a06221d3e53 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4255,6 +4255,18 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 
     assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
 
+    /*
+     * Put MSR_IA32_FEATURE_CONTROL first, this ensures the VM gets out of VMX
+     * root operation upon vCPU reset. kvm_put_msr_feature_control() should also
+     * preceed kvm_put_nested_state() when 'real' nested state is set.
+     */
+    if (level >= KVM_PUT_RESET_STATE) {
+        ret = kvm_put_msr_feature_control(x86_cpu);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
     /* must be before kvm_put_nested_state so that EFER.SVME is set */
     ret = kvm_put_sregs(x86_cpu);
     if (ret < 0) {
@@ -4266,11 +4278,6 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
         if (ret < 0) {
             return ret;
         }
-
-        ret = kvm_put_msr_feature_control(x86_cpu);
-        if (ret < 0) {
-            return ret;
-        }
     }
 
     if (level == KVM_PUT_FULL_STATE) {
-- 
Gitee


From 1a7bd518e4d280fddea5bce531879834a2891dea Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 29 Jul 2022 16:55:34 +0200
Subject: [PATCH 162/407] redhat: Update linux-headers/linux/kvm.h to v5.18-rc6
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <None>
RH-MergeRequest: 220: s390x: Fix skey test in kvm_unit_test
RH-Bugzilla: 2124757
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [1/2] e514a00305cb0caab9d3acc0efb325853daa6d51

Upstream Status: RHEL-only
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2124757

Based on upstream commit e4082063e47e9731dbeb1c26174c17f6038f577f
("linux-headers: Update to v5.18-rc6"), but limited to the file
linux-headers/linux/kvm.h (the other changes, related to the VFIO
renaming, might break things).

Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 71516db15469a02600932a5c1f0d4a9626a91193)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 linux-headers/linux/kvm.h | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index d232feaae97..0d05d02ee4f 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -445,7 +445,11 @@ struct kvm_run {
 #define KVM_SYSTEM_EVENT_RESET          2
 #define KVM_SYSTEM_EVENT_CRASH          3
 			__u32 type;
-			__u64 flags;
+			__u32 ndata;
+			union {
+				__u64 flags;
+				__u64 data[16];
+			};
 		} system_event;
 		/* KVM_EXIT_S390_STSI */
 		struct {
@@ -562,9 +566,12 @@ struct kvm_s390_mem_op {
 	__u32 op;		/* type of operation */
 	__u64 buf;		/* buffer in userspace */
 	union {
-		__u8 ar;	/* the access register number */
+		struct {
+			__u8 ar;	/* the access register number */
+			__u8 key;	/* access key, ignored if flag unset */
+		};
 		__u32 sida_offset; /* offset into the sida */
-		__u8 reserved[32]; /* should be set to 0 */
+		__u8 reserved[32]; /* ignored */
 	};
 };
 /* types for kvm_s390_mem_op->op */
@@ -572,9 +579,12 @@ struct kvm_s390_mem_op {
 #define KVM_S390_MEMOP_LOGICAL_WRITE	1
 #define KVM_S390_MEMOP_SIDA_READ	2
 #define KVM_S390_MEMOP_SIDA_WRITE	3
+#define KVM_S390_MEMOP_ABSOLUTE_READ	4
+#define KVM_S390_MEMOP_ABSOLUTE_WRITE	5
 /* flags for kvm_s390_mem_op->flags */
 #define KVM_S390_MEMOP_F_CHECK_ONLY		(1ULL << 0)
 #define KVM_S390_MEMOP_F_INJECT_EXCEPTION	(1ULL << 1)
+#define KVM_S390_MEMOP_F_SKEY_PROTECTION	(1ULL << 2)
 
 /* for KVM_INTERRUPT */
 struct kvm_interrupt {
@@ -1134,6 +1144,12 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_GPA_BITS 207
 #define KVM_CAP_XSAVE2 208
 #define KVM_CAP_SYS_ATTRIBUTES 209
+#define KVM_CAP_PPC_AIL_MODE_3 210
+#define KVM_CAP_S390_MEM_OP_EXTENSION 211
+#define KVM_CAP_PMU_CAPABILITY 212
+#define KVM_CAP_DISABLE_QUIRKS2 213
+/* #define KVM_CAP_VM_TSC_CONTROL 214 */
+#define KVM_CAP_SYSTEM_EVENT_DATA 215
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1624,9 +1640,6 @@ struct kvm_enc_region {
 #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
 #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
 
-/* Available with KVM_CAP_XSAVE2 */
-#define KVM_GET_XSAVE2		  _IOR(KVMIO,  0xcf, struct kvm_xsave)
-
 struct kvm_s390_pv_sec_parm {
 	__u64 origin;
 	__u64 length;
@@ -1973,6 +1986,8 @@ struct kvm_dirty_gfn {
 #define KVM_BUS_LOCK_DETECTION_OFF             (1 << 0)
 #define KVM_BUS_LOCK_DETECTION_EXIT            (1 << 1)
 
+#define KVM_PMU_CAP_DISABLE                    (1 << 0)
+
 /**
  * struct kvm_stats_header - Header of per vm/vcpu binary statistics data.
  * @flags: Some extra information for header, always 0 for now.
-- 
Gitee


From 74ffa70f962ce4cd5df16be3c2debab2477c5eda Mon Sep 17 00:00:00 2001
From: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Date: Fri, 6 May 2022 17:39:56 +0200
Subject: [PATCH 163/407] target/s390x: kvm: Honor storage keys during
 emulation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <None>
RH-MergeRequest: 220: s390x: Fix skey test in kvm_unit_test
RH-Bugzilla: 2124757
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [2/2] 980dbb4eba8d2f1da7cf4113230d0a6483cffc4f

Storage key controlled protection is currently not honored when
emulating instructions.
If available, enable key protection for the MEM_OP ioctl, thereby
enabling it for the s390_cpu_virt_mem_* functions, when using kvm.
As a result, the emulation of the following instructions honors storage
keys:

* CLP
	The Synch I/O CLP command would need special handling in order
	to support storage keys, but is currently not supported.
* CHSC
	Performing commands asynchronously would require special
	handling, but commands are currently always synchronous.
* STSI
* TSCH
	Must (and does) not change channel if terminated due to
	protection.
* MSCH
	Suppressed on protection; this works because the instruction only
	fetches its operand.
* SSCH
	Suppressed on protection; this works because the instruction only
	fetches its operand.
* STSCH
* STCRW
	Suppressed on protection, this works because no partial store is
	possible, because the operand cannot span multiple pages.
* PCISTB
* MPCIFC
* STPCIFC
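
As an illustration only, the sketch below shows the two pieces involved:
extracting the 4-bit access key from the PSW mask and adding the key
protection flag when the extension is present. The sketch_* names are
hypothetical, and the PSW key position (bits 52..55 of the mask) is an
assumption made for the example; the flag values follow the kvm.h update
earlier in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for this sketch: the access key sits in bits 52..55 of the mask. */
#define SKETCH_PSW_MASK_KEY  0x00F0000000000000ULL
#define SKETCH_PSW_SHIFT_KEY 52

/* Flag bits as defined for kvm_s390_mem_op->flags. */
#define SKETCH_MEMOP_F_CHECK_ONLY      (1ULL << 0)
#define SKETCH_MEMOP_F_SKEY_PROTECTION (1ULL << 2)

/* Hypothetical helper: pull the 4-bit access key out of a PSW mask. */
uint8_t sketch_psw_key(uint64_t psw_mask)
{
    uint64_t key = (psw_mask & SKETCH_PSW_MASK_KEY) >> SKETCH_PSW_SHIFT_KEY;

    assert(key <= 0xf);
    return (uint8_t)key;
}

/*
 * Hypothetical helper: build the MEM_OP flags. Key protection is only
 * requested when the host advertised the extension, so older kernels
 * never see a flag they do not know.
 */
uint64_t sketch_memop_flags(bool check_only, bool skey_supported)
{
    uint64_t flags = 0;

    if (check_only) {
        flags |= SKETCH_MEMOP_F_CHECK_ONLY;
    }
    if (skey_supported) {
        flags |= SKETCH_MEMOP_F_SKEY_PROTECTION;
    }
    return flags;
}
```

The key is always filled in, matching the "ignored if flag unset" comment
on the key field; only the flag depends on the capability.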

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2124757

Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Message-Id: <20220506153956.2217601-3-scgl@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 54354861d21b69ec0781f43e67b8d4f6edad7e3f)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/kvm/kvm.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index c52434985ba..ba04997da1e 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -152,12 +152,15 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 static int cap_sync_regs;
 static int cap_async_pf;
 static int cap_mem_op;
+static int cap_mem_op_extension;
 static int cap_s390_irq;
 static int cap_ri;
 static int cap_hpage_1m;
 static int cap_vcpu_resets;
 static int cap_protected;
 
+static bool mem_op_storage_key_support;
+
 static int active_cmma;
 
 static int kvm_s390_query_mem_limit(uint64_t *memory_limit)
@@ -355,6 +358,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     cap_sync_regs = kvm_check_extension(s, KVM_CAP_SYNC_REGS);
     cap_async_pf = kvm_check_extension(s, KVM_CAP_ASYNC_PF);
     cap_mem_op = kvm_check_extension(s, KVM_CAP_S390_MEM_OP);
+    cap_mem_op_extension = kvm_check_extension(s, KVM_CAP_S390_MEM_OP_EXTENSION);
+    mem_op_storage_key_support = cap_mem_op_extension > 0;
     cap_s390_irq = kvm_check_extension(s, KVM_CAP_S390_INJECT_IRQ);
     cap_vcpu_resets = kvm_check_extension(s, KVM_CAP_S390_VCPU_RESETS);
     cap_protected = kvm_check_extension(s, KVM_CAP_S390_PROTECTED);
@@ -843,6 +848,7 @@ int kvm_s390_mem_op(S390CPU *cpu, vaddr addr, uint8_t ar, void *hostbuf,
                        : KVM_S390_MEMOP_LOGICAL_READ,
         .buf = (uint64_t)hostbuf,
         .ar = ar,
+        .key = (cpu->env.psw.mask & PSW_MASK_KEY) >> PSW_SHIFT_KEY,
     };
     int ret;
 
@@ -852,6 +858,9 @@ int kvm_s390_mem_op(S390CPU *cpu, vaddr addr, uint8_t ar, void *hostbuf,
     if (!hostbuf) {
         mem_op.flags |= KVM_S390_MEMOP_F_CHECK_ONLY;
     }
+    if (mem_op_storage_key_support) {
+        mem_op.flags |= KVM_S390_MEMOP_F_SKEY_PROTECTION;
+    }
 
     ret = kvm_vcpu_ioctl(CPU(cpu), KVM_S390_MEM_OP, &mem_op);
     if (ret < 0) {
-- 
Gitee


From 26925d88712d380c0fc2c053b740571de49387b0 Mon Sep 17 00:00:00 2001
From: Yusuke Okada <okada.yusuke@jp.fujitsu.com>
Date: Thu, 18 Aug 2022 14:46:19 -0400
Subject: [PATCH 164/407] virtiofsd: use g_date_time_get_microsecond to get
 subsecond
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-MergeRequest: 222: virtiofsd: use g_date_time_get_microsecond to get subsecond
RH-Bugzilla: 2018885
RH-Acked-by: Vivek Goyal <None>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Acked-by: Sergio Lopez <slp@redhat.com>
RH-Commit: [1/1] da8795576acc7029044a801ef42676d66471a577

The "%f" specifier in g_date_time_format() is only available in glib
2.65.2 or later. With an older glib, the function returns NULL and the
timestamp is displayed as "(null)".

For backward compatibility, g_date_time_get_microsecond() should be used
to retrieve the subsecond part.

In this patch, g_date_time_format() leaves the subsecond field as a
literal "%06d", and the following snprintf() fills it in with the value
from g_date_time_get_microsecond().
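
The same two-stage idea can be sketched without glib. The example below
is not the virtiofsd code: it swaps in strftime() for
g_date_time_format(), omits the time zone field, and
sketch_format_timestamp() is a hypothetical name. It only shows how an
escaped "%%06d" survives the first pass and is filled in by the second.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/*
 * Stage 1 expands the calendar fields and emits a literal "%06d"
 * (written as "%%06d"); stage 2 fills that placeholder with the
 * microseconds. Returns the length written, or -1 on failure.
 */
int sketch_format_timestamp(const struct tm *tm, int usec,
                            char *out, size_t outlen)
{
    char tmpl[64];
    int n;

    assert(usec >= 0 && usec < 1000000);

    if (strftime(tmpl, sizeof(tmpl), "%Y-%m-%d %H:%M:%S.%%06d", tm) == 0) {
        return -1;
    }

    /* tmpl now looks like "2022-08-18 14:46:19.%06d" */
    n = snprintf(out, outlen, tmpl, usec);
    if (n < 0 || (size_t)n >= outlen) {
        return -1;
    }
    return n;
}
```

Note that the first pass produces the format string for the second, so
this only works while the expanded fields cannot themselves contain a
'%' character.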

Signed-off-by: Yusuke Okada <okada.yusuke@jp.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 20220818184618.2205172-1-yokada.996@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit f16d15c9276bd8f501f861c39cbd4adc812d0c1d)
---
 tools/virtiofsd/passthrough_ll.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index b3d0674f6d2..523d8fbe1ed 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3791,6 +3791,7 @@ static void setup_nofile_rlimit(unsigned long rlimit_nofile)
 static void log_func(enum fuse_log_level level, const char *fmt, va_list ap)
 {
     g_autofree char *localfmt = NULL;
+    char buf[64];
 
     if (current_log_level < level) {
         return;
@@ -3803,9 +3804,11 @@ static void log_func(enum fuse_log_level level, const char *fmt, va_list ap)
                                        fmt);
         } else {
             g_autoptr(GDateTime) now = g_date_time_new_now_utc();
-            g_autofree char *nowstr = g_date_time_format(now, "%Y-%m-%d %H:%M:%S.%f%z");
+            g_autofree char *nowstr = g_date_time_format(now,
+                                       "%Y-%m-%d %H:%M:%S.%%06d%z");
+            snprintf(buf, 64, nowstr, g_date_time_get_microsecond(now));
             localfmt = g_strdup_printf("[%s] [ID: %08ld] %s",
-                                       nowstr, syscall(__NR_gettid), fmt);
+                                       buf, syscall(__NR_gettid), fmt);
         }
         fmt = localfmt;
     }
-- 
Gitee


From 85b58c09d0468f361ff8028c05ec4ad4d8cdd74f Mon Sep 17 00:00:00 2001
From: Halil Pasic <pasic@linux.ibm.com>
Date: Mon, 7 Feb 2022 12:28:57 +0100
Subject: [PATCH 165/407] virtio: fix the condition for iommu_platform not
 supported
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-MergeRequest: 224: virtiofs on s390 secure execution
RH-Bugzilla: 2116302
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Cédric Le Goater <None>
RH-Commit: [1/2] d7edc7e3905a04644c9ff44b0d36122c72068e08

The commit 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
unsupported") claims to fail the device hotplug when iommu_platform
is requested, but not supported by the (vhost) device. At first
glance the condition for detecting that situation looks perfect, but
because of a certain peculiarity of virtio_platform it ain't.

In fact the aforementioned commit introduces a regression. It breaks
virtio-fs support for Secure Execution, and most likely also for AMD SEV
or any other confidential guest scenario that relies on encrypted guest
memory. The same also applies to any other vhost device that does not
support _F_ACCESS_PLATFORM.

The peculiarity is that iommu_platform and _F_ACCESS_PLATFORM conflate
"device can not access all of the guest RAM" and "iova != gpa, thus
device needs to translate iova".

Confidential guest technologies currently rely on the device/hypervisor
offering _F_ACCESS_PLATFORM, so that, after the feature has been
negotiated, the guest grants access to the portions of memory the
device needs to see. So for confidential guests, generally,
_F_ACCESS_PLATFORM is about the restricted access to memory, but not
about the addresses used being something other than guest physical
addresses.

This is the very reason for which commit f7ef7e6e3b ("vhost: correctly
turn on VIRTIO_F_IOMMU_PLATFORM") fences _F_ACCESS_PLATFORM from the
vhost device that does not need it, because on the vhost interface it
only means "I/O address translation is needed".

This patch takes inspiration from f7ef7e6e3b ("vhost: correctly turn on
VIRTIO_F_IOMMU_PLATFORM"), and uses the same condition for detecting the
situation when _F_ACCESS_PLATFORM is requested, but no I/O translation
by the device, and thus no device capability is needed. In this
situation claiming that the device does not support iommu_platform=on
is counter-productive. So let us stop doing that!

Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reported-by: Jakob Naucke <Jakob.Naucke@ibm.com>
Fixes: 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
unsupported")
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-stable@nongnu.org

Message-Id: <20220207112857.607829-1-pasic@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit e65902a913bf31ba79a83a3bd3621108b85cf645)
---
 hw/virtio/virtio-bus.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index d23db98c568..0f69d1c7426 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -48,6 +48,7 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
     VirtioBusClass *klass = VIRTIO_BUS_GET_CLASS(bus);
     VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
     bool has_iommu = virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
+    bool vdev_has_iommu;
     Error *local_err = NULL;
 
     DPRINTF("%s: plug device.\n", qbus->name);
@@ -69,11 +70,6 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
         return;
     }
 
-    if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
-        error_setg(errp, "iommu_platform=true is not supported by the device");
-        return;
-    }
-
     if (klass->device_plugged != NULL) {
         klass->device_plugged(qbus->parent, &local_err);
     }
@@ -82,9 +78,15 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
         return;
     }
 
+    vdev_has_iommu = virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
     if (klass->get_dma_as != NULL && has_iommu) {
         virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
         vdev->dma_as = klass->get_dma_as(qbus->parent);
+        if (!vdev_has_iommu && vdev->dma_as != &address_space_memory) {
+            error_setg(errp,
+                       "iommu_platform=true is not supported by the device");
+            return;
+        }
     } else {
         vdev->dma_as = &address_space_memory;
     }
-- 
Gitee


From 9c7a74d403ce88afba5aafe1b8008ad73342012e Mon Sep 17 00:00:00 2001
From: Halil Pasic <pasic@linux.ibm.com>
Date: Mon, 7 Mar 2022 12:29:39 +0100
Subject: [PATCH 166/407] virtio: fix feature negotiation for ACCESS_PLATFORM
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-MergeRequest: 224: virtiofs on s390 secure execution
RH-Bugzilla: 2116302
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Cédric Le Goater <None>
RH-Commit: [2/2] 264d3bdbbde985f16ed6f5a1786547c25fb8cc04

Unlike most virtio features, ACCESS_PLATFORM is considered mandatory by
QEMU, i.e. the driver must accept it if offered by the device. The
virtio specification says that the driver SHOULD accept the
ACCESS_PLATFORM feature if offered, and that the device MAY fail to
operate if ACCESS_PLATFORM was offered but not negotiated.

While a SHOULD ain't exactly a MUST, we are certainly allowed to fail
the device when the driver fences ACCESS_PLATFORM. With commit
2943b53f68 ("virtio: force VIRTIO_F_IOMMU_PLATFORM") we already made the
decision to do so whenever the get_dma_as() callback is implemented (by
the bus), which in practice means for the entirety of virtio-pci.

That means, if the device needs to translate I/O addresses, then
ACCESS_PLATFORM is mandatory. The aforementioned commit tells us in the
commit message that this is for security reasons. More precisely, if we
were to allow a less than trusted driver (e.g. a user-space driver, or
a nested guest) to make the device bypass the IOMMU by not negotiating
ACCESS_PLATFORM, then the guest kernel would have no ability to
control/police (by programming the IOMMU) what pieces of guest memory
the driver may manipulate using the device. Which would break security
assumptions within the guest.

If ACCESS_PLATFORM is offered not because we want the device to utilize
an IOMMU and do address translation, but because the device does not
have access to the entire guest RAM, and needs the driver to grant
access to the bits it needs access to (e.g. confidential guest support),
we still require the guest to have the corresponding logic and to accept
ACCESS_PLATFORM. If the driver does not accept ACCESS_PLATFORM, then
things are bound to go wrong, and we may see failures much less graceful
than failing the device because the driver didn't negotiate
ACCESS_PLATFORM.

So let us make ACCESS_PLATFORM mandatory for the driver regardless
of whether the get_dma_as() callback is implemented or not.
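
The resulting decision logic can be modelled as a small pure function.
This is a simplified sketch, not the QEMU code: the sketch_* types and
names are hypothetical, and the bus callback is reduced to a flag plus
the address space it would return.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_dma_as { SKETCH_AS_MEMORY, SKETCH_AS_IOMMU };

struct sketch_plug_result {
    bool ok;               /* false: plugging the device fails */
    bool offer_iommu;      /* IOMMU_PLATFORM is offered to the driver */
    enum sketch_dma_as as; /* address space the device uses for DMA */
};

struct sketch_plug_result
sketch_device_plugged(bool iommu_requested, bool vdev_has_iommu,
                      bool bus_has_get_dma_as, enum sketch_dma_as bus_as)
{
    /* Default: plain guest memory, feature not offered. */
    struct sketch_plug_result r = { true, false, SKETCH_AS_MEMORY };

    if (!iommu_requested) {
        return r;
    }

    /* Offered whenever requested, with or without a bus callback. */
    r.offer_iommu = true;

    if (bus_has_get_dma_as) {
        r.as = bus_as;
        /* Only real address translation needs device support. */
        if (!vdev_has_iommu && r.as != SKETCH_AS_MEMORY) {
            r.ok = false;
        }
    }

    /* Failure is only possible when the feature was requested. */
    assert(r.ok || r.offer_iommu);
    return r;
}
```

In this model a device without translation support still plugs
successfully as long as its DMA address space is plain guest memory,
which is the confidential guest case described above.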

Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: 2943b53f68 ("virtio: force VIRTIO_F_IOMMU_PLATFORM")

Message-Id: <20220307112939.2780117-1-pasic@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 06134e2bc35dc21543d4cbcf31f858c03d383442)
---
 hw/virtio/virtio-bus.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index 0f69d1c7426..d7ec023adf2 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -78,17 +78,23 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
         return;
     }
 
-    vdev_has_iommu = virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
-    if (klass->get_dma_as != NULL && has_iommu) {
+    vdev->dma_as = &address_space_memory;
+    if (has_iommu) {
+        vdev_has_iommu = virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
+        /*
+         * Present IOMMU_PLATFORM to the driver iff iommu_plattform=on and
+         * device operational. If the driver does not accept IOMMU_PLATFORM
+         * we fail the device.
+         */
         virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
-        vdev->dma_as = klass->get_dma_as(qbus->parent);
-        if (!vdev_has_iommu && vdev->dma_as != &address_space_memory) {
-            error_setg(errp,
+        if (klass->get_dma_as) {
+            vdev->dma_as = klass->get_dma_as(qbus->parent);
+            if (!vdev_has_iommu && vdev->dma_as != &address_space_memory) {
+                error_setg(errp,
                        "iommu_platform=true is not supported by the device");
-            return;
+                return;
+            }
         }
-    } else {
-        vdev->dma_as = &address_space_memory;
     }
 }
 
-- 
Gitee


From 9ad67fdc2941349a2562d8fcb7cb791aba810c71 Mon Sep 17 00:00:00 2001
From: Matthew Rosato <mjrosato@linux.ibm.com>
Date: Fri, 3 Dec 2021 09:27:03 -0500
Subject: [PATCH 167/407] s390x/pci: use a reserved ID for the default PCI
 group
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [1/41] ad3ed38dec95acf0da04d7669fe772d798d039fc

The current default PCI group being used can technically collide with a
real group ID passed from a hostdev.  Let's instead use a group ID that
comes from a special pool (0xF0-0xFF) that is architected to be reserved
for simulated devices.

Fixes: 28dc86a072 ("s390x/pci: use a PCI Group structure")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20211203142706.427279-2-mjrosato@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit b2892a2b9d45d25b909108ca633d19f9d8d673f5)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 include/hw/s390x/s390-pci-bus.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index aa891c178dd..2727e7bdefb 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -313,7 +313,7 @@ typedef struct ZpciFmb {
 } ZpciFmb;
 QEMU_BUILD_BUG_MSG(offsetof(ZpciFmb, fmt0) != 48, "padding in ZpciFmb");
 
-#define ZPCI_DEFAULT_FN_GRP 0x20
+#define ZPCI_DEFAULT_FN_GRP 0xFF
 typedef struct S390PCIGroup {
     ClpRspQueryPciGrp zpci_group;
     int id;
-- 
Gitee


From e71a40c4d5869d3708dbe5bf79ce897b4c89054b Mon Sep 17 00:00:00 2001
From: Matthew Rosato <mjrosato@linux.ibm.com>
Date: Fri, 3 Dec 2021 09:27:04 -0500
Subject: [PATCH 168/407] s390x/pci: don't use hard-coded dma range in reg_ioat
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [2/41] c7897321f9848ef8f115130832774bbcd6724f03

Instead use the values from clp info; they will either be the hard-coded
values or what came from the host driver via vfio.
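
A minimal sketch of the range check with per-function limits follows.
sketch_dma_range_ok() is a hypothetical name, and the sdma/edma values
used with it are whatever the caller obtained for the function; nothing
here is taken from a real device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Check a guest-supplied DMA window [pba, pal] against the limits
 * [sdma, edma] reported for this particular function.
 */
bool sketch_dma_range_ok(uint64_t pba, uint64_t pal,
                         uint64_t sdma, uint64_t edma)
{
    assert(sdma <= edma);

    pba &= ~0xfffULL; /* round the base down to a 4 KiB boundary */
    pal |= 0xfffULL;  /* extend the limit to the last byte of its page */

    /* Reject inverted windows and anything outside the limits. */
    return !(pba > pal || pba < sdma || pal > edma);
}
```

Passing the limits in as parameters is the point of the change: a
passthrough function with a narrower window than the defaults is then
checked against its own bounds.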

Fixes: 9670ee752727 ("s390x/pci: use a PCI Function structure")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20211203142706.427279-3-mjrosato@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit df7ce0a94d9283f0656b4bc0f21566973ff649a3)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/s390-pci-inst.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 1c8ad91175b..11b7f6bfa14 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -916,9 +916,10 @@ int pci_dereg_irqs(S390PCIBusDevice *pbdev)
     return 0;
 }
 
-static int reg_ioat(CPUS390XState *env, S390PCIIOMMU *iommu, ZpciFib fib,
+static int reg_ioat(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib,
                     uintptr_t ra)
 {
+    S390PCIIOMMU *iommu = pbdev->iommu;
     uint64_t pba = ldq_p(&fib.pba);
     uint64_t pal = ldq_p(&fib.pal);
     uint64_t g_iota = ldq_p(&fib.iota);
@@ -927,7 +928,7 @@ static int reg_ioat(CPUS390XState *env, S390PCIIOMMU *iommu, ZpciFib fib,
 
     pba &= ~0xfff;
     pal |= 0xfff;
-    if (pba > pal || pba < ZPCI_SDMA_ADDR || pal > ZPCI_EDMA_ADDR) {
+    if (pba > pal || pba < pbdev->zpci_fn.sdma || pal > pbdev->zpci_fn.edma) {
         s390_program_interrupt(env, PGM_OPERAND, ra);
         return -EINVAL;
     }
@@ -1125,7 +1126,7 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
         } else if (pbdev->iommu->enabled) {
             cc = ZPCI_PCI_LS_ERR;
             s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
-        } else if (reg_ioat(env, pbdev->iommu, fib, ra)) {
+        } else if (reg_ioat(env, pbdev, fib, ra)) {
             cc = ZPCI_PCI_LS_ERR;
             s390_set_status_code(env, r1, ZPCI_MOD_ST_INSUF_RES);
         }
@@ -1150,7 +1151,7 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
             s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
         } else {
             pci_dereg_ioat(pbdev->iommu);
-            if (reg_ioat(env, pbdev->iommu, fib, ra)) {
+            if (reg_ioat(env, pbdev, fib, ra)) {
                 cc = ZPCI_PCI_LS_ERR;
                 s390_set_status_code(env, r1, ZPCI_MOD_ST_INSUF_RES);
             }
-- 
Gitee


From 8effc77cf05a02ee0bb69ac937064a39bf81909e Mon Sep 17 00:00:00 2001
From: Matthew Rosato <mjrosato@linux.ibm.com>
Date: Fri, 3 Dec 2021 09:27:05 -0500
Subject: [PATCH 169/407] s390x/pci: use the passthrough measurement update
 interval
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [3/41] bc31ea731fe64e51522f1202e65528311397b919

We may have gotten a measurement update interval from the underlying host
via vfio. Use it to set the interval at which we update the function
measurement block.

Fixes: 28dc86a072 ("s390x/pci: use a PCI Group structure")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20211203142706.427279-4-mjrosato@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit cb6d6a3e6aa1226b67fd218953dcb3866c3a6845)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/s390-pci-inst.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 11b7f6bfa14..07bab85ce57 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -1046,7 +1046,7 @@ static void fmb_update(void *opaque)
                       sizeof(pbdev->fmb.last_update))) {
         return;
     }
-    timer_mod(pbdev->fmb_timer, t + DEFAULT_MUI);
+    timer_mod(pbdev->fmb_timer, t + pbdev->pci_group->zpci_group.mui);
 }
 
 int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
@@ -1204,7 +1204,8 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
         }
         pbdev->fmb_addr = fmb_addr;
         timer_mod(pbdev->fmb_timer,
-                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + DEFAULT_MUI);
+                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                                    pbdev->pci_group->zpci_group.mui);
         break;
     }
     default:
-- 
Gitee


From 1c16921572d1efc87fcb87567860dc00d6964113 Mon Sep 17 00:00:00 2001
From: Matthew Rosato <mjrosato@linux.ibm.com>
Date: Fri, 3 Dec 2021 09:27:06 -0500
Subject: [PATCH 170/407] s390x/pci: add supported DT information to clp
 response
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [4/41] 275668f6d38fbc1dfa2f1aa8f58b2c319de2657d

The DTSM is a mask that specifies which I/O Address Translation designation
types are supported.  Today QEMU only supports DT=1.
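
The sketch below illustrates two properties of this change under stated
assumptions. First, carving a one-byte field out of the start of an
eight-byte reserved area leaves the following fields where they were;
the structs are simplified, hypothetical stand-ins and are assumed to be
packed (via the GCC/Clang attribute). Second, it assumes that
designation type n maps to bit n counted from the most significant bit
of the mask, which is consistent with a mask of 0x40 meaning "DT=1
only".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified tail of the response before the change (hypothetical). */
struct sketch_grp_old {
    uint16_t maxstbl;
    uint16_t mui;
    uint64_t reserved3;
    uint64_t dasm;
} __attribute__((packed));

/* Same tail with the first reserved byte turned into the mask. */
struct sketch_grp_new {
    uint16_t maxstbl;
    uint16_t mui;
    uint8_t dtsm;
    uint8_t reserved3[7];
    uint64_t dasm;
} __attribute__((packed));

/* Splitting the reserved field must not move anything after it. */
_Static_assert(sizeof(struct sketch_grp_old) ==
               sizeof(struct sketch_grp_new), "size changed");
_Static_assert(offsetof(struct sketch_grp_old, dasm) ==
               offsetof(struct sketch_grp_new, dasm), "dasm moved");

/* Assumption: DT n is bit n of the mask, counted from the MSB. */
int sketch_dt_supported(uint8_t dtsm, unsigned dt)
{
    assert(dt < 8);
    return (dtsm & (0x80u >> dt)) != 0;
}
```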

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20211203142706.427279-5-mjrosato@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit ac6aa30ac47b2abaf142f76de46374da2a98f6e7)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/s390-pci-bus.c         | 1 +
 hw/s390x/s390-pci-inst.c        | 1 +
 hw/s390x/s390-pci-vfio.c        | 1 +
 include/hw/s390x/s390-pci-bus.h | 1 +
 include/hw/s390x/s390-pci-clp.h | 3 ++-
 5 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 1b51a728385..01b58ebc707 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -782,6 +782,7 @@ static void s390_pci_init_default_group(void)
     resgrp->i = 128;
     resgrp->maxstbl = 128;
     resgrp->version = 0;
+    resgrp->dtsm = ZPCI_DTSM;
 }
 
 static void set_pbdev_info(S390PCIBusDevice *pbdev)
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 07bab85ce57..6d400d41472 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -329,6 +329,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
         stw_p(&resgrp->i, group->zpci_group.i);
         stw_p(&resgrp->maxstbl, group->zpci_group.maxstbl);
         resgrp->version = group->zpci_group.version;
+        resgrp->dtsm = group->zpci_group.dtsm;
         stw_p(&resgrp->hdr.rsp, CLP_RC_OK);
         break;
     }
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 2a153fa8c9e..6f80a47e299 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -160,6 +160,7 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
         resgrp->i = cap->noi;
         resgrp->maxstbl = cap->maxstbl;
         resgrp->version = cap->version;
+        resgrp->dtsm = ZPCI_DTSM;
     }
 }
 
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index 2727e7bdefb..da3cde2bb42 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -37,6 +37,7 @@
 #define ZPCI_MAX_UID 0xffff
 #define UID_UNDEFINED 0
 #define UID_CHECKING_ENABLED 0x01
+#define ZPCI_DTSM 0x40
 
 OBJECT_DECLARE_SIMPLE_TYPE(S390pciState, S390_PCI_HOST_BRIDGE)
 OBJECT_DECLARE_SIMPLE_TYPE(S390PCIBus, S390_PCI_BUS)
diff --git a/include/hw/s390x/s390-pci-clp.h b/include/hw/s390x/s390-pci-clp.h
index 96b8e3f1331..cc8c8662b89 100644
--- a/include/hw/s390x/s390-pci-clp.h
+++ b/include/hw/s390x/s390-pci-clp.h
@@ -163,7 +163,8 @@ typedef struct ClpRspQueryPciGrp {
     uint8_t fr;
     uint16_t maxstbl;
     uint16_t mui;
-    uint64_t reserved3;
+    uint8_t dtsm;
+    uint8_t reserved3[7];
     uint64_t dasm; /* dma address space mask */
     uint64_t msia; /* MSI address */
     uint64_t reserved4;
-- 
Gitee


From 78703965f97406d09faf72343dfb8e48e3e5d5db Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Fri, 28 Oct 2022 17:48:00 +0100
Subject: [PATCH 171/407] Update linux headers to v6.0-rc4
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [5/41] ca55f497d1bf1e72179330f8f613781bf999d898

Based on upstream commit d525f73f9186a5bc641b8caf0b2c9bb94e5aa963
("Update linux headers to v6.0-rc4"), but limited to the zPCI and
protected dump changes.

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 linux-headers/linux/kvm.h       | 87 +++++++++++++++++++++++++++++++++
 linux-headers/linux/vfio_zdev.h |  7 +++
 2 files changed, 94 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 0d05d02ee4f..c65930288c7 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1150,6 +1150,9 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_DISABLE_QUIRKS2 213
 /* #define KVM_CAP_VM_TSC_CONTROL 214 */
 #define KVM_CAP_SYSTEM_EVENT_DATA 215
+#define KVM_CAP_S390_PROTECTED_DUMP 217
+#define KVM_CAP_S390_ZPCI_OP 221
+#define KVM_CAP_S390_CPU_TOPOLOGY 222
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1651,6 +1654,55 @@ struct kvm_s390_pv_unp {
 	__u64 tweak;
 };
 
+enum pv_cmd_dmp_id {
+	KVM_PV_DUMP_INIT,
+	KVM_PV_DUMP_CONFIG_STOR_STATE,
+	KVM_PV_DUMP_COMPLETE,
+	KVM_PV_DUMP_CPU,
+};
+
+struct kvm_s390_pv_dmp {
+	__u64 subcmd;
+	__u64 buff_addr;
+	__u64 buff_len;
+	__u64 gaddr;		/* For dump storage state */
+	__u64 reserved[4];
+};
+
+enum pv_cmd_info_id {
+	KVM_PV_INFO_VM,
+	KVM_PV_INFO_DUMP,
+};
+
+struct kvm_s390_pv_info_dump {
+	__u64 dump_cpu_buffer_len;
+	__u64 dump_config_mem_buffer_per_1m;
+	__u64 dump_config_finalize_len;
+};
+
+struct kvm_s390_pv_info_vm {
+	__u64 inst_calls_list[4];
+	__u64 max_cpus;
+	__u64 max_guests;
+	__u64 max_guest_addr;
+	__u64 feature_indication;
+};
+
+struct kvm_s390_pv_info_header {
+	__u32 id;
+	__u32 len_max;
+	__u32 len_written;
+	__u32 reserved;
+};
+
+struct kvm_s390_pv_info {
+	struct kvm_s390_pv_info_header header;
+	union {
+		struct kvm_s390_pv_info_dump dump;
+		struct kvm_s390_pv_info_vm vm;
+	};
+};
+
 enum pv_cmd_id {
 	KVM_PV_ENABLE,
 	KVM_PV_DISABLE,
@@ -1659,6 +1711,8 @@ enum pv_cmd_id {
 	KVM_PV_VERIFY,
 	KVM_PV_PREP_RESET,
 	KVM_PV_UNSHARE_ALL,
+	KVM_PV_INFO,
+	KVM_PV_DUMP,
 };
 
 struct kvm_pv_cmd {
@@ -2066,4 +2120,37 @@ struct kvm_stats_desc {
 /* Available with KVM_CAP_XSAVE2 */
 #define KVM_GET_XSAVE2		  _IOR(KVMIO,  0xcf, struct kvm_xsave)
 
+/* Available with KVM_CAP_S390_PROTECTED_DUMP */
+#define KVM_S390_PV_CPU_COMMAND	_IOWR(KVMIO, 0xd0, struct kvm_pv_cmd)
+
+/* Available with KVM_CAP_S390_ZPCI_OP */
+#define KVM_S390_ZPCI_OP         _IOW(KVMIO,  0xd1, struct kvm_s390_zpci_op)
+
+struct kvm_s390_zpci_op {
+	/* in */
+	__u32 fh;               /* target device */
+	__u8  op;               /* operation to perform */
+	__u8  pad[3];
+	union {
+		/* for KVM_S390_ZPCIOP_REG_AEN */
+		struct {
+			__u64 ibv;      /* Guest addr of interrupt bit vector */
+			__u64 sb;       /* Guest addr of summary bit */
+			__u32 flags;
+			__u32 noi;      /* Number of interrupts */
+			__u8 isc;       /* Guest interrupt subclass */
+			__u8 sbo;       /* Offset of guest summary bit vector */
+			__u16 pad;
+		} reg_aen;
+		__u64 reserved[8];
+	} u;
+};
+
+/* types for kvm_s390_zpci_op->op */
+#define KVM_S390_ZPCIOP_REG_AEN                0
+#define KVM_S390_ZPCIOP_DEREG_AEN      1
+
+/* flags for kvm_s390_zpci_op->u.reg_aen.flags */
+#define KVM_S390_ZPCIOP_REGAEN_HOST    (1 << 0)
+
 #endif /* __LINUX_KVM_H */
diff --git a/linux-headers/linux/vfio_zdev.h b/linux-headers/linux/vfio_zdev.h
index b4309397b6b..77f2aff1f27 100644
--- a/linux-headers/linux/vfio_zdev.h
+++ b/linux-headers/linux/vfio_zdev.h
@@ -29,6 +29,9 @@ struct vfio_device_info_cap_zpci_base {
 	__u16 fmb_length;	/* Measurement Block Length (in bytes) */
 	__u8 pft;		/* PCI Function Type */
 	__u8 gid;		/* PCI function group ID */
+	/* End of version 1 */
+	__u32 fh;		/* PCI function handle */
+	/* End of version 2 */
 };
 
 /**
@@ -47,6 +50,10 @@ struct vfio_device_info_cap_zpci_group {
 	__u16 noi;		/* Maximum number of MSIs */
 	__u16 maxstbl;		/* Maximum Store Block Length */
 	__u8 version;		/* Supported PCI Version */
+	/* End of version 1 */
+	__u8 reserved;
+	__u16 imaxstbl;		/* Maximum Interpreted Store Block Length */
+	/* End of version 2 */
 };
 
 /**
-- 
Gitee


From 79e694aa4610093e65f98d8ad75d84f89426ca05 Mon Sep 17 00:00:00 2001
From: Matthew Rosato <mjrosato@linux.ibm.com>
Date: Fri, 2 Sep 2022 13:27:31 -0400
Subject: [PATCH 172/407] s390x/pci: add routine to get host function handle
 from CLP info
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [6/41] 8ab652cf4095e61f5f55726d41111de227d452e7

In order to interface with the underlying host zPCI device, we need
to know its function handle. Add a routine to grab this from the
vfio CLP capabilities chain.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20220902172737.170349-3-mjrosato@linux.ibm.com>
[thuth: Replace free(info) with g_free(info)]
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 21fa15298d88db2050a713cdf79c10cb0e09146f)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
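Illustration of the version check: the function handle is only present in
version 2 or later of the zPCI base capability, so a reader of the
capability chain has to check the header version before touching it. The
sketch below walks a chain in a flat buffer and applies that check. All
demo_ names, the capability id value and the shortened base capability are
hypothetical stand-ins, not the kernel or QEMU definitions; only the
"id, version, next offset" header shape and the version rule follow the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CAP_ZPCI_BASE 1  /* hypothetical capability id */

/* Stand-in for a capability header: next is a byte offset, 0 ends the chain. */
struct demo_cap_header {
    uint16_t id;
    uint16_t version;
    uint32_t next;
};

/* Shortened stand-in for the zPCI base capability. */
struct demo_cap_zpci_base {
    struct demo_cap_header header;
    uint8_t gid;
    uint32_t fh;  /* only meaningful when header.version >= 2 */
};

/* Find the offset of the first capability with the given id. */
static bool demo_find_cap(const uint8_t *buf, size_t len, uint32_t first,
                          uint16_t id, uint32_t *found)
{
    uint32_t off = first;

    while (off != 0 && (size_t)off + sizeof(struct demo_cap_header) <= len) {
        struct demo_cap_header hdr;

        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.id == id) {
            *found = off;
            return true;
        }
        if (hdr.next <= off) {
            break;  /* end of chain, or a malformed backwards link */
        }
        off = hdr.next;
    }
    return false;
}

/* Returns true and stores the handle only for version 2 or later. */
static bool demo_get_host_fh(const uint8_t *buf, size_t len, uint32_t first,
                             uint32_t *fh)
{
    struct demo_cap_zpci_base cap;
    uint32_t off;

    assert(fh);
    if (!demo_find_cap(buf, len, first, DEMO_CAP_ZPCI_BASE, &off) ||
        (size_t)off + sizeof(cap) > len) {
        return false;
    }
    memcpy(&cap, buf + off, sizeof(cap));
    if (cap.header.version < 2) {
        return false;
    }
    *fh = cap.fh;
    return true;
}
```

Copying through memcpy rather than casting the buffer pointer avoids
alignment assumptions about where a capability lands in the buffer.
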
 hw/s390x/s390-pci-vfio.c         | 83 ++++++++++++++++++++++++++------
 include/hw/s390x/s390-pci-vfio.h |  5 ++
 2 files changed, 72 insertions(+), 16 deletions(-)

diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 6f80a47e299..08bcc55e853 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -124,6 +124,27 @@ static void s390_pci_read_base(S390PCIBusDevice *pbdev,
     pbdev->zpci_fn.pft = 0;
 }
 
+static bool get_host_fh(S390PCIBusDevice *pbdev, struct vfio_device_info *info,
+                        uint32_t *fh)
+{
+    struct vfio_info_cap_header *hdr;
+    struct vfio_device_info_cap_zpci_base *cap;
+    VFIOPCIDevice *vpci = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+
+    hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_ZPCI_BASE);
+
+    /* Can only get the host fh with version 2 or greater */
+    if (hdr == NULL || hdr->version < 2) {
+        trace_s390_pci_clp_cap(vpci->vbasedev.name,
+                               VFIO_DEVICE_INFO_CAP_ZPCI_BASE);
+        return false;
+    }
+    cap = (void *) hdr;
+
+    *fh = cap->fh;
+    return true;
+}
+
 static void s390_pci_read_group(S390PCIBusDevice *pbdev,
                                 struct vfio_device_info *info)
 {
@@ -217,25 +238,13 @@ static void s390_pci_read_pfip(S390PCIBusDevice *pbdev,
     memcpy(pbdev->zpci_fn.pfip, cap->pfip, CLP_PFIP_NR_SEGMENTS);
 }
 
-/*
- * This function will issue the VFIO_DEVICE_GET_INFO ioctl and look for
- * capabilities that contain information about CLP features provided by the
- * underlying host.
- * On entry, defaults have already been placed into the guest CLP response
- * buffers.  On exit, defaults will have been overwritten for any CLP features
- * found in the capability chain; defaults will remain for any CLP features not
- * found in the chain.
- */
-void s390_pci_get_clp_info(S390PCIBusDevice *pbdev)
+static struct vfio_device_info *get_device_info(S390PCIBusDevice *pbdev,
+                                                uint32_t argsz)
 {
-    g_autofree struct vfio_device_info *info = NULL;
+    struct vfio_device_info *info = g_malloc0(argsz);
     VFIOPCIDevice *vfio_pci;
-    uint32_t argsz;
     int fd;
 
-    argsz = sizeof(*info);
-    info = g_malloc0(argsz);
-
     vfio_pci = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
     fd = vfio_pci->vbasedev.fd;
 
@@ -250,7 +259,8 @@ retry:
 
     if (ioctl(fd, VFIO_DEVICE_GET_INFO, info)) {
         trace_s390_pci_clp_dev_info(vfio_pci->vbasedev.name);
-        return;
+        g_free(info);
+        return NULL;
     }
 
     if (info->argsz > argsz) {
@@ -259,6 +269,47 @@ retry:
         goto retry;
     }
 
+    return info;
+}
+
+/*
+ * Get the host function handle from the vfio CLP capabilities chain.  Returns
+ * true if a fh value was placed into the provided buffer.  Returns false
+ * if a fh could not be obtained (ioctl failed or capabilitiy version does
+ * not include the fh)
+ */
+bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh)
+{
+    g_autofree struct vfio_device_info *info = NULL;
+
+    assert(fh);
+
+    info = get_device_info(pbdev, sizeof(*info));
+    if (!info) {
+        return false;
+    }
+
+    return get_host_fh(pbdev, info, fh);
+}
+
+/*
+ * This function will issue the VFIO_DEVICE_GET_INFO ioctl and look for
+ * capabilities that contain information about CLP features provided by the
+ * underlying host.
+ * On entry, defaults have already been placed into the guest CLP response
+ * buffers.  On exit, defaults will have been overwritten for any CLP features
+ * found in the capability chain; defaults will remain for any CLP features not
+ * found in the chain.
+ */
+void s390_pci_get_clp_info(S390PCIBusDevice *pbdev)
+{
+    g_autofree struct vfio_device_info *info = NULL;
+
+    info = get_device_info(pbdev, sizeof(*info));
+    if (!info) {
+        return;
+    }
+
     /*
      * Find the CLP features provided and fill in the guest CLP responses.
      * Always call s390_pci_read_base first as information from this could
diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
index ff708aef500..ae1b126ff70 100644
--- a/include/hw/s390x/s390-pci-vfio.h
+++ b/include/hw/s390x/s390-pci-vfio.h
@@ -20,6 +20,7 @@ bool s390_pci_update_dma_avail(int fd, unsigned int *avail);
 S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
                                           S390PCIBusDevice *pbdev);
 void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt);
+bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh);
 void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
 #else
 static inline bool s390_pci_update_dma_avail(int fd, unsigned int *avail)
@@ -33,6 +34,10 @@ static inline S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
 }
 static inline void s390_pci_end_dma_count(S390pciState *s,
                                           S390PCIDMACount *cnt) { }
+static inline bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh)
+{
+    return false;
+}
 static inline void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { }
 #endif
 
-- 
Gitee


From b463b342b464fe132ac96cd70bfe36fc184bea8b Mon Sep 17 00:00:00 2001
From: Matthew Rosato <mjrosato@linux.ibm.com>
Date: Fri, 2 Sep 2022 13:27:32 -0400
Subject: [PATCH 173/407] s390x/pci: enable for load/store interpretation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [7/41] 3a96e901e295bb9e0c530638c45b5da5d60c00bd

If the ZPCI_OP ioctl reports that it is available and usable, then the
underlying KVM host will enable load/store interpretation for any guest
device without a SHM bit in the guest function handle.  For a device that
will be using interpretation support, ensure the guest function handle
matches the host function handle; this value is re-checked every time the
guest issues a SET PCI FN to enable the guest device as it is the only
opportunity to reflect function handle changes.

By default, unless interpret=off is specified, interpretation support will
always be assumed and exploited if the necessary ioctl and features are
available on the host kernel.  When these are unavailable, we will silently
revert to the interception model; this allows existing guest configurations
to work unmodified on hosts with and without zPCI interpretation support,
allowing QEMU to choose the best support model available.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20220902172737.170349-4-mjrosato@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit dd1d5fd9684beeb0c14c39f497ef2aa9ac683aa7)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
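Illustration of the handle bookkeeping described above: at plug time an
interpreted device takes over the host function handle with the enable bit
masked off, and at SET PCI FN time the handle is refreshed from the host
before the enable bit is set again. The sketch below models only that
logic. The demo_ names are hypothetical, the mask values are assumed to
match QEMU's FH_MASK_ENABLE, FH_MASK_INDEX and FH_SHM_VFIO definitions, and
the index collision handling from s390_pci_interp_plug() is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the QEMU definitions; not taken from this patch. */
#define FH_MASK_ENABLE 0x80000000u
#define FH_MASK_INDEX  0x0000ffffu
#define FH_SHM_VFIO    0x00010000u

/* Hypothetical, minimal stand-in for S390PCIBusDevice. */
struct demo_dev {
    uint32_t fh;
    uint32_t idx;
    bool interp;
};

/* Plug time: use interpretation when requested and available, else intercept. */
static void demo_plug(struct demo_dev *d, uint32_t host_fh,
                      bool interp_requested, bool interp_allowed)
{
    if (interp_requested && interp_allowed) {
        /* Present the device as disabled until the guest enables it. */
        d->fh = host_fh & ~FH_MASK_ENABLE;
        d->idx = d->fh & FH_MASK_INDEX;
        d->interp = true;
    } else {
        /* Silent fallback: passthrough with intercepted I/O. */
        d->interp = false;
        d->fh |= FH_SHM_VFIO;
    }
}

/* SET PCI FN enable: re-read the host handle for interpreted devices. */
static bool demo_enable(struct demo_dev *d, uint32_t host_fh)
{
    if (d->interp) {
        d->fh = host_fh;
        if (!(d->fh & FH_MASK_ENABLE)) {
            return false;  /* host device is not enabled, report a handle error */
        }
    }
    d->fh |= FH_MASK_ENABLE;
    return true;
}
```

With these stand-ins, a device configured with interpret=off takes the same
path as one plugged on a host without the facilities: both end up with the
vfio SHM bit set and interp cleared.
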
 hw/s390x/meson.build            |  1 +
 hw/s390x/s390-pci-bus.c         | 66 ++++++++++++++++++++++++++++++++-
 hw/s390x/s390-pci-inst.c        | 16 ++++++++
 hw/s390x/s390-pci-kvm.c         | 22 +++++++++++
 include/hw/s390x/s390-pci-bus.h |  1 +
 include/hw/s390x/s390-pci-kvm.h | 24 ++++++++++++
 target/s390x/kvm/kvm.c          |  7 ++++
 target/s390x/kvm/kvm_s390x.h    |  1 +
 8 files changed, 137 insertions(+), 1 deletion(-)
 create mode 100644 hw/s390x/s390-pci-kvm.c
 create mode 100644 include/hw/s390x/s390-pci-kvm.h

diff --git a/hw/s390x/meson.build b/hw/s390x/meson.build
index 28484256ec0..6e6e47fcda3 100644
--- a/hw/s390x/meson.build
+++ b/hw/s390x/meson.build
@@ -23,6 +23,7 @@ s390x_ss.add(when: 'CONFIG_KVM', if_true: files(
   's390-skeys-kvm.c',
   's390-stattrib-kvm.c',
   'pv.c',
+  's390-pci-kvm.c',
 ))
 s390x_ss.add(when: 'CONFIG_TCG', if_true: files(
   'tod-tcg.c',
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 01b58ebc707..18bfae0465a 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -16,6 +16,7 @@
 #include "qapi/visitor.h"
 #include "hw/s390x/s390-pci-bus.h"
 #include "hw/s390x/s390-pci-inst.h"
+#include "hw/s390x/s390-pci-kvm.h"
 #include "hw/s390x/s390-pci-vfio.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/qdev-properties.h"
@@ -971,12 +972,51 @@ static void s390_pci_update_subordinate(PCIDevice *dev, uint32_t nr)
     }
 }
 
+static int s390_pci_interp_plug(S390pciState *s, S390PCIBusDevice *pbdev)
+{
+    uint32_t idx, fh;
+
+    if (!s390_pci_get_host_fh(pbdev, &fh)) {
+        return -EPERM;
+    }
+
+    /*
+     * The host device is already in an enabled state, but we always present
+     * the initial device state to the guest as disabled (ZPCI_FS_DISABLED).
+     * Therefore, mask off the enable bit from the passthrough handle until
+     * the guest issues a CLP SET PCI FN later to enable the device.
+     */
+    pbdev->fh = fh & ~FH_MASK_ENABLE;
+
+    /* Next, see if the idx is already in-use */
+    idx = pbdev->fh & FH_MASK_INDEX;
+    if (pbdev->idx != idx) {
+        if (s390_pci_find_dev_by_idx(s, idx)) {
+            return -EINVAL;
+        }
+        /*
+         * Update the idx entry with the passed through idx
+         * If the relinquished idx is lower than next_idx, use it
+         * to replace next_idx
+         */
+        g_hash_table_remove(s->zpci_table, &pbdev->idx);
+        if (idx < s->next_idx) {
+            s->next_idx = idx;
+        }
+        pbdev->idx = idx;
+        g_hash_table_insert(s->zpci_table, &pbdev->idx, pbdev);
+    }
+
+    return 0;
+}
+
 static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                               Error **errp)
 {
     S390pciState *s = S390_PCI_HOST_BRIDGE(hotplug_dev);
     PCIDevice *pdev = NULL;
     S390PCIBusDevice *pbdev = NULL;
+    int rc;
 
     if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
         PCIBridge *pb = PCI_BRIDGE(dev);
@@ -1022,12 +1062,35 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         set_pbdev_info(pbdev);
 
         if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
-            pbdev->fh |= FH_SHM_VFIO;
+            /*
+             * By default, interpretation is always requested; if the available
+             * facilities indicate it is not available, fallback to the
+             * interception model.
+             */
+            if (pbdev->interp) {
+                if (s390_pci_kvm_interp_allowed()) {
+                    rc = s390_pci_interp_plug(s, pbdev);
+                    if (rc) {
+                        error_setg(errp, "Plug failed for zPCI device in "
+                                   "interpretation mode: %d", rc);
+                        return;
+                    }
+                } else {
+                    DPRINTF("zPCI interpretation facilities missing.\n");
+                    pbdev->interp = false;
+                }
+            }
             pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
             /* Fill in CLP information passed via the vfio region */
             s390_pci_get_clp_info(pbdev);
+            if (!pbdev->interp) {
+                /* Do vfio passthrough but intercept for I/O */
+                pbdev->fh |= FH_SHM_VFIO;
+            }
         } else {
             pbdev->fh |= FH_SHM_EMUL;
+            /* Always intercept emulated devices */
+            pbdev->interp = false;
         }
 
         if (s390_pci_msix_init(pbdev)) {
@@ -1360,6 +1423,7 @@ static Property s390_pci_device_properties[] = {
     DEFINE_PROP_UINT16("uid", S390PCIBusDevice, uid, UID_UNDEFINED),
     DEFINE_PROP_S390_PCI_FID("fid", S390PCIBusDevice, fid),
     DEFINE_PROP_STRING("target", S390PCIBusDevice, target),
+    DEFINE_PROP_BOOL("interpret", S390PCIBusDevice, interp, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 6d400d41472..651ec386350 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -18,6 +18,8 @@
 #include "sysemu/hw_accel.h"
 #include "hw/s390x/s390-pci-inst.h"
 #include "hw/s390x/s390-pci-bus.h"
+#include "hw/s390x/s390-pci-kvm.h"
+#include "hw/s390x/s390-pci-vfio.h"
 #include "hw/s390x/tod.h"
 
 #ifndef DEBUG_S390PCI_INST
@@ -246,6 +248,20 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
                 goto out;
             }
 
+            /*
+             * Take this opportunity to make sure we still have an accurate
+             * host fh.  It's possible part of the handle changed while the
+             * device was disabled to the guest (e.g. vfio hot reset for
+             * ISM during plug)
+             */
+            if (pbdev->interp) {
+                /* Take this opportunity to make sure we are sync'd with host */
+                if (!s390_pci_get_host_fh(pbdev, &pbdev->fh) ||
+                    !(pbdev->fh & FH_MASK_ENABLE)) {
+                    stw_p(&ressetpci->hdr.rsp, CLP_RC_SETPCIFN_FH);
+                    goto out;
+                }
+            }
             pbdev->fh |= FH_MASK_ENABLE;
             pbdev->state = ZPCI_FS_ENABLED;
             stl_p(&ressetpci->fh, pbdev->fh);
diff --git a/hw/s390x/s390-pci-kvm.c b/hw/s390x/s390-pci-kvm.c
new file mode 100644
index 00000000000..0f16104a74c
--- /dev/null
+++ b/hw/s390x/s390-pci-kvm.c
@@ -0,0 +1,22 @@
+/*
+ * s390 zPCI KVM interfaces
+ *
+ * Copyright 2022 IBM Corp.
+ * Author(s): Matthew Rosato <mjrosato@linux.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "kvm/kvm_s390x.h"
+#include "hw/s390x/pv.h"
+#include "hw/s390x/s390-pci-kvm.h"
+#include "cpu_models.h"
+
+bool s390_pci_kvm_interp_allowed(void)
+{
+    return kvm_s390_get_zpci_op() && !s390_is_pv();
+}
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index da3cde2bb42..a9843dfe97d 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -350,6 +350,7 @@ struct S390PCIBusDevice {
     IndAddr *indicator;
     bool pci_unplug_request_processed;
     bool unplug_requested;
+    bool interp;
     QTAILQ_ENTRY(S390PCIBusDevice) link;
 };
 
diff --git a/include/hw/s390x/s390-pci-kvm.h b/include/hw/s390x/s390-pci-kvm.h
new file mode 100644
index 00000000000..80a2e7d0caf
--- /dev/null
+++ b/include/hw/s390x/s390-pci-kvm.h
@@ -0,0 +1,24 @@
+/*
+ * s390 PCI KVM interfaces
+ *
+ * Copyright 2022 IBM Corp.
+ * Author(s): Matthew Rosato <mjrosato@linux.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#ifndef HW_S390_PCI_KVM_H
+#define HW_S390_PCI_KVM_H
+
+#ifdef CONFIG_KVM
+bool s390_pci_kvm_interp_allowed(void);
+#else
+static inline bool s390_pci_kvm_interp_allowed(void)
+{
+    return false;
+}
+#endif
+
+#endif
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index ba04997da1e..30712487d48 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -158,6 +158,7 @@ static int cap_ri;
 static int cap_hpage_1m;
 static int cap_vcpu_resets;
 static int cap_protected;
+static int cap_zpci_op;
 
 static bool mem_op_storage_key_support;
 
@@ -363,6 +364,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     cap_s390_irq = kvm_check_extension(s, KVM_CAP_S390_INJECT_IRQ);
     cap_vcpu_resets = kvm_check_extension(s, KVM_CAP_S390_VCPU_RESETS);
     cap_protected = kvm_check_extension(s, KVM_CAP_S390_PROTECTED);
+    cap_zpci_op = kvm_check_extension(s, KVM_CAP_S390_ZPCI_OP);
 
     kvm_vm_enable_cap(s, KVM_CAP_S390_USER_SIGP, 0);
     kvm_vm_enable_cap(s, KVM_CAP_S390_VECTOR_REGISTERS, 0);
@@ -2579,3 +2581,8 @@ bool kvm_arch_cpu_check_are_resettable(void)
 {
     return true;
 }
+
+int kvm_s390_get_zpci_op(void)
+{
+    return cap_zpci_op;
+}
diff --git a/target/s390x/kvm/kvm_s390x.h b/target/s390x/kvm/kvm_s390x.h
index 05a5e1e6f46..aaae8570de5 100644
--- a/target/s390x/kvm/kvm_s390x.h
+++ b/target/s390x/kvm/kvm_s390x.h
@@ -27,6 +27,7 @@ void kvm_s390_vcpu_interrupt_pre_save(S390CPU *cpu);
 int kvm_s390_vcpu_interrupt_post_load(S390CPU *cpu);
 int kvm_s390_get_hpage_1m(void);
 int kvm_s390_get_ri(void);
+int kvm_s390_get_zpci_op(void);
 int kvm_s390_get_clock(uint8_t *tod_high, uint64_t *tod_clock);
 int kvm_s390_get_clock_ext(uint8_t *tod_high, uint64_t *tod_clock);
 int kvm_s390_set_clock(uint8_t tod_high, uint64_t tod_clock);
-- 
Gitee


From 48672844dd4095e4558db07ae49276dcb852e1a3 Mon Sep 17 00:00:00 2001
From: Matthew Rosato <mjrosato@linux.ibm.com>
Date: Fri, 2 Sep 2022 13:27:33 -0400
Subject: [PATCH 174/407] s390x/pci: don't fence interpreted devices without
 MSI-X
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [8/41] 52bad4368e9494c43133338b386dc0cc159aeedc

Lack of MSI-X support is not an issue for interpreted passthrough
devices, so let's let these in.  This will allow, for example, ISM
devices to be passed through -- but only when interpretation is
available and being used.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20220902172737.170349-5-mjrosato@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 15d0e7942d3b31ff71d8e0e8cec3a8203214f19b)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
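Illustration of the relaxed check: after this change a failed MSI-X setup
only fences the device when it is not interpreted. The helper below is a
hypothetical restatement of the condition used in s390_pcihost_plug(),
assuming (as the diff suggests) that a nonzero return from the MSI-X init
routine means MSI-X could not be set up.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when the plug must be rejected.
 * msix_init_rc is the result of the MSI-X setup attempt (nonzero on failure),
 * interp says whether the device uses interpretation.
 */
static bool demo_plug_fenced(int msix_init_rc, bool interp)
{
    return msix_init_rc != 0 && !interp;
}
```

Only the combination "MSI-X setup failed" and "intercepted device" is
rejected; the other three combinations are allowed through.
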
 hw/s390x/s390-pci-bus.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 18bfae0465a..07c7c155e37 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -881,6 +881,10 @@ static int s390_pci_msix_init(S390PCIBusDevice *pbdev)
 
 static void s390_pci_msix_free(S390PCIBusDevice *pbdev)
 {
+    if (pbdev->msix.entries == 0) {
+        return;
+    }
+
     memory_region_del_subregion(&pbdev->iommu->mr, &pbdev->msix_notify_mr);
     object_unparent(OBJECT(&pbdev->msix_notify_mr));
 }
@@ -1093,7 +1097,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
             pbdev->interp = false;
         }
 
-        if (s390_pci_msix_init(pbdev)) {
+        if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
             error_setg(errp, "MSI-X support is mandatory "
                        "in the S390 architecture");
             return;
-- 
Gitee


From 6db25f3708f4e7d0791be39551befb49d2c12f60 Mon Sep 17 00:00:00 2001
From: Matthew Rosato <mjrosato@linux.ibm.com>
Date: Fri, 2 Sep 2022 13:27:34 -0400
Subject: [PATCH 175/407] s390x/pci: enable adapter event notification for
 interpreted devices
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [9/41] 771975c436c7cb608e0e9e40edd732ac310beb69

Use the associated kvm ioctl operation to enable adapter event notification
and forwarding for devices when requested.  This feature will be set up
with or without firmware assist based upon the 'forwarding_assist' setting.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-Id: <20220902172737.170349-6-mjrosato@linux.ibm.com>
[thuth: Rename "forwarding_assist" property to "forwarding-assist"]
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit d0bc7091c2013ad2fa164100cf7b17962370e8ab)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
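Illustration of the register request: enabling adapter event notification
means filling a kvm_s390_zpci_op with the REG_AEN operation and the guest's
interrupt parameters. The sketch below uses a local mirror of that
structure (layout copied from the linux-headers update earlier in this
series) so that it builds without kernel headers. The demo_ names are
hypothetical, and mapping "assist disabled" to the REGAEN_HOST flag is an
assumption based on the flag's name, not something this excerpt confirms.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ZPCIOP_REG_AEN     0
#define DEMO_ZPCIOP_DEREG_AEN   1
#define DEMO_ZPCIOP_REGAEN_HOST (1u << 0)

/* Local mirror of struct kvm_s390_zpci_op, using fixed-width types. */
struct demo_zpci_op {
    uint32_t fh;      /* target device */
    uint8_t op;       /* operation to perform */
    uint8_t pad[3];
    union {
        struct {
            uint64_t ibv;   /* guest address of interrupt bit vector */
            uint64_t sb;    /* guest address of summary bit */
            uint32_t flags;
            uint32_t noi;   /* number of interrupts */
            uint8_t isc;    /* guest interrupt subclass */
            uint8_t sbo;    /* offset of guest summary bit vector */
            uint16_t pad;
        } reg_aen;
        uint64_t reserved[8];
    } u;
};

/* Build a REG_AEN request; unused members stay zero. */
static struct demo_zpci_op demo_build_reg_aen(uint32_t fh, uint64_t ibv,
                                              uint64_t sb, uint32_t noi,
                                              uint8_t isc, uint8_t sbo,
                                              bool assist)
{
    struct demo_zpci_op args = {
        .fh = fh,
        .op = DEMO_ZPCIOP_REG_AEN,
    };

    args.u.reg_aen.ibv = ibv;
    args.u.reg_aen.sb = sb;
    args.u.reg_aen.noi = noi;
    args.u.reg_aen.isc = isc;
    args.u.reg_aen.sbo = sbo;
    /* Assumption: without firmware assist, ask for host-side delivery. */
    args.u.reg_aen.flags = assist ? 0 : DEMO_ZPCIOP_REGAEN_HOST;
    return args;
}
```

In QEMU the filled structure would be handed to the KVM_S390_ZPCI_OP VM
ioctl; that call is omitted here so the sketch stays self-contained.
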
 hw/s390x/s390-pci-bus.c         | 20 ++++++++++++++---
 hw/s390x/s390-pci-inst.c        | 40 +++++++++++++++++++++++++++++++--
 hw/s390x/s390-pci-kvm.c         | 30 +++++++++++++++++++++++++
 include/hw/s390x/s390-pci-bus.h |  1 +
 include/hw/s390x/s390-pci-kvm.h | 14 ++++++++++++
 5 files changed, 100 insertions(+), 5 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 07c7c155e37..cd152ce7110 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -190,7 +190,10 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
         rc = SCLP_RC_NO_ACTION_REQUIRED;
         break;
     default:
-        if (pbdev->summary_ind) {
+        if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
+            /* Interpreted devices were using interrupt forwarding */
+            s390_pci_kvm_aif_disable(pbdev);
+        } else if (pbdev->summary_ind) {
             pci_dereg_irqs(pbdev);
         }
         if (pbdev->iommu->enabled) {
@@ -1082,6 +1085,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                 } else {
                     DPRINTF("zPCI interpretation facilities missing.\n");
                     pbdev->interp = false;
+                    pbdev->forwarding_assist = false;
                 }
             }
             pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
@@ -1090,11 +1094,13 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
             if (!pbdev->interp) {
                 /* Do vfio passthrough but intercept for I/O */
                 pbdev->fh |= FH_SHM_VFIO;
+                pbdev->forwarding_assist = false;
             }
         } else {
             pbdev->fh |= FH_SHM_EMUL;
             /* Always intercept emulated devices */
             pbdev->interp = false;
+            pbdev->forwarding_assist = false;
         }
 
         if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
@@ -1244,7 +1250,10 @@ static void s390_pcihost_reset(DeviceState *dev)
     /* Process all pending unplug requests */
     QTAILQ_FOREACH_SAFE(pbdev, &s->zpci_devs, link, next) {
         if (pbdev->unplug_requested) {
-            if (pbdev->summary_ind) {
+            if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
+                /* Interpreted devices were using interrupt forwarding */
+                s390_pci_kvm_aif_disable(pbdev);
+            } else if (pbdev->summary_ind) {
                 pci_dereg_irqs(pbdev);
             }
             if (pbdev->iommu->enabled) {
@@ -1382,7 +1391,10 @@ static void s390_pci_device_reset(DeviceState *dev)
         break;
     }
 
-    if (pbdev->summary_ind) {
+    if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
+        /* Interpreted devices were using interrupt forwarding */
+        s390_pci_kvm_aif_disable(pbdev);
+    } else if (pbdev->summary_ind) {
         pci_dereg_irqs(pbdev);
     }
     if (pbdev->iommu->enabled) {
@@ -1428,6 +1440,8 @@ static Property s390_pci_device_properties[] = {
     DEFINE_PROP_S390_PCI_FID("fid", S390PCIBusDevice, fid),
     DEFINE_PROP_STRING("target", S390PCIBusDevice, target),
     DEFINE_PROP_BOOL("interpret", S390PCIBusDevice, interp, true),
+    DEFINE_PROP_BOOL("forwarding-assist", S390PCIBusDevice, forwarding_assist,
+                     true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 651ec386350..20a9bcc7afb 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -1066,6 +1066,32 @@ static void fmb_update(void *opaque)
     timer_mod(pbdev->fmb_timer, t + pbdev->pci_group->zpci_group.mui);
 }
 
+static int mpcifc_reg_int_interp(S390PCIBusDevice *pbdev, ZpciFib *fib)
+{
+    int rc;
+
+    rc = s390_pci_kvm_aif_enable(pbdev, fib, pbdev->forwarding_assist);
+    if (rc) {
+        DPRINTF("Failed to enable interrupt forwarding\n");
+        return rc;
+    }
+
+    return 0;
+}
+
+static int mpcifc_dereg_int_interp(S390PCIBusDevice *pbdev, ZpciFib *fib)
+{
+    int rc;
+
+    rc = s390_pci_kvm_aif_disable(pbdev);
+    if (rc) {
+        DPRINTF("Failed to disable interrupt forwarding\n");
+        return rc;
+    }
+
+    return 0;
+}
+
 int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
                         uintptr_t ra)
 {
@@ -1120,7 +1146,12 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
 
     switch (oc) {
     case ZPCI_MOD_FC_REG_INT:
-        if (pbdev->summary_ind) {
+        if (pbdev->interp) {
+            if (mpcifc_reg_int_interp(pbdev, &fib)) {
+                cc = ZPCI_PCI_LS_ERR;
+                s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
+            }
+        } else if (pbdev->summary_ind) {
             cc = ZPCI_PCI_LS_ERR;
             s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
         } else if (reg_irqs(env, pbdev, fib)) {
@@ -1129,7 +1160,12 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
         }
         break;
     case ZPCI_MOD_FC_DEREG_INT:
-        if (!pbdev->summary_ind) {
+        if (pbdev->interp) {
+            if (mpcifc_dereg_int_interp(pbdev, &fib)) {
+                cc = ZPCI_PCI_LS_ERR;
+                s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
+            }
+        } else if (!pbdev->summary_ind) {
             cc = ZPCI_PCI_LS_ERR;
             s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
         } else {
diff --git a/hw/s390x/s390-pci-kvm.c b/hw/s390x/s390-pci-kvm.c
index 0f16104a74c..9134fe185fe 100644
--- a/hw/s390x/s390-pci-kvm.c
+++ b/hw/s390x/s390-pci-kvm.c
@@ -11,12 +11,42 @@
 
 #include "qemu/osdep.h"
 
+#include <linux/kvm.h>
+
 #include "kvm/kvm_s390x.h"
 #include "hw/s390x/pv.h"
+#include "hw/s390x/s390-pci-bus.h"
 #include "hw/s390x/s390-pci-kvm.h"
+#include "hw/s390x/s390-pci-inst.h"
 #include "cpu_models.h"
 
 bool s390_pci_kvm_interp_allowed(void)
 {
     return kvm_s390_get_zpci_op() && !s390_is_pv();
 }
+
+int s390_pci_kvm_aif_enable(S390PCIBusDevice *pbdev, ZpciFib *fib, bool assist)
+{
+    struct kvm_s390_zpci_op args = {
+        .fh = pbdev->fh,
+        .op = KVM_S390_ZPCIOP_REG_AEN,
+        .u.reg_aen.ibv = fib->aibv,
+        .u.reg_aen.sb = fib->aisb,
+        .u.reg_aen.noi = FIB_DATA_NOI(fib->data),
+        .u.reg_aen.isc = FIB_DATA_ISC(fib->data),
+        .u.reg_aen.sbo = FIB_DATA_AISBO(fib->data),
+        .u.reg_aen.flags = (assist) ? 0 : KVM_S390_ZPCIOP_REGAEN_HOST
+    };
+
+    return kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, &args);
+}
+
+int s390_pci_kvm_aif_disable(S390PCIBusDevice *pbdev)
+{
+    struct kvm_s390_zpci_op args = {
+        .fh = pbdev->fh,
+        .op = KVM_S390_ZPCIOP_DEREG_AEN
+    };
+
+    return kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, &args);
+}
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index a9843dfe97d..5b09f0cf2fd 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -351,6 +351,7 @@ struct S390PCIBusDevice {
     bool pci_unplug_request_processed;
     bool unplug_requested;
     bool interp;
+    bool forwarding_assist;
     QTAILQ_ENTRY(S390PCIBusDevice) link;
 };
 
diff --git a/include/hw/s390x/s390-pci-kvm.h b/include/hw/s390x/s390-pci-kvm.h
index 80a2e7d0caf..933814a4025 100644
--- a/include/hw/s390x/s390-pci-kvm.h
+++ b/include/hw/s390x/s390-pci-kvm.h
@@ -12,13 +12,27 @@
 #ifndef HW_S390_PCI_KVM_H
 #define HW_S390_PCI_KVM_H
 
+#include "hw/s390x/s390-pci-bus.h"
+#include "hw/s390x/s390-pci-inst.h"
+
 #ifdef CONFIG_KVM
 bool s390_pci_kvm_interp_allowed(void);
+int s390_pci_kvm_aif_enable(S390PCIBusDevice *pbdev, ZpciFib *fib, bool assist);
+int s390_pci_kvm_aif_disable(S390PCIBusDevice *pbdev);
 #else
 static inline bool s390_pci_kvm_interp_allowed(void)
 {
     return false;
 }
+static inline int s390_pci_kvm_aif_enable(S390PCIBusDevice *pbdev, ZpciFib *fib,
+                                          bool assist)
+{
+    return -EINVAL;
+}
+static inline int s390_pci_kvm_aif_disable(S390PCIBusDevice *pbdev)
+{
+    return -EINVAL;
+}
 #endif
 
 #endif
-- 
Gitee


From 512cc797b4006e8f0c9a6d50b4ce3775bd3772b2 Mon Sep 17 00:00:00 2001
From: Matthew Rosato <mjrosato@linux.ibm.com>
Date: Fri, 2 Sep 2022 13:27:35 -0400
Subject: [PATCH 176/407] s390x/pci: let intercept devices have separate PCI
 groups
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [10/41] 1545bdcd2e21386afa9869f0414e96eecb62647d

Let's use the reserved pool of simulated PCI groups to allow intercept
devices to have separate groups from interpreted devices as some group
values may be different. If we run out of simulated PCI groups, subsequent
intercept devices just get the default group.
Furthermore, if we encounter any PCI groups from hostdevs that are marked
as simulated, let's just assign them to the default group to avoid
conflicts between host simulated groups and our own simulated groups.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20220902172737.170349-7-mjrosato@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 30dcf4f7fd23bef7d72a2454c60881710fd4c785)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/s390-pci-bus.c         | 19 ++++++++++++++--
 hw/s390x/s390-pci-vfio.c        | 40 ++++++++++++++++++++++++++++++---
 include/hw/s390x/s390-pci-bus.h |  6 ++++-
 3 files changed, 59 insertions(+), 6 deletions(-)
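
Note: the group selection described in the commit message can be modelled
outside of QEMU as a small standalone sketch. The names below (phb_model,
assign_intercept_group, MAX_GROUPS) are hypothetical stand-ins and not QEMU
identifiers; only the 0xF0/0xFF constants and the order of the checks are
taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_FN_GRP 0xFF /* mirrors ZPCI_DEFAULT_FN_GRP */
#define SIM_GRP_START  0xF0 /* mirrors ZPCI_SIM_GRP_START */
#define MAX_GROUPS     32   /* hypothetical bound for this model only */

struct group {
    int id;      /* guest-visible group id */
    int host_id; /* group id reported by the host */
};

struct phb_model {
    struct group groups[MAX_GROUPS];
    size_t ngroups;
    uint8_t next_sim_grp;
};

static void phb_model_init(struct phb_model *s)
{
    s->ngroups = 0;
    s->next_sim_grp = SIM_GRP_START;
    /* The default group always exists and maps to itself. */
    s->groups[s->ngroups++] = (struct group){ DEFAULT_FN_GRP, DEFAULT_FN_GRP };
}

/* Find a simulated group that was already created for this host group. */
static const struct group *find_host_sim(const struct phb_model *s, int host_id)
{
    for (size_t i = 0; i < s->ngroups; i++) {
        if (s->groups[i].id >= SIM_GRP_START &&
            s->groups[i].host_id == host_id) {
            return &s->groups[i];
        }
    }
    return NULL;
}

/* Return the guest-visible group id for an intercept device. */
static int assign_intercept_group(struct phb_model *s, uint8_t host_gid)
{
    const struct group *g;
    int id;

    if (host_gid >= SIM_GRP_START) {
        return DEFAULT_FN_GRP;      /* host group is itself simulated */
    }
    g = find_host_sim(s, host_gid);
    if (g) {
        return g->id;               /* reuse the existing simulated group */
    }
    if (s->next_sim_grp == DEFAULT_FN_GRP) {
        return DEFAULT_FN_GRP;      /* simulated pool exhausted */
    }
    assert(s->ngroups < MAX_GROUPS);
    id = s->next_sim_grp++;
    s->groups[s->ngroups++] = (struct group){ id, host_gid };
    return id;
}
```

In this model the pool holds 15 simulated ids (0xF0 through 0xFE); a 16th
distinct host group lands in the default group, as does any host group that
is already in the simulated range.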

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index cd152ce7110..d8b1e44a02f 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -748,13 +748,14 @@ static void s390_pci_iommu_free(S390pciState *s, PCIBus *bus, int32_t devfn)
     object_unref(OBJECT(iommu));
 }
 
-S390PCIGroup *s390_group_create(int id)
+S390PCIGroup *s390_group_create(int id, int host_id)
 {
     S390PCIGroup *group;
     S390pciState *s = s390_get_phb();
 
     group = g_new0(S390PCIGroup, 1);
     group->id = id;
+    group->host_id = host_id;
     QTAILQ_INSERT_TAIL(&s->zpci_groups, group, link);
     return group;
 }
@@ -772,12 +773,25 @@ S390PCIGroup *s390_group_find(int id)
     return NULL;
 }
 
+S390PCIGroup *s390_group_find_host_sim(int host_id)
+{
+    S390PCIGroup *group;
+    S390pciState *s = s390_get_phb();
+
+    QTAILQ_FOREACH(group, &s->zpci_groups, link) {
+        if (group->id >= ZPCI_SIM_GRP_START && group->host_id == host_id) {
+            return group;
+        }
+    }
+    return NULL;
+}
+
 static void s390_pci_init_default_group(void)
 {
     S390PCIGroup *group;
     ClpRspQueryPciGrp *resgrp;
 
-    group = s390_group_create(ZPCI_DEFAULT_FN_GRP);
+    group = s390_group_create(ZPCI_DEFAULT_FN_GRP, ZPCI_DEFAULT_FN_GRP);
     resgrp = &group->zpci_group;
     resgrp->fr = 1;
     resgrp->dasm = 0;
@@ -825,6 +839,7 @@ static void s390_pcihost_realize(DeviceState *dev, Error **errp)
                                            NULL, g_free);
     s->zpci_table = g_hash_table_new_full(g_int_hash, g_int_equal, NULL, NULL);
     s->bus_no = 0;
+    s->next_sim_grp = ZPCI_SIM_GRP_START;
     QTAILQ_INIT(&s->pending_sei);
     QTAILQ_INIT(&s->zpci_devs);
     QTAILQ_INIT(&s->zpci_dma_limit);
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 08bcc55e853..338f436e875 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -150,13 +150,18 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
 {
     struct vfio_info_cap_header *hdr;
     struct vfio_device_info_cap_zpci_group *cap;
+    S390pciState *s = s390_get_phb();
     ClpRspQueryPciGrp *resgrp;
     VFIOPCIDevice *vpci =  container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+    uint8_t start_gid = pbdev->zpci_fn.pfgid;
 
     hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
 
-    /* If capability not provided, just use the default group */
-    if (hdr == NULL) {
+    /*
+     * If capability not provided or the underlying hostdev is simulated, just
+     * use the default group.
+     */
+    if (hdr == NULL || pbdev->zpci_fn.pfgid >= ZPCI_SIM_GRP_START) {
         trace_s390_pci_clp_cap(vpci->vbasedev.name,
                                VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
         pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
@@ -165,11 +170,40 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
     }
     cap = (void *) hdr;
 
+    /*
+     * For an intercept device, let's use an existing simulated group if one
+     * was already created for other intercept devices in this group.
+     * If not, create a new simulated group if any are still available.
+     * If all else fails, just fall back on the default group.
+     */
+    if (!pbdev->interp) {
+        pbdev->pci_group = s390_group_find_host_sim(pbdev->zpci_fn.pfgid);
+        if (pbdev->pci_group) {
+            /* Use existing simulated group */
+            pbdev->zpci_fn.pfgid = pbdev->pci_group->id;
+            return;
+        } else {
+            if (s->next_sim_grp == ZPCI_DEFAULT_FN_GRP) {
+                /* All out of simulated groups, use default */
+                trace_s390_pci_clp_cap(vpci->vbasedev.name,
+                                       VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
+                pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
+                pbdev->pci_group = s390_group_find(ZPCI_DEFAULT_FN_GRP);
+                return;
+            } else {
+                /* We can assign a new simulated group */
+                pbdev->zpci_fn.pfgid = s->next_sim_grp;
+                s->next_sim_grp++;
+                /* Fall through to create the new sim group using CLP info */
+            }
+        }
+    }
+
     /* See if the PCI group is already defined, create if not */
     pbdev->pci_group = s390_group_find(pbdev->zpci_fn.pfgid);
 
     if (!pbdev->pci_group) {
-        pbdev->pci_group = s390_group_create(pbdev->zpci_fn.pfgid);
+        pbdev->pci_group = s390_group_create(pbdev->zpci_fn.pfgid, start_gid);
 
         resgrp = &pbdev->pci_group->zpci_group;
         if (cap->flags & VFIO_DEVICE_INFO_ZPCI_FLAG_REFRESH) {
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index 5b09f0cf2fd..0605fcea24d 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -315,13 +315,16 @@ typedef struct ZpciFmb {
 QEMU_BUILD_BUG_MSG(offsetof(ZpciFmb, fmt0) != 48, "padding in ZpciFmb");
 
 #define ZPCI_DEFAULT_FN_GRP 0xFF
+#define ZPCI_SIM_GRP_START 0xF0
 typedef struct S390PCIGroup {
     ClpRspQueryPciGrp zpci_group;
     int id;
+    int host_id;
     QTAILQ_ENTRY(S390PCIGroup) link;
 } S390PCIGroup;
-S390PCIGroup *s390_group_create(int id);
+S390PCIGroup *s390_group_create(int id, int host_id);
 S390PCIGroup *s390_group_find(int id);
+S390PCIGroup *s390_group_find_host_sim(int host_id);
 
 struct S390PCIBusDevice {
     DeviceState qdev;
@@ -370,6 +373,7 @@ struct S390pciState {
     QTAILQ_HEAD(, S390PCIBusDevice) zpci_devs;
     QTAILQ_HEAD(, S390PCIDMACount) zpci_dma_limit;
     QTAILQ_HEAD(, S390PCIGroup) zpci_groups;
+    uint8_t next_sim_grp;
 };
 
 S390pciState *s390_get_phb(void);
-- 
Gitee


From 2ec510708bbb9603d4c783ecc788264b303ad24b Mon Sep 17 00:00:00 2001
From: Matthew Rosato <mjrosato@linux.ibm.com>
Date: Fri, 2 Sep 2022 13:27:36 -0400
Subject: [PATCH 177/407] s390x/pci: reflect proper maxstbl for groups of
 interpreted devices
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [11/41] 9ac2f5dedef3d743ef621525eef222a3e09d63b3

The maximum supported store block length might be different depending
on whether the instruction is interpretively executed (firmware-reported
maximum) or handled via userspace intercept (host kernel API maximum).
Choose the best available value during group creation.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20220902172737.170349-8-mjrosato@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 9ee8f7e46a7d42ede69a4780200129bf1acb0d01)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/s390-pci-vfio.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
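
Note: the selection rule from the commit message, reduced to a standalone
sketch. struct group_cap is a simplified, hypothetical stand-in for the VFIO
zPCI group capability (only the three fields used here, with assumed widths);
pick_maxstbl is not a QEMU function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the capability data read from the host. */
struct group_cap {
    uint16_t version;  /* capability version reported by the host kernel */
    uint16_t maxstbl;  /* store block maximum for the intercept path */
    uint16_t imaxstbl; /* store block maximum for the interpreted path */
};

/*
 * Interpreted devices use the firmware-reported maximum, but only when the
 * capability is new enough (version 2 or later) to carry that field.
 * Everything else keeps the host kernel API maximum.
 */
static uint16_t pick_maxstbl(const struct group_cap *cap, bool interp)
{
    assert(cap != NULL);
    if (interp && cap->version >= 2) {
        return cap->imaxstbl;
    }
    return cap->maxstbl;
}
```

The version check matters because an older host kernel does not provide
imaxstbl at all, so reading it unconditionally would pick up an undefined
value.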

diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 338f436e875..2aefa508a07 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -213,7 +213,11 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
         resgrp->msia = cap->msi_addr;
         resgrp->mui = cap->mui;
         resgrp->i = cap->noi;
-        resgrp->maxstbl = cap->maxstbl;
+        if (pbdev->interp && hdr->version >= 2) {
+            resgrp->maxstbl = cap->imaxstbl;
+        } else {
+            resgrp->maxstbl = cap->maxstbl;
+        }
         resgrp->version = cap->version;
         resgrp->dtsm = ZPCI_DTSM;
     }
-- 
Gitee


From dc0e97cb2650dbb33786a2164b753639a433a227 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Mon, 7 Nov 2022 17:13:49 +0100
Subject: [PATCH 178/407] s390x/s390-virtio-ccw: Switch off zPCI enhancements
 on older machines
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [12/41] 61e32bab6d68ee9abd6a0751944af82e002b05b4

zPCI enhancement features (interpretation and forward assist) were
recently introduced to improve performance on PCI passthrough devices.
To maintain the same behaviour on older Z machines, deactivate the
features with the associated properties.

Signed-off-by: Cédric Le Goater <clg@redhat.com>
Message-Id: <20221107161349.1032730-3-clg@kaod.org>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit d3d1a406127f7da482eafbdc871c120c2770bb91)
[ clg: applied zPCI restrictions to rhel8.5.0 machine and below ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/s390-virtio-ccw.c | 6 ++++++
 1 file changed, 6 insertions(+)
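
Note: a rough standalone model of how a compat table such as the one added
here overrides a property default on an older machine type. This is not
QEMU's property machinery; struct global_prop and resolve_bool_prop are
hypothetical, and the "zpci" string is assumed to be the value of
TYPE_S390_PCI_DEVICE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct global_prop {
    const char *driver;
    const char *property;
    const char *value;
};

/* Mirrors the two entries added for the rhel8.5.0 machine in this patch. */
static const struct global_prop rhel850_compat[] = {
    { "zpci", "interpret", "off" },          /* "zpci" is an assumption */
    { "zpci", "forwarding-assist", "off" },
};

/* Start from the device default and let matching compat entries override it. */
static bool resolve_bool_prop(const struct global_prop *compat, size_t n,
                              const char *driver, const char *property,
                              bool dflt)
{
    bool value = dflt;

    assert(compat != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(compat[i].driver, driver) == 0 &&
            strcmp(compat[i].property, property) == 0) {
            value = strcmp(compat[i].value, "on") == 0;
        }
    }
    return value;
}
```

With the table above, both properties resolve to false on rhel8.5.0 even
though the device defaults are true; a machine type without these entries
keeps the defaults.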

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index bec270598bc..bd80e72cf82 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -1130,8 +1130,14 @@ static void ccw_machine_rhel850_instance_options(MachineState *machine)
 
 static void ccw_machine_rhel850_class_options(MachineClass *mc)
 {
+    static GlobalProperty compat[] = {
+        { TYPE_S390_PCI_DEVICE, "interpret", "off", },
+        { TYPE_S390_PCI_DEVICE, "forwarding-assist", "off", },
+    };
+
     ccw_machine_rhel860_class_options(mc);
     compat_props_add(mc->compat_props, hw_compat_rhel_8_5, hw_compat_rhel_8_5_len);
+    compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat));
     mc->smp_props.prefer_sockets = true;
 }
 DEFINE_CCW_MACHINE(rhel850, "rhel8.5.0", false);
-- 
Gitee


From bad8ecb7fb7b4c9450dd493514a9c7ef034f8c8c Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Wed, 30 Mar 2022 12:35:55 +0000
Subject: [PATCH 179/407] dump: Use ERRP_GUARD()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [13/41] f735cd1dab0230000cfadd878765fdf4647b239c

Let's move to the new way of handling errors before changing the dump
code. This patch has mostly been generated by the coccinelle script
scripts/coccinelle/errp-guard.cocci.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220330123603.107120-2-frankja@linux.ibm.com>
(cherry picked from commit 86a518bba4f4d7c9016fc5b104fe1e58b00ad756)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c | 144 ++++++++++++++++++++++------------------------------
 1 file changed, 61 insertions(+), 83 deletions(-)
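
Note: for readers unfamiliar with the pattern, here is a heavily simplified
standalone model of what the conversion relies on. It is not QEMU's Error
API: Error, error_set, write_page and write_pages are hypothetical, and the
real ERRP_GUARD() does more (it also covers &error_fatal and propagates
automatically at scope exit). The point shown is only that, once errp is
guaranteed to be dereferenceable, callees can be handed errp directly and
checked with *errp instead of going through a local_err variable.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Error {
    char msg[64];
} Error;

/* Store an error for the caller, if the caller asked for one. */
static void error_set(Error **errp, const char *msg)
{
    Error *e;

    if (errp == NULL) {
        return; /* caller ignores errors */
    }
    assert(*errp == NULL);
    e = calloc(1, sizeof(*e));
    assert(e != NULL);
    snprintf(e->msg, sizeof(e->msg), "%s", msg);
    *errp = e;
}

/* A callee that fails on one specific page. */
static void write_page(int page, int bad_page, Error **errp)
{
    if (page == bad_page) {
        error_set(errp, "write failed");
    }
}

/* Returns the number of pages written before the first failure. */
static int write_pages(int npages, int bad_page, Error **errp)
{
    Error *local = NULL;
    Error **caller_errp = errp;
    int written = 0;

    /* Roughly what the guard arranges: errp is never NULL below. */
    if (errp == NULL) {
        errp = &local;
    }

    for (int i = 0; i < npages; i++) {
        write_page(i, bad_page, errp);
        if (*errp) {
            break; /* safe to dereference even if the caller passed NULL */
        }
        written++;
    }

    if (caller_errp == NULL) {
        free(local); /* nobody wants the error, so drop it */
    }
    return written;
}
```

Without the substitution at the top of write_pages, the *errp check would
dereference NULL whenever a caller passes NULL, which is the reason the
pre-conversion code needed local_err plus error_propagate() at every call
site.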

diff --git a/dump/dump.c b/dump/dump.c
index 662d0a62cd9..9876123f2e7 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -390,23 +390,21 @@ static void write_data(DumpState *s, void *buf, int length, Error **errp)
 static void write_memory(DumpState *s, GuestPhysBlock *block, ram_addr_t start,
                          int64_t size, Error **errp)
 {
+    ERRP_GUARD();
     int64_t i;
-    Error *local_err = NULL;
 
     for (i = 0; i < size / s->dump_info.page_size; i++) {
         write_data(s, block->host_addr + start + i * s->dump_info.page_size,
-                   s->dump_info.page_size, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
+                   s->dump_info.page_size, errp);
+        if (*errp) {
             return;
         }
     }
 
     if ((size % s->dump_info.page_size) != 0) {
         write_data(s, block->host_addr + start + i * s->dump_info.page_size,
-                   size % s->dump_info.page_size, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
+                   size % s->dump_info.page_size, errp);
+        if (*errp) {
             return;
         }
     }
@@ -476,11 +474,11 @@ static void get_offset_range(hwaddr phys_addr,
 
 static void write_elf_loads(DumpState *s, Error **errp)
 {
+    ERRP_GUARD();
     hwaddr offset, filesz;
     MemoryMapping *memory_mapping;
     uint32_t phdr_index = 1;
     uint32_t max_index;
-    Error *local_err = NULL;
 
     if (s->have_section) {
         max_index = s->sh_info;
@@ -494,14 +492,13 @@ static void write_elf_loads(DumpState *s, Error **errp)
                          s, &offset, &filesz);
         if (s->dump_info.d_class == ELFCLASS64) {
             write_elf64_load(s, memory_mapping, phdr_index++, offset,
-                             filesz, &local_err);
+                             filesz, errp);
         } else {
             write_elf32_load(s, memory_mapping, phdr_index++, offset,
-                             filesz, &local_err);
+                             filesz, errp);
         }
 
-        if (local_err) {
-            error_propagate(errp, local_err);
+        if (*errp) {
             return;
         }
 
@@ -514,7 +511,7 @@ static void write_elf_loads(DumpState *s, Error **errp)
 /* write elf header, PT_NOTE and elf note to vmcore. */
 static void dump_begin(DumpState *s, Error **errp)
 {
-    Error *local_err = NULL;
+    ERRP_GUARD();
 
     /*
      * the vmcore's format is:
@@ -542,73 +539,64 @@ static void dump_begin(DumpState *s, Error **errp)
 
     /* write elf header to vmcore */
     if (s->dump_info.d_class == ELFCLASS64) {
-        write_elf64_header(s, &local_err);
+        write_elf64_header(s, errp);
     } else {
-        write_elf32_header(s, &local_err);
+        write_elf32_header(s, errp);
     }
-    if (local_err) {
-        error_propagate(errp, local_err);
+    if (*errp) {
         return;
     }
 
     if (s->dump_info.d_class == ELFCLASS64) {
         /* write PT_NOTE to vmcore */
-        write_elf64_note(s, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
+        write_elf64_note(s, errp);
+        if (*errp) {
             return;
         }
 
         /* write all PT_LOAD to vmcore */
-        write_elf_loads(s, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
+        write_elf_loads(s, errp);
+        if (*errp) {
             return;
         }
 
         /* write section to vmcore */
         if (s->have_section) {
-            write_elf_section(s, 1, &local_err);
-            if (local_err) {
-                error_propagate(errp, local_err);
+            write_elf_section(s, 1, errp);
+            if (*errp) {
                 return;
             }
         }
 
         /* write notes to vmcore */
-        write_elf64_notes(fd_write_vmcore, s, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
+        write_elf64_notes(fd_write_vmcore, s, errp);
+        if (*errp) {
             return;
         }
     } else {
         /* write PT_NOTE to vmcore */
-        write_elf32_note(s, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
+        write_elf32_note(s, errp);
+        if (*errp) {
             return;
         }
 
         /* write all PT_LOAD to vmcore */
-        write_elf_loads(s, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
+        write_elf_loads(s, errp);
+        if (*errp) {
             return;
         }
 
         /* write section to vmcore */
         if (s->have_section) {
-            write_elf_section(s, 0, &local_err);
-            if (local_err) {
-                error_propagate(errp, local_err);
+            write_elf_section(s, 0, errp);
+            if (*errp) {
                 return;
             }
         }
 
         /* write notes to vmcore */
-        write_elf32_notes(fd_write_vmcore, s, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
+        write_elf32_notes(fd_write_vmcore, s, errp);
+        if (*errp) {
             return;
         }
     }
@@ -644,9 +632,9 @@ static int get_next_block(DumpState *s, GuestPhysBlock *block)
 /* write all memory to vmcore */
 static void dump_iterate(DumpState *s, Error **errp)
 {
+    ERRP_GUARD();
     GuestPhysBlock *block;
     int64_t size;
-    Error *local_err = NULL;
 
     do {
         block = s->next_block;
@@ -658,9 +646,8 @@ static void dump_iterate(DumpState *s, Error **errp)
                 size -= block->target_end - (s->begin + s->length);
             }
         }
-        write_memory(s, block, s->start, size, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
+        write_memory(s, block, s->start, size, errp);
+        if (*errp) {
             return;
         }
 
@@ -669,11 +656,10 @@ static void dump_iterate(DumpState *s, Error **errp)
 
 static void create_vmcore(DumpState *s, Error **errp)
 {
-    Error *local_err = NULL;
+    ERRP_GUARD();
 
-    dump_begin(s, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
+    dump_begin(s, errp);
+    if (*errp) {
         return;
     }
 
@@ -810,6 +796,7 @@ static bool note_name_equal(DumpState *s,
 /* write common header, sub header and elf note to vmcore */
 static void create_header32(DumpState *s, Error **errp)
 {
+    ERRP_GUARD();
     DiskDumpHeader32 *dh = NULL;
     KdumpSubHeader32 *kh = NULL;
     size_t size;
@@ -818,7 +805,6 @@ static void create_header32(DumpState *s, Error **errp)
     uint32_t bitmap_blocks;
     uint32_t status = 0;
     uint64_t offset_note;
-    Error *local_err = NULL;
 
     /* write common header, the version of kdump-compressed format is 6th */
     size = sizeof(DiskDumpHeader32);
@@ -894,9 +880,8 @@ static void create_header32(DumpState *s, Error **errp)
     s->note_buf_offset = 0;
 
     /* use s->note_buf to store notes temporarily */
-    write_elf32_notes(buf_write_note, s, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
+    write_elf32_notes(buf_write_note, s, errp);
+    if (*errp) {
         goto out;
     }
     if (write_buffer(s->fd, offset_note, s->note_buf,
@@ -922,6 +907,7 @@ out:
 /* write common header, sub header and elf note to vmcore */
 static void create_header64(DumpState *s, Error **errp)
 {
+    ERRP_GUARD();
     DiskDumpHeader64 *dh = NULL;
     KdumpSubHeader64 *kh = NULL;
     size_t size;
@@ -930,7 +916,6 @@ static void create_header64(DumpState *s, Error **errp)
     uint32_t bitmap_blocks;
     uint32_t status = 0;
     uint64_t offset_note;
-    Error *local_err = NULL;
 
     /* write common header, the version of kdump-compressed format is 6th */
     size = sizeof(DiskDumpHeader64);
@@ -1006,9 +991,8 @@ static void create_header64(DumpState *s, Error **errp)
     s->note_buf_offset = 0;
 
     /* use s->note_buf to store notes temporarily */
-    write_elf64_notes(buf_write_note, s, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
+    write_elf64_notes(buf_write_note, s, errp);
+    if (*errp) {
         goto out;
     }
 
@@ -1472,8 +1456,8 @@ out:
 
 static void create_kdump_vmcore(DumpState *s, Error **errp)
 {
+    ERRP_GUARD();
     int ret;
-    Error *local_err = NULL;
 
     /*
      * the kdump-compressed format is:
@@ -1503,21 +1487,18 @@ static void create_kdump_vmcore(DumpState *s, Error **errp)
         return;
     }
 
-    write_dump_header(s, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
+    write_dump_header(s, errp);
+    if (*errp) {
         return;
     }
 
-    write_dump_bitmap(s, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
+    write_dump_bitmap(s, errp);
+    if (*errp) {
         return;
     }
 
-    write_dump_pages(s, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
+    write_dump_pages(s, errp);
+    if (*errp) {
         return;
     }
 
@@ -1647,10 +1628,10 @@ static void dump_init(DumpState *s, int fd, bool has_format,
                       DumpGuestMemoryFormat format, bool paging, bool has_filter,
                       int64_t begin, int64_t length, Error **errp)
 {
+    ERRP_GUARD();
     VMCoreInfoState *vmci = vmcoreinfo_find();
     CPUState *cpu;
     int nr_cpus;
-    Error *err = NULL;
     int ret;
 
     s->has_format = has_format;
@@ -1769,9 +1750,8 @@ static void dump_init(DumpState *s, int fd, bool has_format,
 
     /* get memory mapping */
     if (paging) {
-        qemu_get_guest_memory_mapping(&s->list, &s->guest_phys_blocks, &err);
-        if (err != NULL) {
-            error_propagate(errp, err);
+        qemu_get_guest_memory_mapping(&s->list, &s->guest_phys_blocks, errp);
+        if (*errp) {
             goto cleanup;
         }
     } else {
@@ -1870,33 +1850,32 @@ cleanup:
 /* this operation might be time consuming. */
 static void dump_process(DumpState *s, Error **errp)
 {
-    Error *local_err = NULL;
+    ERRP_GUARD();
     DumpQueryResult *result = NULL;
 
     if (s->has_format && s->format == DUMP_GUEST_MEMORY_FORMAT_WIN_DMP) {
 #ifdef TARGET_X86_64
-        create_win_dump(s, &local_err);
+        create_win_dump(s, errp);
 #endif
     } else if (s->has_format && s->format != DUMP_GUEST_MEMORY_FORMAT_ELF) {
-        create_kdump_vmcore(s, &local_err);
+        create_kdump_vmcore(s, errp);
     } else {
-        create_vmcore(s, &local_err);
+        create_vmcore(s, errp);
     }
 
     /* make sure status is written after written_size updates */
     smp_wmb();
     qatomic_set(&s->status,
-               (local_err ? DUMP_STATUS_FAILED : DUMP_STATUS_COMPLETED));
+               (*errp ? DUMP_STATUS_FAILED : DUMP_STATUS_COMPLETED));
 
     /* send DUMP_COMPLETED message (unconditionally) */
     result = qmp_query_dump(NULL);
     /* should never fail */
     assert(result);
-    qapi_event_send_dump_completed(result, !!local_err, (local_err ?
-                                   error_get_pretty(local_err) : NULL));
+    qapi_event_send_dump_completed(result, !!*errp, (*errp ?
+                                                     error_get_pretty(*errp) : NULL));
     qapi_free_DumpQueryResult(result);
 
-    error_propagate(errp, local_err);
     dump_cleanup(s);
 }
 
@@ -1925,10 +1904,10 @@ void qmp_dump_guest_memory(bool paging, const char *file,
                            int64_t length, bool has_format,
                            DumpGuestMemoryFormat format, Error **errp)
 {
+    ERRP_GUARD();
     const char *p;
     int fd = -1;
     DumpState *s;
-    Error *local_err = NULL;
     bool detach_p = false;
 
     if (runstate_check(RUN_STATE_INMIGRATE)) {
@@ -2028,9 +2007,8 @@ void qmp_dump_guest_memory(bool paging, const char *file,
     dump_state_prepare(s);
 
     dump_init(s, fd, has_format, format, paging, has_begin,
-              begin, length, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
+              begin, length, errp);
+    if (*errp) {
         qatomic_set(&s->status, DUMP_STATUS_FAILED);
         return;
     }
-- 
Gitee


From 6f8d29003dfef1cf8836f8498f54ab09f4dd6dc0 Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Thu, 7 Apr 2022 09:48:24 +0000
Subject: [PATCH 180/407] dump: Remove the sh_info variable
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [14/41] 24af12b78c8f5a02cf85df2f6b1d64249f9499c9

There's no need to have phdr_num and sh_info at the same time. We can
make phdr_num 32 bit and set PN_XNUM when we write the header if
phdr_num >= PN_XNUM.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220407094824.5074-1-frankja@linux.ibm.com>
(cherry picked from commit 046bc4160bc780eaacc2d702a2589f1a7a01188d)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c           | 44 +++++++++++++++++++++++--------------------
 include/sysemu/dump.h |  3 +--
 2 files changed, 25 insertions(+), 22 deletions(-)
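
Note: the clamping rule from the commit message as a standalone sketch.
ehdr_phnum and needs_xnum_section are hypothetical helper names; PN_XNUM is
the ELF extended-numbering marker (0xffff) and is defined locally only in
case the including environment does not already provide it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef PN_XNUM
#define PN_XNUM 0xffff
#endif

/*
 * e_phnum in the ELF header is 16 bit wide. With a 32-bit segment count,
 * the header field holds either the real count or the PN_XNUM marker.
 */
static uint16_t ehdr_phnum(uint32_t phdr_num)
{
    return phdr_num < PN_XNUM ? (uint16_t)phdr_num : (uint16_t)PN_XNUM;
}

/*
 * When the marker is used, the real count has to be stored elsewhere
 * (in sh_info of a section header, per the patch).
 */
static bool needs_xnum_section(uint32_t phdr_num)
{
    return phdr_num >= PN_XNUM;
}
```

Keeping one 32-bit counter and deriving the header value at write time is
what lets the patch drop the separate sh_info field from DumpState.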

diff --git a/dump/dump.c b/dump/dump.c
index 9876123f2e7..7236b167cc5 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -124,6 +124,12 @@ static int fd_write_vmcore(const void *buf, size_t size, void *opaque)
 
 static void write_elf64_header(DumpState *s, Error **errp)
 {
+    /*
+     * phnum in the elf header is 16 bit, if we have more segments we
+     * set phnum to PN_XNUM and write the real number of segments to a
+     * special section.
+     */
+    uint16_t phnum = MIN(s->phdr_num, PN_XNUM);
     Elf64_Ehdr elf_header;
     int ret;
 
@@ -138,9 +144,9 @@ static void write_elf64_header(DumpState *s, Error **errp)
     elf_header.e_ehsize = cpu_to_dump16(s, sizeof(elf_header));
     elf_header.e_phoff = cpu_to_dump64(s, sizeof(Elf64_Ehdr));
     elf_header.e_phentsize = cpu_to_dump16(s, sizeof(Elf64_Phdr));
-    elf_header.e_phnum = cpu_to_dump16(s, s->phdr_num);
+    elf_header.e_phnum = cpu_to_dump16(s, phnum);
     if (s->have_section) {
-        uint64_t shoff = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr) * s->sh_info;
+        uint64_t shoff = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr) * s->phdr_num;
 
         elf_header.e_shoff = cpu_to_dump64(s, shoff);
         elf_header.e_shentsize = cpu_to_dump16(s, sizeof(Elf64_Shdr));
@@ -155,6 +161,12 @@ static void write_elf64_header(DumpState *s, Error **errp)
 
 static void write_elf32_header(DumpState *s, Error **errp)
 {
+    /*
+     * phnum in the elf header is 16 bit, if we have more segments we
+     * set phnum to PN_XNUM and write the real number of segments to a
+     * special section.
+     */
+    uint16_t phnum = MIN(s->phdr_num, PN_XNUM);
     Elf32_Ehdr elf_header;
     int ret;
 
@@ -169,9 +181,9 @@ static void write_elf32_header(DumpState *s, Error **errp)
     elf_header.e_ehsize = cpu_to_dump16(s, sizeof(elf_header));
     elf_header.e_phoff = cpu_to_dump32(s, sizeof(Elf32_Ehdr));
     elf_header.e_phentsize = cpu_to_dump16(s, sizeof(Elf32_Phdr));
-    elf_header.e_phnum = cpu_to_dump16(s, s->phdr_num);
+    elf_header.e_phnum = cpu_to_dump16(s, phnum);
     if (s->have_section) {
-        uint32_t shoff = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr) * s->sh_info;
+        uint32_t shoff = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr) * s->phdr_num;
 
         elf_header.e_shoff = cpu_to_dump32(s, shoff);
         elf_header.e_shentsize = cpu_to_dump16(s, sizeof(Elf32_Shdr));
@@ -358,12 +370,12 @@ static void write_elf_section(DumpState *s, int type, Error **errp)
     if (type == 0) {
         shdr_size = sizeof(Elf32_Shdr);
         memset(&shdr32, 0, shdr_size);
-        shdr32.sh_info = cpu_to_dump32(s, s->sh_info);
+        shdr32.sh_info = cpu_to_dump32(s, s->phdr_num);
         shdr = &shdr32;
     } else {
         shdr_size = sizeof(Elf64_Shdr);
         memset(&shdr64, 0, shdr_size);
-        shdr64.sh_info = cpu_to_dump32(s, s->sh_info);
+        shdr64.sh_info = cpu_to_dump32(s, s->phdr_num);
         shdr = &shdr64;
     }
 
@@ -478,13 +490,6 @@ static void write_elf_loads(DumpState *s, Error **errp)
     hwaddr offset, filesz;
     MemoryMapping *memory_mapping;
     uint32_t phdr_index = 1;
-    uint32_t max_index;
-
-    if (s->have_section) {
-        max_index = s->sh_info;
-    } else {
-        max_index = s->phdr_num;
-    }
 
     QTAILQ_FOREACH(memory_mapping, &s->list.head, next) {
         get_offset_range(memory_mapping->phys_addr,
@@ -502,7 +507,7 @@ static void write_elf_loads(DumpState *s, Error **errp)
             return;
         }
 
-        if (phdr_index >= max_index) {
+        if (phdr_index >= s->phdr_num) {
             break;
         }
     }
@@ -1809,22 +1814,21 @@ static void dump_init(DumpState *s, int fd, bool has_format,
         s->phdr_num += s->list.num;
         s->have_section = false;
     } else {
+        /* sh_info of section 0 holds the real number of phdrs */
         s->have_section = true;
-        s->phdr_num = PN_XNUM;
-        s->sh_info = 1; /* PT_NOTE */
 
         /* the type of shdr->sh_info is uint32_t, so we should avoid overflow */
         if (s->list.num <= UINT32_MAX - 1) {
-            s->sh_info += s->list.num;
+            s->phdr_num += s->list.num;
         } else {
-            s->sh_info = UINT32_MAX;
+            s->phdr_num = UINT32_MAX;
         }
     }
 
     if (s->dump_info.d_class == ELFCLASS64) {
         if (s->have_section) {
             s->memory_offset = sizeof(Elf64_Ehdr) +
-                               sizeof(Elf64_Phdr) * s->sh_info +
+                               sizeof(Elf64_Phdr) * s->phdr_num +
                                sizeof(Elf64_Shdr) + s->note_size;
         } else {
             s->memory_offset = sizeof(Elf64_Ehdr) +
@@ -1833,7 +1837,7 @@ static void dump_init(DumpState *s, int fd, bool has_format,
     } else {
         if (s->have_section) {
             s->memory_offset = sizeof(Elf32_Ehdr) +
-                               sizeof(Elf32_Phdr) * s->sh_info +
+                               sizeof(Elf32_Phdr) * s->phdr_num +
                                sizeof(Elf32_Shdr) + s->note_size;
         } else {
             s->memory_offset = sizeof(Elf32_Ehdr) +
diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index 250143cb5a7..b463fc9c022 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -154,8 +154,7 @@ typedef struct DumpState {
     GuestPhysBlockList guest_phys_blocks;
     ArchDumpInfo dump_info;
     MemoryMappingList list;
-    uint16_t phdr_num;
-    uint32_t sh_info;
+    uint32_t phdr_num;
     bool have_section;
     bool resume;
     bool detached;
-- 
Gitee


From f52b5b30b064b7e900b55c94657658702de0a550 Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Wed, 30 Mar 2022 12:35:57 +0000
Subject: [PATCH 181/407] dump: Introduce shdr_num to decrease complexity
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [15/41] b0215ea5d381ef7f6abfe3f3bafea51ce933da56

Let's move from a boolean to an integer variable, which will later
enable us to store the number of sections that are in the dump file.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220330123603.107120-4-frankja@linux.ibm.com>
(cherry picked from commit 862a395858e5a302ed5921487777acdc95a3a31b)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c           | 24 ++++++++++++------------
 include/sysemu/dump.h |  2 +-
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 7236b167cc5..972e28b0892 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -145,12 +145,12 @@ static void write_elf64_header(DumpState *s, Error **errp)
     elf_header.e_phoff = cpu_to_dump64(s, sizeof(Elf64_Ehdr));
     elf_header.e_phentsize = cpu_to_dump16(s, sizeof(Elf64_Phdr));
     elf_header.e_phnum = cpu_to_dump16(s, phnum);
-    if (s->have_section) {
+    if (s->shdr_num) {
         uint64_t shoff = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr) * s->phdr_num;
 
         elf_header.e_shoff = cpu_to_dump64(s, shoff);
         elf_header.e_shentsize = cpu_to_dump16(s, sizeof(Elf64_Shdr));
-        elf_header.e_shnum = cpu_to_dump16(s, 1);
+        elf_header.e_shnum = cpu_to_dump16(s, s->shdr_num);
     }
 
     ret = fd_write_vmcore(&elf_header, sizeof(elf_header), s);
@@ -182,12 +182,12 @@ static void write_elf32_header(DumpState *s, Error **errp)
     elf_header.e_phoff = cpu_to_dump32(s, sizeof(Elf32_Ehdr));
     elf_header.e_phentsize = cpu_to_dump16(s, sizeof(Elf32_Phdr));
     elf_header.e_phnum = cpu_to_dump16(s, phnum);
-    if (s->have_section) {
+    if (s->shdr_num) {
         uint32_t shoff = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr) * s->phdr_num;
 
         elf_header.e_shoff = cpu_to_dump32(s, shoff);
         elf_header.e_shentsize = cpu_to_dump16(s, sizeof(Elf32_Shdr));
-        elf_header.e_shnum = cpu_to_dump16(s, 1);
+        elf_header.e_shnum = cpu_to_dump16(s, s->shdr_num);
     }
 
     ret = fd_write_vmcore(&elf_header, sizeof(elf_header), s);
@@ -566,7 +566,7 @@ static void dump_begin(DumpState *s, Error **errp)
         }
 
         /* write section to vmcore */
-        if (s->have_section) {
+        if (s->shdr_num) {
             write_elf_section(s, 1, errp);
             if (*errp) {
                 return;
@@ -592,7 +592,7 @@ static void dump_begin(DumpState *s, Error **errp)
         }
 
         /* write section to vmcore */
-        if (s->have_section) {
+        if (s->shdr_num) {
             write_elf_section(s, 0, errp);
             if (*errp) {
                 return;
@@ -1811,11 +1811,11 @@ static void dump_init(DumpState *s, int fd, bool has_format,
      */
     s->phdr_num = 1; /* PT_NOTE */
     if (s->list.num < UINT16_MAX - 2) {
+        s->shdr_num = 0;
         s->phdr_num += s->list.num;
-        s->have_section = false;
     } else {
         /* sh_info of section 0 holds the real number of phdrs */
-        s->have_section = true;
+        s->shdr_num = 1;
 
         /* the type of shdr->sh_info is uint32_t, so we should avoid overflow */
         if (s->list.num <= UINT32_MAX - 1) {
@@ -1826,19 +1826,19 @@ static void dump_init(DumpState *s, int fd, bool has_format,
     }
 
     if (s->dump_info.d_class == ELFCLASS64) {
-        if (s->have_section) {
+        if (s->shdr_num) {
             s->memory_offset = sizeof(Elf64_Ehdr) +
                                sizeof(Elf64_Phdr) * s->phdr_num +
-                               sizeof(Elf64_Shdr) + s->note_size;
+                               sizeof(Elf64_Shdr) * s->shdr_num + s->note_size;
         } else {
             s->memory_offset = sizeof(Elf64_Ehdr) +
                                sizeof(Elf64_Phdr) * s->phdr_num + s->note_size;
         }
     } else {
-        if (s->have_section) {
+        if (s->shdr_num) {
             s->memory_offset = sizeof(Elf32_Ehdr) +
                                sizeof(Elf32_Phdr) * s->phdr_num +
-                               sizeof(Elf32_Shdr) + s->note_size;
+                               sizeof(Elf32_Shdr) * s->shdr_num + s->note_size;
         } else {
             s->memory_offset = sizeof(Elf32_Ehdr) +
                                sizeof(Elf32_Phdr) * s->phdr_num + s->note_size;
diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index b463fc9c022..19458bffbd1 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -155,7 +155,7 @@ typedef struct DumpState {
     ArchDumpInfo dump_info;
     MemoryMappingList list;
     uint32_t phdr_num;
-    bool have_section;
+    uint32_t shdr_num;
     bool resume;
     bool detached;
     ssize_t note_size;
-- 
Gitee


From 10a3a8592d1ba5c9d303e57ef3ad5779bd98304e Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Wed, 30 Mar 2022 12:35:58 +0000
Subject: [PATCH 182/407] dump: Remove the section if when calculating the
 memory offset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [16/41] ff214d2c23b9cb16fd49d22d976829267df43133

When s->shdr_num is 0, we add 0 bytes of section headers, which is
equivalent to not adding section headers. With the multiplication in
place, we can remove an if/else.
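
The equivalence can be checked with a small standalone sketch. The
function names are hypothetical, and the ELF64 structure sizes are
hard-coded as an assumption so the snippet does not need <elf.h>:

```c
#include <stdint.h>

/* ELF64 structure sizes, hard-coded for illustration only. */
enum { EHDR_SIZE = 64, PHDR_SIZE = 56, SHDR_SIZE = 64 };

/* Old shape: branch on whether any section header is present. */
static inline uint64_t memory_offset_branchy(uint32_t phdr_num,
                                             uint32_t shdr_num,
                                             uint64_t note_size)
{
    if (shdr_num) {
        return EHDR_SIZE + (uint64_t)PHDR_SIZE * phdr_num +
               (uint64_t)SHDR_SIZE * shdr_num + note_size;
    }
    return EHDR_SIZE + (uint64_t)PHDR_SIZE * phdr_num + note_size;
}

/* New shape: a zero shdr_num simply contributes zero bytes. */
static inline uint64_t memory_offset_flat(uint32_t phdr_num,
                                          uint32_t shdr_num,
                                          uint64_t note_size)
{
    return EHDR_SIZE + (uint64_t)PHDR_SIZE * phdr_num +
           (uint64_t)SHDR_SIZE * shdr_num + note_size;
}
```

Both functions return the same value for every input, so the branch
carries no information.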

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220330123603.107120-5-frankja@linux.ibm.com>
(cherry picked from commit 344107e07bd81546474a54ab83800158ca953059)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 972e28b0892..5cc2322325b 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -1826,23 +1826,15 @@ static void dump_init(DumpState *s, int fd, bool has_format,
     }
 
     if (s->dump_info.d_class == ELFCLASS64) {
-        if (s->shdr_num) {
-            s->memory_offset = sizeof(Elf64_Ehdr) +
-                               sizeof(Elf64_Phdr) * s->phdr_num +
-                               sizeof(Elf64_Shdr) * s->shdr_num + s->note_size;
-        } else {
-            s->memory_offset = sizeof(Elf64_Ehdr) +
-                               sizeof(Elf64_Phdr) * s->phdr_num + s->note_size;
-        }
+        s->memory_offset = sizeof(Elf64_Ehdr) +
+                           sizeof(Elf64_Phdr) * s->phdr_num +
+                           sizeof(Elf64_Shdr) * s->shdr_num +
+                           s->note_size;
     } else {
-        if (s->shdr_num) {
-            s->memory_offset = sizeof(Elf32_Ehdr) +
-                               sizeof(Elf32_Phdr) * s->phdr_num +
-                               sizeof(Elf32_Shdr) * s->shdr_num + s->note_size;
-        } else {
-            s->memory_offset = sizeof(Elf32_Ehdr) +
-                               sizeof(Elf32_Phdr) * s->phdr_num + s->note_size;
-        }
+        s->memory_offset = sizeof(Elf32_Ehdr) +
+                           sizeof(Elf32_Phdr) * s->phdr_num +
+                           sizeof(Elf32_Shdr) * s->shdr_num +
+                           s->note_size;
     }
 
     return;
-- 
Gitee


From 31edd619673e2345f8740cb1cc77a212d1d5b126 Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Wed, 30 Mar 2022 12:35:59 +0000
Subject: [PATCH 183/407] dump: Add more offset variables
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [17/41] fbe629e1476e8a0e039f989af6e1f4707075ba01

Offset calculations are easy enough to get wrong. Let's add a few
variables to make moving around elf headers and data sections easier.
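
A minimal sketch of the resulting layout computation, assuming the file
is laid out as ELF header, program headers, section headers, notes and
then memory. struct dump_layout and compute_layout() are hypothetical
names for illustration, and the ELF64 structure sizes are hard-coded:

```c
#include <stdint.h>

/* ELF64 structure sizes, hard-coded so the sketch needs no <elf.h>. */
enum { EHDR_SIZE = 64, PHDR_SIZE = 56, SHDR_SIZE = 64 };

/* Hypothetical stand-in for the offset fields kept in the dump state. */
struct dump_layout {
    uint64_t phdr_offset;
    uint64_t shdr_offset;
    uint64_t note_offset;
    uint64_t memory_offset;
};

/* Each offset is derived from the previous one, so no term is repeated. */
static inline struct dump_layout compute_layout(uint32_t phdr_num,
                                                uint32_t shdr_num,
                                                uint64_t note_size)
{
    struct dump_layout l;

    l.phdr_offset = EHDR_SIZE;
    l.shdr_offset = l.phdr_offset + (uint64_t)PHDR_SIZE * phdr_num;
    l.note_offset = l.shdr_offset + (uint64_t)SHDR_SIZE * shdr_num;
    l.memory_offset = l.note_offset + note_size;
    return l;
}
```

Chaining the offsets means that inserting a new region later only
changes the one line that follows it.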

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220330123603.107120-6-frankja@linux.ibm.com>
(cherry picked from commit e71d353360bb09a8e784e35d78370c691f6ea185)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c           | 35 +++++++++++++++--------------------
 include/sysemu/dump.h |  4 ++++
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 5cc2322325b..85a402b38c2 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -142,13 +142,11 @@ static void write_elf64_header(DumpState *s, Error **errp)
     elf_header.e_machine = cpu_to_dump16(s, s->dump_info.d_machine);
     elf_header.e_version = cpu_to_dump32(s, EV_CURRENT);
     elf_header.e_ehsize = cpu_to_dump16(s, sizeof(elf_header));
-    elf_header.e_phoff = cpu_to_dump64(s, sizeof(Elf64_Ehdr));
+    elf_header.e_phoff = cpu_to_dump64(s, s->phdr_offset);
     elf_header.e_phentsize = cpu_to_dump16(s, sizeof(Elf64_Phdr));
     elf_header.e_phnum = cpu_to_dump16(s, phnum);
     if (s->shdr_num) {
-        uint64_t shoff = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr) * s->phdr_num;
-
-        elf_header.e_shoff = cpu_to_dump64(s, shoff);
+        elf_header.e_shoff = cpu_to_dump64(s, s->shdr_offset);
         elf_header.e_shentsize = cpu_to_dump16(s, sizeof(Elf64_Shdr));
         elf_header.e_shnum = cpu_to_dump16(s, s->shdr_num);
     }
@@ -179,13 +177,11 @@ static void write_elf32_header(DumpState *s, Error **errp)
     elf_header.e_machine = cpu_to_dump16(s, s->dump_info.d_machine);
     elf_header.e_version = cpu_to_dump32(s, EV_CURRENT);
     elf_header.e_ehsize = cpu_to_dump16(s, sizeof(elf_header));
-    elf_header.e_phoff = cpu_to_dump32(s, sizeof(Elf32_Ehdr));
+    elf_header.e_phoff = cpu_to_dump32(s, s->phdr_offset);
     elf_header.e_phentsize = cpu_to_dump16(s, sizeof(Elf32_Phdr));
     elf_header.e_phnum = cpu_to_dump16(s, phnum);
     if (s->shdr_num) {
-        uint32_t shoff = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr) * s->phdr_num;
-
-        elf_header.e_shoff = cpu_to_dump32(s, shoff);
+        elf_header.e_shoff = cpu_to_dump32(s, s->shdr_offset);
         elf_header.e_shentsize = cpu_to_dump16(s, sizeof(Elf32_Shdr));
         elf_header.e_shnum = cpu_to_dump16(s, s->shdr_num);
     }
@@ -248,12 +244,11 @@ static void write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
 static void write_elf64_note(DumpState *s, Error **errp)
 {
     Elf64_Phdr phdr;
-    hwaddr begin = s->memory_offset - s->note_size;
     int ret;
 
     memset(&phdr, 0, sizeof(Elf64_Phdr));
     phdr.p_type = cpu_to_dump32(s, PT_NOTE);
-    phdr.p_offset = cpu_to_dump64(s, begin);
+    phdr.p_offset = cpu_to_dump64(s, s->note_offset);
     phdr.p_paddr = 0;
     phdr.p_filesz = cpu_to_dump64(s, s->note_size);
     phdr.p_memsz = cpu_to_dump64(s, s->note_size);
@@ -313,13 +308,12 @@ static void write_elf64_notes(WriteCoreDumpFunction f, DumpState *s,
 
 static void write_elf32_note(DumpState *s, Error **errp)
 {
-    hwaddr begin = s->memory_offset - s->note_size;
     Elf32_Phdr phdr;
     int ret;
 
     memset(&phdr, 0, sizeof(Elf32_Phdr));
     phdr.p_type = cpu_to_dump32(s, PT_NOTE);
-    phdr.p_offset = cpu_to_dump32(s, begin);
+    phdr.p_offset = cpu_to_dump32(s, s->note_offset);
     phdr.p_paddr = 0;
     phdr.p_filesz = cpu_to_dump32(s, s->note_size);
     phdr.p_memsz = cpu_to_dump32(s, s->note_size);
@@ -1826,15 +1820,16 @@ static void dump_init(DumpState *s, int fd, bool has_format,
     }
 
     if (s->dump_info.d_class == ELFCLASS64) {
-        s->memory_offset = sizeof(Elf64_Ehdr) +
-                           sizeof(Elf64_Phdr) * s->phdr_num +
-                           sizeof(Elf64_Shdr) * s->shdr_num +
-                           s->note_size;
+        s->phdr_offset = sizeof(Elf64_Ehdr);
+        s->shdr_offset = s->phdr_offset + sizeof(Elf64_Phdr) * s->phdr_num;
+        s->note_offset = s->shdr_offset + sizeof(Elf64_Shdr) * s->shdr_num;
+        s->memory_offset = s->note_offset + s->note_size;
     } else {
-        s->memory_offset = sizeof(Elf32_Ehdr) +
-                           sizeof(Elf32_Phdr) * s->phdr_num +
-                           sizeof(Elf32_Shdr) * s->shdr_num +
-                           s->note_size;
+
+        s->phdr_offset = sizeof(Elf32_Ehdr);
+        s->shdr_offset = s->phdr_offset + sizeof(Elf32_Phdr) * s->phdr_num;
+        s->note_offset = s->shdr_offset + sizeof(Elf32_Shdr) * s->shdr_num;
+        s->memory_offset = s->note_offset + s->note_size;
     }
 
     return;
diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index 19458bffbd1..ffc2ea1072f 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -159,6 +159,10 @@ typedef struct DumpState {
     bool resume;
     bool detached;
     ssize_t note_size;
+    hwaddr shdr_offset;
+    hwaddr phdr_offset;
+    hwaddr section_offset;
+    hwaddr note_offset;
     hwaddr memory_offset;
     int fd;
 
-- 
Gitee


From b150ad5f1b31e2136134f6133e342b7c6a3111db Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Wed, 30 Mar 2022 12:36:00 +0000
Subject: [PATCH 184/407] dump: Introduce dump_is_64bit() helper function
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [18/41] a0fd2d1985c61b8e50d4a7ca26bc0ee6fcaa6196

Checking d_class in dump_info leads to lengthy conditionals so let's
shorten things a bit by introducing a helper function.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220330123603.107120-7-frankja@linux.ibm.com>
(cherry picked from commit 05bbaa5040ccb3419e8b93af8040485430e2db42)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 85a402b38c2..6394e940238 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -55,6 +55,11 @@ static Error *dump_migration_blocker;
       DIV_ROUND_UP((name_size), 4) +                    \
       DIV_ROUND_UP((desc_size), 4)) * 4)
 
+static inline bool dump_is_64bit(DumpState *s)
+{
+    return s->dump_info.d_class == ELFCLASS64;
+}
+
 uint16_t cpu_to_dump16(DumpState *s, uint16_t val)
 {
     if (s->dump_info.d_endian == ELFDATA2LSB) {
@@ -489,7 +494,7 @@ static void write_elf_loads(DumpState *s, Error **errp)
         get_offset_range(memory_mapping->phys_addr,
                          memory_mapping->length,
                          s, &offset, &filesz);
-        if (s->dump_info.d_class == ELFCLASS64) {
+        if (dump_is_64bit(s)) {
             write_elf64_load(s, memory_mapping, phdr_index++, offset,
                              filesz, errp);
         } else {
@@ -537,7 +542,7 @@ static void dump_begin(DumpState *s, Error **errp)
      */
 
     /* write elf header to vmcore */
-    if (s->dump_info.d_class == ELFCLASS64) {
+    if (dump_is_64bit(s)) {
         write_elf64_header(s, errp);
     } else {
         write_elf32_header(s, errp);
@@ -546,7 +551,7 @@ static void dump_begin(DumpState *s, Error **errp)
         return;
     }
 
-    if (s->dump_info.d_class == ELFCLASS64) {
+    if (dump_is_64bit(s)) {
         /* write PT_NOTE to vmcore */
         write_elf64_note(s, errp);
         if (*errp) {
@@ -757,7 +762,7 @@ static void get_note_sizes(DumpState *s, const void *note,
     uint64_t name_sz;
     uint64_t desc_sz;
 
-    if (s->dump_info.d_class == ELFCLASS64) {
+    if (dump_is_64bit(s)) {
         const Elf64_Nhdr *hdr = note;
         note_head_sz = sizeof(Elf64_Nhdr);
         name_sz = tswap64(hdr->n_namesz);
@@ -1017,10 +1022,10 @@ out:
 
 static void write_dump_header(DumpState *s, Error **errp)
 {
-    if (s->dump_info.d_class == ELFCLASS32) {
-        create_header32(s, errp);
-    } else {
+    if (dump_is_64bit(s)) {
         create_header64(s, errp);
+    } else {
+        create_header32(s, errp);
     }
 }
 
@@ -1715,8 +1720,8 @@ static void dump_init(DumpState *s, int fd, bool has_format,
         uint32_t size;
         uint16_t format;
 
-        note_head_size = s->dump_info.d_class == ELFCLASS32 ?
-            sizeof(Elf32_Nhdr) : sizeof(Elf64_Nhdr);
+        note_head_size = dump_is_64bit(s) ?
+            sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
 
         format = le16_to_cpu(vmci->vmcoreinfo.guest_format);
         size = le32_to_cpu(vmci->vmcoreinfo.size);
@@ -1819,7 +1824,7 @@ static void dump_init(DumpState *s, int fd, bool has_format,
         }
     }
 
-    if (s->dump_info.d_class == ELFCLASS64) {
+    if (dump_is_64bit(s)) {
         s->phdr_offset = sizeof(Elf64_Ehdr);
         s->shdr_offset = s->phdr_offset + sizeof(Elf64_Phdr) * s->phdr_num;
         s->note_offset = s->shdr_offset + sizeof(Elf64_Shdr) * s->shdr_num;
-- 
Gitee


From bbf5b999272598ebed25966c296c5b701f7f9adb Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Wed, 30 Mar 2022 12:36:01 +0000
Subject: [PATCH 185/407] dump: Consolidate phdr note writes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [19/41] 180c4c0ab4941a0bf366dc7f32ee035e03daa6c0

There's no need to have two write functions. Let's rather have two
functions that set the data for elf 32/64 and then write it in a
common function.
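
The pattern can be sketched as follows. This is a simplified standalone
illustration, not the patch itself: the struct and function names are
hypothetical, the structs only mirror the ELF program header field
order, and the byte swapping done by cpu_to_dump*() is left out:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PT_NOTE 4

/* Simplified stand-ins mirroring the 32-bit and 64-bit phdr layouts. */
struct phdr32 {
    uint32_t type, offset, vaddr, paddr, filesz, memsz, flags, align;
};
struct phdr64 {
    uint32_t type, flags;
    uint64_t offset, vaddr, paddr, filesz, memsz, align;
};

typedef int (*write_fn)(const void *buf, size_t size, void *opaque);

/* The per-class functions only fill in data; they do no I/O. */
static void fill_note32(struct phdr32 *p, uint32_t off, uint32_t size)
{
    memset(p, 0, sizeof(*p));
    p->type = PT_NOTE;
    p->offset = off;
    p->filesz = size;
    p->memsz = size;
}

static void fill_note64(struct phdr64 *p, uint64_t off, uint64_t size)
{
    memset(p, 0, sizeof(*p));
    p->type = PT_NOTE;
    p->offset = off;
    p->filesz = size;
    p->memsz = size;
}

/* Single write path shared by both classes. */
static int write_note_phdr(bool is_64bit, uint64_t off, uint64_t size,
                           write_fn write, void *opaque)
{
    struct phdr32 p32;
    struct phdr64 p64;
    const void *buf;
    size_t len;

    if (is_64bit) {
        fill_note64(&p64, off, size);
        buf = &p64;
        len = sizeof(p64);
    } else {
        fill_note32(&p32, (uint32_t)off, (uint32_t)size);
        buf = &p32;
        len = sizeof(p32);
    }
    return write(buf, len, opaque);
}

/* Example sink for experiments: counts the bytes it is handed. */
static int count_bytes(const void *buf, size_t size, void *opaque)
{
    (void)buf;
    *(size_t *)opaque += size;
    return 0;
}
```

With this split, the write call and its error handling exist in one
place only.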

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220330123603.107120-8-frankja@linux.ibm.com>
(cherry picked from commit bc7d558017e6700f9a05c61b0b638a8994945f0d)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c | 94 +++++++++++++++++++++++++++--------------------------
 1 file changed, 48 insertions(+), 46 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 6394e940238..823ca328835 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -246,24 +246,15 @@ static void write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
     }
 }
 
-static void write_elf64_note(DumpState *s, Error **errp)
+static void write_elf64_phdr_note(DumpState *s, Elf64_Phdr *phdr)
 {
-    Elf64_Phdr phdr;
-    int ret;
-
-    memset(&phdr, 0, sizeof(Elf64_Phdr));
-    phdr.p_type = cpu_to_dump32(s, PT_NOTE);
-    phdr.p_offset = cpu_to_dump64(s, s->note_offset);
-    phdr.p_paddr = 0;
-    phdr.p_filesz = cpu_to_dump64(s, s->note_size);
-    phdr.p_memsz = cpu_to_dump64(s, s->note_size);
-    phdr.p_vaddr = 0;
-
-    ret = fd_write_vmcore(&phdr, sizeof(Elf64_Phdr), s);
-    if (ret < 0) {
-        error_setg_errno(errp, -ret,
-                         "dump: failed to write program header table");
-    }
+    memset(phdr, 0, sizeof(*phdr));
+    phdr->p_type = cpu_to_dump32(s, PT_NOTE);
+    phdr->p_offset = cpu_to_dump64(s, s->note_offset);
+    phdr->p_paddr = 0;
+    phdr->p_filesz = cpu_to_dump64(s, s->note_size);
+    phdr->p_memsz = cpu_to_dump64(s, s->note_size);
+    phdr->p_vaddr = 0;
 }
 
 static inline int cpu_index(CPUState *cpu)
@@ -311,24 +302,15 @@ static void write_elf64_notes(WriteCoreDumpFunction f, DumpState *s,
     write_guest_note(f, s, errp);
 }
 
-static void write_elf32_note(DumpState *s, Error **errp)
+static void write_elf32_phdr_note(DumpState *s, Elf32_Phdr *phdr)
 {
-    Elf32_Phdr phdr;
-    int ret;
-
-    memset(&phdr, 0, sizeof(Elf32_Phdr));
-    phdr.p_type = cpu_to_dump32(s, PT_NOTE);
-    phdr.p_offset = cpu_to_dump32(s, s->note_offset);
-    phdr.p_paddr = 0;
-    phdr.p_filesz = cpu_to_dump32(s, s->note_size);
-    phdr.p_memsz = cpu_to_dump32(s, s->note_size);
-    phdr.p_vaddr = 0;
-
-    ret = fd_write_vmcore(&phdr, sizeof(Elf32_Phdr), s);
-    if (ret < 0) {
-        error_setg_errno(errp, -ret,
-                         "dump: failed to write program header table");
-    }
+    memset(phdr, 0, sizeof(*phdr));
+    phdr->p_type = cpu_to_dump32(s, PT_NOTE);
+    phdr->p_offset = cpu_to_dump32(s, s->note_offset);
+    phdr->p_paddr = 0;
+    phdr->p_filesz = cpu_to_dump32(s, s->note_size);
+    phdr->p_memsz = cpu_to_dump32(s, s->note_size);
+    phdr->p_vaddr = 0;
 }
 
 static void write_elf32_notes(WriteCoreDumpFunction f, DumpState *s,
@@ -358,6 +340,32 @@ static void write_elf32_notes(WriteCoreDumpFunction f, DumpState *s,
     write_guest_note(f, s, errp);
 }
 
+static void write_elf_phdr_note(DumpState *s, Error **errp)
+{
+    ERRP_GUARD();
+    Elf32_Phdr phdr32;
+    Elf64_Phdr phdr64;
+    void *phdr;
+    size_t size;
+    int ret;
+
+    if (dump_is_64bit(s)) {
+        write_elf64_phdr_note(s, &phdr64);
+        size = sizeof(phdr64);
+        phdr = &phdr64;
+    } else {
+        write_elf32_phdr_note(s, &phdr32);
+        size = sizeof(phdr32);
+        phdr = &phdr32;
+    }
+
+    ret = fd_write_vmcore(phdr, size, s);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "dump: failed to write program header table");
+    }
+}
+
 static void write_elf_section(DumpState *s, int type, Error **errp)
 {
     Elf32_Shdr shdr32;
@@ -551,13 +559,13 @@ static void dump_begin(DumpState *s, Error **errp)
         return;
     }
 
-    if (dump_is_64bit(s)) {
-        /* write PT_NOTE to vmcore */
-        write_elf64_note(s, errp);
-        if (*errp) {
-            return;
-        }
+    /* write PT_NOTE to vmcore */
+    write_elf_phdr_note(s, errp);
+    if (*errp) {
+        return;
+    }
 
+    if (dump_is_64bit(s)) {
         /* write all PT_LOAD to vmcore */
         write_elf_loads(s, errp);
         if (*errp) {
@@ -578,12 +586,6 @@ static void dump_begin(DumpState *s, Error **errp)
             return;
         }
     } else {
-        /* write PT_NOTE to vmcore */
-        write_elf32_note(s, errp);
-        if (*errp) {
-            return;
-        }
-
         /* write all PT_LOAD to vmcore */
         write_elf_loads(s, errp);
         if (*errp) {
-- 
Gitee


From 451463c7f35b2861ed980c2a9a600a5bc5f9301d Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Wed, 30 Mar 2022 12:36:02 +0000
Subject: [PATCH 186/407] dump: Cleanup dump_begin write functions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [20/41] 18ea1457a3e54fd368e556d96c3be50c6ad0a6bd

There's no need to have a gigantic if in there. Let's move the ELF
32/64-bit logic into the section, segment or note code.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220330123603.107120-9-frankja@linux.ibm.com>
(cherry picked from commit 5ff2e5a3e1e67930e523486e39549a33fcf97227)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c | 42 +++++++++++-------------------------------
 1 file changed, 11 insertions(+), 31 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 823ca328835..88abde355a3 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -565,46 +565,26 @@ static void dump_begin(DumpState *s, Error **errp)
         return;
     }
 
-    if (dump_is_64bit(s)) {
-        /* write all PT_LOAD to vmcore */
-        write_elf_loads(s, errp);
+    /* write all PT_LOAD to vmcore */
+    write_elf_loads(s, errp);
+    if (*errp) {
+        return;
+    }
+
+    /* write section to vmcore */
+    if (s->shdr_num) {
+        write_elf_section(s, 1, errp);
         if (*errp) {
             return;
         }
+    }
 
-        /* write section to vmcore */
-        if (s->shdr_num) {
-            write_elf_section(s, 1, errp);
-            if (*errp) {
-                return;
-            }
-        }
-
+    if (dump_is_64bit(s)) {
         /* write notes to vmcore */
         write_elf64_notes(fd_write_vmcore, s, errp);
-        if (*errp) {
-            return;
-        }
     } else {
-        /* write all PT_LOAD to vmcore */
-        write_elf_loads(s, errp);
-        if (*errp) {
-            return;
-        }
-
-        /* write section to vmcore */
-        if (s->shdr_num) {
-            write_elf_section(s, 0, errp);
-            if (*errp) {
-                return;
-            }
-        }
-
         /* write notes to vmcore */
         write_elf32_notes(fd_write_vmcore, s, errp);
-        if (*errp) {
-            return;
-        }
     }
 }
 
-- 
Gitee


From bb35fa66f40f938f7c6bfb85e81c1e5029e8a8a7 Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Wed, 30 Mar 2022 12:36:03 +0000
Subject: [PATCH 187/407] dump: Consolidate elf note function
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [21/41] 52298c098c116aea75ad15894731ff412c2c4e73

Just like with the other write functions, let's move the 32/64-bit ELF
handling to a function to improve readability.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220330123603.107120-10-frankja@linux.ibm.com>
(cherry picked from commit c68124738bc29017e4254c898bc40be7be477af7)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 88abde355a3..a451abc590f 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -520,6 +520,15 @@ static void write_elf_loads(DumpState *s, Error **errp)
     }
 }
 
+static void write_elf_notes(DumpState *s, Error **errp)
+{
+    if (dump_is_64bit(s)) {
+        write_elf64_notes(fd_write_vmcore, s, errp);
+    } else {
+        write_elf32_notes(fd_write_vmcore, s, errp);
+    }
+}
+
 /* write elf header, PT_NOTE and elf note to vmcore. */
 static void dump_begin(DumpState *s, Error **errp)
 {
@@ -579,13 +588,8 @@ static void dump_begin(DumpState *s, Error **errp)
         }
     }
 
-    if (dump_is_64bit(s)) {
-        /* write notes to vmcore */
-        write_elf64_notes(fd_write_vmcore, s, errp);
-    } else {
-        /* write notes to vmcore */
-        write_elf32_notes(fd_write_vmcore, s, errp);
-    }
+    /* write notes to vmcore */
+    write_elf_notes(s, errp);
 }
 
 static int get_next_block(DumpState *s, GuestPhysBlock *block)
-- 
Gitee


From 6e9732e527f0e06d13c310f77e471da1bbadaf32 Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Thu, 11 Aug 2022 12:10:54 +0000
Subject: [PATCH 188/407] dump: Replace opaque DumpState pointer with a typed
 one
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [22/41] 5f071d7ef441ae6f5da70eb56018c4657deee3d7

It's always better to convey the type of a pointer if at all
possible. So let's add the DumpState typedef to typedefs.h and move
the dump note functions from the opaque pointers to DumpState
pointers.
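
A rough standalone illustration of the difference, not QEMU code:
DemoState and both demo_note_*() helpers are hypothetical stand-ins for
DumpState and the per-target note callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for DumpState; the real struct lives in QEMU. */
typedef struct DemoState {
    int written;
} DemoState;

/* Old style: the callee has to convert the opaque pointer back itself. */
static int demo_note_opaque(const void *buf, size_t size, void *opaque)
{
    DemoState *s = opaque; /* nothing verifies this really is a DemoState */

    (void)buf;
    s->written += (int)size;
    return 0;
}

/* New style: the parameter type states what the callee expects. */
static int demo_note_typed(const void *buf, size_t size, DemoState *s)
{
    assert(s != NULL);
    (void)buf;
    s->written += (int)size;
    return 0;
}
```

With the typed variant, passing a pointer to an unrelated struct is
diagnosed by the compiler, whereas the void * variant accepts any
object pointer without complaint.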

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Cédric Le Goater <clg@kaod.org>
CC: Daniel Henrique Barboza <danielhb413@gmail.com>
CC: David Gibson <david@gibson.dropbear.id.au>
CC: Greg Kurz <groug@kaod.org>
CC: Palmer Dabbelt <palmer@dabbelt.com>
CC: Alistair Francis <alistair.francis@wdc.com>
CC: Bin Meng <bin.meng@windriver.com>
CC: Cornelia Huck <cohuck@redhat.com>
CC: Thomas Huth <thuth@redhat.com>
CC: Richard Henderson <richard.henderson@linaro.org>
CC: David Hildenbrand <david@redhat.com>
Acked-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220811121111.9878-2-frankja@linux.ibm.com>
(cherry picked from commit 1af0006ab959864dfa2f59e9136c5fb93000b61f)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 include/hw/core/sysemu-cpu-ops.h |  8 ++++----
 include/qemu/typedefs.h          |  1 +
 target/arm/arch_dump.c           |  6 ++----
 target/arm/cpu.h                 |  4 ++--
 target/i386/arch_dump.c          | 30 +++++++++++++++---------------
 target/i386/cpu.h                |  8 ++++----
 target/ppc/arch_dump.c           | 18 +++++++++---------
 target/ppc/cpu.h                 |  4 ++--
 target/riscv/arch_dump.c         |  6 ++----
 target/riscv/cpu.h               |  4 ++--
 target/s390x/arch_dump.c         | 10 +++++-----
 target/s390x/s390x-internal.h    |  2 +-
 12 files changed, 49 insertions(+), 52 deletions(-)

diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h
index a9ba39e5f25..ee169b872ca 100644
--- a/include/hw/core/sysemu-cpu-ops.h
+++ b/include/hw/core/sysemu-cpu-ops.h
@@ -53,25 +53,25 @@ typedef struct SysemuCPUOps {
      * 32-bit VM coredump.
      */
     int (*write_elf32_note)(WriteCoreDumpFunction f, CPUState *cpu,
-                            int cpuid, void *opaque);
+                            int cpuid, DumpState *s);
     /**
      * @write_elf64_note: Callback for writing a CPU-specific ELF note to a
      * 64-bit VM coredump.
      */
     int (*write_elf64_note)(WriteCoreDumpFunction f, CPUState *cpu,
-                            int cpuid, void *opaque);
+                            int cpuid, DumpState *s);
     /**
      * @write_elf32_qemunote: Callback for writing a CPU- and QEMU-specific ELF
      * note to a 32-bit VM coredump.
      */
     int (*write_elf32_qemunote)(WriteCoreDumpFunction f, CPUState *cpu,
-                                void *opaque);
+                                DumpState *s);
     /**
      * @write_elf64_qemunote: Callback for writing a CPU- and QEMU-specific ELF
      * note to a 64-bit VM coredump.
      */
     int (*write_elf64_qemunote)(WriteCoreDumpFunction f, CPUState *cpu,
-                                void *opaque);
+                                DumpState *s);
     /**
      * @virtio_is_big_endian: Callback to return %true if a CPU which supports
      * runtime configurable endianness is currently big-endian.
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index ee60eb3de47..ac9d031be63 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -125,6 +125,7 @@ typedef struct VirtIODevice VirtIODevice;
 typedef struct Visitor Visitor;
 typedef struct VMChangeStateEntry VMChangeStateEntry;
 typedef struct VMStateDescription VMStateDescription;
+typedef struct DumpState DumpState;
 
 /*
  * Pointer types
diff --git a/target/arm/arch_dump.c b/target/arm/arch_dump.c
index 01848453109..3a824e0aa68 100644
--- a/target/arm/arch_dump.c
+++ b/target/arm/arch_dump.c
@@ -232,12 +232,11 @@ static int aarch64_write_elf64_sve(WriteCoreDumpFunction f,
 #endif
 
 int arm_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
-                             int cpuid, void *opaque)
+                             int cpuid, DumpState *s)
 {
     struct aarch64_note note;
     ARMCPU *cpu = ARM_CPU(cs);
     CPUARMState *env = &cpu->env;
-    DumpState *s = opaque;
     uint64_t pstate, sp;
     int ret, i;
 
@@ -360,12 +359,11 @@ static int arm_write_elf32_vfp(WriteCoreDumpFunction f, CPUARMState *env,
 }
 
 int arm_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
-                             int cpuid, void *opaque)
+                             int cpuid, DumpState *s)
 {
     struct arm_note note;
     ARMCPU *cpu = ARM_CPU(cs);
     CPUARMState *env = &cpu->env;
-    DumpState *s = opaque;
     int ret, i;
     bool fpvalid = cpu_isar_feature(aa32_vfp_simd, cpu);
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index e33f37b70ad..8d2f496ef9a 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1065,9 +1065,9 @@ int arm_gen_dynamic_svereg_xml(CPUState *cpu, int base_reg);
 const char *arm_gdb_get_dynamic_xml(CPUState *cpu, const char *xmlname);
 
 int arm_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
-                             int cpuid, void *opaque);
+                             int cpuid, DumpState *s);
 int arm_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
-                             int cpuid, void *opaque);
+                             int cpuid, DumpState *s);
 
 #ifdef TARGET_AARCH64
 int aarch64_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
diff --git a/target/i386/arch_dump.c b/target/i386/arch_dump.c
index 004141fc042..c290910a04b 100644
--- a/target/i386/arch_dump.c
+++ b/target/i386/arch_dump.c
@@ -42,7 +42,7 @@ typedef struct {
 
 static int x86_64_write_elf64_note(WriteCoreDumpFunction f,
                                    CPUX86State *env, int id,
-                                   void *opaque)
+                                   DumpState *s)
 {
     x86_64_user_regs_struct regs;
     Elf64_Nhdr *note;
@@ -94,7 +94,7 @@ static int x86_64_write_elf64_note(WriteCoreDumpFunction f,
     buf += descsz - sizeof(x86_64_user_regs_struct)-sizeof(target_ulong);
     memcpy(buf, &regs, sizeof(x86_64_user_regs_struct));
 
-    ret = f(note, note_size, opaque);
+    ret = f(note, note_size, s);
     g_free(note);
     if (ret < 0) {
         return -1;
@@ -148,7 +148,7 @@ static void x86_fill_elf_prstatus(x86_elf_prstatus *prstatus, CPUX86State *env,
 }
 
 static int x86_write_elf64_note(WriteCoreDumpFunction f, CPUX86State *env,
-                                int id, void *opaque)
+                                int id, DumpState *s)
 {
     x86_elf_prstatus prstatus;
     Elf64_Nhdr *note;
@@ -170,7 +170,7 @@ static int x86_write_elf64_note(WriteCoreDumpFunction f, CPUX86State *env,
     buf += ROUND_UP(name_size, 4);
     memcpy(buf, &prstatus, sizeof(prstatus));
 
-    ret = f(note, note_size, opaque);
+    ret = f(note, note_size, s);
     g_free(note);
     if (ret < 0) {
         return -1;
@@ -180,7 +180,7 @@ static int x86_write_elf64_note(WriteCoreDumpFunction f, CPUX86State *env,
 }
 
 int x86_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
-                             int cpuid, void *opaque)
+                             int cpuid, DumpState *s)
 {
     X86CPU *cpu = X86_CPU(cs);
     int ret;
@@ -189,10 +189,10 @@ int x86_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
     bool lma = !!(first_x86_cpu->env.hflags & HF_LMA_MASK);
 
     if (lma) {
-        ret = x86_64_write_elf64_note(f, &cpu->env, cpuid, opaque);
+        ret = x86_64_write_elf64_note(f, &cpu->env, cpuid, s);
     } else {
 #endif
-        ret = x86_write_elf64_note(f, &cpu->env, cpuid, opaque);
+        ret = x86_write_elf64_note(f, &cpu->env, cpuid, s);
 #ifdef TARGET_X86_64
     }
 #endif
@@ -201,7 +201,7 @@ int x86_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
 }
 
 int x86_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
-                             int cpuid, void *opaque)
+                             int cpuid, DumpState *s)
 {
     X86CPU *cpu = X86_CPU(cs);
     x86_elf_prstatus prstatus;
@@ -224,7 +224,7 @@ int x86_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
     buf += ROUND_UP(name_size, 4);
     memcpy(buf, &prstatus, sizeof(prstatus));
 
-    ret = f(note, note_size, opaque);
+    ret = f(note, note_size, s);
     g_free(note);
     if (ret < 0) {
         return -1;
@@ -329,7 +329,7 @@ static void qemu_get_cpustate(QEMUCPUState *s, CPUX86State *env)
 
 static inline int cpu_write_qemu_note(WriteCoreDumpFunction f,
                                       CPUX86State *env,
-                                      void *opaque,
+                                      DumpState *s,
                                       int type)
 {
     QEMUCPUState state;
@@ -369,7 +369,7 @@ static inline int cpu_write_qemu_note(WriteCoreDumpFunction f,
     buf += ROUND_UP(name_size, 4);
     memcpy(buf, &state, sizeof(state));
 
-    ret = f(note, note_size, opaque);
+    ret = f(note, note_size, s);
     g_free(note);
     if (ret < 0) {
         return -1;
@@ -379,19 +379,19 @@ static inline int cpu_write_qemu_note(WriteCoreDumpFunction f,
 }
 
 int x86_cpu_write_elf64_qemunote(WriteCoreDumpFunction f, CPUState *cs,
-                                 void *opaque)
+                                 DumpState *s)
 {
     X86CPU *cpu = X86_CPU(cs);
 
-    return cpu_write_qemu_note(f, &cpu->env, opaque, 1);
+    return cpu_write_qemu_note(f, &cpu->env, s, 1);
 }
 
 int x86_cpu_write_elf32_qemunote(WriteCoreDumpFunction f, CPUState *cs,
-                                 void *opaque)
+                                 DumpState *s)
 {
     X86CPU *cpu = X86_CPU(cs);
 
-    return cpu_write_qemu_note(f, &cpu->env, opaque, 0);
+    return cpu_write_qemu_note(f, &cpu->env, s, 0);
 }
 
 int cpu_get_dump_info(ArchDumpInfo *info,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 006b735fe4a..5d2ddd81b95 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1887,13 +1887,13 @@ extern const VMStateDescription vmstate_x86_cpu;
 int x86_cpu_pending_interrupt(CPUState *cs, int interrupt_request);
 
 int x86_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cpu,
-                             int cpuid, void *opaque);
+                             int cpuid, DumpState *s);
 int x86_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cpu,
-                             int cpuid, void *opaque);
+                             int cpuid, DumpState *s);
 int x86_cpu_write_elf64_qemunote(WriteCoreDumpFunction f, CPUState *cpu,
-                                 void *opaque);
+                                 DumpState *s);
 int x86_cpu_write_elf32_qemunote(WriteCoreDumpFunction f, CPUState *cpu,
-                                 void *opaque);
+                                 DumpState *s);
 
 void x86_cpu_get_memory_mapping(CPUState *cpu, MemoryMappingList *list,
                                 Error **errp);
diff --git a/target/ppc/arch_dump.c b/target/ppc/arch_dump.c
index bb392f6d888..e9f512bcd48 100644
--- a/target/ppc/arch_dump.c
+++ b/target/ppc/arch_dump.c
@@ -270,23 +270,23 @@ ssize_t cpu_get_note_size(int class, int machine, int nr_cpus)
 static int ppc_write_all_elf_notes(const char *note_name,
                                    WriteCoreDumpFunction f,
                                    PowerPCCPU *cpu, int id,
-                                   void *opaque)
+                                   DumpState *s)
 {
-    NoteFuncArg arg = { .state = opaque };
+    NoteFuncArg arg = { .state = s };
     int ret = -1;
     int note_size;
     const NoteFuncDesc *nf;
 
     for (nf = note_func; nf->note_contents_func; nf++) {
-        arg.note.hdr.n_namesz = cpu_to_dump32(opaque, sizeof(arg.note.name));
-        arg.note.hdr.n_descsz = cpu_to_dump32(opaque, nf->contents_size);
+        arg.note.hdr.n_namesz = cpu_to_dump32(s, sizeof(arg.note.name));
+        arg.note.hdr.n_descsz = cpu_to_dump32(s, nf->contents_size);
         strncpy(arg.note.name, note_name, sizeof(arg.note.name));
 
         (*nf->note_contents_func)(&arg, cpu);
 
         note_size =
             sizeof(arg.note) - sizeof(arg.note.contents) + nf->contents_size;
-        ret = f(&arg.note, note_size, opaque);
+        ret = f(&arg.note, note_size, s);
         if (ret < 0) {
             return -1;
         }
@@ -295,15 +295,15 @@ static int ppc_write_all_elf_notes(const char *note_name,
 }
 
 int ppc64_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
-                               int cpuid, void *opaque)
+                               int cpuid, DumpState *s)
 {
     PowerPCCPU *cpu = POWERPC_CPU(cs);
-    return ppc_write_all_elf_notes("CORE", f, cpu, cpuid, opaque);
+    return ppc_write_all_elf_notes("CORE", f, cpu, cpuid, s);
 }
 
 int ppc32_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
-                               int cpuid, void *opaque)
+                               int cpuid, DumpState *s)
 {
     PowerPCCPU *cpu = POWERPC_CPU(cs);
-    return ppc_write_all_elf_notes("CORE", f, cpu, cpuid, opaque);
+    return ppc_write_all_elf_notes("CORE", f, cpu, cpuid, s);
 }
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index e946da5f3a8..9c2b61266d3 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1289,9 +1289,9 @@ void ppc_gdb_gen_spr_xml(PowerPCCPU *cpu);
 const char *ppc_gdb_get_dynamic_xml(CPUState *cs, const char *xml_name);
 #endif
 int ppc64_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
-                               int cpuid, void *opaque);
+                               int cpuid, DumpState *s);
 int ppc32_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
-                               int cpuid, void *opaque);
+                               int cpuid, DumpState *s);
 #ifndef CONFIG_USER_ONLY
 void ppc_cpu_do_interrupt(CPUState *cpu);
 bool ppc_cpu_exec_interrupt(CPUState *cpu, int int_req);
diff --git a/target/riscv/arch_dump.c b/target/riscv/arch_dump.c
index 709f621d826..736a232956e 100644
--- a/target/riscv/arch_dump.c
+++ b/target/riscv/arch_dump.c
@@ -64,12 +64,11 @@ static void riscv64_note_init(struct riscv64_note *note, DumpState *s,
 }
 
 int riscv_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
-                               int cpuid, void *opaque)
+                               int cpuid, DumpState *s)
 {
     struct riscv64_note note;
     RISCVCPU *cpu = RISCV_CPU(cs);
     CPURISCVState *env = &cpu->env;
-    DumpState *s = opaque;
     int ret, i = 0;
     const char name[] = "CORE";
 
@@ -134,12 +133,11 @@ static void riscv32_note_init(struct riscv32_note *note, DumpState *s,
 }
 
 int riscv_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
-                               int cpuid, void *opaque)
+                               int cpuid, DumpState *s)
 {
     struct riscv32_note note;
     RISCVCPU *cpu = RISCV_CPU(cs);
     CPURISCVState *env = &cpu->env;
-    DumpState *s = opaque;
     int ret, i;
     const char name[] = "CORE";
 
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 0760c0af93d..4cce524b2c0 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -344,9 +344,9 @@ extern const char * const riscv_fpr_regnames[];
 const char *riscv_cpu_get_trap_name(target_ulong cause, bool async);
 void riscv_cpu_do_interrupt(CPUState *cpu);
 int riscv_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
-                               int cpuid, void *opaque);
+                               int cpuid, DumpState *s);
 int riscv_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
-                               int cpuid, void *opaque);
+                               int cpuid, DumpState *s);
 int riscv_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
 int riscv_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
 bool riscv_cpu_fp_enabled(CPURISCVState *env);
diff --git a/target/s390x/arch_dump.c b/target/s390x/arch_dump.c
index 08daf93ae1f..f60a14920d4 100644
--- a/target/s390x/arch_dump.c
+++ b/target/s390x/arch_dump.c
@@ -204,7 +204,7 @@ static const NoteFuncDesc note_linux[] = {
 static int s390x_write_elf64_notes(const char *note_name,
                                        WriteCoreDumpFunction f,
                                        S390CPU *cpu, int id,
-                                       void *opaque,
+                                       DumpState *s,
                                        const NoteFuncDesc *funcs)
 {
     Note note;
@@ -222,7 +222,7 @@ static int s390x_write_elf64_notes(const char *note_name,
         (*nf->note_contents_func)(&note, cpu, id);
 
         note_size = sizeof(note) - sizeof(note.contents) + nf->contents_size;
-        ret = f(&note, note_size, opaque);
+        ret = f(&note, note_size, s);
 
         if (ret < 0) {
             return -1;
@@ -235,16 +235,16 @@ static int s390x_write_elf64_notes(const char *note_name,
 
 
 int s390_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
-                              int cpuid, void *opaque)
+                              int cpuid, DumpState *s)
 {
     S390CPU *cpu = S390_CPU(cs);
     int r;
 
-    r = s390x_write_elf64_notes("CORE", f, cpu, cpuid, opaque, note_core);
+    r = s390x_write_elf64_notes("CORE", f, cpu, cpuid, s, note_core);
     if (r) {
         return r;
     }
-    return s390x_write_elf64_notes("LINUX", f, cpu, cpuid, opaque, note_linux);
+    return s390x_write_elf64_notes("LINUX", f, cpu, cpuid, s, note_linux);
 }
 
 int cpu_get_dump_info(ArchDumpInfo *info,
diff --git a/target/s390x/s390x-internal.h b/target/s390x/s390x-internal.h
index 1a178aed416..02cf6c3f438 100644
--- a/target/s390x/s390x-internal.h
+++ b/target/s390x/s390x-internal.h
@@ -228,7 +228,7 @@ static inline hwaddr decode_basedisp_s(CPUS390XState *env, uint32_t ipb,
 
 /* arch_dump.c */
 int s390_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
-                              int cpuid, void *opaque);
+                              int cpuid, DumpState *s);
 
 
 /* cc_helper.c */
-- 
Gitee


From fb8c488a6578958945524505425052d76cdbdeff Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Thu, 11 Aug 2022 12:10:55 +0000
Subject: [PATCH 189/407] dump: Rename write_elf_loads to write_elf_phdr_loads
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [23/41] 18e3ef70b97c525b7c43cf12143204bdb1060e4f

Let's make it a bit clearer that we write the program headers of the
PT_LOAD type.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@ibm.linux.com>
Message-Id: <20220811121111.9878-3-frankja@linux.ibm.com>
(cherry picked from commit afae6056ea79e2d89fd90867de3a01732eae724f)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index a451abc590f..fa787f379f6 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -491,7 +491,7 @@ static void get_offset_range(hwaddr phys_addr,
     }
 }
 
-static void write_elf_loads(DumpState *s, Error **errp)
+static void write_elf_phdr_loads(DumpState *s, Error **errp)
 {
     ERRP_GUARD();
     hwaddr offset, filesz;
@@ -574,8 +574,8 @@ static void dump_begin(DumpState *s, Error **errp)
         return;
     }
 
-    /* write all PT_LOAD to vmcore */
-    write_elf_loads(s, errp);
+    /* write all PT_LOADs to vmcore */
+    write_elf_phdr_loads(s, errp);
     if (*errp) {
         return;
     }
-- 
Gitee


From f8ad598671fd329dbe6fc85c753d31cc79cf7cbb Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Thu, 11 Aug 2022 12:10:56 +0000
Subject: [PATCH 190/407] dump: Refactor dump_iterate and introduce
 dump_filter_memblock_*()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [24/41] 74ef470f24d9d98093c4d63730a99474587033fd

The iteration over the memblocks in dump_iterate() is hard to
understand, so it's about time to clean it up. Instead of manually
grabbing the next memblock, we can use QTAILQ_FOREACH to iterate over
all memblocks.

Additionally, we move the calculation of the offset and length out by
introducing and using the dump_filtered_memblock_*() functions. These
functions will later be used to clean up other parts of dump.c.
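
The overlap calculation can be sketched in isolation as follows. This
is a simplified standalone version, assuming a hypothetical DemoBlock
in place of GuestPhysBlock and local DEMO_MIN/DEMO_MAX macros in place
of QEMU's MIN/MAX; a filter length of 0 is taken to mean "no filter".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for GuestPhysBlock: guest range [start, end). */
typedef struct DemoBlock {
    int64_t target_start;
    int64_t target_end;
} DemoBlock;

#define DEMO_MAX(a, b) ((a) > (b) ? (a) : (b))
#define DEMO_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Number of bytes of the block that lie inside the filter area. */
static int64_t demo_filtered_size(const DemoBlock *b,
                                  int64_t fstart, int64_t flen)
{
    int64_t left, right;

    assert(b->target_end >= b->target_start);

    /* No filter: the whole block counts. */
    if (!flen) {
        return b->target_end - b->target_start;
    }

    /* Intersect [fstart, fstart + flen) with the block's range. */
    left = DEMO_MAX(fstart, b->target_start);
    right = DEMO_MIN(fstart + flen, b->target_end);
    return right > left ? right - left : 0;
}

/* Offset into the block where the filtered part begins, or -1. */
static int64_t demo_filtered_start(const DemoBlock *b,
                                   int64_t fstart, int64_t flen)
{
    if (flen) {
        /* Block entirely outside the filter area. */
        if (b->target_start >= fstart + flen || b->target_end <= fstart) {
            return -1;
        }
        if (fstart > b->target_start) {
            return fstart - b->target_start;
        }
    }
    return 0;
}
```

For a block covering [0x1000, 0x3000) and a filter of 0x800 bytes
starting at 0x2000, the start offset is 0x1000 and the size is 0x800.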

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220811121111.9878-4-frankja@linux.ibm.com>
(cherry picked from commit 1e8113032f5b1efc5da66382470ce4809c76f8f2)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c | 74 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 32 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index fa787f379f6..d981e843dd5 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -592,31 +592,43 @@ static void dump_begin(DumpState *s, Error **errp)
     write_elf_notes(s, errp);
 }
 
-static int get_next_block(DumpState *s, GuestPhysBlock *block)
+static int64_t dump_filtered_memblock_size(GuestPhysBlock *block,
+                                           int64_t filter_area_start,
+                                           int64_t filter_area_length)
 {
-    while (1) {
-        block = QTAILQ_NEXT(block, next);
-        if (!block) {
-            /* no more block */
-            return 1;
-        }
+    int64_t size, left, right;
 
-        s->start = 0;
-        s->next_block = block;
-        if (s->has_filter) {
-            if (block->target_start >= s->begin + s->length ||
-                block->target_end <= s->begin) {
-                /* This block is out of the range */
-                continue;
-            }
+    /* No filter, return full size */
+    if (!filter_area_length) {
+        return block->target_end - block->target_start;
+    }
 
-            if (s->begin > block->target_start) {
-                s->start = s->begin - block->target_start;
-            }
+    /* calculate the overlapped region. */
+    left = MAX(filter_area_start, block->target_start);
+    right = MIN(filter_area_start + filter_area_length, block->target_end);
+    size = right - left;
+    size = size > 0 ? size : 0;
+
+    return size;
+}
+
+static int64_t dump_filtered_memblock_start(GuestPhysBlock *block,
+                                            int64_t filter_area_start,
+                                            int64_t filter_area_length)
+{
+    if (filter_area_length) {
+        /* return -1 if the block is not within filter area */
+        if (block->target_start >= filter_area_start + filter_area_length ||
+            block->target_end <= filter_area_start) {
+            return -1;
         }
 
-        return 0;
+        if (filter_area_start > block->target_start) {
+            return filter_area_start - block->target_start;
+        }
     }
+
+    return 0;
 }
 
 /* write all memory to vmcore */
@@ -624,24 +636,22 @@ static void dump_iterate(DumpState *s, Error **errp)
 {
     ERRP_GUARD();
     GuestPhysBlock *block;
-    int64_t size;
-
-    do {
-        block = s->next_block;
+    int64_t memblock_size, memblock_start;
 
-        size = block->target_end - block->target_start;
-        if (s->has_filter) {
-            size -= s->start;
-            if (s->begin + s->length < block->target_end) {
-                size -= block->target_end - (s->begin + s->length);
-            }
+    QTAILQ_FOREACH(block, &s->guest_phys_blocks.head, next) {
+        memblock_start = dump_filtered_memblock_start(block, s->begin, s->length);
+        if (memblock_start == -1) {
+            continue;
         }
-        write_memory(s, block, s->start, size, errp);
+
+        memblock_size = dump_filtered_memblock_size(block, s->begin, s->length);
+
+        /* Write the memory to file */
+        write_memory(s, block, memblock_start, memblock_size, errp);
         if (*errp) {
             return;
         }
-
-    } while (!get_next_block(s, block));
+    }
 }
 
 static void create_vmcore(DumpState *s, Error **errp)
-- 
Gitee


From 74de280adc1d4ba6cf16b930acf95f2504a0dfbc Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Thu, 11 Aug 2022 12:10:57 +0000
Subject: [PATCH 191/407] dump: Rework get_start_block
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [25/41] c93842a1aaeadcc11e91c194452fcd05d163b3ca

get_start_block() returns the start address of the first memory block
or -1.

With the GuestPhysBlock iterator conversion we don't need to set the
start address and can therefore remove that code and the "start"
DumpState struct member. The only functionality left is the validation
of the start block, so it only makes sense to rename the function to
validate_start_block().
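
What remains is a pure check that at least one block overlaps the
filter area. A minimal standalone sketch, assuming a plain array of
hypothetical DemoRange entries instead of the QTAILQ of GuestPhysBlocks
and treating a length of 0 as "no filter" (the real code tests
s->has_filter at this point in the series):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a guest memory block: range [start, end). */
typedef struct DemoRange {
    int64_t start;
    int64_t end;
} DemoRange;

/* Return 0 if some block overlaps the filter area, -1 otherwise. */
static int demo_validate_filter(const DemoRange *blocks, size_t n,
                                int64_t begin, int64_t length)
{
    size_t i;

    /* Assumption of this sketch: length 0 means no filter is set. */
    if (length == 0) {
        return 0;
    }

    for (i = 0; i < n; i++) {
        assert(blocks[i].end >= blocks[i].start);

        /* This block is out of the range */
        if (blocks[i].start >= begin + length || blocks[i].end <= begin) {
            continue;
        }
        return 0;
    }

    return -1;
}
```

A return value of -1 corresponds to the case where the filter excludes
every block, which dump_init() reports as an invalid "begin" parameter.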

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Message-Id: <20220811121111.9878-5-frankja@linux.ibm.com>
(cherry picked from commit 0c2994ac9009577b967529ce18e269da5b280351)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c           | 20 ++++++--------------
 include/sysemu/dump.h |  2 --
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index d981e843dd5..e6aa037f59a 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -1509,30 +1509,22 @@ static void create_kdump_vmcore(DumpState *s, Error **errp)
     }
 }
 
-static ram_addr_t get_start_block(DumpState *s)
+static int validate_start_block(DumpState *s)
 {
     GuestPhysBlock *block;
 
     if (!s->has_filter) {
-        s->next_block = QTAILQ_FIRST(&s->guest_phys_blocks.head);
         return 0;
     }
 
     QTAILQ_FOREACH(block, &s->guest_phys_blocks.head, next) {
+        /* This block is out of the range */
         if (block->target_start >= s->begin + s->length ||
             block->target_end <= s->begin) {
-            /* This block is out of the range */
             continue;
         }
-
-        s->next_block = block;
-        if (s->begin > block->target_start) {
-            s->start = s->begin - block->target_start;
-        } else {
-            s->start = 0;
-        }
-        return s->start;
-    }
+        return 0;
+   }
 
     return -1;
 }
@@ -1679,8 +1671,8 @@ static void dump_init(DumpState *s, int fd, bool has_format,
         goto cleanup;
     }
 
-    s->start = get_start_block(s);
-    if (s->start == -1) {
+    /* Is the filter filtering everything? */
+    if (validate_start_block(s) == -1) {
         error_setg(errp, QERR_INVALID_PARAMETER, "begin");
         goto cleanup;
     }
diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index ffc2ea1072f..7fce1d4af67 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -166,8 +166,6 @@ typedef struct DumpState {
     hwaddr memory_offset;
     int fd;
 
-    GuestPhysBlock *next_block;
-    ram_addr_t start;
     bool has_filter;
     int64_t begin;
     int64_t length;
-- 
Gitee


From 9e2978ef77426d79d685c94d32aa7e37fbfccbf2 Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Thu, 11 Aug 2022 12:10:58 +0000
Subject: [PATCH 192/407] dump: Rework filter area variables
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [26/41] f10a5523dfd2724f7a8637fca3ed68ba6df659a5

While the DumpState begin and length variables directly mirror the API
variable names, they are not very descriptive. So let's add a
"filter_area_" prefix and make has_filter a function checking length > 0.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220811121111.9878-6-frankja@linux.ibm.com>
(cherry picked from commit dddf725f70bfe7f5adb41fa31dbd06e767271bda)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c           | 53 +++++++++++++++++++++++++------------------
 include/sysemu/dump.h | 13 ++++++++---
 2 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index e6aa037f59a..f6fe13e2582 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -60,6 +60,11 @@ static inline bool dump_is_64bit(DumpState *s)
     return s->dump_info.d_class == ELFCLASS64;
 }
 
+static inline bool dump_has_filter(DumpState *s)
+{
+    return s->filter_area_length > 0;
+}
+
 uint16_t cpu_to_dump16(DumpState *s, uint16_t val)
 {
     if (s->dump_info.d_endian == ELFDATA2LSB) {
@@ -444,29 +449,30 @@ static void get_offset_range(hwaddr phys_addr,
     *p_offset = -1;
     *p_filesz = 0;
 
-    if (s->has_filter) {
-        if (phys_addr < s->begin || phys_addr >= s->begin + s->length) {
+    if (dump_has_filter(s)) {
+        if (phys_addr < s->filter_area_begin ||
+            phys_addr >= s->filter_area_begin + s->filter_area_length) {
             return;
         }
     }
 
     QTAILQ_FOREACH(block, &s->guest_phys_blocks.head, next) {
-        if (s->has_filter) {
-            if (block->target_start >= s->begin + s->length ||
-                block->target_end <= s->begin) {
+        if (dump_has_filter(s)) {
+            if (block->target_start >= s->filter_area_begin + s->filter_area_length ||
+                block->target_end <= s->filter_area_begin) {
                 /* This block is out of the range */
                 continue;
             }
 
-            if (s->begin <= block->target_start) {
+            if (s->filter_area_begin <= block->target_start) {
                 start = block->target_start;
             } else {
-                start = s->begin;
+                start = s->filter_area_begin;
             }
 
             size_in_block = block->target_end - start;
-            if (s->begin + s->length < block->target_end) {
-                size_in_block -= block->target_end - (s->begin + s->length);
+            if (s->filter_area_begin + s->filter_area_length < block->target_end) {
+                size_in_block -= block->target_end - (s->filter_area_begin + s->filter_area_length);
             }
         } else {
             start = block->target_start;
@@ -639,12 +645,12 @@ static void dump_iterate(DumpState *s, Error **errp)
     int64_t memblock_size, memblock_start;
 
     QTAILQ_FOREACH(block, &s->guest_phys_blocks.head, next) {
-        memblock_start = dump_filtered_memblock_start(block, s->begin, s->length);
+        memblock_start = dump_filtered_memblock_start(block, s->filter_area_begin, s->filter_area_length);
         if (memblock_start == -1) {
             continue;
         }
 
-        memblock_size = dump_filtered_memblock_size(block, s->begin, s->length);
+        memblock_size = dump_filtered_memblock_size(block, s->filter_area_begin, s->filter_area_length);
 
         /* Write the memory to file */
         write_memory(s, block, memblock_start, memblock_size, errp);
@@ -1513,14 +1519,14 @@ static int validate_start_block(DumpState *s)
 {
     GuestPhysBlock *block;
 
-    if (!s->has_filter) {
+    if (!dump_has_filter(s)) {
         return 0;
     }
 
     QTAILQ_FOREACH(block, &s->guest_phys_blocks.head, next) {
         /* This block is out of the range */
-        if (block->target_start >= s->begin + s->length ||
-            block->target_end <= s->begin) {
+        if (block->target_start >= s->filter_area_begin + s->filter_area_length ||
+            block->target_end <= s->filter_area_begin) {
             continue;
         }
         return 0;
@@ -1559,10 +1565,10 @@ static int64_t dump_calculate_size(DumpState *s)
     int64_t size = 0, total = 0, left = 0, right = 0;
 
     QTAILQ_FOREACH(block, &s->guest_phys_blocks.head, next) {
-        if (s->has_filter) {
+        if (dump_has_filter(s)) {
             /* calculate the overlapped region. */
-            left = MAX(s->begin, block->target_start);
-            right = MIN(s->begin + s->length, block->target_end);
+            left = MAX(s->filter_area_begin, block->target_start);
+            right = MIN(s->filter_area_begin + s->filter_area_length, block->target_end);
             size = right - left;
             size = size > 0 ? size : 0;
         } else {
@@ -1652,9 +1658,12 @@ static void dump_init(DumpState *s, int fd, bool has_format,
     }
 
     s->fd = fd;
-    s->has_filter = has_filter;
-    s->begin = begin;
-    s->length = length;
+    if (has_filter && !length) {
+        error_setg(errp, QERR_INVALID_PARAMETER, "length");
+        goto cleanup;
+    }
+    s->filter_area_begin = begin;
+    s->filter_area_length = length;
 
     memory_mapping_list_init(&s->list);
 
@@ -1787,8 +1796,8 @@ static void dump_init(DumpState *s, int fd, bool has_format,
         return;
     }
 
-    if (s->has_filter) {
-        memory_mapping_filter(&s->list, s->begin, s->length);
+    if (dump_has_filter(s)) {
+        memory_mapping_filter(&s->list, s->filter_area_begin, s->filter_area_length);
     }
 
     /*
diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index 7fce1d4af67..b62513d87d6 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -166,9 +166,16 @@ typedef struct DumpState {
     hwaddr memory_offset;
     int fd;
 
-    bool has_filter;
-    int64_t begin;
-    int64_t length;
+    /*
+     * Dump filter area variables
+     *
+     * A filtered dump only contains the guest memory designated by
+     * the start address and length variables defined below.
+     *
+     * If length is 0, no filtering is applied.
+     */
+    int64_t filter_area_begin;  /* Start address of partial guest memory area */
+    int64_t filter_area_length; /* Length of partial guest memory area */
 
     uint8_t *note_buf;          /* buffer for notes */
     size_t note_buf_offset;     /* the writing place in note_buf */
-- 
Gitee


From cc834c4005f46413f221a9321fdf6121ff68c38b Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Thu, 11 Aug 2022 12:10:59 +0000
Subject: [PATCH 193/407] dump: Rework dump_calculate_size function
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [27/41] eaa05c39109b57a119752ad3df66f4c2ace2cbe4

dump_calculate_size() sums up the sizes of all guest memory blocks.
Since we already have a function that calculates the size of a single
memory block (dump_filtered_memblock_size()), we can simply iterate
over the blocks and use that function instead of calculating the size
ourselves.
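
As a rough standalone sketch of the idea (Block, block_size_filtered()
and total_size() are hypothetical names, and treating a filter length
of 0 as "count the whole block" is an assumption of this sketch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified block: [start, end) in guest physical space. */
typedef struct Block {
    int64_t start;
    int64_t end;
} Block;

/*
 * Size of the part of one block that lies inside the filter area.
 * A filter length of 0 means the whole block is counted.
 */
static int64_t block_size_filtered(const Block *b, int64_t begin,
                                   int64_t length)
{
    int64_t left, right;

    assert(b != NULL && b->end >= b->start);
    if (length == 0) {
        return b->end - b->start;
    }
    left = b->start > begin ? b->start : begin;
    right = b->end < begin + length ? b->end : begin + length;
    return right > left ? right - left : 0;
}

/* The total is then just a sum over the per-block helper. */
static int64_t total_size(const Block *blocks, size_t n, int64_t begin,
                          int64_t length)
{
    int64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        total += block_size_filtered(&blocks[i], begin, length);
    }
    return total;
}
```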

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Message-Id: <20220811121111.9878-7-frankja@linux.ibm.com>
(cherry picked from commit c370d5300f9ac1f90f8158082d22262b904fe30e)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index f6fe13e2582..902a85ef8e6 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -1557,25 +1557,19 @@ bool dump_in_progress(void)
     return (qatomic_read(&state->status) == DUMP_STATUS_ACTIVE);
 }
 
-/* calculate total size of memory to be dumped (taking filter into
- * acoount.) */
+/*
+ * calculate total size of memory to be dumped (taking filter into
+ * account.)
+ */
 static int64_t dump_calculate_size(DumpState *s)
 {
     GuestPhysBlock *block;
-    int64_t size = 0, total = 0, left = 0, right = 0;
+    int64_t total = 0;
 
     QTAILQ_FOREACH(block, &s->guest_phys_blocks.head, next) {
-        if (dump_has_filter(s)) {
-            /* calculate the overlapped region. */
-            left = MAX(s->filter_area_begin, block->target_start);
-            right = MIN(s->filter_area_begin + s->filter_area_length, block->target_end);
-            size = right - left;
-            size = size > 0 ? size : 0;
-        } else {
-            /* count the whole region in */
-            size = (block->target_end - block->target_start);
-        }
-        total += size;
+        total += dump_filtered_memblock_size(block,
+                                             s->filter_area_begin,
+                                             s->filter_area_length);
     }
 
     return total;
-- 
Gitee


From 32d355f51918f1782e8be5079bd247516f10ad66 Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Thu, 11 Aug 2022 12:11:00 +0000
Subject: [PATCH 194/407] dump: Split elf header functions into prepare and
 write
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [28/41] f70a13ad443835e7f46b7c5e176e372d370ac797

Let's split the write from the modification of the elf header so we
can consolidate the write of the data in one function.
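
A minimal standalone sketch of the prepare/write split, for
illustration only. Hdr32, Hdr64, WriteFn, write_hdr() and
count_bytes() are hypothetical and heavily reduced; they are not the
real ELF types or QEMU functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-ins for Elf32_Ehdr / Elf64_Ehdr. */
typedef struct Hdr32 { uint8_t ident[4]; uint32_t phoff; } Hdr32;
typedef struct Hdr64 { uint8_t ident[4]; uint64_t phoff; } Hdr64;

/* Hypothetical write callback: returns 0 on success, negative on error. */
typedef int (*WriteFn)(const void *buf, size_t size, void *opaque);

/* Prepare steps only fill the caller's buffer; they perform no I/O. */
static void prepare_hdr32(Hdr32 *h, uint32_t phoff)
{
    memset(h, 0, sizeof(*h));   /* size of the struct, not the pointer */
    memcpy(h->ident, "\177ELF", 4);
    h->phoff = phoff;
}

static void prepare_hdr64(Hdr64 *h, uint64_t phoff)
{
    memset(h, 0, sizeof(*h));
    memcpy(h->ident, "\177ELF", 4);
    h->phoff = phoff;
}

/* One write path shared by both ELF classes. */
static int write_hdr(int is_64bit, uint64_t phoff, WriteFn fn, void *opaque)
{
    Hdr32 h32;
    Hdr64 h64;
    const void *ptr;
    size_t size;

    assert(fn != NULL);
    if (is_64bit) {
        prepare_hdr64(&h64, phoff);
        ptr = &h64;
        size = sizeof(h64);
    } else {
        prepare_hdr32(&h32, (uint32_t)phoff);
        ptr = &h32;
        size = sizeof(h32);
    }
    return fn(ptr, size, opaque);
}

/* Example sink that only counts bytes; opaque points to a size_t. */
static int count_bytes(const void *buf, size_t size, void *opaque)
{
    (void)buf;
    *(size_t *)opaque += size;
    return 0;
}
```

With this shape, error handling for the write lives in one place
regardless of the ELF class.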

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220811121111.9878-8-frankja@linux.ibm.com>
(cherry picked from commit 670e76998a61ca171200fcded3865b294a2d1243)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c | 100 ++++++++++++++++++++++++++++------------------------
 1 file changed, 53 insertions(+), 47 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 902a85ef8e6..8d5226f8610 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -132,7 +132,7 @@ static int fd_write_vmcore(const void *buf, size_t size, void *opaque)
     return 0;
 }
 
-static void write_elf64_header(DumpState *s, Error **errp)
+static void prepare_elf64_header(DumpState *s, Elf64_Ehdr *elf_header)
 {
     /*
      * phnum in the elf header is 16 bit, if we have more segments we
@@ -140,34 +140,27 @@ static void write_elf64_header(DumpState *s, Error **errp)
      * special section.
      */
     uint16_t phnum = MIN(s->phdr_num, PN_XNUM);
-    Elf64_Ehdr elf_header;
-    int ret;
 
-    memset(&elf_header, 0, sizeof(Elf64_Ehdr));
-    memcpy(&elf_header, ELFMAG, SELFMAG);
-    elf_header.e_ident[EI_CLASS] = ELFCLASS64;
-    elf_header.e_ident[EI_DATA] = s->dump_info.d_endian;
-    elf_header.e_ident[EI_VERSION] = EV_CURRENT;
-    elf_header.e_type = cpu_to_dump16(s, ET_CORE);
-    elf_header.e_machine = cpu_to_dump16(s, s->dump_info.d_machine);
-    elf_header.e_version = cpu_to_dump32(s, EV_CURRENT);
-    elf_header.e_ehsize = cpu_to_dump16(s, sizeof(elf_header));
-    elf_header.e_phoff = cpu_to_dump64(s, s->phdr_offset);
-    elf_header.e_phentsize = cpu_to_dump16(s, sizeof(Elf64_Phdr));
-    elf_header.e_phnum = cpu_to_dump16(s, phnum);
+    memset(elf_header, 0, sizeof(Elf64_Ehdr));
+    memcpy(elf_header, ELFMAG, SELFMAG);
+    elf_header->e_ident[EI_CLASS] = ELFCLASS64;
+    elf_header->e_ident[EI_DATA] = s->dump_info.d_endian;
+    elf_header->e_ident[EI_VERSION] = EV_CURRENT;
+    elf_header->e_type = cpu_to_dump16(s, ET_CORE);
+    elf_header->e_machine = cpu_to_dump16(s, s->dump_info.d_machine);
+    elf_header->e_version = cpu_to_dump32(s, EV_CURRENT);
+    elf_header->e_ehsize = cpu_to_dump16(s, sizeof(elf_header));
+    elf_header->e_phoff = cpu_to_dump64(s, s->phdr_offset);
+    elf_header->e_phentsize = cpu_to_dump16(s, sizeof(Elf64_Phdr));
+    elf_header->e_phnum = cpu_to_dump16(s, phnum);
     if (s->shdr_num) {
-        elf_header.e_shoff = cpu_to_dump64(s, s->shdr_offset);
-        elf_header.e_shentsize = cpu_to_dump16(s, sizeof(Elf64_Shdr));
-        elf_header.e_shnum = cpu_to_dump16(s, s->shdr_num);
-    }
-
-    ret = fd_write_vmcore(&elf_header, sizeof(elf_header), s);
-    if (ret < 0) {
-        error_setg_errno(errp, -ret, "dump: failed to write elf header");
+        elf_header->e_shoff = cpu_to_dump64(s, s->shdr_offset);
+        elf_header->e_shentsize = cpu_to_dump16(s, sizeof(Elf64_Shdr));
+        elf_header->e_shnum = cpu_to_dump16(s, s->shdr_num);
     }
 }
 
-static void write_elf32_header(DumpState *s, Error **errp)
+static void prepare_elf32_header(DumpState *s, Elf32_Ehdr *elf_header)
 {
     /*
      * phnum in the elf header is 16 bit, if we have more segments we
@@ -175,28 +168,45 @@ static void write_elf32_header(DumpState *s, Error **errp)
      * special section.
      */
     uint16_t phnum = MIN(s->phdr_num, PN_XNUM);
-    Elf32_Ehdr elf_header;
-    int ret;
 
-    memset(&elf_header, 0, sizeof(Elf32_Ehdr));
-    memcpy(&elf_header, ELFMAG, SELFMAG);
-    elf_header.e_ident[EI_CLASS] = ELFCLASS32;
-    elf_header.e_ident[EI_DATA] = s->dump_info.d_endian;
-    elf_header.e_ident[EI_VERSION] = EV_CURRENT;
-    elf_header.e_type = cpu_to_dump16(s, ET_CORE);
-    elf_header.e_machine = cpu_to_dump16(s, s->dump_info.d_machine);
-    elf_header.e_version = cpu_to_dump32(s, EV_CURRENT);
-    elf_header.e_ehsize = cpu_to_dump16(s, sizeof(elf_header));
-    elf_header.e_phoff = cpu_to_dump32(s, s->phdr_offset);
-    elf_header.e_phentsize = cpu_to_dump16(s, sizeof(Elf32_Phdr));
-    elf_header.e_phnum = cpu_to_dump16(s, phnum);
+    memset(elf_header, 0, sizeof(Elf32_Ehdr));
+    memcpy(elf_header, ELFMAG, SELFMAG);
+    elf_header->e_ident[EI_CLASS] = ELFCLASS32;
+    elf_header->e_ident[EI_DATA] = s->dump_info.d_endian;
+    elf_header->e_ident[EI_VERSION] = EV_CURRENT;
+    elf_header->e_type = cpu_to_dump16(s, ET_CORE);
+    elf_header->e_machine = cpu_to_dump16(s, s->dump_info.d_machine);
+    elf_header->e_version = cpu_to_dump32(s, EV_CURRENT);
+    elf_header->e_ehsize = cpu_to_dump16(s, sizeof(elf_header));
+    elf_header->e_phoff = cpu_to_dump32(s, s->phdr_offset);
+    elf_header->e_phentsize = cpu_to_dump16(s, sizeof(Elf32_Phdr));
+    elf_header->e_phnum = cpu_to_dump16(s, phnum);
     if (s->shdr_num) {
-        elf_header.e_shoff = cpu_to_dump32(s, s->shdr_offset);
-        elf_header.e_shentsize = cpu_to_dump16(s, sizeof(Elf32_Shdr));
-        elf_header.e_shnum = cpu_to_dump16(s, s->shdr_num);
+        elf_header->e_shoff = cpu_to_dump32(s, s->shdr_offset);
+        elf_header->e_shentsize = cpu_to_dump16(s, sizeof(Elf32_Shdr));
+        elf_header->e_shnum = cpu_to_dump16(s, s->shdr_num);
     }
+}
 
-    ret = fd_write_vmcore(&elf_header, sizeof(elf_header), s);
+static void write_elf_header(DumpState *s, Error **errp)
+{
+    Elf32_Ehdr elf32_header;
+    Elf64_Ehdr elf64_header;
+    size_t header_size;
+    void *header_ptr;
+    int ret;
+
+    if (dump_is_64bit(s)) {
+        prepare_elf64_header(s, &elf64_header);
+        header_size = sizeof(elf64_header);
+        header_ptr = &elf64_header;
+    } else {
+        prepare_elf32_header(s, &elf32_header);
+        header_size = sizeof(elf32_header);
+        header_ptr = &elf32_header;
+    }
+
+    ret = fd_write_vmcore(header_ptr, header_size, s);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "dump: failed to write elf header");
     }
@@ -565,11 +575,7 @@ static void dump_begin(DumpState *s, Error **errp)
      */
 
     /* write elf header to vmcore */
-    if (dump_is_64bit(s)) {
-        write_elf64_header(s, errp);
-    } else {
-        write_elf32_header(s, errp);
-    }
+    write_elf_header(s, errp);
     if (*errp) {
         return;
     }
-- 
Gitee


From a2e84d9f1a6df175f6faf75b9be4dd8bcaffb701 Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Thu, 11 Aug 2022 12:11:01 +0000
Subject: [PATCH 195/407] dump: Rename write_elf*_phdr_note to
 prepare_elf*_phdr_note
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [29/41] 876cea6f6e51be8df2763f56d0daef99d11fdd49

The functions in question do not actually write to the file descriptor;
they set up a buffer which is later written to the fd.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220811121111.9878-9-frankja@linux.ibm.com>
(cherry picked from commit 2341a94d3a0a8a93a5a977e642da1807b8edaab8)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 8d5226f8610..c2c1341ad72 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -261,7 +261,7 @@ static void write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
     }
 }
 
-static void write_elf64_phdr_note(DumpState *s, Elf64_Phdr *phdr)
+static void prepare_elf64_phdr_note(DumpState *s, Elf64_Phdr *phdr)
 {
     memset(phdr, 0, sizeof(*phdr));
     phdr->p_type = cpu_to_dump32(s, PT_NOTE);
@@ -317,7 +317,7 @@ static void write_elf64_notes(WriteCoreDumpFunction f, DumpState *s,
     write_guest_note(f, s, errp);
 }
 
-static void write_elf32_phdr_note(DumpState *s, Elf32_Phdr *phdr)
+static void prepare_elf32_phdr_note(DumpState *s, Elf32_Phdr *phdr)
 {
     memset(phdr, 0, sizeof(*phdr));
     phdr->p_type = cpu_to_dump32(s, PT_NOTE);
@@ -365,11 +365,11 @@ static void write_elf_phdr_note(DumpState *s, Error **errp)
     int ret;
 
     if (dump_is_64bit(s)) {
-        write_elf64_phdr_note(s, &phdr64);
+        prepare_elf64_phdr_note(s, &phdr64);
         size = sizeof(phdr64);
         phdr = &phdr64;
     } else {
-        write_elf32_phdr_note(s, &phdr32);
+        prepare_elf32_phdr_note(s, &phdr32);
         size = sizeof(phdr32);
         phdr = &phdr32;
     }
-- 
Gitee


From e7ccd631361d9704b9046cc1450edb4ccfb53880 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marc-Andr=C3=A9=20Lureau?= <marcandre.lureau@redhat.com>
Date: Thu, 25 Aug 2022 12:40:12 +0400
Subject: [PATCH 196/407] dump: simplify a bit kdump get_next_page()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [30/41] 417ac19fa96036e0242f40121ac6e87a9f3f70ba

This should be functionally equivalent, but slightly easier to read,
with simplified paths and checks at the end of the function.

The following patch is a major rewrite to get rid of the assert().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
(cherry picked from commit 08df343874fcddd260021a04ce3c5a34f2c48164)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index c2c1341ad72..1c49232390d 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -1133,17 +1133,11 @@ static bool get_next_page(GuestPhysBlock **blockptr, uint64_t *pfnptr,
     if (!block) {
         block = QTAILQ_FIRST(&s->guest_phys_blocks.head);
         *blockptr = block;
-        assert((block->target_start & ~target_page_mask) == 0);
-        assert((block->target_end & ~target_page_mask) == 0);
-        *pfnptr = dump_paddr_to_pfn(s, block->target_start);
-        if (bufptr) {
-            *bufptr = block->host_addr;
-        }
-        return true;
+        addr = block->target_start;
+    } else {
+        addr = dump_pfn_to_paddr(s, *pfnptr + 1);
     }
-
-    *pfnptr = *pfnptr + 1;
-    addr = dump_pfn_to_paddr(s, *pfnptr);
+    assert(block != NULL);
 
     if ((addr >= block->target_start) &&
         (addr + s->dump_info.page_size <= block->target_end)) {
@@ -1155,12 +1149,13 @@ static bool get_next_page(GuestPhysBlock **blockptr, uint64_t *pfnptr,
         if (!block) {
             return false;
         }
-        assert((block->target_start & ~target_page_mask) == 0);
-        assert((block->target_end & ~target_page_mask) == 0);
-        *pfnptr = dump_paddr_to_pfn(s, block->target_start);
+        addr = block->target_start;
         buf = block->host_addr;
     }
 
+    assert((block->target_start & ~target_page_mask) == 0);
+    assert((block->target_end & ~target_page_mask) == 0);
+    *pfnptr = dump_paddr_to_pfn(s, addr);
     if (bufptr) {
         *bufptr = buf;
     }
-- 
Gitee


From fd710c2361a3e4909d8cb835910d35ca32b118b5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marc-Andr=C3=A9=20Lureau?= <marcandre.lureau@redhat.com>
Date: Mon, 5 Sep 2022 16:06:21 +0400
Subject: [PATCH 197/407] dump: fix kdump to work over non-aligned blocks
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [31/41] b307bdce4a4791fc30160fa2a1678bd238f2432e

Rewrite get_next_page() to work over non-aligned blocks. When it
encounters non-aligned addresses, it will try to fill a page provided by
the caller.

This solves a kdump crash with "tpm-crb-cmd" RAM memory region,
qemu-kvm: ../dump/dump.c:1162: _Bool get_next_page(GuestPhysBlock **,
uint64_t *, uint8_t **, DumpState *): Assertion `(block->target_start &
~target_page_mask) == 0' failed.

because:
guest_phys_block_add_section: target_start=00000000fed40080 target_end=00000000fed41000: added (count: 4)
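
To illustrate the copy step for such a block, here is a small
standalone sketch (chunk_in_page() is a hypothetical helper, and a
4096-byte target page is assumed in the example below). It computes
how many bytes can be copied from an address without leaving either
the block or the current target page:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of bytes that can be copied starting at addr without crossing
 * the end of the block or the end of the target page containing addr.
 */
static size_t chunk_in_page(uint64_t addr, uint64_t block_end,
                            uint32_t page_size)
{
    uint64_t to_block_end, to_page_end;

    assert(page_size != 0 && addr < block_end);
    to_block_end = block_end - addr;
    to_page_end = page_size - addr % page_size;
    return (size_t)(to_block_end < to_page_end ? to_block_end : to_page_end);
}
```

For the block above, chunk_in_page(0xfed40080, 0xfed41000, 4096)
yields 0xf80 bytes, which land at offset 0x80 of the zeroed page
supplied by the caller.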

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=2120480

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
(cherry picked from commit 94d788408d2d5a6474c99b2c9cf06913b9db7c58)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c | 79 +++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 56 insertions(+), 23 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 1c49232390d..88177fa886a 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -1117,50 +1117,81 @@ static uint64_t dump_pfn_to_paddr(DumpState *s, uint64_t pfn)
 }
 
 /*
- * exam every page and return the page frame number and the address of the page.
- * bufptr can be NULL. note: the blocks here is supposed to reflect guest-phys
- * blocks, so block->target_start and block->target_end should be interal
- * multiples of the target page size.
+ * Return the page frame number and the page content in *bufptr. bufptr can be
+ * NULL. If not NULL, *bufptr must contains a target page size of pre-allocated
+ * memory. This is not necessarily the memory returned.
  */
 static bool get_next_page(GuestPhysBlock **blockptr, uint64_t *pfnptr,
                           uint8_t **bufptr, DumpState *s)
 {
     GuestPhysBlock *block = *blockptr;
-    hwaddr addr, target_page_mask = ~((hwaddr)s->dump_info.page_size - 1);
-    uint8_t *buf;
+    uint32_t page_size = s->dump_info.page_size;
+    uint8_t *buf = NULL, *hbuf;
+    hwaddr addr;
 
     /* block == NULL means the start of the iteration */
     if (!block) {
         block = QTAILQ_FIRST(&s->guest_phys_blocks.head);
         *blockptr = block;
         addr = block->target_start;
+        *pfnptr = dump_paddr_to_pfn(s, addr);
     } else {
-        addr = dump_pfn_to_paddr(s, *pfnptr + 1);
+        *pfnptr += 1;
+        addr = dump_pfn_to_paddr(s, *pfnptr);
     }
     assert(block != NULL);
 
-    if ((addr >= block->target_start) &&
-        (addr + s->dump_info.page_size <= block->target_end)) {
-        buf = block->host_addr + (addr - block->target_start);
-    } else {
-        /* the next page is in the next block */
-        block = QTAILQ_NEXT(block, next);
-        *blockptr = block;
-        if (!block) {
-            return false;
+    while (1) {
+        if (addr >= block->target_start && addr < block->target_end) {
+            size_t n = MIN(block->target_end - addr, page_size - addr % page_size);
+            hbuf = block->host_addr + (addr - block->target_start);
+            if (!buf) {
+                if (n == page_size) {
+                    /* this is a whole target page, go for it */
+                    assert(addr % page_size == 0);
+                    buf = hbuf;
+                    break;
+                } else if (bufptr) {
+                    assert(*bufptr);
+                    buf = *bufptr;
+                    memset(buf, 0, page_size);
+                } else {
+                    return true;
+                }
+            }
+
+            memcpy(buf + addr % page_size, hbuf, n);
+            addr += n;
+            if (addr % page_size == 0) {
+                /* we filled up the page */
+                break;
+            }
+        } else {
+            /* the next page is in the next block */
+            *blockptr = block = QTAILQ_NEXT(block, next);
+            if (!block) {
+                break;
+            }
+
+            addr = block->target_start;
+            /* are we still in the same page? */
+            if (dump_paddr_to_pfn(s, addr) != *pfnptr) {
+                if (buf) {
+                    /* no, but we already filled something earlier, return it */
+                    break;
+                } else {
+                    /* else continue from there */
+                    *pfnptr = dump_paddr_to_pfn(s, addr);
+                }
+            }
         }
-        addr = block->target_start;
-        buf = block->host_addr;
     }
 
-    assert((block->target_start & ~target_page_mask) == 0);
-    assert((block->target_end & ~target_page_mask) == 0);
-    *pfnptr = dump_paddr_to_pfn(s, addr);
     if (bufptr) {
         *bufptr = buf;
     }
 
-    return true;
+    return buf != NULL;
 }
 
 static void write_dump_bitmap(DumpState *s, Error **errp)
@@ -1306,6 +1337,7 @@ static void write_dump_pages(DumpState *s, Error **errp)
     uint8_t *buf;
     GuestPhysBlock *block_iter = NULL;
     uint64_t pfn_iter;
+    g_autofree uint8_t *page = NULL;
 
     /* get offset of page_desc and page_data in dump file */
     offset_desc = s->offset_page;
@@ -1341,12 +1373,13 @@ static void write_dump_pages(DumpState *s, Error **errp)
     }
 
     offset_data += s->dump_info.page_size;
+    page = g_malloc(s->dump_info.page_size);
 
     /*
      * dump memory to vmcore page by page. zero page will all be resided in the
      * first page of page section
      */
-    while (get_next_page(&block_iter, &pfn_iter, &buf, s)) {
+    for (buf = page; get_next_page(&block_iter, &pfn_iter, &buf, s); buf = page) {
         /* check zero page */
         if (is_zero_page(buf, s->dump_info.page_size)) {
             ret = write_cache(&page_desc, &pd_zero, sizeof(PageDescriptor),
-- 
Gitee


From 5db606b95e896772caa5240125620fb8daf88011 Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Mon, 17 Oct 2022 08:38:13 +0000
Subject: [PATCH 198/407] dump: Use a buffer for ELF section data and headers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [32/41] e1a03e202e67764581e486f37e13e479200e5846

Currently we're writing the NULL section header if we overflow the
program header number in the ELF header. But in the future we'll add
custom section headers AND section data.

To facilitate this we need to rearrange section handling a bit. As with
the other ELF headers we split the code into a prepare and a write
step.
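
A standalone sketch of the prepare step, for illustration only. Shdr
and prepare_section_hdrs() are hypothetical and reduced, calloc()
stands in for g_malloc0(), and the endianness conversion is omitted:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PN_XNUM 0xffff   /* same value as PN_XNUM in the ELF spec */

/* Hypothetical reduced section header; the real Elf64_Shdr has more fields. */
typedef struct Shdr {
    uint32_t sh_name;
    uint32_t sh_type;
    uint32_t sh_info;
} Shdr;

/*
 * Allocate a zeroed section header table. Entry 0 stays all zero unless
 * the program header count overflowed the 16-bit e_phnum field, in which
 * case sh_info carries the real count.
 * Returns NULL if shdr_num is 0 or on allocation failure; the caller
 * releases the table with free().
 */
static Shdr *prepare_section_hdrs(uint32_t shdr_num, uint32_t phdr_num)
{
    Shdr *hdrs;

    if (shdr_num == 0) {
        return NULL;
    }
    hdrs = calloc(shdr_num, sizeof(*hdrs));
    if (!hdrs) {
        return NULL;
    }
    if (phdr_num >= SKETCH_PN_XNUM) {
        hdrs[0].sh_info = phdr_num;
    }
    return hdrs;
}
```

Because the table is zero-allocated, only the overflow case needs to
touch the first entry; the write step can then emit the whole buffer
in one call.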

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20221017083822.43118-2-frankja@linux.ibm.com>
(cherry picked from commit e41ed29bcee5cb16715317bcf290f6b5c196eb0a)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c           | 75 +++++++++++++++++++++++++++++--------------
 include/sysemu/dump.h |  2 ++
 2 files changed, 53 insertions(+), 24 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 88177fa886a..4142b4cc0cb 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -381,31 +381,60 @@ static void write_elf_phdr_note(DumpState *s, Error **errp)
     }
 }
 
-static void write_elf_section(DumpState *s, int type, Error **errp)
+static void prepare_elf_section_hdr_zero(DumpState *s)
 {
-    Elf32_Shdr shdr32;
-    Elf64_Shdr shdr64;
-    int shdr_size;
-    void *shdr;
-    int ret;
+    if (dump_is_64bit(s)) {
+        Elf64_Shdr *shdr64 = s->elf_section_hdrs;
 
-    if (type == 0) {
-        shdr_size = sizeof(Elf32_Shdr);
-        memset(&shdr32, 0, shdr_size);
-        shdr32.sh_info = cpu_to_dump32(s, s->phdr_num);
-        shdr = &shdr32;
+        shdr64->sh_info = cpu_to_dump32(s, s->phdr_num);
     } else {
-        shdr_size = sizeof(Elf64_Shdr);
-        memset(&shdr64, 0, shdr_size);
-        shdr64.sh_info = cpu_to_dump32(s, s->phdr_num);
-        shdr = &shdr64;
+        Elf32_Shdr *shdr32 = s->elf_section_hdrs;
+
+        shdr32->sh_info = cpu_to_dump32(s, s->phdr_num);
+    }
+}
+
+static void prepare_elf_section_hdrs(DumpState *s)
+{
+    size_t len, sizeof_shdr;
+
+    /*
+     * Section ordering:
+     * - HDR zero
+     */
+    sizeof_shdr = dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
+    len = sizeof_shdr * s->shdr_num;
+    s->elf_section_hdrs = g_malloc0(len);
+
+    /*
+     * The first section header is ALWAYS a special initial section
+     * header.
+     *
+     * The header should be 0 with one exception being that if
+     * phdr_num is PN_XNUM then the sh_info field contains the real
+     * number of segment entries.
+     *
+     * As we zero allocate the buffer we will only need to modify
+     * sh_info for the PN_XNUM case.
+     */
+    if (s->phdr_num >= PN_XNUM) {
+        prepare_elf_section_hdr_zero(s);
     }
+}
 
-    ret = fd_write_vmcore(shdr, shdr_size, s);
+static void write_elf_section_headers(DumpState *s, Error **errp)
+{
+    size_t sizeof_shdr = dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
+    int ret;
+
+    prepare_elf_section_hdrs(s);
+
+    ret = fd_write_vmcore(s->elf_section_hdrs, s->shdr_num * sizeof_shdr, s);
     if (ret < 0) {
-        error_setg_errno(errp, -ret,
-                         "dump: failed to write section header table");
+        error_setg_errno(errp, -ret, "dump: failed to write section headers");
     }
+
+    g_free(s->elf_section_hdrs);
 }
 
 static void write_data(DumpState *s, void *buf, int length, Error **errp)
@@ -592,12 +621,10 @@ static void dump_begin(DumpState *s, Error **errp)
         return;
     }
 
-    /* write section to vmcore */
-    if (s->shdr_num) {
-        write_elf_section(s, 1, errp);
-        if (*errp) {
-            return;
-        }
+    /* write section headers to vmcore */
+    write_elf_section_headers(s, errp);
+    if (*errp) {
+        return;
     }
 
     /* write notes to vmcore */
diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index b62513d87d6..9995f65dc8b 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -177,6 +177,8 @@ typedef struct DumpState {
     int64_t filter_area_begin;  /* Start address of partial guest memory area */
     int64_t filter_area_length; /* Length of partial guest memory area */
 
+    void *elf_section_hdrs;     /* Pointer to section header buffer */
+
     uint8_t *note_buf;          /* buffer for notes */
     size_t note_buf_offset;     /* the writing place in note_buf */
     uint32_t nr_cpus;           /* number of guest's cpu */
-- 
Gitee


From 49f9453130eddc2339ca5a4e40bda04860dc957c Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Mon, 17 Oct 2022 08:38:14 +0000
Subject: [PATCH 199/407] dump: Write ELF section headers right after ELF
 header
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [33/41] e956040753533ac376e9763145192de1e216027d

Let's start bundling the writes of the headers and of the data so we
have a clear ordering between them. Since the ELF header locates the
section and program header tables by offset, we can order them freely.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20221017083822.43118-3-frankja@linux.ibm.com>
(cherry picked from commit cb415fd61e48d52f81dcf38956e3f913651cff1c)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c | 31 ++++++++++++++-----------------
 1 file changed, 14 insertions(+), 17 deletions(-)
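
Note (illustration only, not part of the change): with the section
headers placed directly behind the ELF header, the header offsets chain
as sketched below. This is a minimal sketch for the 64-bit case; the
struct and function names are hypothetical stand-ins for the DumpState
offset fields, and the constants are the fixed ELF64 structure sizes.

```c
#include <stdint.h>

/* Fixed ELF64 structure sizes in bytes, per the ELF specification. */
enum { EHDR64_SIZE = 64, SHDR64_SIZE = 64, PHDR64_SIZE = 56 };

/* Hypothetical stand-in for the offset fields kept in DumpState. */
struct layout {
    uint64_t shdr_offset;   /* start of the section header table */
    uint64_t phdr_offset;   /* start of the program header table */
    uint64_t note_offset;   /* start of the ELF notes */
};

/*
 * Compute the header offsets in the order used after this patch:
 * ELF header, section headers, program headers, notes.
 */
static struct layout layout64(uint32_t shdr_num, uint32_t phdr_num)
{
    struct layout l;

    l.shdr_offset = EHDR64_SIZE;
    l.phdr_offset = l.shdr_offset + (uint64_t)SHDR64_SIZE * shdr_num;
    l.note_offset = l.phdr_offset + (uint64_t)PHDR64_SIZE * phdr_num;
    return l;
}
```

For two section headers and three program headers this yields offsets
64, 192 and 360. Swapping the two header tables only changes which
offset feeds into which; readers follow e_shoff and e_phoff either way.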

diff --git a/dump/dump.c b/dump/dump.c
index 4142b4cc0cb..d17537d4e93 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -584,6 +584,8 @@ static void dump_begin(DumpState *s, Error **errp)
      *   --------------
      *   |  elf header |
      *   --------------
+     *   |  sctn_hdr   |
+     *   --------------
      *   |  PT_NOTE    |
      *   --------------
      *   |  PT_LOAD    |
@@ -592,8 +594,6 @@ static void dump_begin(DumpState *s, Error **errp)
      *   --------------
      *   |  PT_LOAD    |
      *   --------------
-     *   |  sec_hdr    |
-     *   --------------
      *   |  elf note   |
      *   --------------
      *   |  memory     |
@@ -609,20 +609,20 @@ static void dump_begin(DumpState *s, Error **errp)
         return;
     }
 
-    /* write PT_NOTE to vmcore */
-    write_elf_phdr_note(s, errp);
+    /* write section headers to vmcore */
+    write_elf_section_headers(s, errp);
     if (*errp) {
         return;
     }
 
-    /* write all PT_LOADs to vmcore */
-    write_elf_phdr_loads(s, errp);
+    /* write PT_NOTE to vmcore */
+    write_elf_phdr_note(s, errp);
     if (*errp) {
         return;
     }
 
-    /* write section headers to vmcore */
-    write_elf_section_headers(s, errp);
+    /* write all PT_LOADs to vmcore */
+    write_elf_phdr_loads(s, errp);
     if (*errp) {
         return;
     }
@@ -1877,16 +1877,13 @@ static void dump_init(DumpState *s, int fd, bool has_format,
     }
 
     if (dump_is_64bit(s)) {
-        s->phdr_offset = sizeof(Elf64_Ehdr);
-        s->shdr_offset = s->phdr_offset + sizeof(Elf64_Phdr) * s->phdr_num;
-        s->note_offset = s->shdr_offset + sizeof(Elf64_Shdr) * s->shdr_num;
-        s->memory_offset = s->note_offset + s->note_size;
+        s->shdr_offset = sizeof(Elf64_Ehdr);
+        s->phdr_offset = s->shdr_offset + sizeof(Elf64_Shdr) * s->shdr_num;
+        s->note_offset = s->phdr_offset + sizeof(Elf64_Phdr) * s->phdr_num;
     } else {
-
-        s->phdr_offset = sizeof(Elf32_Ehdr);
-        s->shdr_offset = s->phdr_offset + sizeof(Elf32_Phdr) * s->phdr_num;
-        s->note_offset = s->shdr_offset + sizeof(Elf32_Shdr) * s->shdr_num;
-        s->memory_offset = s->note_offset + s->note_size;
+        s->shdr_offset = sizeof(Elf32_Ehdr);
+        s->phdr_offset = s->shdr_offset + sizeof(Elf32_Shdr) * s->shdr_num;
+        s->note_offset = s->phdr_offset + sizeof(Elf32_Phdr) * s->phdr_num;
     }
 
     return;
-- 
Gitee


From 1da44fe42a13524f462fb79ec9f925aca7b67584 Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Mon, 17 Oct 2022 08:38:15 +0000
Subject: [PATCH 200/407] dump: Reorder struct DumpState
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [34/41] 8d44e5e8c86ea5b33644eba141046cd657d0071e

Let's move ELF related members into one block and guest memory related
ones into another to improve readability.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20221017083822.43118-4-frankja@linux.ibm.com>
(cherry picked from commit 8384b73c46fd474847d7e74d121318e344edc3c4)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 include/sysemu/dump.h | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index 9995f65dc8b..9ed811b3133 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -154,15 +154,8 @@ typedef struct DumpState {
     GuestPhysBlockList guest_phys_blocks;
     ArchDumpInfo dump_info;
     MemoryMappingList list;
-    uint32_t phdr_num;
-    uint32_t shdr_num;
     bool resume;
     bool detached;
-    ssize_t note_size;
-    hwaddr shdr_offset;
-    hwaddr phdr_offset;
-    hwaddr section_offset;
-    hwaddr note_offset;
     hwaddr memory_offset;
     int fd;
 
@@ -177,6 +170,15 @@ typedef struct DumpState {
     int64_t filter_area_begin;  /* Start address of partial guest memory area */
     int64_t filter_area_length; /* Length of partial guest memory area */
 
+    /* Elf dump related data */
+    uint32_t phdr_num;
+    uint32_t shdr_num;
+    ssize_t note_size;
+    hwaddr shdr_offset;
+    hwaddr phdr_offset;
+    hwaddr section_offset;
+    hwaddr note_offset;
+
     void *elf_section_hdrs;     /* Pointer to section header buffer */
 
     uint8_t *note_buf;          /* buffer for notes */
-- 
Gitee


From da430025044dc9d52ba31e88cb31162db5ce3da1 Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Mon, 17 Oct 2022 08:38:16 +0000
Subject: [PATCH 201/407] dump: Reintroduce memory_offset and section_offset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [35/41] e60c0d066aeeedb42e724712bc3aa7b7591c6c79

section_offset will later hold the offset of the section data, which
is stored last. For now, memory_offset is only needed to make the
section_offset calculation look nicer.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20221017083822.43118-5-frankja@linux.ibm.com>
(cherry picked from commit 13fd417ddc81a1685c6a8f4e1c80bbfe7150f164)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/dump/dump.c b/dump/dump.c
index d17537d4e93..7a424017905 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -1885,6 +1885,8 @@ static void dump_init(DumpState *s, int fd, bool has_format,
         s->phdr_offset = s->shdr_offset + sizeof(Elf32_Shdr) * s->shdr_num;
         s->note_offset = s->phdr_offset + sizeof(Elf32_Phdr) * s->phdr_num;
     }
+    s->memory_offset = s->note_offset + s->note_size;
+    s->section_offset = s->memory_offset + s->total_size;
 
     return;
 
-- 
Gitee


From 1ce2c36236084cc76ad2dd7d6fe5875568b1c27e Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Mon, 17 Oct 2022 11:32:10 +0000
Subject: [PATCH 202/407] dump: Add architecture section and section string
 table support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [36/41] 83b98ff185e93e62703f686b65546d60c783d783

Add hooks which architectures can use to add arbitrary data to custom
sections.

Also add a section name string table in order to identify section
contents.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20221017113210.41674-1-frankja@linux.ibm.com>
(cherry picked from commit 9b72224f44612ddd5b434a1bccf79346946d11da)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 dump/dump.c                | 186 +++++++++++++++++++++++++++++++------
 include/sysemu/dump-arch.h |   3 +
 include/sysemu/dump.h      |   3 +
 3 files changed, 166 insertions(+), 26 deletions(-)
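
Note (illustration only, not part of the change): the section name
string table is a byte buffer whose first byte is NUL, so index 0 is
the empty name. Each appended name is referenced by the buffer length
before the append, which is the value stored in sh_name. The sketch
below shows that bookkeeping with plain realloc() instead of the GArray
used in the patch; struct strtab, strtab_init() and strtab_add() are
hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer standing in for the GArray. */
struct strtab {
    char *data;
    size_t len;
};

/* Start with a single NUL byte: the empty name at index 0. */
static int strtab_init(struct strtab *t)
{
    t->data = calloc(1, 1);
    t->len = t->data ? 1 : 0;
    return t->data ? 0 : -1;
}

/*
 * Append a name including its terminating NUL and return the index to
 * store in sh_name, or SIZE_MAX if the allocation fails.
 */
static size_t strtab_add(struct strtab *t, const char *name)
{
    size_t index = t->len;
    size_t n = strlen(name) + 1;
    char *p = realloc(t->data, t->len + n);

    if (!p) {
        return SIZE_MAX;
    }
    memcpy(p + index, name, n);
    t->data = p;
    t->len += n;
    return index;
}
```

Adding ".shstrtab" to a fresh table returns index 1 and leaves a table
of 11 bytes. This is why the string table header has to be prepared
last: its sh_size is only final once all names have been added.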

diff --git a/dump/dump.c b/dump/dump.c
index 7a424017905..4aa8fb64d24 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -104,6 +104,7 @@ static int dump_cleanup(DumpState *s)
     memory_mapping_list_free(&s->list);
     close(s->fd);
     g_free(s->guest_note);
+    g_array_unref(s->string_table_buf);
     s->guest_note = NULL;
     if (s->resume) {
         if (s->detached) {
@@ -153,11 +154,10 @@ static void prepare_elf64_header(DumpState *s, Elf64_Ehdr *elf_header)
     elf_header->e_phoff = cpu_to_dump64(s, s->phdr_offset);
     elf_header->e_phentsize = cpu_to_dump16(s, sizeof(Elf64_Phdr));
     elf_header->e_phnum = cpu_to_dump16(s, phnum);
-    if (s->shdr_num) {
-        elf_header->e_shoff = cpu_to_dump64(s, s->shdr_offset);
-        elf_header->e_shentsize = cpu_to_dump16(s, sizeof(Elf64_Shdr));
-        elf_header->e_shnum = cpu_to_dump16(s, s->shdr_num);
-    }
+    elf_header->e_shoff = cpu_to_dump64(s, s->shdr_offset);
+    elf_header->e_shentsize = cpu_to_dump16(s, sizeof(Elf64_Shdr));
+    elf_header->e_shnum = cpu_to_dump16(s, s->shdr_num);
+    elf_header->e_shstrndx = cpu_to_dump16(s, s->shdr_num - 1);
 }
 
 static void prepare_elf32_header(DumpState *s, Elf32_Ehdr *elf_header)
@@ -181,11 +181,10 @@ static void prepare_elf32_header(DumpState *s, Elf32_Ehdr *elf_header)
     elf_header->e_phoff = cpu_to_dump32(s, s->phdr_offset);
     elf_header->e_phentsize = cpu_to_dump16(s, sizeof(Elf32_Phdr));
     elf_header->e_phnum = cpu_to_dump16(s, phnum);
-    if (s->shdr_num) {
-        elf_header->e_shoff = cpu_to_dump32(s, s->shdr_offset);
-        elf_header->e_shentsize = cpu_to_dump16(s, sizeof(Elf32_Shdr));
-        elf_header->e_shnum = cpu_to_dump16(s, s->shdr_num);
-    }
+    elf_header->e_shoff = cpu_to_dump32(s, s->shdr_offset);
+    elf_header->e_shentsize = cpu_to_dump16(s, sizeof(Elf32_Shdr));
+    elf_header->e_shnum = cpu_to_dump16(s, s->shdr_num);
+    elf_header->e_shstrndx = cpu_to_dump16(s, s->shdr_num - 1);
 }
 
 static void write_elf_header(DumpState *s, Error **errp)
@@ -196,6 +195,8 @@ static void write_elf_header(DumpState *s, Error **errp)
     void *header_ptr;
     int ret;
 
+    /* The NULL header and the shstrtab are always defined */
+    assert(s->shdr_num >= 2);
     if (dump_is_64bit(s)) {
         prepare_elf64_header(s, &elf64_header);
         header_size = sizeof(elf64_header);
@@ -394,17 +395,49 @@ static void prepare_elf_section_hdr_zero(DumpState *s)
     }
 }
 
-static void prepare_elf_section_hdrs(DumpState *s)
+static void prepare_elf_section_hdr_string(DumpState *s, void *buff)
+{
+    uint64_t index = s->string_table_buf->len;
+    const char strtab[] = ".shstrtab";
+    Elf32_Shdr shdr32 = {};
+    Elf64_Shdr shdr64 = {};
+    int shdr_size;
+    void *shdr;
+
+    g_array_append_vals(s->string_table_buf, strtab, sizeof(strtab));
+    if (dump_is_64bit(s)) {
+        shdr_size = sizeof(Elf64_Shdr);
+        shdr64.sh_type = SHT_STRTAB;
+        shdr64.sh_offset = s->section_offset + s->elf_section_data_size;
+        shdr64.sh_name = index;
+        shdr64.sh_size = s->string_table_buf->len;
+        shdr = &shdr64;
+    } else {
+        shdr_size = sizeof(Elf32_Shdr);
+        shdr32.sh_type = SHT_STRTAB;
+        shdr32.sh_offset = s->section_offset + s->elf_section_data_size;
+        shdr32.sh_name = index;
+        shdr32.sh_size = s->string_table_buf->len;
+        shdr = &shdr32;
+    }
+    memcpy(buff, shdr, shdr_size);
+}
+
+static bool prepare_elf_section_hdrs(DumpState *s, Error **errp)
 {
     size_t len, sizeof_shdr;
+    void *buff_hdr;
 
     /*
      * Section ordering:
      * - HDR zero
+     * - Arch section hdrs
+     * - String table hdr
      */
     sizeof_shdr = dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
     len = sizeof_shdr * s->shdr_num;
     s->elf_section_hdrs = g_malloc0(len);
+    buff_hdr = s->elf_section_hdrs;
 
     /*
      * The first section header is ALWAYS a special initial section
@@ -420,6 +453,26 @@ static void prepare_elf_section_hdrs(DumpState *s)
     if (s->phdr_num >= PN_XNUM) {
         prepare_elf_section_hdr_zero(s);
     }
+    buff_hdr += sizeof_shdr;
+
+    /* Add architecture defined section headers */
+    if (s->dump_info.arch_sections_write_hdr_fn
+        && s->shdr_num > 2) {
+        buff_hdr += s->dump_info.arch_sections_write_hdr_fn(s, buff_hdr);
+
+        if (s->shdr_num >= SHN_LORESERVE) {
+            error_setg_errno(errp, EINVAL,
+                             "dump: too many architecture defined sections");
+            return false;
+        }
+    }
+
+    /*
+     * String table is the last section since strings are added via
+     * arch_sections_write_hdr().
+     */
+    prepare_elf_section_hdr_string(s, buff_hdr);
+    return true;
 }
 
 static void write_elf_section_headers(DumpState *s, Error **errp)
@@ -427,7 +480,9 @@ static void write_elf_section_headers(DumpState *s, Error **errp)
     size_t sizeof_shdr = dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
     int ret;
 
-    prepare_elf_section_hdrs(s);
+    if (!prepare_elf_section_hdrs(s, errp)) {
+        return;
+    }
 
     ret = fd_write_vmcore(s->elf_section_hdrs, s->shdr_num * sizeof_shdr, s);
     if (ret < 0) {
@@ -437,6 +492,29 @@ static void write_elf_section_headers(DumpState *s, Error **errp)
     g_free(s->elf_section_hdrs);
 }
 
+static void write_elf_sections(DumpState *s, Error **errp)
+{
+    int ret;
+
+    if (s->elf_section_data_size) {
+        /* Write architecture section data */
+        ret = fd_write_vmcore(s->elf_section_data,
+                              s->elf_section_data_size, s);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret,
+                             "dump: failed to write architecture section data");
+            return;
+        }
+    }
+
+    /* Write string table */
+    ret = fd_write_vmcore(s->string_table_buf->data,
+                          s->string_table_buf->len, s);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "dump: failed to write string table data");
+    }
+}
+
 static void write_data(DumpState *s, void *buf, int length, Error **errp)
 {
     int ret;
@@ -693,6 +771,31 @@ static void dump_iterate(DumpState *s, Error **errp)
     }
 }
 
+static void dump_end(DumpState *s, Error **errp)
+{
+    int rc;
+    ERRP_GUARD();
+
+    if (s->elf_section_data_size) {
+        s->elf_section_data = g_malloc0(s->elf_section_data_size);
+    }
+
+    /* Adds the architecture defined section data to s->elf_section_data  */
+    if (s->dump_info.arch_sections_write_fn &&
+        s->elf_section_data_size) {
+        rc = s->dump_info.arch_sections_write_fn(s, s->elf_section_data);
+        if (rc) {
+            error_setg_errno(errp, rc,
+                             "dump: failed to get arch section data");
+            g_free(s->elf_section_data);
+            return;
+        }
+    }
+
+    /* write sections to vmcore */
+    write_elf_sections(s, errp);
+}
+
 static void create_vmcore(DumpState *s, Error **errp)
 {
     ERRP_GUARD();
@@ -702,7 +805,14 @@ static void create_vmcore(DumpState *s, Error **errp)
         return;
     }
 
+    /* Iterate over memory and dump it to file */
     dump_iterate(s, errp);
+    if (*errp) {
+        return;
+    }
+
+    /* Write the section data */
+    dump_end(s, errp);
 }
 
 static int write_start_flat_header(int fd)
@@ -1720,6 +1830,14 @@ static void dump_init(DumpState *s, int fd, bool has_format,
     s->filter_area_begin = begin;
     s->filter_area_length = length;
 
+    /* First index is 0, it's the special null name */
+    s->string_table_buf = g_array_new(FALSE, TRUE, 1);
+    /*
+     * Allocate the null name, due to the clearing option set to true
+     * it will be 0.
+     */
+    g_array_set_size(s->string_table_buf, 1);
+
     memory_mapping_list_init(&s->list);
 
     guest_phys_blocks_init(&s->guest_phys_blocks);
@@ -1856,26 +1974,42 @@ static void dump_init(DumpState *s, int fd, bool has_format,
     }
 
     /*
-     * calculate phdr_num
+     * The first section header is always a special one in which most
+     * fields are 0. The section header string table is also always
+     * set.
+     */
+    s->shdr_num = 2;
+
+    /*
+     * Adds the number of architecture sections to shdr_num and sets
+     * elf_section_data_size so we know the offsets and sizes of all
+     * parts.
+     */
+    if (s->dump_info.arch_sections_add_fn) {
+        s->dump_info.arch_sections_add_fn(s);
+    }
+
+    /*
+     * calculate shdr_num so we know the offsets and sizes of all
+     * parts.
+     * Calculate phdr_num
      *
-     * the type of ehdr->e_phnum is uint16_t, so we should avoid overflow
+     * The absolute maximum amount of phdrs is UINT32_MAX - 1 as
+     * sh_info is 32 bit. There's special handling once we go over
+     * UINT16_MAX - 1 but that is handled in the ehdr and section
+     * code.
      */
-    s->phdr_num = 1; /* PT_NOTE */
-    if (s->list.num < UINT16_MAX - 2) {
-        s->shdr_num = 0;
+    s->phdr_num = 1; /* Reserve PT_NOTE */
+    if (s->list.num <= UINT32_MAX - 1) {
         s->phdr_num += s->list.num;
     } else {
-        /* sh_info of section 0 holds the real number of phdrs */
-        s->shdr_num = 1;
-
-        /* the type of shdr->sh_info is uint32_t, so we should avoid overflow */
-        if (s->list.num <= UINT32_MAX - 1) {
-            s->phdr_num += s->list.num;
-        } else {
-            s->phdr_num = UINT32_MAX;
-        }
+        s->phdr_num = UINT32_MAX;
     }
 
+    /*
+     * Now that the number of section and program headers is known we
+     * can calculate the offsets of the headers and data.
+     */
     if (dump_is_64bit(s)) {
         s->shdr_offset = sizeof(Elf64_Ehdr);
         s->phdr_offset = s->shdr_offset + sizeof(Elf64_Shdr) * s->shdr_num;
diff --git a/include/sysemu/dump-arch.h b/include/sysemu/dump-arch.h
index e25b02e9901..59bbc9be38c 100644
--- a/include/sysemu/dump-arch.h
+++ b/include/sysemu/dump-arch.h
@@ -21,6 +21,9 @@ typedef struct ArchDumpInfo {
     uint32_t page_size;      /* The target's page size. If it's variable and
                               * unknown, then this should be the maximum. */
     uint64_t phys_base;      /* The target's physmem base. */
+    void (*arch_sections_add_fn)(DumpState *s);
+    uint64_t (*arch_sections_write_hdr_fn)(DumpState *s, uint8_t *buff);
+    int (*arch_sections_write_fn)(DumpState *s, uint8_t *buff);
 } ArchDumpInfo;
 
 struct GuestPhysBlockList; /* memory_mapping.h */
diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index 9ed811b3133..38ccac7190a 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -180,6 +180,9 @@ typedef struct DumpState {
     hwaddr note_offset;
 
     void *elf_section_hdrs;     /* Pointer to section header buffer */
+    void *elf_section_data;     /* Pointer to section data buffer */
+    uint64_t elf_section_data_size; /* Size of section data */
+    GArray *string_table_buf;   /* String table data buffer */
 
     uint8_t *note_buf;          /* buffer for notes */
     size_t note_buf_offset;     /* the writing place in note_buf */
-- 
Gitee


From 94bc52d50dfb8d7b4fc62fd2bca6b72a9b6ebafc Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Mon, 17 Oct 2022 08:38:18 +0000
Subject: [PATCH 203/407] s390x: Add protected dump cap
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [37/41] 52e1e7bf1a00ce3a220d3db2f733a65548bfec6d

Add a protected dump capability for later feature checking.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Message-Id: <20221017083822.43118-7-frankja@linux.ibm.com>
[ Marc-André - Add missing stubs when !kvm ]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
(cherry picked from commit ad3b2e693daac6ed92db7361236028851d37c77c)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/kvm/kvm.c       |  7 +++++++
 target/s390x/kvm/kvm_s390x.h |  1 +
 target/s390x/kvm/meson.build |  2 ++
 target/s390x/kvm/stubs.c     | 12 ++++++++++++
 4 files changed, 22 insertions(+)
 create mode 100644 target/s390x/kvm/stubs.c

diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 30712487d48..d36b44f32a8 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -159,6 +159,7 @@ static int cap_hpage_1m;
 static int cap_vcpu_resets;
 static int cap_protected;
 static int cap_zpci_op;
+static int cap_protected_dump;
 
 static bool mem_op_storage_key_support;
 
@@ -365,6 +366,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     cap_vcpu_resets = kvm_check_extension(s, KVM_CAP_S390_VCPU_RESETS);
     cap_protected = kvm_check_extension(s, KVM_CAP_S390_PROTECTED);
     cap_zpci_op = kvm_check_extension(s, KVM_CAP_S390_ZPCI_OP);
+    cap_protected_dump = kvm_check_extension(s, KVM_CAP_S390_PROTECTED_DUMP);
 
     kvm_vm_enable_cap(s, KVM_CAP_S390_USER_SIGP, 0);
     kvm_vm_enable_cap(s, KVM_CAP_S390_VECTOR_REGISTERS, 0);
@@ -2042,6 +2044,11 @@ int kvm_s390_assign_subch_ioeventfd(EventNotifier *notifier, uint32_t sch,
     return kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &kick);
 }
 
+int kvm_s390_get_protected_dump(void)
+{
+    return cap_protected_dump;
+}
+
 int kvm_s390_get_ri(void)
 {
     return cap_ri;
diff --git a/target/s390x/kvm/kvm_s390x.h b/target/s390x/kvm/kvm_s390x.h
index aaae8570de5..f9785564d0b 100644
--- a/target/s390x/kvm/kvm_s390x.h
+++ b/target/s390x/kvm/kvm_s390x.h
@@ -26,6 +26,7 @@ int kvm_s390_set_cpu_state(S390CPU *cpu, uint8_t cpu_state);
 void kvm_s390_vcpu_interrupt_pre_save(S390CPU *cpu);
 int kvm_s390_vcpu_interrupt_post_load(S390CPU *cpu);
 int kvm_s390_get_hpage_1m(void);
+int kvm_s390_get_protected_dump(void);
 int kvm_s390_get_ri(void);
 int kvm_s390_get_zpci_op(void);
 int kvm_s390_get_clock(uint8_t *tod_high, uint64_t *tod_clock);
diff --git a/target/s390x/kvm/meson.build b/target/s390x/kvm/meson.build
index d1356356b1f..aef52b6686b 100644
--- a/target/s390x/kvm/meson.build
+++ b/target/s390x/kvm/meson.build
@@ -1,6 +1,8 @@
 
 s390x_ss.add(when: 'CONFIG_KVM', if_true: files(
   'kvm.c'
+), if_false: files(
+  'stubs.c'
 ))
 
 # Newer kernels on s390 check for an S390_PGSTE program header and
diff --git a/target/s390x/kvm/stubs.c b/target/s390x/kvm/stubs.c
new file mode 100644
index 00000000000..5fd63b9a7e3
--- /dev/null
+++ b/target/s390x/kvm/stubs.c
@@ -0,0 +1,12 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+
+#include "kvm_s390x.h"
+
+int kvm_s390_get_protected_dump(void)
+{
+    return false;
+}
-- 
Gitee


From 6c49db102103b9e8d5f3d338b5540d4f53d75583 Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Mon, 17 Oct 2022 08:38:19 +0000
Subject: [PATCH 204/407] s390x: Introduce PV query interface
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [38/41] 3090615d81ec6b9e4c306f7fc3709e1935ff5a79

Introduce an interface through which we can query information about
Ultravisor (UV) data.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20221017083822.43118-8-frankja@linux.ibm.com>
(cherry picked from commit 03d83ecfae46bf5e0074cb5808043b30df34064b)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/pv.c              | 61 ++++++++++++++++++++++++++++++++++++++
 hw/s390x/s390-virtio-ccw.c |  6 ++++
 include/hw/s390x/pv.h      | 10 +++++++
 3 files changed, 77 insertions(+)
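
Note (illustration only, not part of the change): the size getters
return 0 until s390_pv_query_info() has succeeded, so a caller would
typically check kvm_s390_pv_info_basic_valid() first. The sketch below
models that check on a plain struct. The sizing rule (one memory state
buffer per full MiB of guest memory, plus the completion data) is an
assumption inferred from the field name dump_config_mem_buffer_per_1m,
and all names in the sketch are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Hypothetical snapshot of the values the query getters would return. */
struct pv_dump_sizes {
    bool valid;           /* kvm_s390_pv_info_basic_valid() */
    uint64_t cpu;         /* kvm_s390_pv_dmp_get_size_cpu() */
    uint64_t mem_per_1m;  /* kvm_s390_pv_dmp_get_size_mem_state() */
    uint64_t completion;  /* kvm_s390_pv_dmp_get_size_completion_data() */
};

/*
 * Assumed sizing rule: one memory state buffer per full MiB of guest
 * memory plus the completion data. Returns 0 if the query data is not
 * valid, in which case no PV dump data should be emitted.
 */
static uint64_t pv_section_data_size(const struct pv_dump_sizes *sz,
                                     uint64_t guest_mem_bytes)
{
    if (!sz->valid) {
        return 0;
    }
    return (guest_mem_bytes / MIB) * sz->mem_per_1m + sz->completion;
}
```

With 4 MiB of guest memory, 8 bytes of state per MiB and 100 bytes of
completion data, the sketch yields 132 bytes.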

diff --git a/hw/s390x/pv.c b/hw/s390x/pv.c
index 401b63d6cb6..4c012f2eebd 100644
--- a/hw/s390x/pv.c
+++ b/hw/s390x/pv.c
@@ -20,6 +20,11 @@
 #include "exec/confidential-guest-support.h"
 #include "hw/s390x/ipl.h"
 #include "hw/s390x/pv.h"
+#include "target/s390x/kvm/kvm_s390x.h"
+
+static bool info_valid;
+static struct kvm_s390_pv_info_vm info_vm;
+static struct kvm_s390_pv_info_dump info_dump;
 
 static int __s390_pv_cmd(uint32_t cmd, const char *cmdname, void *data)
 {
@@ -56,6 +61,42 @@ static int __s390_pv_cmd(uint32_t cmd, const char *cmdname, void *data)
     }                                  \
 }
 
+int s390_pv_query_info(void)
+{
+    struct kvm_s390_pv_info info = {
+        .header.id = KVM_PV_INFO_VM,
+        .header.len_max = sizeof(info.header) + sizeof(info.vm),
+    };
+    int rc;
+
+    /* Info API's first user is dump so they are bundled */
+    if (!kvm_s390_get_protected_dump()) {
+        return 0;
+    }
+
+    rc = s390_pv_cmd(KVM_PV_INFO, &info);
+    if (rc) {
+        error_report("KVM PV INFO cmd %x failed: %s",
+                     info.header.id, strerror(-rc));
+        return rc;
+    }
+    memcpy(&info_vm, &info.vm, sizeof(info.vm));
+
+    info.header.id = KVM_PV_INFO_DUMP;
+    info.header.len_max = sizeof(info.header) + sizeof(info.dump);
+    rc = s390_pv_cmd(KVM_PV_INFO, &info);
+    if (rc) {
+        error_report("KVM PV INFO cmd %x failed: %s",
+                     info.header.id, strerror(-rc));
+        return rc;
+    }
+
+    memcpy(&info_dump, &info.dump, sizeof(info.dump));
+    info_valid = true;
+
+    return rc;
+}
+
 int s390_pv_vm_enable(void)
 {
     return s390_pv_cmd(KVM_PV_ENABLE, NULL);
@@ -114,6 +155,26 @@ void s390_pv_inject_reset_error(CPUState *cs)
     env->regs[r1 + 1] = DIAG_308_RC_INVAL_FOR_PV;
 }
 
+uint64_t kvm_s390_pv_dmp_get_size_cpu(void)
+{
+    return info_dump.dump_cpu_buffer_len;
+}
+
+uint64_t kvm_s390_pv_dmp_get_size_completion_data(void)
+{
+    return info_dump.dump_config_finalize_len;
+}
+
+uint64_t kvm_s390_pv_dmp_get_size_mem_state(void)
+{
+    return info_dump.dump_config_mem_buffer_per_1m;
+}
+
+bool kvm_s390_pv_info_basic_valid(void)
+{
+    return info_valid;
+}
+
 #define TYPE_S390_PV_GUEST "s390-pv-guest"
 OBJECT_DECLARE_SIMPLE_TYPE(S390PVGuest, S390_PV_GUEST)
 
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index bd80e72cf82..a9617ab79f2 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -365,6 +365,12 @@ static int s390_machine_protect(S390CcwMachineState *ms)
 
     ms->pv = true;
 
+    /* Will return 0 if API is not available since it's not vital */
+    rc = s390_pv_query_info();
+    if (rc) {
+        goto out_err;
+    }
+
     /* Set SE header and unpack */
     rc = s390_ipl_prepare_pv_header();
     if (rc) {
diff --git a/include/hw/s390x/pv.h b/include/hw/s390x/pv.h
index 1f1f545bfc2..e5ea0eca16c 100644
--- a/include/hw/s390x/pv.h
+++ b/include/hw/s390x/pv.h
@@ -38,6 +38,7 @@ static inline bool s390_is_pv(void)
     return ccw->pv;
 }
 
+int s390_pv_query_info(void);
 int s390_pv_vm_enable(void);
 void s390_pv_vm_disable(void);
 int s390_pv_set_sec_parms(uint64_t origin, uint64_t length);
@@ -46,8 +47,13 @@ void s390_pv_prep_reset(void);
 int s390_pv_verify(void);
 void s390_pv_unshare(void);
 void s390_pv_inject_reset_error(CPUState *cs);
+uint64_t kvm_s390_pv_dmp_get_size_cpu(void);
+uint64_t kvm_s390_pv_dmp_get_size_mem_state(void);
+uint64_t kvm_s390_pv_dmp_get_size_completion_data(void);
+bool kvm_s390_pv_info_basic_valid(void);
 #else /* CONFIG_KVM */
 static inline bool s390_is_pv(void) { return false; }
+static inline int s390_pv_query_info(void) { return 0; }
 static inline int s390_pv_vm_enable(void) { return 0; }
 static inline void s390_pv_vm_disable(void) {}
 static inline int s390_pv_set_sec_parms(uint64_t origin, uint64_t length) { return 0; }
@@ -56,6 +62,10 @@ static inline void s390_pv_prep_reset(void) {}
 static inline int s390_pv_verify(void) { return 0; }
 static inline void s390_pv_unshare(void) {}
 static inline void s390_pv_inject_reset_error(CPUState *cs) {};
+static inline uint64_t kvm_s390_pv_dmp_get_size_cpu(void) { return 0; }
+static inline uint64_t kvm_s390_pv_dmp_get_size_mem_state(void) { return 0; }
+static inline uint64_t kvm_s390_pv_dmp_get_size_completion_data(void) { return 0; }
+static inline bool kvm_s390_pv_info_basic_valid(void) { return false; }
 #endif /* CONFIG_KVM */
 
 int s390_pv_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
-- 
Gitee


From 72ee7b28c9a2cb6416a474905a73a4c49112f9dc Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Mon, 17 Oct 2022 08:38:20 +0000
Subject: [PATCH 205/407] include/elf.h: add s390x note types
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [39/41] ebf0873744905abbe9cfc423a56c6d1b4f2ae936

Add two s390x note types: NT_S390_PV_CPU_DATA (0x30e) and
NT_S390_RI_CB (0x30d).

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20221017083822.43118-9-frankja@linux.ibm.com>
(cherry picked from commit 5433669c7a1884cc0394c360148965edf7519884)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 include/elf.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/elf.h b/include/elf.h
index 811bf4a1cb5..4edab8e5a27 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -1647,6 +1647,8 @@ typedef struct elf64_shdr {
 #define NT_TASKSTRUCT	4
 #define NT_AUXV		6
 #define NT_PRXFPREG     0x46e62b7f      /* copied from gdb5.1/include/elf/common.h */
+#define NT_S390_PV_CPU_DATA	0x30e	/* s390 protvirt cpu dump data */
+#define NT_S390_RI_CB	0x30d		/* s390 runtime instrumentation */
 #define NT_S390_GS_CB   0x30b           /* s390 guarded storage registers */
 #define NT_S390_VXRS_HIGH 0x30a         /* s390 vector registers 16-31 */
 #define NT_S390_VXRS_LOW  0x309         /* s390 vector registers 0-15 (lower half) */
-- 
Gitee


From ce90be24a39ddfc12b1b46656ab10a873357ae67 Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Mon, 17 Oct 2022 08:38:21 +0000
Subject: [PATCH 206/407] s390x: Add KVM PV dump interface
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [40/41] 5df512a63b2ed17991489565b70f89f4efc0b639

Let's add a few bits of code which hide the new KVM PV dump API from
us via new functions.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
[ Marc-André: fix up for compilation issue ]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20221017083822.43118-10-frankja@linux.ibm.com>
(cherry picked from commit 753ca06f4706cd6e57750a606afb08c5c5299643)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/pv.c         | 51 +++++++++++++++++++++++++++++++++++++++++++
 include/hw/s390x/pv.h |  9 ++++++++
 2 files changed, 60 insertions(+)

diff --git a/hw/s390x/pv.c b/hw/s390x/pv.c
index 4c012f2eebd..728ba245476 100644
--- a/hw/s390x/pv.c
+++ b/hw/s390x/pv.c
@@ -175,6 +175,57 @@ bool kvm_s390_pv_info_basic_valid(void)
     return info_valid;
 }
 
+static int s390_pv_dump_cmd(uint64_t subcmd, uint64_t uaddr, uint64_t gaddr,
+                            uint64_t len)
+{
+    struct kvm_s390_pv_dmp dmp = {
+        .subcmd = subcmd,
+        .buff_addr = uaddr,
+        .buff_len = len,
+        .gaddr = gaddr,
+    };
+    int ret;
+
+    ret = s390_pv_cmd(KVM_PV_DUMP, (void *)&dmp);
+    if (ret) {
+        error_report("KVM DUMP command %" PRIu64 " failed", subcmd);
+    }
+    return ret;
+}
+
+int kvm_s390_dump_cpu(S390CPU *cpu, void *buff)
+{
+    struct kvm_s390_pv_dmp dmp = {
+        .subcmd = KVM_PV_DUMP_CPU,
+        .buff_addr = (uint64_t)buff,
+        .gaddr = 0,
+        .buff_len = info_dump.dump_cpu_buffer_len,
+    };
+    struct kvm_pv_cmd pv = {
+        .cmd = KVM_PV_DUMP,
+        .data = (uint64_t)&dmp,
+    };
+
+    return kvm_vcpu_ioctl(CPU(cpu), KVM_S390_PV_CPU_COMMAND, &pv);
+}
+
+int kvm_s390_dump_init(void)
+{
+    return s390_pv_dump_cmd(KVM_PV_DUMP_INIT, 0, 0, 0);
+}
+
+int kvm_s390_dump_mem_state(uint64_t gaddr, size_t len, void *dest)
+{
+    return s390_pv_dump_cmd(KVM_PV_DUMP_CONFIG_STOR_STATE, (uint64_t)dest,
+                            gaddr, len);
+}
+
+int kvm_s390_dump_completion_data(void *buff)
+{
+    return s390_pv_dump_cmd(KVM_PV_DUMP_COMPLETE, (uint64_t)buff, 0,
+                            info_dump.dump_config_finalize_len);
+}
+
 #define TYPE_S390_PV_GUEST "s390-pv-guest"
 OBJECT_DECLARE_SIMPLE_TYPE(S390PVGuest, S390_PV_GUEST)
 
diff --git a/include/hw/s390x/pv.h b/include/hw/s390x/pv.h
index e5ea0eca16c..9360aa10914 100644
--- a/include/hw/s390x/pv.h
+++ b/include/hw/s390x/pv.h
@@ -51,6 +51,10 @@ uint64_t kvm_s390_pv_dmp_get_size_cpu(void);
 uint64_t kvm_s390_pv_dmp_get_size_mem_state(void);
 uint64_t kvm_s390_pv_dmp_get_size_completion_data(void);
 bool kvm_s390_pv_info_basic_valid(void);
+int kvm_s390_dump_init(void);
+int kvm_s390_dump_cpu(S390CPU *cpu, void *buff);
+int kvm_s390_dump_mem_state(uint64_t addr, size_t len, void *dest);
+int kvm_s390_dump_completion_data(void *buff);
 #else /* CONFIG_KVM */
 static inline bool s390_is_pv(void) { return false; }
 static inline int s390_pv_query_info(void) { return 0; }
@@ -66,6 +70,11 @@ static inline uint64_t kvm_s390_pv_dmp_get_size_cpu(void) { return 0; }
 static inline uint64_t kvm_s390_pv_dmp_get_size_mem_state(void) { return 0; }
 static inline uint64_t kvm_s390_pv_dmp_get_size_completion_data(void) { return 0; }
 static inline bool kvm_s390_pv_info_basic_valid(void) { return false; }
+static inline int kvm_s390_dump_init(void) { return 0; }
+static inline int kvm_s390_dump_cpu(S390CPU *cpu, void *buff) { return 0; }
+static inline int kvm_s390_dump_mem_state(uint64_t addr, size_t len,
+                                          void *dest) { return 0; }
+static inline int kvm_s390_dump_completion_data(void *buff) { return 0; }
 #endif /* CONFIG_KVM */
 
 int s390_pv_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
-- 
Gitee


From 6690d1407eb42baf77f429a13b02704861bda7a6 Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Mon, 17 Oct 2022 08:38:22 +0000
Subject: [PATCH 207/407] s390x: pv: Add dump support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 226: s390: Enhanced Interpretation for PCI Functions and Secure Execution guest dump
RH-Bugzilla: 1664378 2043909
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [41/41] 2731c2329276e76013e3b3df21e9743bc74edd2b

Sometimes dumping a guest from the outside is the only way to get the
data that is needed. This can be the case if a dumping mechanism like
KDUMP hasn't been configured or data needs to be fetched at a specific
point. Dumping a protected guest from the outside without help from
fw/hw doesn't yield sufficient data to be useful. Hence we now
introduce PV dump support.

The PV dump support works by integrating the firmware into the dump
process. New Ultravisor calls are used to initiate the dump process,
dump cpu data, dump memory state and lastly complete the dump process.
The UV calls are exposed by KVM via the new KVM_PV_DUMP command and
its subcommands. The guest's data is fully encrypted and can only be
decrypted by the entity that owns the customer communication key for
the dumped guest. Also, dumping needs to be allowed via a flag in the
SE header.

On the QEMU side of things we store the PV dump data in the newly
introduced architecture ELF sections (storage state and completion
data) and the cpu notes (for cpu dump data).

Users can use the zgetdump tool to convert the encrypted QEMU dump to an
unencrypted one.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Message-Id: <20221017083822.43118-11-frankja@linux.ibm.com>
(cherry picked from commit 113d8f4e95cf0450bea421263de6ec016c779ad0)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
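Illustration only, not part of the patch: a minimal standalone sketch
of the size arithmetic that get_mem_state_size_from_len() and
get_mem_state() rely on. The names example_mem_state_size(),
example_mem_state_offset() and EXAMPLE_MIB are hypothetical, and
per_mib_len is an assumed stand-in for the value that
kvm_s390_pv_dmp_get_size_mem_state() reports at run time.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_MIB (1024ULL * 1024ULL)

/*
 * Bytes of storage-state metadata for `len` bytes of guest memory.
 * `per_mib_len` stands in for kvm_s390_pv_dmp_get_size_mem_state();
 * any partial MiB at the end of `len` is truncated by the division.
 */
static uint64_t example_mem_state_size(uint64_t len, uint64_t per_mib_len)
{
    assert(per_mib_len > 0);
    return (len / EXAMPLE_MIB) * per_mib_len;
}

/*
 * Offset of a memory block's metadata inside the section buffer:
 * the metadata for all memory below `target_start` comes first.
 */
static uint64_t example_mem_state_offset(uint64_t target_start,
                                         uint64_t per_mib_len)
{
    return example_mem_state_size(target_start, per_mib_len);
}
```

With an assumed 64 bytes of metadata per MiB, a 4 MiB guest needs 256
bytes, and a block starting at 2 MiB is written at offset 128.
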
 dump/dump.c              |  12 +-
 include/sysemu/dump.h    |   5 +
 target/s390x/arch_dump.c | 262 +++++++++++++++++++++++++++++++++++----
 3 files changed, 246 insertions(+), 33 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 4aa8fb64d24..5dee060b730 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -709,9 +709,9 @@ static void dump_begin(DumpState *s, Error **errp)
     write_elf_notes(s, errp);
 }
 
-static int64_t dump_filtered_memblock_size(GuestPhysBlock *block,
-                                           int64_t filter_area_start,
-                                           int64_t filter_area_length)
+int64_t dump_filtered_memblock_size(GuestPhysBlock *block,
+                                    int64_t filter_area_start,
+                                    int64_t filter_area_length)
 {
     int64_t size, left, right;
 
@@ -729,9 +729,9 @@ static int64_t dump_filtered_memblock_size(GuestPhysBlock *block,
     return size;
 }
 
-static int64_t dump_filtered_memblock_start(GuestPhysBlock *block,
-                                            int64_t filter_area_start,
-                                            int64_t filter_area_length)
+int64_t dump_filtered_memblock_start(GuestPhysBlock *block,
+                                     int64_t filter_area_start,
+                                     int64_t filter_area_length)
 {
     if (filter_area_length) {
         /* return -1 if the block is not within filter area */
diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index 38ccac7190a..4ffed0b6598 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -215,4 +215,9 @@ typedef struct DumpState {
 uint16_t cpu_to_dump16(DumpState *s, uint16_t val);
 uint32_t cpu_to_dump32(DumpState *s, uint32_t val);
 uint64_t cpu_to_dump64(DumpState *s, uint64_t val);
+
+int64_t dump_filtered_memblock_size(GuestPhysBlock *block, int64_t filter_area_start,
+                                    int64_t filter_area_length);
+int64_t dump_filtered_memblock_start(GuestPhysBlock *block, int64_t filter_area_start,
+                                     int64_t filter_area_length);
 #endif
diff --git a/target/s390x/arch_dump.c b/target/s390x/arch_dump.c
index f60a14920d4..a2329141e8a 100644
--- a/target/s390x/arch_dump.c
+++ b/target/s390x/arch_dump.c
@@ -12,11 +12,13 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include "cpu.h"
 #include "s390x-internal.h"
 #include "elf.h"
 #include "sysemu/dump.h"
-
+#include "hw/s390x/pv.h"
+#include "kvm/kvm_s390x.h"
 
 struct S390xUserRegsStruct {
     uint64_t psw[2];
@@ -76,9 +78,16 @@ typedef struct noteStruct {
         uint64_t todcmp;
         uint32_t todpreg;
         uint64_t ctrs[16];
+        uint8_t dynamic[1];  /*
+                              * Would be a flexible array member, if
+                              * that was legal inside a union. Real
+                              * size comes from PV info interface.
+                              */
     } contents;
 } QEMU_PACKED Note;
 
+static bool pv_dump_initialized;
+
 static void s390x_write_elf64_prstatus(Note *note, S390CPU *cpu, int id)
 {
     int i;
@@ -177,28 +186,39 @@ static void s390x_write_elf64_prefix(Note *note, S390CPU *cpu, int id)
     note->contents.prefix = cpu_to_be32((uint32_t)(cpu->env.psa));
 }
 
+static void s390x_write_elf64_pv(Note *note, S390CPU *cpu, int id)
+{
+    note->hdr.n_type = cpu_to_be32(NT_S390_PV_CPU_DATA);
+    if (!pv_dump_initialized) {
+        return;
+    }
+    kvm_s390_dump_cpu(cpu, &note->contents.dynamic);
+}
 
 typedef struct NoteFuncDescStruct {
     int contents_size;
+    uint64_t (*note_size_func)(void); /* NULL for non-dynamic sized contents */
     void (*note_contents_func)(Note *note, S390CPU *cpu, int id);
+    bool pvonly;
 } NoteFuncDesc;
 
 static const NoteFuncDesc note_core[] = {
-    {sizeof_field(Note, contents.prstatus), s390x_write_elf64_prstatus},
-    {sizeof_field(Note, contents.fpregset), s390x_write_elf64_fpregset},
-    { 0, NULL}
+    {sizeof_field(Note, contents.prstatus), NULL, s390x_write_elf64_prstatus, false},
+    {sizeof_field(Note, contents.fpregset), NULL, s390x_write_elf64_fpregset, false},
+    { 0, NULL, NULL, false}
 };
 
 static const NoteFuncDesc note_linux[] = {
-    {sizeof_field(Note, contents.prefix),   s390x_write_elf64_prefix},
-    {sizeof_field(Note, contents.ctrs),     s390x_write_elf64_ctrs},
-    {sizeof_field(Note, contents.timer),    s390x_write_elf64_timer},
-    {sizeof_field(Note, contents.todcmp),   s390x_write_elf64_todcmp},
-    {sizeof_field(Note, contents.todpreg),  s390x_write_elf64_todpreg},
-    {sizeof_field(Note, contents.vregslo),  s390x_write_elf64_vregslo},
-    {sizeof_field(Note, contents.vregshi),  s390x_write_elf64_vregshi},
-    {sizeof_field(Note, contents.gscb),     s390x_write_elf64_gscb},
-    { 0, NULL}
+    {sizeof_field(Note, contents.prefix),   NULL, s390x_write_elf64_prefix,  false},
+    {sizeof_field(Note, contents.ctrs),     NULL, s390x_write_elf64_ctrs,    false},
+    {sizeof_field(Note, contents.timer),    NULL, s390x_write_elf64_timer,   false},
+    {sizeof_field(Note, contents.todcmp),   NULL, s390x_write_elf64_todcmp,  false},
+    {sizeof_field(Note, contents.todpreg),  NULL, s390x_write_elf64_todpreg, false},
+    {sizeof_field(Note, contents.vregslo),  NULL, s390x_write_elf64_vregslo, false},
+    {sizeof_field(Note, contents.vregshi),  NULL, s390x_write_elf64_vregshi, false},
+    {sizeof_field(Note, contents.gscb),     NULL, s390x_write_elf64_gscb,    false},
+    {0, kvm_s390_pv_dmp_get_size_cpu,       s390x_write_elf64_pv, true},
+    { 0, NULL, NULL, false}
 };
 
 static int s390x_write_elf64_notes(const char *note_name,
@@ -207,22 +227,41 @@ static int s390x_write_elf64_notes(const char *note_name,
                                        DumpState *s,
                                        const NoteFuncDesc *funcs)
 {
-    Note note;
+    Note note, *notep;
     const NoteFuncDesc *nf;
-    int note_size;
+    int note_size, content_size;
     int ret = -1;
 
     assert(strlen(note_name) < sizeof(note.name));
 
     for (nf = funcs; nf->note_contents_func; nf++) {
-        memset(&note, 0, sizeof(note));
-        note.hdr.n_namesz = cpu_to_be32(strlen(note_name) + 1);
-        note.hdr.n_descsz = cpu_to_be32(nf->contents_size);
-        g_strlcpy(note.name, note_name, sizeof(note.name));
-        (*nf->note_contents_func)(&note, cpu, id);
+        notep = &note;
+        if (nf->pvonly && !s390_is_pv()) {
+            continue;
+        }
+
+        content_size = nf->note_size_func ? nf->note_size_func() : nf->contents_size;
+        note_size = sizeof(note) - sizeof(notep->contents) + content_size;
+
+        /* Notes with dynamic sizes need to allocate a note */
+        if (nf->note_size_func) {
+            notep = g_malloc(note_size);
+        }
+
+        memset(notep, 0, note_size);
 
-        note_size = sizeof(note) - sizeof(note.contents) + nf->contents_size;
-        ret = f(&note, note_size, s);
+        /* Setup note header data */
+        notep->hdr.n_descsz = cpu_to_be32(content_size);
+        notep->hdr.n_namesz = cpu_to_be32(strlen(note_name) + 1);
+        g_strlcpy(notep->name, note_name, sizeof(notep->name));
+
+        /* Get contents and write them out */
+        (*nf->note_contents_func)(notep, cpu, id);
+        ret = f(notep, note_size, s);
+
+        if (nf->note_size_func) {
+            g_free(notep);
+        }
 
         if (ret < 0) {
             return -1;
@@ -247,13 +286,179 @@ int s390_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
     return s390x_write_elf64_notes("LINUX", f, cpu, cpuid, s, note_linux);
 }
 
+/* PV dump section size functions */
+static uint64_t get_mem_state_size_from_len(uint64_t len)
+{
+    return (len / (MiB)) * kvm_s390_pv_dmp_get_size_mem_state();
+}
+
+static uint64_t get_size_mem_state(DumpState *s)
+{
+    return get_mem_state_size_from_len(s->total_size);
+}
+
+static uint64_t get_size_completion_data(DumpState *s)
+{
+    return kvm_s390_pv_dmp_get_size_completion_data();
+}
+
+/* PV dump section data functions */
+static int get_data_completion(DumpState *s, uint8_t *buff)
+{
+    int rc;
+
+    if (!pv_dump_initialized) {
+        return 0;
+    }
+    rc = kvm_s390_dump_completion_data(buff);
+    if (!rc) {
+        pv_dump_initialized = false;
+    }
+    return rc;
+}
+
+static int get_mem_state(DumpState *s, uint8_t *buff)
+{
+    int64_t memblock_size, memblock_start;
+    GuestPhysBlock *block;
+    uint64_t off;
+    int rc;
+
+    QTAILQ_FOREACH(block, &s->guest_phys_blocks.head, next) {
+        memblock_start = dump_filtered_memblock_start(block, s->filter_area_begin,
+                                                      s->filter_area_length);
+        if (memblock_start == -1) {
+            continue;
+        }
+
+        memblock_size = dump_filtered_memblock_size(block, s->filter_area_begin,
+                                                    s->filter_area_length);
+
+        off = get_mem_state_size_from_len(block->target_start);
+
+        rc = kvm_s390_dump_mem_state(block->target_start,
+                                     get_mem_state_size_from_len(memblock_size),
+                                     buff + off);
+        if (rc) {
+            return rc;
+        }
+    }
+
+    return 0;
+}
+
+static struct sections {
+    uint64_t (*sections_size_func)(DumpState *s);
+    int (*sections_contents_func)(DumpState *s, uint8_t *buff);
+    char sctn_str[12];
+} sections[] = {
+    { get_size_mem_state, get_mem_state, "pv_mem_meta"},
+    { get_size_completion_data, get_data_completion, "pv_compl"},
+    {NULL , NULL, ""}
+};
+
+static uint64_t arch_sections_write_hdr(DumpState *s, uint8_t *buff)
+{
+    Elf64_Shdr *shdr = (void *)buff;
+    struct sections *sctn = sections;
+    uint64_t off = s->section_offset;
+
+    if (!pv_dump_initialized) {
+        return 0;
+    }
+
+    for (; sctn->sections_size_func; off += shdr->sh_size, sctn++, shdr++) {
+        memset(shdr, 0, sizeof(*shdr));
+        shdr->sh_type = SHT_PROGBITS;
+        shdr->sh_offset = off;
+        shdr->sh_size = sctn->sections_size_func(s);
+        shdr->sh_name = s->string_table_buf->len;
+        g_array_append_vals(s->string_table_buf, sctn->sctn_str, sizeof(sctn->sctn_str));
+    }
+
+    return (uintptr_t)shdr - (uintptr_t)buff;
+}
+
+
+/* Add arch specific number of sections and their respective sizes */
+static void arch_sections_add(DumpState *s)
+{
+    struct sections *sctn = sections;
+
+    /*
+     * We only do a PV dump if we are running a PV guest, KVM supports
+     * the dump API and we got valid dump length information.
+     */
+    if (!s390_is_pv() || !kvm_s390_get_protected_dump() ||
+        !kvm_s390_pv_info_basic_valid()) {
+        return;
+    }
+
+    /*
+     * Start the UV dump process by doing the initialize dump call via
+     * KVM as the proxy.
+     */
+    if (!kvm_s390_dump_init()) {
+        pv_dump_initialized = true;
+    } else {
+        /*
+         * Dump init failed, maybe the guest owner disabled dumping.
+         * We'll continue the non-PV dump process since this is no
+         * reason to crash qemu.
+         */
+        return;
+    }
+
+    for (; sctn->sections_size_func; sctn++) {
+        s->shdr_num += 1;
+        s->elf_section_data_size += sctn->sections_size_func(s);
+    }
+}
+
+/*
+ * After the PV dump has been initialized, the CPU data has been
+ * fetched and memory has been dumped, we need to grab the tweak data
+ * and the completion data.
+ */
+static int arch_sections_write(DumpState *s, uint8_t *buff)
+{
+    struct sections *sctn = sections;
+    int rc;
+
+    if (!pv_dump_initialized) {
+        return -EINVAL;
+    }
+
+    for (; sctn->sections_size_func; sctn++) {
+        rc = sctn->sections_contents_func(s, buff);
+        buff += sctn->sections_size_func(s);
+        if (rc) {
+            return rc;
+        }
+    }
+    return 0;
+}
+
 int cpu_get_dump_info(ArchDumpInfo *info,
                       const struct GuestPhysBlockList *guest_phys_blocks)
 {
     info->d_machine = EM_S390;
     info->d_endian = ELFDATA2MSB;
     info->d_class = ELFCLASS64;
-
+    /*
+     * This is evaluated for each dump so we can freely switch
+     * between PV and non-PV.
+     */
+    if (s390_is_pv() && kvm_s390_get_protected_dump() &&
+        kvm_s390_pv_info_basic_valid()) {
+        info->arch_sections_add_fn = *arch_sections_add;
+        info->arch_sections_write_hdr_fn = *arch_sections_write_hdr;
+        info->arch_sections_write_fn = *arch_sections_write;
+    } else {
+        info->arch_sections_add_fn = NULL;
+        info->arch_sections_write_hdr_fn = NULL;
+        info->arch_sections_write_fn = NULL;
+    }
     return 0;
 }
 
@@ -261,7 +466,7 @@ ssize_t cpu_get_note_size(int class, int machine, int nr_cpus)
 {
     int name_size = 8; /* "LINUX" or "CORE" + pad */
     size_t elf_note_size = 0;
-    int note_head_size;
+    int note_head_size, content_size;
     const NoteFuncDesc *nf;
 
     assert(class == ELFCLASS64);
@@ -270,12 +475,15 @@ ssize_t cpu_get_note_size(int class, int machine, int nr_cpus)
     note_head_size = sizeof(Elf64_Nhdr);
 
     for (nf = note_core; nf->note_contents_func; nf++) {
-        elf_note_size = elf_note_size + note_head_size + name_size +
-                        nf->contents_size;
+        elf_note_size = elf_note_size + note_head_size + name_size + nf->contents_size;
     }
     for (nf = note_linux; nf->note_contents_func; nf++) {
+        if (nf->pvonly && !s390_is_pv()) {
+            continue;
+        }
+        content_size = nf->contents_size ? nf->contents_size : nf->note_size_func();
         elf_note_size = elf_note_size + note_head_size + name_size +
-                        nf->contents_size;
+                        content_size;
     }
 
     return (elf_note_size) * nr_cpus;
-- 
Gitee


From 4a4487e667c2a5e388473d8cd62638856ff73f52 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Mon, 7 Nov 2022 19:59:46 -0500
Subject: [PATCH 208/407] ui/vnc-clipboard: fix integer underflow in
 vnc_client_cut_text_ext
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 227: ui/vnc-clipboard: fix integer underflow in vnc_client_cut_text_ext
RH-Bugzilla: 2129760
RH-Acked-by: Mauro Matteo Cascella <None>
RH-Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Gerd Hoffmann <kraxel@redhat.com>
RH-Commit: [1/1] ac19a6c0777e308061bcb6d1de5cc9beaa105a3a (jmaloy/qemu-kvm)

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2129760
CVE: CVE-2022-3165
Upstream: Merged

commit d307040b18bfcb1393b910f1bae753d5c12a4dc7
Author: Mauro Matteo Cascella <mcascell@redhat.com>
Date:   Sun Sep 25 22:45:11 2022 +0200

    ui/vnc-clipboard: fix integer underflow in vnc_client_cut_text_ext

    Extended ClientCutText messages start with a 4-byte header. If len < 4,
    an integer underflow occurs in vnc_client_cut_text_ext. The result is
    used to decompress data in a while loop in inflate_buffer, leading to
    CPU consumption and denial of service. Prevent this by checking dlen in
    protocol_client_msg.

    Fixes: CVE-2022-3165
    Fixes: 0bf41cab93e5 ("ui/vnc: clipboard support")
    Reported-by: TangPeng <tangpeng@qianxin.com>
    Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
    Message-Id: <20220925204511.1103214-1-mcascell@redhat.com>
    Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

(cherry picked from commit d307040b18bfcb1393b910f1bae753d5c12a4dc7)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
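Illustration only, not part of the patch: a minimal standalone sketch
of the underflow described above, assuming the payload length is
derived by subtracting the 4-byte header from an unsigned 32-bit
message length. The example_* names are hypothetical and do not exist
in ui/vnc.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_EXT_HDR_LEN 4u /* 4-byte header of the extended message */

/* Unchecked subtraction: wraps around to a huge value for dlen < 4. */
static uint32_t example_payload_len_unchecked(uint32_t dlen)
{
    return dlen - EXAMPLE_EXT_HDR_LEN;
}

/* Checked variant: reject short messages before subtracting. */
static bool example_payload_len_checked(uint32_t dlen, uint32_t *out)
{
    if (dlen < EXAMPLE_EXT_HDR_LEN) {
        return false;
    }
    *out = dlen - EXAMPLE_EXT_HDR_LEN;
    return true;
}
```

A 3-byte message makes the unchecked helper return UINT32_MAX, which is
the kind of length a decompression loop would then try to consume. The
checked helper refuses it, mirroring the dlen < 4 test added here.
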
 ui/vnc.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index af02522e841..a14b6861be6 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -2442,8 +2442,8 @@ static int protocol_client_msg(VncState *vs, uint8_t *data, size_t len)
         if (len == 1) {
             return 8;
         }
+        uint32_t dlen = abs(read_s32(data, 4));
         if (len == 8) {
-            uint32_t dlen = abs(read_s32(data, 4));
             if (dlen > (1 << 20)) {
                 error_report("vnc: client_cut_text msg payload has %u bytes"
                              " which exceeds our limit of 1MB.", dlen);
@@ -2456,8 +2456,13 @@ static int protocol_client_msg(VncState *vs, uint8_t *data, size_t len)
         }
 
         if (read_s32(data, 4) < 0) {
-            vnc_client_cut_text_ext(vs, abs(read_s32(data, 4)),
-                                    read_u32(data, 8), data + 12);
+            if (dlen < 4) {
+                error_report("vnc: malformed payload (header less than 4 bytes)"
+                             " in extended clipboard pseudo-encoding.");
+                vnc_client_error(vs);
+                break;
+            }
+            vnc_client_cut_text_ext(vs, dlen, read_u32(data, 8), data + 12);
             break;
         }
         vnc_client_cut_text(vs, read_u32(data, 4), data + 8);
-- 
Gitee


From e235e7a53f6b5af51879b11f86914d7bbdc31bfb Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 9 Nov 2022 18:41:18 -0500
Subject: [PATCH 209/407] hw/acpi: Add ospm_status hook implementation for
 acpi-ged

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 228: qemu-kvm: backport some aarch64 fixes
RH-Bugzilla: 2132609
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Eric Auger <eric.auger@redhat.com>
RH-Acked-by: Gavin Shan <gshan@redhat.com>
RH-Commit: [1/2] 99730b1a27666ca745dc28d90751c938d43f1682 (jmaloy/qemu-kvm)

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2132609
Upstream: Merged

commit d4424bebceaa8ffbc23060ce45e52a9bb817e3c9
Author: Keqian Zhu <zhukeqian1@huawei.com>
Date:   Tue Aug 16 17:49:57 2022 +0800

    hw/acpi: Add ospm_status hook implementation for acpi-ged

    Setting up an ARM virtual machine with machine type virt and executing the
    QMP command "query-acpi-ospm-status" causes a segmentation fault with the
    following backtrace:
     #1  0x0000aaaaab64235c in qmp_query_acpi_ospm_status (errp=errp@entry=0xfffffffff030) at ../monitor/qmp-cmds.c:312
     #2  0x0000aaaaabfc4e20 in qmp_marshal_query_acpi_ospm_status (args=<optimized out>, ret=0xffffea4ffe90, errp=0xffffea4ffe88) at qapi/qapi-commands-acpi.c:63
     #3  0x0000aaaaabff8ba0 in do_qmp_dispatch_bh (opaque=0xffffea4ffe98) at ../qapi/qmp-dispatch.c:128
     #4  0x0000aaaaac02e594 in aio_bh_call (bh=0xffffe0004d80) at ../util/async.c:150
     #5  aio_bh_poll (ctx=ctx@entry=0xaaaaad0f6040) at ../util/async.c:178
     #6  0x0000aaaaac00bd40 in aio_dispatch (ctx=ctx@entry=0xaaaaad0f6040) at ../util/aio-posix.c:421
     #7  0x0000aaaaac02e010 in aio_ctx_dispatch (source=0xaaaaad0f6040, callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:320
     #8  0x0000fffff76f6884 in g_main_context_dispatch () at /usr/lib64/libglib-2.0.so.0
     #9  0x0000aaaaac0452d4 in glib_pollfds_poll () at ../util/main-loop.c:297
     #10 os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:320
     #11 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:596
     #12 0x0000aaaaab5c9e50 in qemu_main_loop () at ../softmmu/runstate.c:734
     #13 0x0000aaaaab185370 in qemu_main (argc=argc@entry=47, argv=argv@entry=0xfffffffff518, envp=envp@entry=0x0) at ../softmmu/main.c:38
     #14 0x0000aaaaab16f99c in main (argc=47, argv=0xfffffffff518) at ../softmmu/main.c:47

    Fixes: ebb62075021a ("hw/acpi: Add ACPI Generic Event Device Support")
    Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
    Reviewed-by: Igor Mammedov <imammedo@redhat.com>
    Message-id: 20220816094957.31700-1-zhukeqian1@huawei.com
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

(cherry picked from commit d4424bebceaa8ffbc23060ce45e52a9bb817e3c9)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
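Illustration only, not part of the patch: a simplified sketch of why a
missing hook can crash, assuming the QMP handler invokes the class's
ospm_status callback without checking it for NULL. ExampleDevClass and
the example_* functions are hypothetical stand-ins, not QEMU types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device class with one optional hook. */
typedef struct ExampleDevClass {
    void (*ospm_status)(int *entries);
} ExampleDevClass;

static void example_ged_ospm_status(int *entries)
{
    (*entries)++; /* stand-in for appending a status entry */
}

/* Class init before the fix: the hook is left NULL. */
static void example_class_init_old(ExampleDevClass *c)
{
    c->ospm_status = NULL;
}

/* Class init after the fix: the hook is installed. */
static void example_class_init_new(ExampleDevClass *c)
{
    c->ospm_status = example_ged_ospm_status;
}

/* True if a caller that invokes the hook unconditionally is safe. */
static bool example_query_is_safe(const ExampleDevClass *c)
{
    return c->ospm_status != NULL;
}
```

Installing the hook in class init, as done below for acpi-ged, means a
caller never needs a NULL check of its own.
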
 hw/acpi/generic_event_device.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index e28457a7d10..a3d31631fe0 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -267,6 +267,13 @@ static void acpi_ged_unplug_cb(HotplugHandler *hotplug_dev,
     }
 }
 
+static void acpi_ged_ospm_status(AcpiDeviceIf *adev, ACPIOSTInfoList ***list)
+{
+    AcpiGedState *s = ACPI_GED(adev);
+
+    acpi_memory_ospm_status(&s->memhp_state, list);
+}
+
 static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
 {
     AcpiGedState *s = ACPI_GED(adev);
@@ -409,6 +416,7 @@ static void acpi_ged_class_init(ObjectClass *class, void *data)
     hc->unplug_request = acpi_ged_unplug_request_cb;
     hc->unplug = acpi_ged_unplug_cb;
 
+    adevc->ospm_status = acpi_ged_ospm_status;
     adevc->send_event = acpi_ged_send_event;
 }
 
-- 
Gitee


From cd881441cf4b5648bd3410b3f82b60a6118a1a62 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 9 Nov 2022 18:41:18 -0500
Subject: [PATCH 210/407] target/arm/kvm: Retry KVM_CREATE_VM call if it fails
 EINTR

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 228: qemu-kvm: backport some aarch64 fixes
RH-Bugzilla: 2132609
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Eric Auger <eric.auger@redhat.com>
RH-Acked-by: Gavin Shan <gshan@redhat.com>
RH-Commit: [2/2] 8494bbfb3fcd8693f56312f984d2964d1ca275c2 (jmaloy/qemu-kvm)

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2132609
Upstream: Merged

commit bbde13cd14ad4eec18529ce0bf5876058464e124
Author: Peter Maydell <peter.maydell@linaro.org>
Date:   Fri Sep 30 12:38:24 2022 +0100

    target/arm/kvm: Retry KVM_CREATE_VM call if it fails EINTR

    Occasionally the KVM_CREATE_VM ioctl can return EINTR, even though
    there is no pending signal to be taken. In commit 94ccff13382055
    we added a retry-on-EINTR loop to the KVM_CREATE_VM call in the
    generic KVM code. Adopt the same approach for the use of the
    ioctl in the Arm-specific KVM code (where we use it to create a
    scratch VM for probing for various things).

    For more information, see the mailing list thread:
    https://lore.kernel.org/qemu-devel/8735e0s1zw.wl-maz@kernel.org/

    Reported-by: Vitaly Chikunov <vt@altlinux.org>
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Reviewed-by: Vitaly Chikunov <vt@altlinux.org>
    Reviewed-by: Eric Auger <eric.auger@redhat.com>
    Acked-by: Marc Zyngier <maz@kernel.org>
    Message-id: 20220930113824.1933293-1-peter.maydell@linaro.org

(cherry picked from commit bbde13cd14ad4eec18529ce0bf5876058464e124)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
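Illustration only, not part of the patch: the retry-on-EINTR pattern
used below, written as a standalone helper. example_retry_on_eintr()
and example_flaky_call() are hypothetical; the fake call merely
imitates an ioctl() that is interrupted a few times before succeeding.

```c
#include <errno.h>

/* Hypothetical callback type standing in for the ioctl() call. */
typedef int (*example_call_fn)(void *opaque);

/* Repeat `fn` while it fails with EINTR; return its final result. */
static int example_retry_on_eintr(example_call_fn fn, void *opaque)
{
    int ret;

    do {
        ret = fn(opaque);
    } while (ret == -1 && errno == EINTR);

    return ret;
}

/* Fake call: fails with EINTR `*remaining` times, then returns 42. */
static int example_flaky_call(void *opaque)
{
    int *remaining = opaque;

    if (*remaining > 0) {
        (*remaining)--;
        errno = EINTR;
        return -1;
    }
    return 42;
}
```

Any failure other than EINTR still leaves the loop immediately, so the
existing error path (vmfd < 0) keeps handling real errors.
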
 target/arm/kvm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index bbf1ce7ba3b..1ae4e510558 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -80,7 +80,9 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
     if (max_vm_pa_size < 0) {
         max_vm_pa_size = 0;
     }
-    vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size);
+    do {
+        vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size);
+    } while (vmfd == -1 && errno == EINTR);
     if (vmfd < 0) {
         goto err;
     }
-- 
Gitee


From 6269d61b57197b84f2ca8f2fe4a71fa1c5f42791 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Mon, 14 Nov 2022 14:25:02 +0100
Subject: [PATCH 211/407] docs/system/s390x: Document the "loadparm" machine
 property
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 233: s390x: Document the "loadparm" machine property
RH-Bugzilla: 2128225
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [1/2] e9589ea32d2a8f82971476b644e1063fa14cf822

The "loadparm" machine property is useful for selecting alternative
kernels on the disk of the guest, but so far we do not tell the users
yet how to use it. Add some documentation to fill this gap.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2128235
Message-Id: <20221114132502.110213-1-thuth@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit be5df2edb5d69ff3107c5616aa035a9ba8d0422e)
---
 docs/system/s390x/bootdevices.rst | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/docs/system/s390x/bootdevices.rst b/docs/system/s390x/bootdevices.rst
index 9e591cb9dc3..d4bf3b9f0bc 100644
--- a/docs/system/s390x/bootdevices.rst
+++ b/docs/system/s390x/bootdevices.rst
@@ -53,6 +53,32 @@ recommended to specify a CD-ROM device via ``-device scsi-cd`` (as mentioned
 above) instead.
 
 
+Selecting kernels with the ``loadparm`` property
+------------------------------------------------
+
+The ``s390-ccw-virtio`` machine supports the so-called ``loadparm`` parameter
+which can be used to select the kernel on the disk of the guest that the
+s390-ccw bios should boot. When starting QEMU, it can be specified like this::
+
+ qemu-system-s390x -machine s390-ccw-virtio,loadparm=<string>
+
+The first way to use this parameter is to use the word ``PROMPT`` as the
+``<string>`` here. In that case the s390-ccw bios will show a list of
+installed kernels on the disk of the guest and ask the user to enter a number
+to choose which kernel should be booted -- similar to what can be achieved by
+specifying the ``-boot menu=on`` option when starting QEMU. Note that the menu
+list will only show the names of the installed kernels when using a DASD-like
+disk image with 4k byte sectors. On normal SCSI-style disks with 512-byte
+sectors, there is not enough space for the zipl loader on the disk to store
+the kernel names, so you only get a list without names here.
+
+The second way to use this parameter is to use a number in the range from 0
+to 31. The numbers that can be used here correspond to the numbers that are
+shown when using the ``PROMPT`` option, and the s390-ccw bios will then try
+to automatically boot the kernel that is associated with the given number.
+Note that ``0`` can be used to boot the default entry.
+
+
 Booting from a network device
 -----------------------------
 
-- 
Gitee


From 2a03fd5e61dc34336a22704555697b67a0e595fa Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 18 Nov 2022 15:23:19 +0100
Subject: [PATCH 212/407] s390x: Register TYPE_S390_CCW_MACHINE properties as
 class properties
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 233: s390x: Document the "loadparm" machine property
RH-Bugzilla: 2128225
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [2/2] 28a0086cb0e8be2535deafdd9115cadd7ff033f3

Currently, when running 'qemu-system-s390x -M s390-ccw-virtio,help'
the s390x-specific properties are not listed anymore. This happens
because since commit d8fb7d0969 ("vl: switch -M parsing to keyval")
the properties have to be defined at the class level and not at the
instance level anymore. Fix it on s390x now, too, by moving the
registration of the properties to the class level.

Fixes: d8fb7d0969 ("vl: switch -M parsing to keyval")
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20221103170150.20789-2-pmorel@linux.ibm.com>
[thuth: Add patch description]
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 1fd396e32288bbf536483c74b68cb3ee86005a9f)

Conflicts:
	hw/s390x/s390-virtio-ccw.c
	(dropped the "zpcii-disable" property code - it's not used in downstream)
Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 hw/s390x/s390-virtio-ccw.c | 117 +++++++++++++++++++++----------------
 1 file changed, 67 insertions(+), 50 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index a9617ab79f2..4a7cd21cacb 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -42,6 +42,7 @@
 #include "sysemu/sysemu.h"
 #include "hw/s390x/pv.h"
 #include "migration/blocker.h"
+#include "qapi/visitor.h"
 
 static Error *pv_mig_blocker;
 
@@ -588,38 +589,6 @@ static ram_addr_t s390_fixup_ram_size(ram_addr_t sz)
     return newsz;
 }
 
-static void ccw_machine_class_init(ObjectClass *oc, void *data)
-{
-    MachineClass *mc = MACHINE_CLASS(oc);
-    NMIClass *nc = NMI_CLASS(oc);
-    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
-    S390CcwMachineClass *s390mc = S390_CCW_MACHINE_CLASS(mc);
-
-    s390mc->ri_allowed = true;
-    s390mc->cpu_model_allowed = true;
-    s390mc->css_migration_enabled = true;
-    s390mc->hpage_1m_allowed = true;
-    mc->init = ccw_init;
-    mc->reset = s390_machine_reset;
-    mc->block_default_type = IF_VIRTIO;
-    mc->no_cdrom = 1;
-    mc->no_floppy = 1;
-    mc->no_parallel = 1;
-    mc->no_sdcard = 1;
-    mc->max_cpus = S390_MAX_CPUS;
-    mc->has_hotpluggable_cpus = true;
-    assert(!mc->get_hotplug_handler);
-    mc->get_hotplug_handler = s390_get_hotplug_handler;
-    mc->cpu_index_to_instance_props = s390_cpu_index_to_props;
-    mc->possible_cpu_arch_ids = s390_possible_cpu_arch_ids;
-    /* it is overridden with 'host' cpu *in kvm_arch_init* */
-    mc->default_cpu_type = S390_CPU_TYPE_NAME("qemu");
-    hc->plug = s390_machine_device_plug;
-    hc->unplug_request = s390_machine_device_unplug_request;
-    nc->nmi_monitor_handler = s390_nmi;
-    mc->default_ram_id = "s390.ram";
-}
-
 static inline bool machine_get_aes_key_wrap(Object *obj, Error **errp)
 {
     S390CcwMachineState *ms = S390_CCW_MACHINE(obj);
@@ -694,19 +663,29 @@ bool hpage_1m_allowed(void)
     return get_machine_class()->hpage_1m_allowed;
 }
 
-static char *machine_get_loadparm(Object *obj, Error **errp)
+static void machine_get_loadparm(Object *obj, Visitor *v,
+                                 const char *name, void *opaque,
+                                 Error **errp)
 {
     S390CcwMachineState *ms = S390_CCW_MACHINE(obj);
+    char *str = g_strndup((char *) ms->loadparm, sizeof(ms->loadparm));
 
-    /* make a NUL-terminated string */
-    return g_strndup((char *) ms->loadparm, sizeof(ms->loadparm));
+    visit_type_str(v, name, &str, errp);
+    g_free(str);
 }
 
-static void machine_set_loadparm(Object *obj, const char *val, Error **errp)
+static void machine_set_loadparm(Object *obj, Visitor *v,
+                                 const char *name, void *opaque,
+                                 Error **errp)
 {
     S390CcwMachineState *ms = S390_CCW_MACHINE(obj);
+    char *val;
     int i;
 
+    if (!visit_type_str(v, name, &val, errp)) {
+        return;
+    }
+
     for (i = 0; i < sizeof(ms->loadparm) && val[i]; i++) {
         uint8_t c = qemu_toupper(val[i]); /* mimic HMC */
 
@@ -724,29 +703,67 @@ static void machine_set_loadparm(Object *obj, const char *val, Error **errp)
         ms->loadparm[i] = ' '; /* pad right with spaces */
     }
 }
-static inline void s390_machine_initfn(Object *obj)
+
+static void ccw_machine_class_init(ObjectClass *oc, void *data)
 {
-    object_property_add_bool(obj, "aes-key-wrap",
-                             machine_get_aes_key_wrap,
-                             machine_set_aes_key_wrap);
-    object_property_set_description(obj, "aes-key-wrap",
+    MachineClass *mc = MACHINE_CLASS(oc);
+    NMIClass *nc = NMI_CLASS(oc);
+    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
+    S390CcwMachineClass *s390mc = S390_CCW_MACHINE_CLASS(mc);
+
+    s390mc->ri_allowed = true;
+    s390mc->cpu_model_allowed = true;
+    s390mc->css_migration_enabled = true;
+    s390mc->hpage_1m_allowed = true;
+    mc->init = ccw_init;
+    mc->reset = s390_machine_reset;
+    mc->block_default_type = IF_VIRTIO;
+    mc->no_cdrom = 1;
+    mc->no_floppy = 1;
+    mc->no_parallel = 1;
+    mc->no_sdcard = 1;
+    mc->max_cpus = S390_MAX_CPUS;
+    mc->has_hotpluggable_cpus = true;
+    assert(!mc->get_hotplug_handler);
+    mc->get_hotplug_handler = s390_get_hotplug_handler;
+    mc->cpu_index_to_instance_props = s390_cpu_index_to_props;
+    mc->possible_cpu_arch_ids = s390_possible_cpu_arch_ids;
+    /* it is overridden with 'host' cpu *in kvm_arch_init* */
+    mc->default_cpu_type = S390_CPU_TYPE_NAME("qemu");
+    hc->plug = s390_machine_device_plug;
+    hc->unplug_request = s390_machine_device_unplug_request;
+    nc->nmi_monitor_handler = s390_nmi;
+    mc->default_ram_id = "s390.ram";
+
+    object_class_property_add_bool(oc, "aes-key-wrap",
+                                   machine_get_aes_key_wrap,
+                                   machine_set_aes_key_wrap);
+    object_class_property_set_description(oc, "aes-key-wrap",
             "enable/disable AES key wrapping using the CPACF wrapping key");
-    object_property_set_bool(obj, "aes-key-wrap", true, NULL);
 
-    object_property_add_bool(obj, "dea-key-wrap",
-                             machine_get_dea_key_wrap,
-                             machine_set_dea_key_wrap);
-    object_property_set_description(obj, "dea-key-wrap",
+    object_class_property_add_bool(oc, "dea-key-wrap",
+                                   machine_get_dea_key_wrap,
+                                   machine_set_dea_key_wrap);
+    object_class_property_set_description(oc, "dea-key-wrap",
             "enable/disable DEA key wrapping using the CPACF wrapping key");
-    object_property_set_bool(obj, "dea-key-wrap", true, NULL);
-    object_property_add_str(obj, "loadparm",
-            machine_get_loadparm, machine_set_loadparm);
-    object_property_set_description(obj, "loadparm",
+
+    object_class_property_add(oc, "loadparm", "loadparm",
+                              machine_get_loadparm, machine_set_loadparm,
+                              NULL, NULL);
+    object_class_property_set_description(oc, "loadparm",
             "Up to 8 chars in set of [A-Za-z0-9. ] (lower case chars converted"
             " to upper case) to pass to machine loader, boot manager,"
             " and guest kernel");
 }
 
+static inline void s390_machine_initfn(Object *obj)
+{
+    S390CcwMachineState *ms = S390_CCW_MACHINE(obj);
+
+    ms->aes_key_wrap = true;
+    ms->dea_key_wrap = true;
+}
+
 static const TypeInfo ccw_machine_info = {
     .name          = TYPE_S390_CCW_MACHINE,
     .parent        = TYPE_MACHINE,
-- 
Gitee


From 6c10709a328a08f412d3c7682eefdb9867e2dc22 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Fri, 18 Nov 2022 20:26:54 -0500
Subject: [PATCH 213/407] ui/vnc.c: Fixed a deadlock bug.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 234: ui/vnc.c: Fixed a deadlock bug.
RH-Bugzilla: 2141896
RH-Acked-by: Gerd Hoffmann <kraxel@redhat.com>
RH-Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Commit: [1/1] d3d1d28d7b621a8ae8a593a5bd5303fa7951c17c (jmaloy/qemu-kvm)

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2141896
Upstream: Merged

commit 1dbbe6f172810026c51dc84ed927a3cc23017949
Author: Rao Lei <lei.rao@intel.com>
Date:   Wed Jan 5 10:08:08 2022 +0800

    ui/vnc.c: Fixed a deadlock bug.

    The GDB stack is as follows:
    (gdb) bt
    0  __lll_lock_wait (futex=futex@entry=0x56211df20360, private=0) at lowlevellock.c:52
    1  0x00007f263caf20a3 in __GI___pthread_mutex_lock (mutex=0x56211df20360) at ../nptl/pthread_mutex_lock.c:80
    2  0x000056211a757364 in qemu_mutex_lock_impl (mutex=0x56211df20360, file=0x56211a804857 "../ui/vnc-jobs.h", line=60)
        at ../util/qemu-thread-posix.c:80
    3  0x000056211a0ef8c7 in vnc_lock_output (vs=0x56211df14200) at ../ui/vnc-jobs.h:60
    4  0x000056211a0efcb7 in vnc_clipboard_send (vs=0x56211df14200, count=1, dwords=0x7ffdf1701338) at ../ui/vnc-clipboard.c:138
    5  0x000056211a0f0129 in vnc_clipboard_notify (notifier=0x56211df244c8, data=0x56211dd1bbf0) at ../ui/vnc-clipboard.c:209
    6  0x000056211a75dde8 in notifier_list_notify (list=0x56211afa17d0 <clipboard_notifiers>, data=0x56211dd1bbf0) at ../util/notify.c:39
    7  0x000056211a0bf0e6 in qemu_clipboard_update (info=0x56211dd1bbf0) at ../ui/clipboard.c:50
    8  0x000056211a0bf05d in qemu_clipboard_peer_release (peer=0x56211df244c0, selection=QEMU_CLIPBOARD_SELECTION_CLIPBOARD)
        at ../ui/clipboard.c:41
    9  0x000056211a0bef9b in qemu_clipboard_peer_unregister (peer=0x56211df244c0) at ../ui/clipboard.c:19
    10 0x000056211a0d45f3 in vnc_disconnect_finish (vs=0x56211df14200) at ../ui/vnc.c:1358
    11 0x000056211a0d4c9d in vnc_client_read (vs=0x56211df14200) at ../ui/vnc.c:1611
    12 0x000056211a0d4df8 in vnc_client_io (ioc=0x56211ce70690, condition=G_IO_IN, opaque=0x56211df14200) at ../ui/vnc.c:1649
    13 0x000056211a5b976c in qio_channel_fd_source_dispatch
        (source=0x56211ce50a00, callback=0x56211a0d4d71 <vnc_client_io>, user_data=0x56211df14200) at ../io/channel-watch.c:84
    14 0x00007f263ccede8e in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
    15 0x000056211a77d4a1 in glib_pollfds_poll () at ../util/main-loop.c:232
    16 0x000056211a77d51f in os_host_main_loop_wait (timeout=958545) at ../util/main-loop.c:255
    17 0x000056211a77d630 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531
    18 0x000056211a45bc8e in qemu_main_loop () at ../softmmu/runstate.c:726
    19 0x000056211a0b45fa in main (argc=69, argv=0x7ffdf1701778, envp=0x7ffdf17019a8) at ../softmmu/main.c:50

    From the call trace, we can see it is a deadlock bug.
    vnc_disconnect_finish will acquire the output_mutex.
    But, the output_mutex will be acquired again in vnc_clipboard_send.
    Repeated locking will cause a deadlock, so move
    qemu_clipboard_peer_unregister() after vnc_unlock_output().

    Fixes: 0bf41cab93e ("ui/vnc: clipboard support")
    Signed-off-by: Lei Rao <lei.rao@intel.com>
    Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Message-Id: <20220105020808.597325-1-lei.rao@intel.com>
    Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
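
The lock ordering described above can be illustrated with a minimal,
self-contained sketch. It assumes plain POSIX mutexes rather than QEMU's
qemu_mutex wrappers, and the Client type and the client_* and demo_* names
are hypothetical stand-ins, not QEMU code. The point is only that a callback
which takes the lock must run after the caller has released it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for VncState; only the output mutex matters here. */
typedef struct {
    pthread_mutex_t output_mutex;
    bool notified;
} Client;

/* Stand-in for the clipboard notifier path, which takes the output lock. */
static void client_notify(Client *c)
{
    pthread_mutex_lock(&c->output_mutex);
    c->notified = true;
    pthread_mutex_unlock(&c->output_mutex);
}

/* Fixed ordering: release the lock before invoking the callback. */
static void client_disconnect_finish(Client *c)
{
    pthread_mutex_lock(&c->output_mutex);
    /* ... tear down state protected by the lock ... */
    pthread_mutex_unlock(&c->output_mutex);

    /* Safe: the lock is no longer held, so the callback can take it. */
    client_notify(c);

    pthread_mutex_destroy(&c->output_mutex);
}

/* Returns true if the disconnect path completed and the notifier ran. */
static bool demo_disconnect(void)
{
    Client c = { .notified = false };

    pthread_mutex_init(&c.output_mutex, NULL);
    client_disconnect_finish(&c);
    return c.notified;
}
```

Calling client_notify() before the unlock in client_disconnect_finish()
would relock a mutex the same thread already holds, which is the pattern
the backtrace above shows.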

(cherry picked from commit 1dbbe6f172810026c51dc84ed927a3cc23017949)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 ui/vnc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index a14b6861be6..76372ca1ded 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -1354,12 +1354,12 @@ void vnc_disconnect_finish(VncState *vs)
         /* last client gone */
         vnc_update_server_surface(vs->vd);
     }
+    vnc_unlock_output(vs);
+
     if (vs->cbpeer.update.notify) {
         qemu_clipboard_peer_unregister(&vs->cbpeer);
     }
 
-    vnc_unlock_output(vs);
-
     qemu_mutex_destroy(&vs->output_mutex);
     if (vs->bh != NULL) {
         qemu_bh_delete(vs->bh);
-- 
Gitee


From 2982082db040878dd87df2338fb0b40bec0d963e Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Mon, 5 Dec 2022 15:32:55 -0500
Subject: [PATCH 214/407] hw/display/qxl: Have qxl_log_command Return early if
 no log_cmd handler
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 240: hw/display/qxl: Have qxl_log_command Return early if no log_cmd handler
RH-Bugzilla: 2148545
RH-Acked-by: Gerd Hoffmann <kraxel@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
RH-Commit: [1/5] 33d94f40c46cccbc32d108d1035365917bf90356 (jmaloy/jons-qemu-kvm)

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2148545
CVE: CVE-2022-4144
Upstream: Merged

commit 61c34fc194b776ecadc39fb26b061331107e5599
Author: Philippe Mathieu-Daudé <philmd@linaro.org>
Date:   Mon Nov 28 21:27:37 2022 +0100

    hw/display/qxl: Have qxl_log_command Return early if no log_cmd handler

    Only 3 command types are logged: no need to call qxl_phys2virt()
    for the other types. Using different cases will help to pass
    different structure sizes to qxl_phys2virt() in a pair of commits.

    Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    Message-Id: <20221128202741.4945-2-philmd@linaro.org>

(cherry picked from commit 61c34fc194b776ecadc39fb26b061331107e5599)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/display/qxl-logger.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/hw/display/qxl-logger.c b/hw/display/qxl-logger.c
index 68bfa475680..1bcf803db6d 100644
--- a/hw/display/qxl-logger.c
+++ b/hw/display/qxl-logger.c
@@ -247,6 +247,16 @@ int qxl_log_command(PCIQXLDevice *qxl, const char *ring, QXLCommandExt *ext)
             qxl_name(qxl_type, ext->cmd.type),
             compat ? "(compat)" : "");
 
+    switch (ext->cmd.type) {
+    case QXL_CMD_DRAW:
+        break;
+    case QXL_CMD_SURFACE:
+        break;
+    case QXL_CMD_CURSOR:
+        break;
+    default:
+        goto out;
+    }
     data = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id);
     if (!data) {
         return 1;
@@ -269,6 +279,7 @@ int qxl_log_command(PCIQXLDevice *qxl, const char *ring, QXLCommandExt *ext)
         qxl_log_cmd_cursor(qxl, data, ext->group_id);
         break;
     }
+out:
     fprintf(stderr, "\n");
     return 0;
 }
-- 
Gitee


From 64dc41608cb70083608d1fe83dfac5aaed75c55f Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Mon, 5 Dec 2022 15:32:55 -0500
Subject: [PATCH 215/407] hw/display/qxl: Document qxl_phys2virt()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 240: hw/display/qxl: Have qxl_log_command Return early if no log_cmd handler
RH-Bugzilla: 2148545
RH-Acked-by: Gerd Hoffmann <kraxel@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
RH-Commit: [2/5] f84c0b379022c527fc2508a242443d86454944c0 (jmaloy/jons-qemu-kvm)

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2148545
CVE: CVE-2022-4144
Upstream: Merged

commit b1901de83a9456cde26fc755f71ca2b7b3ef50fc
Author: Philippe Mathieu-Daudé <philmd@linaro.org>
Date:   Mon Nov 28 21:27:38 2022 +0100

    hw/display/qxl: Document qxl_phys2virt()

    Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    Message-Id: <20221128202741.4945-3-philmd@linaro.org>

(cherry picked from commit b1901de83a9456cde26fc755f71ca2b7b3ef50fc)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/display/qxl.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/hw/display/qxl.h b/hw/display/qxl.h
index 30d21f4d0bd..c938f88a2f6 100644
--- a/hw/display/qxl.h
+++ b/hw/display/qxl.h
@@ -147,6 +147,25 @@ OBJECT_DECLARE_SIMPLE_TYPE(PCIQXLDevice, PCI_QXL)
 #define QXL_DEFAULT_REVISION (QXL_REVISION_STABLE_V12 + 1)
 
 /* qxl.c */
+/**
+ * qxl_phys2virt: Get a pointer within a PCI VRAM memory region.
+ *
+ * @qxl: QXL device
+ * @phys: physical offset of buffer within the VRAM
+ * @group_id: memory slot group
+ *
+ * Returns a host pointer to a buffer placed at offset @phys within the
+ * active slot @group_id of the PCI VGA RAM memory region associated with
+ * the @qxl device. If the slot is inactive, or the offset is out
+ * of the memory region, returns NULL.
+ *
+ * Use with care; by the time this function returns, the returned pointer is
+ * not protected by RCU anymore.  If the caller is not within an RCU critical
+ * section and does not hold the iothread lock, it must have other means of
+ * protecting the pointer, such as a reference to the region that includes
+ * the incoming ram_addr_t.
+ *
+ */
 void *qxl_phys2virt(PCIQXLDevice *qxl, QXLPHYSICAL phys, int group_id);
 void qxl_set_guest_bug(PCIQXLDevice *qxl, const char *msg, ...)
     GCC_FMT_ATTR(2, 3);
-- 
Gitee


From 193a2751ff4563c507192516e8f0347bb76cff16 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Mon, 5 Dec 2022 15:32:55 -0500
Subject: [PATCH 216/407] hw/display/qxl: Pass requested buffer size to
 qxl_phys2virt()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 240: hw/display/qxl: Have qxl_log_command Return early if no log_cmd handler
RH-Bugzilla: 2148545
RH-Acked-by: Gerd Hoffmann <kraxel@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
RH-Commit: [3/5] 8e362d67fe7fef9eb457cfb15d75b298fed725c3 (jmaloy/jons-qemu-kvm)

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2148545
CVE: CVE-2022-4144
Upstream: Merged

commit 8efec0ef8bbc1e75a7ebf6e325a35806ece9b39f
Author: Philippe Mathieu-Daudé <philmd@linaro.org>
Date:   Mon Nov 28 21:27:39 2022 +0100

    hw/display/qxl: Pass requested buffer size to qxl_phys2virt()

    Currently qxl_phys2virt() doesn't check for buffer overrun.
    In order to do so in the next commit, pass the buffer size
    as argument.

    For QXLCursor in qxl_render_cursor() -> qxl_cursor() we
    verify the size of the chunked data ahead, checking we can
    access 'sizeof(QXLCursor) + chunk->data_size' bytes.
    Since in the SPICE_CURSOR_TYPE_MONO case the cursor is
    assumed to fit in one chunk, no changes are required.
    In the SPICE_CURSOR_TYPE_ALPHA case the read-ahead is handled in
    qxl_unpack_chunks().

    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Acked-by: Gerd Hoffmann <kraxel@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    Message-Id: <20221128202741.4945-4-philmd@linaro.org>
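
The two-step cursor read described above can be sketched in isolation. This
is an illustration under assumptions, not the QEMU implementation: Header,
lookup(), read_payload() and demo_two_step() are hypothetical names, and a
flat byte array stands in for the guest memory slot. The header is validated
first so its length field can be read safely, then the same offset is
validated again for header plus payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header; stands in for QXLCursor and its first data chunk. */
typedef struct {
    uint32_t data_size;   /* number of payload bytes following the header */
} Header;

/* Hypothetical lookup: pointer into mem if [off, off + size) fits, else NULL. */
static const uint8_t *lookup(const uint8_t *mem, size_t mem_size,
                             size_t off, size_t size)
{
    if (off >= mem_size || size > mem_size - off) {
        return NULL;
    }
    return mem + off;
}

/* Two-step read: returns the payload pointer, or NULL if it does not fit. */
static const uint8_t *read_payload(const uint8_t *mem, size_t mem_size,
                                   size_t off, uint32_t *len)
{
    Header hdr;
    const uint8_t *p;

    /* Step 1: make sure the header itself is readable. */
    p = lookup(mem, mem_size, off, sizeof(Header));
    if (!p) {
        return NULL;
    }
    memcpy(&hdr, p, sizeof(hdr));

    /* Guard the addition below against wrapping on small size_t. */
    if (hdr.data_size > SIZE_MAX - sizeof(Header)) {
        return NULL;
    }

    /* Step 2: validate header plus the payload length it announces. */
    p = lookup(mem, mem_size, off, sizeof(Header) + hdr.data_size);
    if (!p) {
        return NULL;
    }
    *len = hdr.data_size;
    return p + sizeof(Header);
}

/* Returns 1 if an exact fit is accepted and a one-byte overrun is rejected. */
static int demo_two_step(void)
{
    uint8_t mem[16] = { 0 };
    Header hdr = { .data_size = 8 };
    uint32_t len = 0;

    memcpy(mem + 4, &hdr, sizeof(hdr));   /* header at 4, payload at 8..15 */
    if (!read_payload(mem, sizeof(mem), 4, &len) || len != 8) {
        return 0;
    }

    hdr.data_size = 9;                    /* one byte past the end */
    memcpy(mem + 4, &hdr, sizeof(hdr));
    return read_payload(mem, sizeof(mem), 4, &len) == NULL;
}
```

Copying the header out before the second lookup is a choice made for this
sketch; the patch itself reads cursor->chunk.data_size in place.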

(cherry picked from commit 8efec0ef8bbc1e75a7ebf6e325a35806ece9b39f)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/display/qxl-logger.c | 11 ++++++++---
 hw/display/qxl-render.c | 20 ++++++++++++++++----
 hw/display/qxl.c        | 14 +++++++++-----
 hw/display/qxl.h        |  4 +++-
 4 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/hw/display/qxl-logger.c b/hw/display/qxl-logger.c
index 1bcf803db6d..35c38f62525 100644
--- a/hw/display/qxl-logger.c
+++ b/hw/display/qxl-logger.c
@@ -106,7 +106,7 @@ static int qxl_log_image(PCIQXLDevice *qxl, QXLPHYSICAL addr, int group_id)
     QXLImage *image;
     QXLImageDescriptor *desc;
 
-    image = qxl_phys2virt(qxl, addr, group_id);
+    image = qxl_phys2virt(qxl, addr, group_id, sizeof(QXLImage));
     if (!image) {
         return 1;
     }
@@ -214,7 +214,8 @@ int qxl_log_cmd_cursor(PCIQXLDevice *qxl, QXLCursorCmd *cmd, int group_id)
                 cmd->u.set.position.y,
                 cmd->u.set.visible ? "yes" : "no",
                 cmd->u.set.shape);
-        cursor = qxl_phys2virt(qxl, cmd->u.set.shape, group_id);
+        cursor = qxl_phys2virt(qxl, cmd->u.set.shape, group_id,
+                               sizeof(QXLCursor));
         if (!cursor) {
             return 1;
         }
@@ -236,6 +237,7 @@ int qxl_log_command(PCIQXLDevice *qxl, const char *ring, QXLCommandExt *ext)
 {
     bool compat = ext->flags & QXL_COMMAND_FLAG_COMPAT;
     void *data;
+    size_t datasz;
     int ret;
 
     if (!qxl->cmdlog) {
@@ -249,15 +251,18 @@ int qxl_log_command(PCIQXLDevice *qxl, const char *ring, QXLCommandExt *ext)
 
     switch (ext->cmd.type) {
     case QXL_CMD_DRAW:
+        datasz = compat ? sizeof(QXLCompatDrawable) : sizeof(QXLDrawable);
         break;
     case QXL_CMD_SURFACE:
+        datasz = sizeof(QXLSurfaceCmd);
         break;
     case QXL_CMD_CURSOR:
+        datasz = sizeof(QXLCursorCmd);
         break;
     default:
         goto out;
     }
-    data = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id);
+    data = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id, datasz);
     if (!data) {
         return 1;
     }
diff --git a/hw/display/qxl-render.c b/hw/display/qxl-render.c
index ca217004bf7..fcfd40c3ac1 100644
--- a/hw/display/qxl-render.c
+++ b/hw/display/qxl-render.c
@@ -107,7 +107,9 @@ static void qxl_render_update_area_unlocked(PCIQXLDevice *qxl)
         qxl->guest_primary.resized = 0;
         qxl->guest_primary.data = qxl_phys2virt(qxl,
                                                 qxl->guest_primary.surface.mem,
-                                                MEMSLOT_GROUP_GUEST);
+                                                MEMSLOT_GROUP_GUEST,
+                                                qxl->guest_primary.abs_stride
+                                                * height);
         if (!qxl->guest_primary.data) {
             goto end;
         }
@@ -228,7 +230,8 @@ static void qxl_unpack_chunks(void *dest, size_t size, PCIQXLDevice *qxl,
         if (offset == size) {
             return;
         }
-        chunk = qxl_phys2virt(qxl, chunk->next_chunk, group_id);
+        chunk = qxl_phys2virt(qxl, chunk->next_chunk, group_id,
+                              sizeof(QXLDataChunk) + chunk->data_size);
         if (!chunk) {
             return;
         }
@@ -295,7 +298,8 @@ fail:
 /* called from spice server thread context only */
 int qxl_render_cursor(PCIQXLDevice *qxl, QXLCommandExt *ext)
 {
-    QXLCursorCmd *cmd = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id);
+    QXLCursorCmd *cmd = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id,
+                                      sizeof(QXLCursorCmd));
     QXLCursor *cursor;
     QEMUCursor *c;
 
@@ -314,7 +318,15 @@ int qxl_render_cursor(PCIQXLDevice *qxl, QXLCommandExt *ext)
     }
     switch (cmd->type) {
     case QXL_CURSOR_SET:
-        cursor = qxl_phys2virt(qxl, cmd->u.set.shape, ext->group_id);
+        /* First read the QXLCursor to get QXLDataChunk::data_size ... */
+        cursor = qxl_phys2virt(qxl, cmd->u.set.shape, ext->group_id,
+                               sizeof(QXLCursor));
+        if (!cursor) {
+            return 1;
+        }
+        /* Then read including the chunked data following QXLCursor. */
+        cursor = qxl_phys2virt(qxl, cmd->u.set.shape, ext->group_id,
+                               sizeof(QXLCursor) + cursor->chunk.data_size);
         if (!cursor) {
             return 1;
         }
diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index 29c80b4289b..aa9065183e3 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -274,7 +274,8 @@ static void qxl_spice_monitors_config_async(PCIQXLDevice *qxl, int replay)
                                           QXL_IO_MONITORS_CONFIG_ASYNC));
     }
 
-    cfg = qxl_phys2virt(qxl, qxl->guest_monitors_config, MEMSLOT_GROUP_GUEST);
+    cfg = qxl_phys2virt(qxl, qxl->guest_monitors_config, MEMSLOT_GROUP_GUEST,
+                        sizeof(QXLMonitorsConfig));
     if (cfg != NULL && cfg->count == 1) {
         qxl->guest_primary.resized = 1;
         qxl->guest_head0_width  = cfg->heads[0].width;
@@ -459,7 +460,8 @@ static int qxl_track_command(PCIQXLDevice *qxl, struct QXLCommandExt *ext)
     switch (le32_to_cpu(ext->cmd.type)) {
     case QXL_CMD_SURFACE:
     {
-        QXLSurfaceCmd *cmd = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id);
+        QXLSurfaceCmd *cmd = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id,
+                                           sizeof(QXLSurfaceCmd));
 
         if (!cmd) {
             return 1;
@@ -494,7 +496,8 @@ static int qxl_track_command(PCIQXLDevice *qxl, struct QXLCommandExt *ext)
     }
     case QXL_CMD_CURSOR:
     {
-        QXLCursorCmd *cmd = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id);
+        QXLCursorCmd *cmd = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id,
+                                          sizeof(QXLCursorCmd));
 
         if (!cmd) {
             return 1;
@@ -1444,7 +1447,8 @@ static bool qxl_get_check_slot_offset(PCIQXLDevice *qxl, QXLPHYSICAL pqxl,
 }
 
 /* can be also called from spice server thread context */
-void *qxl_phys2virt(PCIQXLDevice *qxl, QXLPHYSICAL pqxl, int group_id)
+void *qxl_phys2virt(PCIQXLDevice *qxl, QXLPHYSICAL pqxl, int group_id,
+                    size_t size)
 {
     uint64_t offset;
     uint32_t slot;
@@ -1952,7 +1956,7 @@ static void qxl_dirty_surfaces(PCIQXLDevice *qxl)
         }
 
         cmd = qxl_phys2virt(qxl, qxl->guest_surfaces.cmds[i],
-                            MEMSLOT_GROUP_GUEST);
+                            MEMSLOT_GROUP_GUEST, sizeof(QXLSurfaceCmd));
         assert(cmd);
         assert(cmd->type == QXL_SURFACE_CMD_CREATE);
         qxl_dirty_one_surface(qxl, cmd->u.surface_create.data,
diff --git a/hw/display/qxl.h b/hw/display/qxl.h
index c938f88a2f6..c784315daae 100644
--- a/hw/display/qxl.h
+++ b/hw/display/qxl.h
@@ -153,6 +153,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(PCIQXLDevice, PCI_QXL)
  * @qxl: QXL device
  * @phys: physical offset of buffer within the VRAM
  * @group_id: memory slot group
+ * @size: size of the buffer
  *
  * Returns a host pointer to a buffer placed at offset @phys within the
  * active slot @group_id of the PCI VGA RAM memory region associated with
@@ -166,7 +167,8 @@ OBJECT_DECLARE_SIMPLE_TYPE(PCIQXLDevice, PCI_QXL)
  * the incoming ram_addr_t.
  *
  */
-void *qxl_phys2virt(PCIQXLDevice *qxl, QXLPHYSICAL phys, int group_id);
+void *qxl_phys2virt(PCIQXLDevice *qxl, QXLPHYSICAL phys, int group_id,
+                    size_t size);
 void qxl_set_guest_bug(PCIQXLDevice *qxl, const char *msg, ...)
     GCC_FMT_ATTR(2, 3);
 
-- 
Gitee


From 8057cd1f1ad828e4fe7b9ff2a614c0245d15a1b8 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Mon, 5 Dec 2022 15:32:55 -0500
Subject: [PATCH 217/407] hw/display/qxl: Avoid buffer overrun in qxl_phys2virt
 (CVE-2022-4144)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 240: hw/display/qxl: Have qxl_log_command Return early if no log_cmd handler
RH-Bugzilla: 2148545
RH-Acked-by: Gerd Hoffmann <kraxel@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
RH-Commit: [4/5] afe53f8d9b31c6fd8211fe172173151f3255e67c (jmaloy/jons-qemu-kvm)

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2148545
CVE: CVE-2022-4144
Upstream: Merged

commit 6dbbf055148c6f1b7d8a3251a65bd6f3d1e1f622
Author: Philippe Mathieu-Daudé <philmd@linaro.org>
Date:   Mon Nov 28 21:27:40 2022 +0100

    hw/display/qxl: Avoid buffer overrun in qxl_phys2virt (CVE-2022-4144)

    Have qxl_get_check_slot_offset() return false if the requested
    buffer size does not fit within the slot memory region.

    Similarly, qxl_phys2virt() now returns NULL in such a case, and
    qxl_dirty_one_surface() aborts.

    This avoids buffer overrun in the host pointer returned by
    memory_region_get_ram_ptr().

    Fixes: CVE-2022-4144 (out-of-bounds read)
    Reported-by: Wenxu Yin (@awxylitol)
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1336
    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    Message-Id: <20221128202741.4945-5-philmd@linaro.org>
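
The check added to qxl_get_check_slot_offset() can be sketched as a
standalone helper. range_fits() is a hypothetical name, not QEMU API, and the
wrap-around guard is an extra assumption of this sketch (the patch masks the
offset to 48 bits instead). The two later steps mirror the patch: reject a
start position at or beyond the region end, then compare the requested size
with the bytes remaining.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: does a buffer of 'size' bytes starting at
 * 'slot_offset + offset' lie entirely inside a region of 'region_size' bytes?
 */
static bool range_fits(uint64_t region_size, uint64_t slot_offset,
                       uint64_t offset, size_t size)
{
    uint64_t start;
    uint64_t available;

    /* Extra guard for this sketch: the sum must not wrap around. */
    if (offset > UINT64_MAX - slot_offset) {
        return false;
    }
    start = slot_offset + offset;

    /* The start position must fall inside the region. */
    if (start >= region_size) {
        return false;
    }

    /* The requested size must not exceed what is left after 'start'. */
    available = region_size - start;
    return size <= available;
}
```

Subtracting first and comparing the size against the remainder avoids
computing start + size, which could itself overflow.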

(cherry picked from commit 6dbbf055148c6f1b7d8a3251a65bd6f3d1e1f622)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/display/qxl.c | 27 +++++++++++++++++++++++----
 hw/display/qxl.h |  2 +-
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index aa9065183e3..2a4b2d41583 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -1412,11 +1412,13 @@ static void qxl_reset_surfaces(PCIQXLDevice *d)
 
 /* can be also called from spice server thread context */
 static bool qxl_get_check_slot_offset(PCIQXLDevice *qxl, QXLPHYSICAL pqxl,
-                                      uint32_t *s, uint64_t *o)
+                                      uint32_t *s, uint64_t *o,
+                                      size_t size_requested)
 {
     uint64_t phys   = le64_to_cpu(pqxl);
     uint32_t slot   = (phys >> (64 -  8)) & 0xff;
     uint64_t offset = phys & 0xffffffffffff;
+    uint64_t size_available;
 
     if (slot >= NUM_MEMSLOTS) {
         qxl_set_guest_bug(qxl, "slot too large %d >= %d", slot,
@@ -1440,6 +1442,23 @@ static bool qxl_get_check_slot_offset(PCIQXLDevice *qxl, QXLPHYSICAL pqxl,
                           slot, offset, qxl->guest_slots[slot].size);
         return false;
     }
+    size_available = memory_region_size(qxl->guest_slots[slot].mr);
+    if (qxl->guest_slots[slot].offset + offset >= size_available) {
+        qxl_set_guest_bug(qxl,
+                          "slot %d offset %"PRIu64" > region size %"PRIu64"\n",
+                          slot, qxl->guest_slots[slot].offset + offset,
+                          size_available);
+        return false;
+    }
+    size_available -= qxl->guest_slots[slot].offset + offset;
+    if (size_requested > size_available) {
+        qxl_set_guest_bug(qxl,
+                          "slot %d offset %"PRIu64" size %zu: "
+                          "overrun by %"PRIu64" bytes\n",
+                          slot, offset, size_requested,
+                          size_requested - size_available);
+        return false;
+    }
 
     *s = slot;
     *o = offset;
@@ -1459,7 +1478,7 @@ void *qxl_phys2virt(PCIQXLDevice *qxl, QXLPHYSICAL pqxl, int group_id,
         offset = le64_to_cpu(pqxl) & 0xffffffffffff;
         return (void *)(intptr_t)offset;
     case MEMSLOT_GROUP_GUEST:
-        if (!qxl_get_check_slot_offset(qxl, pqxl, &slot, &offset)) {
+        if (!qxl_get_check_slot_offset(qxl, pqxl, &slot, &offset, size)) {
             return NULL;
         }
         ptr = memory_region_get_ram_ptr(qxl->guest_slots[slot].mr);
@@ -1925,9 +1944,9 @@ static void qxl_dirty_one_surface(PCIQXLDevice *qxl, QXLPHYSICAL pqxl,
     uint32_t slot;
     bool rc;
 
-    rc = qxl_get_check_slot_offset(qxl, pqxl, &slot, &offset);
-    assert(rc == true);
     size = (uint64_t)height * abs(stride);
+    rc = qxl_get_check_slot_offset(qxl, pqxl, &slot, &offset, size);
+    assert(rc == true);
     trace_qxl_surfaces_dirty(qxl->id, offset, size);
     qxl_set_dirty(qxl->guest_slots[slot].mr,
                   qxl->guest_slots[slot].offset + offset,
diff --git a/hw/display/qxl.h b/hw/display/qxl.h
index c784315daae..89ca832cf97 100644
--- a/hw/display/qxl.h
+++ b/hw/display/qxl.h
@@ -157,7 +157,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(PCIQXLDevice, PCI_QXL)
  *
  * Returns a host pointer to a buffer placed at offset @phys within the
  * active slot @group_id of the PCI VGA RAM memory region associated with
- * the @qxl device. If the slot is inactive, or the offset is out
+ * the @qxl device. If the slot is inactive, or the offset + size are out
  * of the memory region, returns NULL.
  *
  * Use with care; by the time this function returns, the returned pointer is
-- 
Gitee


From 0e2faa09ef70be44523832e3e378564ed8c2ae78 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Mon, 5 Dec 2022 15:32:55 -0500
Subject: [PATCH 218/407] hw/display/qxl: Assert memory slot fits in
 preallocated MemoryRegion
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 240: hw/display/qxl: Have qxl_log_command Return early if no log_cmd handler
RH-Bugzilla: 2148545
RH-Acked-by: Gerd Hoffmann <kraxel@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
RH-Commit: [5/5] f809ce48e7989dd6547b7c8bf1a5efc3fdcacbac (jmaloy/jons-qemu-kvm)

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2148545
CVE: CVE-2022-4144
Upstream: Merged

commit 86fdb0582c653a9824183679403a85f588260d62
Author: Philippe Mathieu-Daudé <philmd@linaro.org>
Date:   Mon Nov 28 21:27:41 2022 +0100

    hw/display/qxl: Assert memory slot fits in preallocated MemoryRegion

    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    Message-Id: <20221128202741.4945-6-philmd@linaro.org>

(cherry picked from commit 86fdb0582c653a9824183679403a85f588260d62)
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/display/qxl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index 2a4b2d41583..bcd9e8716aa 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -1372,6 +1372,7 @@ static int qxl_add_memslot(PCIQXLDevice *d, uint32_t slot_id, uint64_t delta,
         qxl_set_guest_bug(d, "%s: pci_region = %d", __func__, pci_region);
         return 1;
     }
+    assert(guest_end - pci_start <= memory_region_size(mr));
 
     virt_start = (intptr_t)memory_region_get_ram_ptr(mr);
     memslot.slot_id = slot_id;
-- 
Gitee


From 4dcdce9eed46406f4604788967e171482677f4d0 Mon Sep 17 00:00:00 2001
From: Nico Boehr <nrb@linux.ibm.com>
Date: Wed, 12 Oct 2022 14:32:29 +0200
Subject: [PATCH 219/407] s390x/tod-kvm: don't save/restore the TOD in PV
 guests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 244: s390x/tod-kvm: don't save/restore the TOD in PV guests
RH-Bugzilla: 2155448
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Commit: [1/1] 3cb3154dd7c1549c54cf8c0483b5f23b235f6db3

Under PV, the guest's TOD clock is under control of the ultravisor and the
hypervisor cannot change it.

With upcoming kernel changes[1], the Linux kernel will reject QEMU's
request to adjust the guest's clock in this case, so don't attempt to set
the clock.

This avoids the following warning message on save/restore of a PV guest:

warning: Unable to set KVM guest TOD clock: Operation not supported

[1] https://lore.kernel.org/all/20221011160712.928239-2-nrb@linux.ibm.com/

Fixes: c3347ed0d2ee ("s390x: protvirt: Support unpack facility")
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Message-Id: <20221012123229.1196007-1-nrb@linux.ibm.com>
[thuth: Add curly braces]
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 38621181ae3cbec62e3490fbc14f6ac01642d07a)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/tod-kvm.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/s390x/tod-kvm.c b/hw/s390x/tod-kvm.c
index ec855811aeb..c804c979b5c 100644
--- a/hw/s390x/tod-kvm.c
+++ b/hw/s390x/tod-kvm.c
@@ -13,6 +13,7 @@
 #include "qemu/module.h"
 #include "sysemu/runstate.h"
 #include "hw/s390x/tod.h"
+#include "hw/s390x/pv.h"
 #include "kvm/kvm_s390x.h"
 
 static void kvm_s390_get_tod_raw(S390TOD *tod, Error **errp)
@@ -84,6 +85,14 @@ static void kvm_s390_tod_vm_state_change(void *opaque, bool running,
     S390TODState *td = opaque;
     Error *local_err = NULL;
 
+    /*
+     * Under PV, the clock is under ultravisor control, hence we cannot restore
+     * it on resume.
+     */
+    if (s390_is_pv()) {
+        return;
+    }
+
     if (running && td->stopped) {
         /* Set the old TOD when running the VM - start the TOD clock. */
         kvm_s390_set_tod_raw(&td->base, &local_err);
-- 
Gitee


From 58a45809d5b73bdfcd72b533abc63d4ea271f2c6 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Wed, 9 Nov 2022 17:54:48 +0100
Subject: [PATCH 220/407] block/mirror: Do not wait for active writes

RH-Author: Hanna Czenczek <hreitz@redhat.com>
RH-MergeRequest: 246: block/mirror: Make active mirror progress even under full load
RH-Bugzilla: 2125119
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Commit: [1/3] 652d1e55b954f13eaec2c86f58735d4942837e16

Waiting for all active writes to settle before daring to create a
background copying operation means that we will never do background
operations while the guest does anything (in write-blocking mode), and
therefore cannot converge.  Yes, we also will not diverge, but actually
converging would be even nicer.

It is unclear why we did decide to wait for all active writes to settle
before creating a background operation, but it just does not seem
necessary.  Active writes will put themselves into the in_flight bitmap
and thus properly block actually conflicting background requests.

It is important for active requests to wait on overlapping background
requests, which we do in active_write_prepare().  However, so far it was
not documented why it is important.  Add such documentation now, and
also to the other call of mirror_wait_on_conflicts(), so that it becomes
more clear why and when requests need to actively wait for other
requests to settle.

Another thing to note is that of course we need to ensure that there are
no active requests when the job completes, but that is done by virtue of
the BDS being drained anyway, so there cannot be any active requests at
that point.

With this change, we will need to explicitly keep track of how many
bytes are in flight in active requests so that
job_progress_set_remaining() in mirror_run() can set the correct number
of remaining bytes.
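
As a rough illustration of the accounting described above, the following
minimal sketch uses a hypothetical, simplified MirrorCounters struct in
place of MirrorBlockJob; the helper names are made up for this example and
are not QEMU functions:

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the counters in MirrorBlockJob. */
typedef struct {
    int64_t bytes_in_flight;              /* background copy operations */
    int64_t active_write_bytes_in_flight; /* guest writes being mirrored */
} MirrorCounters;

/* An active write accounts for its bytes for as long as it is in flight. */
static void active_write_begin(MirrorCounters *s, int64_t bytes)
{
    s->active_write_bytes_in_flight += bytes;
}

static void active_write_end(MirrorCounters *s, int64_t bytes)
{
    s->active_write_bytes_in_flight -= bytes;
}

/* Remaining work = dirty bytes + background in flight + active in flight. */
static int64_t mirror_remaining(const MirrorCounters *s, int64_t dirty_bytes)
{
    return s->bytes_in_flight + dirty_bytes + s->active_write_bytes_in_flight;
}
```

Without the third term, bytes held by active writes would briefly vanish
from the remaining count while the write is in progress.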

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2123297
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221109165452.67927-2-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit d69a879bdf1aed586478eaa161ee064fe1b92f1a)
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 block/mirror.c | 37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index efec2c7674b..282f428cb7d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -81,6 +81,7 @@ typedef struct MirrorBlockJob {
     int max_iov;
     bool initial_zeroing_ongoing;
     int in_active_write_counter;
+    int64_t active_write_bytes_in_flight;
     bool prepared;
     bool in_drain;
 } MirrorBlockJob;
@@ -493,6 +494,13 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     }
     bdrv_dirty_bitmap_unlock(s->dirty_bitmap);
 
+    /*
+     * Wait for concurrent requests to @offset.  The next loop will limit the
+     * copied area based on in_flight_bitmap so we only copy an area that does
+     * not overlap with concurrent in-flight requests.  Still, we would like to
+     * copy something, so wait until there are at least no more requests to the
+     * very beginning of the area.
+     */
     mirror_wait_on_conflicts(NULL, s, offset, 1);
 
     job_pause_point(&s->common.job);
@@ -993,12 +1001,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
         int64_t cnt, delta;
         bool should_complete;
 
-        /* Do not start passive operations while there are active
-         * writes in progress */
-        while (s->in_active_write_counter) {
-            mirror_wait_for_any_operation(s, true);
-        }
-
         if (s->ret < 0) {
             ret = s->ret;
             goto immediate_exit;
@@ -1015,7 +1017,9 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
         /* cnt is the number of dirty bytes remaining and s->bytes_in_flight is
          * the number of bytes currently being processed; together those are
          * the current remaining operation length */
-        job_progress_set_remaining(&s->common.job, s->bytes_in_flight + cnt);
+        job_progress_set_remaining(&s->common.job,
+                                   s->bytes_in_flight + cnt +
+                                   s->active_write_bytes_in_flight);
 
         /* Note that even when no rate limit is applied we need to yield
          * periodically with no pending I/O so that bdrv_drain_all() returns.
@@ -1073,6 +1077,10 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
 
             s->in_drain = true;
             bdrv_drained_begin(bs);
+
+            /* Must be zero because we are drained */
+            assert(s->in_active_write_counter == 0);
+
             cnt = bdrv_get_dirty_count(s->dirty_bitmap);
             if (cnt > 0 || mirror_flush(s) < 0) {
                 bdrv_drained_end(bs);
@@ -1306,6 +1314,7 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
     }
 
     job_progress_increase_remaining(&job->common.job, bytes);
+    job->active_write_bytes_in_flight += bytes;
 
     switch (method) {
     case MIRROR_METHOD_COPY:
@@ -1327,6 +1336,7 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
         abort();
     }
 
+    job->active_write_bytes_in_flight -= bytes;
     if (ret >= 0) {
         job_progress_update(&job->common.job, bytes);
     } else {
@@ -1375,6 +1385,19 @@ static MirrorOp *coroutine_fn active_write_prepare(MirrorBlockJob *s,
 
     s->in_active_write_counter++;
 
+    /*
+     * Wait for concurrent requests affecting the area.  If there are already
+     * running requests that are copying off now-to-be stale data in the area,
+     * we must wait for them to finish before we begin writing fresh data to the
+     * target so that the write operations appear in the correct order.
+     * Note that background requests (see mirror_iteration()) in contrast only
+     * wait for conflicting requests at the start of the dirty area, and then
+     * (based on the in_flight_bitmap) truncate the area to copy so it will not
+     * conflict with any requests beyond that.  For active writes, however, we
+     * cannot truncate that area.  The request from our parent must be blocked
+     * until the area is copied in full.  Therefore, we must wait for the whole
+     * area to become free of concurrent requests.
+     */
     mirror_wait_on_conflicts(op, s, offset, bytes);
 
     bitmap_set(s->in_flight_bitmap, start_chunk, end_chunk - start_chunk);
-- 
Gitee


From 36cef4949e39f06ce52bc7339efa623fae6ac172 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Wed, 9 Nov 2022 17:54:49 +0100
Subject: [PATCH 221/407] block/mirror: Drop mirror_wait_for_any_operation()

RH-Author: Hanna Czenczek <hreitz@redhat.com>
RH-MergeRequest: 246: block/mirror: Make active mirror progress even under full load
RH-Bugzilla: 2125119
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Commit: [2/3] dec37883bcc491441ae08d9592d1ec26a47765c0

mirror_wait_for_free_in_flight_slot() is the only remaining user of
mirror_wait_for_any_operation(), so inline the latter into the former.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221109165452.67927-3-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit eb994912993077f178ccb43b20e422ecf9ae4ac7)
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 block/mirror.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 282f428cb7d..6b02555ad71 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -304,19 +304,21 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
 }
 
 static inline void coroutine_fn
-mirror_wait_for_any_operation(MirrorBlockJob *s, bool active)
+mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
 {
     MirrorOp *op;
 
     QTAILQ_FOREACH(op, &s->ops_in_flight, next) {
-        /* Do not wait on pseudo ops, because it may in turn wait on
+        /*
+         * Do not wait on pseudo ops, because it may in turn wait on
          * some other operation to start, which may in fact be the
          * caller of this function.  Since there is only one pseudo op
          * at any given time, we will always find some real operation
-         * to wait on. */
-        if (!op->is_pseudo_op && op->is_in_flight &&
-            op->is_active_write == active)
-        {
+         * to wait on.
+         * Also, do not wait on active operations, because they do not
+         * use up in-flight slots.
+         */
+        if (!op->is_pseudo_op && op->is_in_flight && !op->is_active_write) {
             qemu_co_queue_wait(&op->waiting_requests, NULL);
             return;
         }
@@ -324,13 +326,6 @@ mirror_wait_for_any_operation(MirrorBlockJob *s, bool active)
     abort();
 }
 
-static inline void coroutine_fn
-mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
-{
-    /* Only non-active operations use up in-flight slots */
-    mirror_wait_for_any_operation(s, false);
-}
-
 /* Perform a mirror copy operation.
  *
  * *op->bytes_handled is set to the number of bytes copied after and
-- 
Gitee


From 40430d9fb5b246727d1a2bb794976e21c5aa4bbc Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Wed, 9 Nov 2022 17:54:50 +0100
Subject: [PATCH 222/407] block/mirror: Fix NULL s->job in active writes

RH-Author: Hanna Czenczek <hreitz@redhat.com>
RH-MergeRequest: 246: block/mirror: Make active mirror progress even under full load
RH-Bugzilla: 2125119
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Commit: [3/3] 49d7ebd15667151a6e14228a8260cfdd0aa27a78

There is a small gap in mirror_start_job() before putting the mirror
filter node into the block graph (bdrv_append() call) and the actual job
being created.  Before the job is created, MirrorBDSOpaque.job is NULL.

It is possible that requests come in when bdrv_drained_end() is called,
and those requests would see MirrorBDSOpaque.job == NULL.  Have our
filter node handle that case gracefully.
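
The guard can be sketched in isolation as follows. The types here are
hypothetical, stripped-down stand-ins (not the real MirrorBDSOpaque or
MirrorBlockJob), and the "cancelled" flag stands in for job_is_cancelled():

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types for illustration only. */
typedef enum { COPY_MODE_BACKGROUND, COPY_MODE_WRITE_BLOCKING } CopyMode;

typedef struct {
    int ret;            /* negative once the job has failed */
    bool cancelled;     /* stand-in for job_is_cancelled() */
    CopyMode copy_mode;
} Job;

typedef struct {
    Job *job;           /* NULL until the job has been created */
} FilterOpaque;

/* Decide whether a write must also be copied to the target. */
static bool should_copy_to_target(const FilterOpaque *s)
{
    if (!s->job) {
        /* Request arrived before the job exists: just pass it through. */
        return false;
    }
    return s->job->ret >= 0 &&
           !s->job->cancelled &&
           s->job->copy_mode == COPY_MODE_WRITE_BLOCKING;
}
```

Defaulting to "do not copy" is safe under the assumption that the job, once
created, still picks up anything written in the meantime.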

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221109165452.67927-4-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit da93d5c84e56e6b4e84aa8e98b6b984c9b6bb528)
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 block/mirror.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 6b02555ad71..50289fca499 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1438,11 +1438,13 @@ static int coroutine_fn bdrv_mirror_top_do_write(BlockDriverState *bs,
     MirrorOp *op = NULL;
     MirrorBDSOpaque *s = bs->opaque;
     int ret = 0;
-    bool copy_to_target;
+    bool copy_to_target = false;
 
-    copy_to_target = s->job->ret >= 0 &&
-                     !job_is_cancelled(&s->job->common.job) &&
-                     s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;
+    if (s->job) {
+        copy_to_target = s->job->ret >= 0 &&
+                         !job_is_cancelled(&s->job->common.job) &&
+                         s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;
+    }
 
     if (copy_to_target) {
         op = active_write_prepare(s->job, offset, bytes);
@@ -1487,11 +1489,13 @@ static int coroutine_fn bdrv_mirror_top_pwritev(BlockDriverState *bs,
     QEMUIOVector bounce_qiov;
     void *bounce_buf;
     int ret = 0;
-    bool copy_to_target;
+    bool copy_to_target = false;
 
-    copy_to_target = s->job->ret >= 0 &&
-                     !job_is_cancelled(&s->job->common.job) &&
-                     s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;
+    if (s->job) {
+        copy_to_target = s->job->ret >= 0 &&
+                         !job_is_cancelled(&s->job->common.job) &&
+                         s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;
+    }
 
     if (copy_to_target) {
         /* The guest might concurrently modify the data to write; but
-- 
Gitee


From e822e61df3da6d791b64628167e9c79ce6d247bf Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Mon, 16 Jan 2023 07:49:59 -0500
Subject: [PATCH 223/407] accel: introduce accelerator blocker API
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-MergeRequest: 247: accel: introduce accelerator blocker API
RH-Bugzilla: 2161188
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [1/3] 9d3d7f9554974a79042c915763288cce07aef135

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2161188

commit bd688fc93120fb3e28aa70e3dfdf567ccc1e0bc1
Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date:   Fri Nov 11 10:47:56 2022 -0500

    accel: introduce accelerator blocker API

    This API allows the accelerators to prevent vcpus from issuing
    new ioctls while executing a critical section marked with the
    accel_ioctl_inhibit_begin/end functions.

    Note that all functions submitting ioctls must mark where the
    ioctl is being called with accel_{cpu_}ioctl_begin/end().

    This API requires the caller to always hold the BQL.
    API documentation is in sysemu/accel-blocker.h

    Internally, it uses a QemuLockCnt together with a per-CPU QemuLockCnt
    (to minimize cache line bouncing) to prevent new ioctls from running
    once the critical section starts, and a QemuEvent to wait until all
    running ioctls finish.

    Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Message-Id: <20221111154758.1372674-2-eesposit@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Conflicts:
	util/meson.build: files are missing in rhel 8.8.0
	namely int128.c, memalign.c and interval-tree.c
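
The protocol described in the upstream commit message above (block new
ioctls, then wait for the running ones to finish) can be sketched with
plain POSIX primitives. This is not the QEMU implementation: it swaps
QemuLockCnt and QemuEvent for a mutex and a condition variable, ignores the
BQL fast path and vcpu kicking, and all names (Blocker, blocker_*) are
hypothetical:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the QemuLockCnt + QemuEvent pair. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    int running;     /* ioctls currently between begin and end */
    bool inhibited;  /* true while a critical section is open */
} Blocker;

static void blocker_init(Blocker *b)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->changed, NULL);
    b->running = 0;
    b->inhibited = false;
}

/* Called right before an ioctl: blocks while a critical section is open. */
static void blocker_ioctl_begin(Blocker *b)
{
    pthread_mutex_lock(&b->lock);
    while (b->inhibited) {
        pthread_cond_wait(&b->changed, &b->lock);
    }
    b->running++;
    pthread_mutex_unlock(&b->lock);
}

/* Called right after an ioctl: wakes up a waiting inhibitor, if any. */
static void blocker_ioctl_end(Blocker *b)
{
    pthread_mutex_lock(&b->lock);
    b->running--;
    pthread_cond_broadcast(&b->changed);
    pthread_mutex_unlock(&b->lock);
}

/* Open the critical section: stop new ioctls, wait for running ones. */
static void blocker_inhibit_begin(Blocker *b)
{
    pthread_mutex_lock(&b->lock);
    b->inhibited = true;
    while (b->running > 0) {
        pthread_cond_wait(&b->changed, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

/* Close the critical section: let blocked ioctl_begin() callers proceed. */
static void blocker_inhibit_end(Blocker *b)
{
    pthread_mutex_lock(&b->lock);
    b->inhibited = false;
    pthread_cond_broadcast(&b->changed);
    pthread_mutex_unlock(&b->lock);
}
```

A caller would bracket every ioctl with blocker_ioctl_begin()/end() and wrap
the update that must appear atomic in blocker_inhibit_begin()/end(). The
per-CPU counters in the actual patch exist to reduce cache line bouncing,
which a single shared mutex like this one does not attempt.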

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 accel/accel-blocker.c          | 154 +++++++++++++++++++++++++++++++++
 accel/meson.build              |   2 +-
 hw/core/cpu-common.c           |   2 +
 include/hw/core/cpu.h          |   3 +
 include/sysemu/accel-blocker.h |  56 ++++++++++++
 util/meson.build               |   2 +-
 6 files changed, 217 insertions(+), 2 deletions(-)
 create mode 100644 accel/accel-blocker.c
 create mode 100644 include/sysemu/accel-blocker.h

diff --git a/accel/accel-blocker.c b/accel/accel-blocker.c
new file mode 100644
index 00000000000..1e7f423462d
--- /dev/null
+++ b/accel/accel-blocker.c
@@ -0,0 +1,154 @@
+/*
+ * Lock to inhibit accelerator ioctls
+ *
+ * Copyright (c) 2022 Red Hat Inc.
+ *
+ * Author: Emanuele Giuseppe Esposito       <eesposit@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/thread.h"
+#include "qemu/main-loop.h"
+#include "hw/core/cpu.h"
+#include "sysemu/accel-blocker.h"
+
+static QemuLockCnt accel_in_ioctl_lock;
+static QemuEvent accel_in_ioctl_event;
+
+void accel_blocker_init(void)
+{
+    qemu_lockcnt_init(&accel_in_ioctl_lock);
+    qemu_event_init(&accel_in_ioctl_event, false);
+}
+
+void accel_ioctl_begin(void)
+{
+    if (likely(qemu_mutex_iothread_locked())) {
+        return;
+    }
+
+    /* block if lock is taken in accel_ioctl_inhibit_begin() */
+    qemu_lockcnt_inc(&accel_in_ioctl_lock);
+}
+
+void accel_ioctl_end(void)
+{
+    if (likely(qemu_mutex_iothread_locked())) {
+        return;
+    }
+
+    qemu_lockcnt_dec(&accel_in_ioctl_lock);
+    /* change event to SET. If event was BUSY, wake up all waiters */
+    qemu_event_set(&accel_in_ioctl_event);
+}
+
+void accel_cpu_ioctl_begin(CPUState *cpu)
+{
+    if (unlikely(qemu_mutex_iothread_locked())) {
+        return;
+    }
+
+    /* block if lock is taken in accel_ioctl_inhibit_begin() */
+    qemu_lockcnt_inc(&cpu->in_ioctl_lock);
+}
+
+void accel_cpu_ioctl_end(CPUState *cpu)
+{
+    if (unlikely(qemu_mutex_iothread_locked())) {
+        return;
+    }
+
+    qemu_lockcnt_dec(&cpu->in_ioctl_lock);
+    /* change event to SET. If event was BUSY, wake up all waiters */
+    qemu_event_set(&accel_in_ioctl_event);
+}
+
+static bool accel_has_to_wait(void)
+{
+    CPUState *cpu;
+    bool needs_to_wait = false;
+
+    CPU_FOREACH(cpu) {
+        if (qemu_lockcnt_count(&cpu->in_ioctl_lock)) {
+            /* exit the ioctl, if vcpu is running it */
+            qemu_cpu_kick(cpu);
+            needs_to_wait = true;
+        }
+    }
+
+    return needs_to_wait || qemu_lockcnt_count(&accel_in_ioctl_lock);
+}
+
+void accel_ioctl_inhibit_begin(void)
+{
+    CPUState *cpu;
+
+    /*
+     * Inhibiting is allowed only while holding the BQL, so we can easily
+     * identify when an inhibitor wants to issue an ioctl.
+     */
+    g_assert(qemu_mutex_iothread_locked());
+
+    /* Block further invocations of the ioctls outside the BQL.  */
+    CPU_FOREACH(cpu) {
+        qemu_lockcnt_lock(&cpu->in_ioctl_lock);
+    }
+    qemu_lockcnt_lock(&accel_in_ioctl_lock);
+
+    /* Keep waiting while there are running ioctls */
+    while (true) {
+
+        /* Reset event to FREE. */
+        qemu_event_reset(&accel_in_ioctl_event);
+
+        if (accel_has_to_wait()) {
+            /*
+             * If event is still FREE, and there are ioctls still in progress,
+             * wait.
+             *
+             * If an ioctl finishes before qemu_event_wait(), it will change
+             * the event state to SET. This will prevent qemu_event_wait() from
+             * blocking, but it's not a problem because if other ioctls are
+             * still running the loop will iterate once more and reset the event
+             * status to FREE so that it can wait properly.
+             *
+             * If an ioctl finishes while qemu_event_wait() is blocking, then
+             * the waiter will be woken up, but here too the while loop makes
+             * sure to re-enter the wait if there are other running ioctls.
+             */
+            qemu_event_wait(&accel_in_ioctl_event);
+        } else {
+            /* No ioctl is running */
+            return;
+        }
+    }
+}
+
+void accel_ioctl_inhibit_end(void)
+{
+    CPUState *cpu;
+
+    qemu_lockcnt_unlock(&accel_in_ioctl_lock);
+    CPU_FOREACH(cpu) {
+        qemu_lockcnt_unlock(&cpu->in_ioctl_lock);
+    }
+}
+
diff --git a/accel/meson.build b/accel/meson.build
index dfd808d2c8e..801b4d44e81 100644
--- a/accel/meson.build
+++ b/accel/meson.build
@@ -1,4 +1,4 @@
-specific_ss.add(files('accel-common.c'))
+specific_ss.add(files('accel-common.c', 'accel-blocker.c'))
 softmmu_ss.add(files('accel-softmmu.c'))
 user_ss.add(files('accel-user.c'))
 
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index 9e3241b4308..b6e83acf0ae 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -238,6 +238,7 @@ static void cpu_common_initfn(Object *obj)
     cpu->nr_threads = 1;
 
     qemu_mutex_init(&cpu->work_mutex);
+    qemu_lockcnt_init(&cpu->in_ioctl_lock);
     QSIMPLEQ_INIT(&cpu->work_list);
     QTAILQ_INIT(&cpu->breakpoints);
     QTAILQ_INIT(&cpu->watchpoints);
@@ -249,6 +250,7 @@ static void cpu_common_finalize(Object *obj)
 {
     CPUState *cpu = CPU(obj);
 
+    qemu_lockcnt_destroy(&cpu->in_ioctl_lock);
     qemu_mutex_destroy(&cpu->work_mutex);
 }
 
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index e948e81f1a9..49d9c73f973 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -383,6 +383,9 @@ struct CPUState {
     uint32_t kvm_fetch_index;
     uint64_t dirty_pages;
 
+    /* Used by accel-blocker: CPU is executing an ioctl() */
+    QemuLockCnt in_ioctl_lock;
+
     /* Used for events with 'vcpu' and *without* the 'disabled' properties */
     DECLARE_BITMAP(trace_dstate_delayed, CPU_TRACE_DSTATE_MAX_EVENTS);
     DECLARE_BITMAP(trace_dstate, CPU_TRACE_DSTATE_MAX_EVENTS);
diff --git a/include/sysemu/accel-blocker.h b/include/sysemu/accel-blocker.h
new file mode 100644
index 00000000000..72020529ef3
--- /dev/null
+++ b/include/sysemu/accel-blocker.h
@@ -0,0 +1,56 @@
+/*
+ * Accelerator blocking API, to prevent new ioctls from starting and wait for
+ * the running ones to finish.
+ * This mechanism differs from pause/resume_all_vcpus() in that it does not
+ * release the BQL.
+ *
+ *  Copyright (c) 2022 Red Hat Inc.
+ *
+ * Author: Emanuele Giuseppe Esposito       <eesposit@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef ACCEL_BLOCKER_H
+#define ACCEL_BLOCKER_H
+
+#include "qemu/osdep.h"
+#include "sysemu/cpus.h"
+
+extern void accel_blocker_init(void);
+
+/*
+ * accel_{cpu_}ioctl_begin/end:
+ * Mark when ioctl is about to run or just finished.
+ *
+ * accel_{cpu_}ioctl_begin will block after accel_ioctl_inhibit_begin() is
+ * called, preventing new ioctls from running. They will continue only after
+ * accel_ioctl_inhibit_end().
+ */
+extern void accel_ioctl_begin(void);
+extern void accel_ioctl_end(void);
+extern void accel_cpu_ioctl_begin(CPUState *cpu);
+extern void accel_cpu_ioctl_end(CPUState *cpu);
+
+/*
+ * accel_ioctl_inhibit_begin: start critical section
+ *
+ * This function makes sure that:
+ * 1) incoming accel_{cpu_}ioctl_begin() calls block
+ * 2) all ioctls that were already running have reached
+ *    accel_{cpu_}ioctl_end(), kicking vcpus if necessary.
+ *
+ * This allows the caller to access shared data or perform operations without
+ * worrying about concurrent vcpu accesses.
+ */
+extern void accel_ioctl_inhibit_begin(void);
+
+/*
+ * accel_ioctl_inhibit_end: end critical section started by
+ * accel_ioctl_inhibit_begin()
+ *
+ * This function allows blocked accel_{cpu_}ioctl_begin() to continue.
+ */
+extern void accel_ioctl_inhibit_end(void);
+
+#endif /* ACCEL_BLOCKER_H */
diff --git a/util/meson.build b/util/meson.build
index 05b593055a1..b5f153b0e8c 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -48,6 +48,7 @@ util_ss.add(files('transactions.c'))
 util_ss.add(when: 'CONFIG_POSIX', if_true: files('drm.c'))
 util_ss.add(files('guest-random.c'))
 util_ss.add(files('yank.c'))
+util_ss.add(files('lockcnt.c'))
 
 if have_user
   util_ss.add(files('selfmap.c'))
@@ -69,7 +70,6 @@ if have_block
   util_ss.add(files('hexdump.c'))
   util_ss.add(files('iova-tree.c'))
   util_ss.add(files('iov.c', 'qemu-sockets.c', 'uri.c'))
-  util_ss.add(files('lockcnt.c'))
   util_ss.add(files('main-loop.c'))
   util_ss.add(files('nvdimm-utils.c'))
   util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
-- 
Gitee


From 3a2fc4aee2a0f03b61a93653e69194829df22781 Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Mon, 16 Jan 2023 07:51:08 -0500
Subject: [PATCH 224/407] KVM: keep track of running ioctls

RH-Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-MergeRequest: 247: accel: introduce accelerator blocker API
RH-Bugzilla: 2161188
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [2/3] 357508389e2a0fd996206b406e9e235e50b5f0b6

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2161188

commit a27dd2de68f37ba96fe164a42121daa5f0750afc
Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date:   Fri Nov 11 10:47:57 2022 -0500

    KVM: keep track of running ioctls

    Using the new accel-blocker API, mark where ioctls are being called
    in KVM. Next, we will implement the critical section that will take
    care of performing memslots modifications atomically, therefore
    preventing any new ioctl from running and allowing the running ones
    to finish.

    Signed-off-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
    Message-Id: <20221111154758.1372674-3-eesposit@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 accel/kvm/kvm-all.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 8f2a53438f6..221aadfda74 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2337,6 +2337,7 @@ static int kvm_init(MachineState *ms)
     assert(TARGET_PAGE_SIZE <= qemu_real_host_page_size);
 
     s->sigmask_len = 8;
+    accel_blocker_init();
 
 #ifdef KVM_CAP_SET_GUEST_DEBUG
     QTAILQ_INIT(&s->kvm_sw_breakpoints);
@@ -3018,7 +3019,9 @@ int kvm_vm_ioctl(KVMState *s, int type, ...)
     va_end(ap);
 
     trace_kvm_vm_ioctl(type, arg);
+    accel_ioctl_begin();
     ret = ioctl(s->vmfd, type, arg);
+    accel_ioctl_end();
     if (ret == -1) {
         ret = -errno;
     }
@@ -3036,7 +3039,9 @@ int kvm_vcpu_ioctl(CPUState *cpu, int type, ...)
     va_end(ap);
 
     trace_kvm_vcpu_ioctl(cpu->cpu_index, type, arg);
+    accel_cpu_ioctl_begin(cpu);
     ret = ioctl(cpu->kvm_fd, type, arg);
+    accel_cpu_ioctl_end(cpu);
     if (ret == -1) {
         ret = -errno;
     }
@@ -3054,7 +3059,9 @@ int kvm_device_ioctl(int fd, int type, ...)
     va_end(ap);
 
     trace_kvm_device_ioctl(fd, type, arg);
+    accel_ioctl_begin();
     ret = ioctl(fd, type, arg);
+    accel_ioctl_end();
     if (ret == -1) {
         ret = -errno;
     }
-- 
Gitee


From c1e2e1b8f1f27939a66f3cf9e4ca0b13ef989cc3 Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Mon, 16 Jan 2023 07:51:56 -0500
Subject: [PATCH 225/407] kvm: Atomic memslot updates

RH-Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-MergeRequest: 247: accel: introduce accelerator blocker API
RH-Bugzilla: 2161188
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [3/3] 520e41c0f58066a7381a5f6b32b81bc01cce51c0

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2161188

commit f39b7d2b96e3e73c01bb678cd096f7baf0b9ab39
Author: David Hildenbrand <david@redhat.com>
Date:   Fri Nov 11 10:47:58 2022 -0500

    kvm: Atomic memslot updates

    If we update an existing memslot (e.g., resize, split), we temporarily
    remove the memslot to re-add it immediately afterwards. These updates
    are not atomic, especially not for KVM VCPU threads, such that we can
    get spurious faults.

    Let's inhibit most KVM ioctls while performing relevant updates, such
    that we can perform the update just as if it would happen atomically
    without additional kernel support.

    We capture the add/del changes and apply them in the notifier commit
    stage instead. There, we can check for overlaps and perform the ioctl
    inhibiting only if really required (-> overlap).

    To keep things simple we don't perform additional checks that wouldn't
    actually result in an overlap -- such as !RAM memory regions in some
    cases (see kvm_set_phys_mem()).

    To minimize cache-line bouncing, use a separate indicator
    (in_ioctl_lock) per CPU.  Also, make sure to hold the kvm_slots_lock
    while performing both actions (removing+re-adding).

    We have to wait until all IOCTLs were exited and block new ones from
    getting executed.

    This approach cannot result in a deadlock as long as the inhibitor does
    not hold any locks that might hinder an IOCTL from getting finished and
    exited - something fairly unusual. The inhibitor will always hold the BQL.

    AFAIKs, one possible candidate would be userfaultfd. If a page cannot be
    placed (e.g., during postcopy), because we're waiting for a lock, or if the
    userfaultfd thread cannot process a fault, because it is waiting for a
    lock, there could be a deadlock. However, the BQL is not applicable here,
    because any other guest memory access while holding the BQL would already
    result in a deadlock.

    Nothing else in the kernel should block forever and wait for userspace
    intervention.

    Note: pause_all_vcpus()/resume_all_vcpus() or
    start_exclusive()/end_exclusive() cannot be used, as they either drop
    the BQL or require to be called without the BQL - something inhibitors
    cannot handle. We need a low-level locking mechanism that is
    deadlock-free even when not releasing the BQL.

    Signed-off-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
    Tested-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
    Message-Id: <20221111154758.1372674-4-eesposit@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
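
The overlap check described in the upstream commit message above can be
sketched in isolation as follows. This is only an illustration: the
range type and both function names are hypothetical, the ranges are
half-open, and both lists are assumed to be sorted by start address
with no overflow in start + size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, start + size). */
struct range {
    uint64_t start;
    uint64_t size;
};

static bool overlaps(const struct range *a, const struct range *b)
{
    return a->start < b->start + b->size && b->start < a->start + a->size;
}

/*
 * Both arrays are assumed to be sorted by start address. Advance the
 * cursor that points at the lower range until an overlap is found or
 * one of the lists is exhausted.
 */
static bool any_overlap(const struct range *del, size_t ndel,
                        const struct range *add, size_t nadd)
{
    size_t i = 0, j = 0;

    assert((del || ndel == 0) && (add || nadd == 0));
    while (i < ndel && j < nadd) {
        if (overlaps(&del[i], &add[j])) {
            return true;
        }
        if (del[i].start < add[j].start) {
            i++;
        } else {
            j++;
        }
    }
    return false;
}
```

Only when such a check reports an overlap does the inhibitor need to be
taken; disjoint add/del sets can be applied without blocking ioctls.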

Conflicts:
	accel/kvm/kvm-all.c: include "sysemu/dirtylimit.h" is missing in
	rhel 8.8.0

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 accel/kvm/kvm-all.c      | 101 ++++++++++++++++++++++++++++++++++-----
 include/sysemu/kvm_int.h |   8 ++++
 2 files changed, 98 insertions(+), 11 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 221aadfda74..3b7bc39823a 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -31,6 +31,7 @@
 #include "sysemu/kvm_int.h"
 #include "sysemu/runstate.h"
 #include "sysemu/cpus.h"
+#include "sysemu/accel-blocker.h"
 #include "qemu/bswap.h"
 #include "exec/memory.h"
 #include "exec/ram_addr.h"
@@ -45,6 +46,7 @@
 #include "qemu/guest-random.h"
 #include "sysemu/hw_accel.h"
 #include "kvm-cpus.h"
+#include "qemu/range.h"
 
 #include "hw/boards.h"
 
@@ -1334,6 +1336,7 @@ void kvm_set_max_memslot_size(hwaddr max_slot_size)
     kvm_max_slot_size = max_slot_size;
 }
 
+/* Called with KVMMemoryListener.slots_lock held */
 static void kvm_set_phys_mem(KVMMemoryListener *kml,
                              MemoryRegionSection *section, bool add)
 {
@@ -1368,14 +1371,12 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
     ram = memory_region_get_ram_ptr(mr) + mr_offset;
     ram_start_offset = memory_region_get_ram_addr(mr) + mr_offset;
 
-    kvm_slots_lock();
-
     if (!add) {
         do {
             slot_size = MIN(kvm_max_slot_size, size);
             mem = kvm_lookup_matching_slot(kml, start_addr, slot_size);
             if (!mem) {
-                goto out;
+                return;
             }
             if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
                 /*
@@ -1413,7 +1414,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
             start_addr += slot_size;
             size -= slot_size;
         } while (size);
-        goto out;
+        return;
     }
 
     /* register the new slot */
@@ -1438,9 +1439,6 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
         ram += slot_size;
         size -= slot_size;
     } while (size);
-
-out:
-    kvm_slots_unlock();
 }
 
 static void *kvm_dirty_ring_reaper_thread(void *data)
@@ -1492,18 +1490,95 @@ static void kvm_region_add(MemoryListener *listener,
                            MemoryRegionSection *section)
 {
     KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener);
+    KVMMemoryUpdate *update;
+
+    update = g_new0(KVMMemoryUpdate, 1);
+    update->section = *section;
 
-    memory_region_ref(section->mr);
-    kvm_set_phys_mem(kml, section, true);
+    QSIMPLEQ_INSERT_TAIL(&kml->transaction_add, update, next);
 }
 
 static void kvm_region_del(MemoryListener *listener,
                            MemoryRegionSection *section)
 {
     KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener);
+    KVMMemoryUpdate *update;
+
+    update = g_new0(KVMMemoryUpdate, 1);
+    update->section = *section;
+
+    QSIMPLEQ_INSERT_TAIL(&kml->transaction_del, update, next);
+}
+
+static void kvm_region_commit(MemoryListener *listener)
+{
+    KVMMemoryListener *kml = container_of(listener, KVMMemoryListener,
+                                          listener);
+    KVMMemoryUpdate *u1, *u2;
+    bool need_inhibit = false;
+
+    if (QSIMPLEQ_EMPTY(&kml->transaction_add) &&
+        QSIMPLEQ_EMPTY(&kml->transaction_del)) {
+        return;
+    }
+
+    /*
+     * We have to be careful when regions to add overlap with ranges to remove.
+     * We have to simulate atomic KVM memslot updates by making sure no ioctl()
+     * is currently active.
+     *
+     * The lists are order by addresses, so it's easy to find overlaps.
+     */
+    u1 = QSIMPLEQ_FIRST(&kml->transaction_del);
+    u2 = QSIMPLEQ_FIRST(&kml->transaction_add);
+    while (u1 && u2) {
+        Range r1, r2;
+
+        range_init_nofail(&r1, u1->section.offset_within_address_space,
+                          int128_get64(u1->section.size));
+        range_init_nofail(&r2, u2->section.offset_within_address_space,
+                          int128_get64(u2->section.size));
+
+        if (range_overlaps_range(&r1, &r2)) {
+            need_inhibit = true;
+            break;
+        }
+        if (range_lob(&r1) < range_lob(&r2)) {
+            u1 = QSIMPLEQ_NEXT(u1, next);
+        } else {
+            u2 = QSIMPLEQ_NEXT(u2, next);
+        }
+    }
+
+    kvm_slots_lock();
+    if (need_inhibit) {
+        accel_ioctl_inhibit_begin();
+    }
+
+    /* Remove all memslots before adding the new ones. */
+    while (!QSIMPLEQ_EMPTY(&kml->transaction_del)) {
+        u1 = QSIMPLEQ_FIRST(&kml->transaction_del);
+        QSIMPLEQ_REMOVE_HEAD(&kml->transaction_del, next);
 
-    kvm_set_phys_mem(kml, section, false);
-    memory_region_unref(section->mr);
+        kvm_set_phys_mem(kml, &u1->section, false);
+        memory_region_unref(u1->section.mr);
+
+        g_free(u1);
+    }
+    while (!QSIMPLEQ_EMPTY(&kml->transaction_add)) {
+        u1 = QSIMPLEQ_FIRST(&kml->transaction_add);
+        QSIMPLEQ_REMOVE_HEAD(&kml->transaction_add, next);
+
+        memory_region_ref(u1->section.mr);
+        kvm_set_phys_mem(kml, &u1->section, true);
+
+        g_free(u1);
+    }
+
+    if (need_inhibit) {
+        accel_ioctl_inhibit_end();
+    }
+    kvm_slots_unlock();
 }
 
 static void kvm_log_sync(MemoryListener *listener,
@@ -1647,8 +1722,12 @@ void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
         kml->slots[i].slot = i;
     }
 
+    QSIMPLEQ_INIT(&kml->transaction_add);
+    QSIMPLEQ_INIT(&kml->transaction_del);
+
     kml->listener.region_add = kvm_region_add;
     kml->listener.region_del = kvm_region_del;
+    kml->listener.commit = kvm_region_commit;
     kml->listener.log_start = kvm_log_start;
     kml->listener.log_stop = kvm_log_stop;
     kml->listener.priority = 10;
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 1f5487d9b74..7e18c0a3c0a 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -11,6 +11,7 @@
 
 #include "exec/memory.h"
 #include "qemu/accel.h"
+#include "qemu/queue.h"
 #include "sysemu/kvm.h"
 
 typedef struct KVMSlot
@@ -30,10 +31,17 @@ typedef struct KVMSlot
     ram_addr_t ram_start_offset;
 } KVMSlot;
 
+typedef struct KVMMemoryUpdate {
+    QSIMPLEQ_ENTRY(KVMMemoryUpdate) next;
+    MemoryRegionSection section;
+} KVMMemoryUpdate;
+
 typedef struct KVMMemoryListener {
     MemoryListener listener;
     KVMSlot *slots;
     int as_id;
+    QSIMPLEQ_HEAD(, KVMMemoryUpdate) transaction_add;
+    QSIMPLEQ_HEAD(, KVMMemoryUpdate) transaction_del;
 } KVMMemoryListener;
 
 void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
-- 
Gitee


From a7b1afbf1197653ebef2f782141a6bbd2b5fba0b Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Date: Wed, 13 Apr 2022 12:33:29 +0100
Subject: [PATCH 226/407] migration: Read state once

RH-Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-MergeRequest: 249: migration: Read state once
RH-Bugzilla: 2074205
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Laszlo Ersek <lersek@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Acked-by: quintela1 <quintela@redhat.com>
RH-Commit: [1/1] 9aa47b492a646fce4e66ebd9b7d7a85286d16051

The 'status' field for the migration is updated normally using
an atomic operation from the migration thread.
Most readers of it aren't that careful, and in most cases it doesn't
matter.

In query_migrate->fill_source_migration_info the 'state'
is read twice; the first time to decide which state fields to fill in,
and then secondly to copy the state to the status field; that can end up
with a status that's inconsistent; e.g. setting up the fields
for 'setup' and then having an 'active' status.  In that case
libvirt gets upset by the lack of ram info.
The symptom is:
   libvirt.libvirtError: internal error: migration was active, but no RAM info was set

Read the state exactly once in fill_source_migration_info.
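
A minimal sketch of the pattern, using C11 atomics instead of QEMU's
qatomic_read(). The struct, enum and function names below are
hypothetical and only illustrate deriving every output from one
snapshot of a concurrently updated field:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the migration state and the info struct. */
enum { ST_SETUP = 1, ST_ACTIVE = 2 };

struct mig_state {
    _Atomic int state;
};

struct mig_info {
    int status;
    bool has_ram;
};

/*
 * Take a single snapshot of the state and derive everything from it,
 * so the filled-in fields and the reported status cannot disagree.
 */
static void fill_info(struct mig_state *s, struct mig_info *info)
{
    int state;

    assert(s && info);
    state = atomic_load(&s->state);       /* read exactly once */

    info->has_ram = (state == ST_ACTIVE); /* decision uses the snapshot */
    info->status = state;                 /* and so does the status */
}
```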

This is a possible fix for:
https://bugzilla.redhat.com/show_bug.cgi?id=2074205

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20220413113329.103696-1-dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 552de79bfdd5e9e53847eb3c6d6e4cd898a4370e)
---
 migration/migration.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 51e6726dac6..d8b24a2c917 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1071,6 +1071,7 @@ static void populate_disk_info(MigrationInfo *info)
 static void fill_source_migration_info(MigrationInfo *info)
 {
     MigrationState *s = migrate_get_current();
+    int state = qatomic_read(&s->state);
     GSList *cur_blocker = migration_blockers;
 
     info->blocked_reasons = NULL;
@@ -1090,7 +1091,7 @@ static void fill_source_migration_info(MigrationInfo *info)
     }
     info->has_blocked_reasons = info->blocked_reasons != NULL;
 
-    switch (s->state) {
+    switch (state) {
     case MIGRATION_STATUS_NONE:
         /* no migration has happened ever */
         /* do not overwrite destination migration status */
@@ -1135,7 +1136,7 @@ static void fill_source_migration_info(MigrationInfo *info)
         info->has_status = true;
         break;
     }
-    info->status = s->state;
+    info->status = state;
 }
 
 typedef enum WriteTrackingSupport {
-- 
Gitee


From 878daea914bd8020ef32c0810e0af318cc420e30 Mon Sep 17 00:00:00 2001
From: Matthew Rosato <mjrosato@linux.ibm.com>
Date: Fri, 28 Oct 2022 15:47:56 -0400
Subject: [PATCH 227/407] s390x/pci: RPCIT second pass when mappings exhausted
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 250: s390x/pci: reset ISM passthrough devices on shutdown and system reset
RH-Bugzilla: 2163713
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [1/4] 0b4500b9247725b1ef0b290bb85392300a618cac

If we encounter a new mapping while the number of available DMA entries
in vfio is 0, we are currently skipping that mapping which is a problem
if we manage to free up DMA space after that within the same RPCIT --
we will return to the guest with CC0 and have not mapped everything
within the specified range.  This issue was uncovered while testing
changes to the s390 linux kernel iommu/dma code, where a different
usage pattern was employed (new mappings start at the end of the
aperture and work back towards the front, making us far more likely
to encounter new mappings before invalidated mappings during a
global refresh).

Fix this by tracking whether any mappings were skipped due to vfio
DMA limit hitting 0; when this occurs, we still continue the range
and unmap/map anything we can - then we must re-run the range again
to pick up anything that was missed.  This must occur in a loop until
all requests are satisfied (success) or we detect that we are still
unable to complete all mappings (return ZPCI_RPCIT_ST_INSUFF_RES).
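
The retry logic can be sketched outside of QEMU roughly as follows.
Everything here is a simplified assumption: a request is either an
unmap that frees one slot or a map that consumes one, and the names
(struct req, refresh_range) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request: an unmap frees one slot, a map consumes one. */
struct req {
    bool is_unmap;
    bool done;
};

/*
 * Walk the whole range, skipping maps that cannot be satisfied right
 * now, and walk it again while the previous pass left free slots.
 * Returns true if every request was applied, false if slots ran out.
 */
static bool refresh_range(struct req *r, size_t n, unsigned *avail)
{
    bool again;

    assert(r && avail);
    do {
        again = false;
        for (size_t i = 0; i < n; i++) {
            if (r[i].done) {
                continue;
            }
            if (r[i].is_unmap) {
                (*avail)++;
                r[i].done = true;
            } else if (*avail > 0) {
                (*avail)--;
                r[i].done = true;
            } else {
                again = true;   /* skipped, needs another pass */
            }
        }
    } while (again && *avail > 0);

    return !again;
}
```

Each extra pass starts with at least one free slot, so it always
completes at least one pending map and the loop terminates.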

Link: https://lore.kernel.org/linux-s390/20221019144435.369902-1-schnelle@linux.ibm.com/
Fixes: 37fa32de70 ("s390x/pci: Honor DMA limits set by vfio")
Reported-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-Id: <20221028194758.204007-2-mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 4a8d21ba50fc8625c3bd51dab903872952f95718)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/s390-pci-inst.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 20a9bcc7afb..7cc4bcf850d 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -677,8 +677,9 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
     S390PCIBusDevice *pbdev;
     S390PCIIOMMU *iommu;
     S390IOTLBEntry entry;
-    hwaddr start, end;
+    hwaddr start, end, sstart;
     uint32_t dma_avail;
+    bool again;
 
     if (env->psw.mask & PSW_MASK_PSTATE) {
         s390_program_interrupt(env, PGM_PRIVILEGED, ra);
@@ -691,7 +692,7 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
     }
 
     fh = env->regs[r1] >> 32;
-    start = env->regs[r2];
+    sstart = start = env->regs[r2];
     end = start + env->regs[r2 + 1];
 
     pbdev = s390_pci_find_dev_by_fh(s390_get_phb(), fh);
@@ -732,6 +733,9 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
         goto err;
     }
 
+ retry:
+    start = sstart;
+    again = false;
     while (start < end) {
         error = s390_guest_io_table_walk(iommu->g_iota, start, &entry);
         if (error) {
@@ -739,13 +743,24 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
         }
 
         start += entry.len;
-        while (entry.iova < start && entry.iova < end &&
-               (dma_avail > 0 || entry.perm == IOMMU_NONE)) {
-            dma_avail = s390_pci_update_iotlb(iommu, &entry);
-            entry.iova += TARGET_PAGE_SIZE;
-            entry.translated_addr += TARGET_PAGE_SIZE;
+        while (entry.iova < start && entry.iova < end) {
+            if (dma_avail > 0 || entry.perm == IOMMU_NONE) {
+                dma_avail = s390_pci_update_iotlb(iommu, &entry);
+                entry.iova += TARGET_PAGE_SIZE;
+                entry.translated_addr += TARGET_PAGE_SIZE;
+            } else {
+                /*
+                 * We are unable to make a new mapping at this time, continue
+                 * on and hopefully free up more space.  Then attempt another
+                 * pass.
+                 */
+                again = true;
+                break;
+            }
         }
     }
+    if (again && dma_avail > 0)
+        goto retry;
 err:
     if (error) {
         pbdev->state = ZPCI_FS_ERROR;
-- 
Gitee


From ede2f87ec1b0de791b1e838effe1afd31ea64d39 Mon Sep 17 00:00:00 2001
From: Matthew Rosato <mjrosato@linux.ibm.com>
Date: Fri, 28 Oct 2022 15:47:57 -0400
Subject: [PATCH 228/407] s390x/pci: coalesce unmap operations
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 250: s390x/pci: reset ISM passthrough devices on shutdown and system reset
RH-Bugzilla: 2163713
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [2/4] 7b5ee38eca565f5a7cbede4b9883ba3a508fb46c

Currently, each unmapped page is handled as an individual iommu
region notification.  Attempt to group contiguous unmap operations
into fewer notifications to reduce overhead.
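
As an illustration of the saving, the sketch below splits a contiguous
run of pages into naturally aligned power-of-two chunks and counts the
resulting notifications. It is not the QEMU code: the helper names and
the 4 KiB page size are assumptions, and the chunk helper is a
hypothetical stand-in for the mask computation used in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size for this sketch */

/*
 * Largest power-of-two size that is aligned at 'start' and still fits
 * into 'remain' bytes. Hypothetical helper, not the QEMU one.
 */
static uint64_t aligned_pow2_chunk(uint64_t start, uint64_t remain)
{
    uint64_t size = PAGE_SIZE;

    assert(remain >= PAGE_SIZE && start % PAGE_SIZE == 0);
    /* size <= remain / 2 keeps size * 2 from overflowing. */
    while (size <= remain / 2 && start % (size * 2) == 0) {
        size *= 2;
    }
    return size;
}

/* Count the notifications needed to unmap a contiguous run of pages. */
static unsigned count_unmap_events(uint64_t start, uint64_t len)
{
    unsigned events = 0;

    while (len >= PAGE_SIZE) {
        uint64_t size = aligned_pow2_chunk(start, len);

        start += size;
        len -= size;
        events++;
    }
    return events;
}
```

Under these assumptions, eight contiguous pages starting at an aligned
address collapse into a single event instead of eight per-page ones.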

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-Id: <20221028194758.204007-3-mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit ef536007c3301bbd6a787e4c2210ea289adaa6f0)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/s390-pci-inst.c | 51 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 7cc4bcf850d..66e764f9016 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -640,6 +640,8 @@ static uint32_t s390_pci_update_iotlb(S390PCIIOMMU *iommu,
         }
         g_hash_table_remove(iommu->iotlb, &entry->iova);
         inc_dma_avail(iommu);
+        /* Don't notify the iommu yet, maybe we can bundle contiguous unmaps */
+        goto out;
     } else {
         if (cache) {
             if (cache->perm == entry->perm &&
@@ -663,15 +665,44 @@ static uint32_t s390_pci_update_iotlb(S390PCIIOMMU *iommu,
         dec_dma_avail(iommu);
     }
 
+    /*
+     * All associated iotlb entries have already been cleared, trigger the
+     * unmaps.
+     */
     memory_region_notify_iommu(&iommu->iommu_mr, 0, event);
 
 out:
     return iommu->dma_limit ? iommu->dma_limit->avail : 1;
 }
 
+static void s390_pci_batch_unmap(S390PCIIOMMU *iommu, uint64_t iova,
+                                 uint64_t len)
+{
+    uint64_t remain = len, start = iova, end = start + len - 1, mask, size;
+    IOMMUTLBEvent event = {
+        .type = IOMMU_NOTIFIER_UNMAP,
+        .entry = {
+            .target_as = &address_space_memory,
+            .translated_addr = 0,
+            .perm = IOMMU_NONE,
+        },
+    };
+
+    while (remain >= TARGET_PAGE_SIZE) {
+        mask = dma_aligned_pow2_mask(start, end, 64);
+        size = mask + 1;
+        event.entry.iova = start;
+        event.entry.addr_mask = mask;
+        memory_region_notify_iommu(&iommu->iommu_mr, 0, event);
+        start += size;
+        remain -= size;
+    }
+}
+
 int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
 {
     CPUS390XState *env = &cpu->env;
+    uint64_t iova, coalesce = 0;
     uint32_t fh;
     uint16_t error = 0;
     S390PCIBusDevice *pbdev;
@@ -742,6 +773,21 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
             break;
         }
 
+        /*
+         * If this is an unmap of a PTE, let's try to coalesce multiple unmaps
+         * into as few notifier events as possible.
+         */
+        if (entry.perm == IOMMU_NONE && entry.len == TARGET_PAGE_SIZE) {
+            if (coalesce == 0) {
+                iova = entry.iova;
+            }
+            coalesce += entry.len;
+        } else if (coalesce > 0) {
+            /* Unleash the coalesced unmap before processing a new map */
+            s390_pci_batch_unmap(iommu, iova, coalesce);
+            coalesce = 0;
+        }
+
         start += entry.len;
         while (entry.iova < start && entry.iova < end) {
             if (dma_avail > 0 || entry.perm == IOMMU_NONE) {
@@ -759,6 +805,11 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
             }
         }
     }
+    if (coalesce) {
+            /* Unleash the coalesced unmap before finishing rpcit */
+            s390_pci_batch_unmap(iommu, iova, coalesce);
+            coalesce = 0;
+    }
     if (again && dma_avail > 0)
         goto retry;
 err:
-- 
Gitee


From d5e0446796be2a5cb97793ac2f44b3815ee665c8 Mon Sep 17 00:00:00 2001
From: Matthew Rosato <mjrosato@linux.ibm.com>
Date: Fri, 28 Oct 2022 15:47:58 -0400
Subject: [PATCH 229/407] s390x/pci: shrink DMA aperture to be bound by vfio
 DMA limit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 250: s390x/pci: reset ISM passthrough devices on shutdown and system reset
RH-Bugzilla: 2163713
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [3/4] aa241dd250ad5e696b67c87dddc31ee5aaee9c0e

Currently, s390x-pci performs accounting against the vfio DMA
limit and triggers the guest to clean up mappings when the limit
is reached. Let's go a step further and also limit the size of
the supported DMA aperture reported to the guest based upon the
initial vfio DMA limit reported for the container (if less than
the size reported by the firmware/host zPCI layer).  This
avoids processing sections of the guest DMA table during global
refresh that, for common use cases, will never be used anyway, and
makes exhausting the vfio DMA limit due to mismatch between guest
aperture size and host limit far less likely and more indicative
of an error.
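
The clamping arithmetic can be sketched on its own as shown below. The
function name and the 4 KiB page size are assumptions made for the
example, and the limit is assumed to be non-zero.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS 12   /* assumed 4 KiB pages */

/*
 * Return the inclusive end address of the aperture reported to the
 * guest: the host-provided end, or a smaller one if the mapping limit
 * cannot cover the whole host aperture.
 */
static uint64_t clamp_aperture_end(uint64_t start_dma, uint64_t end_dma,
                                   uint64_t max_mappings)
{
    uint64_t limit_size = max_mappings << PAGE_BITS;

    assert(end_dma >= start_dma && max_mappings > 0);
    if (limit_size < end_dma - start_dma + 1) {
        return start_dma + limit_size - 1;
    }
    return end_dma;
}
```

For example, a 256 MiB host aperture starting at 0x10000000 with a
limit of 16 mappings would be reported as ending at 0x1000ffff.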

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-Id: <20221028194758.204007-4-mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit df202e3ff3fccb49868e08f20d0bda86cb953fbe)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/s390-pci-vfio.c        | 11 +++++++++++
 include/hw/s390x/s390-pci-bus.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 2aefa508a07..99806e2a846 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -84,6 +84,7 @@ S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
     cnt->users = 1;
     cnt->avail = avail;
     QTAILQ_INSERT_TAIL(&s->zpci_dma_limit, cnt, link);
+    pbdev->iommu->max_dma_limit = avail;
     return cnt;
 }
 
@@ -103,6 +104,7 @@ static void s390_pci_read_base(S390PCIBusDevice *pbdev,
     struct vfio_info_cap_header *hdr;
     struct vfio_device_info_cap_zpci_base *cap;
     VFIOPCIDevice *vpci =  container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+    uint64_t vfio_size;
 
     hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_ZPCI_BASE);
 
@@ -122,6 +124,15 @@ static void s390_pci_read_base(S390PCIBusDevice *pbdev,
     /* The following values remain 0 until we support other FMB formats */
     pbdev->zpci_fn.fmbl = 0;
     pbdev->zpci_fn.pft = 0;
+
+    /*
+     * If appropriate, reduce the size of the supported DMA aperture reported
+     * to the guest based upon the vfio DMA limit.
+     */
+    vfio_size = pbdev->iommu->max_dma_limit << TARGET_PAGE_BITS;
+    if (vfio_size < (cap->end_dma - cap->start_dma + 1)) {
+        pbdev->zpci_fn.edma = cap->start_dma + vfio_size - 1;
+    }
 }
 
 static bool get_host_fh(S390PCIBusDevice *pbdev, struct vfio_device_info *info,
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index 0605fcea24d..1c46e3a2691 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -278,6 +278,7 @@ struct S390PCIIOMMU {
     uint64_t g_iota;
     uint64_t pba;
     uint64_t pal;
+    uint64_t max_dma_limit;
     GHashTable *iotlb;
     S390PCIDMACount *dma_limit;
 };
-- 
Gitee


From cd135b6d1d304ee19ff09b1d3e8d22a4a8ce1755 Mon Sep 17 00:00:00 2001
From: Matthew Rosato <mjrosato@linux.ibm.com>
Date: Fri, 9 Dec 2022 14:57:00 -0500
Subject: [PATCH 230/407] s390x/pci: reset ISM passthrough devices on shutdown
 and system reset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 250: s390x/pci: reset ISM passthrough devices on shutdown and system reset
RH-Bugzilla: 2163713
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [4/4] c857d022c7c2f43cdeb66c4f6acfd9272c925b35

ISM device firmware stores unique state information that can
cause a wholesale unmap of the associated IOMMU (e.g. when
we get a termination signal for QEMU) to trigger firmware errors
because firmware believes we are attempting to invalidate entries
that are still in-use by the guest OS (when in fact that guest is
in the process of being terminated or rebooted).
To alleviate this, register both a shutdown notifier (for unexpected
termination cases e.g. virsh destroy) as well as a reset callback
(for cases like guest OS reboot).  For each of these scenarios, trigger
PCI device reset; this is enough to indicate to firmware that the IOMMU
is no longer in-use by the guest OS, making it safe to invalidate any
associated IOMMU entries.

Fixes: 15d0e7942d3b ("s390x/pci: don't fence interpreted devices without MSI-X")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-Id: <20221209195700.263824-1-mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
[thuth: Adjusted the hunk in s390-pci-vfio.c due to different context]
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 03451953c79e6b31f7860ee0c35b28e181d573c1)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/s390-pci-bus.c         | 28 ++++++++++++++++++++++++++++
 hw/s390x/s390-pci-vfio.c        |  2 ++
 include/hw/s390x/s390-pci-bus.h |  5 +++++
 3 files changed, 35 insertions(+)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index d8b1e44a02f..2d92848b0f0 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -24,6 +24,8 @@
 #include "hw/pci/msi.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
+#include "sysemu/reset.h"
+#include "sysemu/runstate.h"
 
 #ifndef DEBUG_S390PCI_BUS
 #define DEBUG_S390PCI_BUS  0
@@ -150,10 +152,30 @@ out:
     psccb->header.response_code = cpu_to_be16(rc);
 }
 
+static void s390_pci_shutdown_notifier(Notifier *n, void *opaque)
+{
+    S390PCIBusDevice *pbdev = container_of(n, S390PCIBusDevice,
+                                           shutdown_notifier);
+
+    pci_device_reset(pbdev->pdev);
+}
+
+static void s390_pci_reset_cb(void *opaque)
+{
+    S390PCIBusDevice *pbdev = opaque;
+
+    pci_device_reset(pbdev->pdev);
+}
+
 static void s390_pci_perform_unplug(S390PCIBusDevice *pbdev)
 {
     HotplugHandler *hotplug_ctrl;
 
+    if (pbdev->pft == ZPCI_PFT_ISM) {
+        notifier_remove(&pbdev->shutdown_notifier);
+        qemu_unregister_reset(s390_pci_reset_cb, pbdev);
+    }
+
     /* Unplug the PCI device */
     if (pbdev->pdev) {
         DeviceState *pdev = DEVICE(pbdev->pdev);
@@ -1111,6 +1133,12 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                 pbdev->fh |= FH_SHM_VFIO;
                 pbdev->forwarding_assist = false;
             }
+            /* Register shutdown notifier and reset callback for ISM devices */
+            if (pbdev->pft == ZPCI_PFT_ISM) {
+                pbdev->shutdown_notifier.notify = s390_pci_shutdown_notifier;
+                qemu_register_shutdown_notifier(&pbdev->shutdown_notifier);
+                qemu_register_reset(s390_pci_reset_cb, pbdev);
+            }
         } else {
             pbdev->fh |= FH_SHM_EMUL;
             /* Always intercept emulated devices */
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 99806e2a846..69af35f4fe3 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -124,6 +124,8 @@ static void s390_pci_read_base(S390PCIBusDevice *pbdev,
     /* The following values remain 0 until we support other FMB formats */
     pbdev->zpci_fn.fmbl = 0;
     pbdev->zpci_fn.pft = 0;
+    /* Store function type separately for type-specific behavior */
+    pbdev->pft = cap->pft;
 
     /*
      * If appropriate, reduce the size of the supported DMA aperture reported
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index 1c46e3a2691..e0a9f9385be 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -39,6 +39,9 @@
 #define UID_CHECKING_ENABLED 0x01
 #define ZPCI_DTSM 0x40
 
+/* zPCI Function Types */
+#define ZPCI_PFT_ISM 5
+
 OBJECT_DECLARE_SIMPLE_TYPE(S390pciState, S390_PCI_HOST_BRIDGE)
 OBJECT_DECLARE_SIMPLE_TYPE(S390PCIBus, S390_PCI_BUS)
 OBJECT_DECLARE_SIMPLE_TYPE(S390PCIBusDevice, S390_PCI_DEVICE)
@@ -344,6 +347,7 @@ struct S390PCIBusDevice {
     uint16_t noi;
     uint16_t maxstbl;
     uint8_t sum;
+    uint8_t pft;
     S390PCIGroup *pci_group;
     ClpRspQueryPci zpci_fn;
     S390MsixInfo msix;
@@ -352,6 +356,7 @@ struct S390PCIBusDevice {
     MemoryRegion msix_notify_mr;
     IndAddr *summary_ind;
     IndAddr *indicator;
+    Notifier shutdown_notifier;
     bool pci_unplug_request_processed;
     bool unplug_requested;
     bool interp;
-- 
Gitee


From dc6a66bbca3e09d6333d30d035efddff0053aa0f Mon Sep 17 00:00:00 2001
From: Kevin Wolf <kwolf@redhat.com>
Date: Thu, 12 Jan 2023 20:14:51 +0100
Subject: [PATCH 231/407] qcow2: Fix theoretical corruption in store_bitmap()
 error path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Kevin Wolf <kwolf@redhat.com>
RH-MergeRequest: 251: qemu-img: Fix exit code for errors closing the image
RH-Bugzilla: 2147617
RH-Acked-by: Hanna Czenczek <hreitz@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
RH-Commit: [1/4] d0a26bed7b16db41e7baee1f8f2b3ae54e52dd52

In order to write the bitmap table to the image file, it is converted to
big endian. If the write fails, it is passed to clear_bitmap_table() to
free all of the clusters it had allocated before. However, if we don't
convert it back to native endianness first, we'll free things at a wrong
offset.

In practical terms, the offsets will be so high that we won't actually
free any allocated clusters, but just run into an error, but in theory
this can cause image corruption.
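
The round trip described above can be sketched in isolation. This is a
minimal illustration, not the qcow2 code: swap64(), host_is_little_endian()
and table_bswap_be() are hypothetical stand-ins for QEMU's byte-order
helpers. It shows why one swap function serves both directions, and why the
error path has to call it a second time before the table is reused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unconditionally reverse the byte order of a 64-bit value. */
static uint64_t swap64(uint64_t v)
{
    v = ((v & 0x00000000ffffffffULL) << 32) | (v >> 32);
    v = ((v & 0x0000ffff0000ffffULL) << 16) |
        ((v >> 16) & 0x0000ffff0000ffffULL);
    v = ((v & 0x00ff00ff00ff00ffULL) << 8) |
        ((v >> 8) & 0x00ff00ff00ff00ffULL);
    return v;
}

/* Detect host byte order at run time (illustration only). */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/*
 * Convert a table between native and big-endian byte order. The operation
 * is its own inverse, so the same call converts in either direction.
 */
static void table_bswap_be(uint64_t *table, size_t size)
{
    size_t i;

    if (!host_is_little_endian()) {
        return; /* native order is already big endian */
    }
    for (i = 0; i < size; i++) {
        table[i] = swap64(table[i]);
    }
}
```

On a little-endian host, a cluster offset of 0x10000 becomes
0x0000010000000000 after a single swap, which is far beyond any realistic
image size. That matches the observation above that the stale table usually
produces an error rather than freeing a live cluster.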

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230112191454.169353-2-kwolf@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit b03dd9613bcf8fe948581b2b3585510cb525c382)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-bitmap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 8fb47315515..869069415c2 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -115,7 +115,7 @@ static int update_header_sync(BlockDriverState *bs)
     return bdrv_flush(bs->file->bs);
 }
 
-static inline void bitmap_table_to_be(uint64_t *bitmap_table, size_t size)
+static inline void bitmap_table_bswap_be(uint64_t *bitmap_table, size_t size)
 {
     size_t i;
 
@@ -1401,9 +1401,10 @@ static int store_bitmap(BlockDriverState *bs, Qcow2Bitmap *bm, Error **errp)
         goto fail;
     }
 
-    bitmap_table_to_be(tb, tb_size);
+    bitmap_table_bswap_be(tb, tb_size);
     ret = bdrv_pwrite(bs->file, tb_offset, tb, tb_size * sizeof(tb[0]));
     if (ret < 0) {
+        bitmap_table_bswap_be(tb, tb_size);
         error_setg_errno(errp, -ret, "Failed to write bitmap '%s' to file",
                          bm_name);
         goto fail;
-- 
Gitee


From 45a6b0a4ce57e1992dff9a82883c4dee6e774ea4 Mon Sep 17 00:00:00 2001
From: Kevin Wolf <kwolf@redhat.com>
Date: Thu, 12 Jan 2023 20:14:52 +0100
Subject: [PATCH 232/407] qemu-img commit: Report errors while closing the
 image

RH-Author: Kevin Wolf <kwolf@redhat.com>
RH-MergeRequest: 251: qemu-img: Fix exit code for errors closing the image
RH-Bugzilla: 2147617
RH-Acked-by: Hanna Czenczek <hreitz@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
RH-Commit: [2/4] 28f95bf76d1d63e2b0bed0c2ba5206bd3e5ea4f8

blk_unref() can't report any errors that happen while closing the image.
For example, if qcow2 hits an -ENOSPC error while writing out dirty
bitmaps when it's closed, it prints error messages to stderr, but
'qemu-img commit' won't see any error return value and will therefore
look successful with exit code 0.

In order to fix this, manually inactivate the image first before calling
blk_unref(). This already performs the operations that would be most
likely to fail while closing the image, but it can still return errors.
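
The pattern can be sketched outside of QEMU as follows. All names here
(Image, image_inactivate(), image_unref(), image_close_checked()) are
hypothetical and only model the idea: a release function that returns void
cannot surface a flush failure, so the caller performs the fallible step
explicitly first and derives the exit code from its return value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an image with state that must be written back. */
typedef struct Image {
    bool dirty;       /* metadata still waiting to be written */
    int flush_error;  /* 0, or the errno the write-back fails with */
} Image;

/* Write back pending state. Returns 0 on success or a negative errno. */
static int image_inactivate(Image *img)
{
    if (!img->dirty) {
        return 0; /* nothing left to do, so a repeated call is harmless */
    }
    if (img->flush_error) {
        return -img->flush_error;
    }
    img->dirty = false;
    return 0;
}

/* Release the image. Returns void, so any failure in here is lost. */
static void image_unref(Image *img)
{
    (void)image_inactivate(img);
}

/* Close the image and return a process exit code (0 or 1). */
static int image_close_checked(Image *img)
{
    int ret = image_inactivate(img); /* the only place an error is visible */

    image_unref(img);
    return ret < 0 ? 1 : 0;
}
```

In this sketch the write-back is attempted a second time inside
image_unref() when the first attempt failed.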

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230112191454.169353-3-kwolf@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 44efba2d713aca076c411594d0c1a2b99155eeb3)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-img.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index f036a1d428d..18833f7d69c 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -443,6 +443,11 @@ static BlockBackend *img_open(bool image_opts,
         blk = img_open_file(filename, NULL, fmt, flags, writethrough, quiet,
                             force_share);
     }
+
+    if (blk) {
+        blk_set_force_allow_inactivate(blk);
+    }
+
     return blk;
 }
 
@@ -1110,6 +1115,14 @@ unref_backing:
 done:
     qemu_progress_end();
 
+    /*
+     * Manually inactivate the image first because this way we can know whether
+     * an error occurred. blk_unref() doesn't tell us about failures.
+     */
+    ret = bdrv_inactivate_all();
+    if (ret < 0 && !local_err) {
+        error_setg_errno(&local_err, -ret, "Error while closing the image");
+    }
     blk_unref(blk);
 
     if (local_err) {
-- 
Gitee


From e18dea259b9b1e1b68bff5b8c26dcad94dac4285 Mon Sep 17 00:00:00 2001
From: Kevin Wolf <kwolf@redhat.com>
Date: Thu, 12 Jan 2023 20:14:53 +0100
Subject: [PATCH 233/407] qemu-img bitmap: Report errors while closing the
 image
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Kevin Wolf <kwolf@redhat.com>
RH-MergeRequest: 251: qemu-img: Fix exit code for errors closing the image
RH-Bugzilla: 2147617
RH-Acked-by: Hanna Czenczek <hreitz@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
RH-Commit: [3/4] 8e13e09564718a0badd03af84f036246a46a0eba

blk_unref() can't report any errors that happen while closing the image.
For example, if qcow2 hits an -ENOSPC error while writing out dirty
bitmaps when it's closed, it prints error messages to stderr, but
'qemu-img bitmap' won't see any error return value and will therefore
look successful with exit code 0.

In order to fix this, manually inactivate the image first before calling
blk_unref(). This already performs the operations that would be most
likely to fail while closing the image, but it can still return errors.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1330
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230112191454.169353-4-kwolf@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit c5e477110dcb8ef4642dce399777c3dee68fa96c)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-img.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index 18833f7d69c..7d035c0c7fd 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4622,6 +4622,7 @@ static int img_bitmap(int argc, char **argv)
     QSIMPLEQ_HEAD(, ImgBitmapAction) actions;
     ImgBitmapAction *act, *act_next;
     const char *op;
+    int inactivate_ret;
 
     QSIMPLEQ_INIT(&actions);
 
@@ -4806,6 +4807,16 @@ static int img_bitmap(int argc, char **argv)
     ret = 0;
 
  out:
+    /*
+     * Manually inactivate the images first because this way we can know whether
+     * an error occurred. blk_unref() doesn't tell us about failures.
+     */
+    inactivate_ret = bdrv_inactivate_all();
+    if (inactivate_ret < 0) {
+        error_report("Error while closing the image: %s", strerror(-inactivate_ret));
+        ret = 1;
+    }
+
     blk_unref(src);
     blk_unref(blk);
     qemu_opts_del(opts);
-- 
Gitee


From 8dcc69a3e499676fbe3d70629a4d50a49b68f322 Mon Sep 17 00:00:00 2001
From: Kevin Wolf <kwolf@redhat.com>
Date: Thu, 12 Jan 2023 20:14:54 +0100
Subject: [PATCH 234/407] qemu-iotests: Test qemu-img bitmap/commit exit code
 on error

RH-Author: Kevin Wolf <kwolf@redhat.com>
RH-MergeRequest: 251: qemu-img: Fix exit code for errors closing the image
RH-Bugzilla: 2147617
RH-Acked-by: Hanna Czenczek <hreitz@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
RH-Commit: [4/4] fb2f9de98ddd2ee1d745119e4f15272ef44e0aae

This tests that when an error happens while writing back bitmaps to the
image file in qcow2_inactivate(), 'qemu-img bitmap/commit' actually
return an error value in their exit code instead of making the operation
look successful to scripts.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230112191454.169353-5-kwolf@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 07a4e1f8e5418f36424cd57d5d061b090a238c65)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 .../qemu-iotests/tests/qemu-img-close-errors  | 96 +++++++++++++++++++
 .../tests/qemu-img-close-errors.out           | 23 +++++
 2 files changed, 119 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-close-errors
 create mode 100644 tests/qemu-iotests/tests/qemu-img-close-errors.out

diff --git a/tests/qemu-iotests/tests/qemu-img-close-errors b/tests/qemu-iotests/tests/qemu-img-close-errors
new file mode 100755
index 00000000000..50bfb6cfa2a
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-close-errors
@@ -0,0 +1,96 @@
+#!/usr/bin/env bash
+# group: rw auto quick
+#
+# Check that errors while closing the image, in particular writing back dirty
+# bitmaps, is correctly reported with a failing qemu-img exit code.
+#
+# Copyright (C) 2023 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=kwolf@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1	# failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+cd ..
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+size=1G
+
+# The error we are going to use is ENOSPC. Depending on how many bitmaps we
+# create in the backing file (and therefore increase the used up space), we get
+# failures in different places. With a low number, only merging the bitmap
+# fails, whereas with a higher number, already 'qemu-img commit' fails.
+for max_bitmap in 6 7; do
+    echo
+    echo "=== Test with $max_bitmap bitmaps ==="
+
+    TEST_IMG="$TEST_IMG.base" _make_test_img -q $size
+    for i in $(seq 1 $max_bitmap); do
+        $QEMU_IMG bitmap --add "$TEST_IMG.base" "stale-bitmap-$i"
+    done
+
+    # Simulate a block device of 128 MB by resizing the image file accordingly
+    # and then enforcing the size with the raw driver
+    $QEMU_IO -f raw -c "truncate 128M" "$TEST_IMG.base"
+    BASE_JSON='json:{
+        "driver": "qcow2",
+        "file": {
+            "driver": "raw",
+            "size": 134217728,
+            "file": {
+                "driver": "file",
+                "filename":"'"$TEST_IMG.base"'"
+            }
+        }
+    }'
+
+    _make_test_img -q -b "$BASE_JSON" -F $IMGFMT
+    $QEMU_IMG bitmap --add "$TEST_IMG" "good-bitmap"
+
+    $QEMU_IO -c 'write 0 126m' "$TEST_IMG" | _filter_qemu_io
+
+    $QEMU_IMG commit -d "$TEST_IMG" 2>&1 | _filter_generated_node_ids
+    echo "qemu-img commit exit code: ${PIPESTATUS[0]}"
+
+    $QEMU_IMG bitmap --add "$BASE_JSON" "good-bitmap"
+    echo "qemu-img bitmap --add exit code: $?"
+
+    $QEMU_IMG bitmap --merge "good-bitmap" -b "$TEST_IMG" "$BASE_JSON" \
+        "good-bitmap" 2>&1 | _filter_generated_node_ids
+    echo "qemu-img bitmap --merge exit code:  ${PIPESTATUS[0]}"
+done
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
+
diff --git a/tests/qemu-iotests/tests/qemu-img-close-errors.out b/tests/qemu-iotests/tests/qemu-img-close-errors.out
new file mode 100644
index 00000000000..1bfe88f1762
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-close-errors.out
@@ -0,0 +1,23 @@
+QA output created by qemu-img-close-errors
+
+=== Test with 6 bitmaps ===
+wrote 132120576/132120576 bytes at offset 0
+126 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Image committed.
+qemu-img commit exit code: 0
+qemu-img bitmap --add exit code: 0
+qemu-img: Lost persistent bitmaps during inactivation of node 'NODE_NAME': Failed to write bitmap 'good-bitmap' to file: No space left on device
+qemu-img: Error while closing the image: Invalid argument
+qemu-img: Lost persistent bitmaps during inactivation of node 'NODE_NAME': Failed to write bitmap 'good-bitmap' to file: No space left on device
+qemu-img bitmap --merge exit code:  1
+
+=== Test with 7 bitmaps ===
+wrote 132120576/132120576 bytes at offset 0
+126 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+qemu-img: Lost persistent bitmaps during inactivation of node 'NODE_NAME': Failed to write bitmap 'stale-bitmap-7' to file: No space left on device
+qemu-img: Lost persistent bitmaps during inactivation of node 'NODE_NAME': Failed to write bitmap 'stale-bitmap-7' to file: No space left on device
+qemu-img: Error while closing the image: Invalid argument
+qemu-img commit exit code: 1
+qemu-img bitmap --add exit code: 0
+qemu-img bitmap --merge exit code:  0
+*** done
-- 
Gitee


From d4ce9c3e4d9f9df3d46f7f0c408d93cbd16fdc56 Mon Sep 17 00:00:00 2001
From: "manish.mishra" <manish.mishra@nutanix.com>
Date: Tue, 20 Dec 2022 18:44:17 +0000
Subject: [PATCH 235/407] io: Add support for MSG_PEEK for socket channel
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Peter Xu <peterx@redhat.com>
RH-MergeRequest: 258: migration: Fix multifd crash due to channel disorder
RH-Bugzilla: 2137740
RH-Acked-by: quintela1 <quintela@redhat.com>
RH-Acked-by: Leonardo Brás <leobras@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Commit: [1/2] 04fc6fae358599b8509f5355469d2e8720f01903

Conflicts:
	io/channel-null.c
	migration/channel-block.c

        Because these two files do not exist in rhel8.8 tree, dropping the
        changes.

MSG_PEEK peeks at the channel: the data is treated as unread, and
the next read will still return it. This support is currently added
only for the socket class. An extra parameter 'flags' is added to
the io_readv calls to pass read flags such as MSG_PEEK.
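
The underlying socket behaviour can be seen with plain POSIX calls. This is
a small sketch that does not use the QIOChannel API; the helper name
peek_then_read_matches() is made up for the illustration. It sends a short
message over a socketpair, peeks at it, and then reads it normally.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send msg through a fresh AF_UNIX socketpair, peek at it with MSG_PEEK and
 * then read it normally. Returns 1 if both the peek and the later read saw
 * the same bytes, 0 if they differ, and -1 on any error.
 */
static int peek_then_read_matches(const char *msg)
{
    int sv[2];
    char peeked[64] = { 0 };
    char consumed[64] = { 0 };
    size_t len = strlen(msg);
    int ok = -1;

    if (len == 0 || len >= sizeof(peeked)) {
        return -1;
    }
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }

    if (send(sv[0], msg, len, 0) == (ssize_t)len &&
        /* MSG_PEEK copies the data out but leaves it queued in the socket */
        recv(sv[1], peeked, len, MSG_PEEK) == (ssize_t)len &&
        /* a plain recv() now returns the very same bytes and consumes them */
        recv(sv[1], consumed, len, 0) == (ssize_t)len) {
        ok = memcmp(peeked, msg, len) == 0 &&
             memcmp(consumed, msg, len) == 0;
    }

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

For example, peek_then_read_matches("QEVM") is expected to return 1 on a
POSIX system where socketpair() is available.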

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: manish.mishra <manish.mishra@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
(cherry picked from commit 84615a19ddf2bfb38d7b3a0d487d2397ee55e4f3)
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 chardev/char-socket.c               |  4 ++--
 include/io/channel.h                |  6 ++++++
 io/channel-buffer.c                 |  1 +
 io/channel-command.c                |  1 +
 io/channel-file.c                   |  1 +
 io/channel-socket.c                 | 19 ++++++++++++++++++-
 io/channel-tls.c                    |  1 +
 io/channel-websock.c                |  1 +
 io/channel.c                        | 16 ++++++++++++----
 migration/rdma.c                    |  1 +
 scsi/qemu-pr-helper.c               |  2 +-
 tests/qtest/tpm-emu.c               |  2 +-
 tests/unit/test-io-channel-socket.c |  1 +
 util/vhost-user-server.c            |  2 +-
 14 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 836cfa0bc21..4cdf79e0c2b 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -339,11 +339,11 @@ static ssize_t tcp_chr_recv(Chardev *chr, char *buf, size_t len)
     if (qio_channel_has_feature(s->ioc, QIO_CHANNEL_FEATURE_FD_PASS)) {
         ret = qio_channel_readv_full(s->ioc, &iov, 1,
                                      &msgfds, &msgfds_num,
-                                     NULL);
+                                     0, NULL);
     } else {
         ret = qio_channel_readv_full(s->ioc, &iov, 1,
                                      NULL, NULL,
-                                     NULL);
+                                     0, NULL);
     }
 
     if (ret == QIO_CHANNEL_ERR_BLOCK) {
diff --git a/include/io/channel.h b/include/io/channel.h
index c680ee74802..716235d4967 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -34,6 +34,8 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
 
 #define QIO_CHANNEL_WRITE_FLAG_ZERO_COPY 0x1
 
+#define QIO_CHANNEL_READ_FLAG_MSG_PEEK 0x1
+
 typedef enum QIOChannelFeature QIOChannelFeature;
 
 enum QIOChannelFeature {
@@ -41,6 +43,7 @@ enum QIOChannelFeature {
     QIO_CHANNEL_FEATURE_SHUTDOWN,
     QIO_CHANNEL_FEATURE_LISTEN,
     QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
+    QIO_CHANNEL_FEATURE_READ_MSG_PEEK,
 };
 
 
@@ -114,6 +117,7 @@ struct QIOChannelClass {
                         size_t niov,
                         int **fds,
                         size_t *nfds,
+                        int flags,
                         Error **errp);
     int (*io_close)(QIOChannel *ioc,
                     Error **errp);
@@ -188,6 +192,7 @@ void qio_channel_set_name(QIOChannel *ioc,
  * @niov: the length of the @iov array
  * @fds: pointer to an array that will received file handles
  * @nfds: pointer filled with number of elements in @fds on return
+ * @flags: read flags (QIO_CHANNEL_READ_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  * Read data from the IO channel, storing it in the
@@ -224,6 +229,7 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
                                size_t niov,
                                int **fds,
                                size_t *nfds,
+                               int flags,
                                Error **errp);
 
 
diff --git a/io/channel-buffer.c b/io/channel-buffer.c
index bf52011be2d..8096180f855 100644
--- a/io/channel-buffer.c
+++ b/io/channel-buffer.c
@@ -54,6 +54,7 @@ static ssize_t qio_channel_buffer_readv(QIOChannel *ioc,
                                         size_t niov,
                                         int **fds,
                                         size_t *nfds,
+                                        int flags,
                                         Error **errp)
 {
     QIOChannelBuffer *bioc = QIO_CHANNEL_BUFFER(ioc);
diff --git a/io/channel-command.c b/io/channel-command.c
index 5ff1691badd..2834413b3a1 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -230,6 +230,7 @@ static ssize_t qio_channel_command_readv(QIOChannel *ioc,
                                          size_t niov,
                                          int **fds,
                                          size_t *nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc);
diff --git a/io/channel-file.c b/io/channel-file.c
index 348a48545eb..490f0e5d84a 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -86,6 +86,7 @@ static ssize_t qio_channel_file_readv(QIOChannel *ioc,
                                       size_t niov,
                                       int **fds,
                                       size_t *nfds,
+                                      int flags,
                                       Error **errp)
 {
     QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 6010ad7017e..ca8b180b691 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -174,6 +174,9 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
     }
 #endif
 
+    qio_channel_set_feature(QIO_CHANNEL(ioc),
+                            QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
+
     return 0;
 }
 
@@ -407,6 +410,9 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
     }
 #endif /* WIN32 */
 
+    qio_channel_set_feature(QIO_CHANNEL(cioc),
+                            QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
+
     trace_qio_channel_socket_accept_complete(ioc, cioc, cioc->fd);
     return cioc;
 
@@ -497,6 +503,7 @@ static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
                                         size_t niov,
                                         int **fds,
                                         size_t *nfds,
+                                        int flags,
                                         Error **errp)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
@@ -518,6 +525,10 @@ static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
 
     }
 
+    if (flags & QIO_CHANNEL_READ_FLAG_MSG_PEEK) {
+        sflags |= MSG_PEEK;
+    }
+
  retry:
     ret = recvmsg(sioc->fd, &msg, sflags);
     if (ret < 0) {
@@ -625,11 +636,17 @@ static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
                                         size_t niov,
                                         int **fds,
                                         size_t *nfds,
+                                        int flags,
                                         Error **errp)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
     ssize_t done = 0;
     ssize_t i;
+    int sflags = 0;
+
+    if (flags & QIO_CHANNEL_READ_FLAG_MSG_PEEK) {
+        sflags |= MSG_PEEK;
+    }
 
     for (i = 0; i < niov; i++) {
         ssize_t ret;
@@ -637,7 +654,7 @@ static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
         ret = recv(sioc->fd,
                    iov[i].iov_base,
                    iov[i].iov_len,
-                   0);
+                   sflags);
         if (ret < 0) {
             if (errno == EAGAIN) {
                 if (done) {
diff --git a/io/channel-tls.c b/io/channel-tls.c
index 4ce890a5382..c730cb8ec56 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -260,6 +260,7 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc,
                                      size_t niov,
                                      int **fds,
                                      size_t *nfds,
+                                     int flags,
                                      Error **errp)
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
diff --git a/io/channel-websock.c b/io/channel-websock.c
index 035dd6075b1..13c94f2afe0 100644
--- a/io/channel-websock.c
+++ b/io/channel-websock.c
@@ -1081,6 +1081,7 @@ static ssize_t qio_channel_websock_readv(QIOChannel *ioc,
                                          size_t niov,
                                          int **fds,
                                          size_t *nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelWebsock *wioc = QIO_CHANNEL_WEBSOCK(ioc);
diff --git a/io/channel.c b/io/channel.c
index 0640941ac57..a8c7f116490 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -52,6 +52,7 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
                                size_t niov,
                                int **fds,
                                size_t *nfds,
+                               int flags,
                                Error **errp)
 {
     QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
@@ -63,7 +64,14 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
         return -1;
     }
 
-    return klass->io_readv(ioc, iov, niov, fds, nfds, errp);
+    if ((flags & QIO_CHANNEL_READ_FLAG_MSG_PEEK) &&
+        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
+        error_setg_errno(errp, EINVAL,
+                         "Channel does not support peek read");
+        return -1;
+    }
+
+    return klass->io_readv(ioc, iov, niov, fds, nfds, flags, errp);
 }
 
 
@@ -146,7 +154,7 @@ int qio_channel_readv_full_all_eof(QIOChannel *ioc,
     while ((nlocal_iov > 0) || local_fds) {
         ssize_t len;
         len = qio_channel_readv_full(ioc, local_iov, nlocal_iov, local_fds,
-                                     local_nfds, errp);
+                                     local_nfds, 0, errp);
         if (len == QIO_CHANNEL_ERR_BLOCK) {
             if (qemu_in_coroutine()) {
                 qio_channel_yield(ioc, G_IO_IN);
@@ -284,7 +292,7 @@ ssize_t qio_channel_readv(QIOChannel *ioc,
                           size_t niov,
                           Error **errp)
 {
-    return qio_channel_readv_full(ioc, iov, niov, NULL, NULL, errp);
+    return qio_channel_readv_full(ioc, iov, niov, NULL, NULL, 0, errp);
 }
 
 
@@ -303,7 +311,7 @@ ssize_t qio_channel_read(QIOChannel *ioc,
                          Error **errp)
 {
     struct iovec iov = { .iov_base = buf, .iov_len = buflen };
-    return qio_channel_readv_full(ioc, &iov, 1, NULL, NULL, errp);
+    return qio_channel_readv_full(ioc, &iov, 1, NULL, NULL, 0, errp);
 }
 
 
diff --git a/migration/rdma.c b/migration/rdma.c
index 54acd2000e0..dcf98bd7f8d 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2917,6 +2917,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
                                       size_t niov,
                                       int **fds,
                                       size_t *nfds,
+                                      int flags,
                                       Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
diff --git a/scsi/qemu-pr-helper.c b/scsi/qemu-pr-helper.c
index f281daeced8..12ec8e93688 100644
--- a/scsi/qemu-pr-helper.c
+++ b/scsi/qemu-pr-helper.c
@@ -612,7 +612,7 @@ static int coroutine_fn prh_read(PRHelperClient *client, void *buf, int sz,
         iov.iov_base = buf;
         iov.iov_len = sz;
         n_read = qio_channel_readv_full(QIO_CHANNEL(client->ioc), &iov, 1,
-                                        &fds, &nfds, errp);
+                                        &fds, &nfds, 0, errp);
 
         if (n_read == QIO_CHANNEL_ERR_BLOCK) {
             qio_channel_yield(QIO_CHANNEL(client->ioc), G_IO_IN);
diff --git a/tests/qtest/tpm-emu.c b/tests/qtest/tpm-emu.c
index 2994d1cf423..3cf1acaf7d0 100644
--- a/tests/qtest/tpm-emu.c
+++ b/tests/qtest/tpm-emu.c
@@ -106,7 +106,7 @@ void *tpm_emu_ctrl_thread(void *data)
         int *pfd = NULL;
         size_t nfd = 0;
 
-        qio_channel_readv_full(ioc, &iov, 1, &pfd, &nfd, &error_abort);
+        qio_channel_readv_full(ioc, &iov, 1, &pfd, &nfd, 0, &error_abort);
         cmd = be32_to_cpu(cmd);
         g_assert_cmpint(cmd, ==, CMD_SET_DATAFD);
         g_assert_cmpint(nfd, ==, 1);
diff --git a/tests/unit/test-io-channel-socket.c b/tests/unit/test-io-channel-socket.c
index 6713886d02a..de2930f203f 100644
--- a/tests/unit/test-io-channel-socket.c
+++ b/tests/unit/test-io-channel-socket.c
@@ -452,6 +452,7 @@ static void test_io_channel_unix_fd_pass(void)
                            G_N_ELEMENTS(iorecv),
                            &fdrecv,
                            &nfdrecv,
+                           0,
                            &error_abort);
 
     g_assert(nfdrecv == G_N_ELEMENTS(fdsend));
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 783d847a6db..e6a9ef72b74 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -102,7 +102,7 @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
          * qio_channel_readv_full may have short reads, keeping calling it
          * until getting VHOST_USER_HDR_SIZE or 0 bytes in total
          */
-        rc = qio_channel_readv_full(ioc, &iov, 1, &fds, &nfds, &local_err);
+        rc = qio_channel_readv_full(ioc, &iov, 1, &fds, &nfds, 0, &local_err);
         if (rc < 0) {
             if (rc == QIO_CHANNEL_ERR_BLOCK) {
                 assert(local_err == NULL);
-- 
Gitee


From 0d9dd4bdae822330762c3a4690096f3fee2bae73 Mon Sep 17 00:00:00 2001
From: "manish.mishra" <manish.mishra@nutanix.com>
Date: Tue, 20 Dec 2022 18:44:18 +0000
Subject: [PATCH 236/407] migration: check magic value for deciding the mapping
 of channels
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Peter Xu <peterx@redhat.com>
RH-MergeRequest: 258: migration: Fix multifd crash due to channel disorder
RH-Bugzilla: 2137740
RH-Acked-by: quintela1 <quintela@redhat.com>
RH-Acked-by: Leonardo Brás <leobras@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RH-Commit: [2/2] f97bebef3d3e372cfd660e5ddb6cffba791840d2

Conflicts:
        migration/migration.c
        migration/multifd.c
        migration/postcopy-ram.c
        migration/postcopy-ram.h

        There are a bunch of conflicts due to missing upstream patches,
        e.g. on qemufile reworks and postcopy preempt.  We don't plan to have
        preempt in rhel8 at all, probably the same as the rest.

Current logic assumes that channel connections on the destination side are
always established in the same order as on the source, and that the first
one will always be the main channel, followed by the multifd or post-copy
preemption channels. This may not always be true: even if a channel has a
connection established on the source side, it can still be in the pending
state on the destination side, and a newer connection can be established
first, causing out-of-order mapping of channels on the destination side.
Currently, all channels except post-copy preempt send a magic number; this
patch uses that magic number to decide the type of channel. This logic is
applicable only for precopy (multifd) live migration because, as mentioned,
the post-copy preempt channel does not send any magic number. Also, TLS
live migrations already do the TLS handshake before creating other
channels, so this issue is not possible with TLS; hence this logic is
avoided for TLS live migrations. This patch uses a peeking read to check the
magic number of channels so that the current data/control stream management
remains unaffected.
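
The classification step can be sketched with plain sockets. This is a
simplified model, not the migration code: classify_channel() and
classify_sent_magic() are hypothetical helpers, and the two magic values are
assumed here to match the main-stream and multifd magics and to be sent in
big-endian order. Unlike the patch, the sketch treats a short peek as an
error instead of sleeping and retrying.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed magic values; treat them as placeholders for the illustration. */
#define MAIN_MAGIC    0x5145564dU
#define MULTIFD_MAGIC 0x11223344U

enum channel_kind { CH_ERROR = -1, CH_MAIN, CH_MULTIFD, CH_UNKNOWN };

/* Peek at the first four bytes of fd without consuming them. */
static enum channel_kind classify_channel(int fd)
{
    uint32_t magic_be;
    ssize_t n = recv(fd, &magic_be, sizeof(magic_be), MSG_PEEK);

    if (n != (ssize_t)sizeof(magic_be)) {
        return CH_ERROR; /* the real code retries on short reads */
    }
    switch (ntohl(magic_be)) {
    case MAIN_MAGIC:
        return CH_MAIN;
    case MULTIFD_MAGIC:
        return CH_MULTIFD;
    default:
        return CH_UNKNOWN;
    }
}

/*
 * Demo helper: send magic over a socketpair, classify the receiving end and
 * check that a normal read afterwards still sees the untouched magic.
 */
static enum channel_kind classify_sent_magic(uint32_t magic)
{
    int sv[2];
    uint32_t wire = htonl(magic);
    uint32_t again = 0;
    enum channel_kind kind = CH_ERROR;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return CH_ERROR;
    }
    if (send(sv[0], &wire, sizeof(wire), 0) == (ssize_t)sizeof(wire)) {
        kind = classify_channel(sv[1]);
        if (recv(sv[1], &again, sizeof(again), 0) != (ssize_t)sizeof(again) ||
            again != wire) {
            kind = CH_ERROR; /* the peek must not have consumed anything */
        }
    }
    close(sv[0]);
    close(sv[1]);
    return kind;
}
```

Because the peek leaves the stream untouched, whichever handler is chosen
afterwards can parse the channel from its first byte, as it did before.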

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: manish.mishra <manish.mishra@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
(cherry picked from commit 6720c2b32725e6ac404f22851a0ecd0a71d0cbe2)
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/channel.c   | 45 ++++++++++++++++++++++++++++++++++++++
 migration/channel.h   |  5 +++++
 migration/migration.c | 51 +++++++++++++++++++++++++++++++------------
 migration/multifd.c   | 19 ++++++++--------
 migration/multifd.h   |  2 +-
 5 files changed, 98 insertions(+), 24 deletions(-)

diff --git a/migration/channel.c b/migration/channel.c
index 086b5c0d8b1..ee308fef235 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -98,3 +98,48 @@ void migration_channel_connect(MigrationState *s,
     g_free(s->hostname);
     error_free(error);
 }
+
+
+/**
+ * @migration_channel_read_peek - Peek at migration channel, without
+ *     actually removing it from channel buffer.
+ *
+ * @ioc: the channel object
+ * @buf: the memory region to read data into
+ * @buflen: the number of bytes to read in @buf
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Returns 0 if successful, returns -1 and sets @errp if fails.
+ */
+int migration_channel_read_peek(QIOChannel *ioc,
+                                const char *buf,
+                                const size_t buflen,
+                                Error **errp)
+{
+    ssize_t len = 0;
+    struct iovec iov = { .iov_base = (char *)buf, .iov_len = buflen };
+
+    while (true) {
+        len = qio_channel_readv_full(ioc, &iov, 1, NULL, NULL,
+                                     QIO_CHANNEL_READ_FLAG_MSG_PEEK, errp);
+
+        if (len <= 0 && len != QIO_CHANNEL_ERR_BLOCK) {
+            error_setg(errp,
+                       "Failed to peek at channel");
+            return -1;
+        }
+
+        if (len == buflen) {
+            break;
+        }
+
+        /* 1ms sleep. */
+        if (qemu_in_coroutine()) {
+            qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 1000000);
+        } else {
+            g_usleep(1000);
+        }
+    }
+
+    return 0;
+}
diff --git a/migration/channel.h b/migration/channel.h
index 67a461c28a4..5bdb8208a74 100644
--- a/migration/channel.h
+++ b/migration/channel.h
@@ -24,4 +24,9 @@ void migration_channel_connect(MigrationState *s,
                                QIOChannel *ioc,
                                const char *hostname,
                                Error *error_in);
+
+int migration_channel_read_peek(QIOChannel *ioc,
+                                const char *buf,
+                                const size_t buflen,
+                                Error **errp);
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index d8b24a2c917..0885549de04 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -32,6 +32,7 @@
 #include "savevm.h"
 #include "qemu-file-channel.h"
 #include "qemu-file.h"
+#include "channel.h"
 #include "migration/vmstate.h"
 #include "block/block.h"
 #include "qapi/error.h"
@@ -637,10 +638,6 @@ static bool migration_incoming_setup(QEMUFile *f, Error **errp)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
 
-    if (multifd_load_setup(errp) != 0) {
-        return false;
-    }
-
     if (!mis->from_src_file) {
         mis->from_src_file = f;
     }
@@ -701,10 +698,42 @@ void migration_fd_process_incoming(QEMUFile *f, Error **errp)
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
+    bool default_channel = true;
+    uint32_t channel_magic = 0;
     Error *local_err = NULL;
-    bool start_migration;
+    int ret = 0;
 
-    if (!mis->from_src_file) {
+    if (migrate_use_multifd() && !migrate_postcopy_ram() &&
+        qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
+        /*
+         * With multiple channels, it is possible that we receive channels
+         * out of order on destination side, causing incorrect mapping of
+         * source channels on destination side. Check channel MAGIC to
+         * decide type of channel. Please note this is best effort, postcopy
+         * preempt channel does not send any magic number so avoid it for
+         * postcopy live migration. Also tls live migration already does
+         * tls handshake while initializing main channel so with tls this
+         * issue is not possible.
+         */
+        ret = migration_channel_read_peek(ioc, (void *)&channel_magic,
+                                          sizeof(channel_magic), &local_err);
+
+        if (ret != 0) {
+            error_propagate(errp, local_err);
+            return;
+        }
+
+        default_channel = (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC));
+    } else {
+        default_channel = !mis->from_src_file;
+    }
+
+    if (multifd_load_setup(errp) != 0) {
+        error_setg(errp, "Failed to setup multifd channels");
+        return;
+    }
+
+    if (default_channel) {
         /* The first connection (multifd may have multiple) */
         QEMUFile *f = qemu_fopen_channel_input(ioc);
 
@@ -716,23 +745,17 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
         if (!migration_incoming_setup(f, errp)) {
             return;
         }
-
-        /*
-         * Common migration only needs one channel, so we can start
-         * right now.  Multifd needs more than one channel, we wait.
-         */
-        start_migration = !migrate_use_multifd();
     } else {
         /* Multiple connections */
         assert(migrate_use_multifd());
-        start_migration = multifd_recv_new_channel(ioc, &local_err);
+        multifd_recv_new_channel(ioc, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             return;
         }
     }
 
-    if (start_migration) {
+    if (migration_has_all_channels()) {
         migration_incoming_process();
     }
 }
diff --git a/migration/multifd.c b/migration/multifd.c
index 7c16523e6ba..75ac052d2f4 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1183,9 +1183,14 @@ int multifd_load_setup(Error **errp)
     uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
     uint8_t i;
 
-    if (!migrate_use_multifd()) {
+    /*
+     * Return successfully if multiFD recv state is already initialised
+     * or multiFD is not enabled.
+     */
+    if (multifd_recv_state || !migrate_use_multifd()) {
         return 0;
     }
+
     if (!migrate_multifd_is_allowed()) {
         error_setg(errp, "multifd is not supported by current protocol");
         return -1;
@@ -1244,11 +1249,9 @@ bool multifd_recv_all_channels_created(void)
 
 /*
  * Try to receive all multifd channels to get ready for the migration.
- * - Return true and do not set @errp when correctly receiving all channels;
- * - Return false and do not set @errp when correctly receiving the current one;
- * - Return false and set @errp when failing to receive the current channel.
+ * Sets @errp when failing to receive the current channel.
  */
-bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
+void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
 {
     MultiFDRecvParams *p;
     Error *local_err = NULL;
@@ -1261,7 +1264,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
                                 "failed to receive packet"
                                 " via multifd channel %d: ",
                                 qatomic_read(&multifd_recv_state->count));
-        return false;
+        return;
     }
     trace_multifd_recv_new_channel(id);
 
@@ -1271,7 +1274,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
                    id);
         multifd_recv_terminate_threads(local_err);
         error_propagate(errp, local_err);
-        return false;
+        return;
     }
     p->c = ioc;
     object_ref(OBJECT(ioc));
@@ -1282,6 +1285,4 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
     qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
                        QEMU_THREAD_JOINABLE);
     qatomic_inc(&multifd_recv_state->count);
-    return qatomic_read(&multifd_recv_state->count) ==
-           migrate_multifd_channels();
 }
diff --git a/migration/multifd.h b/migration/multifd.h
index 11d5e273e61..9c0a2a0701f 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -20,7 +20,7 @@ void multifd_save_cleanup(void);
 int multifd_load_setup(Error **errp);
 int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
-bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
+void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
 int multifd_send_sync_main(QEMUFile *f);
 int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
-- 
Gitee


From 673e33719e2924425bf9c5f21921a5f253b12850 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Tue, 14 Feb 2023 14:48:37 +0100
Subject: [PATCH 237/407] target/s390x/arch_dump: Fix memory corruption in
 s390x_write_elf64_notes()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 260: target/s390x/arch_dump: Fix memory corruption in s390x_write_elf64_notes()
RH-Bugzilla: 2168187
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [1/1] 67b71ed720a1f03d5bda9119969ea95fc4a6106d

Bugzilla: https://bugzilla.redhat.com/2168187
Upstream-Status: Posted (and reviewed, but not merged yet)

"note_size" can be smaller than sizeof(note), so unconditionally calling
memset(notep, 0, sizeof(note)) can write past the end of the buffer when
notep has been allocated dynamically. Use note_size as the length
argument for memset() instead.
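
The size mismatch can be reproduced in isolation. The following is only a
minimal sketch: DemoNote, demo_note_size() and demo_note_alloc() are
hypothetical stand-ins, not the QEMU definitions, and the field sizes are
arbitrary. It shows why the clear length has to match the allocation rather
than the full structure size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical note layout: fixed header plus room for the largest payload. */
typedef struct {
    unsigned int hdr[3];
    char name[8];
    unsigned char contents[1024];
} DemoNote;

/* Bytes needed for a note that carries content_size payload bytes. */
static size_t demo_note_size(size_t content_size)
{
    assert(content_size <= sizeof(((DemoNote *)0)->contents));
    return offsetof(DemoNote, contents) + content_size;
}

/* Allocate and zero exactly note_size bytes; returns NULL on failure. */
static DemoNote *demo_note_alloc(size_t content_size)
{
    size_t note_size = demo_note_size(content_size);
    DemoNote *notep = malloc(note_size);

    if (notep) {
        /* Using sizeof(*notep) here would write past the allocation. */
        memset(notep, 0, note_size);
    }
    return notep;
}
```

With a 16-byte payload the allocation is far smaller than sizeof(DemoNote),
which is the case where the old length argument would overrun the buffer.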

Fixes: 113d8f4e95 ("s390x: pv: Add dump support")
Message-Id: <20230214141056.680969-1-thuth@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 target/s390x/arch_dump.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/s390x/arch_dump.c b/target/s390x/arch_dump.c
index a2329141e8a..a7c44ba49d3 100644
--- a/target/s390x/arch_dump.c
+++ b/target/s390x/arch_dump.c
@@ -248,7 +248,7 @@ static int s390x_write_elf64_notes(const char *note_name,
             notep = g_malloc(note_size);
         }
 
-        memset(notep, 0, sizeof(note));
+        memset(notep, 0, note_size);
 
         /* Setup note header data */
         notep->hdr.n_descsz = cpu_to_be32(content_size);
-- 
Gitee


From dede40e250f578aecfedaab6c40ba7aa6734dc35 Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Thu, 9 Mar 2023 09:26:35 -0500
Subject: [PATCH 238/407] aio_wait_kick: add missing memory barrier

RH-Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-MergeRequest: 263: qatomic: add smp_mb__before/after_rmw()
RH-Bugzilla: 2168472
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Eric Auger <eric.auger@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Commit: [1/10] eb774aee79864052e14e706d931e52e7bd1162c8

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168472

commit 7455ff1aa01564cc175db5b2373e610503ad4411
Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date:   Tue May 24 13:30:54 2022 -0400

    aio_wait_kick: add missing memory barrier

    It seems that aio_wait_kick always required a memory barrier
    or atomic operation in the caller, but nobody actually
    took care of doing it.

    Let's put the barrier in the function instead, and pair it
    with another one in AIO_WAIT_WHILE. See the comment in
    aio_wait_kick() for further explanation.
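
The pairing described above can be sketched with plain C11 atomics. This is
an illustration under assumptions, not the QEMU code: the demo_* names and
the kicks counter are hypothetical, and atomic_thread_fence() stands in for
smp_mb(). Each side writes its own variable, issues a full barrier, and only
then reads the other side's variable, so at least one of them is guaranteed
to see the other's write.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int num_waiters;  /* stand-in for the waiter count */
static atomic_bool done;        /* stand-in for the waited-on condition */
static atomic_int kicks;        /* counts wakeups issued, for illustration */

/* Kicker side: publish the condition, then look for waiters. */
static void demo_complete_and_kick(void)
{
    atomic_store_explicit(&done, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* barrier in the kick */
    if (atomic_load_explicit(&num_waiters, memory_order_relaxed)) {
        atomic_fetch_add_explicit(&kicks, 1, memory_order_relaxed);
    }
}

/* Waiter side: announce the waiter, then evaluate the condition. */
static bool demo_waiter_must_sleep(void)
{
    atomic_fetch_add_explicit(&num_waiters, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* barrier in the wait */
    return !atomic_load_explicit(&done, memory_order_relaxed);
}
```

Without either fence, both sides could read the stale value: the waiter
would go to sleep and the kicker would see no waiter to wake.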

    Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
    Message-Id: <20220524173054.12651-1-eesposit@redhat.com>
    Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/block/aio-wait.h |  2 ++
 util/aio-wait.c          | 16 +++++++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index b39eefb38d1..54840f86223 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -81,6 +81,8 @@ extern AioWait global_aio_wait;
     AioContext *ctx_ = (ctx);                                      \
     /* Increment wait_->num_waiters before evaluating cond. */     \
     qatomic_inc(&wait_->num_waiters);                              \
+    /* Paired with smp_mb in aio_wait_kick(). */                   \
+    smp_mb();                                                      \
     if (ctx_ && in_aio_context_home_thread(ctx_)) {                \
         while ((cond)) {                                           \
             aio_poll(ctx_, true);                                  \
diff --git a/util/aio-wait.c b/util/aio-wait.c
index bdb3d3af22d..98c5accd29d 100644
--- a/util/aio-wait.c
+++ b/util/aio-wait.c
@@ -35,7 +35,21 @@ static void dummy_bh_cb(void *opaque)
 
 void aio_wait_kick(void)
 {
-    /* The barrier (or an atomic op) is in the caller.  */
+    /*
+     * Paired with smp_mb in AIO_WAIT_WHILE. Here we have:
+     * write(condition);
+     * aio_wait_kick() {
+     *      smp_mb();
+     *      read(num_waiters);
+     * }
+     *
+     * And in AIO_WAIT_WHILE:
+     * write(num_waiters);
+     * smp_mb();
+     * read(condition);
+     */
+    smp_mb();
+
     if (qatomic_read(&global_aio_wait.num_waiters)) {
         aio_bh_schedule_oneshot(qemu_get_aio_context(), dummy_bh_cb, NULL);
     }
-- 
Gitee


From fbba1b13e533c649563b63f66468b91f7fffe7fc Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Thu, 9 Mar 2023 08:10:33 -0500
Subject: [PATCH 239/407] qatomic: add smp_mb__before/after_rmw()

RH-Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-MergeRequest: 263: qatomic: add smp_mb__before/after_rmw()
RH-Bugzilla: 2168472
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Eric Auger <eric.auger@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Commit: [2/10] 1f87eb3157abcf23f020881cedce42f76497f348

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168472

commit ff00bed1897c3d27adc5b0cec6f6eeb5a7d13176
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Thu Mar 2 11:10:56 2023 +0100

    qatomic: add smp_mb__before/after_rmw()

    On ARM, seqcst loads and stores (which QEMU does not use) are compiled
    respectively as LDAR and STLR instructions.  Even though LDAR is
    also used for load-acquire operations, it also waits for all STLRs to
    leave the store buffer.  Thus, LDAR and STLR alone are load-acquire
    and store-release operations, but LDAR also provides store-against-load
    ordering as long as the previous store is a STLR.

    Compare this to ARMv7, where store-release is DMB+STR and load-acquire
    is LDR+DMB, but an additional DMB is needed between store-seqcst and
    load-seqcst (e.g. DMB+STR+DMB+LDR+DMB); or with x86, where MOV provides
    load-acquire and store-release semantics and the two can be reordered.

    Likewise, on ARM sequentially consistent read-modify-write operations only
    need to use LDAXR and STLXR respectively for the load and the store, while
    on x86 they need to use the stronger LOCK prefix.

    In a strange twist of events, however, the _stronger_ semantics
    of the ARM instructions can end up causing bugs on ARM, not on x86.
    The problems occur when seqcst atomics are mixed with relaxed atomics.

    QEMU's atomics try to bridge the Linux API (that most of the developers
    are familiar with) and the C11 API, and the two have a substantial
    difference:

    - in Linux, strongly-ordered atomics such as atomic_add_return() affect
      the global ordering of _all_ memory operations, including for example
      READ_ONCE()/WRITE_ONCE()

    - in C11, sequentially consistent atomics (except for seq-cst fences)
      only affect the ordering of sequentially consistent operations.
      In particular, since relaxed loads are done with LDR on ARM, they are
      not ordered against seqcst stores (which are done with STLR).

    QEMU implements high-level synchronization primitives with the idea that
    the primitives contain the necessary memory barriers, and the callers can
    use relaxed atomics (qatomic_read/qatomic_set) or even regular accesses.
    This is very much incompatible with the C11 view that seqcst accesses
    are only ordered against other seqcst accesses, and requires using seqcst
    fences as in the following example:

       qatomic_set(&y, 1);            qatomic_set(&x, 1);
       smp_mb();                      smp_mb();
       ... qatomic_read(&x) ...       ... qatomic_read(&y) ...

    When a qatomic_*() read-modify-write operation is used instead of one
    or both stores, developers that are more familiar with the Linux API may
    be tempted to omit the smp_mb(), which will work on x86 but not on ARM.

    This nasty difference between Linux and C11 read-modify-write operations
    has already caused issues in util/async.c and more are being found.
    Provide something similar to Linux smp_mb__before/after_atomic(); this
    has the double function of documenting clearly why there is a memory
    barrier, and avoiding a double barrier on x86 and s390x systems.
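
As a rough illustration of the idea, the sketch below uses C11 atomics
directly. demo_mb_after_rmw() and demo_add_then_read() are hypothetical
names modelled on the macros added by this patch; the sketch assumes that a
compiler-only barrier is sufficient after a read-modify-write operation on
x86 and s390x, and that a full fence is needed elsewhere.

```c
#include <stdatomic.h>

#if defined(__i386__) || defined(__x86_64__) || defined(__s390x__)
/* Assumed: the RMW instruction is already a full barrier on these hosts. */
# define demo_mb_after_rmw() atomic_signal_fence(memory_order_seq_cst)
#else
/* Elsewhere a real fence orders the RMW against later relaxed accesses. */
# define demo_mb_after_rmw() atomic_thread_fence(memory_order_seq_cst)
#endif

static atomic_int x, y;

/* Update x, then read y with a relaxed load ordered after the update. */
static int demo_add_then_read(int delta)
{
    atomic_fetch_add_explicit(&x, delta, memory_order_seq_cst);
    demo_mb_after_rmw();
    return atomic_load_explicit(&y, memory_order_relaxed);
}
```

The named macro also documents at the call site that the barrier exists
because of the preceding read-modify-write, not for some unrelated reason.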

    The new macro can already be put to use in qatomic_mb_set().

    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 docs/devel/atomics.rst | 26 +++++++++++++++++++++-----
 include/qemu/atomic.h  | 17 ++++++++++++++++-
 2 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/docs/devel/atomics.rst b/docs/devel/atomics.rst
index 52baa0736d2..10fbfc58bb1 100644
--- a/docs/devel/atomics.rst
+++ b/docs/devel/atomics.rst
@@ -25,7 +25,8 @@ provides macros that fall in three camps:
 
 - weak atomic access and manual memory barriers: ``qatomic_read()``,
   ``qatomic_set()``, ``smp_rmb()``, ``smp_wmb()``, ``smp_mb()``,
-  ``smp_mb_acquire()``, ``smp_mb_release()``, ``smp_read_barrier_depends()``;
+  ``smp_mb_acquire()``, ``smp_mb_release()``, ``smp_read_barrier_depends()``,
+  ``smp_mb__before_rmw()``, ``smp_mb__after_rmw()``;
 
 - sequentially consistent atomic access: everything else.
 
@@ -470,7 +471,7 @@ and memory barriers, and the equivalents in QEMU:
   sequential consistency.
 
 - in QEMU, ``qatomic_read()`` and ``qatomic_set()`` do not participate in
-  the total ordering enforced by sequentially-consistent operations.
+  the ordering enforced by read-modify-write operations.
   This is because QEMU uses the C11 memory model.  The following example
   is correct in Linux but not in QEMU:
 
@@ -486,9 +487,24 @@ and memory barriers, and the equivalents in QEMU:
   because the read of ``y`` can be moved (by either the processor or the
   compiler) before the write of ``x``.
 
-  Fixing this requires an ``smp_mb()`` memory barrier between the write
-  of ``x`` and the read of ``y``.  In the common case where only one thread
-  writes ``x``, it is also possible to write it like this:
+  Fixing this requires a full memory barrier between the write of ``x`` and
+  the read of ``y``.  QEMU provides ``smp_mb__before_rmw()`` and
+  ``smp_mb__after_rmw()``; they act both as an optimization,
+  avoiding the memory barrier on processors where it is unnecessary,
+  and as a clarification of this corner case of the C11 memory model:
+
+      +--------------------------------+
+      | QEMU (correct)                 |
+      +================================+
+      | ::                             |
+      |                                |
+      |   a = qatomic_fetch_add(&x, 2);|
+      |   smp_mb__after_rmw();         |
+      |   b = qatomic_read(&y);        |
+      +--------------------------------+
+
+  In the common case where only one thread writes ``x``, it is also possible
+  to write it like this:
 
       +--------------------------------+
       | QEMU (correct)                 |
diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index 112a29910be..7855443cab9 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -243,6 +243,20 @@
 #define smp_wmb()   smp_mb_release()
 #define smp_rmb()   smp_mb_acquire()
 
+/*
+ * SEQ_CST is weaker than the older __sync_* builtins and Linux
+ * kernel read-modify-write atomics.  Provide a macro to obtain
+ * the same semantics.
+ */
+#if !defined(QEMU_SANITIZE_THREAD) && \
+    (defined(__i386__) || defined(__x86_64__) || defined(__s390x__))
+# define smp_mb__before_rmw() signal_barrier()
+# define smp_mb__after_rmw() signal_barrier()
+#else
+# define smp_mb__before_rmw() smp_mb()
+# define smp_mb__after_rmw() smp_mb()
+#endif
+
 /* qatomic_mb_read/set semantics map Java volatile variables. They are
  * less expensive on some platforms (notably POWER) than fully
  * sequentially consistent operations.
@@ -257,7 +271,8 @@
 #if !defined(__SANITIZE_THREAD__) && \
     (defined(__i386__) || defined(__x86_64__) || defined(__s390x__))
 /* This is more efficient than a store plus a fence.  */
-# define qatomic_mb_set(ptr, i)  ((void)qatomic_xchg(ptr, i))
+# define qatomic_mb_set(ptr, i) \
+    ({ (void)qatomic_xchg(ptr, i); smp_mb__after_rmw(); })
 #else
 # define qatomic_mb_set(ptr, i) \
    ({ qatomic_store_release(ptr, i); smp_mb(); })
-- 
Gitee


From 69ff1d0d6875dc9657e577774f428777b0d322d8 Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Thu, 9 Mar 2023 08:10:57 -0500
Subject: [PATCH 240/407] qemu-thread-posix: cleanup, fix, document QemuEvent

RH-Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-MergeRequest: 263: qatomic: add smp_mb__before/after_rmw()
RH-Bugzilla: 2168472
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Eric Auger <eric.auger@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Commit: [3/10] 746070c4d78c7f0a9ac4456d9aee69475acb8964

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168472

commit 9586a1329f5dce6c1d7f4de53cf0536644d7e593
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Thu Mar 2 11:19:52 2023 +0100

    qemu-thread-posix: cleanup, fix, document QemuEvent

    QemuEvent is currently broken on ARM due to missing memory barriers
    after qatomic_*().  Apart from adding the memory barrier, a closer look
    reveals some unpaired memory barriers too.  Document more clearly what
    is going on.
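
The state handling can be sketched without the futex part. Everything below
is a simplified, hypothetical model: the DEMO_EV_* values are an assumed
encoding, picked so that OR-ing in the "free" value turns "set" into "free"
and leaves "free" and "busy" unchanged, and the wakeup is reduced to a
return value.

```c
#include <stdatomic.h>

/* Assumed encoding: SET | FREE == FREE, FREE | FREE == FREE, BUSY | FREE == BUSY. */
enum { DEMO_EV_SET = 0, DEMO_EV_FREE = 1, DEMO_EV_BUSY = -1 };

typedef struct {
    atomic_int value;
} DemoEvent;

/* Reset: unconditional OR, then a barrier before the caller's check. */
static void demo_event_reset(DemoEvent *ev)
{
    atomic_fetch_or_explicit(&ev->value, DEMO_EV_FREE, memory_order_seq_cst);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Set: returns nonzero when waiters were registered and need a wakeup. */
static int demo_event_set(DemoEvent *ev)
{
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&ev->value, memory_order_relaxed) != DEMO_EV_SET) {
        int old = atomic_exchange_explicit(&ev->value, DEMO_EV_SET,
                                           memory_order_seq_cst);

        /* Order the exchange before the wakeup that would follow. */
        atomic_thread_fence(memory_order_seq_cst);
        return old == DEMO_EV_BUSY;
    }
    return 0;
}
```

Because the OR is idempotent for the "free" and "busy" states, the reset
path needs no prior load and no conditional.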

    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 util/qemu-thread-posix.c | 69 ++++++++++++++++++++++++++++------------
 1 file changed, 49 insertions(+), 20 deletions(-)

diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index e1225b63bd2..dd3b6d46708 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -430,13 +430,21 @@ void qemu_event_destroy(QemuEvent *ev)
 
 void qemu_event_set(QemuEvent *ev)
 {
-    /* qemu_event_set has release semantics, but because it *loads*
+    assert(ev->initialized);
+
+    /*
+     * Pairs with both qemu_event_reset() and qemu_event_wait().
+     *
+     * qemu_event_set has release semantics, but because it *loads*
      * ev->value we need a full memory barrier here.
      */
-    assert(ev->initialized);
     smp_mb();
     if (qatomic_read(&ev->value) != EV_SET) {
-        if (qatomic_xchg(&ev->value, EV_SET) == EV_BUSY) {
+        int old = qatomic_xchg(&ev->value, EV_SET);
+
+        /* Pairs with memory barrier in kernel futex_wait system call.  */
+        smp_mb__after_rmw();
+        if (old == EV_BUSY) {
             /* There were waiters, wake them up.  */
             qemu_futex_wake(ev, INT_MAX);
         }
@@ -445,18 +453,19 @@ void qemu_event_set(QemuEvent *ev)
 
 void qemu_event_reset(QemuEvent *ev)
 {
-    unsigned value;
-
     assert(ev->initialized);
-    value = qatomic_read(&ev->value);
-    smp_mb_acquire();
-    if (value == EV_SET) {
-        /*
-         * If there was a concurrent reset (or even reset+wait),
-         * do nothing.  Otherwise change EV_SET->EV_FREE.
-         */
-        qatomic_or(&ev->value, EV_FREE);
-    }
+
+    /*
+     * If there was a concurrent reset (or even reset+wait),
+     * do nothing.  Otherwise change EV_SET->EV_FREE.
+     */
+    qatomic_or(&ev->value, EV_FREE);
+
+    /*
+     * Order reset before checking the condition in the caller.
+     * Pairs with the first memory barrier in qemu_event_set().
+     */
+    smp_mb__after_rmw();
 }
 
 void qemu_event_wait(QemuEvent *ev)
@@ -464,20 +473,40 @@ void qemu_event_wait(QemuEvent *ev)
     unsigned value;
 
     assert(ev->initialized);
-    value = qatomic_read(&ev->value);
-    smp_mb_acquire();
+
+    /*
+     * qemu_event_wait must synchronize with qemu_event_set even if it does
+     * not go down the slow path, so this load-acquire is needed that
+     * synchronizes with the first memory barrier in qemu_event_set().
+     *
+     * If we do go down the slow path, there is no requirement at all: we
+     * might miss a qemu_event_set() here but ultimately the memory barrier in
+     * qemu_futex_wait() will ensure the check is done correctly.
+     */
+    value = qatomic_load_acquire(&ev->value);
     if (value != EV_SET) {
         if (value == EV_FREE) {
             /*
-             * Leave the event reset and tell qemu_event_set that there
-             * are waiters.  No need to retry, because there cannot be
-             * a concurrent busy->free transition.  After the CAS, the
-             * event will be either set or busy.
+             * Leave the event reset and tell qemu_event_set that there are
+             * waiters.  No need to retry, because there cannot be a concurrent
+             * busy->free transition.  After the CAS, the event will be either
+             * set or busy.
+             *
+             * This cmpxchg doesn't have particular ordering requirements if it
+             * succeeds (moving the store earlier can only cause qemu_event_set()
+             * to issue _more_ wakeups), the failing case needs acquire semantics
+             * like the load above.
              */
             if (qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
                 return;
             }
         }
+
+        /*
+         * This is the final check for a concurrent set, so it does need
+         * a smp_mb() pairing with the second barrier of qemu_event_set().
+         * The barrier is inside the FUTEX_WAIT system call.
+         */
         qemu_futex_wait(ev, EV_BUSY);
     }
 }
-- 
Gitee


From 89f0f430a6946f89ac720a42ebf301260ea9cb1f Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Thu, 9 Mar 2023 08:11:05 -0500
Subject: [PATCH 241/407] qemu-thread-win32: cleanup, fix, document QemuEvent

RH-Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-MergeRequest: 263: qatomic: add smp_mb__before/after_rmw()
RH-Bugzilla: 2168472
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Eric Auger <eric.auger@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Commit: [4/10] 43d5bd903b460d4c3c5793a456820e8c5c8521d9

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168472

commit 6c5df4b48f0c52a61342ecb307a43f4c2a3565c4
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Thu Mar 2 11:22:50 2023 +0100

    qemu-thread-win32: cleanup, fix, document QemuEvent

    QemuEvent is currently broken on ARM due to missing memory barriers
    after qatomic_*().  Apart from adding the memory barrier, a closer look
    reveals some unpaired memory barriers that are not really needed and
    complicate the functions unnecessarily.  Also, it is relying on
    a memory barrier in ResetEvent(); the barrier _ought_ to be there
    but there is really no documentation about it, so make it explicit.

    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 util/qemu-thread-win32.c | 82 +++++++++++++++++++++++++++-------------
 1 file changed, 56 insertions(+), 26 deletions(-)

diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index 52eb19f3511..c10249bc2e7 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -246,12 +246,20 @@ void qemu_event_destroy(QemuEvent *ev)
 void qemu_event_set(QemuEvent *ev)
 {
     assert(ev->initialized);
-    /* qemu_event_set has release semantics, but because it *loads*
+
+    /*
+     * Pairs with both qemu_event_reset() and qemu_event_wait().
+     *
+     * qemu_event_set has release semantics, but because it *loads*
      * ev->value we need a full memory barrier here.
      */
     smp_mb();
     if (qatomic_read(&ev->value) != EV_SET) {
-        if (qatomic_xchg(&ev->value, EV_SET) == EV_BUSY) {
+        int old = qatomic_xchg(&ev->value, EV_SET);
+
+        /* Pairs with memory barrier after ResetEvent.  */
+        smp_mb__after_rmw();
+        if (old == EV_BUSY) {
             /* There were waiters, wake them up.  */
             SetEvent(ev->event);
         }
@@ -260,17 +268,19 @@ void qemu_event_set(QemuEvent *ev)
 
 void qemu_event_reset(QemuEvent *ev)
 {
-    unsigned value;
-
     assert(ev->initialized);
-    value = qatomic_read(&ev->value);
-    smp_mb_acquire();
-    if (value == EV_SET) {
-        /* If there was a concurrent reset (or even reset+wait),
-         * do nothing.  Otherwise change EV_SET->EV_FREE.
-         */
-        qatomic_or(&ev->value, EV_FREE);
-    }
+
+    /*
+     * If there was a concurrent reset (or even reset+wait),
+     * do nothing.  Otherwise change EV_SET->EV_FREE.
+     */
+    qatomic_or(&ev->value, EV_FREE);
+
+    /*
+     * Order reset before checking the condition in the caller.
+     * Pairs with the first memory barrier in qemu_event_set().
+     */
+    smp_mb__after_rmw();
 }
 
 void qemu_event_wait(QemuEvent *ev)
@@ -278,29 +288,49 @@ void qemu_event_wait(QemuEvent *ev)
     unsigned value;
 
     assert(ev->initialized);
-    value = qatomic_read(&ev->value);
-    smp_mb_acquire();
+
+    /*
+     * qemu_event_wait must synchronize with qemu_event_set even if it does
+     * not go down the slow path, so this load-acquire is needed that
+     * synchronizes with the first memory barrier in qemu_event_set().
+     *
+     * If we do go down the slow path, there is no requirement at all: we
+     * might miss a qemu_event_set() here but ultimately the memory barrier in
+     * qemu_futex_wait() will ensure the check is done correctly.
+     */
+    value = qatomic_load_acquire(&ev->value);
     if (value != EV_SET) {
         if (value == EV_FREE) {
-            /* qemu_event_set is not yet going to call SetEvent, but we are
-             * going to do another check for EV_SET below when setting EV_BUSY.
-             * At that point it is safe to call WaitForSingleObject.
+            /*
+             * Here the underlying kernel event is reset, but qemu_event_set is
+             * not yet going to call SetEvent.  However, there will be another
+             * check for EV_SET below when setting EV_BUSY.  At that point it
+             * is safe to call WaitForSingleObject.
              */
             ResetEvent(ev->event);
 
-            /* Tell qemu_event_set that there are waiters.  No need to retry
-             * because there cannot be a concurrent busy->free transition.
-             * After the CAS, the event will be either set or busy.
+            /*
+             * It is not clear whether ResetEvent provides this barrier; kernel
+             * APIs (KeResetEvent/KeClearEvent) do not.  Better safe than sorry!
+             */
+            smp_mb();
+
+            /*
+             * Leave the event reset and tell qemu_event_set that there are
+             * waiters.  No need to retry, because there cannot be a concurrent
+             * busy->free transition.  After the CAS, the event will be either
+             * set or busy.
              */
             if (qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
-                value = EV_SET;
-            } else {
-                value = EV_BUSY;
+                return;
             }
         }
-        if (value == EV_BUSY) {
-            WaitForSingleObject(ev->event, INFINITE);
-        }
+
+        /*
+         * ev->value is now EV_BUSY.  Since we didn't observe EV_SET,
+         * qemu_event_set() must observe EV_BUSY and call SetEvent().
+         */
+        WaitForSingleObject(ev->event, INFINITE);
     }
 }
 
-- 
Gitee


From 44aa0974cf3ca46ccb20fa4d613090df550074c4 Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Thu, 9 Mar 2023 08:11:14 -0500
Subject: [PATCH 242/407] edu: add smp_mb__after_rmw()

RH-Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-MergeRequest: 263: qatomic: add smp_mb__before/after_rmw()
RH-Bugzilla: 2168472
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Eric Auger <eric.auger@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Commit: [5/10] 300901290e08b253b1278eedc39cd07c1e202b96

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168472

commit 2482aeea4195ad84cf3d4e5b15b28ec5b420ed5a
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Thu Mar 2 11:16:13 2023 +0100

    edu: add smp_mb__after_rmw()

    Ensure ordering between clearing the COMPUTING flag and checking
    IRQFACT, and between setting the IRQFACT flag and checking
    COMPUTING.  This ensures that no wakeups are lost.

    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 hw/misc/edu.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/misc/edu.c b/hw/misc/edu.c
index e935c418d40..a1f8bc77e77 100644
--- a/hw/misc/edu.c
+++ b/hw/misc/edu.c
@@ -267,6 +267,8 @@ static void edu_mmio_write(void *opaque, hwaddr addr, uint64_t val,
     case 0x20:
         if (val & EDU_STATUS_IRQFACT) {
             qatomic_or(&edu->status, EDU_STATUS_IRQFACT);
+            /* Order check of the COMPUTING flag after setting IRQFACT.  */
+            smp_mb__after_rmw();
         } else {
             qatomic_and(&edu->status, ~EDU_STATUS_IRQFACT);
         }
@@ -349,6 +351,9 @@ static void *edu_fact_thread(void *opaque)
         qemu_mutex_unlock(&edu->thr_mutex);
         qatomic_and(&edu->status, ~EDU_STATUS_COMPUTING);
 
+        /* Clear COMPUTING flag before checking IRQFACT.  */
+        smp_mb__after_rmw();
+
         if (qatomic_read(&edu->status) & EDU_STATUS_IRQFACT) {
             qemu_mutex_lock_iothread();
             edu_raise_irq(edu, FACT_IRQ);
-- 
Gitee


From 5963fbb119ef8a8fb4551852f8f86165855ca1ae Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Thu, 9 Mar 2023 08:13:17 -0500
Subject: [PATCH 243/407] aio-wait: switch to smp_mb__after_rmw()

RH-Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-MergeRequest: 263: qatomic: add smp_mb__before/after_rmw()
RH-Bugzilla: 2168472
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Eric Auger <eric.auger@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Commit: [6/10] 9f30f97754139ffd18d36b2350f9ed4e59ac496e

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168472

commit b532526a07ef3b903ead2e055fe6cc87b41057a3
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Fri Mar 3 11:03:52 2023 +0100

    aio-wait: switch to smp_mb__after_rmw()

    The barrier comes after an atomic increment, so it is enough to use
    smp_mb__after_rmw(); this avoids a double barrier on x86 systems.

    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/block/aio-wait.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index 54840f86223..03b6394c78a 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -82,7 +82,7 @@ extern AioWait global_aio_wait;
     /* Increment wait_->num_waiters before evaluating cond. */     \
     qatomic_inc(&wait_->num_waiters);                              \
     /* Paired with smp_mb in aio_wait_kick(). */                   \
-    smp_mb();                                                      \
+    smp_mb__after_rmw();                                           \
     if (ctx_ && in_aio_context_home_thread(ctx_)) {                \
         while ((cond)) {                                           \
             aio_poll(ctx_, true);                                  \
-- 
Gitee


From 952a4a4e3fb5d84668bba358b3716d739a750b7f Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Thu, 9 Mar 2023 08:15:14 -0500
Subject: [PATCH 244/407] qemu-coroutine-lock: add smp_mb__after_rmw()

RH-Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-MergeRequest: 263: qatomic: add smp_mb__before/after_rmw()
RH-Bugzilla: 2168472
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Eric Auger <eric.auger@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Commit: [7/10] 9cf1b6d3b0dd154489e75ad54a3000ea58983960

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168472

commit e3a3b6ec8169eab2feb241b4982585001512cd55
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Fri Mar 3 10:52:59 2023 +0100

    qemu-coroutine-lock: add smp_mb__after_rmw()

    mutex->from_push and mutex->handoff in qemu-coroutine-lock implement
    the familiar pattern:

       write a                                  write b
       smp_mb()                                 smp_mb()
       read b                                   read a

    The memory barrier is required by the C memory model even after a
    SEQ_CST read-modify-write operation such as QSLIST_INSERT_HEAD_ATOMIC.
    Add it and avoid the unclear qatomic_mb_read() operation.
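
The pattern above can be sketched outside QEMU with C11 atomics. This is
only an illustration under stated assumptions: the variables a and b, the
thread functions and run_store_buffering_once() are hypothetical names,
and atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the two locations "a" and "b". */
static atomic_int a, b;
static int saw_a, saw_b;

static void *left(void *arg)
{
    (void)arg;
    atomic_store_explicit(&a, 1, memory_order_relaxed);      /* write a  */
    atomic_thread_fence(memory_order_seq_cst);               /* smp_mb() */
    saw_b = atomic_load_explicit(&b, memory_order_relaxed);  /* read b   */
    return NULL;
}

static void *right(void *arg)
{
    (void)arg;
    atomic_store_explicit(&b, 1, memory_order_relaxed);      /* write b  */
    atomic_thread_fence(memory_order_seq_cst);               /* smp_mb() */
    saw_a = atomic_load_explicit(&a, memory_order_relaxed);  /* read a   */
    return NULL;
}

/*
 * Runs both sides once.  Returns nonzero if at least one side observed
 * the other side's write, which the two full fences guarantee.
 */
static int run_store_buffering_once(void)
{
    pthread_t t1, t2;

    atomic_store(&a, 0);
    atomic_store(&b, 0);
    pthread_create(&t1, NULL, left, NULL);
    pthread_create(&t2, NULL, right, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return saw_a || saw_b;
}
```

With both fences in place, the outcome where both reads return 0 is not
allowed; dropping either fence makes it possible again, which is the
lost-wakeup case.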

    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 util/qemu-coroutine-lock.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 26694038397..a03ed0e6646 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -206,10 +206,16 @@ static void coroutine_fn qemu_co_mutex_lock_slowpath(AioContext *ctx,
     trace_qemu_co_mutex_lock_entry(mutex, self);
     push_waiter(mutex, &w);
 
+    /*
+     * Add waiter before reading mutex->handoff.  Pairs with qatomic_mb_set
+     * in qemu_co_mutex_unlock.
+     */
+    smp_mb__after_rmw();
+
     /* This is the "Responsibility Hand-Off" protocol; a lock() picks from
      * a concurrent unlock() the responsibility of waking somebody up.
      */
-    old_handoff = qatomic_mb_read(&mutex->handoff);
+    old_handoff = qatomic_read(&mutex->handoff);
     if (old_handoff &&
         has_waiters(mutex) &&
         qatomic_cmpxchg(&mutex->handoff, old_handoff, 0) == old_handoff) {
@@ -308,6 +314,7 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
         }
 
         our_handoff = mutex->sequence;
+        /* Set handoff before checking for waiters.  */
         qatomic_mb_set(&mutex->handoff, our_handoff);
         if (!has_waiters(mutex)) {
             /* The concurrent lock has not added itself yet, so it
-- 
Gitee


From b523ee571f2aea27ad25877b4aab25dc9acb4957 Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Thu, 9 Mar 2023 08:15:24 -0500
Subject: [PATCH 245/407] physmem: add missing memory barrier

RH-Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-MergeRequest: 263: qatomic: add smp_mb__before/after_rmw()
RH-Bugzilla: 2168472
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Eric Auger <eric.auger@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Commit: [8/10] f6a9659f7cf40b78de6e85e4a7c06842273aa770

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168472

commit 33828ca11da08436e1b32f3e79dabce3061a0427
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Fri Mar 3 14:36:32 2023 +0100

    physmem: add missing memory barrier

    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 softmmu/physmem.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 4d0ef5f92fe..2b96fad3026 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3087,6 +3087,8 @@ void cpu_register_map_client(QEMUBH *bh)
     qemu_mutex_lock(&map_client_list_lock);
     client->bh = bh;
     QLIST_INSERT_HEAD(&map_client_list, client, link);
+    /* Write map_client_list before reading in_use.  */
+    smp_mb();
     if (!qatomic_read(&bounce.in_use)) {
         cpu_notify_map_clients_locked();
     }
@@ -3279,6 +3281,7 @@ void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
     qemu_vfree(bounce.buffer);
     bounce.buffer = NULL;
     memory_region_unref(bounce.mr);
+    /* Clear in_use before reading map_client_list.  */
     qatomic_mb_set(&bounce.in_use, false);
     cpu_notify_map_clients();
 }
-- 
Gitee


From 7f1074e55c55a7056cbc1a78aeb143aa7c919aae Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Thu, 9 Mar 2023 08:15:36 -0500
Subject: [PATCH 246/407] async: update documentation of the memory barriers

RH-Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-MergeRequest: 263: qatomic: add smp_mb__before/after_rmw()
RH-Bugzilla: 2168472
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Eric Auger <eric.auger@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Commit: [9/10] d471da2acf7a107cf75f3327c5e8d7456307160e

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168472

commit 8dd48650b43dfde4ebea34191ac267e474bcc29e
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Mon Mar 6 10:15:06 2023 +0100

    async: update documentation of the memory barriers

    Ever since commit 8c6b0356b539 ("util/async: make bh_aio_poll() O(1)",
    2020-02-22), synchronization between qemu_bh_schedule() and aio_bh_poll()
    is happening when the bottom half is enqueued in the bh_list; not
    when the flags are set.  Update the documentation to match.

    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 util/async.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/util/async.c b/util/async.c
index 6f6717a34b6..795fe699b6c 100644
--- a/util/async.c
+++ b/util/async.c
@@ -71,14 +71,21 @@ static void aio_bh_enqueue(QEMUBH *bh, unsigned new_flags)
     unsigned old_flags;
 
     /*
-     * The memory barrier implicit in qatomic_fetch_or makes sure that:
-     * 1. idle & any writes needed by the callback are done before the
-     *    locations are read in the aio_bh_poll.
-     * 2. ctx is loaded before the callback has a chance to execute and bh
-     *    could be freed.
+     * Synchronizes with qatomic_fetch_and() in aio_bh_dequeue(), ensuring that
+     * insertion starts after BH_PENDING is set.
      */
     old_flags = qatomic_fetch_or(&bh->flags, BH_PENDING | new_flags);
+
     if (!(old_flags & BH_PENDING)) {
+        /*
+         * At this point the bottom half becomes visible to aio_bh_poll().
+         * This insertion thus synchronizes with QSLIST_MOVE_ATOMIC in
+         * aio_bh_poll(), ensuring that:
+         * 1. any writes needed by the callback are visible from the callback
+         *    after aio_bh_dequeue() returns bh.
+         * 2. ctx is loaded before the callback has a chance to execute and bh
+         *    could be freed.
+         */
         QSLIST_INSERT_HEAD_ATOMIC(&ctx->bh_list, bh, next);
     }
 
@@ -97,11 +104,8 @@ static QEMUBH *aio_bh_dequeue(BHList *head, unsigned *flags)
     QSLIST_REMOVE_HEAD(head, next);
 
     /*
-     * The qatomic_and is paired with aio_bh_enqueue().  The implicit memory
-     * barrier ensures that the callback sees all writes done by the scheduling
-     * thread.  It also ensures that the scheduling thread sees the cleared
-     * flag before bh->cb has run, and thus will call aio_notify again if
-     * necessary.
+     * Synchronizes with qatomic_fetch_or() in aio_bh_enqueue(), ensuring that
+     * the removal finishes before BH_PENDING is reset.
      */
     *flags = qatomic_fetch_and(&bh->flags,
                               ~(BH_PENDING | BH_SCHEDULED | BH_IDLE));
@@ -148,6 +152,7 @@ int aio_bh_poll(AioContext *ctx)
     BHListSlice *s;
     int ret = 0;
 
+    /* Synchronizes with QSLIST_INSERT_HEAD_ATOMIC in aio_bh_enqueue().  */
     QSLIST_MOVE_ATOMIC(&slice.bh_list, &ctx->bh_list);
     QSIMPLEQ_INSERT_TAIL(&ctx->bh_slice_list, &slice, next);
 
@@ -437,15 +442,15 @@ LuringState *aio_get_linux_io_uring(AioContext *ctx)
 void aio_notify(AioContext *ctx)
 {
     /*
-     * Write e.g. bh->flags before writing ctx->notified.  Pairs with smp_mb in
-     * aio_notify_accept.
+     * Write e.g. ctx->bh_list before writing ctx->notified.  Pairs with
+     * smp_mb() in aio_notify_accept().
      */
     smp_wmb();
     qatomic_set(&ctx->notified, true);
 
     /*
-     * Write ctx->notified before reading ctx->notify_me.  Pairs
-     * with smp_mb in aio_ctx_prepare or aio_poll.
+     * Write ctx->notified (and also ctx->bh_list) before reading ctx->notify_me.
+     * Pairs with smp_mb() in aio_ctx_prepare or aio_poll.
      */
     smp_mb();
     if (qatomic_read(&ctx->notify_me)) {
-- 
Gitee


From 8dc11f1208316a1eaab11091bf82d952f56c11a3 Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Thu, 9 Mar 2023 08:15:42 -0500
Subject: [PATCH 247/407] async: clarify usage of barriers in the polling case

RH-Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-MergeRequest: 263: qatomic: add smp_mb__before/after_rmw()
RH-Bugzilla: 2168472
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Eric Auger <eric.auger@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Commit: [10/10] 3be07ccc6137a0336becfe63a818d9cbadb38e9c

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168472

commit 6229438cca037d42f44a96d38feb15cb102a444f
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Mon Mar 6 10:43:52 2023 +0100

    async: clarify usage of barriers in the polling case

    Explain that aio_context_notifier_poll() relies on
    aio_notify_accept() to catch all the memory writes that were
    done before ctx->notified was set to true.

    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 util/async.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/util/async.c b/util/async.c
index 795fe699b6c..2a63bf90f2c 100644
--- a/util/async.c
+++ b/util/async.c
@@ -463,8 +463,9 @@ void aio_notify_accept(AioContext *ctx)
     qatomic_set(&ctx->notified, false);
 
     /*
-     * Write ctx->notified before reading e.g. bh->flags.  Pairs with smp_wmb
-     * in aio_notify.
+     * Order reads of ctx->notified (in aio_context_notifier_poll()) and the
+     * above clearing of ctx->notified before reads of e.g. bh->flags.  Pairs
+     * with smp_wmb() in aio_notify.
      */
     smp_mb();
 }
@@ -487,6 +488,11 @@ static bool aio_context_notifier_poll(void *opaque)
     EventNotifier *e = opaque;
     AioContext *ctx = container_of(e, AioContext, notifier);
 
+    /*
+     * No need for load-acquire because we just want to kick the
+     * event loop.  aio_notify_accept() takes care of synchronizing
+     * the event loop with the producers.
+     */
     return qatomic_read(&ctx->notified);
 }
 
-- 
Gitee


From d6dd4f99dc198b7a8c304504b9e83f21c94d587e Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Tue, 21 Feb 2023 16:22:16 -0500
Subject: [PATCH 248/407] scsi: protect req->aiocb with AioContext lock

RH-Author: Stefan Hajnoczi <stefanha@redhat.com>
RH-MergeRequest: 264: scsi: protect req->aiocb with AioContext lock
RH-Bugzilla: 2090990
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Commit: [1/3] e6a6d4109713e0fd6d6c515535c66196fea98688

If requests are being processed in the IOThread when a SCSIDevice is
unplugged, scsi_device_purge_requests() -> scsi_req_cancel_async() races
with I/O completion callbacks. Both threads load and store req->aiocb.
This can lead to assert(r->req.aiocb == NULL) failures and undefined
behavior.

Protect r->req.aiocb with the AioContext lock to prevent the race.
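
As a rough illustration of the locking rule (not the QEMU code), the
sketch below uses a pthread mutex in place of the AioContext lock. The
struct and function names are hypothetical; the point is only that both
the completion path and the cancel path touch the aiocb field with the
lock held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the AioContext lock. */
static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical request; not the QEMU SCSIRequest. */
struct request {
    void *aiocb;        /* non-NULL while I/O is in flight */
    bool cancel_asked;
};

/* Completion path: take the lock before checking and clearing aiocb. */
static void request_complete(struct request *r)
{
    pthread_mutex_lock(&ctx_lock);
    assert(r->aiocb != NULL);
    r->aiocb = NULL;
    pthread_mutex_unlock(&ctx_lock);
}

/* Cancel path: returns true if there was in-flight I/O to cancel. */
static bool request_cancel_async(struct request *r)
{
    bool in_flight;

    pthread_mutex_lock(&ctx_lock);
    in_flight = (r->aiocb != NULL);
    if (in_flight) {
        r->cancel_asked = true;
    }
    pthread_mutex_unlock(&ctx_lock);
    return in_flight;
}
```

Because the assert and the store sit inside the critical section, the
cancel path sees aiocb either before or after completion, never in
between.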

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230221212218.1378734-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 7b7fc3d0102dafe8eb44802493036a526e921a71)
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/scsi/scsi-disk.c    | 23 ++++++++++++++++-------
 hw/scsi/scsi-generic.c | 11 ++++++-----
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index d4914178ea0..179ce22c4af 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -270,9 +270,11 @@ static void scsi_aio_complete(void *opaque, int ret)
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
+    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
+
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
-    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
+
     if (scsi_disk_req_check_error(r, ret, true)) {
         goto done;
     }
@@ -354,10 +356,11 @@ static void scsi_dma_complete(void *opaque, int ret)
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
+    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
+
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
 
-    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
     if (ret < 0) {
         block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
     } else {
@@ -390,10 +393,11 @@ static void scsi_read_complete(void *opaque, int ret)
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
+    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
+
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
 
-    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
     if (ret < 0) {
         block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
     } else {
@@ -443,10 +447,11 @@ static void scsi_do_read_cb(void *opaque, int ret)
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
+    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
+
     assert (r->req.aiocb != NULL);
     r->req.aiocb = NULL;
 
-    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
     if (ret < 0) {
         block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
     } else {
@@ -527,10 +532,11 @@ static void scsi_write_complete(void * opaque, int ret)
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
+    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
+
     assert (r->req.aiocb != NULL);
     r->req.aiocb = NULL;
 
-    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
     if (ret < 0) {
         block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
     } else {
@@ -1659,10 +1665,11 @@ static void scsi_unmap_complete(void *opaque, int ret)
     SCSIDiskReq *r = data->r;
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
+    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
+
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
 
-    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
     if (scsi_disk_req_check_error(r, ret, true)) {
         scsi_req_unref(&r->req);
         g_free(data);
@@ -1738,9 +1745,11 @@ static void scsi_write_same_complete(void *opaque, int ret)
     SCSIDiskReq *r = data->r;
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
+    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
+
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
-    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
+
     if (scsi_disk_req_check_error(r, ret, true)) {
         goto done;
     }
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 37428998396..a1a40df64b8 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -111,10 +111,11 @@ static void scsi_command_complete(void *opaque, int ret)
     SCSIGenericReq *r = (SCSIGenericReq *)opaque;
     SCSIDevice *s = r->req.dev;
 
+    aio_context_acquire(blk_get_aio_context(s->conf.blk));
+
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
 
-    aio_context_acquire(blk_get_aio_context(s->conf.blk));
     scsi_command_complete_noio(r, ret);
     aio_context_release(blk_get_aio_context(s->conf.blk));
 }
@@ -269,11 +270,11 @@ static void scsi_read_complete(void * opaque, int ret)
     SCSIDevice *s = r->req.dev;
     int len;
 
+    aio_context_acquire(blk_get_aio_context(s->conf.blk));
+
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
 
-    aio_context_acquire(blk_get_aio_context(s->conf.blk));
-
     if (ret || r->req.io_canceled) {
         scsi_command_complete_noio(r, ret);
         goto done;
@@ -387,11 +388,11 @@ static void scsi_write_complete(void * opaque, int ret)
 
     trace_scsi_generic_write_complete(ret);
 
+    aio_context_acquire(blk_get_aio_context(s->conf.blk));
+
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
 
-    aio_context_acquire(blk_get_aio_context(s->conf.blk));
-
     if (ret || r->req.io_canceled) {
         scsi_command_complete_noio(r, ret);
         goto done;
-- 
Gitee


From d96c38ec1a800a1389263f7a155ee7850d39247b Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Tue, 21 Feb 2023 16:22:17 -0500
Subject: [PATCH 249/407] dma-helpers: prevent dma_blk_cb() vs dma_aio_cancel()
 race

RH-Author: Stefan Hajnoczi <stefanha@redhat.com>
RH-MergeRequest: 264: scsi: protect req->aiocb with AioContext lock
RH-Bugzilla: 2090990
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Commit: [2/3] 14f5835093ba8c5111f3ada2fe87730371aca733

dma_blk_cb() only takes the AioContext lock around ->io_func(). That
means the rest of dma_blk_cb() is not protected. In particular, the
DMAAIOCB field accesses happen outside the lock.

There is a race when the main loop thread holds the AioContext lock and
invokes scsi_device_purge_requests() -> bdrv_aio_cancel() ->
dma_aio_cancel() while an IOThread executes dma_blk_cb(). The dbs->acb
field determines how cancellation proceeds. If dma_aio_cancel() sees
dbs->acb == NULL while dma_blk_cb() is still running, the request can be
completed twice (-ECANCELED and the actual return value).

The following assertion can occur with virtio-scsi when an IOThread is
used:

  ../hw/scsi/scsi-disk.c:368: scsi_dma_complete: Assertion `r->req.aiocb != NULL' failed.

Fix the race by holding the AioContext lock across dma_blk_cb(). Now
dma_aio_cancel() under the AioContext lock will not see
inconsistent/intermediate states.
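
The shape of the change can be sketched as follows, assuming a pthread
mutex in place of the AioContext lock and a hypothetical dma_job struct.
The lock is taken once at the top and every early return becomes a jump
to a single unlock site.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the AioContext lock. */
static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical job state; in_flight plays the role of dbs->acb != NULL. */
struct dma_job {
    int segments_left;
    int in_flight;
    int completed;
};

/* One callback step, with the lock held across all field accesses. */
static void dma_step(struct dma_job *job, int ret)
{
    pthread_mutex_lock(&ctx_lock);
    job->in_flight = 0;

    if (job->segments_left == 0 || ret < 0) {
        job->completed = 1;
        goto out;               /* was a plain return before the fix */
    }

    job->segments_left--;
    job->in_flight = 1;         /* next I/O submitted */
out:
    pthread_mutex_unlock(&ctx_lock);
}
```

A canceller that takes the same lock can then only observe in_flight
before a step starts or after it finishes.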

Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230221212218.1378734-3-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit abfcd2760b3e70727bbc0792221b8b98a733dc32)
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/scsi/scsi-disk.c   |  4 +---
 softmmu/dma-helpers.c | 12 +++++++-----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 179ce22c4af..c8109a673e9 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -351,13 +351,12 @@ done:
     scsi_req_unref(&r->req);
 }
 
+/* Called with AioContext lock held */
 static void scsi_dma_complete(void *opaque, int ret)
 {
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
-    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
 
@@ -367,7 +366,6 @@ static void scsi_dma_complete(void *opaque, int ret)
         block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
     }
     scsi_dma_complete_noio(r, ret);
-    aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 }
 
 static void scsi_read_complete_noio(SCSIDiskReq *r, int ret)
diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
index 7d766a5e89a..42af18719a2 100644
--- a/softmmu/dma-helpers.c
+++ b/softmmu/dma-helpers.c
@@ -127,17 +127,19 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
 static void dma_blk_cb(void *opaque, int ret)
 {
     DMAAIOCB *dbs = (DMAAIOCB *)opaque;
+    AioContext *ctx = dbs->ctx;
     dma_addr_t cur_addr, cur_len;
     void *mem;
 
     trace_dma_blk_cb(dbs, ret);
 
+    aio_context_acquire(ctx);
     dbs->acb = NULL;
     dbs->offset += dbs->iov.size;
 
     if (dbs->sg_cur_index == dbs->sg->nsg || ret < 0) {
         dma_complete(dbs, ret);
-        return;
+        goto out;
     }
     dma_blk_unmap(dbs);
 
@@ -177,9 +179,9 @@ static void dma_blk_cb(void *opaque, int ret)
 
     if (dbs->iov.size == 0) {
         trace_dma_map_wait(dbs);
-        dbs->bh = aio_bh_new(dbs->ctx, reschedule_dma, dbs);
+        dbs->bh = aio_bh_new(ctx, reschedule_dma, dbs);
         cpu_register_map_client(dbs->bh);
-        return;
+        goto out;
     }
 
     if (!QEMU_IS_ALIGNED(dbs->iov.size, dbs->align)) {
@@ -187,11 +189,11 @@ static void dma_blk_cb(void *opaque, int ret)
                                 QEMU_ALIGN_DOWN(dbs->iov.size, dbs->align));
     }
 
-    aio_context_acquire(dbs->ctx);
     dbs->acb = dbs->io_func(dbs->offset, &dbs->iov,
                             dma_blk_cb, dbs, dbs->io_func_opaque);
-    aio_context_release(dbs->ctx);
     assert(dbs->acb);
+out:
+    aio_context_release(ctx);
 }
 
 static void dma_aio_cancel(BlockAIOCB *acb)
-- 
Gitee


From 0c4b9bf63400abc155181b6a5e2797b5375caa60 Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Tue, 21 Feb 2023 16:22:18 -0500
Subject: [PATCH 250/407] virtio-scsi: reset SCSI devices from main loop thread

RH-Author: Stefan Hajnoczi <stefanha@redhat.com>
RH-MergeRequest: 264: scsi: protect req->aiocb with AioContext lock
RH-Bugzilla: 2090990
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Commit: [3/3] 30d7c2bd868efa6694992e75ace22fb48aef161b

When an IOThread is configured, the ctrl virtqueue is processed in the
IOThread. TMFs that reset SCSI devices are currently called directly
from the IOThread and trigger an assertion failure in blk_drain() from
the following call stack:

virtio_scsi_handle_ctrl_req -> virtio_scsi_do_tmf -> device_cold_reset
-> scsi_disk_reset -> scsi_device_purge_requests -> blk_drain

  ../block/block-backend.c:1780: void blk_drain(BlockBackend *): Assertion `qemu_in_main_thread()' failed.

The blk_drain() function is not designed to be called from an IOThread
because it needs the Big QEMU Lock (BQL).

This patch defers TMFs that reset SCSI devices to a Bottom Half (BH)
that runs in the main loop thread under the BQL. This way it's safe to
call blk_drain() and the assertion failure is avoided.

Introduce s->tmf_bh_list for tracking TMF requests that have been
deferred to the BH. When the BH runs it will grab the entire list and
process all requests. Care must be taken to clear the list when the
virtio-scsi device is reset or unrealized. Otherwise deferred TMF
requests could execute later and lead to use-after-free or other
undefined behavior.

The s->resetting counter that's used by TMFs that reset SCSI devices is
accessed from multiple threads. This patch makes that explicit by using
atomic accessor functions. With this patch applied the counter is only
modified by the main loop thread under the BQL but can be read by any
thread.
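
A minimal sketch of the deferral scheme, under stated assumptions: the
types and functions below (struct dev, defer_req, drain_bh) are
hypothetical, a pthread mutex stands in for the AioContext lock, and a
boolean stands in for the scheduled BH. It shows the three pieces
described above: queueing with a single schedule, grabbing the whole
list under the lock, and an atomically updated resetting counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical deferred request. */
struct tmf_req {
    struct tmf_req *next;
    bool done;
};

/* Hypothetical device state. */
struct dev {
    pthread_mutex_t lock;
    struct tmf_req *head, **tail;
    bool bh_scheduled;
    int schedule_count;     /* how many times the "BH" was scheduled */
    atomic_int resetting;   /* readable from any thread */
};

static void dev_init(struct dev *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->head = NULL;
    d->tail = &d->head;
    d->bh_scheduled = false;
    d->schedule_count = 0;
    atomic_init(&d->resetting, 0);
}

/* Queue a request and schedule the handler only if none is pending. */
static void defer_req(struct dev *d, struct tmf_req *r)
{
    pthread_mutex_lock(&d->lock);
    r->next = NULL;
    *d->tail = r;
    d->tail = &r->next;
    if (!d->bh_scheduled) {
        d->bh_scheduled = true;
        d->schedule_count++;
    }
    pthread_mutex_unlock(&d->lock);
}

/* Grab the entire list under the lock, then process it unlocked. */
static int drain_bh(struct dev *d)
{
    struct tmf_req *list;
    int n = 0;

    pthread_mutex_lock(&d->lock);
    list = d->head;
    d->head = NULL;
    d->tail = &d->head;
    d->bh_scheduled = false;
    pthread_mutex_unlock(&d->lock);

    while (list) {
        struct tmf_req *r = list;

        list = r->next;
        atomic_fetch_add(&d->resetting, 1);
        r->done = true;     /* the device reset would happen here */
        atomic_fetch_sub(&d->resetting, 1);
        n++;
    }
    return n;
}
```

A device reset or unrealize path would additionally have to empty the
list and cancel the pending handler, which this sketch leaves out.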

Reported-by: Qing Wang <qinwang@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230221212218.1378734-4-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit be2c42b97c3a3a395b2f05bad1b6c7de20ecf2a5)
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Conflicts:
- hw/scsi/virtio-scsi.c
  - VirtIOSCSIReq is defined in include/hw/virtio/virtio-scsi.h
    downstream instead of hw/scsi/virtio-scsi.c because commit
    3dc584abeef0 ("virtio-scsi: move request-related items from .h to
    .c") is missing. Update the struct fields in virtio-scsi.h
    downstream.

  - Use qbus_reset_all() downstream instead of bus_cold_reset() because
    commit 4a5fc890b1d3 ("scsi: Use device_cold_reset() and
    bus_cold_reset()") is missing.

  - Drop GLOBAL_STATE_CODE() because these macros don't exist
    downstream. They are assertions/documentation and can be removed
    without affecting the code.
---
 hw/scsi/virtio-scsi.c           | 155 +++++++++++++++++++++++++-------
 include/hw/virtio/virtio-scsi.h |  21 +++--
 2 files changed, 139 insertions(+), 37 deletions(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index a35257c35a8..ef19a9bcd09 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -256,6 +256,118 @@ static inline void virtio_scsi_ctx_check(VirtIOSCSI *s, SCSIDevice *d)
     }
 }
 
+static void virtio_scsi_do_one_tmf_bh(VirtIOSCSIReq *req)
+{
+    VirtIOSCSI *s = req->dev;
+    SCSIDevice *d = virtio_scsi_device_get(s, req->req.tmf.lun);
+    BusChild *kid;
+    int target;
+
+    switch (req->req.tmf.subtype) {
+    case VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET:
+        if (!d) {
+            req->resp.tmf.response = VIRTIO_SCSI_S_BAD_TARGET;
+            goto out;
+        }
+        if (d->lun != virtio_scsi_get_lun(req->req.tmf.lun)) {
+            req->resp.tmf.response = VIRTIO_SCSI_S_INCORRECT_LUN;
+            goto out;
+        }
+        qatomic_inc(&s->resetting);
+        qdev_reset_all(&d->qdev);
+        qatomic_dec(&s->resetting);
+        break;
+
+    case VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET:
+        target = req->req.tmf.lun[1];
+        qatomic_inc(&s->resetting);
+
+        rcu_read_lock();
+        QTAILQ_FOREACH_RCU(kid, &s->bus.qbus.children, sibling) {
+            SCSIDevice *d1 = SCSI_DEVICE(kid->child);
+            if (d1->channel == 0 && d1->id == target) {
+                qdev_reset_all(&d1->qdev);
+            }
+        }
+        rcu_read_unlock();
+
+        qatomic_dec(&s->resetting);
+        break;
+
+    default:
+        g_assert_not_reached();
+        break;
+    }
+
+out:
+    object_unref(OBJECT(d));
+
+    virtio_scsi_acquire(s);
+    virtio_scsi_complete_req(req);
+    virtio_scsi_release(s);
+}
+
+/* Some TMFs must be processed from the main loop thread */
+static void virtio_scsi_do_tmf_bh(void *opaque)
+{
+    VirtIOSCSI *s = opaque;
+    QTAILQ_HEAD(, VirtIOSCSIReq) reqs = QTAILQ_HEAD_INITIALIZER(reqs);
+    VirtIOSCSIReq *req;
+    VirtIOSCSIReq *tmp;
+
+    virtio_scsi_acquire(s);
+
+    QTAILQ_FOREACH_SAFE(req, &s->tmf_bh_list, next, tmp) {
+        QTAILQ_REMOVE(&s->tmf_bh_list, req, next);
+        QTAILQ_INSERT_TAIL(&reqs, req, next);
+    }
+
+    qemu_bh_delete(s->tmf_bh);
+    s->tmf_bh = NULL;
+
+    virtio_scsi_release(s);
+
+    QTAILQ_FOREACH_SAFE(req, &reqs, next, tmp) {
+        QTAILQ_REMOVE(&reqs, req, next);
+        virtio_scsi_do_one_tmf_bh(req);
+    }
+}
+
+static void virtio_scsi_reset_tmf_bh(VirtIOSCSI *s)
+{
+    VirtIOSCSIReq *req;
+    VirtIOSCSIReq *tmp;
+
+    virtio_scsi_acquire(s);
+
+    if (s->tmf_bh) {
+        qemu_bh_delete(s->tmf_bh);
+        s->tmf_bh = NULL;
+    }
+
+    QTAILQ_FOREACH_SAFE(req, &s->tmf_bh_list, next, tmp) {
+        QTAILQ_REMOVE(&s->tmf_bh_list, req, next);
+
+        /* SAM-6 6.3.2 Hard reset */
+        req->resp.tmf.response = VIRTIO_SCSI_S_TARGET_FAILURE;
+        virtio_scsi_complete_req(req);
+    }
+
+    virtio_scsi_release(s);
+}
+
+static void virtio_scsi_defer_tmf_to_bh(VirtIOSCSIReq *req)
+{
+    VirtIOSCSI *s = req->dev;
+
+    QTAILQ_INSERT_TAIL(&s->tmf_bh_list, req, next);
+
+    if (!s->tmf_bh) {
+        s->tmf_bh = qemu_bh_new(virtio_scsi_do_tmf_bh, s);
+        qemu_bh_schedule(s->tmf_bh);
+    }
+}
+
 /* Return 0 if the request is ready to be completed and return to guest;
  * -EINPROGRESS if the request is submitted and will be completed later, in the
  *  case of async cancellation. */
@@ -263,8 +375,6 @@ static int virtio_scsi_do_tmf(VirtIOSCSI *s, VirtIOSCSIReq *req)
 {
     SCSIDevice *d = virtio_scsi_device_get(s, req->req.tmf.lun);
     SCSIRequest *r, *next;
-    BusChild *kid;
-    int target;
     int ret = 0;
 
     virtio_scsi_ctx_check(s, d);
@@ -321,15 +431,9 @@ static int virtio_scsi_do_tmf(VirtIOSCSI *s, VirtIOSCSIReq *req)
         break;
 
     case VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET:
-        if (!d) {
-            goto fail;
-        }
-        if (d->lun != virtio_scsi_get_lun(req->req.tmf.lun)) {
-            goto incorrect_lun;
-        }
-        s->resetting++;
-        qdev_reset_all(&d->qdev);
-        s->resetting--;
+    case VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET:
+        virtio_scsi_defer_tmf_to_bh(req);
+        ret = -EINPROGRESS;
         break;
 
     case VIRTIO_SCSI_T_TMF_ABORT_TASK_SET:
@@ -372,22 +476,6 @@ static int virtio_scsi_do_tmf(VirtIOSCSI *s, VirtIOSCSIReq *req)
         }
         break;
 
-    case VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET:
-        target = req->req.tmf.lun[1];
-        s->resetting++;
-
-        rcu_read_lock();
-        QTAILQ_FOREACH_RCU(kid, &s->bus.qbus.children, sibling) {
-            SCSIDevice *d1 = SCSI_DEVICE(kid->child);
-            if (d1->channel == 0 && d1->id == target) {
-                qdev_reset_all(&d1->qdev);
-            }
-        }
-        rcu_read_unlock();
-
-        s->resetting--;
-        break;
-
     case VIRTIO_SCSI_T_TMF_CLEAR_ACA:
     default:
         req->resp.tmf.response = VIRTIO_SCSI_S_FUNCTION_REJECTED;
@@ -603,7 +691,7 @@ static void virtio_scsi_request_cancelled(SCSIRequest *r)
     if (!req) {
         return;
     }
-    if (req->dev->resetting) {
+    if (qatomic_read(&req->dev->resetting)) {
         req->resp.cmd.response = VIRTIO_SCSI_S_RESET;
     } else {
         req->resp.cmd.response = VIRTIO_SCSI_S_ABORTED;
@@ -784,9 +872,12 @@ static void virtio_scsi_reset(VirtIODevice *vdev)
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
 
     assert(!s->dataplane_started);
-    s->resetting++;
+
+    virtio_scsi_reset_tmf_bh(s);
+
+    qatomic_inc(&s->resetting);
     qbus_reset_all(BUS(&s->bus));
-    s->resetting--;
+    qatomic_dec(&s->resetting);
 
     vs->sense_size = VIRTIO_SCSI_SENSE_DEFAULT_SIZE;
     vs->cdb_size = VIRTIO_SCSI_CDB_DEFAULT_SIZE;
@@ -1018,6 +1109,8 @@ static void virtio_scsi_device_realize(DeviceState *dev, Error **errp)
     VirtIOSCSI *s = VIRTIO_SCSI(dev);
     Error *err = NULL;
 
+    QTAILQ_INIT(&s->tmf_bh_list);
+
     virtio_scsi_common_realize(dev,
                                virtio_scsi_handle_ctrl,
                                virtio_scsi_handle_event,
@@ -1055,6 +1148,8 @@ static void virtio_scsi_device_unrealize(DeviceState *dev)
 {
     VirtIOSCSI *s = VIRTIO_SCSI(dev);
 
+    virtio_scsi_reset_tmf_bh(s);
+
     qbus_set_hotplug_handler(BUS(&s->bus), NULL);
     virtio_scsi_common_unrealize(dev);
 }
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 543681bc183..b0e36f25aa7 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -77,13 +77,22 @@ struct VirtIOSCSICommon {
     VirtQueue **cmd_vqs;
 };
 
+struct VirtIOSCSIReq;
+
 struct VirtIOSCSI {
     VirtIOSCSICommon parent_obj;
 
     SCSIBus bus;
-    int resetting;
+    int resetting; /* written from main loop thread, read from any thread */
     bool events_dropped;
 
+    /*
+     * TMFs deferred to main loop BH. These fields are protected by
+     * virtio_scsi_acquire().
+     */
+    QEMUBH *tmf_bh;
+    QTAILQ_HEAD(, VirtIOSCSIReq) tmf_bh_list;
+
     /* Fields for dataplane below */
     AioContext *ctx; /* one iothread per virtio-scsi-pci for now */
 
@@ -106,13 +115,11 @@ typedef struct VirtIOSCSIReq {
     QEMUSGList qsgl;
     QEMUIOVector resp_iov;
 
-    union {
-        /* Used for two-stage request submission */
-        QTAILQ_ENTRY(VirtIOSCSIReq) next;
+    /* Used for two-stage request submission and TMFs deferred to BH */
+    QTAILQ_ENTRY(VirtIOSCSIReq) next;
 
-        /* Used for cancellation of request during TMFs */
-        int remaining;
-    };
+    /* Used for cancellation of request during TMFs */
+    int remaining;
 
     SCSIRequest *sreq;
     size_t resp_size;
-- 
Gitee


From d570043ddb5731dfe20ee185f2a6153d90ca2aa5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Mon, 16 Jan 2023 18:46:05 +0100
Subject: [PATCH 251/407] s390x/pv: Implement a CGS check helper
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 271: Secure guest can't boot with maximal number of vcpus (248)
RH-Bugzilla: 2187159
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [1/1] c870d525c48ab6d0df964b5abe48efe2528c9883

When a protected VM is started with the maximum number of CPUs (248),
the service call providing information on the CPUs requires more
buffer space than allocated and QEMU disgracefully aborts:

    LOADPARM=[........]
    Using virtio-blk.
    Using SCSI scheme.
    ...................................................................................
    qemu-system-s390x: KVM_S390_MEM_OP failed: Argument list too long

When protected virtualization is initialized, compute the maximum
number of vCPUs supported by the machine and return useful information
to the user before the machine starts in case of error.
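
As a rough illustration of the limit, the sketch below redoes the
arithmetic with assumed constants: a 4096-byte page, a 16-byte CPU
entry, and CPU entries starting at offset 128 without the
Extended-Length SCCB (ELS) feature or at offset 144 with it.  These
values and the pv_max_cpus() name are assumptions for illustration
only; the authoritative definitions live in hw/s390x/sclp.h.

```c
#include <stdint.h>

/* Assumed values for illustration; the real ones come from sclp.h. */
#define PAGE_SIZE_BYTES   4096u  /* assumed TARGET_PAGE_SIZE on s390x */
#define CPU_OFFSET_FIXED   128u  /* assumed SCLP_READ_SCP_INFO_FIXED_CPU_OFFSET */
#define CPU_OFFSET_ELS     144u  /* assumed offsetof(ReadInfo, entries) */
#define CPU_ENTRY_SIZE      16u  /* assumed sizeof(CPUEntry) */

/* Number of CPU entries that fit in a one-page SCCB after the header. */
static uint32_t pv_max_cpus(int has_els)
{
    uint32_t offset_cpu = has_els ? CPU_OFFSET_ELS : CPU_OFFSET_FIXED;

    return (PAGE_SIZE_BYTES - offset_cpu) / CPU_ENTRY_SIZE;
}
```

Under these assumptions, a 248-vCPU guest only fits when ELS is not in
use, which would be consistent with the failure shown above.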

Suggested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Message-Id: <20230116174607.2459498-2-clg@kaod.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 75d7150c636569f6687f7e70a33be893be43eb5f)
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/pv.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/hw/s390x/pv.c b/hw/s390x/pv.c
index 728ba245476..749e5db1ce2 100644
--- a/hw/s390x/pv.c
+++ b/hw/s390x/pv.c
@@ -20,6 +20,7 @@
 #include "exec/confidential-guest-support.h"
 #include "hw/s390x/ipl.h"
 #include "hw/s390x/pv.h"
+#include "hw/s390x/sclp.h"
 #include "target/s390x/kvm/kvm_s390x.h"
 
 static bool info_valid;
@@ -249,6 +250,41 @@ struct S390PVGuestClass {
     ConfidentialGuestSupportClass parent_class;
 };
 
+/*
+ * If protected virtualization is enabled, the amount of data that the
+ * Read SCP Info Service Call can use is limited to one page. The
+ * available space also depends on the Extended-Length SCCB (ELS)
+ * feature which can take more buffer space to store feature
+ * information. This impacts the maximum number of CPUs supported in
+ * the machine.
+ */
+static uint32_t s390_pv_get_max_cpus(void)
+{
+    int offset_cpu = s390_has_feat(S390_FEAT_EXTENDED_LENGTH_SCCB) ?
+        offsetof(ReadInfo, entries) : SCLP_READ_SCP_INFO_FIXED_CPU_OFFSET;
+
+    return (TARGET_PAGE_SIZE - offset_cpu) / sizeof(CPUEntry);
+}
+
+static bool s390_pv_check_cpus(Error **errp)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    uint32_t pv_max_cpus = s390_pv_get_max_cpus();
+
+    if (ms->smp.max_cpus > pv_max_cpus) {
+        error_setg(errp, "Protected VMs support a maximum of %d CPUs",
+                   pv_max_cpus);
+        return false;
+    }
+
+    return true;
+}
+
+static bool s390_pv_guest_check(ConfidentialGuestSupport *cgs, Error **errp)
+{
+    return s390_pv_check_cpus(errp);
+}
+
 int s390_pv_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
     if (!object_dynamic_cast(OBJECT(cgs), TYPE_S390_PV_GUEST)) {
@@ -261,6 +297,10 @@ int s390_pv_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
         return -1;
     }
 
+    if (!s390_pv_guest_check(cgs, errp)) {
+        return -1;
+    }
+
     cgs->ready = true;
 
     return 0;
-- 
Gitee


From 239bda7bd0e3f918af21ae9b3b8a6d3a82c96b9f Mon Sep 17 00:00:00 2001
From: Eric Blake <eblake@redhat.com>
Date: Fri, 14 Apr 2023 10:33:58 -0500
Subject: [PATCH 252/407] migration: Handle block device inactivation failures
 better

RH-Author: Eric Blake <eblake@redhat.com>
RH-MergeRequest: 273: migration: prevent source core dump if NFS dies mid-migration
RH-Bugzilla: 2177957
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: quintela1 <quintela@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Commit: [1/3] 5892c17ca0a21d824d176e7398d12f7cf991651d (ebblake/qemu-kvm)

Consider what happens when performing a migration between two host
machines connected to an NFS server serving multiple block devices to
the guest, when the NFS server becomes unavailable.  The migration
attempts to inactivate all block devices on the source (a necessary
step before the destination can take over); but if the NFS server is
non-responsive, the attempt to inactivate can itself fail.  When that
happens, the destination fails to get the migrated guest (good,
because the source wasn't able to flush everything properly):

  (qemu) qemu-kvm: load of migration failed: Input/output error

at which point, our only hope for the guest is for the source to take
back control.  With the current code base, the host outputs a message,
but then appears to resume:

  (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)

  (src qemu)info status
   VM status: running

but a second migration attempt now asserts:

  (src qemu) qemu-kvm: ../block.c:6738: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.

Whether the guest is recoverable on the source after the first failure
is debatable, but what we do not want is to have qemu itself fail due
to an assertion.  It looks like the problem is as follows:

In migration.c:migration_completion(), the source sets 'inactivate' to
true (since COLO is not enabled), then tries
savevm.c:qemu_savevm_state_complete_precopy() with a request to
inactivate block devices.  In turn, this calls
block.c:bdrv_inactivate_all(), which fails when flushing runs up
against the non-responsive NFS server.  With savevm failing, we are
now left in a state where some, but not all, of the block devices have
been inactivated; but migration_completion() then jumps to 'fail'
rather than 'fail_invalidate' and skips an attempt to reclaim those
disks by calling bdrv_activate_all().  Even if we do attempt to
reclaim disks, we aren't taking note of failure there, either.

Thus, we have reached a state where the migration engine has forgotten
all state about whether a block device is inactive, because we did not
set s->block_inactive in enough places; so migration allows the source
to reach vm_start() and resume execution, violating the block layer
invariant that the guest CPUs should not be restarted while a device
is inactive.  Note that the code in migration.c:migrate_fd_cancel()
will also try to reactivate all block devices if s->block_inactive was
set, but because we failed to set that flag after the first failure,
the source assumes it has reclaimed all devices, even though it still
has remaining inactivated devices and does not try again.  Normally,
qmp_cont() will also try to reactivate all disks (or correctly fail if
the disks are not reclaimable because NFS is not yet back up), but the
auto-resumption of the source after a migration failure does not go
through qmp_cont().  And because we have left the block layer in an
inconsistent state with devices still inactivated, the later migration
attempt is hitting the assertion failure.

Since it is important to not resume the source with inactive disks,
this patch marks s->block_inactive before attempting inactivation,
rather than after succeeding, in order to prevent any vm_start() until
it has successfully reactivated all devices.
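
The ordering can be pictured with a small standalone model.  This is
only a sketch, not the QEMU code: struct mig_state, complete_precopy(),
may_restart_guest() and the nfs_down()/nfs_up() stubs are hypothetical
names standing in for MigrationState, the savevm call and the block
layer.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of the migration state. */
struct mig_state {
    bool block_inactive;   /* true while some disks may be inactive */
};

/* Hypothetical block-layer stubs: storage unreachable / reachable. */
static int nfs_down(void) { return -1; }
static int nfs_up(void)   { return 0; }

/* Mark the flag before the attempt, so a partial failure is still tracked. */
static int complete_precopy(struct mig_state *s, bool inactivate,
                            int (*inactivate_all)(void))
{
    s->block_inactive = inactivate;
    return inactivate ? inactivate_all() : 0;
}

/* Failure path: clear the flag only once reactivation has succeeded. */
static bool may_restart_guest(struct mig_state *s, int (*activate_all)(void))
{
    if (s->block_inactive && activate_all() == 0) {
        s->block_inactive = false;
    }
    return !s->block_inactive;
}
```

In this model, a failed inactivation leaves the flag set, and the guest
may only restart after a later reactivation attempt succeeds.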

See also https://bugzilla.redhat.com/show_bug.cgi?id=2058982

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
(cherry picked from commit 403d18ae384239876764bbfa111d6cc5dcb673d1)
---
 migration/migration.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 0885549de04..08e5e8f0132 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3256,13 +3256,11 @@ static void migration_completion(MigrationState *s)
                                             MIGRATION_STATUS_DEVICE);
             }
             if (ret >= 0) {
+                s->block_inactive = inactivate;
                 qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
                 ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
                                                          inactivate);
             }
-            if (inactivate && ret >= 0) {
-                s->block_inactive = true;
-            }
         }
         qemu_mutex_unlock_iothread();
 
@@ -3321,6 +3319,7 @@ fail_invalidate:
         bdrv_invalidate_cache_all(&local_err);
         if (local_err) {
             error_report_err(local_err);
+            s->block_inactive = true;
         } else {
             s->block_inactive = false;
         }
-- 
Gitee


From d1f7d9816987add353ede96cb4587f663c6f7541 Mon Sep 17 00:00:00 2001
From: Eric Blake <eblake@redhat.com>
Date: Thu, 20 Apr 2023 09:35:51 -0500
Subject: [PATCH 253/407] migration: Minor control flow simplification

RH-Author: Eric Blake <eblake@redhat.com>
RH-MergeRequest: 273: migration: prevent source core dump if NFS dies mid-migration
RH-Bugzilla: 2177957
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: quintela1 <quintela@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Commit: [2/3] f00b21b6ebd377af79af93ac18f103f8dc0309d6 (ebblake/qemu-kvm)

No need to declare a temporary variable.

Suggested-by: Juan Quintela <quintela@redhat.com>
Fixes: 1df36e8c6289 ("migration: Handle block device inactivation failures better")
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
(cherry picked from commit 5d39f44d7ac5c63f53d4d0900ceba9521bc27e49)
---
 migration/migration.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 08e5e8f0132..6ba8eb0fdf9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3248,7 +3248,6 @@ static void migration_completion(MigrationState *s)
         ret = global_state_store();
 
         if (!ret) {
-            bool inactivate = !migrate_colo_enabled();
             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
             trace_migration_completion_vm_stop(ret);
             if (ret >= 0) {
@@ -3256,10 +3255,10 @@ static void migration_completion(MigrationState *s)
                                             MIGRATION_STATUS_DEVICE);
             }
             if (ret >= 0) {
-                s->block_inactive = inactivate;
+                s->block_inactive = !migrate_colo_enabled();
                 qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
                 ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
-                                                         inactivate);
+                                                         s->block_inactive);
             }
         }
         qemu_mutex_unlock_iothread();
-- 
Gitee


From 45dc4b3a5720ad865b9add8280fd14c634692375 Mon Sep 17 00:00:00 2001
From: Eric Blake <eblake@redhat.com>
Date: Tue, 2 May 2023 15:52:12 -0500
Subject: [PATCH 254/407] migration: Attempt disk reactivation in more failure
 scenarios

RH-Author: Eric Blake <eblake@redhat.com>
RH-MergeRequest: 273: migration: prevent source core dump if NFS dies mid-migration
RH-Bugzilla: 2177957
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: quintela1 <quintela@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Commit: [3/3] e84bf1e7233c0273ca3136ecaa6b2cfc9c0efacb (ebblake/qemu-kvm)

Commit fe904ea824 added a fail_invalidate label, which tries to
reactivate disks on the source after a failure while s->state ==
MIGRATION_STATUS_ACTIVE, but didn't actually use the label if
qemu_savevm_state_complete_precopy() failed.  This failure to
reactivate is also present in commit 6039dd5b1c (also covering the new
s->state == MIGRATION_STATUS_DEVICE state) and 403d18ae (ensuring
s->block_inactive is set more reliably).

Consolidate the two labels back into one - no matter HOW migration is
failed, if there is any chance we can reach vm_start() after having
attempted inactivation, it is essential that we have tried to restart
disks before then.  This also makes the cleanup more like
migrate_fd_cancel().

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20230502205212.134680-1-eblake@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 6dab4c93ecfae48e2e67b984d1032c1e988d3005)
[eblake: downstream migrate_colo() => migrate_colo_enabled()]
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 migration/migration.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6ba8eb0fdf9..817170d52dd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3255,6 +3255,11 @@ static void migration_completion(MigrationState *s)
                                             MIGRATION_STATUS_DEVICE);
             }
             if (ret >= 0) {
+                /*
+                 * Inactivate disks except in COLO, and track that we
+                 * have done so in order to remember to reactivate
+                 * them if migration fails or is cancelled.
+                 */
                 s->block_inactive = !migrate_colo_enabled();
                 qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
                 ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
@@ -3290,13 +3295,13 @@ static void migration_completion(MigrationState *s)
         rp_error = await_return_path_close_on_source(s);
         trace_migration_return_path_end_after(rp_error);
         if (rp_error) {
-            goto fail_invalidate;
+            goto fail;
         }
     }
 
     if (qemu_file_get_error(s->to_dst_file)) {
         trace_migration_completion_file_err();
-        goto fail_invalidate;
+        goto fail;
     }
 
     if (!migrate_colo_enabled()) {
@@ -3306,26 +3311,25 @@ static void migration_completion(MigrationState *s)
 
     return;
 
-fail_invalidate:
-    /* If not doing postcopy, vm_start() will be called: let's regain
-     * control on images.
-     */
-    if (s->state == MIGRATION_STATUS_ACTIVE ||
-        s->state == MIGRATION_STATUS_DEVICE) {
+fail:
+    if (s->block_inactive && (s->state == MIGRATION_STATUS_ACTIVE ||
+                              s->state == MIGRATION_STATUS_DEVICE)) {
+        /*
+         * If not doing postcopy, vm_start() will be called: let's
+         * regain control on images.
+         */
         Error *local_err = NULL;
 
         qemu_mutex_lock_iothread();
         bdrv_invalidate_cache_all(&local_err);
         if (local_err) {
             error_report_err(local_err);
-            s->block_inactive = true;
         } else {
             s->block_inactive = false;
         }
         qemu_mutex_unlock_iothread();
     }
 
-fail:
     migrate_set_state(&s->state, current_active_state,
                       MIGRATION_STATUS_FAILED);
 }
-- 
Gitee


From 0c8b055490a474c92550c55bb10f0ca4ad066935 Mon Sep 17 00:00:00 2001
From: Florian Westphal <fw@strlen.de>
Date: Fri, 24 Mar 2023 11:47:20 +0100
Subject: [PATCH 255/407] nbd/server: push pending frames after sending reply

RH-Author: Eric Blake <eblake@redhat.com>
RH-MergeRequest: 274: nbd: improve TLS performance of NBD server
RH-Bugzilla: 2035712
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
RH-Commit: [1/2] ab92c06c48810aa40380de0433dcac4c6e4be9a5 (ebblake/qemu-kvm)

qemu-nbd doesn't set TCP_NODELAY on the tcp socket.

The kernel waits for more data and avoids transmitting small packets.
Without TLS this is barely noticeable, but with TLS this really shows.

Booting a VM via qemu-nbd on localhost (with tls) takes more than
2 minutes on my system.  tcpdump shows frequent wait periods, where no
packets get sent for a 40ms period.

Add explicit (un)corking when processing (and responding to) requests.
"TCP_CORK, &zero" after earlier "CORK, &one" will flush pending data.

VM Boot time:
main:    no tls:  23s, with tls: 2m45s
patched: no tls:  14s, with tls: 15s

VM Boot time, qemu-nbd via network (same lan):
main:    no tls:  18s, with tls: 1m50s
patched: no tls:  17s, with tls: 18s

Future optimization: if we could detect if there is another pending
request we could defer the uncork operation because more data would be
appended.
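
For readers unfamiliar with corking, the cork/uncork pairing around a
reply looks roughly like the following at the plain socket level.  This
is a Linux-only sketch with hypothetical helper names (set_cork(),
send_reply()); the patch itself goes through qio_channel_set_cork()
rather than calling setsockopt() directly.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Linux-specific: enable (1) or disable (0) TCP_CORK on a TCP socket. */
static int set_cork(int fd, int enabled)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &enabled, sizeof(enabled));
}

/*
 * Send a header and a payload as one corked unit.  Short sends are
 * treated as errors to keep the sketch small.
 */
static int send_reply(int fd, const void *hdr, size_t hdr_len,
                      const void *data, size_t data_len)
{
    int ret = -1;

    if (set_cork(fd, 1) < 0) {
        return -1;
    }
    if (send(fd, hdr, hdr_len, 0) == (ssize_t)hdr_len &&
        send(fd, data, data_len, 0) == (ssize_t)data_len) {
        ret = 0;
    }
    /* Clearing the option flushes whatever is still queued. */
    if (set_cork(fd, 0) < 0) {
        ret = -1;
    }
    return ret;
}
```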

Signed-off-by: Florian Westphal <fw@strlen.de>
Message-Id: <20230324104720.2498-1-fw@strlen.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit bd2cd4a441ded163b62371790876f28a9b834317)
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/server.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/nbd/server.c b/nbd/server.c
index 4630dd73225..a5edc7f6813 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2647,6 +2647,8 @@ static coroutine_fn void nbd_trip(void *opaque)
         goto disconnect;
     }
 
+    qio_channel_set_cork(client->ioc, true);
+
     if (ret < 0) {
         /* It wans't -EIO, so, according to nbd_co_receive_request()
          * semantics, we should return the error to the client. */
@@ -2672,6 +2674,7 @@ static coroutine_fn void nbd_trip(void *opaque)
         goto disconnect;
     }
 
+    qio_channel_set_cork(client->ioc, false);
 done:
     nbd_request_put(req);
     nbd_client_put(client);
-- 
Gitee


From 28323d92e6bc70caca9dd8c63fe9e2dec6b850e1 Mon Sep 17 00:00:00 2001
From: Eric Blake <eblake@redhat.com>
Date: Mon, 3 Apr 2023 19:40:47 -0500
Subject: [PATCH 256/407] nbd/server: Request TCP_NODELAY
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Eric Blake <eblake@redhat.com>
RH-MergeRequest: 274: nbd: improve TLS performance of NBD server
RH-Bugzilla: 2035712
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
RH-Commit: [2/2] 092145077756cda2a4f849c5911031b0fc4a2134 (ebblake/qemu-kvm)

Nagle's algorithm adds latency in order to reduce network packet
overhead on small packets.  But when we are already using corking to
merge smaller packets into transactional requests, the extra delay
from TCP defaults just gets in the way (see recent commit bd2cd4a4).

For reference, qemu as an NBD client already requests TCP_NODELAY (see
nbd_connect() in nbd/client-connection.c); as does libnbd as a client
[1], and nbdkit as a server [2].  Furthermore, the NBD spec recommends
the use of TCP_NODELAY [3].

[1] https://gitlab.com/nbdkit/libnbd/-/blob/a48a1142/generator/states-connect.c#L39
[2] https://gitlab.com/nbdkit/nbdkit/-/blob/45b72f5b/server/sockets.c#L430
[3] https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#protocol-phases
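
At the socket level, requesting TCP_NODELAY amounts to something like
the sketch below.  The disable_nagle() helper is a hypothetical name
used for illustration; the patch itself calls qio_channel_set_delay()
on the channel.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn off Nagle's algorithm so small writes are not held back. */
static int disable_nagle(int fd)
{
    int one = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}
```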

CC: Florian Westphal <fw@strlen.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20230404004047.142086-1-eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit f1426881a827a6d3f31b65616c4a8db1e9e7c45e)
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/server.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/nbd/server.c b/nbd/server.c
index a5edc7f6813..6db124cf530 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2738,6 +2738,7 @@ void nbd_client_new(QIOChannelSocket *sioc,
     }
     client->tlsauthz = g_strdup(tlsauthz);
     client->sioc = sioc;
+    qio_channel_set_delay(QIO_CHANNEL(sioc), false);
     object_ref(OBJECT(client->sioc));
     client->ioc = QIO_CHANNEL(sioc);
     object_ref(OBJECT(client->ioc));
-- 
Gitee


From 42e94673ccf076870c3b149c9b35d387a8d87447 Mon Sep 17 00:00:00 2001
From: Greg Kurz <groug@kaod.org>
Date: Tue, 15 Feb 2022 19:15:29 +0100
Subject: [PATCH 257/407] virtiofsd: Add basic support for FUSE_SYNCFS request

RH-Author: German Maglione <None>
RH-MergeRequest: 278: virtiofsd: Add basic support for FUSE_SYNCFS request
RH-Bugzilla: 2196880
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
RH-Acked-by: Hanna Czenczek <hreitz@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [1/1] 7a0cbe70d97f13e74b2116218fccd9f79d335752

Honor the expected behavior of syncfs() to synchronously flush all data
and metadata to disk on Linux systems.

If virtiofsd is started with '-o announce_submounts', the client is
expected to send a FUSE_SYNCFS request for each individual submount.
In this case, we just create a new file descriptor on the submount
inode with lo_inode_open(), call syncfs() on it and close it. The
intermediary file is needed because O_PATH descriptors aren't
backed by an actual file and syncfs() would fail with EBADF.
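
Outside of virtiofsd, the same open/syncfs/close sequence can be
sketched as follows.  The sync_fs_of() helper and its path argument are
hypothetical; virtiofsd obtains the descriptor with lo_inode_open()
instead of opening a path.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* for syncfs() */
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Flush the filesystem containing `path`; returns 0 or a positive errno. */
static int sync_fs_of(const char *path)
{
    int ret = 0;
    int fd = open(path, O_RDONLY);   /* a regular descriptor, not O_PATH */

    if (fd < 0) {
        return errno;
    }
    if (syncfs(fd) < 0) {
        ret = errno;
    }
    close(fd);
    return ret;
}
```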

If virtiofsd is started without '-o announce_submounts' or if the
client doesn't have the FUSE_CAP_SUBMOUNTS capability, the client
only sends a single FUSE_SYNCFS request for the root inode. The
server would thus need to track submounts internally and call
syncfs() on each of them. This will be implemented later.

Note that syncfs() might suffer from a time penalty if the submounts
are being hammered by some unrelated workload on the host. The only
solution to prevent that is to avoid shared mounts.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20220215181529.164070-2-groug@kaod.org>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 45b04ef48dbbeb18d93c2631bf5584ac493de749)
Signed-off-by: German Maglione <gmaglione@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.c       | 11 +++++++
 tools/virtiofsd/fuse_lowlevel.h       | 13 ++++++++
 tools/virtiofsd/passthrough_ll.c      | 44 +++++++++++++++++++++++++++
 tools/virtiofsd/passthrough_seccomp.c |  1 +
 4 files changed, 69 insertions(+)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 5d431a7038c..57f928463a1 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1876,6 +1876,16 @@ static void do_lseek(fuse_req_t req, fuse_ino_t nodeid,
     }
 }
 
+static void do_syncfs(fuse_req_t req, fuse_ino_t nodeid,
+                      struct fuse_mbuf_iter *iter)
+{
+    if (req->se->op.syncfs) {
+        req->se->op.syncfs(req, nodeid);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
+}
+
 static void do_init(fuse_req_t req, fuse_ino_t nodeid,
                     struct fuse_mbuf_iter *iter)
 {
@@ -2282,6 +2292,7 @@ static struct {
     [FUSE_RENAME2] = { do_rename2, "RENAME2" },
     [FUSE_COPY_FILE_RANGE] = { do_copy_file_range, "COPY_FILE_RANGE" },
     [FUSE_LSEEK] = { do_lseek, "LSEEK" },
+    [FUSE_SYNCFS] = { do_syncfs, "SYNCFS" },
 };
 
 #define FUSE_MAXOP (sizeof(fuse_ll_ops) / sizeof(fuse_ll_ops[0]))
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index c55c0ca2fc1..b889dae4de0 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -1226,6 +1226,19 @@ struct fuse_lowlevel_ops {
      */
     void (*lseek)(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
                   struct fuse_file_info *fi);
+
+    /**
+     * Synchronize file system content
+     *
+     * If this request is answered with an error code of ENOSYS,
+     * this is treated as success and future calls to syncfs() will
+     * succeed automatically without being sent to the filesystem
+     * process.
+     *
+     * @param req request handle
+     * @param ino the inode number
+     */
+    void (*syncfs)(fuse_req_t req, fuse_ino_t ino);
 };
 
 /**
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 523d8fbe1ed..00ccb90a72a 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3357,6 +3357,49 @@ static void lo_lseek(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
     }
 }
 
+static int lo_do_syncfs(struct lo_data *lo, struct lo_inode *inode)
+{
+    int fd, ret = 0;
+
+    fuse_log(FUSE_LOG_DEBUG, "lo_do_syncfs(ino=%" PRIu64 ")\n",
+             inode->fuse_ino);
+
+    fd = lo_inode_open(lo, inode, O_RDONLY);
+    if (fd < 0) {
+        return -fd;
+    }
+
+    if (syncfs(fd) < 0) {
+        ret = errno;
+    }
+
+    close(fd);
+    return ret;
+}
+
+static void lo_syncfs(fuse_req_t req, fuse_ino_t ino)
+{
+    struct lo_data *lo = lo_data(req);
+    struct lo_inode *inode = lo_inode(req, ino);
+    int err;
+
+    if (!inode) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
+    err = lo_do_syncfs(lo, inode);
+    lo_inode_put(lo, &inode);
+
+    /*
+     * If submounts aren't announced, the client only sends a request to
+     * sync the root inode. TODO: Track submounts internally and iterate
+     * over them as well.
+     */
+
+    fuse_reply_err(req, err);
+}
+
 static void lo_destroy(void *userdata)
 {
     struct lo_data *lo = (struct lo_data *)userdata;
@@ -3417,6 +3460,7 @@ static struct fuse_lowlevel_ops lo_oper = {
     .copy_file_range = lo_copy_file_range,
 #endif
     .lseek = lo_lseek,
+    .syncfs = lo_syncfs,
     .destroy = lo_destroy,
 };
 
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index a3ce9f898d2..3e9d6181dc6 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -108,6 +108,7 @@ static const int syscall_allowlist[] = {
     SCMP_SYS(set_robust_list),
     SCMP_SYS(setxattr),
     SCMP_SYS(symlinkat),
+    SCMP_SYS(syncfs),
     SCMP_SYS(time), /* Rarely needed, except on static builds */
     SCMP_SYS(tgkill),
     SCMP_SYS(unlinkat),
-- 
Gitee


From fb5268e169776ea00ca6c00bcf792c743530e6b5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 258/407] s390: kvm: adjust diag318 resets to retain data
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [1/21] 16f2ff166efdd26a3be98d7c97d3b184598d1ca4

Bugzilla: https://bugzilla.redhat.com/2169308

commit c35aff184b2ed5be930da671ea25c857713555af
Author: Collin L. Walling <walling@linux.ibm.com>
Date:   Wed Nov 17 10:23:03 2021 -0500

    s390: kvm: adjust diag318 resets to retain data

    The CPNC portion of the diag318 data is erroneously reset during an
    initial CPU reset caused by SIGP. Let's go ahead and relocate the
    diag318_info field within the CPUS390XState struct such that it is
    only zeroed during a clear reset. This way, the CPNC will be retained
    for each VCPU in the configuration after the diag318 instruction
    has been invoked.

    The s390_machine_reset code already takes care of zeroing the diag318
    data on VM resets, which also cover resets caused by diag308.
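
    As an illustration of the layout rule (a minimal sketch, not the QEMU
    code: the struct, the marker fields and the helper below are
    hypothetical, and only the diag318_info name comes from this patch), a
    reset that zeroes the region starting at a marker field leaves anything
    placed before that marker untouched:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for CPUS390XState. */
struct demo_state {
    uint64_t diag318_info;      /* before the marker: survives the reset */
    char start_initial_reset;   /* marker: the reset zeroes from here */
    uint64_t pending_work;      /* after the marker: cleared by the reset */
    char end_reset;             /* marker: end of the zeroed region */
};

/* Zero only the region between the two markers. */
static void demo_initial_reset(struct demo_state *s)
{
    size_t start = offsetof(struct demo_state, start_initial_reset);
    size_t end = offsetof(struct demo_state, end_reset);

    memset((char *)s + start, 0, end - start);
}
```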

    Fixes: fabdada9357b ("s390: guest support for diagnose 0x318")
    Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Signed-off-by: Collin Walling <walling@linux.ibm.com>
    Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
    Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
    Message-Id: <20211117152303.627969-1-walling@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/cpu.h     | 4 ++--
 target/s390x/kvm/kvm.c | 4 ++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index ca3845d0235..a75e559134c 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -63,6 +63,8 @@ struct CPUS390XState {
     uint64_t etoken;       /* etoken */
     uint64_t etoken_extension; /* etoken extension */
 
+    uint64_t diag318_info;
+
     /* Fields up to this point are not cleared by initial CPU reset */
     struct {} start_initial_reset_fields;
 
@@ -118,8 +120,6 @@ struct CPUS390XState {
     uint16_t external_call_addr;
     DECLARE_BITMAP(emergency_signals, S390_MAX_CPUS);
 
-    uint64_t diag318_info;
-
 #if !defined(CONFIG_USER_ONLY)
     uint64_t tlb_fill_tec;   /* translation exception code during tlb_fill */
     int tlb_fill_exc;        /* exception number seen during tlb_fill */
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index d36b44f32a8..8d36c377b58 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -1598,6 +1598,10 @@ void kvm_s390_set_diag318(CPUState *cs, uint64_t diag318_info)
         env->diag318_info = diag318_info;
         cs->kvm_run->s.regs.diag318 = diag318_info;
         cs->kvm_run->kvm_dirty_regs |= KVM_SYNC_DIAG318;
+        /*
+         * diag 318 info is zeroed during a clear reset and
+         * diag 308 IPL subcodes.
+         */
     }
 }
 
-- 
Gitee


From a4d86d8fc171987cadf3cdc28262ac96157f0419 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 259/407] target/s390x: Fix SLDA sign bit index
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [2/21] 8600ece5b20bbe9dfa91e322cf29c5f79000d39c

Bugzilla: https://bugzilla.redhat.com/2169308

commit 521130f267240cb1ed8fd4635496493a153281db
Author: Ilya Leoshkevich <iii@linux.ibm.com>
Date:   Wed Jan 12 17:50:12 2022 +0100

    target/s390x: Fix SLDA sign bit index

    SLDA operates on 64-bit values, so its sign bit index should be 63,
    not 31.

    Fixes: a79ba3398a0a ("target-s390: Convert SHIFT DOUBLE")
    Reported-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Message-Id: <20220112165016.226996-2-iii@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/tcg/insn-data.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index 3e5594210c8..c92284df5d4 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -800,7 +800,7 @@
     C(0xebde, SRLK,    RSY_a, DO,  r3_32u, sh32, new, r1_32, srl, 0)
     C(0xeb0c, SRLG,    RSY_a, Z,   r3_o, sh64, r1, 0, srl, 0)
 /* SHIFT LEFT DOUBLE */
-    D(0x8f00, SLDA,    RS_a,  Z,   r1_D32, sh64, new, r1_D32, sla, 0, 31)
+    D(0x8f00, SLDA,    RS_a,  Z,   r1_D32, sh64, new, r1_D32, sla, 0, 63)
 /* SHIFT LEFT DOUBLE LOGICAL */
     C(0x8d00, SLDL,    RS_a,  Z,   r1_D32, sh64, new, r1_D32, sll, 0)
 /* SHIFT RIGHT DOUBLE */
-- 
Gitee


From 5676033edd2ab22049098862038a787430968d74 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 260/407] target/s390x: Fix SRDA CC calculation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [3/21] 95b2ba26003baa51f85f07e8860f875349c72b86

Bugzilla: https://bugzilla.redhat.com/2169308

commit 57556b28afde4b039bb12bfc274bd8df9022d946
Author: Ilya Leoshkevich <iii@linux.ibm.com>
Date:   Wed Jan 12 17:50:13 2022 +0100

    target/s390x: Fix SRDA CC calculation

    SRDA uses r1_D32 for binding the first operand and s64 for setting CC.
    cout_s64() relies on o->out being the shift result; however,
    wout_r1_D32() clobbers it.

    Fix by using a temporary.
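
    A plain C model of the problem (a sketch only: the demo_* helpers are
    hypothetical and stand in for the TCG write-back and CC steps, they are
    not the translator code). If the write-back shifts the result in place,
    a CC computed afterwards sees the high half instead of the full 64-bit
    value, so a negative result can be reported as positive:

```c
#include <stdint.h>

/* CC from a signed 64-bit result: 0 = zero, 1 = negative, 2 = positive. */
static int demo_cc_s64(uint64_t out)
{
    int64_t v = (int64_t)out;
    return v == 0 ? 0 : (v < 0 ? 1 : 2);
}

/* Buggy write-back: stores both halves but overwrites *out on the way. */
static void demo_wout_clobber(uint64_t *out, uint32_t *hi, uint32_t *lo)
{
    *lo = (uint32_t)*out;
    *out >>= 32;
    *hi = (uint32_t)*out;
}

/* Fixed write-back: the high half goes through a temporary. */
static void demo_wout_temp(const uint64_t *out, uint32_t *hi, uint32_t *lo)
{
    uint64_t t = *out >> 32;

    *lo = (uint32_t)*out;
    *hi = (uint32_t)t;
}
```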

    Fixes: a79ba3398a0a ("target-s390: Convert SHIFT DOUBLE")
    Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Message-Id: <20220112165016.226996-3-iii@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/tcg/translate.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index dcc249a197c..c5e59b68afe 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -5420,9 +5420,11 @@ static void wout_r1_P32(DisasContext *s, DisasOps *o)
 static void wout_r1_D32(DisasContext *s, DisasOps *o)
 {
     int r1 = get_field(s, r1);
+    TCGv_i64 t = tcg_temp_new_i64();
     store_reg32_i64(r1 + 1, o->out);
-    tcg_gen_shri_i64(o->out, o->out, 32);
-    store_reg32_i64(r1, o->out);
+    tcg_gen_shri_i64(t, o->out, 32);
+    store_reg32_i64(r1, t);
+    tcg_temp_free_i64(t);
 }
 #define SPEC_wout_r1_D32 SPEC_r1_even
 
-- 
Gitee


From 2b76153b81fd6fa09fc5e8e0f37ee54bc4c4e744 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 261/407] target/s390x: Fix cc_calc_sla_64() missing overflows
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [4/21] 2f91de2ac980d6ffa4da0ec41bb30562624a2396

Bugzilla: https://bugzilla.redhat.com/2169308

commit df103c09bc2f549d36ba6313a69c18fc003ef1ee
Author: Ilya Leoshkevich <iii@linux.ibm.com>
Date:   Wed Jan 12 17:50:14 2022 +0100

    target/s390x: Fix cc_calc_sla_64() missing overflows

    An overflow occurs for SLAG when at least one shifted bit is not equal
    to the sign bit. Therefore, we need to check that `shift + 1` bits are
    neither all 0s nor all 1s. The current code checks only `shift` bits,
    missing some overflows.
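
    The difference between the two masks can be checked in isolation. The
    sketch below copies the two mask expressions from the diff into
    hypothetical demo_* helpers (the helper names and the standalone
    overflow test are illustrative, not part of the patch). With bit 62 set
    and a shift of 1, the old mask looks only at the sign bit and misses the
    overflow, while the new mask also covers bit 62:

```c
#include <stdint.h>

/* Pre-fix mask: only the top `shift` bits (valid for 1 <= shift <= 63). */
static uint64_t demo_old_mask(int shift)
{
    return ((1ULL << shift) - 1ULL) << (64 - shift);
}

/* Post-fix mask: the top `shift + 1` bits (valid for 0 <= shift <= 63). */
static uint64_t demo_new_mask(int shift)
{
    return ~0ULL << (63 - shift);
}

/* Overflow if the masked bits are neither all 0s nor all 1s. */
static int demo_overflows(uint64_t src, uint64_t mask)
{
    uint64_t sign = 1ULL << 63;
    uint64_t match = (src & sign) ? mask : 0;

    return (src & mask) != match;
}
```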

    Fixes: cbe24bfa91d2 ("target-s390: Convert SHIFT, ROTATE SINGLE")
    Co-developed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Message-Id: <20220112165016.226996-4-iii@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/tcg/cc_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/s390x/tcg/cc_helper.c b/target/s390x/tcg/cc_helper.c
index c2c96c3a3c1..c9b7b0e8c68 100644
--- a/target/s390x/tcg/cc_helper.c
+++ b/target/s390x/tcg/cc_helper.c
@@ -297,7 +297,7 @@ static uint32_t cc_calc_sla_32(uint32_t src, int shift)
 
 static uint32_t cc_calc_sla_64(uint64_t src, int shift)
 {
-    uint64_t mask = ((1ULL << shift) - 1ULL) << (64 - shift);
+    uint64_t mask = -1ULL << (63 - shift);
     uint64_t sign = 1ULL << 63;
     uint64_t match;
     int64_t r;
-- 
Gitee


From 20b6529b9f72fc5061fb2fa9d23f77cfbb3d89a3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 262/407] target/s390x: Fix shifting 32-bit values for more
 than 31 bits
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [5/21] fba372359f0771ec41f3ad7ee4f1376e545da088

Bugzilla: https://bugzilla.redhat.com/2169308

commit 6da170beda33f3e7f1d9242814acd9f428f0f0fb
Author: Ilya Leoshkevich <iii@linux.ibm.com>
Date:   Wed Jan 12 17:50:15 2022 +0100

    target/s390x: Fix shifting 32-bit values for more than 31 bits

    According to PoP, both 32- and 64-bit shifts use the lowest 6 address
    bits. The current code special-cases 32-bit shifts to use only 5 bits,
    which is not correct. For example, shifting by 32 bits currently
    preserves the initial value; however, it is supposed to zero it out
    instead.

    Fix by merging sh32 and sh64 and adapting CC calculation to shift
    values greater than 31.
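
    The effect on a 32-bit logical left shift can be modelled in plain C (a
    sketch under the assumption that the shift is done in 64-bit arithmetic
    and then truncated; the demo_* names are hypothetical and this is not
    the TCG code):

```c
#include <stdint.h>

/* Shift count: the low 6 bits of the second-operand address. */
static unsigned demo_shift_count(uint64_t addr)
{
    return (unsigned)(addr & 0x3f);
}

/* 32-bit logical left shift done in 64 bits, then truncated to 32 bits. */
static uint32_t demo_sll32(uint32_t value, uint64_t addr)
{
    return (uint32_t)((uint64_t)value << demo_shift_count(addr));
}

/* Pre-fix behaviour for comparison: only 5 bits of the count are used. */
static uint32_t demo_sll32_5bit(uint32_t value, uint64_t addr)
{
    return (uint32_t)((uint64_t)value << (addr & 0x1f));
}
```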

    Fixes: cbe24bfa91d2 ("target-s390: Convert SHIFT, ROTATE SINGLE")
    Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Message-Id: <20220112165016.226996-5-iii@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/cpu-dump.c        |  3 +--
 target/s390x/s390x-internal.h  |  3 +--
 target/s390x/tcg/cc_helper.c   | 36 +++-----------------------
 target/s390x/tcg/insn-data.def | 36 +++++++++++++-------------
 target/s390x/tcg/translate.c   | 47 ++++++++++++++++------------------
 5 files changed, 45 insertions(+), 80 deletions(-)

diff --git a/target/s390x/cpu-dump.c b/target/s390x/cpu-dump.c
index 0f5c062994f..ffa9e94d848 100644
--- a/target/s390x/cpu-dump.c
+++ b/target/s390x/cpu-dump.c
@@ -121,8 +121,7 @@ const char *cc_name(enum cc_op cc_op)
         [CC_OP_NZ_F64]    = "CC_OP_NZ_F64",
         [CC_OP_NZ_F128]   = "CC_OP_NZ_F128",
         [CC_OP_ICM]       = "CC_OP_ICM",
-        [CC_OP_SLA_32]    = "CC_OP_SLA_32",
-        [CC_OP_SLA_64]    = "CC_OP_SLA_64",
+        [CC_OP_SLA]       = "CC_OP_SLA",
         [CC_OP_FLOGR]     = "CC_OP_FLOGR",
         [CC_OP_LCBB]      = "CC_OP_LCBB",
         [CC_OP_VC]        = "CC_OP_VC",
diff --git a/target/s390x/s390x-internal.h b/target/s390x/s390x-internal.h
index 02cf6c3f438..c9acb450bac 100644
--- a/target/s390x/s390x-internal.h
+++ b/target/s390x/s390x-internal.h
@@ -193,8 +193,7 @@ enum cc_op {
     CC_OP_NZ_F128,              /* FP dst != 0 (128bit) */
 
     CC_OP_ICM,                  /* insert characters under mask */
-    CC_OP_SLA_32,               /* Calculate shift left signed (32bit) */
-    CC_OP_SLA_64,               /* Calculate shift left signed (64bit) */
+    CC_OP_SLA,                  /* Calculate shift left signed */
     CC_OP_FLOGR,                /* find leftmost one */
     CC_OP_LCBB,                 /* load count to block boundary */
     CC_OP_VC,                   /* vector compare result */
diff --git a/target/s390x/tcg/cc_helper.c b/target/s390x/tcg/cc_helper.c
index c9b7b0e8c68..8d04097f784 100644
--- a/target/s390x/tcg/cc_helper.c
+++ b/target/s390x/tcg/cc_helper.c
@@ -268,34 +268,7 @@ static uint32_t cc_calc_icm(uint64_t mask, uint64_t val)
     }
 }
 
-static uint32_t cc_calc_sla_32(uint32_t src, int shift)
-{
-    uint32_t mask = ((1U << shift) - 1U) << (32 - shift);
-    uint32_t sign = 1U << 31;
-    uint32_t match;
-    int32_t r;
-
-    /* Check if the sign bit stays the same.  */
-    if (src & sign) {
-        match = mask;
-    } else {
-        match = 0;
-    }
-    if ((src & mask) != match) {
-        /* Overflow.  */
-        return 3;
-    }
-
-    r = ((src << shift) & ~sign) | (src & sign);
-    if (r == 0) {
-        return 0;
-    } else if (r < 0) {
-        return 1;
-    }
-    return 2;
-}
-
-static uint32_t cc_calc_sla_64(uint64_t src, int shift)
+static uint32_t cc_calc_sla(uint64_t src, int shift)
 {
     uint64_t mask = -1ULL << (63 - shift);
     uint64_t sign = 1ULL << 63;
@@ -459,11 +432,8 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op,
     case CC_OP_ICM:
         r =  cc_calc_icm(src, dst);
         break;
-    case CC_OP_SLA_32:
-        r =  cc_calc_sla_32(src, dst);
-        break;
-    case CC_OP_SLA_64:
-        r =  cc_calc_sla_64(src, dst);
+    case CC_OP_SLA:
+        r =  cc_calc_sla(src, dst);
         break;
     case CC_OP_FLOGR:
         r = cc_calc_flogr(dst);
diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index c92284df5d4..99f4f5e36eb 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -747,8 +747,8 @@
     C(0xb9e1, POPCNT,  RRE,   PC,  0, r2_o, r1, 0, popcnt, nz64)
 
 /* ROTATE LEFT SINGLE LOGICAL */
-    C(0xeb1d, RLL,     RSY_a, Z,   r3_o, sh32, new, r1_32, rll32, 0)
-    C(0xeb1c, RLLG,    RSY_a, Z,   r3_o, sh64, r1, 0, rll64, 0)
+    C(0xeb1d, RLL,     RSY_a, Z,   r3_o, sh, new, r1_32, rll32, 0)
+    C(0xeb1c, RLLG,    RSY_a, Z,   r3_o, sh, r1, 0, rll64, 0)
 
 /* ROTATE THEN INSERT SELECTED BITS */
     C(0xec55, RISBG,   RIE_f, GIE, 0, r2, r1, 0, risbg, s64)
@@ -784,29 +784,29 @@
     C(0x0400, SPM,     RR_a,  Z,   r1, 0, 0, 0, spm, 0)
 
 /* SHIFT LEFT SINGLE */
-    D(0x8b00, SLA,     RS_a,  Z,   r1, sh32, new, r1_32, sla, 0, 31)
-    D(0xebdd, SLAK,    RSY_a, DO,  r3, sh32, new, r1_32, sla, 0, 31)
-    D(0xeb0b, SLAG,    RSY_a, Z,   r3, sh64, r1, 0, sla, 0, 63)
+    D(0x8b00, SLA,     RS_a,  Z,   r1, sh, new, r1_32, sla, 0, 31)
+    D(0xebdd, SLAK,    RSY_a, DO,  r3, sh, new, r1_32, sla, 0, 31)
+    D(0xeb0b, SLAG,    RSY_a, Z,   r3, sh, r1, 0, sla, 0, 63)
 /* SHIFT LEFT SINGLE LOGICAL */
-    C(0x8900, SLL,     RS_a,  Z,   r1_o, sh32, new, r1_32, sll, 0)
-    C(0xebdf, SLLK,    RSY_a, DO,  r3_o, sh32, new, r1_32, sll, 0)
-    C(0xeb0d, SLLG,    RSY_a, Z,   r3_o, sh64, r1, 0, sll, 0)
+    C(0x8900, SLL,     RS_a,  Z,   r1_o, sh, new, r1_32, sll, 0)
+    C(0xebdf, SLLK,    RSY_a, DO,  r3_o, sh, new, r1_32, sll, 0)
+    C(0xeb0d, SLLG,    RSY_a, Z,   r3_o, sh, r1, 0, sll, 0)
 /* SHIFT RIGHT SINGLE */
-    C(0x8a00, SRA,     RS_a,  Z,   r1_32s, sh32, new, r1_32, sra, s32)
-    C(0xebdc, SRAK,    RSY_a, DO,  r3_32s, sh32, new, r1_32, sra, s32)
-    C(0xeb0a, SRAG,    RSY_a, Z,   r3_o, sh64, r1, 0, sra, s64)
+    C(0x8a00, SRA,     RS_a,  Z,   r1_32s, sh, new, r1_32, sra, s32)
+    C(0xebdc, SRAK,    RSY_a, DO,  r3_32s, sh, new, r1_32, sra, s32)
+    C(0xeb0a, SRAG,    RSY_a, Z,   r3_o, sh, r1, 0, sra, s64)
 /* SHIFT RIGHT SINGLE LOGICAL */
-    C(0x8800, SRL,     RS_a,  Z,   r1_32u, sh32, new, r1_32, srl, 0)
-    C(0xebde, SRLK,    RSY_a, DO,  r3_32u, sh32, new, r1_32, srl, 0)
-    C(0xeb0c, SRLG,    RSY_a, Z,   r3_o, sh64, r1, 0, srl, 0)
+    C(0x8800, SRL,     RS_a,  Z,   r1_32u, sh, new, r1_32, srl, 0)
+    C(0xebde, SRLK,    RSY_a, DO,  r3_32u, sh, new, r1_32, srl, 0)
+    C(0xeb0c, SRLG,    RSY_a, Z,   r3_o, sh, r1, 0, srl, 0)
 /* SHIFT LEFT DOUBLE */
-    D(0x8f00, SLDA,    RS_a,  Z,   r1_D32, sh64, new, r1_D32, sla, 0, 63)
+    D(0x8f00, SLDA,    RS_a,  Z,   r1_D32, sh, new, r1_D32, sla, 0, 63)
 /* SHIFT LEFT DOUBLE LOGICAL */
-    C(0x8d00, SLDL,    RS_a,  Z,   r1_D32, sh64, new, r1_D32, sll, 0)
+    C(0x8d00, SLDL,    RS_a,  Z,   r1_D32, sh, new, r1_D32, sll, 0)
 /* SHIFT RIGHT DOUBLE */
-    C(0x8e00, SRDA,    RS_a,  Z,   r1_D32, sh64, new, r1_D32, sra, s64)
+    C(0x8e00, SRDA,    RS_a,  Z,   r1_D32, sh, new, r1_D32, sra, s64)
 /* SHIFT RIGHT DOUBLE LOGICAL */
-    C(0x8c00, SRDL,    RS_a,  Z,   r1_D32, sh64, new, r1_D32, srl, 0)
+    C(0x8c00, SRDL,    RS_a,  Z,   r1_D32, sh, new, r1_D32, srl, 0)
 
 /* SQUARE ROOT */
     F(0xb314, SQEBR,   RRE,   Z,   0, e2, new, e1, sqeb, 0, IF_BFP)
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index c5e59b68afe..b14e6a04a73 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -636,8 +636,7 @@ static void gen_op_calc_cc(DisasContext *s)
     case CC_OP_LTUGTU_64:
     case CC_OP_TM_32:
     case CC_OP_TM_64:
-    case CC_OP_SLA_32:
-    case CC_OP_SLA_64:
+    case CC_OP_SLA:
     case CC_OP_SUBU:
     case CC_OP_NZ_F128:
     case CC_OP_VC:
@@ -1178,19 +1177,6 @@ struct DisasInsn {
 /* ====================================================================== */
 /* Miscellaneous helpers, used by several operations.  */
 
-static void help_l2_shift(DisasContext *s, DisasOps *o, int mask)
-{
-    int b2 = get_field(s, b2);
-    int d2 = get_field(s, d2);
-
-    if (b2 == 0) {
-        o->in2 = tcg_const_i64(d2 & mask);
-    } else {
-        o->in2 = get_address(s, 0, b2, d2);
-        tcg_gen_andi_i64(o->in2, o->in2, mask);
-    }
-}
-
 static DisasJumpType help_goto_direct(DisasContext *s, uint64_t dest)
 {
     if (dest == s->pc_tmp) {
@@ -4113,9 +4099,18 @@ static DisasJumpType op_soc(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_sla(DisasContext *s, DisasOps *o)
 {
+    TCGv_i64 t;
     uint64_t sign = 1ull << s->insn->data;
-    enum cc_op cco = s->insn->data == 31 ? CC_OP_SLA_32 : CC_OP_SLA_64;
-    gen_op_update2_cc_i64(s, cco, o->in1, o->in2);
+    if (s->insn->data == 31) {
+        t = tcg_temp_new_i64();
+        tcg_gen_shli_i64(t, o->in1, 32);
+    } else {
+        t = o->in1;
+    }
+    gen_op_update2_cc_i64(s, CC_OP_SLA, t, o->in2);
+    if (s->insn->data == 31) {
+        tcg_temp_free_i64(t);
+    }
     tcg_gen_shl_i64(o->out, o->in1, o->in2);
     /* The arithmetic left shift is curious in that it does not affect
        the sign bit.  Copy that over from the source unchanged.  */
@@ -5924,17 +5919,19 @@ static void in2_ri2(DisasContext *s, DisasOps *o)
 }
 #define SPEC_in2_ri2 0
 
-static void in2_sh32(DisasContext *s, DisasOps *o)
+static void in2_sh(DisasContext *s, DisasOps *o)
 {
-    help_l2_shift(s, o, 31);
-}
-#define SPEC_in2_sh32 0
+    int b2 = get_field(s, b2);
+    int d2 = get_field(s, d2);
 
-static void in2_sh64(DisasContext *s, DisasOps *o)
-{
-    help_l2_shift(s, o, 63);
+    if (b2 == 0) {
+        o->in2 = tcg_const_i64(d2 & 0x3f);
+    } else {
+        o->in2 = get_address(s, 0, b2, d2);
+        tcg_gen_andi_i64(o->in2, o->in2, 0x3f);
+    }
 }
-#define SPEC_in2_sh64 0
+#define SPEC_in2_sh 0
 
 static void in2_m2_8u(DisasContext *s, DisasOps *o)
 {
-- 
Gitee


From fa5c068a3a64e6a1a7fe3b8b8cb48d6d72c7246f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 263/407] s390x: sigp: Reorder the SIGP STOP code
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [6/21] 0c957b3f4a2d6abb278375a7080055502fa8e34d

Bugzilla: https://bugzilla.redhat.com/2169308

commit 59b9b5186e44a90088a91ed7a7493b03027e4f1f
Author: Eric Farman <farman@linux.ibm.com>
Date:   Mon Dec 13 22:09:19 2021 +0100

    s390x: sigp: Reorder the SIGP STOP code

    Let's wait to mark the VCPU STOPPED until the possible
    STORE STATUS operation is completed, so that we know the
    CPU is fully stopped and done doing anything. (This is also
    when we clear the possible sigp_order field for STOP orders.)

    Suggested-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Eric Farman <farman@linux.ibm.com>
    Message-Id: <20211213210919.856693-2-farman@linux.ibm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/sigp.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/target/s390x/sigp.c b/target/s390x/sigp.c
index 51c727834c0..9dd977349ab 100644
--- a/target/s390x/sigp.c
+++ b/target/s390x/sigp.c
@@ -139,7 +139,7 @@ static void sigp_stop_and_store_status(CPUState *cs, run_on_cpu_data arg)
     case S390_CPU_STATE_OPERATING:
         cpu->env.sigp_order = SIGP_STOP_STORE_STATUS;
         cpu_inject_stop(cpu);
-        /* store will be performed in do_stop_interrup() */
+        /* store will be performed in do_stop_interrupt() */
         break;
     case S390_CPU_STATE_STOPPED:
         /* already stopped, just store the status */
@@ -479,13 +479,17 @@ void do_stop_interrupt(CPUS390XState *env)
 {
     S390CPU *cpu = env_archcpu(env);
 
-    if (s390_cpu_set_state(S390_CPU_STATE_STOPPED, cpu) == 0) {
-        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
-    }
+    /*
+     * Complete the STOP operation before exposing the CPU as
+     * STOPPED to the system.
+     */
     if (cpu->env.sigp_order == SIGP_STOP_STORE_STATUS) {
         s390_store_status(cpu, S390_STORE_STATUS_DEF_ADDR, true);
     }
     env->sigp_order = 0;
+    if (s390_cpu_set_state(S390_CPU_STATE_STOPPED, cpu) == 0) {
+        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+    }
     env->pending_int &= ~INTERRUPT_STOP;
 }
 
-- 
Gitee


From cc5c0ac00161971ccc8a8a31e3ad5ee139bf68f2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 264/407] s390x/tcg: Fix BRASL with a large negative offset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [7/21] f2eb97bf300afcb440cd5dc6d398ce7ad34f1db9

Bugzilla: https://bugzilla.redhat.com/2169308

commit fc3dd86a290a9c7c3c3273961b03058ae8f1d49f
Author: Ilya Leoshkevich <iii@linux.ibm.com>
Date:   Mon Mar 14 11:42:30 2022 +0100

    s390x/tcg: Fix BRASL with a large negative offset

    When RI2 is 0x80000000, qemu enters an infinite loop instead of jumping
    backwards. Fix by adding a missing cast, like in in2_ri2().
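
    The arithmetic behind the loop can be reproduced outside the translator.
    In the sketch below (hypothetical demo_* helpers, not QEMU functions),
    the pre-fix doubling happens in 32 bits, so an immediate of 0x80000000
    doubles to 0 and the branch target equals the current PC. The 32-bit
    wrap is modelled with unsigned math because signed overflow is
    undefined in C:

```c
#include <stdint.h>

/* Pre-fix arithmetic: double in 32 bits, then add to the PC. */
static uint64_t demo_dest_32bit(uint64_t pc, int32_t i2)
{
    int32_t doubled = (int32_t)((uint32_t)i2 * 2u);

    return pc + (uint64_t)(int64_t)doubled;
}

/* Post-fix arithmetic: widen to 64 bits first, then double. */
static uint64_t demo_dest_64bit(uint64_t pc, int32_t i2)
{
    return pc + (uint64_t)((int64_t)i2 * 2);
}
```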

    Fixes: 8ac33cdb8bfb ("Convert BRANCH AND SAVE")
    Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Message-Id: <20220314104232.675863-2-iii@linux.ibm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/tcg/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index b14e6a04a73..8147d952dfa 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -1567,7 +1567,7 @@ static DisasJumpType op_bal(DisasContext *s, DisasOps *o)
 static DisasJumpType op_basi(DisasContext *s, DisasOps *o)
 {
     pc_to_link_info(o->out, s, s->pc_tmp);
-    return help_goto_direct(s, s->base.pc_next + 2 * get_field(s, i2));
+    return help_goto_direct(s, s->base.pc_next + (int64_t)get_field(s, i2) * 2);
 }
 
 static DisasJumpType op_bc(DisasContext *s, DisasOps *o)
-- 
Gitee


From abd28e07d4f2fc16ad34ffe8e44b5494b0ca8b35 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 265/407] s390x/tcg: Fix BRCL with a large negative offset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [8/21] 60abe03ceba239268b72ff79e2945b73822fb72f

Bugzilla: https://bugzilla.redhat.com/2169308

commit 16ed5f14215b20c8dc49b96e2149032ba3238beb
Author: Ilya Leoshkevich <iii@linux.ibm.com>
Date:   Mon Mar 14 11:42:31 2022 +0100

    s390x/tcg: Fix BRCL with a large negative offset

    When RI2 is 0x80000000, qemu enters an infinite loop instead of jumping
    backwards. Fix by adding a missing cast, like in in2_ri2().

    Fixes: 7233f2ed1717 ("target-s390: Convert BRANCH ON CONDITION")
    Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Message-Id: <20220314104232.675863-3-iii@linux.ibm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/tcg/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 8147d952dfa..7ff7f90e232 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -1201,7 +1201,7 @@ static DisasJumpType help_branch(DisasContext *s, DisasCompare *c,
                                  bool is_imm, int imm, TCGv_i64 cdest)
 {
     DisasJumpType ret;
-    uint64_t dest = s->base.pc_next + 2 * imm;
+    uint64_t dest = s->base.pc_next + (int64_t)imm * 2;
     TCGLabel *lab;
 
     /* Take care of the special cases first.  */
-- 
Gitee


From 0cb590a2740be8d82834f9221250d98024f489d0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 266/407] target/s390x: Fix determination of overflow condition
 code after addition
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [9/21] e8b946ff4e521e0367cb03fcd918a2f8af8bd4d5

Bugzilla: https://bugzilla.redhat.com/2169308

commit 5a2e67a691501bc4dd81c46c81b8f1881c8bd5df
Author: Bruno Haible <bruno@clisp.org>
Date:   Wed Mar 23 17:26:20 2022 +0100

    target/s390x: Fix determination of overflow condition code after addition

    This program currently prints different results when run with TCG instead
    of running on real s390x hardware:

     #include <stdio.h>

     int overflow_32 (int x, int y)
     {
       int sum;
       return ! __builtin_add_overflow (x, y, &sum);
     }

     int overflow_64 (long long x, long long y)
     {
       long sum;
       return ! __builtin_add_overflow (x, y, &sum);
     }

     int a1 = -2147483648;
     int b1 = -2147483648;
     long long a2 = -9223372036854775808L;
     long long b2 = -9223372036854775808L;

     int main ()
     {
       {
         int a = a1;
         int b = b1;
         printf ("a = 0x%x, b = 0x%x\n", a, b);
         printf ("no_overflow = %d\n", overflow_32 (a, b));
       }
       {
         long long a = a2;
         long long b = b2;
         printf ("a = 0x%llx, b = 0x%llx\n", a, b);
         printf ("no_overflow = %d\n", overflow_64 (a, b));
       }
     }
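
    The fix is a one-character change to the overflow predicate. A minimal
    sketch of why it matters (the function name is hypothetical; the
    condition mirrors the corrected one in cc_calc_add_32()): when both
    operands are INT32_MIN, the wrapped sum is exactly 0, so the
    negative-plus-negative case has to treat any non-negative result as an
    overflow:

```c
#include <stdint.h>

/* Returns 1 if a1 + a2 overflowed, given the wrapped 32-bit result ar. */
static int demo_add_overflowed_32(int32_t a1, int32_t a2, int32_t ar)
{
    return (a1 > 0 && a2 > 0 && ar < 0) || (a1 < 0 && a2 < 0 && ar >= 0);
}
```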

    Signed-off-by: Bruno Haible <bruno@clisp.org>
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/616
    Message-Id: <20220323162621.139313-2-thuth@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/tcg/cc_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/s390x/tcg/cc_helper.c b/target/s390x/tcg/cc_helper.c
index 8d04097f784..e11cdb745dc 100644
--- a/target/s390x/tcg/cc_helper.c
+++ b/target/s390x/tcg/cc_helper.c
@@ -136,7 +136,7 @@ static uint32_t cc_calc_subu(uint64_t borrow_out, uint64_t result)
 
 static uint32_t cc_calc_add_64(int64_t a1, int64_t a2, int64_t ar)
 {
-    if ((a1 > 0 && a2 > 0 && ar < 0) || (a1 < 0 && a2 < 0 && ar > 0)) {
+    if ((a1 > 0 && a2 > 0 && ar < 0) || (a1 < 0 && a2 < 0 && ar >= 0)) {
         return 3; /* overflow */
     } else {
         if (ar < 0) {
@@ -196,7 +196,7 @@ static uint32_t cc_calc_comp_64(int64_t dst)
 
 static uint32_t cc_calc_add_32(int32_t a1, int32_t a2, int32_t ar)
 {
-    if ((a1 > 0 && a2 > 0 && ar < 0) || (a1 < 0 && a2 < 0 && ar > 0)) {
+    if ((a1 > 0 && a2 > 0 && ar < 0) || (a1 < 0 && a2 < 0 && ar >= 0)) {
         return 3; /* overflow */
     } else {
         if (ar < 0) {
-- 
Gitee


From bd87344f0675dfe0bc2c9cc082acf20a22bafcb7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 267/407] target/s390x: Fix determination of overflow condition
 code after subtraction
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [10/21] 14792faddfca784503f89c292ebaba5be8d3fc96

Bugzilla: https://bugzilla.redhat.com/2169308

commit fc6e0d0f2db5126592bb4066d484fcdfc14ccf36
Author: Bruno Haible <bruno@clisp.org>
Date:   Wed Mar 23 17:26:21 2022 +0100

    target/s390x: Fix determination of overflow condition code after subtraction

    Reported by Paul Eggert in
    https://lists.gnu.org/archive/html/bug-gnulib/2021-09/msg00050.html

    This program currently prints different results when run with TCG instead
    of running on real s390x hardware:

     #include <stdio.h>

     int overflow_32 (int x, int y)
     {
       int sum;
       return __builtin_sub_overflow (x, y, &sum);
     }

     int overflow_64 (long long x, long long y)
     {
       long sum;
       return __builtin_sub_overflow (x, y, &sum);
     }

     int a1 = 0;
     int b1 = -2147483648;
     long long a2 = 0L;
     long long b2 = -9223372036854775808L;

     int main ()
     {
       {
         int a = a1;
         int b = b1;
         printf ("a = 0x%x, b = 0x%x\n", a, b);
         printf ("no_overflow = %d\n", ! overflow_32 (a, b));
       }
       {
         long long a = a2;
         long long b = b2;
         printf ("a = 0x%llx, b = 0x%llx\n", a, b);
         printf ("no_overflow = %d\n", ! overflow_64 (a, b));
       }
     }

    Signed-off-by: Bruno Haible <bruno@clisp.org>
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/618
    Message-Id: <20220323162621.139313-3-thuth@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>
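
Illustration (not part of the upstream change): a minimal standalone
sketch of the old and new 32-bit subtraction overflow predicates,
applied to the a = 0, b = INT32_MIN case from the reproducer above.
The names sub_overflows_old(), sub_overflows_new(), wrap_sub32() and
check_sub_overflow() are hypothetical and exist only in this example;
the sketch assumes the result reaches the helper already wrapped to
32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-fix predicate: a1 == 0 can never be reported as overflow. */
static int sub_overflows_old(int32_t a1, int32_t a2, int32_t ar)
{
    return (a1 > 0 && a2 < 0 && ar < 0) || (a1 < 0 && a2 > 0 && ar > 0);
}

/* Post-fix predicate, same shape as the test in the hunk below. */
static int sub_overflows_new(int32_t a1, int32_t a2, int32_t ar)
{
    return (a1 >= 0 && a2 < 0 && ar < 0) || (a1 < 0 && a2 > 0 && ar > 0);
}

/* Wrapping a1 - a2, done in unsigned arithmetic to avoid signed UB. */
static int32_t wrap_sub32(int32_t a1, int32_t a2)
{
    return (int32_t)((uint32_t)a1 - (uint32_t)a2);
}

void check_sub_overflow(void)
{
    const int32_t ar = wrap_sub32(0, INT32_MIN);

    /* 0 - INT32_MIN does not fit in 32 bits and wraps to INT32_MIN. */
    assert(ar == INT32_MIN);
    assert(!sub_overflows_old(0, INT32_MIN, ar));
    assert(sub_overflows_new(0, INT32_MIN, ar));

    /* An ordinary subtraction is not flagged by either predicate. */
    assert(!sub_overflows_old(5, 3, wrap_sub32(5, 3)));
    assert(!sub_overflows_new(5, 3, wrap_sub32(5, 3)));
}
```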

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/tcg/cc_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/s390x/tcg/cc_helper.c b/target/s390x/tcg/cc_helper.c
index e11cdb745dc..b2e8d3d9f59 100644
--- a/target/s390x/tcg/cc_helper.c
+++ b/target/s390x/tcg/cc_helper.c
@@ -151,7 +151,7 @@ static uint32_t cc_calc_add_64(int64_t a1, int64_t a2, int64_t ar)
 
 static uint32_t cc_calc_sub_64(int64_t a1, int64_t a2, int64_t ar)
 {
-    if ((a1 > 0 && a2 < 0 && ar < 0) || (a1 < 0 && a2 > 0 && ar > 0)) {
+    if ((a1 >= 0 && a2 < 0 && ar < 0) || (a1 < 0 && a2 > 0 && ar > 0)) {
         return 3; /* overflow */
     } else {
         if (ar < 0) {
@@ -211,7 +211,7 @@ static uint32_t cc_calc_add_32(int32_t a1, int32_t a2, int32_t ar)
 
 static uint32_t cc_calc_sub_32(int32_t a1, int32_t a2, int32_t ar)
 {
-    if ((a1 > 0 && a2 < 0 && ar < 0) || (a1 < 0 && a2 > 0 && ar > 0)) {
+    if ((a1 >= 0 && a2 < 0 && ar < 0) || (a1 < 0 && a2 > 0 && ar > 0)) {
         return 3; /* overflow */
     } else {
         if (ar < 0) {
-- 
Gitee


From 3416cbf9e76089fcec6c1321c4c19817ba447fcb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 268/407] s390x: follow qdev tree to detect SCSI device on a
 CCW bus
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [11/21] 97303bc9c356e8828d185868736b395bc0b70214

Bugzilla: https://bugzilla.redhat.com/2169308

commit 7d2eb76d0407fc391b78df16d17f1e616ec3e228
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Mon Mar 28 09:40:00 2022 +0200

    s390x: follow qdev tree to detect SCSI device on a CCW bus

    Do not make assumptions on the parent type of the SCSIDevice, instead
    use object_dynamic_cast all the way up to the CcwDevice.  This is cleaner
    because there is no guarantee that the bus is on a virtio-scsi device;
    that is only the case for the default configuration of QEMU's s390x
    target.

    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/ipl.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index eb7fc4c4ae8..9051d8652d6 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -376,14 +376,18 @@ static CcwDevice *s390_get_ccw_device(DeviceState *dev_st, int *devtype)
                 object_dynamic_cast(OBJECT(dev_st),
                                     TYPE_SCSI_DEVICE);
             if (sd) {
-                SCSIBus *bus = scsi_bus_from_device(sd);
-                VirtIOSCSI *vdev = container_of(bus, VirtIOSCSI, bus);
-                VirtIOSCSICcw *scsi_ccw = container_of(vdev, VirtIOSCSICcw,
-                                                       vdev);
-
-                ccw_dev = (CcwDevice *)object_dynamic_cast(OBJECT(scsi_ccw),
-                                                           TYPE_CCW_DEVICE);
-                tmp_dt = CCW_DEVTYPE_SCSI;
+                SCSIBus *sbus = scsi_bus_from_device(sd);
+                VirtIODevice *vdev = (VirtIODevice *)
+                    object_dynamic_cast(OBJECT(sbus->qbus.parent),
+                                        TYPE_VIRTIO_DEVICE);
+                if (vdev) {
+                    ccw_dev = (CcwDevice *)
+                        object_dynamic_cast(OBJECT(qdev_get_parent_bus(DEVICE(vdev))->parent),
+                                            TYPE_CCW_DEVICE);
+                    if (ccw_dev) {
+                        tmp_dt = CCW_DEVTYPE_SCSI;
+                    }
+                }
             }
         }
     }
-- 
Gitee


From 591acbf3629255fbda2b9382b667f41f1c8139cd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 269/407] target/s390x: Fix the accumulation of ccm in op_icm
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [12/21] ad52141b1d733a34d392b72d9962ea7ac521dc17

Bugzilla: https://bugzilla.redhat.com/2169308

commit 21641ee5a9b31568c990c7fc949eeb9bcd0f6a0f
Author: Richard Henderson <richard.henderson@linaro.org>
Date:   Fri Apr 1 13:36:59 2022 -0600

    target/s390x: Fix the accumulation of ccm in op_icm

    Coverity rightly reports that 0xff << pos can overflow.
    This would affect the ICMH instruction.

    Fixes: Coverity CID 1487161
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Message-Id: <20220401193659.332079-1-richard.henderson@linaro.org>
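
Illustration (not part of the upstream change): a minimal sketch of why
the mask constant needs a 64-bit type. The function icm_ccm() is
hypothetical and only loosely modelled on the loop touched by the hunk
below; that the high-word variant starts at bit position 56 (base 32)
is an assumption of this example. With a plain int constant, shifting
0xff by 24 or more bits is not representable in int, which is what
Coverity flags.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Accumulate one 0xff byte mask per selected m3 bit, walking from the
 * most significant byte of the target word downwards.
 */
uint64_t icm_ccm(unsigned m3, int base)
{
    uint64_t ccm = 0;
    int pos = base + 32 - 8;

    while (m3) {
        if (m3 & 0x8) {
            /* 64-bit constant: the shift stays well defined up to 56. */
            ccm |= 0xffull << pos;
        }
        m3 = (m3 << 1) & 0xf;
        pos -= 8;
    }
    return ccm;
}

void check_icm_ccm(void)
{
    assert(icm_ccm(0xf, 0) == 0x00000000ffffffffull);
    assert(icm_ccm(0x5, 0) == 0x0000000000ff00ffull);
    assert(icm_ccm(0x8, 32) == 0xff00000000000000ull);
    assert(icm_ccm(0xf, 32) == 0xffffffff00000000ull);
}
```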

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/tcg/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 7ff7f90e232..75f0418c101 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2592,7 +2592,7 @@ static DisasJumpType op_icm(DisasContext *s, DisasOps *o)
                 tcg_gen_qemu_ld8u(tmp, o->in2, get_mem_index(s));
                 tcg_gen_addi_i64(o->in2, o->in2, 1);
                 tcg_gen_deposit_i64(o->out, o->out, tmp, pos, 8);
-                ccm |= 0xff << pos;
+                ccm |= 0xffull << pos;
             }
             m3 = (m3 << 1) & 0xf;
             pos -= 8;
-- 
Gitee


From 686a97ee551255bdbf2906950447a2992ca0c061 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 270/407] target/s390x: Fix writeback to v1 in helper_vstl
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [13/21] 9db50d12afc0a85921e6bfdb69f12ba29f3dce72

Bugzilla: https://bugzilla.redhat.com/2169308

commit db67a6ff480b346b1415b983f9582028cf8e18f0
Author: Richard Henderson <richard.henderson@linaro.org>
Date:   Thu Apr 28 11:46:56 2022 +0200

    target/s390x: Fix writeback to v1 in helper_vstl

    Fixes: 0e0a5b49ad58 ("s390x/tcg: Implement VECTOR STORE WITH LENGTH")
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Tested-by: Thomas Huth <thuth@redhat.com>
    Reviewed-by: David Miller <dmiller423@gmail.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Message-Id: <20220428094708.84835-2-david@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/tcg/vec_helper.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/target/s390x/tcg/vec_helper.c b/target/s390x/tcg/vec_helper.c
index ededf13cf09..48d86722b2d 100644
--- a/target/s390x/tcg/vec_helper.c
+++ b/target/s390x/tcg/vec_helper.c
@@ -200,7 +200,6 @@ void HELPER(vstl)(CPUS390XState *env, const void *v1, uint64_t addr,
         addr = wrap_address(env, addr + 8);
         cpu_stq_data_ra(env, addr, s390_vec_read_element64(v1, 1), GETPC());
     } else {
-        S390Vector tmp = {};
         int i;
 
         for (i = 0; i < bytes; i++) {
@@ -209,6 +208,5 @@ void HELPER(vstl)(CPUS390XState *env, const void *v1, uint64_t addr,
             cpu_stb_data_ra(env, addr, byte, GETPC());
             addr = wrap_address(env, addr + 1);
         }
-        *(S390Vector *)v1 = tmp;
     }
 }
-- 
Gitee


From eafc9e7be011ef8c87bbaf1996073546a550cc83 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 271/407] target/s390x: fix handling of zeroes in vfmin/vfmax
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [14/21] 27f66691e08192a5c9f2ecbde3603c0adece4857

Bugzilla: https://bugzilla.redhat.com/2169308

commit 13c59eb09bd6d1fbc13f08b708226421f14a232b
Author: Ilya Leoshkevich <iii@linux.ibm.com>
Date:   Wed Jul 13 20:26:10 2022 +0200

    target/s390x: fix handling of zeroes in vfmin/vfmax

    vfmin_res() / vfmax_res() are trying to check whether a and b are both
    zeroes, but in reality they check that they are the same kind of zero.
    This causes incorrect results when comparing positive and negative
    zeroes.

    Fixes: da4807527f3b ("s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM)")
    Co-developed-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
    Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Message-Id: <20220713182612.3780050-2-iii@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>
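
Illustration (not part of the upstream change): a minimal sketch of the
difference between the two checks. The EX_* constants are assumptions
made for this example (one data-class bit for positive zero, one for
negative zero); they are not copied from the QEMU headers, and the
helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: separate class bits for +0 and -0. */
#define EX_DCMASK_POS_ZERO 0x0800
#define EX_DCMASK_NEG_ZERO 0x0400
#define EX_DCMASK_ZERO     (EX_DCMASK_POS_ZERO | EX_DCMASK_NEG_ZERO)
#define EX_DCMASK_NORMAL   0x0200

/* Old check: true only when a and b share the same zero bit. */
static bool both_zero_old(uint16_t a, uint16_t b)
{
    return a & b & EX_DCMASK_ZERO;
}

/* New check: true when each operand is a zero of either sign. */
static bool both_zero_new(uint16_t a, uint16_t b)
{
    return (a & EX_DCMASK_ZERO) && (b & EX_DCMASK_ZERO);
}

void check_both_zero(void)
{
    /* +0 against -0: the case the old expression misses. */
    assert(!both_zero_old(EX_DCMASK_POS_ZERO, EX_DCMASK_NEG_ZERO));
    assert(both_zero_new(EX_DCMASK_POS_ZERO, EX_DCMASK_NEG_ZERO));

    /* Same-signed zeroes are recognised by both. */
    assert(both_zero_old(EX_DCMASK_POS_ZERO, EX_DCMASK_POS_ZERO));
    assert(both_zero_new(EX_DCMASK_NEG_ZERO, EX_DCMASK_NEG_ZERO));

    /* A zero against a non-zero value is rejected by both. */
    assert(!both_zero_old(EX_DCMASK_POS_ZERO, EX_DCMASK_NORMAL));
    assert(!both_zero_new(EX_DCMASK_POS_ZERO, EX_DCMASK_NORMAL));
}
```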

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/tcg/vec_fpu_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/s390x/tcg/vec_fpu_helper.c b/target/s390x/tcg/vec_fpu_helper.c
index 1a779934715..d1249706f9a 100644
--- a/target/s390x/tcg/vec_fpu_helper.c
+++ b/target/s390x/tcg/vec_fpu_helper.c
@@ -794,7 +794,7 @@ static S390MinMaxRes vfmin_res(uint16_t dcmask_a, uint16_t dcmask_b,
         default:
             g_assert_not_reached();
         }
-    } else if (unlikely(dcmask_a & dcmask_b & DCMASK_ZERO)) {
+    } else if (unlikely((dcmask_a & DCMASK_ZERO) && (dcmask_b & DCMASK_ZERO))) {
         switch (type) {
         case S390_MINMAX_TYPE_JAVA:
             return neg_a ? S390_MINMAX_RES_A : S390_MINMAX_RES_B;
@@ -844,7 +844,7 @@ static S390MinMaxRes vfmax_res(uint16_t dcmask_a, uint16_t dcmask_b,
         default:
             g_assert_not_reached();
         }
-    } else if (unlikely(dcmask_a & dcmask_b & DCMASK_ZERO)) {
+    } else if (unlikely((dcmask_a & DCMASK_ZERO) && (dcmask_b & DCMASK_ZERO))) {
         const bool neg_a = dcmask_a & DCMASK_NEGATIVE;
 
         switch (type) {
-- 
Gitee


From 112c60ab956dd9e7a76b5ee4a66a6a9c3985c073 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 272/407] target/s390x: Fix CLFIT and CLGIT immediate size
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [15/21] 68c0b87490dfe5349797acd7494fd293c3f733ca

Bugzilla: https://bugzilla.redhat.com/2169308

commit d324c21ba0b84b3033baa097e44a7fbbec815fad
Author: Ilya Leoshkevich <iii@linux.ibm.com>
Date:   Wed Aug 17 18:15:29 2022 +0200

    target/s390x: Fix CLFIT and CLGIT immediate size

    I2 is 16 bits, not 32.

    Found by running valgrind's none/tests/s390x/traps.

    Fixes: 1c2687518235 ("target-s390: Implement COMPARE AND TRAP")
    Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Message-Id: <20220817161529.597414-1-iii@linux.ibm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/tcg/insn-data.def | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index 99f4f5e36eb..96d47941620 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -287,8 +287,8 @@
     D(0xb961, CLGRT,   RRF_c, GIE, r1_o, r2_o, 0, 0, ct, 0, 1)
     D(0xeb23, CLT,     RSY_b, MIE, r1_32u, m2_32u, 0, 0, ct, 0, 1)
     D(0xeb2b, CLGT,    RSY_b, MIE, r1_o, m2_64, 0, 0, ct, 0, 1)
-    D(0xec73, CLFIT,   RIE_a, GIE, r1_32u, i2_32u, 0, 0, ct, 0, 1)
-    D(0xec71, CLGIT,   RIE_a, GIE, r1_o, i2_32u, 0, 0, ct, 0, 1)
+    D(0xec73, CLFIT,   RIE_a, GIE, r1_32u, i2_16u, 0, 0, ct, 0, 1)
+    D(0xec71, CLGIT,   RIE_a, GIE, r1_o, i2_16u, 0, 0, ct, 0, 1)
 
 /* CONVERT TO DECIMAL */
     C(0x4e00, CVD,     RX_a,  Z,   r1_o, a2, 0, 0, cvd, 0)
-- 
Gitee


From b82950a4443b6d745087f78057aecc7f752eb209 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 273/407] s390x/tcg: Fix opcode for lzrf
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [16/21] 43af79d2c9cd818bfa7ac1819bd9964c86915d97

Bugzilla: https://bugzilla.redhat.com/2169308

commit 131aafa7eff4aa4d747cb7113726b27394a38866
Author: Christian Borntraeger <borntraeger@linux.ibm.com>
Date:   Wed Sep 14 12:57:50 2022 +0200

    s390x/tcg: Fix opcode for lzrf

    Fix the opcode for Load and Zero Rightmost Byte (32).

    Fixes: c2a5c1d718ea ("target/s390x: Implement load-and-zero-rightmost-byte insns")
    Reported-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
    Tested-by: Nathan Chancellor <nathan@kernel.org>
    Reviewed-by: Cornelia Huck <cohuck@redhat.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Cc: qemu-stable@nongnu.org
    Message-Id: <20220914105750.767697-1-borntraeger@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/tcg/insn-data.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index 96d47941620..d54673a3bab 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -463,7 +463,7 @@
     C(0xe39f, LAT,     RXY_a, LAT, 0, m2_32u, r1, 0, lat, 0)
     C(0xe385, LGAT,    RXY_a, LAT, 0, a2, r1, 0, lgat, 0)
 /* LOAD AND ZERO RIGHTMOST BYTE */
-    C(0xe3eb, LZRF,    RXY_a, LZRB, 0, m2_32u, new, r1_32, lzrb, 0)
+    C(0xe33b, LZRF,    RXY_a, LZRB, 0, m2_32u, new, r1_32, lzrb, 0)
     C(0xe32a, LZRG,    RXY_a, LZRB, 0, m2_64, r1, 0, lzrb, 0)
 /* LOAD LOGICAL AND ZERO RIGHTMOST BYTE */
     C(0xe33a, LLZRGF,  RXY_a, LZRB, 0, m2_32u, r1, 0, lzrb, 0)
-- 
Gitee


From 40e7cecf6cd6b881ce022b692a6325094834374f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 274/407] target/s390x: Fix emulation of the VISTR instruction
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [17/21] ca521ee65c0bd2b191d6fdddbfe38daf39bd7b07

Bugzilla: https://bugzilla.redhat.com/2169308

commit f7d81a351d6122440f9190adba69da3f81b7b186
Author: Thomas Huth <thuth@redhat.com>
Date:   Wed Oct 12 20:27:54 2022 +0200

    target/s390x: Fix emulation of the VISTR instruction

    The element size is encoded in the M3 field, not in the M4
    field.

    Fixes: be6324c6b734 ("s390x/tcg: Implement VECTOR ISOLATE STRING")
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1248
    Message-Id: <20221012182755.1014853-3-thuth@redhat.com>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/tcg/translate_vx.c.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/s390x/tcg/translate_vx.c.inc b/target/s390x/tcg/translate_vx.c.inc
index 28bf5a23b68..6a125694ed7 100644
--- a/target/s390x/tcg/translate_vx.c.inc
+++ b/target/s390x/tcg/translate_vx.c.inc
@@ -2413,7 +2413,7 @@ static DisasJumpType op_vfene(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_vistr(DisasContext *s, DisasOps *o)
 {
-    const uint8_t es = get_field(s, m4);
+    const uint8_t es = get_field(s, m3);
     const uint8_t m5 = get_field(s, m5);
     static gen_helper_gvec_2 * const g[3] = {
         gen_helper_gvec_vistr8,
-- 
Gitee


From cd2a7586bc7b2ec60eaa1756211a8e8afe30e435 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 275/407] s390x/css: revert SCSW ctrl/flag bits on error
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [18/21] e4d5797ab93ba4afd9978a1d3e1f9d05da301506

Bugzilla: https://bugzilla.redhat.com/2169308

commit f53b033e4cd2e7706df3cca04f3bf3c5ffc6b08c
Author: Peter Jin <pjin@linux.ibm.com>
Date:   Thu Oct 27 23:23:41 2022 +0200

    s390x/css: revert SCSW ctrl/flag bits on error

    Revert the control and flag bits in the subchannel status word in case
    the SSCH operation fails with non-zero CC (ditto for CSCH and HSCH).
    According to POPS, the control and flag bits are only changed if SSCH,
    CSCH, and HSCH return CC 0, and no other action should be taken otherwise.
    In order to simulate that after the fact, the bits need to be reverted on
    non-zero CC.

    While the do_subchannel_work logic for virtual (virtio) devices will
    return condition code 0, passthrough (vfio) devices may encounter
    errors from either the host kernel or real hardware that need to be
    accounted for after this point. This includes restoring the state of
    the Subchannel Status Word to reflect the subchannel, as these bits
    would not be set in the event of a non-zero condition code from the
    affected instructions.

    Experimentation has shown that a failure on a START SUBCHANNEL (SSCH)
    to a passthrough device would leave the subchannel with the START
    PENDING activity control bit set, thus blocking subsequent SSCH
    operations in css_do_ssch() until some form of error recovery was
    undertaken since no interrupt would be expected.

    Signed-off-by: Peter Jin <pjin@linux.ibm.com>
    Message-Id: <20221027212341.2904795-1-pjin@linux.ibm.com>
    Reviewed-by: Eric Farman <farman@linux.ibm.com>
    Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
    [thuth: Updated the commit description to Eric's suggestion]
    Signed-off-by: Thomas Huth <thuth@redhat.com>
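
Illustration (not part of the upstream change): a minimal sketch of the
save-and-revert pattern described above, detached from the QEMU types.
Everything prefixed with ex_ or EX_ is hypothetical, including the bit
values and the callback that stands in for do_subchannel_work(); the
sketch assumes that condition code 0 means success.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real SCSW layout and masks live in QEMU. */
#define EX_START_FUNC 0x4000
#define EX_START_PEND 0x0080
#define EX_FLAG_PNO   0x0100

struct ex_scsw {
    uint16_t flags;
    uint16_t ctrl;
};

/* Stands in for do_subchannel_work(); returns a condition code. */
typedef int (*ex_work_fn)(struct ex_scsw *scsw);

static int ex_work_ok(struct ex_scsw *scsw)
{
    (void)scsw;
    return 0;
}

static int ex_work_fail(struct ex_scsw *scsw)
{
    (void)scsw;
    return 3;
}

int ex_do_start(struct ex_scsw *scsw, ex_work_fn work)
{
    /* Remember the state before the start function is triggered. */
    const uint16_t old_ctrl = scsw->ctrl;
    const uint16_t old_flags = scsw->flags;
    int cc;

    scsw->ctrl |= EX_START_FUNC | EX_START_PEND;
    scsw->flags &= (uint16_t)~EX_FLAG_PNO;

    cc = work(scsw);
    if (cc != 0) {
        /* Non-zero CC: leave the status word as it was found. */
        scsw->ctrl = old_ctrl;
        scsw->flags = old_flags;
    }
    return cc;
}

void ex_check_revert(void)
{
    struct ex_scsw scsw = { .flags = EX_FLAG_PNO, .ctrl = 0 };

    assert(ex_do_start(&scsw, ex_work_fail) == 3);
    assert(scsw.ctrl == 0 && scsw.flags == EX_FLAG_PNO);

    assert(ex_do_start(&scsw, ex_work_ok) == 0);
    assert(scsw.ctrl == (EX_START_FUNC | EX_START_PEND));
    assert(scsw.flags == 0);
}
```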

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/css.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 7d9523f8113..95d1b3a3cee 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1522,21 +1522,37 @@ IOInstEnding css_do_xsch(SubchDev *sch)
 IOInstEnding css_do_csch(SubchDev *sch)
 {
     SCHIB *schib = &sch->curr_status;
+    uint16_t old_scsw_ctrl;
+    IOInstEnding ccode;
 
     if (~(schib->pmcw.flags) & (PMCW_FLAGS_MASK_DNV | PMCW_FLAGS_MASK_ENA)) {
         return IOINST_CC_NOT_OPERATIONAL;
     }
 
+    /*
+     * Save the current scsw.ctrl in case CSCH fails and we need
+     * to revert the scsw to the status quo ante.
+     */
+    old_scsw_ctrl = schib->scsw.ctrl;
+
     /* Trigger the clear function. */
     schib->scsw.ctrl &= ~(SCSW_CTRL_MASK_FCTL | SCSW_CTRL_MASK_ACTL);
     schib->scsw.ctrl |= SCSW_FCTL_CLEAR_FUNC | SCSW_ACTL_CLEAR_PEND;
 
-    return do_subchannel_work(sch);
+    ccode = do_subchannel_work(sch);
+
+    if (ccode != IOINST_CC_EXPECTED) {
+        schib->scsw.ctrl = old_scsw_ctrl;
+    }
+
+    return ccode;
 }
 
 IOInstEnding css_do_hsch(SubchDev *sch)
 {
     SCHIB *schib = &sch->curr_status;
+    uint16_t old_scsw_ctrl;
+    IOInstEnding ccode;
 
     if (~(schib->pmcw.flags) & (PMCW_FLAGS_MASK_DNV | PMCW_FLAGS_MASK_ENA)) {
         return IOINST_CC_NOT_OPERATIONAL;
@@ -1553,6 +1569,12 @@ IOInstEnding css_do_hsch(SubchDev *sch)
         return IOINST_CC_BUSY;
     }
 
+    /*
+     * Save the current scsw.ctrl in case HSCH fails and we need
+     * to revert the scsw to the status quo ante.
+     */
+    old_scsw_ctrl = schib->scsw.ctrl;
+
     /* Trigger the halt function. */
     schib->scsw.ctrl |= SCSW_FCTL_HALT_FUNC;
     schib->scsw.ctrl &= ~SCSW_FCTL_START_FUNC;
@@ -1564,7 +1586,13 @@ IOInstEnding css_do_hsch(SubchDev *sch)
     }
     schib->scsw.ctrl |= SCSW_ACTL_HALT_PEND;
 
-    return do_subchannel_work(sch);
+    ccode = do_subchannel_work(sch);
+
+    if (ccode != IOINST_CC_EXPECTED) {
+        schib->scsw.ctrl = old_scsw_ctrl;
+    }
+
+    return ccode;
 }
 
 static void css_update_chnmon(SubchDev *sch)
@@ -1605,6 +1633,8 @@ static void css_update_chnmon(SubchDev *sch)
 IOInstEnding css_do_ssch(SubchDev *sch, ORB *orb)
 {
     SCHIB *schib = &sch->curr_status;
+    uint16_t old_scsw_ctrl, old_scsw_flags;
+    IOInstEnding ccode;
 
     if (~(schib->pmcw.flags) & (PMCW_FLAGS_MASK_DNV | PMCW_FLAGS_MASK_ENA)) {
         return IOINST_CC_NOT_OPERATIONAL;
@@ -1626,11 +1656,26 @@ IOInstEnding css_do_ssch(SubchDev *sch, ORB *orb)
     }
     sch->orb = *orb;
     sch->channel_prog = orb->cpa;
+
+    /*
+     * Save the current scsw.ctrl and scsw.flags in case SSCH fails and we need
+     * to revert the scsw to the status quo ante.
+     */
+    old_scsw_ctrl = schib->scsw.ctrl;
+    old_scsw_flags = schib->scsw.flags;
+
     /* Trigger the start function. */
     schib->scsw.ctrl |= (SCSW_FCTL_START_FUNC | SCSW_ACTL_START_PEND);
     schib->scsw.flags &= ~SCSW_FLAGS_MASK_PNO;
 
-    return do_subchannel_work(sch);
+    ccode = do_subchannel_work(sch);
+
+    if (ccode != IOINST_CC_EXPECTED) {
+        schib->scsw.ctrl = old_scsw_ctrl;
+        schib->scsw.flags = old_scsw_flags;
+    }
+
+    return ccode;
 }
 
 static void copy_irb_to_guest(IRB *dest, const IRB *src, const PMCW *pmcw,
-- 
Gitee


From 542a7cb3f74d52248011e835498d33e9fae16463 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 276/407] target/s390x/tcg: Fix and improve the SACF
 instruction
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [19/21] 62030baceb0b0d1d651ba9026bb419ed4b2a8149

Bugzilla: https://bugzilla.redhat.com/2169308

commit 21be74a9a59d1e4954ebb59dcbee0fda0b19de00
Author: Thomas Huth <thuth@redhat.com>
Date:   Thu Dec 1 19:44:43 2022 +0100

    target/s390x/tcg: Fix and improve the SACF instruction

    The SET ADDRESS SPACE CONTROL FAST instruction is not privileged, it can be
    used from problem space, too. Just the switching to the home address space
    is privileged and should still generate a privilege exception. This bug is
    e.g. causing programs like Java that use the "getcpu" vdso kernel function
    to crash (see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=990417#26 ).

    While we're at it, also check if DAT is not enabled. In that case the
    instruction is supposed to generate a special operation exception.

    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/655
    Message-Id: <20221201184443.136355-1-thuth@redhat.com>
    Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Conflicts:
   file rename target/s390x/tcg/insn-data.h.in -> insn-data.def

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/tcg/cc_helper.c   | 7 +++++++
 target/s390x/tcg/insn-data.def | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/target/s390x/tcg/cc_helper.c b/target/s390x/tcg/cc_helper.c
index b2e8d3d9f59..b36f8cdc8b9 100644
--- a/target/s390x/tcg/cc_helper.c
+++ b/target/s390x/tcg/cc_helper.c
@@ -487,6 +487,10 @@ void HELPER(sacf)(CPUS390XState *env, uint64_t a1)
 {
     HELPER_LOG("%s: %16" PRIx64 "\n", __func__, a1);
 
+    if (!(env->psw.mask & PSW_MASK_DAT)) {
+        tcg_s390_program_interrupt(env, PGM_SPECIAL_OP, GETPC());
+    }
+
     switch (a1 & 0xf00) {
     case 0x000:
         env->psw.mask &= ~PSW_MASK_ASC;
@@ -497,6 +501,9 @@ void HELPER(sacf)(CPUS390XState *env, uint64_t a1)
         env->psw.mask |= PSW_ASC_SECONDARY;
         break;
     case 0x300:
+        if ((env->psw.mask & PSW_MASK_PSTATE) != 0) {
+            tcg_s390_program_interrupt(env, PGM_PRIVILEGED, GETPC());
+        }
         env->psw.mask &= ~PSW_MASK_ASC;
         env->psw.mask |= PSW_ASC_HOME;
         break;
diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index d54673a3bab..548b0eedc2f 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -1315,7 +1315,7 @@
 /* SERVICE CALL LOGICAL PROCESSOR (PV hypercall) */
     F(0xb220, SERVC,   RRE,   Z,   r1_o, r2_o, 0, 0, servc, 0, IF_PRIV | IF_IO)
 /* SET ADDRESS SPACE CONTROL FAST */
-    F(0xb279, SACF,    S,     Z,   0, a2, 0, 0, sacf, 0, IF_PRIV)
+    C(0xb279, SACF,    S,     Z,   0, a2, 0, 0, sacf, 0)
 /* SET CLOCK */
     F(0xb204, SCK,     S,     Z,   la2, 0, 0, 0, sck, 0, IF_PRIV | IF_IO)
 /* SET CLOCK COMPARATOR */
-- 
Gitee


From 5ed80201e21f92569d8758fb56bf7fe32b211598 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 May 2023 12:34:33 +0200
Subject: [PATCH 277/407] target/s390x/tcg/mem_helper: Test the right bits in
 psw_key_valid()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [20/21] 00a243a96953023387bab6f1925b734755c53e6e

Bugzilla: https://bugzilla.redhat.com/2169308

commit 5e275ca6fb32bcb4b56b29e6acfd3cf306c4a180
Author: Thomas Huth <thuth@redhat.com>
Date:   Mon Dec 5 15:20:43 2022 +0100

    target/s390x/tcg/mem_helper: Test the right bits in psw_key_valid()

    The PSW key mask is a 16 bit field, and the psw_key variable is
    in the range from 0 to 15, so it does not make sense to use
    "0x80 >> psw_key" for testing the bits here. We should use 0x8000
    instead.

    Message-Id: <20221205142043.95185-1-thuth@redhat.com>
    Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>
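
Illustration (not part of the upstream change): a minimal sketch of the
two bit tests against a 16-bit key mask in which the leftmost bit
corresponds to key 0. The helper names are hypothetical and used only
in this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old test: probes bit 0x80 and below, so it never sees keys 0..7. */
static bool key_allowed_old(uint16_t pkm, uint8_t psw_key)
{
    return pkm & (0x80 >> psw_key);
}

/* New test: key 0 maps to 0x8000, key 15 maps to 0x0001. */
static bool key_allowed_new(uint16_t pkm, uint8_t psw_key)
{
    return pkm & (0x8000 >> psw_key);
}

void check_key_allowed(void)
{
    /* Mask permitting only key 0. */
    assert(!key_allowed_old(0x8000, 0));
    assert(key_allowed_new(0x8000, 0));

    /* Mask permitting only key 8: the old test reports key 0 instead. */
    assert(key_allowed_old(0x0080, 0));
    assert(!key_allowed_new(0x0080, 0));
    assert(key_allowed_new(0x0080, 8));

    /* Keys above 7 shift the old constant to zero and always fail. */
    assert(!key_allowed_old(0xffff, 9));
    assert(key_allowed_new(0xffff, 9));
}
```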

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 target/s390x/tcg/mem_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index 362a30d99e0..bd27c75dfbd 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -50,7 +50,7 @@ static inline bool psw_key_valid(CPUS390XState *env, uint8_t psw_key)
 
     if (env->psw.mask & PSW_MASK_PSTATE) {
         /* PSW key has range 0..15, it is valid if the bit is 1 in the PKM */
-        return pkm & (0x80 >> psw_key);
+        return pkm & (0x8000 >> psw_key);
     }
     return true;
 }
-- 
Gitee


From e2c5bf3ecf6d24bbdf4e5470dd03c2f1420e354e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Thu, 25 May 2023 12:50:19 +0200
Subject: [PATCH 278/407] pc-bios: Add support for List-Directed IPL from ECKD
 DASD
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 279: Backport latest s390x-related fixes from upstream QEMU for qemu-kvm in RHEL 8.9
RH-Bugzilla: 2169308 2209605
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [21/21] cab945af05566d892459a7c8ea3f114310d6bb67

Bugzilla: https://bugzilla.redhat.com/2209605

commit 8af5d141713f5d20c4bc1719eb746ef8b1746bd6
Author: Jared Rossi <jrossi@linux.ibm.com>
Date:   Tue Feb 21 12:45:48 2023 -0500

    pc-bios: Add support for List-Directed IPL from ECKD DASD

    Check for a List Directed IPL Boot Record, which would supersede the CCW type
    entries.  If the record is valid, proceed to use the new style pointers
    and perform LD-IPL. Each block pointer is interpreted as either an LD-IPL
    pointer or a legacy CCW pointer depending on the type of IPL initiated.

    In either case CCW- or LD-IPL is transparent to the user and will boot the same
    image regardless of which set of pointers is used. Because the interactive boot
    menu is only written with the old style pointers, the menu will be disabled for
    List Directed IPL from ECKD DASD.

    If the LD-IPL fails, retry the IPL using the CCW type pointers.

    If no LD-IPL boot record is found, simply perform CCW type IPL as usual.

    Signed-off-by: Jared Rossi <jrossi@linux.ibm.com>
    Message-Id: <20230221174548.1866861-2-jrossi@linux.ibm.com>
    [thuth: Drop some superfluous parentheses]
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
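Note: both pointer styles end up in the same cylinder/head/sector to
block-number arithmetic; only the field widths of the on-disk pointer
differ. The sketch below restates that arithmetic and the range checks
outside the firmware. ToyGeometry and the toy_* names are hypothetical
stand-ins for the virtio_get_heads()/virtio_get_sectors() queries, and
the geometry used in any call is an assumption of the caller.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t block_number_t;

/* Hypothetical geometry holder; the firmware asks virtio instead. */
typedef struct {
    uint64_t heads;
    uint64_t sectors;
} ToyGeometry;

/* Same arithmetic as eckd_chs_to_block() in the hunk below. */
static block_number_t toy_chs_to_block(const ToyGeometry *g,
                                       uint64_t c, uint64_t h, uint64_t s)
{
    /* The upper 12 bits of the head field extend the cylinder number. */
    const uint64_t cylinder = c + ((h & 0xfff0) << 12);
    const uint64_t head = h & 0x000f;

    /* Sectors are 1-based on disk, block numbers start at zero. */
    return g->sectors * g->heads * cylinder + g->sectors * head + s - 1;
}

/* Range checks in the style of eckd_valid_chs(), minus the size check. */
static bool toy_valid_chs(const ToyGeometry *g, uint64_t h, uint64_t s)
{
    return h < g->heads && s >= 1 && s <= g->sectors;
}
```

For a geometry of 15 heads and 12 sectors, CHS (1, 0, 1) maps to block
180, the first block of the second cylinder.
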
 pc-bios/s390-ccw/bootmap.c | 157 ++++++++++++++++++++++++++++---------
 pc-bios/s390-ccw/bootmap.h |  30 ++++++-
 2 files changed, 148 insertions(+), 39 deletions(-)

diff --git a/pc-bios/s390-ccw/bootmap.c b/pc-bios/s390-ccw/bootmap.c
index 994e59c0b02..a2137449dc3 100644
--- a/pc-bios/s390-ccw/bootmap.c
+++ b/pc-bios/s390-ccw/bootmap.c
@@ -72,42 +72,74 @@ static inline void verify_boot_info(BootInfo *bip)
                "Bad block size in zIPL section of the 1st record.");
 }
 
-static block_number_t eckd_block_num(EckdCHS *chs)
+static void eckd_format_chs(ExtEckdBlockPtr *ptr,  bool ldipl,
+                            uint64_t *c,
+                            uint64_t *h,
+                            uint64_t *s)
+{
+    if (ldipl) {
+        *c = ptr->ldptr.chs.cylinder;
+        *h = ptr->ldptr.chs.head;
+        *s = ptr->ldptr.chs.sector;
+    } else {
+        *c = ptr->bptr.chs.cylinder;
+        *h = ptr->bptr.chs.head;
+        *s = ptr->bptr.chs.sector;
+    }
+}
+
+static block_number_t eckd_chs_to_block(uint64_t c, uint64_t h, uint64_t s)
 {
     const uint64_t sectors = virtio_get_sectors();
     const uint64_t heads = virtio_get_heads();
-    const uint64_t cylinder = chs->cylinder
-                            + ((chs->head & 0xfff0) << 12);
-    const uint64_t head = chs->head & 0x000f;
+    const uint64_t cylinder = c + ((h & 0xfff0) << 12);
+    const uint64_t head = h & 0x000f;
     const block_number_t block = sectors * heads * cylinder
                                + sectors * head
-                               + chs->sector
-                               - 1; /* block nr starts with zero */
+                               + s - 1; /* block nr starts with zero */
     return block;
 }
 
-static bool eckd_valid_address(BootMapPointer *p)
+static block_number_t eckd_block_num(EckdCHS *chs)
 {
-    const uint64_t head = p->eckd.chs.head & 0x000f;
+    return eckd_chs_to_block(chs->cylinder, chs->head, chs->sector);
+}
+
+static block_number_t gen_eckd_block_num(ExtEckdBlockPtr *ptr, bool ldipl)
+{
+    uint64_t cyl, head, sec;
+    eckd_format_chs(ptr, ldipl, &cyl, &head, &sec);
+    return eckd_chs_to_block(cyl, head, sec);
+}
 
+static bool eckd_valid_chs(uint64_t cyl, uint64_t head, uint64_t sector)
+{
     if (head >= virtio_get_heads()
-        ||  p->eckd.chs.sector > virtio_get_sectors()
-        ||  p->eckd.chs.sector <= 0) {
+        || sector > virtio_get_sectors()
+        || sector <= 0) {
         return false;
     }
 
     if (!virtio_guessed_disk_nature() &&
-        eckd_block_num(&p->eckd.chs) >= virtio_get_blocks()) {
+        eckd_chs_to_block(cyl, head, sector) >= virtio_get_blocks()) {
         return false;
     }
 
     return true;
 }
 
-static block_number_t load_eckd_segments(block_number_t blk, uint64_t *address)
+static bool eckd_valid_address(ExtEckdBlockPtr *ptr, bool ldipl)
+{
+    uint64_t cyl, head, sec;
+    eckd_format_chs(ptr, ldipl, &cyl, &head, &sec);
+    return eckd_valid_chs(cyl, head, sec);
+}
+
+static block_number_t load_eckd_segments(block_number_t blk, bool ldipl,
+                                         uint64_t *address)
 {
     block_number_t block_nr;
-    int j, rc;
+    int j, rc, count;
     BootMapPointer *bprs = (void *)_bprs;
     bool more_data;
 
@@ -117,7 +149,7 @@ static block_number_t load_eckd_segments(block_number_t blk, uint64_t *address)
     do {
         more_data = false;
         for (j = 0;; j++) {
-            block_nr = eckd_block_num(&bprs[j].xeckd.bptr.chs);
+            block_nr = gen_eckd_block_num(&bprs[j].xeckd, ldipl);
             if (is_null_block_number(block_nr)) { /* end of chunk */
                 break;
             }
@@ -129,11 +161,26 @@ static block_number_t load_eckd_segments(block_number_t blk, uint64_t *address)
                 break;
             }
 
-            IPL_assert(block_size_ok(bprs[j].xeckd.bptr.size),
+            /* List directed pointer does not store block size */
+            IPL_assert(ldipl || block_size_ok(bprs[j].xeckd.bptr.size),
                        "bad chunk block size");
-            IPL_assert(eckd_valid_address(&bprs[j]), "bad chunk ECKD addr");
 
-            if ((bprs[j].xeckd.bptr.count == 0) && unused_space(&(bprs[j+1]),
+            if (!eckd_valid_address(&bprs[j].xeckd, ldipl)) {
+                /*
+                 * If an invalid address is found during LD-IPL then break and
+                 * retry as CCW
+                 */
+                IPL_assert(ldipl, "bad chunk ECKD addr");
+                break;
+            }
+
+            if (ldipl) {
+                count = bprs[j].xeckd.ldptr.count;
+            } else {
+                count = bprs[j].xeckd.bptr.count;
+            }
+
+            if (count == 0 && unused_space(&bprs[j + 1],
                 sizeof(EckdBlockPtr))) {
                 /* This is a "continue" pointer.
                  * This ptr should be the last one in the current
@@ -149,11 +196,10 @@ static block_number_t load_eckd_segments(block_number_t blk, uint64_t *address)
             /* Load (count+1) blocks of code at (block_nr)
              * to memory (address).
              */
-            rc = virtio_read_many(block_nr, (void *)(*address),
-                                  bprs[j].xeckd.bptr.count+1);
+            rc = virtio_read_many(block_nr, (void *)(*address), count + 1);
             IPL_assert(rc == 0, "code chunk read failed");
 
-            *address += (bprs[j].xeckd.bptr.count+1) * virtio_get_block_size();
+            *address += (count + 1) * virtio_get_block_size();
         }
     } while (more_data);
     return block_nr;
@@ -237,8 +283,10 @@ static void run_eckd_boot_script(block_number_t bmt_block_nr,
     uint64_t address;
     BootMapTable *bmt = (void *)sec;
     BootMapScript *bms = (void *)sec;
+    /* The S1B block number is NULL_BLOCK_NR if and only if it's an LD-IPL */
+    bool ldipl = (s1b_block_nr == NULL_BLOCK_NR);
 
-    if (menu_is_enabled_zipl()) {
+    if (menu_is_enabled_zipl() && !ldipl) {
         loadparm = eckd_get_boot_menu_index(s1b_block_nr);
     }
 
@@ -249,7 +297,7 @@ static void run_eckd_boot_script(block_number_t bmt_block_nr,
     memset(sec, FREE_SPACE_FILLER, sizeof(sec));
     read_block(bmt_block_nr, sec, "Cannot read Boot Map Table");
 
-    block_nr = eckd_block_num(&bmt->entry[loadparm].xeckd.bptr.chs);
+    block_nr = gen_eckd_block_num(&bmt->entry[loadparm].xeckd, ldipl);
     IPL_assert(block_nr != -1, "Cannot find Boot Map Table Entry");
 
     memset(sec, FREE_SPACE_FILLER, sizeof(sec));
@@ -264,13 +312,18 @@ static void run_eckd_boot_script(block_number_t bmt_block_nr,
         }
 
         address = bms->entry[i].address.load_address;
-        block_nr = eckd_block_num(&bms->entry[i].blkptr.xeckd.bptr.chs);
+        block_nr = gen_eckd_block_num(&bms->entry[i].blkptr.xeckd, ldipl);
 
         do {
-            block_nr = load_eckd_segments(block_nr, &address);
+            block_nr = load_eckd_segments(block_nr, ldipl, &address);
         } while (block_nr != -1);
     }
 
+    if (ldipl && bms->entry[i].type != BOOT_SCRIPT_EXEC) {
+        /* Abort LD-IPL and retry as CCW-IPL */
+        return;
+    }
+
     IPL_assert(bms->entry[i].type == BOOT_SCRIPT_EXEC,
                "Unknown script entry type");
     write_reset_psw(bms->entry[i].address.load_address); /* no return */
@@ -380,6 +433,23 @@ static void ipl_eckd_ldl(ECKD_IPL_mode_t mode)
     /* no return */
 }
 
+static block_number_t eckd_find_bmt(ExtEckdBlockPtr *ptr)
+{
+    block_number_t blockno;
+    uint8_t tmp_sec[MAX_SECTOR_SIZE];
+    BootRecord *br;
+
+    blockno = gen_eckd_block_num(ptr, 0);
+    read_block(blockno, tmp_sec, "Cannot read boot record");
+    br = (BootRecord *)tmp_sec;
+    if (!magic_match(br->magic, ZIPL_MAGIC)) {
+        /* If the boot record is invalid, return and try CCW-IPL instead */
+        return NULL_BLOCK_NR;
+    }
+
+    return gen_eckd_block_num(&br->pgt.xeckd, 1);
+}
+
 static void print_eckd_msg(void)
 {
     char msg[] = "Using ECKD scheme (block size *****), ";
@@ -401,28 +471,43 @@ static void print_eckd_msg(void)
 
 static void ipl_eckd(void)
 {
-    XEckdMbr *mbr = (void *)sec;
-    LDL_VTOC *vlbl = (void *)sec;
+    IplVolumeLabel *vlbl = (void *)sec;
+    LDL_VTOC *vtoc = (void *)sec;
+    block_number_t ldipl_bmt; /* Boot Map Table for List-Directed IPL */
 
     print_eckd_msg();
 
-    /* Grab the MBR again */
+    /* Block 2 can contain either the CDL VOL1 label or the LDL VTOC */
     memset(sec, FREE_SPACE_FILLER, sizeof(sec));
-    read_block(0, mbr, "Cannot read block 0 on DASD");
+    read_block(2, vlbl, "Cannot read block 2");
 
-    if (magic_match(mbr->magic, IPL1_MAGIC)) {
-        ipl_eckd_cdl();         /* only returns in case of error */
-        return;
+    /*
+     * First check for a list-directed-format pointer which would
+     * supersede the CCW pointer.
+     */
+    if (eckd_valid_address((ExtEckdBlockPtr *)&vlbl->f.br, 0)) {
+        ldipl_bmt = eckd_find_bmt((ExtEckdBlockPtr *)&vlbl->f.br);
+        if (ldipl_bmt) {
+            sclp_print("List-Directed\n");
+            /* LD-IPL does not use the S1B block, just make it NULL */
+            run_eckd_boot_script(ldipl_bmt, NULL_BLOCK_NR);
+            /* Only return in error, retry as CCW-IPL */
+            sclp_print("Retrying IPL ");
+            print_eckd_msg();
+        }
+        memset(sec, FREE_SPACE_FILLER, sizeof(sec));
+        read_block(2, vtoc, "Cannot read block 2");
     }
 
-    /* LDL/CMS? */
-    memset(sec, FREE_SPACE_FILLER, sizeof(sec));
-    read_block(2, vlbl, "Cannot read block 2");
+    /* Not list-directed */
+    if (magic_match(vtoc->magic, VOL1_MAGIC)) {
+        ipl_eckd_cdl(); /* may return in error */
+    }
 
-    if (magic_match(vlbl->magic, CMS1_MAGIC)) {
+    if (magic_match(vtoc->magic, CMS1_MAGIC)) {
         ipl_eckd_ldl(ECKD_CMS); /* no return */
     }
-    if (magic_match(vlbl->magic, LNX1_MAGIC)) {
+    if (magic_match(vtoc->magic, LNX1_MAGIC)) {
         ipl_eckd_ldl(ECKD_LDL); /* no return */
     }
 
diff --git a/pc-bios/s390-ccw/bootmap.h b/pc-bios/s390-ccw/bootmap.h
index 3946aa3f8d2..d4690a88c28 100644
--- a/pc-bios/s390-ccw/bootmap.h
+++ b/pc-bios/s390-ccw/bootmap.h
@@ -45,9 +45,23 @@ typedef struct EckdBlockPtr {
                     * it's 0 for TablePtr, ScriptPtr, and SectionPtr */
 } __attribute__ ((packed)) EckdBlockPtr;
 
-typedef struct ExtEckdBlockPtr {
+typedef struct LdEckdCHS {
+    uint32_t cylinder;
+    uint8_t head;
+    uint8_t sector;
+} __attribute__ ((packed)) LdEckdCHS;
+
+typedef struct LdEckdBlockPtr {
+    LdEckdCHS chs; /* cylinder/head/sector is an address of the block */
+    uint8_t reserved[4];
+    uint16_t count;
+    uint32_t pad;
+} __attribute__ ((packed)) LdEckdBlockPtr;
+
+/* bptr is used for CCW type IPL, while ldptr is for list-directed IPL */
+typedef union ExtEckdBlockPtr {
     EckdBlockPtr bptr;
-    uint8_t reserved[8];
+    LdEckdBlockPtr ldptr;
 } __attribute__ ((packed)) ExtEckdBlockPtr;
 
 typedef union BootMapPointer {
@@ -57,6 +71,15 @@ typedef union BootMapPointer {
     ExtEckdBlockPtr xeckd;
 } __attribute__ ((packed)) BootMapPointer;
 
+typedef struct BootRecord {
+    uint8_t magic[4];
+    uint32_t version;
+    uint64_t res1;
+    BootMapPointer pgt;
+    uint8_t reserved[510 - 32];
+    uint16_t os_id;
+} __attribute__ ((packed)) BootRecord;
+
 /* aka Program Table */
 typedef struct BootMapTable {
     uint8_t magic[4];
@@ -292,7 +315,8 @@ typedef struct IplVolumeLabel {
         struct {
             unsigned char key[4]; /* == "VOL1" */
             unsigned char volser[6];
-            unsigned char reserved[6];
+            unsigned char reserved[64];
+            EckdCHS br; /* Location of Boot Record for list-directed IPL */
         } f;
     };
 } __attribute__((packed)) IplVolumeLabel;
-- 
Gitee


From 21cf16fd3e9a6c0f01ce18a394aa70e74e856f59 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 9 May 2023 10:29:03 -0400
Subject: [PATCH 279/407] memory: prevent dma-reentracy issues

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 277: memory: prevent dma-reentracy issues
RH-Bugzilla: 1999236
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [1/12] 8fced41b4b2105343e8f0250286b771bcb43c81f (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999236
Upstream: Merged
CVE: CVE-2021-3750
CVE: CVE-2023-0330

commit a2e1753b8054344f32cf94f31c6399a58794a380
Author: Alexander Bulekov <alxndr@bu.edu>
Date:   Thu Apr 27 17:10:06 2023 -0400

    memory: prevent dma-reentracy issues

    Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
    This flag is set/checked prior to calling a device's MemoryRegion
    handlers, and set when device code initiates DMA.  The purpose of this
    flag is to prevent two types of DMA-based reentrancy issues:

    1.) mmio -> dma -> mmio case
    2.) bh -> dma write -> mmio case

    These issues have led to problems such as stack-exhaustion and
    use-after-frees.

    Summary of the problem from Peter Maydell:
    https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com

    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1282
    Resolves: CVE-2023-0330

    Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Message-Id: <20230427211013.2994127-2-alxndr@bu.edu>
    [thuth: Replace warn_report() with warn_report_once()]
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
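Note: the mmio -> dma -> mmio case can be modelled without any QEMU
types. The sketch below is a toy under stated assumptions: ToyDevice,
toy_access() and toy_handler() are hypothetical names, and the handler
simply calls back into the access path to stand in for a DMA that lands
on the device's own MMIO region. Only the flag name engaged_in_io comes
from this patch.

```c
#include <stdbool.h>

/* Stands in for MemReentrancyGuard. */
typedef struct {
    bool engaged_in_io;
} ToyGuard;

typedef struct {
    ToyGuard guard;
    int handled;   /* accesses that reached the handler */
    int blocked;   /* accesses rejected as re-entrant */
} ToyDevice;

static int toy_access(ToyDevice *dev);

/* Hypothetical handler whose "DMA" targets its own MMIO region. */
static void toy_handler(ToyDevice *dev)
{
    dev->handled++;
    if (toy_access(dev) != 0) {
        dev->blocked++;
    }
}

/* Check the flag, set it around the handler, clear it afterwards. */
static int toy_access(ToyDevice *dev)
{
    if (dev->guard.engaged_in_io) {
        return -1;              /* nested access: refuse it */
    }
    dev->guard.engaged_in_io = true;
    toy_handler(dev);
    dev->guard.engaged_in_io = false;
    return 0;
}
```

The outer access succeeds and the nested one is refused, so the
recursion stops after one level instead of exhausting the stack.
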
 include/exec/memory.h  |  5 +++++
 include/hw/qdev-core.h |  7 +++++++
 softmmu/memory.c       | 16 ++++++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 20f1b27377e..e089f90f9bc 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -734,6 +734,8 @@ struct MemoryRegion {
     bool is_iommu;
     RAMBlock *ram_block;
     Object *owner;
+    /* owner as TYPE_DEVICE. Used for re-entrancy checks in MR access hotpath */
+    DeviceState *dev;
 
     const MemoryRegionOps *ops;
     void *opaque;
@@ -757,6 +759,9 @@ struct MemoryRegion {
     unsigned ioeventfd_nb;
     MemoryRegionIoeventfd *ioeventfds;
     RamDiscardManager *rdm; /* Only for RAM */
+
+    /* For devices designed to perform re-entrant IO into their own IO MRs */
+    bool disable_reentrancy_guard;
 };
 
 struct IOMMUMemoryRegion {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 20d3066595e..14226f860d4 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -162,6 +162,10 @@ struct NamedClockList {
     QLIST_ENTRY(NamedClockList) node;
 };
 
+typedef struct {
+    bool engaged_in_io;
+} MemReentrancyGuard;
+
 /**
  * DeviceState:
  * @realized: Indicates whether the device has been fully constructed.
@@ -193,6 +197,9 @@ struct DeviceState {
     int instance_id_alias;
     int alias_required_for_version;
     ResettableState reset;
+
+    /* Is the device currently in mmio/pio/dma? Used to prevent re-entrancy */
+    MemReentrancyGuard mem_reentrancy_guard;
 };
 
 struct DeviceListener {
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7340e19ff5e..102f0a42482 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -541,6 +541,18 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
         access_size_max = 4;
     }
 
+    /* Do not allow more than one simultaneous access to a device's IO Regions */
+    if (mr->dev && !mr->disable_reentrancy_guard &&
+        !mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
+        if (mr->dev->mem_reentrancy_guard.engaged_in_io) {
+            warn_report_once("Blocked re-entrant IO on MemoryRegion: "
+                             "%s at addr: 0x%" HWADDR_PRIX,
+                             memory_region_name(mr), addr);
+            return MEMTX_ACCESS_ERROR;
+        }
+        mr->dev->mem_reentrancy_guard.engaged_in_io = true;
+    }
+
     /* FIXME: support unaligned access? */
     access_size = MAX(MIN(size, access_size_max), access_size_min);
     access_mask = MAKE_64BIT_MASK(0, access_size * 8);
@@ -555,6 +567,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
                         access_mask, attrs);
         }
     }
+    if (mr->dev) {
+        mr->dev->mem_reentrancy_guard.engaged_in_io = false;
+    }
     return r;
 }
 
@@ -1169,6 +1184,7 @@ static void memory_region_do_init(MemoryRegion *mr,
     }
     mr->name = g_strdup(name);
     mr->owner = owner;
+    mr->dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
     mr->ram_block = NULL;
 
     if (name) {
-- 
Gitee


From 018d782b7dd26bf8e3f89c8ccb48d5f326b491d4 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 9 May 2023 10:29:03 -0400
Subject: [PATCH 280/407] async: Add an optional reentrancy guard to the BH API

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 277: memory: prevent dma-reentracy issues
RH-Bugzilla: 1999236
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [2/12] b03f247e242a6cdb3eebec36477234ac77dcd20c (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999236
Upstream: Merged
CVE: CVE-2021-3750
Conflict: The file block/graph-lock.h, included from include/block/aio.h,
          doesn't exist in this code version. The code compiles without
          issues if this include is just omitted, so we do that.

commit 9c86c97f12c060bf7484dd931f38634e166a81f0
Author: Alexander Bulekov <alxndr@bu.edu>
Date:   Thu Apr 27 17:10:07 2023 -0400

    async: Add an optional reentrancy guard to the BH API

    Devices can pass their MemReentrancyGuard (from their DeviceState),
    when creating new BHes. Then, the async API will toggle the guard
    before/after calling the BH call-back. This prevents bh->mmio reentrancy
    issues.

    Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
    Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
    Message-Id: <20230427211013.2994127-3-alxndr@bu.edu>
    [thuth: Fix "line over 90 characters" checkpatch.pl error]
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
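Note: unlike the memory access path, the BH path saves the previous
flag value and restores it after the callback instead of clearing it
unconditionally. A standalone sketch of that save/restore pattern
follows; ToyBH, toy_bh_call() and the probe callback are hypothetical
and exist only to make the behaviour observable outside QEMU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stands in for MemReentrancyGuard. */
typedef struct {
    bool engaged_in_io;
} ToyGuard;

typedef void ToyBHFunc(void *opaque);

typedef struct {
    ToyBHFunc *cb;
    void *opaque;
    ToyGuard *guard;    /* NULL means the BH is unguarded */
} ToyBH;

/* Same shape as aio_bh_call() after this patch, without tracing. */
static void toy_bh_call(ToyBH *bh)
{
    bool last_engaged_in_io = false;

    if (bh->guard) {
        last_engaged_in_io = bh->guard->engaged_in_io;
        bh->guard->engaged_in_io = true;
    }

    bh->cb(bh->opaque);

    if (bh->guard) {
        bh->guard->engaged_in_io = last_engaged_in_io;
    }
}

/* Hypothetical probe: records what the callback saw in the guard. */
typedef struct {
    ToyGuard *guard;
    bool seen_engaged;
} ToyProbe;

static void toy_probe_cb(void *opaque)
{
    ToyProbe *p = opaque;
    p->seen_engaged = p->guard->engaged_in_io;
}
```

While the callback runs the guard reads as engaged, so any MMIO access
the callback triggers on the same device would be refused by the check
added in the previous patch.
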
 docs/devel/multiple-iothreads.txt |  7 +++++++
 include/block/aio.h               | 18 ++++++++++++++++--
 include/qemu/main-loop.h          |  7 +++++--
 tests/unit/ptimer-test-stubs.c    |  3 ++-
 util/async.c                      | 18 +++++++++++++++++-
 util/main-loop.c                  |  6 ++++--
 util/trace-events                 |  1 +
 7 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/docs/devel/multiple-iothreads.txt b/docs/devel/multiple-iothreads.txt
index aeb997bed5f..a11576bc744 100644
--- a/docs/devel/multiple-iothreads.txt
+++ b/docs/devel/multiple-iothreads.txt
@@ -61,6 +61,7 @@ There are several old APIs that use the main loop AioContext:
  * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
  * LEGACY timer_new_ms() - create a timer
  * LEGACY qemu_bh_new() - create a BH
+ * LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
  * LEGACY qemu_aio_wait() - run an event loop iteration
 
 Since they implicitly work on the main loop they cannot be used in code that
@@ -72,8 +73,14 @@ Instead, use the AioContext functions directly (see include/block/aio.h):
  * aio_set_event_notifier() - monitor an event notifier
  * aio_timer_new() - create a timer
  * aio_bh_new() - create a BH
+ * aio_bh_new_guarded() - create a BH with a device re-entrancy guard
  * aio_poll() - run an event loop iteration
 
+The qemu_bh_new_guarded/aio_bh_new_guarded APIs accept a "MemReentrancyGuard"
+argument, which is used to check for and prevent re-entrancy problems. For
+BHs associated with devices, the reentrancy-guard is contained in the
+corresponding DeviceState and named "mem_reentrancy_guard".
+
 The AioContext can be obtained from the IOThread using
 iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
 Code that takes an AioContext argument works both in IOThreads or the main
diff --git a/include/block/aio.h b/include/block/aio.h
index 47fbe9d81f2..c7da152985e 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -22,6 +22,8 @@
 #include "qemu/event_notifier.h"
 #include "qemu/thread.h"
 #include "qemu/timer.h"
+#include "hw/qdev-core.h"
+
 
 typedef struct BlockAIOCB BlockAIOCB;
 typedef void BlockCompletionFunc(void *opaque, int ret);
@@ -321,9 +323,11 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
  * is opaque and must be allocated prior to its use.
  *
  * @name: A human-readable identifier for debugging purposes.
+ * @reentrancy_guard: A guard set when entering a cb to prevent
+ * device-reentrancy issues
  */
 QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
-                        const char *name);
+                        const char *name, MemReentrancyGuard *reentrancy_guard);
 
 /**
  * aio_bh_new: Allocate a new bottom half structure
@@ -332,7 +336,17 @@ QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
  * string.
  */
 #define aio_bh_new(ctx, cb, opaque) \
-    aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)))
+    aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), NULL)
+
+/**
+ * aio_bh_new_guarded: Allocate a new bottom half structure with a
+ * reentrancy_guard
+ *
+ * A convenience wrapper for aio_bh_new_full() that uses the cb as the name
+ * string.
+ */
+#define aio_bh_new_guarded(ctx, cb, opaque, guard) \
+    aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), guard)
 
 /**
  * aio_notify: Force processing of pending events.
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index 8dbc6fcb894..85dd5ada9ec 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -294,9 +294,12 @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
 
 void qemu_fd_register(int fd);
 
+#define qemu_bh_new_guarded(cb, opaque, guard) \
+    qemu_bh_new_full((cb), (opaque), (stringify(cb)), guard)
 #define qemu_bh_new(cb, opaque) \
-    qemu_bh_new_full((cb), (opaque), (stringify(cb)))
-QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name);
+    qemu_bh_new_full((cb), (opaque), (stringify(cb)), NULL)
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name,
+                         MemReentrancyGuard *reentrancy_guard);
 void qemu_bh_schedule_idle(QEMUBH *bh);
 
 enum {
diff --git a/tests/unit/ptimer-test-stubs.c b/tests/unit/ptimer-test-stubs.c
index 2a3ef587991..a7a2d08e7ec 100644
--- a/tests/unit/ptimer-test-stubs.c
+++ b/tests/unit/ptimer-test-stubs.c
@@ -108,7 +108,8 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type, int attr_mask)
     return deadline;
 }
 
-QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name,
+                         MemReentrancyGuard *reentrancy_guard)
 {
     QEMUBH *bh = g_new(QEMUBH, 1);
 
diff --git a/util/async.c b/util/async.c
index 2a63bf90f2c..1fff02e7fc9 100644
--- a/util/async.c
+++ b/util/async.c
@@ -62,6 +62,7 @@ struct QEMUBH {
     void *opaque;
     QSLIST_ENTRY(QEMUBH) next;
     unsigned flags;
+    MemReentrancyGuard *reentrancy_guard;
 };
 
 /* Called concurrently from any thread */
@@ -127,7 +128,7 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb,
 }
 
 QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
-                        const char *name)
+                        const char *name, MemReentrancyGuard *reentrancy_guard)
 {
     QEMUBH *bh;
     bh = g_new(QEMUBH, 1);
@@ -136,13 +137,28 @@ QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
         .cb = cb,
         .opaque = opaque,
         .name = name,
+        .reentrancy_guard = reentrancy_guard,
     };
     return bh;
 }
 
 void aio_bh_call(QEMUBH *bh)
 {
+    bool last_engaged_in_io = false;
+
+    if (bh->reentrancy_guard) {
+        last_engaged_in_io = bh->reentrancy_guard->engaged_in_io;
+        if (bh->reentrancy_guard->engaged_in_io) {
+            trace_reentrant_aio(bh->ctx, bh->name);
+        }
+        bh->reentrancy_guard->engaged_in_io = true;
+    }
+
     bh->cb(bh->opaque);
+
+    if (bh->reentrancy_guard) {
+        bh->reentrancy_guard->engaged_in_io = last_engaged_in_io;
+    }
 }
 
 /* Multiple occurrences of aio_bh_poll cannot be called concurrently. */
diff --git a/util/main-loop.c b/util/main-loop.c
index 06b18b195cf..1eacf04691c 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -544,9 +544,11 @@ void main_loop_wait(int nonblocking)
 
 /* Functions to operate on the main QEMU AioContext.  */
 
-QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name,
+                         MemReentrancyGuard *reentrancy_guard)
 {
-    return aio_bh_new_full(qemu_aio_context, cb, opaque, name);
+    return aio_bh_new_full(qemu_aio_context, cb, opaque, name,
+                           reentrancy_guard);
 }
 
 /*
diff --git a/util/trace-events b/util/trace-events
index c8f53d7d9fc..dc3b1eb3bfe 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -11,6 +11,7 @@ poll_remove(void *ctx, void *node, int fd) "ctx %p node %p fd %d"
 # async.c
 aio_co_schedule(void *ctx, void *co) "ctx %p co %p"
 aio_co_schedule_bh_cb(void *ctx, void *co) "ctx %p co %p"
+reentrant_aio(void *ctx, const char *name) "ctx %p name %s"
 
 # thread-pool.c
 thread_pool_submit(void *pool, void *req, void *opaque) "pool %p req %p opaque %p"
-- 
Gitee


From 3401156c74c4b2f296aeb9ec559815072a400f11 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 9 May 2023 10:29:03 -0400
Subject: [PATCH 281/407] checkpatch: add qemu_bh_new/aio_bh_new checks

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 277: memory: prevent dma-reentracy issues
RH-Bugzilla: 1999236
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [3/12] 620b480b0878c18223f3cc103450bc16aa6d7e21 (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999236
Upstream: Merged
CVE: CVE-2021-3750

commit ef56ffbdd6b0605dc1e305611287b948c970e236
Author: Alexander Bulekov <alxndr@bu.edu>
Date:   Thu Apr 27 17:10:08 2023 -0400

    checkpatch: add qemu_bh_new/aio_bh_new checks

    Advise authors to use the _guarded versions of the APIs, instead.

    Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
    Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
    Message-Id: <20230427211013.2994127-4-alxndr@bu.edu>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 scripts/checkpatch.pl | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index cb8eff233e0..b2428e80ccf 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2858,6 +2858,14 @@ sub process {
 		if ($line =~ /\bsignal\s*\(/ && !($line =~ /SIG_(?:IGN|DFL)/)) {
 			ERROR("use sigaction to establish signal handlers; signal is not portable\n" . $herecurr);
 		}
+# recommend qemu_bh_new_guarded instead of qemu_bh_new
+        if ($realfile =~ /.*\/hw\/.*/ && $line =~ /\bqemu_bh_new\s*\(/) {
+			ERROR("use qemu_bh_new_guarded() instead of qemu_bh_new() to avoid reentrancy problems\n" . $herecurr);
+		}
+# recommend aio_bh_new_guarded instead of aio_bh_new
+        if ($realfile =~ /.*\/hw\/.*/ && $line =~ /\baio_bh_new\s*\(/) {
+			ERROR("use aio_bh_new_guarded() instead of aio_bh_new() to avoid reentrancy problems\n" . $herecurr);
+		}
 # check for module_init(), use category-specific init macros explicitly please
 		if ($line =~ /^module_init\s*\(/) {
 			ERROR("please use block_init(), type_init() etc. instead of module_init()\n" . $herecurr);
-- 
Gitee


From adb536321629200ac0975668135321fa76470407 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 9 May 2023 10:29:03 -0400
Subject: [PATCH 282/407] hw: replace most qemu_bh_new calls with
 qemu_bh_new_guarded

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 277: memory: prevent dma-reentracy issues
RH-Bugzilla: 1999236
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [4/12] 00c51d30246b3aa529f6043e35ee471660aa1fce (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999236
Upstream: Merged
CVE: CVE-2021-3750
Conflicts: In hw/nvme/ctrl.c there are no calls to qemu_bh_new() at the two locations
           the replacement is done in the upstream commit. Instead, timer_new_ns() is
           used. We leave these functions unaltered.

commit f63192b0544af5d3e4d5edfd85ab520fcf671377
Author: Alexander Bulekov <alxndr@bu.edu>
Date:   Thu Apr 27 17:10:09 2023 -0400

    hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

    This protects devices from bh->mmio reentrancy issues.

    Thanks: Thomas Huth <thuth@redhat.com> for diagnosing OS X test failure.
    Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
    Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Paul Durrant <paul@xen.org>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Message-Id: <20230427211013.2994127-5-alxndr@bu.edu>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/9pfs/xen-9p-backend.c        | 5 ++++-
 hw/block/dataplane/virtio-blk.c | 3 ++-
 hw/block/dataplane/xen-block.c  | 5 +++--
 hw/char/virtio-serial-bus.c     | 3 ++-
 hw/display/qxl.c                | 9 ++++++---
 hw/display/virtio-gpu.c         | 6 ++++--
 hw/ide/ahci.c                   | 3 ++-
 hw/ide/ahci_internal.h          | 1 +
 hw/ide/core.c                   | 4 +++-
 hw/misc/imx_rngc.c              | 6 ++++--
 hw/misc/macio/mac_dbdma.c       | 2 +-
 hw/net/virtio-net.c             | 3 ++-
 hw/scsi/mptsas.c                | 3 ++-
 hw/scsi/scsi-bus.c              | 3 ++-
 hw/scsi/vmw_pvscsi.c            | 3 ++-
 hw/usb/dev-uas.c                | 3 ++-
 hw/usb/hcd-dwc2.c               | 3 ++-
 hw/usb/hcd-ehci.c               | 3 ++-
 hw/usb/hcd-uhci.c               | 2 +-
 hw/usb/host-libusb.c            | 6 ++++--
 hw/usb/redirect.c               | 6 ++++--
 hw/usb/xen-usb.c                | 3 ++-
 hw/virtio/virtio-balloon.c      | 5 +++--
 hw/virtio/virtio-crypto.c       | 3 ++-
 24 files changed, 62 insertions(+), 31 deletions(-)

diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
index 65c4979c3c5..09f7c135880 100644
--- a/hw/9pfs/xen-9p-backend.c
+++ b/hw/9pfs/xen-9p-backend.c
@@ -60,6 +60,7 @@ typedef struct Xen9pfsDev {
 
     int num_rings;
     Xen9pfsRing *rings;
+    MemReentrancyGuard mem_reentrancy_guard;
 } Xen9pfsDev;
 
 static void xen_9pfs_disconnect(struct XenLegacyDevice *xendev);
@@ -441,7 +442,9 @@ static int xen_9pfs_connect(struct XenLegacyDevice *xendev)
         xen_9pdev->rings[i].ring.out = xen_9pdev->rings[i].data +
                                        XEN_FLEX_RING_SIZE(ring_order);
 
-        xen_9pdev->rings[i].bh = qemu_bh_new(xen_9pfs_bh, &xen_9pdev->rings[i]);
+        xen_9pdev->rings[i].bh = qemu_bh_new_guarded(xen_9pfs_bh,
+                                                     &xen_9pdev->rings[i],
+                                                     &xen_9pdev->mem_reentrancy_guard);
         xen_9pdev->rings[i].out_cons = 0;
         xen_9pdev->rings[i].out_size = 0;
         xen_9pdev->rings[i].inprogress = false;
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index ee5a5352dc8..5f0de7da1e3 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -127,7 +127,8 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
     } else {
         s->ctx = qemu_get_aio_context();
     }
-    s->bh = aio_bh_new(s->ctx, notify_guest_bh, s);
+    s->bh = aio_bh_new_guarded(s->ctx, notify_guest_bh, s,
+                               &DEVICE(vdev)->mem_reentrancy_guard);
     s->batch_notify_vqs = bitmap_new(conf->num_queues);
 
     *dataplane = s;
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index 860787580a3..07855feea6c 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -631,8 +631,9 @@ XenBlockDataPlane *xen_block_dataplane_create(XenDevice *xendev,
     } else {
         dataplane->ctx = qemu_get_aio_context();
     }
-    dataplane->bh = aio_bh_new(dataplane->ctx, xen_block_dataplane_bh,
-                               dataplane);
+    dataplane->bh = aio_bh_new_guarded(dataplane->ctx, xen_block_dataplane_bh,
+                                       dataplane,
+                                       &DEVICE(xendev)->mem_reentrancy_guard);
 
     return dataplane;
 }
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index f01ec2137c9..f18124b1553 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -985,7 +985,8 @@ static void virtser_port_device_realize(DeviceState *dev, Error **errp)
         return;
     }
 
-    port->bh = qemu_bh_new(flush_queued_data_bh, port);
+    port->bh = qemu_bh_new_guarded(flush_queued_data_bh, port,
+                                   &dev->mem_reentrancy_guard);
     port->elem = NULL;
 }
 
diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index bcd9e8716aa..0f663b99124 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -2206,11 +2206,14 @@ static void qxl_realize_common(PCIQXLDevice *qxl, Error **errp)
 
     qemu_add_vm_change_state_handler(qxl_vm_change_state_handler, qxl);
 
-    qxl->update_irq = qemu_bh_new(qxl_update_irq_bh, qxl);
+    qxl->update_irq = qemu_bh_new_guarded(qxl_update_irq_bh, qxl,
+                                          &DEVICE(qxl)->mem_reentrancy_guard);
     qxl_reset_state(qxl);
 
-    qxl->update_area_bh = qemu_bh_new(qxl_render_update_area_bh, qxl);
-    qxl->ssd.cursor_bh = qemu_bh_new(qemu_spice_cursor_refresh_bh, &qxl->ssd);
+    qxl->update_area_bh = qemu_bh_new_guarded(qxl_render_update_area_bh, qxl,
+                                              &DEVICE(qxl)->mem_reentrancy_guard);
+    qxl->ssd.cursor_bh = qemu_bh_new_guarded(qemu_spice_cursor_refresh_bh, &qxl->ssd,
+                                             &DEVICE(qxl)->mem_reentrancy_guard);
 }
 
 static void qxl_realize_primary(PCIDevice *dev, Error **errp)
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index d78b9700c7d..ecf9079145e 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1332,8 +1332,10 @@ void virtio_gpu_device_realize(DeviceState *qdev, Error **errp)
 
     g->ctrl_vq = virtio_get_queue(vdev, 0);
     g->cursor_vq = virtio_get_queue(vdev, 1);
-    g->ctrl_bh = qemu_bh_new(virtio_gpu_ctrl_bh, g);
-    g->cursor_bh = qemu_bh_new(virtio_gpu_cursor_bh, g);
+    g->ctrl_bh = qemu_bh_new_guarded(virtio_gpu_ctrl_bh, g,
+                                     &qdev->mem_reentrancy_guard);
+    g->cursor_bh = qemu_bh_new_guarded(virtio_gpu_cursor_bh, g,
+                                       &qdev->mem_reentrancy_guard);
     QTAILQ_INIT(&g->reslist);
     QTAILQ_INIT(&g->cmdq);
     QTAILQ_INIT(&g->fenceq);
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index a94c6e26fb0..7488b28065e 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1504,7 +1504,8 @@ static void ahci_cmd_done(const IDEDMA *dma)
     ahci_write_fis_d2h(ad);
 
     if (ad->port_regs.cmd_issue && !ad->check_bh) {
-        ad->check_bh = qemu_bh_new(ahci_check_cmd_bh, ad);
+        ad->check_bh = qemu_bh_new_guarded(ahci_check_cmd_bh, ad,
+                                           &ad->mem_reentrancy_guard);
         qemu_bh_schedule(ad->check_bh);
     }
 }
diff --git a/hw/ide/ahci_internal.h b/hw/ide/ahci_internal.h
index 109de9e2d11..a7768dd69e4 100644
--- a/hw/ide/ahci_internal.h
+++ b/hw/ide/ahci_internal.h
@@ -321,6 +321,7 @@ struct AHCIDevice {
     bool init_d2h_sent;
     AHCICmdHdr *cur_cmd;
     NCQTransferState ncq_tfs[AHCI_MAX_CMDS];
+    MemReentrancyGuard mem_reentrancy_guard;
 };
 
 struct AHCIPCIState {
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 15138225be4..05a32d0a99a 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -510,6 +510,7 @@ BlockAIOCB *ide_issue_trim(
         BlockCompletionFunc *cb, void *cb_opaque, void *opaque)
 {
     IDEState *s = opaque;
+    IDEDevice *dev = s->unit ? s->bus->slave : s->bus->master;
     TrimAIOCB *iocb;
 
     /* Paired with a decrement in ide_trim_bh_cb() */
@@ -517,7 +518,8 @@ BlockAIOCB *ide_issue_trim(
 
     iocb = blk_aio_get(&trim_aiocb_info, s->blk, cb, cb_opaque);
     iocb->s = s;
-    iocb->bh = qemu_bh_new(ide_trim_bh_cb, iocb);
+    iocb->bh = qemu_bh_new_guarded(ide_trim_bh_cb, iocb,
+                                   &DEVICE(dev)->mem_reentrancy_guard);
     iocb->ret = 0;
     iocb->qiov = qiov;
     iocb->i = -1;
diff --git a/hw/misc/imx_rngc.c b/hw/misc/imx_rngc.c
index 632c03779cb..082c6980ad5 100644
--- a/hw/misc/imx_rngc.c
+++ b/hw/misc/imx_rngc.c
@@ -228,8 +228,10 @@ static void imx_rngc_realize(DeviceState *dev, Error **errp)
     sysbus_init_mmio(sbd, &s->iomem);
 
     sysbus_init_irq(sbd, &s->irq);
-    s->self_test_bh = qemu_bh_new(imx_rngc_self_test, s);
-    s->seed_bh = qemu_bh_new(imx_rngc_seed, s);
+    s->self_test_bh = qemu_bh_new_guarded(imx_rngc_self_test, s,
+                                          &dev->mem_reentrancy_guard);
+    s->seed_bh = qemu_bh_new_guarded(imx_rngc_seed, s,
+                                     &dev->mem_reentrancy_guard);
 }
 
 static void imx_rngc_reset(DeviceState *dev)
diff --git a/hw/misc/macio/mac_dbdma.c b/hw/misc/macio/mac_dbdma.c
index e220f1a9277..f6a9e76fe74 100644
--- a/hw/misc/macio/mac_dbdma.c
+++ b/hw/misc/macio/mac_dbdma.c
@@ -912,7 +912,7 @@ static void mac_dbdma_realize(DeviceState *dev, Error **errp)
 {
     DBDMAState *s = MAC_DBDMA(dev);
 
-    s->bh = qemu_bh_new(DBDMA_run_bh, s);
+    s->bh = qemu_bh_new_guarded(DBDMA_run_bh, s, &dev->mem_reentrancy_guard);
 }
 
 static void mac_dbdma_class_init(ObjectClass *oc, void *data)
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7e172ef8297..ddaa8fa1224 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2753,7 +2753,8 @@ static void virtio_net_add_queue(VirtIONet *n, int index)
         n->vqs[index].tx_vq =
             virtio_add_queue(vdev, n->net_conf.tx_queue_size,
                              virtio_net_handle_tx_bh);
-        n->vqs[index].tx_bh = qemu_bh_new(virtio_net_tx_bh, &n->vqs[index]);
+        n->vqs[index].tx_bh = qemu_bh_new_guarded(virtio_net_tx_bh, &n->vqs[index],
+                                                  &DEVICE(vdev)->mem_reentrancy_guard);
     }
 
     n->vqs[index].tx_waiting = 0;
diff --git a/hw/scsi/mptsas.c b/hw/scsi/mptsas.c
index f6c77655443..ab8aaca85d8 100644
--- a/hw/scsi/mptsas.c
+++ b/hw/scsi/mptsas.c
@@ -1313,7 +1313,8 @@ static void mptsas_scsi_realize(PCIDevice *dev, Error **errp)
     }
     s->max_devices = MPTSAS_NUM_PORTS;
 
-    s->request_bh = qemu_bh_new(mptsas_fetch_requests, s);
+    s->request_bh = qemu_bh_new_guarded(mptsas_fetch_requests, s,
+                                        &DEVICE(dev)->mem_reentrancy_guard);
 
     scsi_bus_init(&s->bus, sizeof(s->bus), &dev->qdev, &mptsas_scsi_info);
 }
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 77325d8cc7a..b506ab7d043 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -192,7 +192,8 @@ static void scsi_dma_restart_cb(void *opaque, bool running, RunState state)
         AioContext *ctx = blk_get_aio_context(s->conf.blk);
         /* The reference is dropped in scsi_dma_restart_bh.*/
         object_ref(OBJECT(s));
-        s->bh = aio_bh_new(ctx, scsi_dma_restart_bh, s);
+        s->bh = aio_bh_new_guarded(ctx, scsi_dma_restart_bh, s,
+                                   &DEVICE(s)->mem_reentrancy_guard);
         qemu_bh_schedule(s->bh);
     }
 }
diff --git a/hw/scsi/vmw_pvscsi.c b/hw/scsi/vmw_pvscsi.c
index cd76bd67ab7..4c36febbc0e 100644
--- a/hw/scsi/vmw_pvscsi.c
+++ b/hw/scsi/vmw_pvscsi.c
@@ -1178,7 +1178,8 @@ pvscsi_realizefn(PCIDevice *pci_dev, Error **errp)
         pcie_endpoint_cap_init(pci_dev, PVSCSI_EXP_EP_OFFSET);
     }
 
-    s->completion_worker = qemu_bh_new(pvscsi_process_completion_queue, s);
+    s->completion_worker = qemu_bh_new_guarded(pvscsi_process_completion_queue, s,
+                                               &DEVICE(pci_dev)->mem_reentrancy_guard);
 
     scsi_bus_init(&s->bus, sizeof(s->bus), DEVICE(pci_dev), &pvscsi_scsi_info);
     /* override default SCSI bus hotplug-handler, with pvscsi's one */
diff --git a/hw/usb/dev-uas.c b/hw/usb/dev-uas.c
index 599d6b52a01..a36a7c3013d 100644
--- a/hw/usb/dev-uas.c
+++ b/hw/usb/dev-uas.c
@@ -935,7 +935,8 @@ static void usb_uas_realize(USBDevice *dev, Error **errp)
 
     QTAILQ_INIT(&uas->results);
     QTAILQ_INIT(&uas->requests);
-    uas->status_bh = qemu_bh_new(usb_uas_send_status_bh, uas);
+    uas->status_bh = qemu_bh_new_guarded(usb_uas_send_status_bh, uas,
+                                         &d->mem_reentrancy_guard);
 
     dev->flags |= (1 << USB_DEV_FLAG_IS_SCSI_STORAGE);
     scsi_bus_init(&uas->bus, sizeof(uas->bus), DEVICE(dev), &usb_uas_scsi_info);
diff --git a/hw/usb/hcd-dwc2.c b/hw/usb/hcd-dwc2.c
index e1d96acf7ec..0e238f84229 100644
--- a/hw/usb/hcd-dwc2.c
+++ b/hw/usb/hcd-dwc2.c
@@ -1364,7 +1364,8 @@ static void dwc2_realize(DeviceState *dev, Error **errp)
     s->fi = USB_FRMINTVL - 1;
     s->eof_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, dwc2_frame_boundary, s);
     s->frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, dwc2_work_timer, s);
-    s->async_bh = qemu_bh_new(dwc2_work_bh, s);
+    s->async_bh = qemu_bh_new_guarded(dwc2_work_bh, s,
+                                      &dev->mem_reentrancy_guard);
 
     sysbus_init_irq(sbd, &s->irq);
 }
diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 6caa7ac6c28..df4ff6f2c1a 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -2528,7 +2528,8 @@ void usb_ehci_realize(EHCIState *s, DeviceState *dev, Error **errp)
     }
 
     s->frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, ehci_work_timer, s);
-    s->async_bh = qemu_bh_new(ehci_work_bh, s);
+    s->async_bh = qemu_bh_new_guarded(ehci_work_bh, s,
+                                      &dev->mem_reentrancy_guard);
     s->device = dev;
 
     s->vmstate = qemu_add_vm_change_state_handler(usb_ehci_vm_state_change, s);
diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c
index 7930b868fa8..469c5e57e93 100644
--- a/hw/usb/hcd-uhci.c
+++ b/hw/usb/hcd-uhci.c
@@ -1195,7 +1195,7 @@ void usb_uhci_common_realize(PCIDevice *dev, Error **errp)
                               USB_SPEED_MASK_LOW | USB_SPEED_MASK_FULL);
         }
     }
-    s->bh = qemu_bh_new(uhci_bh, s);
+    s->bh = qemu_bh_new_guarded(uhci_bh, s, &DEVICE(dev)->mem_reentrancy_guard);
     s->frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, uhci_frame_timer, s);
     s->num_ports_vmstate = NB_PORTS;
     QTAILQ_INIT(&s->queues);
diff --git a/hw/usb/host-libusb.c b/hw/usb/host-libusb.c
index d0d46dd0a4a..09b961116b2 100644
--- a/hw/usb/host-libusb.c
+++ b/hw/usb/host-libusb.c
@@ -1141,7 +1141,8 @@ static void usb_host_nodev_bh(void *opaque)
 static void usb_host_nodev(USBHostDevice *s)
 {
     if (!s->bh_nodev) {
-        s->bh_nodev = qemu_bh_new(usb_host_nodev_bh, s);
+        s->bh_nodev = qemu_bh_new_guarded(usb_host_nodev_bh, s,
+                                          &DEVICE(s)->mem_reentrancy_guard);
     }
     qemu_bh_schedule(s->bh_nodev);
 }
@@ -1739,7 +1740,8 @@ static int usb_host_post_load(void *opaque, int version_id)
     USBHostDevice *dev = opaque;
 
     if (!dev->bh_postld) {
-        dev->bh_postld = qemu_bh_new(usb_host_post_load_bh, dev);
+        dev->bh_postld = qemu_bh_new_guarded(usb_host_post_load_bh, dev,
+                                             &DEVICE(dev)->mem_reentrancy_guard);
     }
     qemu_bh_schedule(dev->bh_postld);
     dev->bh_postld_pending = true;
diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index 5f0ef9cb3b0..59cd3cd7c4a 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -1437,8 +1437,10 @@ static void usbredir_realize(USBDevice *udev, Error **errp)
         }
     }
 
-    dev->chardev_close_bh = qemu_bh_new(usbredir_chardev_close_bh, dev);
-    dev->device_reject_bh = qemu_bh_new(usbredir_device_reject_bh, dev);
+    dev->chardev_close_bh = qemu_bh_new_guarded(usbredir_chardev_close_bh, dev,
+                                                &DEVICE(dev)->mem_reentrancy_guard);
+    dev->device_reject_bh = qemu_bh_new_guarded(usbredir_device_reject_bh, dev,
+                                                &DEVICE(dev)->mem_reentrancy_guard);
     dev->attach_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL, usbredir_do_attach, dev);
 
     packet_id_queue_init(&dev->cancelled, dev, "cancelled");
diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c
index 0f7369e7ed6..dec91294ad0 100644
--- a/hw/usb/xen-usb.c
+++ b/hw/usb/xen-usb.c
@@ -1021,7 +1021,8 @@ static void usbback_alloc(struct XenLegacyDevice *xendev)
 
     QTAILQ_INIT(&usbif->req_free_q);
     QSIMPLEQ_INIT(&usbif->hotplug_q);
-    usbif->bh = qemu_bh_new(usbback_bh, usbif);
+    usbif->bh = qemu_bh_new_guarded(usbback_bh, usbif,
+                                    &DEVICE(xendev)->mem_reentrancy_guard);
 }
 
 static int usbback_free(struct XenLegacyDevice *xendev)
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 9a4f491b54d..f503572e279 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -917,8 +917,9 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
         precopy_add_notifier(&s->free_page_hint_notify);
 
         object_ref(OBJECT(s->iothread));
-        s->free_page_bh = aio_bh_new(iothread_get_aio_context(s->iothread),
-                                     virtio_ballloon_get_free_page_hints, s);
+        s->free_page_bh = aio_bh_new_guarded(iothread_get_aio_context(s->iothread),
+                                             virtio_ballloon_get_free_page_hints, s,
+                                             &dev->mem_reentrancy_guard);
     }
 
     if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
index 54f9bbb789c..1be7bb543cd 100644
--- a/hw/virtio/virtio-crypto.c
+++ b/hw/virtio/virtio-crypto.c
@@ -817,7 +817,8 @@ static void virtio_crypto_device_realize(DeviceState *dev, Error **errp)
         vcrypto->vqs[i].dataq =
                  virtio_add_queue(vdev, 1024, virtio_crypto_handle_dataq_bh);
         vcrypto->vqs[i].dataq_bh =
-                 qemu_bh_new(virtio_crypto_dataq_bh, &vcrypto->vqs[i]);
+                 qemu_bh_new_guarded(virtio_crypto_dataq_bh, &vcrypto->vqs[i],
+                                     &dev->mem_reentrancy_guard);
         vcrypto->vqs[i].vcrypto = vcrypto;
     }
 
-- 
Gitee


From fdb0dfef0e905fb1c475cf1dd9ec667166412be6 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 9 May 2023 10:29:03 -0400
Subject: [PATCH 283/407] lsi53c895a: disable reentrancy detection for script
 RAM

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 277: memory: prevent dma-reentracy issues
RH-Bugzilla: 1999236
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [5/12] b5334c3a34b38ed1dccf0030d5704e51e00fdce3 (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999236
Upstream: Merged
CVE: CVE-2021-3750

commit bfd6e7ae6a72b84e2eb9574f56e6ec037f05182c
Author: Alexander Bulekov <alxndr@bu.edu>
Date:   Thu Apr 27 17:10:10 2023 -0400

    lsi53c895a: disable reentrancy detection for script RAM

    As the code is designed to use the memory APIs to access the script ram,
    disable reentrancy checks for the pseudo-RAM ram_io MemoryRegion.

    In the future, ram_io may be converted from an IO to a proper RAM MemoryRegion.

    Reported-by: Fiona Ebner <f.ebner@proxmox.com>
    Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
    Message-Id: <20230427211013.2994127-6-alxndr@bu.edu>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/scsi/lsi53c895a.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index 85e907a7854..1e15e13fbf4 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -2301,6 +2301,12 @@ static void lsi_scsi_realize(PCIDevice *dev, Error **errp)
     memory_region_init_io(&s->io_io, OBJECT(s), &lsi_io_ops, s,
                           "lsi-io", 256);
 
+    /*
+     * Since we use the address-space API to interact with ram_io, disable the
+     * re-entrancy guard.
+     */
+    s->ram_io.disable_reentrancy_guard = true;
+
     address_space_init(&s->pci_io_as, pci_address_space_io(dev), "lsi-pci-io");
     qdev_init_gpio_out(d, &s->ext_irq, 1);
 
-- 
Gitee


From b2891dab417e1b5ae051de19f9047f1027ca5dce Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 9 May 2023 10:29:03 -0400
Subject: [PATCH 284/407] bcm2835_property: disable reentrancy detection for
 iomem

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 277: memory: prevent dma-reentracy issues
RH-Bugzilla: 1999236
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [6/12] 4d6187430ca1c4309a36824c0c6815d2a763db1a (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999236
Upstream: Merged
CVE: CVE-2021-3750

commit 985c4a4e547afb9573b6bd6843d20eb2c3d1d1cd
Author: Alexander Bulekov <alxndr@bu.edu>
Date:   Thu Apr 27 17:10:11 2023 -0400

    bcm2835_property: disable reentrancy detection for iomem

    As the code is designed for re-entrant calls from bcm2835_property to
    bcm2835_mbox and back into bcm2835_property, mark iomem as
    reentrancy-safe.

    Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Message-Id: <20230427211013.2994127-7-alxndr@bu.edu>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/misc/bcm2835_property.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/hw/misc/bcm2835_property.c b/hw/misc/bcm2835_property.c
index 73941bdae97..022b5a849c2 100644
--- a/hw/misc/bcm2835_property.c
+++ b/hw/misc/bcm2835_property.c
@@ -377,6 +377,13 @@ static void bcm2835_property_init(Object *obj)
 
     memory_region_init_io(&s->iomem, OBJECT(s), &bcm2835_property_ops, s,
                           TYPE_BCM2835_PROPERTY, 0x10);
+
+    /*
+     * bcm2835_property_ops call into bcm2835_mbox, which in-turn reads from
+     * iomem. As such, mark iomem as re-entracy safe.
+     */
+    s->iomem.disable_reentrancy_guard = true;
+
     sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem);
     sysbus_init_irq(SYS_BUS_DEVICE(s), &s->mbox_irq);
 }
-- 
Gitee


From 65eaa892a3ccd1b08891e3b2c18bb113a2b749c3 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 9 May 2023 10:29:03 -0400
Subject: [PATCH 285/407] raven: disable reentrancy detection for iomem

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 277: memory: prevent dma-reentracy issues
RH-Bugzilla: 1999236
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [7/12] f41983390acba68043d386be090172dd17a5e58c (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999236
Upstream: Merged
CVE: CVE-2021-3750

commit 6dad5a6810d9c60ca320d01276f6133bbcfa1fc7
Author: Alexander Bulekov <alxndr@bu.edu>
Date:   Thu Apr 27 17:10:12 2023 -0400

    raven: disable reentrancy detection for iomem

    As the code is designed for re-entrant calls from raven_io_ops to
    pci-conf, mark raven_io_ops as reentrancy-safe.

    Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
    Message-Id: <20230427211013.2994127-8-alxndr@bu.edu>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/pci-host/raven.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/hw/pci-host/raven.c b/hw/pci-host/raven.c
index 6e514f75eb8..245b1653e4d 100644
--- a/hw/pci-host/raven.c
+++ b/hw/pci-host/raven.c
@@ -294,6 +294,13 @@ static void raven_pcihost_initfn(Object *obj)
     memory_region_init(&s->pci_memory, obj, "pci-memory", 0x3f000000);
     address_space_init(&s->pci_io_as, &s->pci_io, "raven-io");
 
+    /*
+     * Raven's raven_io_ops use the address-space API to access pci-conf-idx
+     * (which is also owned by the raven device). As such, mark the
+     * pci_io_non_contiguous as re-entrancy safe.
+     */
+    s->pci_io_non_contiguous.disable_reentrancy_guard = true;
+
     /* CPU address space */
     memory_region_add_subregion(address_space_mem, PCI_IO_BASE_ADDR,
                                 &s->pci_io);
-- 
Gitee


From 5370b5da07275ccc981097503671fd9fae4de4b2 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 9 May 2023 10:29:03 -0400
Subject: [PATCH 286/407] apic: disable reentrancy detection for apic-msi

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 277: memory: prevent dma-reentracy issues
RH-Bugzilla: 1999236
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [8/12] 25c3cf99b00cd9adc10d6e7afa9c3e3b7da08de2 (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999236
Upstream: Merged
CVE: CVE-2021-3750

commit 50795ee051a342c681a9b45671c552fbd6274db8
Author: Alexander Bulekov <alxndr@bu.edu>
Date:   Thu Apr 27 17:10:13 2023 -0400

    apic: disable reentrancy detection for apic-msi

    As the code is designed for re-entrant calls to apic-msi, mark apic-msi
    as reentrancy-safe.

    Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
    Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
    Message-Id: <20230427211013.2994127-9-alxndr@bu.edu>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/intc/apic.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index 3df11c34d68..a7c2b301a87 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -883,6 +883,13 @@ static void apic_realize(DeviceState *dev, Error **errp)
     memory_region_init_io(&s->io_memory, OBJECT(s), &apic_io_ops, s, "apic-msi",
                           APIC_SPACE_SIZE);
 
+    /*
+     * apic-msi's apic_mem_write can call into ioapic_eoi_broadcast, which can
+     * write back to apic-msi. As such mark the apic-msi region re-entrancy
+     * safe.
+     */
+    s->io_memory.disable_reentrancy_guard = true;
+
     s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, apic_timer, s);
     local_apics[s->id] = s;
 
-- 
Gitee


From 76c97ba71169c55624c4f5699c69925f9d30f8c7 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 9 May 2023 10:29:03 -0400
Subject: [PATCH 287/407] async: avoid use-after-free on re-entrancy guard

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 277: memory: prevent dma-reentracy issues
RH-Bugzilla: 1999236
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [9/12] d357650e581c3921bbfe3e2fde5e3f55853b5fab (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999236
Upstream: Merged
CVE: CVE-2021-3750

commit 7915bd06f25e1803778081161bf6fa10c42dc7cd
Author: Alexander Bulekov <alxndr@bu.edu>
Date:   Mon May 1 10:19:56 2023 -0400

    async: avoid use-after-free on re-entrancy guard

    A BH callback can free the BH, causing a use-after-free in aio_bh_call.
    Fix that by keeping a local copy of the re-entrancy guard pointer.

    Buglink: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=58513
    Fixes: 9c86c97f12 ("async: Add an optional reentrancy guard to the BH API")
    Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
    Message-Id: <20230501141956.3444868-1-alxndr@bu.edu>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
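Note (not part of the upstream commit): a minimal sketch of the lifetime
problem described above, under simplifying assumptions. The Toy* and toy_*
names are hypothetical, and unlike the real BH callback the toy callback
receives the BH itself so that it can free it. The point is only that the
guard pointer is read into a local before the callback runs and that the
BH is not dereferenced afterwards.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct ToyGuard {
    bool engaged_in_io;
} ToyGuard;

typedef struct ToyBH ToyBH;
typedef void ToyBHFunc(ToyBH *bh);

struct ToyBH {
    ToyBHFunc *cb;
    ToyGuard *guard;    /* owned by the device, outlives the BH */
};

/* Allocate a toy BH on the heap; returns NULL if allocation fails. */
ToyBH *toy_bh_new(ToyBHFunc *cb, ToyGuard *guard)
{
    ToyBH *bh = malloc(sizeof(*bh));

    if (bh) {
        bh->cb = cb;
        bh->guard = guard;
    }
    return bh;
}

void toy_bh_call(ToyBH *bh)
{
    bool last = false;
    /* Copy the guard pointer first: cb may free bh. */
    ToyGuard *guard = bh->guard;

    if (guard) {
        last = guard->engaged_in_io;
        guard->engaged_in_io = true;
    }

    bh->cb(bh);
    /* bh may be dangling from here on; only the local copy is used. */

    if (guard) {
        guard->engaged_in_io = last;
    }
}

/* A callback that releases its own BH, the case that triggered the bug. */
void toy_self_freeing_cb(ToyBH *bh)
{
    free(bh);
}
```

Reading bh->guard again after the callback, as the pre-fix code did, would
touch freed memory in this scenario.
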
 util/async.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/util/async.c b/util/async.c
index 1fff02e7fc9..ffe0541c3ba 100644
--- a/util/async.c
+++ b/util/async.c
@@ -146,18 +146,20 @@ void aio_bh_call(QEMUBH *bh)
 {
     bool last_engaged_in_io = false;
 
-    if (bh->reentrancy_guard) {
-        last_engaged_in_io = bh->reentrancy_guard->engaged_in_io;
-        if (bh->reentrancy_guard->engaged_in_io) {
+    /* Make a copy of the guard-pointer as cb may free the bh */
+    MemReentrancyGuard *reentrancy_guard = bh->reentrancy_guard;
+    if (reentrancy_guard) {
+        last_engaged_in_io = reentrancy_guard->engaged_in_io;
+        if (reentrancy_guard->engaged_in_io) {
             trace_reentrant_aio(bh->ctx, bh->name);
         }
-        bh->reentrancy_guard->engaged_in_io = true;
+        reentrancy_guard->engaged_in_io = true;
     }
 
     bh->cb(bh->opaque);
 
-    if (bh->reentrancy_guard) {
-        bh->reentrancy_guard->engaged_in_io = last_engaged_in_io;
+    if (reentrancy_guard) {
+        reentrancy_guard->engaged_in_io = last_engaged_in_io;
     }
 }
 
-- 
Gitee


From f90bbfb711986746a486d79fe5e27ad82b1e8ed0 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 7 Jun 2023 11:45:09 -0400
Subject: [PATCH 288/407] memory: stricter checks prior to unsetting
 engaged_in_io

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 277: memory: prevent dma-reentracy issues
RH-Bugzilla: 1999236
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [10/12] 773b62a84b2bd4f5ee7fb8e1cfb3bb91c3a01de1 (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999236
Upstream: Merged
CVE: CVE-2021-3750

commit 3884bf6468ac6bbb58c2b3feaa74e87f821b52f3
Author: Alexander Bulekov <alxndr@bu.edu>
Date:   Tue May 16 04:40:02 2023 -0400

    memory: stricter checks prior to unsetting engaged_in_io

    engaged_in_io could be unset by an MR with re-entrancy checks disabled.
    Ensure that only MRs that can set the engaged_in_io flag can unset it.

    Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
    Message-Id: <20230516084002.3813836-1-alxndr@bu.edu>
    Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
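Note (not part of the upstream commit): a toy model of why the extra local
flag matters, assuming one device that owns a guarded region and a region
with the guard disabled. All Toy* and toy_* names are hypothetical; only
the engaged_in_io and disable_reentrancy_guard field names come from this
series. An access clears the flag only if that same access set it.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyGuard {
    bool engaged_in_io;
} ToyGuard;

typedef struct ToyRegion ToyRegion;
typedef void ToyAccessFn(ToyRegion *mr);

struct ToyRegion {
    ToyGuard *dev_guard;            /* guard of the owning device */
    bool disable_reentrancy_guard;  /* region opted out of the check */
    ToyAccessFn *on_access;         /* device handler, may be NULL */
    ToyRegion *nested;              /* region the handler touches */
    bool reentry_blocked;           /* result recorded by the handler */
};

/* Returns false when the access is refused as re-entrant. */
bool toy_region_access(ToyRegion *mr)
{
    bool guard_applied = false;

    if (!mr->disable_reentrancy_guard) {
        if (mr->dev_guard->engaged_in_io) {
            return false;
        }
        mr->dev_guard->engaged_in_io = true;
        guard_applied = true;
    }

    if (mr->on_access) {
        mr->on_access(mr);
    }

    /* Only the access that engaged the guard may disengage it. */
    if (guard_applied) {
        mr->dev_guard->engaged_in_io = false;
    }
    return true;
}

/* Handler of the guarded region: touch the opted-out region, then retry. */
void toy_handler_a(ToyRegion *a)
{
    toy_region_access(a->nested);               /* allowed, guard untouched */
    a->reentry_blocked = !toy_region_access(a); /* must still be refused */
}
```

If the final reset were unconditional, the nested access to the opted-out
region would clear the flag and the second access to the guarded region
would slip through.
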
 softmmu/memory.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 102f0a42482..6b986153572 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -533,6 +533,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
     unsigned access_size;
     unsigned i;
     MemTxResult r = MEMTX_OK;
+    bool reentrancy_guard_applied = false;
 
     if (!access_size_min) {
         access_size_min = 1;
@@ -551,6 +552,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
             return MEMTX_ACCESS_ERROR;
         }
         mr->dev->mem_reentrancy_guard.engaged_in_io = true;
+        reentrancy_guard_applied = true;
     }
 
     /* FIXME: support unaligned access? */
@@ -567,7 +569,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
                         access_mask, attrs);
         }
     }
-    if (mr->dev) {
+    if (mr->dev && reentrancy_guard_applied) {
         mr->dev->mem_reentrancy_guard.engaged_in_io = false;
     }
     return r;
-- 
Gitee


From 828c7b30664691d54e85506900434c02ea5a5b4c Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Tue, 16 May 2023 11:05:56 +0200
Subject: [PATCH 289/407] lsi53c895a: disable reentrancy detection for MMIO
 region, too

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 277: memory: prevent dma-reentracy issues
RH-Bugzilla: 1999236
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [11/12] 8016c86f8432f5ea06c831d1181e87e6d45a6a50 (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999236
Upstream: Merged
CVE: CVE-2021-3750

commit d139fe9ad8a27bcc50b4ead77d2f97d191a0e95e
Author: Thomas Huth <thuth@redhat.com>
Date:   Tue May 16 11:05:56 2023 +0200

    lsi53c895a: disable reentrancy detection for MMIO region, too

    While trying to use a SCSI disk on the LSI controller with an
    older version of Fedora (25), I'm getting:

     qemu: warning: Blocked re-entrant IO on MemoryRegion: lsi-mmio at addr: 0x34

    and the SCSI controller is not usable. Seems like we have to
    disable the reentrancy checker for the MMIO region, too, to
    get this working again.

    The problem can be reproduced like this:

    ./qemu-system-x86_64 -accel kvm -m 2G -machine q35 \
     -device lsi53c810,id=lsi1 -device scsi-hd,drive=d0 \
     -drive if=none,id=d0,file=.../somedisk.qcow2 \
     -cdrom Fedora-Everything-netinst-i386-25-1.3.iso

    Where somedisk.qcow2 is an image that already contains some partitions
    and file systems.

    In the boot menu of Fedora, go to
    "Troubleshooting" -> "Rescue a Fedora system" -> "3) Skip to shell"

    Then check "dmesg | grep -i 53c" for failure messages, and try to mount
    a partition from somedisk.qcow2.

    Message-Id: <20230516090556.553813-1-thuth@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/scsi/lsi53c895a.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index 1e15e13fbf4..2b9cb2ac5db 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -2306,6 +2306,7 @@ static void lsi_scsi_realize(PCIDevice *dev, Error **errp)
      * re-entrancy guard.
      */
     s->ram_io.disable_reentrancy_guard = true;
+    s->mmio_io.disable_reentrancy_guard = true;
 
     address_space_init(&s->pci_io_as, pci_address_space_io(dev), "lsi-pci-io");
     qdev_init_gpio_out(d, &s->ext_irq, 1);
-- 
Gitee


From 9aa7cd0fe449e374c94e739c372bf3d9252f5f60 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Mon, 22 May 2023 11:10:11 +0200
Subject: [PATCH 290/407] hw/scsi/lsi53c895a: Fix reentrancy issues in the LSI
 controller (CVE-2023-0330)

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 277: memory: prevent dma-reentracy issues
RH-Bugzilla: 1999236
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [12/12] 28f5e04344109d8514869c50468bef481437201d (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1999236
Upstream: Merged
CVE: CVE-2021-3750

commit b987718bbb1d0eabf95499b976212dd5f0120d75
Author: Thomas Huth <thuth@redhat.com>
Date:   Mon May 22 11:10:11 2023 +0200

    hw/scsi/lsi53c895a: Fix reentrancy issues in the LSI controller (CVE-2023-0330)

    We cannot use the generic reentrancy guard in the LSI code, so
    we have to manually prevent endless reentrancy here. The problematic
    lsi_execute_script() function already has a way to detect whether
    too many instructions have been executed - we just have to slightly
    change the logic here so that it also takes into account whether the
    function has been called too often in a reentrant way.

    The code in fuzz-lsi53c895a-test.c has been taken from an earlier
    patch by Mauro Matteo Cascella.

    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1563
    Message-Id: <20230522091011.1082574-1-thuth@redhat.com>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Reviewed-by: Alexander Bulekov <alxndr@bu.edu>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
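Note: the depth-counting idea described in the commit message can be
sketched in isolation as below. This is only an illustration, not the
device code: struct fake_dev, execute_script() and MAX_REENTRANCY are
hypothetical names, and the "script" is reduced to a counter that makes
the function call itself again. The limit of 8 mirrors the arbitrary
value chosen in the patch.

```c
#include <assert.h>

#define MAX_REENTRANCY 8    /* hypothetical name; same arbitrary limit as the patch */

/* Hypothetical stand-in for a device whose script can re-trigger itself. */
struct fake_dev {
    int retrigger;  /* how many more times the script re-enters the function */
    int aborted;    /* number of calls cut short by the depth limit */
    int max_depth;  /* deepest nesting level observed */
};

/* Shared across all callers, like the function-local static in the patch. */
static int depth;

static void execute_script(struct fake_dev *d)
{
    depth++;
    if (depth > d->max_depth) {
        d->max_depth = depth;
    }

    if (depth > MAX_REENTRANCY) {
        /* Too deeply nested: stop instead of recursing further. */
        d->aborted++;
    } else if (d->retrigger > 0) {
        /* The "script" writes to its own trigger register. */
        d->retrigger--;
        execute_script(d);
    }

    /* Every exit path must undo the increment. */
    depth--;
}
```

A script that re-triggers itself a few times runs to completion, while
one that would recurse without bound is stopped at the ninth nested
call and the counter returns to zero afterwards.
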
 hw/scsi/lsi53c895a.c               |  23 +++--
 tests/qtest/fuzz-lsi53c895a-test.c | 161 +++++++++++++++++++++++++++++
 2 files changed, 178 insertions(+), 6 deletions(-)
 create mode 100644 tests/qtest/fuzz-lsi53c895a-test.c

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index 2b9cb2ac5db..b60786fd56b 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -1133,15 +1133,24 @@ static void lsi_execute_script(LSIState *s)
     uint32_t addr, addr_high;
     int opcode;
     int insn_processed = 0;
+    static int reentrancy_level;
+
+    reentrancy_level++;
 
     s->istat1 |= LSI_ISTAT1_SRUN;
 again:
-    if (++insn_processed > LSI_MAX_INSN) {
-        /* Some windows drivers make the device spin waiting for a memory
-           location to change.  If we have been executed a lot of code then
-           assume this is the case and force an unexpected device disconnect.
-           This is apparently sufficient to beat the drivers into submission.
-         */
+    /*
+     * Some windows drivers make the device spin waiting for a memory location
+     * to change. If we have executed more than LSI_MAX_INSN instructions then
+     * assume this is the case and force an unexpected device disconnect. This
+     * is apparently sufficient to beat the drivers into submission.
+     *
+     * Another issue (CVE-2023-0330) can occur if the script is programmed to
+     * trigger itself again and again. Avoid this problem by stopping after
+     * being called multiple times in a reentrant way (8 is an arbitrary value
+     * which should be enough for all valid use cases).
+     */
+    if (++insn_processed > LSI_MAX_INSN || reentrancy_level > 8) {
         if (!(s->sien0 & LSI_SIST0_UDC)) {
             qemu_log_mask(LOG_GUEST_ERROR,
                           "lsi_scsi: inf. loop with UDC masked");
@@ -1595,6 +1604,8 @@ again:
         }
     }
     trace_lsi_execute_script_stop();
+
+    reentrancy_level--;
 }
 
 static uint8_t lsi_reg_readb(LSIState *s, int offset)
diff --git a/tests/qtest/fuzz-lsi53c895a-test.c b/tests/qtest/fuzz-lsi53c895a-test.c
new file mode 100644
index 00000000000..1b55928b9f1
--- /dev/null
+++ b/tests/qtest/fuzz-lsi53c895a-test.c
@@ -0,0 +1,161 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * QTest fuzzer-generated testcase for LSI53C895A device
+ *
+ * Copyright (c) Red Hat
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+
+/*
+ * This used to trigger a DMA reentrancy issue
+ * leading to memory corruption bugs like stack
+ * overflow or use-after-free
+ * https://gitlab.com/qemu-project/qemu/-/issues/1563
+ */
+static void test_lsi_dma_reentrancy(void)
+{
+    QTestState *s;
+
+    s = qtest_init("-M q35 -m 512M -nodefaults "
+                   "-blockdev driver=null-co,node-name=null0 "
+                   "-device lsi53c810 -device scsi-cd,drive=null0");
+
+    qtest_outl(s, 0xcf8, 0x80000804); /* PCI Command Register */
+    qtest_outw(s, 0xcfc, 0x7);        /* Enables accesses */
+    qtest_outl(s, 0xcf8, 0x80000814); /* Memory Bar 1 */
+    qtest_outl(s, 0xcfc, 0xff100000); /* Set MMIO Address*/
+    qtest_outl(s, 0xcf8, 0x80000818); /* Memory Bar 2 */
+    qtest_outl(s, 0xcfc, 0xff000000); /* Set RAM Address*/
+    qtest_writel(s, 0xff000000, 0xc0000024);
+    qtest_writel(s, 0xff000114, 0x00000080);
+    qtest_writel(s, 0xff00012c, 0xff000000);
+    qtest_writel(s, 0xff000004, 0xff000114);
+    qtest_writel(s, 0xff000008, 0xff100014);
+    qtest_writel(s, 0xff10002f, 0x000000ff);
+
+    qtest_quit(s);
+}
+
+/*
+ * This used to trigger a UAF in lsi_do_msgout()
+ * https://gitlab.com/qemu-project/qemu/-/issues/972
+ */
+static void test_lsi_do_msgout_cancel_req(void)
+{
+    QTestState *s;
+
+    if (sizeof(void *) == 4) {
+        g_test_skip("memory size too big for 32-bit build");
+        return;
+    }
+
+    s = qtest_init("-M q35 -m 2G -nodefaults "
+                   "-device lsi53c895a,id=scsi "
+                   "-device scsi-hd,drive=disk0 "
+                   "-drive file=null-co://,id=disk0,if=none,format=raw");
+
+    qtest_outl(s, 0xcf8, 0x80000810);
+    qtest_outl(s, 0xcf8, 0xc000);
+    qtest_outl(s, 0xcf8, 0x80000810);
+    qtest_outw(s, 0xcfc, 0x7);
+    qtest_outl(s, 0xcf8, 0x80000810);
+    qtest_outl(s, 0xcfc, 0xc000);
+    qtest_outl(s, 0xcf8, 0x80000804);
+    qtest_outw(s, 0xcfc, 0x05);
+    qtest_writeb(s, 0x69736c10, 0x08);
+    qtest_writeb(s, 0x69736c13, 0x58);
+    qtest_writeb(s, 0x69736c1a, 0x01);
+    qtest_writeb(s, 0x69736c1b, 0x06);
+    qtest_writeb(s, 0x69736c22, 0x01);
+    qtest_writeb(s, 0x69736c23, 0x07);
+    qtest_writeb(s, 0x69736c2b, 0x02);
+    qtest_writeb(s, 0x69736c48, 0x08);
+    qtest_writeb(s, 0x69736c4b, 0x58);
+    qtest_writeb(s, 0x69736c52, 0x04);
+    qtest_writeb(s, 0x69736c53, 0x06);
+    qtest_writeb(s, 0x69736c5b, 0x02);
+    qtest_outl(s, 0xc02d, 0x697300);
+    qtest_writeb(s, 0x5a554662, 0x01);
+    qtest_writeb(s, 0x5a554663, 0x07);
+    qtest_writeb(s, 0x5a55466a, 0x10);
+    qtest_writeb(s, 0x5a55466b, 0x22);
+    qtest_writeb(s, 0x5a55466c, 0x5a);
+    qtest_writeb(s, 0x5a55466d, 0x5a);
+    qtest_writeb(s, 0x5a55466e, 0x34);
+    qtest_writeb(s, 0x5a55466f, 0x5a);
+    qtest_writeb(s, 0x5a345a5a, 0x77);
+    qtest_writeb(s, 0x5a345a5b, 0x55);
+    qtest_writeb(s, 0x5a345a5c, 0x51);
+    qtest_writeb(s, 0x5a345a5d, 0x27);
+    qtest_writeb(s, 0x27515577, 0x41);
+    qtest_outl(s, 0xc02d, 0x5a5500);
+    qtest_writeb(s, 0x364001d0, 0x08);
+    qtest_writeb(s, 0x364001d3, 0x58);
+    qtest_writeb(s, 0x364001da, 0x01);
+    qtest_writeb(s, 0x364001db, 0x26);
+    qtest_writeb(s, 0x364001dc, 0x0d);
+    qtest_writeb(s, 0x364001dd, 0xae);
+    qtest_writeb(s, 0x364001de, 0x41);
+    qtest_writeb(s, 0x364001df, 0x5a);
+    qtest_writeb(s, 0x5a41ae0d, 0xf8);
+    qtest_writeb(s, 0x5a41ae0e, 0x36);
+    qtest_writeb(s, 0x5a41ae0f, 0xd7);
+    qtest_writeb(s, 0x5a41ae10, 0x36);
+    qtest_writeb(s, 0x36d736f8, 0x0c);
+    qtest_writeb(s, 0x36d736f9, 0x80);
+    qtest_writeb(s, 0x36d736fa, 0x0d);
+    qtest_outl(s, 0xc02d, 0x364000);
+
+    qtest_quit(s);
+}
+
+/*
+ * This used to trigger the assert in lsi_do_dma()
+ * https://bugs.launchpad.net/qemu/+bug/697510
+ * https://bugs.launchpad.net/qemu/+bug/1905521
+ * https://bugs.launchpad.net/qemu/+bug/1908515
+ */
+static void test_lsi_do_dma_empty_queue(void)
+{
+    QTestState *s;
+
+    s = qtest_init("-M q35 -nographic -monitor none -serial none "
+                   "-drive if=none,id=drive0,"
+                            "file=null-co://,file.read-zeroes=on,format=raw "
+                   "-device lsi53c895a,id=scsi0 "
+                   "-device scsi-hd,drive=drive0,"
+                            "bus=scsi0.0,channel=0,scsi-id=0,lun=0");
+    qtest_outl(s, 0xcf8, 0x80001814);
+    qtest_outl(s, 0xcfc, 0xe1068000);
+    qtest_outl(s, 0xcf8, 0x80001818);
+    qtest_outl(s, 0xcf8, 0x80001804);
+    qtest_outw(s, 0xcfc, 0x7);
+    qtest_outl(s, 0xcf8, 0x80002010);
+
+    qtest_writeb(s, 0xe106802e, 0xff); /* Fill DSP bits 16-23 */
+    qtest_writeb(s, 0xe106802f, 0xff); /* Fill DSP bits 24-31: trigger SCRIPT */
+
+    qtest_quit(s);
+}
+
+int main(int argc, char **argv)
+{
+    g_test_init(&argc, &argv, NULL);
+
+    if (!qtest_has_device("lsi53c895a")) {
+        return 0;
+    }
+
+    qtest_add_func("fuzz/lsi53c895a/lsi_do_dma_empty_queue",
+                   test_lsi_do_dma_empty_queue);
+
+    qtest_add_func("fuzz/lsi53c895a/lsi_do_msgout_cancel_req",
+                   test_lsi_do_msgout_cancel_req);
+
+    qtest_add_func("fuzz/lsi53c895a/lsi_dma_reentrancy",
+                   test_lsi_dma_reentrancy);
+
+    return g_test_run();
+}
-- 
Gitee


From 413a815f6edfc2c69c21b399ed8d27fe8b42fdf1 Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Wed, 24 May 2023 07:25:57 -0400
Subject: [PATCH 291/407] target/i386: add support for FLUSH_L1D feature

RH-Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-MergeRequest: 281: target/i386: add support for FLUSH_L1D feature
RH-Bugzilla: 2216203
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [1/2] 50c54ca7c734dc2b9303e724a6c5ac1127472271

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2216203

commit 0e7e3bf1a552c178924867fa7c2f30ccc8a179e0
Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date:   Wed Feb 1 08:57:58 2023 -0500

    target/i386: add support for FLUSH_L1D feature

    As reported by Intel's doc:
    "L1D_FLUSH: Writeback and invalidate the L1 data cache"

    If this cpu feature is present in host, allow QEMU to choose whether to
    show it to the guest too.
    One disadvantage of not exposing it is that the guest will report
    a nonexistent vulnerability in
    /sys/devices/system/cpu/vulnerabilities/mmio_stale_data
    because the mitigation is present only when the cpu has
            (FLUSH_L1D and MD_CLEAR) or FB_CLEAR
    features enabled.

    Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
    Message-Id: <20230201135759.555607-2-eesposit@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 target/i386/cpu.c | 2 +-
 target/i386/cpu.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 0543b846ff2..47da059df6e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -857,7 +857,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
             "tsx-ldtrk", NULL, NULL /* pconfig */, NULL,
             NULL, NULL, "amx-bf16", "avx512-fp16",
             "amx-tile", "amx-int8", "spec-ctrl", "stibp",
-            NULL, "arch-capabilities", "core-capability", "ssbd",
+            "flush-l1d", "arch-capabilities", "core-capability", "ssbd",
         },
         .cpuid = {
             .eax = 7,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 5d2ddd81b95..7cb7cea8ab1 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -864,6 +864,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_0_EDX_SPEC_CTRL         (1U << 26)
 /* Single Thread Indirect Branch Predictors */
 #define CPUID_7_0_EDX_STIBP             (1U << 27)
+/* Flush L1D cache */
+#define CPUID_7_0_EDX_FLUSH_L1D         (1U << 28)
 /* Arch Capabilities */
 #define CPUID_7_0_EDX_ARCH_CAPABILITIES (1U << 29)
 /* Core Capability */
-- 
Gitee


From 94e225fb8575e03f7fc903c928788798474ff505 Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Wed, 24 May 2023 07:26:04 -0400
Subject: [PATCH 292/407] target/i386: add support for FB_CLEAR feature

RH-Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-MergeRequest: 281: target/i386: add support for FLUSH_L1D feature
RH-Bugzilla: 2216203
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [2/2] 8cd4b7366a9898e406ca20c9a28f14ddce855b1e

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2216203

commit 22e1094ca82d5518c1b69aff3e87c550776ae1eb
Author: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date:   Wed Feb 1 08:57:59 2023 -0500

    target/i386: add support for FB_CLEAR feature

    As reported by Intel's doc:
    "FB_CLEAR: The processor will overwrite fill buffer values as part of
    MD_CLEAR operations with the VERW instruction.
    On these processors, L1D_FLUSH does not overwrite fill buffer values."

    If this cpu feature is present in host, allow QEMU to choose whether to
    show it to the guest too.
    One disadvantage of not exposing it is that the guest will report
    a nonexistent vulnerability in
    /sys/devices/system/cpu/vulnerabilities/mmio_stale_data
    because the mitigation is present only when the cpu has
            (FLUSH_L1D and MD_CLEAR) or FB_CLEAR
    features enabled.

    Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
    Message-Id: <20230201135759.555607-3-eesposit@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
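Note: the condition quoted in the commit message,
"(FLUSH_L1D and MD_CLEAR) or FB_CLEAR", can be written out as a small
predicate. This is a sketch of that statement only, not the guest
kernel's actual detection logic. The FLUSH_L1D and FB_CLEAR bit
positions are the ones added by this series; the MD_CLEAR position
(CPUID.(EAX=7,ECX=0):EDX bit 10) is taken from Intel's CPUID
enumeration and is not part of this patch. The macro and function
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EDX bits (hypothetical macro names) */
#define EDX_MD_CLEAR        (1U << 10)  /* assumption: from Intel's CPUID docs */
#define EDX_FLUSH_L1D       (1U << 28)  /* as defined in patch 291 */
/* IA32_ARCH_CAPABILITIES MSR bit */
#define ARCH_CAP_FB_CLEAR   (1ULL << 17) /* as defined in this patch */

/*
 * Returns true when the guest-visible feature bits satisfy
 * "(FLUSH_L1D and MD_CLEAR) or FB_CLEAR".
 */
static bool mmio_mitigation_visible(uint32_t cpuid_7_0_edx, uint64_t arch_caps)
{
    bool flush_and_clear = (cpuid_7_0_edx & EDX_FLUSH_L1D) != 0 &&
                           (cpuid_7_0_edx & EDX_MD_CLEAR) != 0;
    bool fb_clear = (arch_caps & ARCH_CAP_FB_CLEAR) != 0;

    return flush_and_clear || fb_clear;
}
```

With neither "flush-l1d" nor "fb-clear" exposed, the predicate is false
even if the host has the mitigation, which is the misreport the commit
message describes.
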
 target/i386/cpu.c | 2 +-
 target/i386/cpu.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 47da059df6e..9d3dcdcc0d2 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -981,7 +981,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
             "ssb-no", "mds-no", "pschange-mc-no", "tsx-ctrl",
             "taa-no", NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
-            NULL, NULL, NULL, NULL,
+            NULL, "fb-clear", NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 7cb7cea8ab1..9b7d664ee7c 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -950,6 +950,7 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define MSR_ARCH_CAP_PSCHANGE_MC_NO     (1U << 6)
 #define MSR_ARCH_CAP_TSX_CTRL_MSR       (1U << 7)
 #define MSR_ARCH_CAP_TAA_NO             (1U << 8)
+#define MSR_ARCH_CAP_FB_CLEAR           (1U << 17)
 
 #define MSR_CORE_CAP_SPLIT_LOCK_DETECT  (1U << 5)
 
-- 
Gitee


From befb52140f6a58a18dde3936641fe9b3e1bbcb70 Mon Sep 17 00:00:00 2001
From: Leonardo Bras <leobras@redhat.com>
Date: Tue, 20 Jun 2023 14:51:03 -0300
Subject: [PATCH 293/407] migration: Disable postcopy + multifd migration
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Leonardo Brás <leobras@redhat.com>
RH-MergeRequest: 287: migration: Disable postcopy + multifd migration
RH-Bugzilla: 2169733
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [1/1] 07d26fbac35b7586fe790304f03d316ed26a4ef2

Since the introduction of multifd, it's possible to perform a multifd
migration and finish it using postcopy.

A bug introduced by yank (fixed in cfc3bcf373) was previously preventing
a successful use of this migration scenario, and now things should be
working in most scenarios.

But since there is not enough testing/support nor any reported users for
this scenario, we should disable this combination before it may cause any
problems for users.

Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
(cherry picked from commit b405dfff1ea3cf0530b628895b5a7a50dc8c6996)
[leobras: moves logic from options.c -> migration.c and use cap_list
instead of new_caps for backward compatibility]
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
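Note: the shape of the added check can be sketched standalone as
below. The enum, the function name and the error-string out parameter
are hypothetical simplifications; the real code indexes cap_list with
MIGRATION_CAPABILITY_* values and reports through error_setg().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced capability list. */
enum cap {
    CAP_POSTCOPY_RAM,
    CAP_MULTIFD,
    CAP_COUNT
};

/*
 * Reject the postcopy + multifd combination.  On failure, *err points
 * to a static message; on success it is set to NULL.
 */
static bool caps_check(const bool *cap_list, const char **err)
{
    *err = NULL;

    if (cap_list[CAP_POSTCOPY_RAM] && cap_list[CAP_MULTIFD]) {
        *err = "Postcopy is not yet compatible with multifd";
        return false;
    }
    return true;
}
```

Either capability on its own still passes the check; only the
combination is refused.
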
 migration/migration.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 817170d52dd..1ad82e63f00 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1246,6 +1246,11 @@ static bool migrate_caps_check(bool *cap_list,
             error_setg(errp, "Postcopy is not compatible with ignore-shared");
             return false;
         }
+
+        if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
+            error_setg(errp, "Postcopy is not yet compatible with multifd");
+            return false;
+        }
     }
 
     if (cap_list[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) {
-- 
Gitee


From 065b73399a4fe68864d7c7532d1b8c922b0c632f Mon Sep 17 00:00:00 2001
From: Hanna Czenczek <hreitz@redhat.com>
Date: Tue, 11 Apr 2023 19:34:15 +0200
Subject: [PATCH 294/407] util/iov: Make qiov_slice() public

RH-Author: Hanna Czenczek <hreitz@redhat.com>
RH-MergeRequest: 291: block: Split padded I/O vectors exceeding IOV_MAX
RH-Bugzilla: 2141964
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Commit: [1/5] 7f082982e49bacbcc21ca24e471b4399e64321a9

We want to inline qemu_iovec_init_extended() in block/io.c for padding
requests, and having access to qiov_slice() is useful for this.  As a
public function, it is renamed to qemu_iovec_slice().

(We will need to count the number of I/O vector elements of a slice
there, and then later process this slice.  Without qiov_slice(), we
would need to call qemu_iovec_subvec_niov(), and all further
IOV-processing functions may need to skip prefixing elements to
accommodate a qiov_offset.  Because qemu_iovec_subvec_niov()
internally calls qiov_slice(), we can just have the block/io.c code call
qiov_slice() itself, thus get the number of elements, and also create an
iovec array with the superfluous prefixing elements stripped, so the
following processing functions no longer need to skip them.)

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230411173418.19549-2-hreitz@redhat.com>
(cherry picked from commit 3d06cea8256d54a6b0238934c31012f7f17100f5)
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
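Note: what a slice returns (first element, @head, @tail, @niov) may be
easier to see on a plain struct iovec array. The sketch below follows
the semantics described in the function comment but is not the QEMU
implementation: skip_offset() and slice_iov() are hypothetical names,
and the total size is passed explicitly instead of being read from a
QEMUIOVector.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Skip whole elements covered by @offset; report what is left over. */
static struct iovec *skip_offset(struct iovec *iov, size_t offset,
                                 size_t *remaining)
{
    while (offset > 0 && offset >= iov->iov_len) {
        offset -= iov->iov_len;
        iov++;
    }
    *remaining = offset;
    return iov;
}

/*
 * Find the sub-array of @iov covering [offset, offset + len).
 * @head:  bytes to skip in the first returned element
 * @tail:  unused bytes at the end of the last returned element
 * @niov:  number of elements in the returned sub-array
 * @total: sum of all iov_len values in @iov
 */
static struct iovec *slice_iov(struct iovec *iov, size_t total,
                               size_t offset, size_t len,
                               size_t *head, size_t *tail, int *niov)
{
    struct iovec *first, *end;

    assert(offset <= total && len <= total - offset);

    first = skip_offset(iov, offset, head);
    end = skip_offset(first, *head + len, tail);
    if (*tail > 0) {
        /* The range ends inside *end: turn "bytes used" into "bytes unused". */
        *tail = end->iov_len - *tail;
        end++;
    }

    *niov = (int)(end - first);
    return first;
}
```

For elements of 4, 8 and 4 bytes, the range starting at offset 6 with
length 8 begins 2 bytes into the second element and leaves 2 bytes
unused in the third, so the caller gets both the element count and a
pointer with the prefixing element already stripped.
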
 include/qemu/iov.h |  3 +++
 util/iov.c         | 14 +++++++-------
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/qemu/iov.h b/include/qemu/iov.h
index 93307466809..46fadfb27a6 100644
--- a/include/qemu/iov.h
+++ b/include/qemu/iov.h
@@ -229,6 +229,9 @@ int qemu_iovec_init_extended(
         void *tail_buf, size_t tail_len);
 void qemu_iovec_init_slice(QEMUIOVector *qiov, QEMUIOVector *source,
                            size_t offset, size_t len);
+struct iovec *qemu_iovec_slice(QEMUIOVector *qiov,
+                               size_t offset, size_t len,
+                               size_t *head, size_t *tail, int *niov);
 int qemu_iovec_subvec_niov(QEMUIOVector *qiov, size_t offset, size_t len);
 void qemu_iovec_add(QEMUIOVector *qiov, void *base, size_t len);
 void qemu_iovec_concat(QEMUIOVector *dst,
diff --git a/util/iov.c b/util/iov.c
index 58c7b3eeee5..3ccb530b164 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -373,15 +373,15 @@ static struct iovec *iov_skip_offset(struct iovec *iov, size_t offset,
 }
 
 /*
- * qiov_slice
+ * qemu_iovec_slice
  *
  * Find subarray of iovec's, containing requested range. @head would
  * be offset in first iov (returned by the function), @tail would be
  * count of extra bytes in last iovec (returned iov + @niov - 1).
  */
-static struct iovec *qiov_slice(QEMUIOVector *qiov,
-                                size_t offset, size_t len,
-                                size_t *head, size_t *tail, int *niov)
+struct iovec *qemu_iovec_slice(QEMUIOVector *qiov,
+                               size_t offset, size_t len,
+                               size_t *head, size_t *tail, int *niov)
 {
     struct iovec *iov, *end_iov;
 
@@ -406,7 +406,7 @@ int qemu_iovec_subvec_niov(QEMUIOVector *qiov, size_t offset, size_t len)
     size_t head, tail;
     int niov;
 
-    qiov_slice(qiov, offset, len, &head, &tail, &niov);
+    qemu_iovec_slice(qiov, offset, len, &head, &tail, &niov);
 
     return niov;
 }
@@ -434,8 +434,8 @@ int qemu_iovec_init_extended(
     }
 
     if (mid_len) {
-        mid_iov = qiov_slice(mid_qiov, mid_offset, mid_len,
-                             &mid_head, &mid_tail, &mid_niov);
+        mid_iov = qemu_iovec_slice(mid_qiov, mid_offset, mid_len,
+                                   &mid_head, &mid_tail, &mid_niov);
     }
 
     total_niov = !!head_len + mid_niov + !!tail_len;
-- 
Gitee


From d4dda1cf9946e4e4f9138610526621c9ce08ce93 Mon Sep 17 00:00:00 2001
From: Hanna Czenczek <hreitz@redhat.com>
Date: Tue, 11 Apr 2023 19:34:16 +0200
Subject: [PATCH 295/407] block: Collapse padded I/O vecs exceeding IOV_MAX

RH-Author: Hanna Czenczek <hreitz@redhat.com>
RH-MergeRequest: 291: block: Split padded I/O vectors exceeding IOV_MAX
RH-Bugzilla: 2141964
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Commit: [2/5] 1d86ce8398e4ab66e308a686f9855c963e52b0a9

When processing vectored guest requests that are not aligned to the
storage request alignment, we pad them by adding head and/or tail
buffers for a read-modify-write cycle.

The guest can submit I/O vectors up to IOV_MAX (1024) in length, but
with this padding, the vector can exceed that limit.  As of
4c002cef0e9abe7135d7916c51abce47f7fc1ee2 ("util/iov: make
qemu_iovec_init_extended() honest"), we refuse to pad vectors beyond the
limit, instead returning an error to the guest.

To the guest, this appears as a random I/O error.  We should not return
an I/O error to the guest when it issued a perfectly valid request.

Before 4c002cef0e9abe7135d7916c51abce47f7fc1ee2, we just made the vector
longer than IOV_MAX, which generally seems to work (because the guest
assumes a smaller alignment than we really have, file-posix's
raw_co_prw() will generally see bdrv_qiov_is_aligned() return false, and
so emulate the request, so that the IOV_MAX does not matter).  However,
that does not seem exactly great.

I see two ways to fix this problem:
1. We split such long requests into two requests.
2. We join some elements of the vector into new buffers to make it
   shorter.

I am wary of (1), because it seems like it may have unintended side
effects.

(2) on the other hand seems relatively simple to implement, with
hopefully few side effects, so this patch does that.

To do this, the use of qemu_iovec_init_extended() in bdrv_pad_request()
is effectively replaced by the new function bdrv_create_padded_qiov(),
which not only wraps the request IOV with padding head/tail, but also
ensures that the resulting vector will not have more than IOV_MAX
elements.  Putting that functionality into qemu_iovec_init_extended() is
infeasible because it requires allocating a bounce buffer; doing so
would require many more parameters (buffer alignment, how to initialize
the buffer, and out parameters like the buffer, its length, and the
original elements), which is not reasonable.

Conversely, it is not difficult to move qemu_iovec_init_extended()'s
functionality into bdrv_create_padded_qiov() by using public
qemu_iovec_* functions, so that is what this patch does.

Because bdrv_pad_request() was the only "serious" user of
qemu_iovec_init_extended(), the next patch will remove the latter
function, so the functionality is not implemented twice.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2141964
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230411173418.19549-3-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
(cherry picked from commit 18743311b829cafc1737a5f20bc3248d5f91ee2a)

Conflicts:
	block/io.c: Downstream bdrv_pad_request() has no @flags
        parameter.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
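Note: the element-count arithmetic behind option (2) can be checked in
isolation. The sketch below only reproduces the counting, not the
bounce-buffer handling; SKETCH_IOV_MAX, collapse_count() and
final_niov() are hypothetical names, and 1024 is the IOV_MAX value
quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_IOV_MAX 1024 /* hypothetical name; value from the commit message */

/*
 * Number of leading request elements that must be merged into a single
 * bounce buffer so that head + request + tail fits into SKETCH_IOV_MAX
 * elements.  Returns 0 when no merging is needed.
 */
static int collapse_count(int niov, size_t head, size_t tail)
{
    int padded_niov = !!head + niov + !!tail;
    int surplus;

    assert(niov <= SKETCH_IOV_MAX);

    if (padded_niov <= SKETCH_IOV_MAX) {
        return 0;
    }

    /* Only head and tail were added, so the surplus is 1 or 2. */
    surplus = padded_niov - SKETCH_IOV_MAX;
    assert(surplus <= !!head + !!tail);

    /* Merging k elements into one removes k - 1 elements. */
    return surplus + 1;
}

/* Resulting vector length after padding and (if needed) collapsing. */
static int final_niov(int niov, size_t head, size_t tail)
{
    int k = collapse_count(niov, head, tail);
    int merged = k ? niov - k + 1 : niov;

    return !!head + merged + !!tail;
}
```

A full 1024-element request with both head and tail padding needs its
first three elements merged; with only one of the two, merging the
first two is enough. In both cases the result is exactly 1024 elements.
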
 block/io.c | 166 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 151 insertions(+), 15 deletions(-)

diff --git a/block/io.c b/block/io.c
index c3e7301613a..0fe8f0dd40b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1624,6 +1624,14 @@ out:
  * @merge_reads is true for small requests,
  * if @buf_len == @head + bytes + @tail. In this case it is possible that both
  * head and tail exist but @buf_len == align and @tail_buf == @buf.
+ *
+ * @write is true for write requests, false for read requests.
+ *
+ * If padding makes the vector too long (exceeding IOV_MAX), then we need to
+ * merge existing vector elements into a single one.  @collapse_bounce_buf acts
+ * as the bounce buffer in such cases.  @pre_collapse_qiov has the pre-collapse
+ * I/O vector elements so for read requests, the data can be copied back after
+ * the read is done.
  */
 typedef struct BdrvRequestPadding {
     uint8_t *buf;
@@ -1632,11 +1640,17 @@ typedef struct BdrvRequestPadding {
     size_t head;
     size_t tail;
     bool merge_reads;
+    bool write;
     QEMUIOVector local_qiov;
+
+    uint8_t *collapse_bounce_buf;
+    size_t collapse_len;
+    QEMUIOVector pre_collapse_qiov;
 } BdrvRequestPadding;
 
 static bool bdrv_init_padding(BlockDriverState *bs,
                               int64_t offset, int64_t bytes,
+                              bool write,
                               BdrvRequestPadding *pad)
 {
     int64_t align = bs->bl.request_alignment;
@@ -1668,6 +1682,8 @@ static bool bdrv_init_padding(BlockDriverState *bs,
         pad->tail_buf = pad->buf + pad->buf_len - align;
     }
 
+    pad->write = write;
+
     return true;
 }
 
@@ -1733,8 +1749,23 @@ zero_mem:
     return 0;
 }
 
-static void bdrv_padding_destroy(BdrvRequestPadding *pad)
+/**
+ * Free *pad's associated buffers, and perform any necessary finalization steps.
+ */
+static void bdrv_padding_finalize(BdrvRequestPadding *pad)
 {
+    if (pad->collapse_bounce_buf) {
+        if (!pad->write) {
+            /*
+             * If padding required elements in the vector to be collapsed into a
+             * bounce buffer, copy the bounce buffer content back
+             */
+            qemu_iovec_from_buf(&pad->pre_collapse_qiov, 0,
+                                pad->collapse_bounce_buf, pad->collapse_len);
+        }
+        qemu_vfree(pad->collapse_bounce_buf);
+        qemu_iovec_destroy(&pad->pre_collapse_qiov);
+    }
     if (pad->buf) {
         qemu_vfree(pad->buf);
         qemu_iovec_destroy(&pad->local_qiov);
@@ -1742,6 +1773,101 @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad)
     memset(pad, 0, sizeof(*pad));
 }
 
+/*
+ * Create pad->local_qiov by wrapping @iov in the padding head and tail, while
+ * ensuring that the resulting vector will not exceed IOV_MAX elements.
+ *
+ * To ensure this, when necessary, the first two or three elements of @iov are
+ * merged into pad->collapse_bounce_buf and replaced by a reference to that
+ * bounce buffer in pad->local_qiov.
+ *
+ * After performing a read request, the data from the bounce buffer must be
+ * copied back into pad->pre_collapse_qiov (e.g. by bdrv_padding_finalize()).
+ */
+static int bdrv_create_padded_qiov(BlockDriverState *bs,
+                                   BdrvRequestPadding *pad,
+                                   struct iovec *iov, int niov,
+                                   size_t iov_offset, size_t bytes)
+{
+    int padded_niov, surplus_count, collapse_count;
+
+    /* Assert this invariant */
+    assert(niov <= IOV_MAX);
+
+    /*
+     * Cannot pad if resulting length would exceed SIZE_MAX.  Returning an error
+     * to the guest is not ideal, but there is little else we can do.  At least
+     * this will practically never happen on 64-bit systems.
+     */
+    if (SIZE_MAX - pad->head < bytes ||
+        SIZE_MAX - pad->head - bytes < pad->tail)
+    {
+        return -EINVAL;
+    }
+
+    /* Length of the resulting IOV if we just concatenated everything */
+    padded_niov = !!pad->head + niov + !!pad->tail;
+
+    qemu_iovec_init(&pad->local_qiov, MIN(padded_niov, IOV_MAX));
+
+    if (pad->head) {
+        qemu_iovec_add(&pad->local_qiov, pad->buf, pad->head);
+    }
+
+    /*
+     * If padded_niov > IOV_MAX, we cannot just concatenate everything.
+     * Instead, merge the first two or three elements of @iov to reduce the
+     * number of vector elements as necessary.
+     */
+    if (padded_niov > IOV_MAX) {
+        /*
+         * Only head and tail can have lead to the number of entries exceeding
+         * IOV_MAX, so we can exceed it by the head and tail at most.  We need
+         * to reduce the number of elements by `surplus_count`, so we merge that
+         * many elements plus one into one element.
+         */
+        surplus_count = padded_niov - IOV_MAX;
+        assert(surplus_count <= !!pad->head + !!pad->tail);
+        collapse_count = surplus_count + 1;
+
+        /*
+         * Move the elements to collapse into `pad->pre_collapse_qiov`, then
+         * advance `iov` (and associated variables) by those elements.
+         */
+        qemu_iovec_init(&pad->pre_collapse_qiov, collapse_count);
+        qemu_iovec_concat_iov(&pad->pre_collapse_qiov, iov,
+                              collapse_count, iov_offset, SIZE_MAX);
+        iov += collapse_count;
+        iov_offset = 0;
+        niov -= collapse_count;
+        bytes -= pad->pre_collapse_qiov.size;
+
+        /*
+         * Construct the bounce buffer to match the length of the to-collapse
+         * vector elements, and for write requests, initialize it with the data
+         * from those elements.  Then add it to `pad->local_qiov`.
+         */
+        pad->collapse_len = pad->pre_collapse_qiov.size;
+        pad->collapse_bounce_buf = qemu_blockalign(bs, pad->collapse_len);
+        if (pad->write) {
+            qemu_iovec_to_buf(&pad->pre_collapse_qiov, 0,
+                              pad->collapse_bounce_buf, pad->collapse_len);
+        }
+        qemu_iovec_add(&pad->local_qiov,
+                       pad->collapse_bounce_buf, pad->collapse_len);
+    }
+
+    qemu_iovec_concat_iov(&pad->local_qiov, iov, niov, iov_offset, bytes);
+
+    if (pad->tail) {
+        qemu_iovec_add(&pad->local_qiov,
+                       pad->buf + pad->buf_len - pad->tail, pad->tail);
+    }
+
+    assert(pad->local_qiov.niov == MIN(padded_niov, IOV_MAX));
+    return 0;
+}
+
 /*
  * bdrv_pad_request
  *
@@ -1749,6 +1875,8 @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad)
  * read of padding, bdrv_padding_rmw_read() should be called separately if
  * needed.
  *
+ * @write is true for write requests, false for read requests.
+ *
  * Request parameters (@qiov, &qiov_offset, &offset, &bytes) are in-out:
  *  - on function start they represent original request
  *  - on failure or when padding is not needed they are unchanged
@@ -1757,25 +1885,33 @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad)
 static int bdrv_pad_request(BlockDriverState *bs,
                             QEMUIOVector **qiov, size_t *qiov_offset,
                             int64_t *offset, int64_t *bytes,
+                            bool write,
                             BdrvRequestPadding *pad, bool *padded)
 {
     int ret;
+    struct iovec *sliced_iov;
+    int sliced_niov;
+    size_t sliced_head, sliced_tail;
 
     bdrv_check_qiov_request(*offset, *bytes, *qiov, *qiov_offset, &error_abort);
 
-    if (!bdrv_init_padding(bs, *offset, *bytes, pad)) {
+    if (!bdrv_init_padding(bs, *offset, *bytes, write, pad)) {
         if (padded) {
             *padded = false;
         }
         return 0;
     }
 
-    ret = qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
-                                   *qiov, *qiov_offset, *bytes,
-                                   pad->buf + pad->buf_len - pad->tail,
-                                   pad->tail);
+    sliced_iov = qemu_iovec_slice(*qiov, *qiov_offset, *bytes,
+                                  &sliced_head, &sliced_tail,
+                                  &sliced_niov);
+
+    /* Guaranteed by bdrv_check_qiov_request() */
+    assert(*bytes <= SIZE_MAX);
+    ret = bdrv_create_padded_qiov(bs, pad, sliced_iov, sliced_niov,
+                                  sliced_head, *bytes);
     if (ret < 0) {
-        bdrv_padding_destroy(pad);
+        bdrv_padding_finalize(pad);
         return ret;
     }
     *bytes += pad->head + pad->tail;
@@ -1836,8 +1972,8 @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
         flags |= BDRV_REQ_COPY_ON_READ;
     }
 
-    ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad,
-                           NULL);
+    ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, false,
+                           &pad, NULL);
     if (ret < 0) {
         goto fail;
     }
@@ -1847,7 +1983,7 @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
                               bs->bl.request_alignment,
                               qiov, qiov_offset, flags);
     tracked_request_end(&req);
-    bdrv_padding_destroy(&pad);
+    bdrv_padding_finalize(&pad);
 
 fail:
     bdrv_dec_in_flight(bs);
@@ -2167,7 +2303,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
     bool padding;
     BdrvRequestPadding pad;
 
-    padding = bdrv_init_padding(bs, offset, bytes, &pad);
+    padding = bdrv_init_padding(bs, offset, bytes, true, &pad);
     if (padding) {
         bdrv_make_request_serialising(req, align);
 
@@ -2214,7 +2350,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
     }
 
 out:
-    bdrv_padding_destroy(&pad);
+    bdrv_padding_finalize(&pad);
 
     return ret;
 }
@@ -2280,8 +2416,8 @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
          * bdrv_co_do_zero_pwritev() does aligning by itself, so, we do
          * alignment only if there is no ZERO flag.
          */
-        ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad,
-                               &padded);
+        ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, true,
+                               &pad, &padded);
         if (ret < 0) {
             return ret;
         }
@@ -2310,7 +2446,7 @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
     ret = bdrv_aligned_pwritev(child, &req, offset, bytes, align,
                                qiov, qiov_offset, flags);
 
-    bdrv_padding_destroy(&pad);
+    bdrv_padding_finalize(&pad);
 
 out:
     tracked_request_end(&req);
-- 
Gitee


From 2a4a2492efb9f4fe2d69cea1224aac3c53626d69 Mon Sep 17 00:00:00 2001
From: Hanna Czenczek <hreitz@redhat.com>
Date: Tue, 11 Apr 2023 19:34:17 +0200
Subject: [PATCH 296/407] util/iov: Remove qemu_iovec_init_extended()

RH-Author: Hanna Czenczek <hreitz@redhat.com>
RH-MergeRequest: 291: block: Split padded I/O vectors exceeding IOV_MAX
RH-Bugzilla: 2141964
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Commit: [3/5] 19c8307ef1289f1991199d1d1f6ab6c89a4b59ce

bdrv_pad_request() was the main user of qemu_iovec_init_extended().
HEAD^ has removed that use, so we can remove qemu_iovec_init_extended()
now.

The only remaining user is qemu_iovec_init_slice(), which can easily
inline the small part it really needs.

Note that qemu_iovec_init_extended() offered a memcpy() optimization to
initialize the new I/O vector.  qemu_iovec_concat_iov(), which is used
to replace its functionality, does not, but calls qemu_iovec_add() for
every single element.  If we decide this optimization was important, we
will need to re-implement it in qemu_iovec_concat_iov(), which might
also benefit its pre-existing users.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230411173418.19549-4-hreitz@redhat.com>
(cherry picked from commit cc63f6f6fa1aaa4b6405dd69432c693e9c8d18ca)
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 include/qemu/iov.h |  5 ---
 util/iov.c         | 79 +++++++---------------------------------------
 2 files changed, 11 insertions(+), 73 deletions(-)

diff --git a/include/qemu/iov.h b/include/qemu/iov.h
index 46fadfb27a6..63a1c01965d 100644
--- a/include/qemu/iov.h
+++ b/include/qemu/iov.h
@@ -222,11 +222,6 @@ static inline void *qemu_iovec_buf(QEMUIOVector *qiov)
 
 void qemu_iovec_init(QEMUIOVector *qiov, int alloc_hint);
 void qemu_iovec_init_external(QEMUIOVector *qiov, struct iovec *iov, int niov);
-int qemu_iovec_init_extended(
-        QEMUIOVector *qiov,
-        void *head_buf, size_t head_len,
-        QEMUIOVector *mid_qiov, size_t mid_offset, size_t mid_len,
-        void *tail_buf, size_t tail_len);
 void qemu_iovec_init_slice(QEMUIOVector *qiov, QEMUIOVector *source,
                            size_t offset, size_t len);
 struct iovec *qemu_iovec_slice(QEMUIOVector *qiov,
diff --git a/util/iov.c b/util/iov.c
index 3ccb530b164..af3ccc2546b 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -411,70 +411,6 @@ int qemu_iovec_subvec_niov(QEMUIOVector *qiov, size_t offset, size_t len)
     return niov;
 }
 
-/*
- * Compile new iovec, combining @head_buf buffer, sub-qiov of @mid_qiov,
- * and @tail_buf buffer into new qiov.
- */
-int qemu_iovec_init_extended(
-        QEMUIOVector *qiov,
-        void *head_buf, size_t head_len,
-        QEMUIOVector *mid_qiov, size_t mid_offset, size_t mid_len,
-        void *tail_buf, size_t tail_len)
-{
-    size_t mid_head, mid_tail;
-    int total_niov, mid_niov = 0;
-    struct iovec *p, *mid_iov = NULL;
-
-    assert(mid_qiov->niov <= IOV_MAX);
-
-    if (SIZE_MAX - head_len < mid_len ||
-        SIZE_MAX - head_len - mid_len < tail_len)
-    {
-        return -EINVAL;
-    }
-
-    if (mid_len) {
-        mid_iov = qemu_iovec_slice(mid_qiov, mid_offset, mid_len,
-                                   &mid_head, &mid_tail, &mid_niov);
-    }
-
-    total_niov = !!head_len + mid_niov + !!tail_len;
-    if (total_niov > IOV_MAX) {
-        return -EINVAL;
-    }
-
-    if (total_niov == 1) {
-        qemu_iovec_init_buf(qiov, NULL, 0);
-        p = &qiov->local_iov;
-    } else {
-        qiov->niov = qiov->nalloc = total_niov;
-        qiov->size = head_len + mid_len + tail_len;
-        p = qiov->iov = g_new(struct iovec, qiov->niov);
-    }
-
-    if (head_len) {
-        p->iov_base = head_buf;
-        p->iov_len = head_len;
-        p++;
-    }
-
-    assert(!mid_niov == !mid_len);
-    if (mid_niov) {
-        memcpy(p, mid_iov, mid_niov * sizeof(*p));
-        p[0].iov_base = (uint8_t *)p[0].iov_base + mid_head;
-        p[0].iov_len -= mid_head;
-        p[mid_niov - 1].iov_len -= mid_tail;
-        p += mid_niov;
-    }
-
-    if (tail_len) {
-        p->iov_base = tail_buf;
-        p->iov_len = tail_len;
-    }
-
-    return 0;
-}
-
 /*
  * Check if the contents of subrange of qiov data is all zeroes.
  */
@@ -506,14 +442,21 @@ bool qemu_iovec_is_zero(QEMUIOVector *qiov, size_t offset, size_t bytes)
 void qemu_iovec_init_slice(QEMUIOVector *qiov, QEMUIOVector *source,
                            size_t offset, size_t len)
 {
-    int ret;
+    struct iovec *slice_iov;
+    int slice_niov;
+    size_t slice_head, slice_tail;
 
     assert(source->size >= len);
     assert(source->size - len >= offset);
 
-    /* We shrink the request, so we can't overflow neither size_t nor MAX_IOV */
-    ret = qemu_iovec_init_extended(qiov, NULL, 0, source, offset, len, NULL, 0);
-    assert(ret == 0);
+    slice_iov = qemu_iovec_slice(source, offset, len,
+                                 &slice_head, &slice_tail, &slice_niov);
+    if (slice_niov == 1) {
+        qemu_iovec_init_buf(qiov, slice_iov[0].iov_base + slice_head, len);
+    } else {
+        qemu_iovec_init(qiov, slice_niov);
+        qemu_iovec_concat_iov(qiov, slice_iov, slice_niov, slice_head, len);
+    }
 }
 
 void qemu_iovec_destroy(QEMUIOVector *qiov)
-- 
Gitee


From b392fc4046e91971477d261c0489f0d48dc2eecf Mon Sep 17 00:00:00 2001
From: Hanna Czenczek <hreitz@redhat.com>
Date: Tue, 11 Apr 2023 19:34:18 +0200
Subject: [PATCH 297/407] iotests/iov-padding: New test

RH-Author: Hanna Czenczek <hreitz@redhat.com>
RH-MergeRequest: 291: block: Split padded I/O vectors exceeding IOV_MAX
RH-Bugzilla: 2141964
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Commit: [4/5] a80be9c26ebd5503745989cd6823cb4814264258

Test that even vectored IO requests with 1024 vector elements that are
not aligned to the device's request alignment will succeed.
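
As a rough illustration of why such requests need special handling, the
sketch below computes the head and tail padding for a request and the number
of vector elements after padding. The helper names (pad_head, pad_tail,
padded_niov) are hypothetical and are not QEMU functions; the alignment
passed in would be 4096 for the blkdebug node the test uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of QEMU. */

/* Bytes to prepend so that the request starts on an @align boundary. */
static uint64_t pad_head(uint64_t offset, uint64_t align)
{
    assert(align != 0);
    return offset % align;
}

/* Bytes to append so that the request ends on an @align boundary. */
static uint64_t pad_tail(uint64_t offset, uint64_t bytes, uint64_t align)
{
    assert(align != 0);
    uint64_t end = (offset + bytes) % align;
    return end ? align - end : 0;
}

/* Vector elements after padding: one extra per non-empty head or tail. */
static int padded_niov(int niov, uint64_t head, uint64_t tail)
{
    return niov + (head != 0) + (tail != 0);
}
```

For the test's case of offset 512 and length 524288, this gives a head of 512
bytes and a tail of 3584 bytes, so a 1024-element vector would grow to 1026
elements once padded.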

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230411173418.19549-5-hreitz@redhat.com>
(cherry picked from commit d7e1905e3f54ff9512db4c7a946a8603b62b108d)
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 tests/qemu-iotests/tests/iov-padding     | 85 ++++++++++++++++++++++++
 tests/qemu-iotests/tests/iov-padding.out | 59 ++++++++++++++++
 2 files changed, 144 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/iov-padding
 create mode 100644 tests/qemu-iotests/tests/iov-padding.out

diff --git a/tests/qemu-iotests/tests/iov-padding b/tests/qemu-iotests/tests/iov-padding
new file mode 100755
index 00000000000..b9604900c7c
--- /dev/null
+++ b/tests/qemu-iotests/tests/iov-padding
@@ -0,0 +1,85 @@
+#!/usr/bin/env bash
+# group: rw quick
+#
+# Check the interaction of request padding (to fit alignment restrictions) with
+# vectored I/O from the guest
+#
+# Copyright Red Hat
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+seq=$(basename $0)
+echo "QA output created by $seq"
+
+status=1	# failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+cd ..
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+_supported_proto file
+
+_make_test_img 1M
+
+IMGSPEC="driver=blkdebug,align=4096,image.driver=file,image.filename=$TEST_IMG"
+
+# Four combinations:
+# - Offset 4096, length 1023 * 512 + 512: Fully aligned to 4k
+# - Offset 4096, length 1023 * 512 + 4096: Head is aligned, tail is not
+# - Offset 512, length 1023 * 512 + 512: Neither head nor tail are aligned
+# - Offset 512, length 1023 * 512 + 4096: Tail is aligned, head is not
+for start_offset in 4096 512; do
+    for last_element_length in 512 4096; do
+        length=$((1023 * 512 + $last_element_length))
+
+        echo
+        echo "== performing 1024-element vectored requests to image (offset: $start_offset; length: $length) =="
+
+        # Fill with data for testing
+        $QEMU_IO -c 'write -P 1 0 1M' "$TEST_IMG" | _filter_qemu_io
+
+        # 1023 512-byte buffers, and then one with length $last_element_length
+        cmd_params="-P 2 $start_offset $(yes 512 | head -n 1023 | tr '\n' ' ') $last_element_length"
+        QEMU_IO_OPTIONS="$QEMU_IO_OPTIONS_NO_FMT" $QEMU_IO \
+            -c "writev $cmd_params" \
+            --image-opts \
+            "$IMGSPEC" \
+            | _filter_qemu_io
+
+        # Read all patterns -- read the part we just wrote with writev twice,
+        # once "normally", and once with a readv, so we see that that works, too
+        QEMU_IO_OPTIONS="$QEMU_IO_OPTIONS_NO_FMT" $QEMU_IO \
+            -c "read -P 1 0 $start_offset" \
+            -c "read -P 2 $start_offset $length" \
+            -c "readv $cmd_params" \
+            -c "read -P 1 $((start_offset + length)) $((1024 * 1024 - length - start_offset))" \
+            --image-opts \
+            "$IMGSPEC" \
+            | _filter_qemu_io
+    done
+done
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/iov-padding.out b/tests/qemu-iotests/tests/iov-padding.out
new file mode 100644
index 00000000000..e07a91fac76
--- /dev/null
+++ b/tests/qemu-iotests/tests/iov-padding.out
@@ -0,0 +1,59 @@
+QA output created by iov-padding
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+
+== performing 1024-element vectored requests to image (offset: 4096; length: 524288) ==
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 524288/524288 bytes at offset 4096
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset 0
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 524288/524288 bytes at offset 4096
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 524288/524288 bytes at offset 4096
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 520192/520192 bytes at offset 528384
+508 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== performing 1024-element vectored requests to image (offset: 4096; length: 527872) ==
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 527872/527872 bytes at offset 4096
+515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset 0
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 527872/527872 bytes at offset 4096
+515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 527872/527872 bytes at offset 4096
+515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 516608/516608 bytes at offset 531968
+504.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== performing 1024-element vectored requests to image (offset: 512; length: 524288) ==
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 524288/524288 bytes at offset 512
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 524288/524288 bytes at offset 512
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 524288/524288 bytes at offset 512
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 523776/523776 bytes at offset 524800
+511.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== performing 1024-element vectored requests to image (offset: 512; length: 527872) ==
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 527872/527872 bytes at offset 512
+515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 527872/527872 bytes at offset 512
+515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 527872/527872 bytes at offset 512
+515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 520192/520192 bytes at offset 528384
+508 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+*** done
-- 
Gitee


From e611e9f6526b1f7fb6f4c18ff96ebf09a41e8077 Mon Sep 17 00:00:00 2001
From: Hanna Czenczek <hreitz@redhat.com>
Date: Fri, 14 Jul 2023 10:59:38 +0200
Subject: [PATCH 298/407] block: Fix pad_request's request restriction

RH-Author: Hanna Czenczek <hreitz@redhat.com>
RH-MergeRequest: 291: block: Split padded I/O vectors exceeding IOV_MAX
RH-Bugzilla: 2141964
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Commit: [5/5] f9188bd089d6c67185ea1accde20d491a2ed3193

bdrv_pad_request() relies on requests' lengths not to exceed SIZE_MAX,
which bdrv_check_qiov_request() does not guarantee.

bdrv_check_request32() however will guarantee this, and both of
bdrv_pad_request()'s callers (bdrv_co_preadv_part() and
bdrv_co_pwritev_part()) already run it before calling
bdrv_pad_request().  Therefore, bdrv_pad_request() can safely call
bdrv_check_request32() without expecting error, too.

In effect, this patch will not change guest-visible behavior.  It is a
clean-up to tighten a condition to match what is guaranteed by our
callers, and which exists purely to show clearly why the subsequent
assertion (`assert(*bytes <= SIZE_MAX)`) is always true.

Note there is a difference between the interfaces of
bdrv_check_qiov_request() and bdrv_check_request32(): The former takes
an errp, the latter does not, so we can no longer just pass
&error_abort.  Instead, we need to check the returned value.  While we
do expect success (because the callers have already run this function),
an assert(ret == 0) is not much simpler than just to return an error if
it occurs, so let us handle errors by returning them up the stack now.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-id: 20230714085938.202730-1-hreitz@redhat.com
Fixes: 18743311b829cafc1737a5f20bc3248d5f91ee2a
       ("block: Collapse padded I/O vecs exceeding IOV_MAX")
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 block/io.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index 0fe8f0dd40b..8ae57728a66 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1893,7 +1893,11 @@ static int bdrv_pad_request(BlockDriverState *bs,
     int sliced_niov;
     size_t sliced_head, sliced_tail;
 
-    bdrv_check_qiov_request(*offset, *bytes, *qiov, *qiov_offset, &error_abort);
+    /* Should have been checked by the caller already */
+    ret = bdrv_check_request32(*offset, *bytes, *qiov, *qiov_offset);
+    if (ret < 0) {
+        return ret;
+    }
 
     if (!bdrv_init_padding(bs, *offset, *bytes, write, pad)) {
         if (padded) {
@@ -1906,7 +1910,7 @@ static int bdrv_pad_request(BlockDriverState *bs,
                                   &sliced_head, &sliced_tail,
                                   &sliced_niov);
 
-    /* Guaranteed by bdrv_check_qiov_request() */
+    /* Guaranteed by bdrv_check_request32() */
     assert(*bytes <= SIZE_MAX);
     ret = bdrv_create_padded_qiov(bs, pad, sliced_iov, sliced_niov,
                                   sliced_head, *bytes);
-- 
Gitee


From ef16caeea38e35317f8f81178affe3669ec8ff12 Mon Sep 17 00:00:00 2001
From: Bandan Das <bsd@redhat.com>
Date: Thu, 3 Aug 2023 15:23:54 -0400
Subject: [PATCH 299/407] qapi, i386/sev: Change the reduced-phys-bits value
 from 5 to 1

RH-Author: Bandan Das <None>
RH-MergeRequest: 296: Updates to SEV reduced-phys-bits parameter
RH-Bugzilla: 2214840
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
RH-Commit: [1/4] 4137cb3b57cbb175078bc908fb2301ea2b97fd17

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2214840

commit 798a818f50a9bfc01e8b5943090de458863b897b
Author: Tom Lendacky <thomas.lendacky@amd.com>
Date:   Fri Sep 30 10:14:27 2022 -0500

    qapi, i386/sev: Change the reduced-phys-bits value from 5 to 1

    A guest only ever experiences, at most, 1 bit of reduced physical
    addressing. Change the query-sev-capabilities json comment to use 1.

    Fixes: 31dd67f684 ("sev/i386: qmp: add query-sev-capabilities command")
    Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Message-Id: <cb96d8e09154533af4b4e6988469bc0b32390b65.1664550870.git.thomas.lendacky@amd.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

RHEL Notes:
     Conflicts: Context differences, since commit 811b4ec7f8eb<qapi, target/i386/sev: Add cpu0-id to query-sev-capabilities>
     is missing

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 qapi/misc-target.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4bc45d24741..ede9052440a 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -205,7 +205,7 @@
 #
 # -> { "execute": "query-sev-capabilities" }
 # <- { "return": { "pdh": "8CCDD8DDD", "cert-chain": "888CCCDDDEE",
-#                  "cbitpos": 47, "reduced-phys-bits": 5}}
+#                  "cbitpos": 47, "reduced-phys-bits": 1}}
 #
 ##
 { 'command': 'query-sev-capabilities', 'returns': 'SevCapability',
-- 
Gitee


From cd7cfa2e2eea71c0fe627e8b9567feba203cf1a5 Mon Sep 17 00:00:00 2001
From: Bandan Das <bsd@redhat.com>
Date: Thu, 3 Aug 2023 15:12:15 -0400
Subject: [PATCH 300/407] qemu-options.hx: Update the reduced-phys-bits
 documentation

RH-Author: Bandan Das <None>
RH-MergeRequest: 296: Updates to SEV reduced-phys-bits parameter
RH-Bugzilla: 2214840
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
RH-Commit: [2/4] f8e8f5aeff449a34ce90c6e55e2a51873a6e6a87

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2214840

commit 326e3015c4c6f3197157ea0bb00826ae740e2fad
Author: Tom Lendacky <thomas.lendacky@amd.com>
Date:   Fri Sep 30 10:14:28 2022 -0500

    qemu-options.hx: Update the reduced-phys-bits documentation

    A guest only ever experiences, at most, 1 bit of reduced physical
    addressing. Update the documentation to reflect this as well as change
    the example value on the reduced-phys-bits option.

    Fixes: a9b4942f48 ("target/i386: add Secure Encrypted Virtualization (SEV) object")
    Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Message-Id: <13a62ced1808546c1d398e2025cf85f4c94ae123.1664550870.git.thomas.lendacky@amd.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 qemu-options.hx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 4b7798088b0..981248e283e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -5204,7 +5204,7 @@ SRST
         physical address space. The ``reduced-phys-bits`` is used to
         provide the number of bits we loose in physical address space.
         Similar to C-bit, the value is Host family dependent. On EPYC,
-        the value should be 5.
+        a guest will lose a maximum of 1 bit, so the value should be 1.
 
         The ``sev-device`` provides the device file to use for
         communicating with the SEV firmware running inside AMD Secure
@@ -5239,7 +5239,7 @@ SRST
 
              # |qemu_system_x86| \\
                  ...... \\
-                 -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=5 \\
+                 -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1 \\
                  -machine ...,memory-encryption=sev0 \\
                  .....
 
-- 
Gitee


From 18f6c6176f5e3e1b8a23349985a8444bc4faa999 Mon Sep 17 00:00:00 2001
From: Bandan Das <bsd@redhat.com>
Date: Thu, 3 Aug 2023 15:13:19 -0400
Subject: [PATCH 301/407] i386/sev: Update checks and information related to
 reduced-phys-bits

RH-Author: Bandan Das <None>
RH-MergeRequest: 296: Updates to SEV reduced-phys-bits parameter
RH-Bugzilla: 2214840
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
RH-Commit: [3/4] b617173d2b15fa39cdc02b5c1ac4d52e9b0dfede

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2214840

commit 8168fed9f84e3128f7628969ae78af49433d5ce7
Author: Tom Lendacky <thomas.lendacky@amd.com>
Date:   Fri Sep 30 10:14:29 2022 -0500

    i386/sev: Update checks and information related to reduced-phys-bits

    The value of the reduced-phys-bits parameter is propagated to the CPUID
    information exposed to the guest. Update the current validation check to
    account for the size of the CPUID field (6-bits), ensuring the value is
    in the range of 1 to 63.

    Maintain backward compatibility, to an extent, by allowing a value greater
    than 1 (so that the previously documented value of 5 still works), but not
    allowing anything over 63.

    Fixes: d8575c6c02 ("sev/i386: add command to initialize the memory encryption context")
    Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Message-Id: <cca5341a95ac73f904e6300f10b04f9c62e4e8ff.1664550870.git.thomas.lendacky@amd.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
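
The range check described above can be pictured with the following minimal
sketch. It assumes the EBX layout named in the patch (C-bit position in bits
5:0, reduced-phys-bits in bits 11:6); reduced_phys_bits_valid() and
sev_ebx_pack() are hypothetical names used for illustration and are not QEMU
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not QEMU functions. */

/* A 6-bit field holds 0..63; 0 is rejected, so the valid range is 1..63. */
static bool reduced_phys_bits_valid(uint32_t reduced_phys_bits)
{
    return reduced_phys_bits >= 1 && reduced_phys_bits <= 63;
}

/* EBX[5:0] = C-bit position, EBX[11:6] = reduced physical address bits. */
static uint32_t sev_ebx_pack(uint32_t cbitpos, uint32_t reduced_phys_bits)
{
    /* The caller is expected to have validated the value beforehand. */
    assert(reduced_phys_bits_valid(reduced_phys_bits));
    return (cbitpos & 0x3f) | ((reduced_phys_bits & 0x3f) << 6);
}
```

A value above 63 would not fit in the 6-bit field and would spill into the
neighbouring EBX bits if it were shifted in unmasked, which is why the upper
bound is checked up front.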

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 target/i386/sev.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 025ff7a6f84..ba6a65e90c1 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -892,15 +892,26 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
     host_cpuid(0x8000001F, 0, NULL, &ebx, NULL, NULL);
     host_cbitpos = ebx & 0x3f;
 
+    /*
+     * The cbitpos value will be placed in bit positions 5:0 of the EBX
+     * register of CPUID 0x8000001F. No need to verify the range as the
+     * comparison against the host value accomplishes that.
+     */
     if (host_cbitpos != sev->cbitpos) {
         error_setg(errp, "%s: cbitpos check failed, host '%d' requested '%d'",
                    __func__, host_cbitpos, sev->cbitpos);
         goto err;
     }
 
-    if (sev->reduced_phys_bits < 1) {
-        error_setg(errp, "%s: reduced_phys_bits check failed, it should be >=1,"
-                   " requested '%d'", __func__, sev->reduced_phys_bits);
+    /*
+     * The reduced-phys-bits value will be placed in bit positions 11:6 of
+     * the EBX register of CPUID 0x8000001F, so verify the supplied value
+     * is in the range of 1 to 63.
+     */
+    if (sev->reduced_phys_bits < 1 || sev->reduced_phys_bits > 63) {
+        error_setg(errp, "%s: reduced_phys_bits check failed,"
+                   " it should be in the range of 1 to 63, requested '%d'",
+                   __func__, sev->reduced_phys_bits);
         goto err;
     }
 
-- 
Gitee


From c5d6e8a4846c088237a7e6ef0e15689aab1ac6f3 Mon Sep 17 00:00:00 2001
From: Bandan Das <bsd@redhat.com>
Date: Thu, 3 Aug 2023 15:14:14 -0400
Subject: [PATCH 302/407] i386/cpu: Update how the EBX register of CPUID
 0x8000001F is set

RH-Author: Bandan Das <None>
RH-MergeRequest: 296: Updates to SEV reduced-phys-bits parameter
RH-Bugzilla: 2214840
RH-Acked-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
RH-Commit: [4/4] 8b236fd9bc4c177bfacf6220a429e711b5bf062e

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2214840

commit fb6bbafc0f19385fb257ee073ed13dcaf613f2f8
Author: Tom Lendacky <thomas.lendacky@amd.com>
Date:   Fri Sep 30 10:14:30 2022 -0500

    i386/cpu: Update how the EBX register of CPUID 0x8000001F is set

    Update the setting of CPUID 0x8000001F EBX to clearly document the ranges
    associated with fields being set.

    Fixes: 6cb8f2a663 ("cpu/i386: populate CPUID 0x8000_001F when SEV is active")
    Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Message-Id: <5822fd7d02b575121380e1f493a8f6d9eba2b11a.1664550870.git.thomas.lendacky@amd.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 target/i386/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 9d3dcdcc0d2..265f0aadfc5 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5836,8 +5836,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         if (sev_enabled()) {
             *eax = 0x2;
             *eax |= sev_es_enabled() ? 0x8 : 0;
-            *ebx = sev_get_cbit_position();
-            *ebx |= sev_get_reduced_phys_bits() << 6;
+            *ebx = sev_get_cbit_position() & 0x3f; /* EBX[5:0] */
+            *ebx |= (sev_get_reduced_phys_bits() & 0x3f) << 6; /* EBX[11:6] */
         }
         break;
     default:
-- 
Gitee


From e11b2df25efbf7b7c3727aa3d8f8c21494c6bce3 Mon Sep 17 00:00:00 2001
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
Date: Mon, 23 May 2022 18:26:58 +0200
Subject: [PATCH 303/407] target/i386/kvm: Fix disabling MPX on "-cpu host"
 with MPX-capable host

RH-Author: Ani Sinha <None>
RH-MergeRequest: 297: target/i386/kvm: Fix disabling MPX on "-cpu host" with MPX-capable host
RH-Bugzilla: 2223947
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [1/1] 90098294a873a53b366389606fd0402efcbd70ad

Since KVM commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled")
it is not possible to disable MPX on a "-cpu host" just by adding "-mpx"
there if the host CPU does indeed support MPX.
QEMU will fail to set MSR_IA32_VMX_TRUE_{EXIT,ENTRY}_CTLS MSRs in this case
and so trigger an assertion failure.

Instead, besides "-mpx", one also has to explicitly add
"-vmx-exit-clear-bndcfgs" and "-vmx-entry-load-bndcfgs" to the QEMU command
line to make it work, which is a bit convoluted.

Make the MPX-related bits in FEAT_VMX_{EXIT,ENTRY}_CTLS dependent on MPX
being actually enabled so such workarounds are no longer necessary.
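
The idea of such a dependency can be sketched as follows. This is a
simplified, hypothetical model and not QEMU's actual FeatureDep machinery:
the word indices, the struct layout, apply_deps() and the bit values are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model; QEMU's real FeatureDep table differs. */
enum { WORD_CPUID_7_0_EBX, WORD_VMX_EXIT_CTLS, WORD_VMX_ENTRY_CTLS, WORD_COUNT };

/* Illustrative bit values, chosen for this sketch. */
#define BIT_MPX           (1u << 14)
#define BIT_CLEAR_BNDCFGS (1u << 23)
#define BIT_LOAD_BNDCFGS  (1u << 16)

struct dep {
    int from_word;
    uint32_t from_mask;
    int to_word;
    uint32_t to_mask;
};

static const struct dep deps[] = {
    { WORD_CPUID_7_0_EBX, BIT_MPX, WORD_VMX_EXIT_CTLS,  BIT_CLEAR_BNDCFGS },
    { WORD_CPUID_7_0_EBX, BIT_MPX, WORD_VMX_ENTRY_CTLS, BIT_LOAD_BNDCFGS },
};

/* Clear every dependent bit whose prerequisite feature bit is disabled. */
static void apply_deps(uint32_t features[WORD_COUNT])
{
    assert(features != NULL);
    for (size_t i = 0; i < sizeof(deps) / sizeof(deps[0]); i++) {
        if (!(features[deps[i].from_word] & deps[i].from_mask)) {
            features[deps[i].to_word] &= ~deps[i].to_mask;
        }
    }
}
```

With MPX cleared in the first word, both BNDCFGS control bits are dropped
automatically, so the user does not have to disable them by hand.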

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <51aa2125c76363204cc23c27165e778097c33f0b.1653323077.git.maciej.szmigiero@oracle.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 267b5e7e378afd260004cb37a66a6fcd641e3b53)
---
 target/i386/cpu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 265f0aadfc5..726814ee2e8 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1326,6 +1326,14 @@ static FeatureDep feature_dependencies[] = {
         .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_INVPCID },
         .to = { FEAT_VMX_SECONDARY_CTLS,    VMX_SECONDARY_EXEC_ENABLE_INVPCID },
     },
+    {
+        .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_MPX },
+        .to = { FEAT_VMX_EXIT_CTLS,         VMX_VM_EXIT_CLEAR_BNDCFGS },
+    },
+    {
+        .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_MPX },
+        .to = { FEAT_VMX_ENTRY_CTLS,        VMX_VM_ENTRY_LOAD_BNDCFGS },
+    },
     {
         .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_RDSEED },
         .to = { FEAT_VMX_SECONDARY_CTLS,    VMX_SECONDARY_EXEC_RDSEED_EXITING },
-- 
Gitee


From 295eb31117ae8e1909761249ed519f97f366766f Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 15 Aug 2023 19:00:44 -0400
Subject: [PATCH 304/407] vhost-vdpa: do not cleanup the vdpa/vhost-net
 structures if peer nic is present

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 304: vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present
RH-Bugzilla: 2215786
RH-Acked-by: Ani Sinha <None>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [1/1] 16aa37efdf129f2619cedf9c030222b88eda9e26 (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2215786
CVE: CVE-2023-3301
Upstream: Merged
Conflicts: commit babf8b87127a is not present in this release, so the commit does not
           apply cleanly. The two adjacent munmap() calls introduced by that commit
           don't seem to be needed for the logic of this change.

commit a0d7215e339b61c7d7a7b3fcf754954d80d93eb8
Author: Ani Sinha <anisinha@redhat.com>
Date:   Mon Jun 19 12:22:09 2023 +0530

    vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present

    When a peer nic is still attached to the vdpa backend, it is too early to free
    up the vhost-net and vdpa structures. If these structures are freed here, then
    QEMU crashes when the guest is being shut down. The following call chain
    would result in an assertion failure since the pointer returned from
    vhost_vdpa_get_vhost_net() would be NULL:

    do_vm_stop() -> vm_state_notify() -> virtio_set_status() ->
    virtio_net_vhost_status() -> get_vhost_net().

    Therefore, we defer freeing up the structures until guest shutdown
    time when qemu_cleanup() calls net_cleanup() which then calls
    qemu_del_net_client() which would eventually call vhost_vdpa_cleanup()
    again to free up the structures. This time, the loop in net_cleanup()
    ensures that vhost_vdpa_cleanup() will be called one last time when
    all the peer nics are detached and freed.

    All unit tests pass with this change.

    CC: imammedo@redhat.com
    CC: jusual@redhat.com
    CC: mst@redhat.com
    Fixes: CVE-2023-3301
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
    Signed-off-by: Ani Sinha <anisinha@redhat.com>
    Message-Id: <20230619065209.442185-1-anisinha@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
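
The early-return guard described in the upstream message can be sketched
as follows. This is a simplified model under stated assumptions, not the
vhost-vdpa code: Client, DriverType and backend_cleanup are hypothetical
names, and "backend_freed" stands in for freeing the vhost-net and vdpa
structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the net client types. */
typedef enum { DRIVER_NIC, DRIVER_VHOST_VDPA } DriverType;

typedef struct Client {
    DriverType type;
    struct Client *peer;   /* NULL once the peer has been detached */
    bool backend_freed;    /* models the freed backend structures */
} Client;

/* Free backend state only when no NIC peer can still reach it. */
static void backend_cleanup(Client *nc)
{
    assert(nc != NULL);
    if (nc->peer && nc->peer->type == DRIVER_NIC) {
        return;            /* too early: the NIC may still query us */
    }
    nc->backend_freed = true;
}
```

Calling backend_cleanup() while the NIC is attached is a no-op here; a
later call, made after the peer is gone, performs the actual release.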

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 net/vhost-vdpa.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 814f7046875..ac48de94957 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -128,6 +128,14 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
 
+    /*
+     * If a peer NIC is attached, do not cleanup anything.
+     * Cleanup will happen as a part of qemu_cleanup() -> net_cleanup()
+     * when the guest is shutting down.
+     */
+    if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_NIC) {
+        return;
+    }
     if (s->vhost_net) {
         vhost_net_cleanup(s->vhost_net);
         g_free(s->vhost_net);
-- 
Gitee


From 597676bf62510a321cae27dd17f1df44940a5eca Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 4 Jul 2023 10:41:22 +0200
Subject: [PATCH 305/407] ui/vnc-clipboard: fix infinite loop in inflate_buffer
 (CVE-2023-3255)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 316: ui/vnc-clipboard: fix infinite loop in inflate_buffer (CVE-2023-3255)
RH-Bugzilla: 2218488
RH-Acked-by: Mauro Matteo Cascella <None>
RH-Commit: [1/1] f3cb05fb6e40261da5fe10f003fa3e57920469bb (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2218488
CVE: CVE-2023-3255
Upstream: Merged

commit d921fea338c1059a27ce7b75309d7a2e485f710b
Author: Mauro Matteo Cascella <mcascell@redhat.com>
Date:   Tue Jul 4 10:41:22 2023 +0200

    ui/vnc-clipboard: fix infinite loop in inflate_buffer (CVE-2023-3255)

    A wrong exit condition may lead to an infinite loop when inflating a
    valid zlib buffer containing some extra bytes in the `inflate_buffer`
    function. The bug only occurs post-authentication. Return the buffer
    immediately if the end of the compressed data has been reached
    (Z_STREAM_END).

    Fixes: CVE-2023-3255
    Fixes: 0bf41cab ("ui/vnc: clipboard support")
    Reported-by: Kevin Denis <kevin.denis@synacktiv.com>
    Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
    Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Tested-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Message-ID: <20230704084210.101822-1-mcascell@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 ui/vnc-clipboard.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/ui/vnc-clipboard.c b/ui/vnc-clipboard.c
index 67284b556cd..c84599cfdbe 100644
--- a/ui/vnc-clipboard.c
+++ b/ui/vnc-clipboard.c
@@ -51,8 +51,11 @@ static uint8_t *inflate_buffer(uint8_t *in, uint32_t in_len, uint32_t *size)
         ret = inflate(&stream, Z_FINISH);
         switch (ret) {
         case Z_OK:
-        case Z_STREAM_END:
             break;
+        case Z_STREAM_END:
+            *size = stream.total_out;
+            inflateEnd(&stream);
+            return out;
         case Z_BUF_ERROR:
             out_len <<= 1;
             if (out_len > (1 << 20)) {
@@ -67,11 +70,6 @@ static uint8_t *inflate_buffer(uint8_t *in, uint32_t in_len, uint32_t *size)
         }
     }
 
-    *size = stream.total_out;
-    inflateEnd(&stream);
-
-    return out;
-
 err_end:
     inflateEnd(&stream);
 err:
-- 
Gitee


From 0ac1994e5df5e8ee32c98364e6054206182044e8 Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Wed, 23 Aug 2023 16:22:15 +0200
Subject: [PATCH 306/407] s390x/ap: fix missing subsystem reset registration
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 321: Enable Secure Execution Crypto Passthrough for KVM on s390x
RH-Bugzilla: 2111390
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Commit: [1/5] 4ebe81bb6cc4fc137ca4ebc9c0cebdedc421cc91

A subsystem reset contains a reset of AP resources which has been
missing.  Adding the AP bridge to the list of device types that need
reset fixes this issue.

Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Fixes: a51b3153 ("s390x/ap: base Adjunct Processor (AP) object model")
Message-ID: <20230823142219.1046522-2-seiden@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 297ec01f0b9864ea8209ca0ddc6643b4c0574bdb)
---
 hw/s390x/s390-virtio-ccw.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 4a7cd21cacb..412d73715ac 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -100,6 +100,7 @@ static const char *const reset_dev_types[] = {
     "s390-flic",
     "diag288",
     TYPE_S390_PCI_HOST_BRIDGE,
+    TYPE_AP_BRIDGE,
 };
 
 static void subsystem_reset(void)
-- 
Gitee


From b75412f3044b658a2c71e1c8e10abca16b8124bd Mon Sep 17 00:00:00 2001
From: Janosch Frank <frankja@linux.ibm.com>
Date: Fri, 1 Sep 2023 11:48:51 +0000
Subject: [PATCH 307/407] s390x: do a subsystem reset before the unprotect on
 reboot
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 321: Enable Secure Execution Crypto Passthrough for KVM on s390x
RH-Bugzilla: 2111390
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Commit: [2/5] ea430d236e1a20ddad7095d2e6d10f741f9a1907

Bound APQNs have to be reset before tearing down the secure config via
s390_machine_unprotect(). Otherwise the Ultravisor will return an error
code.

So let's do a subsystem_reset() which includes an AP reset before the
unprotect call. We'll do a full device_reset() afterwards which will
reset some devices twice. That's ok since we can't move the
device_reset() before the unprotect as it includes a CPU clear reset
which the Ultravisor does not expect at that point in time.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20230901114851.154357-1-frankja@linux.ibm.com>
Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit ef1535901a07f2e49fa25c8bcee7f0b73801d824)

Conflicts:
	hw/s390x/s390-virtio-ccw.c
	(contextual conflict due to missing commit 7966d70f6f6b)
Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 hw/s390x/s390-virtio-ccw.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 412d73715ac..17146469eec 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -430,10 +430,20 @@ static void s390_machine_reset(MachineState *machine)
     switch (reset_type) {
     case S390_RESET_EXTERNAL:
     case S390_RESET_REIPL:
+        /*
+         * Reset the subsystem which includes a AP reset. If a PV
+         * guest had APQNs attached the AP reset is a prerequisite to
+         * unprotecting since the UV checks if all APQNs are reset.
+         */
+        subsystem_reset();
         if (s390_is_pv()) {
             s390_machine_unprotect(ms);
         }
 
+        /*
+         * Device reset includes CPU clear resets so this has to be
+         * done AFTER the unprotect call above.
+         */
         qemu_devices_reset();
         s390_crypto_reset();
 
-- 
Gitee


From 7c7edcd82b618383f0681696f297e8835652521d Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Tue, 12 Sep 2023 11:24:40 +0200
Subject: [PATCH 308/407] redhat: Update linux-headers for
 kvm_s390_vm_cpu_uv_feat
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 321: Enable Secure Execution Crypto Passthrough for KVM on s390x
RH-Bugzilla: 2111390
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Commit: [3/5] f1329f5ce5f66033ead7777384dcc1613cad1226

Upstream Status: rhel-only

This hunk is part of upstream commit da3c22c74a3c
("linux-headers: Update to Linux v6.6-rc1"), but since that
commit updates a lot of files and does not apply cleanly,
we only focus on the necessary change here.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 linux-headers/asm-s390/kvm.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/linux-headers/asm-s390/kvm.h b/linux-headers/asm-s390/kvm.h
index f053b8304a8..6706bdc5cc0 100644
--- a/linux-headers/asm-s390/kvm.h
+++ b/linux-headers/asm-s390/kvm.h
@@ -158,6 +158,22 @@ struct kvm_s390_vm_cpu_subfunc {
 	__u8 reserved[1728];
 };
 
+#define KVM_S390_VM_CPU_PROCESSOR_UV_FEAT_GUEST	6
+#define KVM_S390_VM_CPU_MACHINE_UV_FEAT_GUEST	7
+
+#define KVM_S390_VM_CPU_UV_FEAT_NR_BITS	64
+struct kvm_s390_vm_cpu_uv_feat {
+	union {
+		struct {
+			__u64 : 4;
+			__u64 ap : 1;		/* bit 4 */
+			__u64 ap_intr : 1;	/* bit 5 */
+			__u64 : 58;
+		};
+		__u64 feat;
+	};
+};
+
 /* kvm attributes for crypto */
 #define KVM_S390_VM_CRYPTO_ENABLE_AES_KW	0
 #define KVM_S390_VM_CRYPTO_ENABLE_DEA_KW	1
-- 
Gitee


From 50bf1f823e5fadb8dc764810228661cb41ba9a8a Mon Sep 17 00:00:00 2001
From: Steffen Eiden <seiden@linux.ibm.com>
Date: Wed, 23 Aug 2023 16:22:18 +0200
Subject: [PATCH 309/407] target/s390x/kvm: Refactor AP functionalities
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 321: Enable Secure Execution Crypto Passthrough for KVM on s390x
RH-Bugzilla: 2111390
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Commit: [4/5] 8ab2f8766931fb65a391aab590d0ccabd8ba8909

kvm_s390_set_attr() is a misleading name as it only sets attributes for
the KVM_S390_VM_CRYPTO group. Therefore, rename it to
kvm_s390_set_crypto_attr().

Add new functions ap_available() and ap_enabled() to avoid code
duplication later.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Mueller <mimu@linux.ibm.com>
Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Message-ID: <20230823142219.1046522-5-seiden@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 354383c12294f2ee510204cfdc5aaed9f0c42171)
---
 target/s390x/kvm/kvm.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 8d36c377b58..eb8ca4c7809 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -251,7 +251,7 @@ static void kvm_s390_enable_cmma(void)
     trace_kvm_enable_cmma(rc);
 }
 
-static void kvm_s390_set_attr(uint64_t attr)
+static void kvm_s390_set_crypto_attr(uint64_t attr)
 {
     struct kvm_device_attr attribute = {
         .group = KVM_S390_VM_CRYPTO,
@@ -276,7 +276,7 @@ static void kvm_s390_init_aes_kw(void)
     }
 
     if (kvm_vm_check_attr(kvm_state, KVM_S390_VM_CRYPTO, attr)) {
-            kvm_s390_set_attr(attr);
+            kvm_s390_set_crypto_attr(attr);
     }
 }
 
@@ -290,7 +290,7 @@ static void kvm_s390_init_dea_kw(void)
     }
 
     if (kvm_vm_check_attr(kvm_state, KVM_S390_VM_CRYPTO, attr)) {
-            kvm_s390_set_attr(attr);
+            kvm_s390_set_crypto_attr(attr);
     }
 }
 
@@ -2297,6 +2297,17 @@ static int configure_cpu_subfunc(const S390FeatBitmap features)
     return kvm_vm_ioctl(kvm_state, KVM_SET_DEVICE_ATTR, &attr);
 }
 
+static bool ap_available(void)
+{
+    return kvm_vm_check_attr(kvm_state, KVM_S390_VM_CRYPTO,
+                             KVM_S390_VM_CRYPTO_ENABLE_APIE);
+}
+
+static bool ap_enabled(const S390FeatBitmap features)
+{
+    return test_bit(S390_FEAT_AP, features);
+}
+
 static int kvm_to_feat[][2] = {
     { KVM_S390_VM_CPU_FEAT_ESOP, S390_FEAT_ESOP },
     { KVM_S390_VM_CPU_FEAT_SIEF2, S390_FEAT_SIE_F2 },
@@ -2476,8 +2487,7 @@ void kvm_s390_get_host_cpu_model(S390CPUModel *model, Error **errp)
         return;
     }
     /* for now, we can only provide the AP feature with HW support */
-    if (kvm_vm_check_attr(kvm_state, KVM_S390_VM_CRYPTO,
-        KVM_S390_VM_CRYPTO_ENABLE_APIE)) {
+    if (ap_available()) {
         set_bit(S390_FEAT_AP, model->features);
     }
 
@@ -2503,7 +2513,7 @@ static void kvm_s390_configure_apie(bool interpret)
                                 KVM_S390_VM_CRYPTO_DISABLE_APIE;
 
     if (kvm_vm_check_attr(kvm_state, KVM_S390_VM_CRYPTO, attr)) {
-        kvm_s390_set_attr(attr);
+        kvm_s390_set_crypto_attr(attr);
     }
 }
 
@@ -2565,7 +2575,7 @@ void kvm_s390_apply_cpu_model(const S390CPUModel *model, Error **errp)
         kvm_s390_enable_cmma();
     }
 
-    if (test_bit(S390_FEAT_AP, model->features)) {
+    if (ap_enabled(model->features)) {
         kvm_s390_configure_apie(true);
     }
 }
-- 
Gitee


From cc48c53bd097c5dfe5791fc4e63f838931bc260c Mon Sep 17 00:00:00 2001
From: Steffen Eiden <seiden@linux.ibm.com>
Date: Wed, 23 Aug 2023 16:22:19 +0200
Subject: [PATCH 310/407] target/s390x: AP-passthrough for PV guests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 321: Enable Secure Execution Crypto Passthrough for KVM on s390x
RH-Bugzilla: 2111390
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Commit: [5/5] 9bf3dfd78fb030a22db7bb756a2cb7f54a0a8d82

Enable AP-passthrough (AP-pt) for PV guests by using KVM's new CPU
features for PV-AP-pt.

As usual, QEMU first checks which CPU features are available and then
sets them if they are available and selected by the user. An additional
check verifies that PV-AP can only be enabled if "regular" AP-pt is
enabled as well. Note that KVM itself does not enforce this restriction.
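
The additional check can be sketched as a table of (feature,
prerequisite) pairs, assuming a simplified boolean feature array. All
names below are hypothetical, and reporting violations as warnings
rather than errors is an assumption of this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical feature indices for this sketch. */
enum { FEAT_AP, FEAT_UV_AP, FEAT_UV_AP_INTR, FEAT_MAX };

/* Each row reads: the first feature requires the second one. */
static const int requires_tbl[][2] = {
    { FEAT_UV_AP,      FEAT_AP },
    { FEAT_UV_AP_INTR, FEAT_UV_AP },
};

/* Return how many selected features lack their prerequisite. */
static int count_inconsistencies(const bool features[FEAT_MAX])
{
    int violations = 0;

    assert(features != NULL);
    for (size_t i = 0; i < sizeof(requires_tbl) / sizeof(requires_tbl[0]); i++) {
        int feat = requires_tbl[i][0];
        int dep = requires_tbl[i][1];

        if (features[feat] && !features[dep]) {
            fprintf(stderr, "warning: feature %d requires feature %d\n",
                    feat, dep);
            violations++;
        }
    }
    return violations;
}
```

Selecting the PV-AP feature without regular AP yields one violation in
this model; selecting all three features yields none.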

Reviewed-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Message-ID: <20230823142219.1046522-6-seiden@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 5ac951519c23d9eaf7dc9e2dcbcbc7d9a745ffe7)

Conflicts:
	target/s390x/gen-features.c
	(simple contextual conflict due to missing S390_FEAT_PAIE)
Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 target/s390x/cpu_features.h         |  1 +
 target/s390x/cpu_features_def.h.inc |  4 ++
 target/s390x/cpu_models.c           |  2 +
 target/s390x/gen-features.c         |  2 +
 target/s390x/kvm/kvm.c              | 70 +++++++++++++++++++++++++++++
 5 files changed, 79 insertions(+)

diff --git a/target/s390x/cpu_features.h b/target/s390x/cpu_features.h
index 87463f064d2..a9bd68a2e11 100644
--- a/target/s390x/cpu_features.h
+++ b/target/s390x/cpu_features.h
@@ -43,6 +43,7 @@ typedef enum {
     S390_FEAT_TYPE_KDSA,
     S390_FEAT_TYPE_SORTL,
     S390_FEAT_TYPE_DFLTCC,
+    S390_FEAT_TYPE_UV_FEAT_GUEST,
 } S390FeatType;
 
 /* Definition of a CPU feature */
diff --git a/target/s390x/cpu_features_def.h.inc b/target/s390x/cpu_features_def.h.inc
index e86662bb3b8..aa1f51f2a8a 100644
--- a/target/s390x/cpu_features_def.h.inc
+++ b/target/s390x/cpu_features_def.h.inc
@@ -378,3 +378,7 @@ DEF_FEAT(DEFLATE_GHDT, "dfltcc-gdht", DFLTCC, 1, "DFLTCC GDHT")
 DEF_FEAT(DEFLATE_CMPR, "dfltcc-cmpr", DFLTCC, 2, "DFLTCC CMPR")
 DEF_FEAT(DEFLATE_XPND, "dfltcc-xpnd", DFLTCC, 4, "DFLTCC XPND")
 DEF_FEAT(DEFLATE_F0, "dfltcc-f0", DFLTCC, 192, "DFLTCC format 0 parameter-block")
+
+/* Features exposed via the UV-CALL instruction */
+DEF_FEAT(UV_FEAT_AP, "appv", UV_FEAT_GUEST, 4, "AP instructions installed for secure guests")
+DEF_FEAT(UV_FEAT_AP_INTR, "appvi", UV_FEAT_GUEST, 5, "AP instructions interruption support for secure guests")
diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 11e06cc51fa..454485e7069 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -467,6 +467,8 @@ static void check_consistency(const S390CPUModel *model)
         { S390_FEAT_DIAG_318, S390_FEAT_EXTENDED_LENGTH_SCCB },
         { S390_FEAT_NNPA, S390_FEAT_VECTOR },
         { S390_FEAT_RDP, S390_FEAT_LOCAL_TLB_CLEARING },
+        { S390_FEAT_UV_FEAT_AP, S390_FEAT_AP },
+        { S390_FEAT_UV_FEAT_AP_INTR, S390_FEAT_UV_FEAT_AP },
     };
     int i;
 
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 7cb1a6ec10b..b789288c828 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -575,6 +575,8 @@ static uint16_t full_GEN16_GA1[] = {
     S390_FEAT_BEAR_ENH,
     S390_FEAT_RDP,
     S390_FEAT_PAI,
+    S390_FEAT_UV_FEAT_AP,
+    S390_FEAT_UV_FEAT_AP_INTR,
 };
 
 
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index eb8ca4c7809..a963866ef4b 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2308,6 +2308,42 @@ static bool ap_enabled(const S390FeatBitmap features)
     return test_bit(S390_FEAT_AP, features);
 }
 
+static bool uv_feat_supported(void)
+{
+    return kvm_vm_check_attr(kvm_state, KVM_S390_VM_CPU_MODEL,
+                             KVM_S390_VM_CPU_PROCESSOR_UV_FEAT_GUEST);
+}
+
+static int query_uv_feat_guest(S390FeatBitmap features)
+{
+    struct kvm_s390_vm_cpu_uv_feat prop = {};
+    struct kvm_device_attr attr = {
+        .group = KVM_S390_VM_CPU_MODEL,
+        .attr = KVM_S390_VM_CPU_MACHINE_UV_FEAT_GUEST,
+        .addr = (uint64_t) &prop,
+    };
+    int rc;
+
+    /* AP support check is currently the only user of the UV feature test */
+    if (!(uv_feat_supported() && ap_available())) {
+        return 0;
+    }
+
+    rc = kvm_vm_ioctl(kvm_state, KVM_GET_DEVICE_ATTR, &attr);
+    if (rc) {
+        return  rc;
+    }
+
+    if (prop.ap) {
+        set_bit(S390_FEAT_UV_FEAT_AP, features);
+    }
+    if (prop.ap_intr) {
+        set_bit(S390_FEAT_UV_FEAT_AP_INTR, features);
+    }
+
+    return 0;
+}
+
 static int kvm_to_feat[][2] = {
     { KVM_S390_VM_CPU_FEAT_ESOP, S390_FEAT_ESOP },
     { KVM_S390_VM_CPU_FEAT_SIEF2, S390_FEAT_SIE_F2 },
@@ -2502,11 +2538,38 @@ void kvm_s390_get_host_cpu_model(S390CPUModel *model, Error **errp)
         set_bit(S390_FEAT_DIAG_318, model->features);
     }
 
+    /* Test for Ultravisor features that influence secure guest behavior */
+    query_uv_feat_guest(model->features);
+
     /* strip of features that are not part of the maximum model */
     bitmap_and(model->features, model->features, model->def->full_feat,
                S390_FEAT_MAX);
 }
 
+static int configure_uv_feat_guest(const S390FeatBitmap features)
+{
+    struct kvm_s390_vm_cpu_uv_feat uv_feat = {};
+    struct kvm_device_attr attribute = {
+        .group = KVM_S390_VM_CPU_MODEL,
+        .attr = KVM_S390_VM_CPU_PROCESSOR_UV_FEAT_GUEST,
+        .addr = (__u64) &uv_feat,
+    };
+
+    /* AP support check is currently the only user of the UV feature test */
+    if (!(uv_feat_supported() && ap_enabled(features))) {
+        return 0;
+    }
+
+    if (test_bit(S390_FEAT_UV_FEAT_AP, features)) {
+        uv_feat.ap = 1;
+    }
+    if (test_bit(S390_FEAT_UV_FEAT_AP_INTR, features)) {
+        uv_feat.ap_intr = 1;
+    }
+
+    return kvm_vm_ioctl(kvm_state, KVM_SET_DEVICE_ATTR, &attribute);
+}
+
 static void kvm_s390_configure_apie(bool interpret)
 {
     uint64_t attr = interpret ? KVM_S390_VM_CRYPTO_ENABLE_APIE :
@@ -2578,6 +2641,13 @@ void kvm_s390_apply_cpu_model(const S390CPUModel *model, Error **errp)
     if (ap_enabled(model->features)) {
         kvm_s390_configure_apie(true);
     }
+
+    /* configure UV-features for the guest indicated via query / test_bit */
+    rc = configure_uv_feat_guest(model->features);
+    if (rc) {
+        error_setg(errp, "KVM: Error configuring CPU UV features %d", rc);
+        return;
+    }
 }
 
 void kvm_s390_restart_interrupt(S390CPU *cpu)
-- 
Gitee


From 23077cc4965024afb10638c6e06931eea94fd199 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 17 Nov 2023 11:32:37 +0100
Subject: [PATCH 311/407] target/s390x/dump: Remove unneeded dump info function
 pointer init
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 323: Fix problem that secure execution guest might remain in "paused" state after failed dump
RH-Jira: RHEL-16696
RH-Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Commit: [1/3] e3b0697ec76274f778fc523efb72f0cbca25cd77

JIRA: https://issues.redhat.com/browse/RHEL-16696

commit 816644b1219900875f47d7adf9bfb283f1b29aa0
Author: Janosch Frank <frankja@linux.ibm.com>
Date:   Thu Nov 9 12:04:41 2023 +0000

    target/s390x/dump: Remove unneeded dump info function pointer init

    dump_state_prepare() now sets the function pointers to NULL so we only
    need to touch them if we're going to use them.

    Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
    Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Message-ID: <20231109120443.185979-2-frankja@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 target/s390x/arch_dump.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/target/s390x/arch_dump.c b/target/s390x/arch_dump.c
index a7c44ba49d3..7cdd4b71670 100644
--- a/target/s390x/arch_dump.c
+++ b/target/s390x/arch_dump.c
@@ -454,10 +454,6 @@ int cpu_get_dump_info(ArchDumpInfo *info,
         info->arch_sections_add_fn = *arch_sections_add;
         info->arch_sections_write_hdr_fn = *arch_sections_write_hdr;
         info->arch_sections_write_fn = *arch_sections_write;
-    } else {
-        info->arch_sections_add_fn = NULL;
-        info->arch_sections_write_hdr_fn = NULL;
-        info->arch_sections_write_fn = NULL;
     }
     return 0;
 }
-- 
Gitee


From 3145dceedfd15ed5b8f56fa9216bee07f8d78e30 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 17 Nov 2023 11:32:37 +0100
Subject: [PATCH 312/407] dump: Add arch cleanup function
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 323: Fix problem that secure execution guest might remain in "paused" state after failed dump
RH-Jira: RHEL-16696
RH-Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Commit: [2/3] b70f406dec88ffd4877f3d5d580fc8f821bdb252

JIRA: https://issues.redhat.com/browse/RHEL-16696

commit e72629e5149aba6f44122ea6d2a803ef136a0c6b
Author: Janosch Frank <frankja@linux.ibm.com>
Date:   Thu Nov 9 12:04:42 2023 +0000

    dump: Add arch cleanup function

    Some architectures (s390x) need to clean up after a failed dump to be
    able to continue to run the vm. Add a cleanup function pointer and
    call it if it's set.

    Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Message-ID: <20231109120443.185979-3-frankja@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>
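
The optional-hook pattern from the upstream message can be sketched as
below. The struct layouts are reduced to the fields needed here, and the
counters and demo_arch_cleanup() are hypothetical additions that exist
only to make the call order observable:

```c
#include <assert.h>
#include <stddef.h>

typedef struct DumpState DumpState;

/* Reduced to the single hook this sketch is about. */
typedef struct {
    void (*arch_cleanup_fn)(DumpState *s);
} ArchDumpInfo;

struct DumpState {
    ArchDumpInfo dump_info;
    int arch_cleanups;     /* hypothetical counter: hook invocations */
    int generic_cleanups;  /* hypothetical counter: common teardown */
};

/* Hypothetical architecture hook. */
static void demo_arch_cleanup(DumpState *s)
{
    s->arch_cleanups++;
}

/* Run the architecture hook first, if one is set, then common teardown. */
static int dump_cleanup(DumpState *s)
{
    assert(s != NULL);
    if (s->dump_info.arch_cleanup_fn) {
        s->dump_info.arch_cleanup_fn(s);
    }
    s->generic_cleanups++;
    return 0;
}
```

Leaving the pointer NULL keeps the hook opt-in, so architectures that do
not need extra teardown are unaffected.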

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 dump/dump.c                | 4 ++++
 include/sysemu/dump-arch.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/dump/dump.c b/dump/dump.c
index 5dee060b730..93edb895475 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -100,6 +100,10 @@ uint64_t cpu_to_dump64(DumpState *s, uint64_t val)
 
 static int dump_cleanup(DumpState *s)
 {
+    if (s->dump_info.arch_cleanup_fn) {
+        s->dump_info.arch_cleanup_fn(s);
+    }
+
     guest_phys_blocks_free(&s->guest_phys_blocks);
     memory_mapping_list_free(&s->list);
     close(s->fd);
diff --git a/include/sysemu/dump-arch.h b/include/sysemu/dump-arch.h
index 59bbc9be38c..743916e46ca 100644
--- a/include/sysemu/dump-arch.h
+++ b/include/sysemu/dump-arch.h
@@ -24,6 +24,7 @@ typedef struct ArchDumpInfo {
     void (*arch_sections_add_fn)(DumpState *s);
     uint64_t (*arch_sections_write_hdr_fn)(DumpState *s, uint8_t *buff);
     int (*arch_sections_write_fn)(DumpState *s, uint8_t *buff);
+    void (*arch_cleanup_fn)(DumpState *s);
 } ArchDumpInfo;
 
 struct GuestPhysBlockList; /* memory_mapping.h */
-- 
Gitee


From 479ef815102a2e0413bb4f37929b8b3008dcdcc0 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Fri, 17 Nov 2023 11:32:37 +0100
Subject: [PATCH 313/407] target/s390x/arch_dump: Add arch cleanup function for
 PV dumps
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 323: Fix problem that secure execution guest might remain in "paused" state after failed dump
RH-Jira: RHEL-16696
RH-Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Commit: [3/3] 0bb389c9339b95f7ff6dc284526b0c8d5ef736b4

JIRA: https://issues.redhat.com/browse/RHEL-16696

commit d12a91e0baafce7b1cbacff7cf9339eeb0011732
Author: Janosch Frank <frankja@linux.ibm.com>
Date:   Thu Nov 9 12:04:43 2023 +0000

    target/s390x/arch_dump: Add arch cleanup function for PV dumps

    PV dumps block vcpu runs until dump end is reached. If there's an
    error between PV dump init and PV dump end the vm will never be able
    to run again. One example of such an error is insufficient disk space
    for the dump file.

    Let's add a cleanup function that tries to do a dump end. The dump
    completion data is discarded but there's no point in writing it to a
    file anyway if there's a possibility that other PV dump data is
    missing.

    Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Message-ID: <20231109120443.185979-4-frankja@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>
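
A minimal sketch of the cleanup logic described above, assuming stubbed
helpers in place of the KVM calls. fake_completion_size(),
fake_dump_end(), fake_end_rc and end_calls are hypothetical; only the
flag handling mirrors the description: do nothing unless a PV dump was
started, and clear the flag only if the "dump end" step succeeds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static bool pv_dump_initialized;
static int fake_end_rc;   /* hypothetical: result the stub returns */
static int end_calls;     /* hypothetical: counts "dump end" attempts */

/* Stub for querying the completion data size. */
static size_t fake_completion_size(void)
{
    return 64;
}

/* Stub for the "dump end" request; the data it fills is discarded. */
static int fake_dump_end(uint8_t *buff)
{
    (void)buff;
    end_calls++;
    return fake_end_rc;
}

static void arch_cleanup_sketch(void)
{
    uint8_t *buff;

    if (!pv_dump_initialized) {
        return;                       /* no PV dump in flight */
    }

    buff = malloc(fake_completion_size());
    if (!buff) {
        return;
    }
    if (fake_dump_end(buff) == 0) {
        pv_dump_initialized = false;  /* guest may run again */
    }
    free(buff);
}
```

If the stubbed "dump end" fails, the flag stays set, so a later cleanup
attempt retries it instead of silently assuming success.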

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 target/s390x/arch_dump.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/target/s390x/arch_dump.c b/target/s390x/arch_dump.c
index 7cdd4b71670..3b1f178dc30 100644
--- a/target/s390x/arch_dump.c
+++ b/target/s390x/arch_dump.c
@@ -439,6 +439,22 @@ static int arch_sections_write(DumpState *s, uint8_t *buff)
     return 0;
 }
 
+static void arch_cleanup(DumpState *s)
+{
+    g_autofree uint8_t *buff = NULL;
+    int rc;
+
+    if (!pv_dump_initialized) {
+        return;
+    }
+
+    buff = g_malloc(kvm_s390_pv_dmp_get_size_completion_data());
+    rc = kvm_s390_dump_completion_data(buff);
+    if (!rc) {
+            pv_dump_initialized = false;
+    }
+}
+
 int cpu_get_dump_info(ArchDumpInfo *info,
                       const struct GuestPhysBlockList *guest_phys_blocks)
 {
@@ -454,6 +470,7 @@ int cpu_get_dump_info(ArchDumpInfo *info,
         info->arch_sections_add_fn = *arch_sections_add;
         info->arch_sections_write_hdr_fn = *arch_sections_write_hdr;
         info->arch_sections_write_fn = *arch_sections_write;
+        info->arch_cleanup_fn = *arch_cleanup;
     }
     return 0;
 }
-- 
Gitee


From 11abdd5627c69862678bfb4f703b09103cb286db Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Thu, 23 Nov 2023 11:30:46 -0500
Subject: [PATCH 314/407] net: Provide MemReentrancyGuard * to qemu_new_nic()

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 331: net: Provide MemReentrancyGuard * to qemu_new_nic()
RH-Jira: RHEL-7309
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>
RH-Acked-by: Jason Wang <jasowang@redhat.com>
RH-Commit: [1/2] bc963fb349b90288f547de97a5cbe9a74f856419 (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Jira: https://issues.redhat.com/browse/RHEL-7309
CVE: CVE-2023-3019
Upstream: Merged
Conflicts: hw/net/xen_nic.c seems to have undergone significant changes upstream,
           so the change had to be manually adapted to the old code.

commit 7d0fefdf81f5973334c344f6b8e1896c309dff66
Author: Akihiko Odaki <akihiko.odaki@daynix.com>
Date:   Thu Jun 1 12:18:58 2023 +0900

    net: Provide MemReentrancyGuard * to qemu_new_nic()

    Recently MemReentrancyGuard was added to DeviceState to record that the
    device is engaging in I/O. The network device backend needs to update it
    when delivering a packet to a device.

    In preparation for such a change, add MemReentrancyGuard * as a
    parameter of qemu_new_nic().

    Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
    Reviewed-by: Alexander Bulekov <alxndr@bu.edu>
    Signed-off-by: Jason Wang <jasowang@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/arm/musicpal.c             | 3 ++-
 hw/net/allwinner-sun8i-emac.c | 3 ++-
 hw/net/allwinner_emac.c       | 3 ++-
 hw/net/cadence_gem.c          | 3 ++-
 hw/net/dp8393x.c              | 3 ++-
 hw/net/e1000.c                | 3 ++-
 hw/net/e1000e.c               | 2 +-
 hw/net/eepro100.c             | 4 +++-
 hw/net/etraxfs_eth.c          | 3 ++-
 hw/net/fsl_etsec/etsec.c      | 3 ++-
 hw/net/ftgmac100.c            | 3 ++-
 hw/net/i82596.c               | 2 +-
 hw/net/imx_fec.c              | 2 +-
 hw/net/lan9118.c              | 3 ++-
 hw/net/mcf_fec.c              | 3 ++-
 hw/net/mipsnet.c              | 3 ++-
 hw/net/msf2-emac.c            | 3 ++-
 hw/net/ne2000-isa.c           | 3 ++-
 hw/net/ne2000-pci.c           | 3 ++-
 hw/net/npcm7xx_emc.c          | 3 ++-
 hw/net/opencores_eth.c        | 3 ++-
 hw/net/pcnet.c                | 3 ++-
 hw/net/rocker/rocker_fp.c     | 4 ++--
 hw/net/rtl8139.c              | 3 ++-
 hw/net/smc91c111.c            | 3 ++-
 hw/net/spapr_llan.c           | 3 ++-
 hw/net/stellaris_enet.c       | 3 ++-
 hw/net/sungem.c               | 2 +-
 hw/net/sunhme.c               | 3 ++-
 hw/net/tulip.c                | 3 ++-
 hw/net/virtio-net.c           | 6 ++++--
 hw/net/vmxnet3.c              | 2 +-
 hw/net/xen_nic.c              | 3 ++-
 hw/net/xgmac.c                | 3 ++-
 hw/net/xilinx_axienet.c       | 3 ++-
 hw/net/xilinx_ethlite.c       | 3 ++-
 hw/usb/dev-network.c          | 3 ++-
 include/net/net.h             | 1 +
 net/net.c                     | 1 +
 39 files changed, 74 insertions(+), 39 deletions(-)

diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
index 2d612cc0c9b..eebf7622176 100644
--- a/hw/arm/musicpal.c
+++ b/hw/arm/musicpal.c
@@ -417,7 +417,8 @@ static void mv88w8618_eth_realize(DeviceState *dev, Error **errp)
 
     address_space_init(&s->dma_as, s->dma_mr, "emac-dma");
     s->nic = qemu_new_nic(&net_mv88w8618_info, &s->conf,
-                          object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
 }
 
 static const VMStateDescription mv88w8618_eth_vmsd = {
diff --git a/hw/net/allwinner-sun8i-emac.c b/hw/net/allwinner-sun8i-emac.c
index ff611f18fbd..9d0885ee154 100644
--- a/hw/net/allwinner-sun8i-emac.c
+++ b/hw/net/allwinner-sun8i-emac.c
@@ -810,7 +810,8 @@ static void allwinner_sun8i_emac_realize(DeviceState *dev, Error **errp)
 
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_allwinner_sun8i_emac_info, &s->conf,
-                           object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 }
 
diff --git a/hw/net/allwinner_emac.c b/hw/net/allwinner_emac.c
index ddddf35c45d..b3d73143bf1 100644
--- a/hw/net/allwinner_emac.c
+++ b/hw/net/allwinner_emac.c
@@ -453,7 +453,8 @@ static void aw_emac_realize(DeviceState *dev, Error **errp)
 
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_aw_emac_info, &s->conf,
-                          object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
     fifo8_create(&s->rx_fifo, RX_FIFO_SIZE);
diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
index 24b3a0ff667..cb61a764171 100644
--- a/hw/net/cadence_gem.c
+++ b/hw/net/cadence_gem.c
@@ -1633,7 +1633,8 @@ static void gem_realize(DeviceState *dev, Error **errp)
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
 
     s->nic = qemu_new_nic(&net_gem_info, &s->conf,
-                          object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
 
     if (s->jumbo_max_len > MAX_FRAME_SIZE) {
         error_setg(errp, "jumbo-max-len is greater than %d",
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index 45b954e46c2..abfcc6f69fe 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -943,7 +943,8 @@ static void dp8393x_realize(DeviceState *dev, Error **errp)
                           "dp8393x-regs", SONIC_REG_COUNT << s->it_shift);
 
     s->nic = qemu_new_nic(&net_dp83932_info, &s->conf,
-                          object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
     s->watchdog = timer_new_ns(QEMU_CLOCK_VIRTUAL, dp8393x_watchdog, s);
diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 282d01e3748..86da1ae39ef 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -1733,7 +1733,8 @@ static void pci_e1000_realize(PCIDevice *pci_dev, Error **errp)
                                macaddr);
 
     d->nic = qemu_new_nic(&net_e1000_info, &d->conf,
-                          object_get_typename(OBJECT(d)), dev->id, d);
+                          object_get_typename(OBJECT(d)), dev->id,
+                          &dev->mem_reentrancy_guard, d);
 
     qemu_format_nic_info_str(qemu_get_queue(d->nic), macaddr);
 
diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index d35bc1f0b09..c6096fa8484 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -340,7 +340,7 @@ e1000e_init_net_peer(E1000EState *s, PCIDevice *pci_dev, uint8_t *macaddr)
     int i;
 
     s->nic = qemu_new_nic(&net_e1000e_info, &s->conf,
-        object_get_typename(OBJECT(s)), dev->id, s);
+        object_get_typename(OBJECT(s)), dev->id, &dev->mem_reentrancy_guard, s);
 
     s->core.max_queue_num = s->conf.peers.queues ? s->conf.peers.queues - 1 : 0;
 
diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index 16e95ef9cc9..16ca4dda04b 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -1865,7 +1865,9 @@ static void e100_nic_realize(PCIDevice *pci_dev, Error **errp)
     nic_reset(s);
 
     s->nic = qemu_new_nic(&net_eepro100_info, &s->conf,
-                          object_get_typename(OBJECT(pci_dev)), pci_dev->qdev.id, s);
+                          object_get_typename(OBJECT(pci_dev)),
+                          pci_dev->qdev.id,
+                          &pci_dev->qdev.mem_reentrancy_guard, s);
 
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
     TRACE(OTHER, logout("%s\n", qemu_get_queue(s->nic)->info_str));
diff --git a/hw/net/etraxfs_eth.c b/hw/net/etraxfs_eth.c
index 1b82aec7943..ba57a978d15 100644
--- a/hw/net/etraxfs_eth.c
+++ b/hw/net/etraxfs_eth.c
@@ -618,7 +618,8 @@ static void etraxfs_eth_realize(DeviceState *dev, Error **errp)
 
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_etraxfs_info, &s->conf,
-                          object_get_typename(OBJECT(s)), dev->id, s);
+                          object_get_typename(OBJECT(s)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
     s->phy.read = tdk_read;
diff --git a/hw/net/fsl_etsec/etsec.c b/hw/net/fsl_etsec/etsec.c
index bd9d62b5593..f790613b525 100644
--- a/hw/net/fsl_etsec/etsec.c
+++ b/hw/net/fsl_etsec/etsec.c
@@ -391,7 +391,8 @@ static void etsec_realize(DeviceState *dev, Error **errp)
     eTSEC        *etsec = ETSEC_COMMON(dev);
 
     etsec->nic = qemu_new_nic(&net_etsec_info, &etsec->conf,
-                              object_get_typename(OBJECT(dev)), dev->id, etsec);
+                              object_get_typename(OBJECT(dev)), dev->id,
+                              &dev->mem_reentrancy_guard, etsec);
     qemu_format_nic_info_str(qemu_get_queue(etsec->nic), etsec->conf.macaddr.a);
 
     etsec->ptimer = ptimer_init(etsec_timer_hit, etsec, PTIMER_POLICY_DEFAULT);
diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
index 25685ba3a95..781e7f352e1 100644
--- a/hw/net/ftgmac100.c
+++ b/hw/net/ftgmac100.c
@@ -1111,7 +1111,8 @@ static void ftgmac100_realize(DeviceState *dev, Error **errp)
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
 
     s->nic = qemu_new_nic(&net_ftgmac100_info, &s->conf,
-                          object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 }
 
diff --git a/hw/net/i82596.c b/hw/net/i82596.c
index ec21e2699a1..dc64246f754 100644
--- a/hw/net/i82596.c
+++ b/hw/net/i82596.c
@@ -743,7 +743,7 @@ void i82596_common_init(DeviceState *dev, I82596State *s, NetClientInfo *info)
         qemu_macaddr_default_if_unset(&s->conf.macaddr);
     }
     s->nic = qemu_new_nic(info, &s->conf, object_get_typename(OBJECT(dev)),
-                dev->id, s);
+                dev->id, &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
     if (USE_TIMER) {
diff --git a/hw/net/imx_fec.c b/hw/net/imx_fec.c
index 9c7035bc948..ed19ee93501 100644
--- a/hw/net/imx_fec.c
+++ b/hw/net/imx_fec.c
@@ -1310,7 +1310,7 @@ static void imx_eth_realize(DeviceState *dev, Error **errp)
 
     s->nic = qemu_new_nic(&imx_eth_net_info, &s->conf,
                           object_get_typename(OBJECT(dev)),
-                          dev->id, s);
+                          dev->id, &dev->mem_reentrancy_guard, s);
 
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 }
diff --git a/hw/net/lan9118.c b/hw/net/lan9118.c
index 6aff424cbe5..942bce9ae67 100644
--- a/hw/net/lan9118.c
+++ b/hw/net/lan9118.c
@@ -1354,7 +1354,8 @@ static void lan9118_realize(DeviceState *dev, Error **errp)
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
 
     s->nic = qemu_new_nic(&net_lan9118_info, &s->conf,
-                          object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
     s->eeprom[0] = 0xa5;
     for (i = 0; i < 6; i++) {
diff --git a/hw/net/mcf_fec.c b/hw/net/mcf_fec.c
index 25e3e453ab1..a6be7bf4130 100644
--- a/hw/net/mcf_fec.c
+++ b/hw/net/mcf_fec.c
@@ -643,7 +643,8 @@ static void mcf_fec_realize(DeviceState *dev, Error **errp)
     mcf_fec_state *s = MCF_FEC_NET(dev);
 
     s->nic = qemu_new_nic(&net_mcf_fec_info, &s->conf,
-                          object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 }
 
diff --git a/hw/net/mipsnet.c b/hw/net/mipsnet.c
index 2ade72dea08..8e925de867c 100644
--- a/hw/net/mipsnet.c
+++ b/hw/net/mipsnet.c
@@ -255,7 +255,8 @@ static void mipsnet_realize(DeviceState *dev, Error **errp)
     sysbus_init_irq(sbd, &s->irq);
 
     s->nic = qemu_new_nic(&net_mipsnet_info, &s->conf,
-                          object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 }
 
diff --git a/hw/net/msf2-emac.c b/hw/net/msf2-emac.c
index 9278fdce0b3..1efa3dbf012 100644
--- a/hw/net/msf2-emac.c
+++ b/hw/net/msf2-emac.c
@@ -527,7 +527,8 @@ static void msf2_emac_realize(DeviceState *dev, Error **errp)
 
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_msf2_emac_info, &s->conf,
-                          object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 }
 
diff --git a/hw/net/ne2000-isa.c b/hw/net/ne2000-isa.c
index dd6f6e34d3c..30bd20c2939 100644
--- a/hw/net/ne2000-isa.c
+++ b/hw/net/ne2000-isa.c
@@ -74,7 +74,8 @@ static void isa_ne2000_realizefn(DeviceState *dev, Error **errp)
     ne2000_reset(s);
 
     s->nic = qemu_new_nic(&net_ne2000_isa_info, &s->c,
-                          object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->c.macaddr.a);
 }
 
diff --git a/hw/net/ne2000-pci.c b/hw/net/ne2000-pci.c
index 9e5d10859ac..4f8a6990813 100644
--- a/hw/net/ne2000-pci.c
+++ b/hw/net/ne2000-pci.c
@@ -71,7 +71,8 @@ static void pci_ne2000_realize(PCIDevice *pci_dev, Error **errp)
 
     s->nic = qemu_new_nic(&net_ne2000_info, &s->c,
                           object_get_typename(OBJECT(pci_dev)),
-                          pci_dev->qdev.id, s);
+                          pci_dev->qdev.id,
+                          &pci_dev->qdev.mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->c.macaddr.a);
 }
 
diff --git a/hw/net/npcm7xx_emc.c b/hw/net/npcm7xx_emc.c
index 7c892f820fb..dd1d0ad3bc6 100644
--- a/hw/net/npcm7xx_emc.c
+++ b/hw/net/npcm7xx_emc.c
@@ -802,7 +802,8 @@ static void npcm7xx_emc_realize(DeviceState *dev, Error **errp)
 
     qemu_macaddr_default_if_unset(&emc->conf.macaddr);
     emc->nic = qemu_new_nic(&net_npcm7xx_emc_info, &emc->conf,
-                            object_get_typename(OBJECT(dev)), dev->id, emc);
+                            object_get_typename(OBJECT(dev)), dev->id,
+                            &dev->mem_reentrancy_guard, emc);
     qemu_format_nic_info_str(qemu_get_queue(emc->nic), emc->conf.macaddr.a);
 }
 
diff --git a/hw/net/opencores_eth.c b/hw/net/opencores_eth.c
index 0b3dc3146e6..f96d6ea2ccf 100644
--- a/hw/net/opencores_eth.c
+++ b/hw/net/opencores_eth.c
@@ -732,7 +732,8 @@ static void sysbus_open_eth_realize(DeviceState *dev, Error **errp)
     sysbus_init_irq(sbd, &s->irq);
 
     s->nic = qemu_new_nic(&net_open_eth_info, &s->conf,
-                          object_get_typename(OBJECT(s)), dev->id, s);
+                          object_get_typename(OBJECT(s)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
 }
 
 static void qdev_open_eth_reset(DeviceState *dev)
diff --git a/hw/net/pcnet.c b/hw/net/pcnet.c
index dcd3fc49481..da910a70bf1 100644
--- a/hw/net/pcnet.c
+++ b/hw/net/pcnet.c
@@ -1718,7 +1718,8 @@ void pcnet_common_init(DeviceState *dev, PCNetState *s, NetClientInfo *info)
     s->poll_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, pcnet_poll_timer, s);
 
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
-    s->nic = qemu_new_nic(info, &s->conf, object_get_typename(OBJECT(dev)), dev->id, s);
+    s->nic = qemu_new_nic(info, &s->conf, object_get_typename(OBJECT(dev)),
+                          dev->id, &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
     /* Initialize the PROM */
diff --git a/hw/net/rocker/rocker_fp.c b/hw/net/rocker/rocker_fp.c
index cbeed65bd5e..0d21948adab 100644
--- a/hw/net/rocker/rocker_fp.c
+++ b/hw/net/rocker/rocker_fp.c
@@ -241,8 +241,8 @@ FpPort *fp_port_alloc(Rocker *r, char *sw_name,
     port->conf.bootindex = -1;
     port->conf.peers = *peers;
 
-    port->nic = qemu_new_nic(&fp_port_info, &port->conf,
-                             sw_name, NULL, port);
+    port->nic = qemu_new_nic(&fp_port_info, &port->conf, sw_name, NULL,
+                             &DEVICE(r)->mem_reentrancy_guard, port);
     qemu_format_nic_info_str(qemu_get_queue(port->nic),
                              port->conf.macaddr.a);
 
diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 3ffb9dd22c2..a3565c7159d 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -3400,7 +3400,8 @@ static void pci_rtl8139_realize(PCIDevice *dev, Error **errp)
     s->eeprom.contents[9] = s->conf.macaddr.a[4] | s->conf.macaddr.a[5] << 8;
 
     s->nic = qemu_new_nic(&net_rtl8139_info, &s->conf,
-                          object_get_typename(OBJECT(dev)), d->id, s);
+                          object_get_typename(OBJECT(dev)), d->id,
+                          &d->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
     s->cplus_txbuffer = NULL;
diff --git a/hw/net/smc91c111.c b/hw/net/smc91c111.c
index ad778cd8fc7..4eda971ef3e 100644
--- a/hw/net/smc91c111.c
+++ b/hw/net/smc91c111.c
@@ -783,7 +783,8 @@ static void smc91c111_realize(DeviceState *dev, Error **errp)
     sysbus_init_irq(sbd, &s->irq);
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_smc91c111_info, &s->conf,
-                          object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
     /* ??? Save/restore.  */
 }
diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c
index a6876a936db..475d5f3a348 100644
--- a/hw/net/spapr_llan.c
+++ b/hw/net/spapr_llan.c
@@ -325,7 +325,8 @@ static void spapr_vlan_realize(SpaprVioDevice *sdev, Error **errp)
     memcpy(&dev->perm_mac.a, &dev->nicconf.macaddr.a, sizeof(dev->perm_mac.a));
 
     dev->nic = qemu_new_nic(&net_spapr_vlan_info, &dev->nicconf,
-                            object_get_typename(OBJECT(sdev)), sdev->qdev.id, dev);
+                            object_get_typename(OBJECT(sdev)), sdev->qdev.id,
+                            &sdev->qdev.mem_reentrancy_guard, dev);
     qemu_format_nic_info_str(qemu_get_queue(dev->nic), dev->nicconf.macaddr.a);
 
     dev->rxp_timer = timer_new_us(QEMU_CLOCK_VIRTUAL, spapr_vlan_flush_rx_queue,
diff --git a/hw/net/stellaris_enet.c b/hw/net/stellaris_enet.c
index 8dd60783d81..6768a6912f0 100644
--- a/hw/net/stellaris_enet.c
+++ b/hw/net/stellaris_enet.c
@@ -492,7 +492,8 @@ static void stellaris_enet_realize(DeviceState *dev, Error **errp)
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
 
     s->nic = qemu_new_nic(&net_stellaris_enet_info, &s->conf,
-                          object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 }
 
diff --git a/hw/net/sungem.c b/hw/net/sungem.c
index 3684a4d733b..c12d44e9dc9 100644
--- a/hw/net/sungem.c
+++ b/hw/net/sungem.c
@@ -1361,7 +1361,7 @@ static void sungem_realize(PCIDevice *pci_dev, Error **errp)
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_sungem_info, &s->conf,
                           object_get_typename(OBJECT(dev)),
-                          dev->id, s);
+                          dev->id, &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic),
                              s->conf.macaddr.a);
 }
diff --git a/hw/net/sunhme.c b/hw/net/sunhme.c
index fc34905f875..fa98528d713 100644
--- a/hw/net/sunhme.c
+++ b/hw/net/sunhme.c
@@ -892,7 +892,8 @@ static void sunhme_realize(PCIDevice *pci_dev, Error **errp)
 
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_sunhme_info, &s->conf,
-                          object_get_typename(OBJECT(d)), d->id, s);
+                          object_get_typename(OBJECT(d)), d->id,
+                          &d->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 }
 
diff --git a/hw/net/tulip.c b/hw/net/tulip.c
index ca69f7ea5e1..985c4c14a4f 100644
--- a/hw/net/tulip.c
+++ b/hw/net/tulip.c
@@ -981,7 +981,8 @@ static void pci_tulip_realize(PCIDevice *pci_dev, Error **errp)
 
     s->nic = qemu_new_nic(&net_tulip_info, &s->c,
                           object_get_typename(OBJECT(pci_dev)),
-                          pci_dev->qdev.id, s);
+                          pci_dev->qdev.id,
+                          &pci_dev->qdev.mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->c.macaddr.a);
 }
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ddaa8fa1224..f5f07f8e639 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3512,10 +3512,12 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
          * Happen when virtio_net_set_netclient_name has been called.
          */
         n->nic = qemu_new_nic(&net_virtio_info, &n->nic_conf,
-                              n->netclient_type, n->netclient_name, n);
+                              n->netclient_type, n->netclient_name,
+                              &dev->mem_reentrancy_guard, n);
     } else {
         n->nic = qemu_new_nic(&net_virtio_info, &n->nic_conf,
-                              object_get_typename(OBJECT(dev)), dev->id, n);
+                              object_get_typename(OBJECT(dev)), dev->id,
+                              &dev->mem_reentrancy_guard, n);
     }
 
     for (i = 0; i < n->max_queue_pairs; i++) {
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index f65af4e9ef2..d4df039c559 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2078,7 +2078,7 @@ static void vmxnet3_net_init(VMXNET3State *s)
 
     s->nic = qemu_new_nic(&net_vmxnet3_info, &s->conf,
                           object_get_typename(OBJECT(s)),
-                          d->id, s);
+                          d->id, &d->mem_reentrancy_guard, s);
 
     s->peer_has_vhdr = vmxnet3_peer_has_vnet_hdr(s);
     s->tx_sop = true;
diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
index 5c815b4f0c5..3d0b7820d3b 100644
--- a/hw/net/xen_nic.c
+++ b/hw/net/xen_nic.c
@@ -294,7 +294,8 @@ static int net_init(struct XenLegacyDevice *xendev)
     }
 
     netdev->nic = qemu_new_nic(&net_xen_info, &netdev->conf,
-                               "xen", NULL, netdev);
+                               "xen", NULL,
+                               &xendev->qdev.mem_reentrancy_guard, netdev);
 
     snprintf(qemu_get_queue(netdev->nic)->info_str,
              sizeof(qemu_get_queue(netdev->nic)->info_str),
diff --git a/hw/net/xgmac.c b/hw/net/xgmac.c
index 0ab6ae91aa1..1f4f277d840 100644
--- a/hw/net/xgmac.c
+++ b/hw/net/xgmac.c
@@ -402,7 +402,8 @@ static void xgmac_enet_realize(DeviceState *dev, Error **errp)
 
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_xgmac_enet_info, &s->conf,
-                          object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
     s->regs[XGMAC_ADDR_HIGH(0)] = (s->conf.macaddr.a[5] << 8) |
diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c
index 990ff3a1c25..8a34243803d 100644
--- a/hw/net/xilinx_axienet.c
+++ b/hw/net/xilinx_axienet.c
@@ -968,7 +968,8 @@ static void xilinx_enet_realize(DeviceState *dev, Error **errp)
 
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_xilinx_enet_info, &s->conf,
-                          object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 
     tdk_init(&s->TEMAC.phy);
diff --git a/hw/net/xilinx_ethlite.c b/hw/net/xilinx_ethlite.c
index 6e09f7e422e..80cb869e22b 100644
--- a/hw/net/xilinx_ethlite.c
+++ b/hw/net/xilinx_ethlite.c
@@ -235,7 +235,8 @@ static void xilinx_ethlite_realize(DeviceState *dev, Error **errp)
 
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_xilinx_ethlite_info, &s->conf,
-                          object_get_typename(OBJECT(dev)), dev->id, s);
+                          object_get_typename(OBJECT(dev)), dev->id,
+                          &dev->mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 }
 
diff --git a/hw/usb/dev-network.c b/hw/usb/dev-network.c
index 6c49c16015e..ae447a8bc32 100644
--- a/hw/usb/dev-network.c
+++ b/hw/usb/dev-network.c
@@ -1362,7 +1362,8 @@ static void usb_net_realize(USBDevice *dev, Error **errp)
 
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
     s->nic = qemu_new_nic(&net_usbnet_info, &s->conf,
-                          object_get_typename(OBJECT(s)), s->dev.qdev.id, s);
+                          object_get_typename(OBJECT(s)), s->dev.qdev.id,
+                          &s->dev.qdev.mem_reentrancy_guard, s);
     qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
     snprintf(s->usbstring_mac, sizeof(s->usbstring_mac),
              "%02x%02x%02x%02x%02x%02x",
diff --git a/include/net/net.h b/include/net/net.h
index 523136c7acb..1457b6c0141 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -145,6 +145,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
                        NICConf *conf,
                        const char *model,
                        const char *name,
+                       MemReentrancyGuard *reentrancy_guard,
                        void *opaque);
 void qemu_del_nic(NICState *nic);
 NetClientState *qemu_get_subqueue(NICState *nic, int queue_index);
diff --git a/net/net.c b/net/net.c
index f0d14dbfc1f..669e194c4be 100644
--- a/net/net.c
+++ b/net/net.c
@@ -299,6 +299,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
                        NICConf *conf,
                        const char *model,
                        const char *name,
+                       MemReentrancyGuard *reentrancy_guard,
                        void *opaque)
 {
     NetClientState **peers = conf->peers.ncs;
-- 
Gitee


From 832eb4b400be18e727ae00eba93b769db2e62de3 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Thu, 23 Nov 2023 11:30:46 -0500
Subject: [PATCH 315/407] net: Update MemReentrancyGuard for NIC

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 331: net: Provide MemReentrancyGuard * to qemu_new_nic()
RH-Jira: RHEL-7309
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>
RH-Acked-by: Jason Wang <jasowang@redhat.com>
RH-Commit: [2/2] b116efe725dd838c2cab9bd2240112f3c6c46d6a (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

Jira: https://issues.redhat.com/browse/RHEL-7309
CVE: CVE-2023-3019
Upstream: Merged

commit 9050f976e447444ea6ee2ba12c9f77e4b0dc54bc
Author: Akihiko Odaki <akihiko.odaki@daynix.com>
Date:   Thu Jun 1 12:18:59 2023 +0900

    net: Update MemReentrancyGuard for NIC

    Recently MemReentrancyGuard was added to DeviceState to record that the
    device is engaging in I/O. The network device backend needs to update it
    when delivering a packet to a device.

    This implementation follows what the bottom half code does, but it does
    not add a tracepoint for the case where the network device backend starts
    delivering a packet to a device that is already engaged in I/O. The
    tracepoint is omitted because such reentrancy happens frequently for
    qemu_flush_queued_packets() and is insignificant.

    Fixes: CVE-2023-3019
    Reported-by: Alexander Bulekov <alxndr@bu.edu>
    Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
    Acked-by: Alexander Bulekov <alxndr@bu.edu>
    Signed-off-by: Jason Wang <jasowang@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 include/net/net.h |  1 +
 net/net.c         | 14 ++++++++++++++
 2 files changed, 15 insertions(+)
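
Note for reviewers: the ownership rule applied in qemu_deliver_packet_iov()
can be read in isolation as the sketch below. It is a simplified model, not
the QEMU code: SketchNic, sketch_deliver() and the two receive callbacks are
hypothetical names, and the NET_CLIENT_DRIVER_NIC type check from the real
function is left out. Only the outermost delivery sets the flag, and only
that delivery clears it again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the QEMU type; only the field used by this patch. */
typedef struct MemReentrancyGuard {
    bool engaged_in_io;
} MemReentrancyGuard;

/* Hypothetical, heavily reduced NIC state for the sketch. */
typedef struct SketchNic {
    MemReentrancyGuard *reentrancy_guard;
    int (*receive)(struct SketchNic *nic);
} SketchNic;

/* Deliver one packet while tracking who owns the engaged_in_io flag. */
static int sketch_deliver(SketchNic *nic)
{
    MemReentrancyGuard *owned = NULL;
    int ret;

    assert(nic != NULL && nic->reentrancy_guard != NULL);

    if (!nic->reentrancy_guard->engaged_in_io) {
        /* Outermost delivery: take ownership of the flag. */
        owned = nic->reentrancy_guard;
        owned->engaged_in_io = true;
    }

    ret = nic->receive(nic);

    if (owned != NULL) {
        /* Only the owner clears the flag. */
        owned->engaged_in_io = false;
    }
    return ret;
}

/* Innermost callback: reports whether the flag is set during delivery. */
static int sketch_receive_leaf(SketchNic *nic)
{
    return nic->reentrancy_guard->engaged_in_io ? 1 : -1;
}

/* Callback that re-enters delivery once, as a flush of queued packets might. */
static int sketch_receive_reentrant(SketchNic *nic)
{
    int inner;

    nic->receive = sketch_receive_leaf;
    inner = sketch_deliver(nic);

    /* The nested delivery must not have cleared the outer owner's flag. */
    return (inner == 1 && nic->reentrancy_guard->engaged_in_io) ? 1 : -1;
}
```

Keeping a local "owned" pointer, instead of clearing the flag
unconditionally, is what keeps the flag set for the outer delivery while a
nested one returns.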

diff --git a/include/net/net.h b/include/net/net.h
index 1457b6c0141..11d4564ea16 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -112,6 +112,7 @@ struct NetClientState {
 typedef struct NICState {
     NetClientState *ncs;
     NICConf *conf;
+    MemReentrancyGuard *reentrancy_guard;
     void *opaque;
     bool peer_deleted;
 } NICState;
diff --git a/net/net.c b/net/net.c
index 669e194c4be..b3008a52b72 100644
--- a/net/net.c
+++ b/net/net.c
@@ -312,6 +312,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
     nic = g_malloc0(info->size + sizeof(NetClientState) * queues);
     nic->ncs = (void *)nic + info->size;
     nic->conf = conf;
+    nic->reentrancy_guard = reentrancy_guard;
     nic->opaque = opaque;
 
     for (i = 0; i < queues; i++) {
@@ -767,6 +768,7 @@ static ssize_t qemu_deliver_packet_iov(NetClientState *sender,
                                        int iovcnt,
                                        void *opaque)
 {
+    MemReentrancyGuard *owned_reentrancy_guard;
     NetClientState *nc = opaque;
     int ret;
 
@@ -779,12 +781,24 @@ static ssize_t qemu_deliver_packet_iov(NetClientState *sender,
         return 0;
     }
 
+    if (nc->info->type != NET_CLIENT_DRIVER_NIC ||
+        qemu_get_nic(nc)->reentrancy_guard->engaged_in_io) {
+        owned_reentrancy_guard = NULL;
+    } else {
+        owned_reentrancy_guard = qemu_get_nic(nc)->reentrancy_guard;
+        owned_reentrancy_guard->engaged_in_io = true;
+    }
+
     if (nc->info->receive_iov && !(flags & QEMU_NET_PACKET_FLAG_RAW)) {
         ret = nc->info->receive_iov(nc, iov, iovcnt);
     } else {
         ret = nc_sendv_compat(nc, iov, iovcnt, flags);
     }
 
+    if (owned_reentrancy_guard) {
+        owned_reentrancy_guard->engaged_in_io = false;
+    }
+
     if (ret == 0) {
         nc->receive_disabled = 1;
     }
-- 
Gitee


From 9f5111713c87273c9d4aa5a65503a730ee47d70c Mon Sep 17 00:00:00 2001
From: Prasad Pandit <ppandit@redhat.com>
Date: Fri, 18 Aug 2023 16:38:12 +0530
Subject: [PATCH 316/407] vhost: release memory_listener object in error path

RH-Author: Prasad Pandit <None>
RH-MergeRequest: 337: vhost: release memory_listener object in error path
RH-Jira: RHEL-7567
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Commit: [1/1] 1e377a2f6f148e11a452d11107d839521354e2ca

Jira: https://issues.redhat.com/browse/RHEL-7567

commit 1e3ffb34f764f8ac4c003b2b2e6a775b2b073a16
Author: Prasad J Pandit <pjp@fedoraproject.org>
Date:   Mon May 29 17:13:32 2023 +0530

    vhost: release memory_listener object in error path

    The vhost_dev_start() function does not release the memory_listener
    object in case of an error. This may crash the guest when vhost is
    unable to set the memory table:

      stack trace of thread 125653:
      Program terminated with signal SIGSEGV, Segmentation fault
      #0  memory_listener_register (qemu-kvm + 0x6cda0f)
      #1  vhost_dev_start (qemu-kvm + 0x699301)
      #2  vhost_net_start (qemu-kvm + 0x45b03f)
      #3  virtio_net_set_status (qemu-kvm + 0x665672)
      #4  qmp_set_link (qemu-kvm + 0x548fd5)
      #5  net_vhost_user_event (qemu-kvm + 0x552c45)
      #6  tcp_chr_connect (qemu-kvm + 0x88d473)
      #7  tcp_chr_new_client (qemu-kvm + 0x88cf83)
      #8  tcp_chr_accept (qemu-kvm + 0x88b429)
      #9  qio_net_listener_channel_func (qemu-kvm + 0x7ac07c)
      #10 g_main_context_dispatch (libglib-2.0.so.0 + 0x54e2f)

    Release memory_listener objects in the error path.

    Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
    Message-Id: <20230529114333.31686-2-ppandit@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Peter Xu <peterx@redhat.com>
    Fixes: c471ad0e9b ("vhost_net: device IOTLB support")
    Cc: qemu-stable@nongnu.org
    Acked-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost.c | 3 +++
 1 file changed, 3 insertions(+)
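
Note for reviewers: the fix follows the usual goto-based unwinding pattern,
where each failure label undoes what was set up before the failing step. The
sketch below is a reduced model under stated assumptions: SketchDev and the
sketch_* helpers are hypothetical, the ordering (listener registered before
the memory table is set) is assumed rather than shown in this hunk, and the
fail_mem_table field only exists to force the error path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state; the real vhost_dev has many more fields. */
typedef struct SketchDev {
    bool has_iommu;
    bool listener_registered;
    bool started;
    bool fail_mem_table;   /* test knob: make the later step fail */
} SketchDev;

static void sketch_listener_register(SketchDev *d)
{
    d->listener_registered = true;
}

static void sketch_listener_unregister(SketchDev *d)
{
    d->listener_registered = false;
}

static int sketch_dev_start(SketchDev *d)
{
    d->started = true;

    if (d->has_iommu) {
        sketch_listener_register(d);
    }

    if (d->fail_mem_table) {
        goto fail_mem;
    }
    return 0;

fail_mem:
    /* Undo the registration made above before reporting the error. */
    if (d->has_iommu) {
        sketch_listener_unregister(d);
    }
    d->started = false;
    return -1;
}
```

With the unregister step in place, a failed start leaves no listener behind,
so a later start attempt begins from a clean state.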

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 437347ad01c..639029aa76c 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1818,6 +1818,9 @@ fail_vq:
     }
 
 fail_mem:
+    if (vhost_dev_has_iommu(hdev)) {
+        memory_listener_unregister(&hdev->iommu_listener);
+    }
 fail_features:
 
     hdev->started = false;
-- 
Gitee


From eb1f574920dbd04fd3f14036b1c0f4a4d2a13b10 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marc-Andr=C3=A9=20Lureau?= <marcandre.lureau@redhat.com>
Date: Mon, 11 Sep 2023 18:04:47 +0400
Subject: [PATCH 317/407] ui: fix crash when there are no active_console
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Marc-André Lureau <marcandre.lureau@redhat.com>
RH-MergeRequest: 338: ui: fix crash when there are no active_console
RH-Jira: RHEL-2600
RH-Acked-by: Gerd Hoffmann <None>
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
RH-Commit: [1/1] c58d1d76558dbc7ee2a8193a1e7a9b87a79ac385

JIRA: https://issues.redhat.com/browse/RHEL-2600

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555888630 in dpy_ui_info_supported (con=0x0) at ../ui/console.c:812
812	    return con->hw_ops->ui_info != NULL;
(gdb) bt
#0  0x0000555555888630 in dpy_ui_info_supported (con=0x0) at ../ui/console.c:812
#1  0x00005555558a44b1 in protocol_client_msg (vs=0x5555578c76c0, data=0x5555581e93f0 <incomplete sequence \373>, len=24) at ../ui/vnc.c:2585
#2  0x00005555558a19ac in vnc_client_read (vs=0x5555578c76c0) at ../ui/vnc.c:1607
#3  0x00005555558a1ac2 in vnc_client_io (ioc=0x5555581eb0e0, condition=G_IO_IN, opaque=0x5555578c76c0) at ../ui/vnc.c:1635

Fixes:
https://issues.redhat.com/browse/RHEL-2600

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Albert Esteve <aesteve@redhat.com>

(cherry picked from commit 48a35e12faf90a896c5aa4755812201e00d60316)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 ui/console.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/ui/console.c b/ui/console.c
index 29a3e3f0f51..df3426bd8a2 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -1525,6 +1525,9 @@ bool dpy_ui_info_supported(QemuConsole *con)
     if (con == NULL) {
         con = active_console;
     }
+    if (con == NULL) {
+        return false;
+    }
 
     return con->hw_ops->ui_info != NULL;
 }
-- 
Gitee


From eaa586e0e5c2b94c8a9cab30eeb32a4c933c0e25 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Fri, 24 Nov 2023 12:17:11 -0500
Subject: [PATCH 318/407] hw/ide: reset: cancel async DMA operation before
 resetting state
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 335: hw/ide: reset: cancel async DMA operation before resetting state
RH-Jira: RHEL-15437
RH-Acked-by: Hanna Czenczek <hreitz@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Commit: [1/2] b0f5f7f888559a210f1c6b3c545e337dbbc9cf22 (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

JIRA: https://issues.redhat.com/browse/RHEL-15437
CVE: CVE-2023-5088
Upstream: Merged

commit 7d7512019fc40c577e2bdd61f114f31a9eb84a8e
Author: Fiona Ebner <f.ebner@proxmox.com>
Date:   Wed Sep 6 15:09:21 2023 +0200

    hw/ide: reset: cancel async DMA operation before resetting state

    If there is a pending DMA operation during ide_bus_reset(), the fact
    that the IDEState is already reset before the operation is canceled
    can be problematic. In particular, ide_dma_cb() might be called and
    then use the reset IDEState which contains the signature after the
    reset. When used to construct the IO operation this leads to
    ide_get_sector() returning 0 and nsector being 1. This is particularly
    bad, because a write command will thus destroy the first sector which
    often contains a partition table or similar.

    Traces showing the unsolicited write happening with IDEState
    0x5595af6949d0 being used after reset:

    > ahci_port_write ahci(0x5595af6923f0)[0]: port write [reg:PxSCTL] @ 0x2c: 0x00000300
    > ahci_reset_port ahci(0x5595af6923f0)[0]: reset port
    > ide_reset IDEstate 0x5595af6949d0
    > ide_reset IDEstate 0x5595af694da8
    > ide_bus_reset_aio aio_cancel
    > dma_aio_cancel dbs=0x7f64600089a0
    > dma_blk_cb dbs=0x7f64600089a0 ret=0
    > dma_complete dbs=0x7f64600089a0 ret=0 cb=0x5595acd40b30
    > ahci_populate_sglist ahci(0x5595af6923f0)[0]
    > ahci_dma_prepare_buf ahci(0x5595af6923f0)[0]: prepare buf limit=512 prepared=512
    > ide_dma_cb IDEState 0x5595af6949d0; sector_num=0 n=1 cmd=DMA WRITE
    > dma_blk_io dbs=0x7f6420802010 bs=0x5595ae2c6c30 offset=0 to_dev=1
    > dma_blk_cb dbs=0x7f6420802010 ret=0

    > (gdb) p *qiov
    > $11 = {iov = 0x7f647c76d840, niov = 1, {{nalloc = 1, local_iov = {iov_base = 0x0,
    >       iov_len = 512}}, {__pad = "\001\000\000\000\000\000\000\000\000\000\000",
    >       size = 512}}}
    > (gdb) bt
    > #0  blk_aio_pwritev (blk=0x5595ae2c6c30, offset=0, qiov=0x7f6420802070, flags=0,
    >     cb=0x5595ace6f0b0 <dma_blk_cb>, opaque=0x7f6420802010)
    >     at ../block/block-backend.c:1682
    > #1  0x00005595ace6f185 in dma_blk_cb (opaque=0x7f6420802010, ret=<optimized out>)
    >     at ../softmmu/dma-helpers.c:179
    > #2  0x00005595ace6f778 in dma_blk_io (ctx=0x5595ae0609f0,
    >     sg=sg@entry=0x5595af694d00, offset=offset@entry=0, align=align@entry=512,
    >     io_func=io_func@entry=0x5595ace6ee30 <dma_blk_write_io_func>,
    >     io_func_opaque=io_func_opaque@entry=0x5595ae2c6c30,
    >     cb=0x5595acd40b30 <ide_dma_cb>, opaque=0x5595af6949d0,
    >     dir=DMA_DIRECTION_TO_DEVICE) at ../softmmu/dma-helpers.c:244
    > #3  0x00005595ace6f90a in dma_blk_write (blk=0x5595ae2c6c30,
    >     sg=sg@entry=0x5595af694d00, offset=offset@entry=0, align=align@entry=512,
    >     cb=cb@entry=0x5595acd40b30 <ide_dma_cb>, opaque=opaque@entry=0x5595af6949d0)
    >     at ../softmmu/dma-helpers.c:280
    > #4  0x00005595acd40e18 in ide_dma_cb (opaque=0x5595af6949d0, ret=<optimized out>)
    >     at ../hw/ide/core.c:953
    > #5  0x00005595ace6f319 in dma_complete (ret=0, dbs=0x7f64600089a0)
    >     at ../softmmu/dma-helpers.c:107
    > #6  dma_blk_cb (opaque=0x7f64600089a0, ret=0) at ../softmmu/dma-helpers.c:127
    > #7  0x00005595ad12227d in blk_aio_complete (acb=0x7f6460005b10)
    >     at ../block/block-backend.c:1527
    > #8  blk_aio_complete (acb=0x7f6460005b10) at ../block/block-backend.c:1524
    > #9  blk_aio_write_entry (opaque=0x7f6460005b10) at ../block/block-backend.c:1594
    > #10 0x00005595ad258cfb in coroutine_trampoline (i0=<optimized out>,
    >     i1=<optimized out>) at ../util/coroutine-ucontext.c:177

    Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Tested-by: simon.rowe@nutanix.com
    Message-ID: <20230906130922.142845-1-f.ebner@proxmox.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/ide/core.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
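
Note (illustration only, not part of the change): the ordering problem
described in the commit message can be sketched with a toy model. Every
name below (Dev, dev_reset_state, dev_cancel_pending, reset_buggy,
reset_fixed) is hypothetical and is not QEMU code. The sketch assumes,
as the traces above suggest, that cancelling a pending request runs its
completion callback, which issues a write based on whatever state it
finds at that moment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device state; this is not QEMU's IDEState. */
typedef struct Dev {
    uint64_t sector;      /* sector the next transfer chunk targets */
    uint32_t nsector;     /* sectors left in the current request */
    bool pending;         /* an async request is still in flight */
    uint64_t last_write;  /* sector touched by the most recent write */
} Dev;

/* Models the post-reset register state: sector 0, count 1. */
static void dev_reset_state(Dev *d)
{
    assert(d);
    d->sector = 0;
    d->nsector = 1;
}

/*
 * Models cancellation: the completion callback of the pending request
 * runs and writes to whatever sector the state currently names.
 */
static void dev_cancel_pending(Dev *d)
{
    assert(d);
    if (!d->pending) {
        return;
    }
    d->pending = false;
    d->last_write = d->sector;
}

/* Old order: the state is reset first, so the callback hits sector 0. */
static uint64_t reset_buggy(Dev *d)
{
    dev_reset_state(d);
    dev_cancel_pending(d);
    return d->last_write;
}

/* New order: the callback still sees the state the request was issued with. */
static uint64_t reset_fixed(Dev *d)
{
    dev_cancel_pending(d);
    dev_reset_state(d);
    return d->last_write;
}
```

In this model a request targeting sector 8 ends up writing sector 0
under reset_buggy() and sector 8 under reset_fixed(); both leave the
device in the reset state afterwards.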

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 05a32d0a99a..fd50c123e8c 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2456,19 +2456,19 @@ static void ide_dummy_transfer_stop(IDEState *s)
 
 void ide_bus_reset(IDEBus *bus)
 {
-    bus->unit = 0;
-    bus->cmd = 0;
-    ide_reset(&bus->ifs[0]);
-    ide_reset(&bus->ifs[1]);
-    ide_clear_hob(bus);
-
-    /* pending async DMA */
+    /* pending async DMA - needs the IDEState before it is reset */
     if (bus->dma->aiocb) {
         trace_ide_bus_reset_aio();
         blk_aio_cancel(bus->dma->aiocb);
         bus->dma->aiocb = NULL;
     }
 
+    bus->unit = 0;
+    bus->cmd = 0;
+    ide_reset(&bus->ifs[0]);
+    ide_reset(&bus->ifs[1]);
+    ide_clear_hob(bus);
+
     /* reset dma provider too */
     if (bus->dma->ops->reset) {
         bus->dma->ops->reset(bus->dma);
-- 
Gitee


From 97d0ef7221760a5a6c922b72701b26f1ad8a702a Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Fri, 24 Nov 2023 12:17:53 -0500
Subject: [PATCH 319/407] tests/qtest: ahci-test: add test exposing reset issue
 with pending callback
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 335: hw/ide: reset: cancel async DMA operation before resetting state
RH-Jira: RHEL-15437
RH-Acked-by: Hanna Czenczek <hreitz@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Commit: [2/2] 364e0703d22d69a4c1cfcff250ad0a3c81ada7b2 (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

JIRA: https://issues.redhat.com/browse/RHEL-15437
CVE: CVE-2023-5088
Upstream: Merged

commit cc610857bbd3551f4b86ae2299336b5d9aa0db2b
Author: Fiona Ebner <f.ebner@proxmox.com>
Date:   Wed Sep 6 15:09:22 2023 +0200

    tests/qtest: ahci-test: add test exposing reset issue with pending callback

    Before commit "hw/ide: reset: cancel async DMA operation before
    resetting state", this test would fail, because a reset with a
    pending write operation would lead to an unsolicited write to the
    first sector of the disk.

    The test writes a pattern to the beginning of the disk and verifies
    that it is still intact after a reset with a pending operation. It
    also checks that the pending operation actually completes correctly.

    Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
    Message-ID: <20230906130922.142845-2-f.ebner@proxmox.com>
    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 tests/qtest/ahci-test.c | 86 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 85 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/ahci-test.c b/tests/qtest/ahci-test.c
index 8073ccc2052..b4d15566e1e 100644
--- a/tests/qtest/ahci-test.c
+++ b/tests/qtest/ahci-test.c
@@ -1425,6 +1425,89 @@ static void test_reset(void)
     ahci_shutdown(ahci);
 }
 
+static void test_reset_pending_callback(void)
+{
+    AHCIQState *ahci;
+    AHCICommand *cmd;
+    uint8_t port;
+    uint64_t ptr1;
+    uint64_t ptr2;
+
+    int bufsize = 4 * 1024;
+    int speed = bufsize + (bufsize / 2);
+    int offset1 = 0;
+    int offset2 = bufsize / AHCI_SECTOR_SIZE;
+
+    g_autofree unsigned char *tx1 = g_malloc(bufsize);
+    g_autofree unsigned char *tx2 = g_malloc(bufsize);
+    g_autofree unsigned char *rx1 = g_malloc0(bufsize);
+    g_autofree unsigned char *rx2 = g_malloc0(bufsize);
+
+    /* Uses throttling to make test independent of specific environment. */
+    ahci = ahci_boot_and_enable("-drive if=none,id=drive0,file=%s,"
+                                "cache=writeback,format=%s,"
+                                "throttling.bps-write=%d "
+                                "-M q35 "
+                                "-device ide-hd,drive=drive0 ",
+                                tmp_path, imgfmt, speed);
+
+    port = ahci_port_select(ahci);
+    ahci_port_clear(ahci, port);
+
+    ptr1 = ahci_alloc(ahci, bufsize);
+    ptr2 = ahci_alloc(ahci, bufsize);
+
+    g_assert(ptr1 && ptr2);
+
+    /* Need two different patterns. */
+    do {
+        generate_pattern(tx1, bufsize, AHCI_SECTOR_SIZE);
+        generate_pattern(tx2, bufsize, AHCI_SECTOR_SIZE);
+    } while (memcmp(tx1, tx2, bufsize) == 0);
+
+    qtest_bufwrite(ahci->parent->qts, ptr1, tx1, bufsize);
+    qtest_bufwrite(ahci->parent->qts, ptr2, tx2, bufsize);
+
+    /* Write to beginning of disk to check it wasn't overwritten later. */
+    ahci_guest_io(ahci, port, CMD_WRITE_DMA_EXT, ptr1, bufsize, offset1);
+
+    /* Issue asynchronously to get a pending callback during reset. */
+    cmd = ahci_command_create(CMD_WRITE_DMA_EXT);
+    ahci_command_adjust(cmd, offset2, ptr2, bufsize, 0);
+    ahci_command_commit(ahci, cmd, port);
+    ahci_command_issue_async(ahci, cmd);
+
+    ahci_set(ahci, AHCI_GHC, AHCI_GHC_HR);
+
+    ahci_command_free(cmd);
+
+    /* Wait for throttled write to finish. */
+    sleep(1);
+
+    /* Start again. */
+    ahci_clean_mem(ahci);
+    ahci_pci_enable(ahci);
+    ahci_hba_enable(ahci);
+    port = ahci_port_select(ahci);
+    ahci_port_clear(ahci, port);
+
+    /* Read and verify. */
+    ahci_guest_io(ahci, port, CMD_READ_DMA_EXT, ptr1, bufsize, offset1);
+    qtest_bufread(ahci->parent->qts, ptr1, rx1, bufsize);
+    g_assert_cmphex(memcmp(tx1, rx1, bufsize), ==, 0);
+
+    ahci_guest_io(ahci, port, CMD_READ_DMA_EXT, ptr2, bufsize, offset2);
+    qtest_bufread(ahci->parent->qts, ptr2, rx2, bufsize);
+    g_assert_cmphex(memcmp(tx2, rx2, bufsize), ==, 0);
+
+    ahci_free(ahci, ptr1);
+    ahci_free(ahci, ptr2);
+
+    ahci_clean_mem(ahci);
+
+    ahci_shutdown(ahci);
+}
+
 static void test_ncq_simple(void)
 {
     AHCIQState *ahci;
@@ -1929,7 +2012,8 @@ int main(int argc, char **argv)
     qtest_add_func("/ahci/migrate/dma/halted", test_migrate_halted_dma);
 
     qtest_add_func("/ahci/max", test_max);
-    qtest_add_func("/ahci/reset", test_reset);
+    qtest_add_func("/ahci/reset/simple", test_reset);
+    qtest_add_func("/ahci/reset/pending_callback", test_reset_pending_callback);
 
     qtest_add_func("/ahci/io/ncq/simple", test_ncq_simple);
     qtest_add_func("/ahci/migrate/ncq/simple", test_migrate_ncq);
-- 
Gitee


From e3fedaa63f215e3ce12c5506be48a72f6e571288 Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Date: Wed, 6 Apr 2022 14:58:12 -0400
Subject: [PATCH 320/407] acpi: fix acpi_index migration
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Peter Xu <peterx@redhat.com>
RH-MergeRequest: 343: acpi: fix acpi_index migration
RH-Jira: RHEL-20189
RH-Acked-by: Leonardo Brás <leobras@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Prasad Pandit <None>
RH-Commit: [1/2] c5b9cdf5791cd856207b7df7e2ef5df360ec8de4

vmstate_acpi_pcihp_use_acpi_index() was expecting AcpiPciHpState
as state but it actually received PIIX4PMState, because
VMSTATE_PCI_HOTPLUG is a macro and not another struct.
So it ended up accessing a random pointer, which resulted
in a 'false' return value, and the acpi_index field was never
sent.

However, in 7.0 that pointer dereferences to a value > 0, and
the destination QEMU starts to expect the field, which isn't
sent in the migration stream from older QEMU (6.2 and older).
As a result, migration fails with:
  qemu-system-x86_64: Missing section footer for 0000:00:01.3/piix4_pm
  qemu-system-x86_64: load of migration failed: Invalid argument

In addition, with QEMU 6.2 the destination, due to the
unexpected state, also never expects the acpi_index field in
the migration stream.

Q35 is not affected as it always sends/expects the field as
long as acpi based PCI hotplug is enabled.

Fix the issue by introducing a compat knob to never send/expect
acpi_index in the migration stream for 6.2 and older PC machine
types and always send it for 7.0 and newer PC machine types.

Diagnosed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Fixes: b32bd76 ("pci: introduce acpi-index property for PCI device")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/932
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit a83c2844903c45aa7d32cdd17305f23ce2c56ab9)
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/acpi/acpi-pci-hotplug-stub.c |  4 ----
 hw/acpi/pcihp.c                 |  6 ------
 hw/acpi/piix4.c                 | 15 ++++++++++++++-
 hw/core/machine.c               |  5 +++++
 include/hw/acpi/pcihp.h         |  2 --
 5 files changed, 19 insertions(+), 13 deletions(-)
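
Note (illustration only, not part of the change): a minimal sketch of
the idea behind the new field test. The types and names here (PmState,
HotplugState, FieldTest, field_is_sent) are hypothetical simplifications
and not the QEMU VMState API. The point it tries to show is that the
predicate receives the outer device state as its opaque pointer, so it
must interpret it as that outer type and decide from the device's own
flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inner state, embedded in the outer device state. */
typedef struct HotplugState {
    uint32_t hotplug_select;
    uint32_t acpi_index;
} HotplugState;

/* Hypothetical outer state; the field test is handed a pointer to this. */
typedef struct PmState {
    uint32_t other_registers;
    HotplugState hp;
    bool use_hotplug_bridge;
    bool not_migrate_acpi_index;  /* compat knob, "on" for old machine types */
} PmState;

typedef bool (*FieldTest)(void *opaque, int version_id);

/* Interprets opaque as the outer type, which is what the caller passes. */
static bool test_migrate_acpi_index(void *opaque, int version_id)
{
    PmState *s = opaque;

    (void)version_id;
    assert(s);
    return s->use_hotplug_bridge && !s->not_migrate_acpi_index;
}

/* Stand-in for the migration core asking whether to send a field. */
static bool field_is_sent(FieldTest test, PmState *s)
{
    return test(s, 1);
}
```

With the knob set, as the compat entry does for 6.2 and older machine
types, field_is_sent() returns false; on newer machine types with the
hotplug bridge enabled it returns true.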

diff --git a/hw/acpi/acpi-pci-hotplug-stub.c b/hw/acpi/acpi-pci-hotplug-stub.c
index 734e4c59868..a43f6dafc92 100644
--- a/hw/acpi/acpi-pci-hotplug-stub.c
+++ b/hw/acpi/acpi-pci-hotplug-stub.c
@@ -41,7 +41,3 @@ void acpi_pcihp_reset(AcpiPciHpState *s, bool acpihp_root_off)
     return;
 }
 
-bool vmstate_acpi_pcihp_use_acpi_index(void *opaque, int version_id)
-{
-    return false;
-}
diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index be0e846b343..ec861661c3d 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -559,12 +559,6 @@ void acpi_pcihp_init(Object *owner, AcpiPciHpState *s, PCIBus *root_bus,
                                    OBJ_PROP_FLAG_READ);
 }
 
-bool vmstate_acpi_pcihp_use_acpi_index(void *opaque, int version_id)
-{
-     AcpiPciHpState *s = opaque;
-     return s->acpi_index;
-}
-
 const VMStateDescription vmstate_acpi_pcihp_pci_status = {
     .name = "acpi_pcihp_pci_status",
     .version_id = 1,
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 8d6011c0a3b..033e75ce5b7 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -82,6 +82,7 @@ struct PIIX4PMState {
     AcpiPciHpState acpi_pci_hotplug;
     bool use_acpi_hotplug_bridge;
     bool use_acpi_root_pci_hotplug;
+    bool not_migrate_acpi_index;
 
     uint8_t disable_s3;
     uint8_t disable_s4;
@@ -269,6 +270,16 @@ static bool piix4_vmstate_need_smbus(void *opaque, int version_id)
     return pm_smbus_vmstate_needed();
 }
 
+/*
+ * This is a fudge to turn off the acpi_index field,
+ * whose test was always broken on piix4 with 6.2 and older machine types.
+ */
+static bool vmstate_test_migrate_acpi_index(void *opaque, int version_id)
+{
+    PIIX4PMState *s = PIIX4_PM(opaque);
+    return s->use_acpi_hotplug_bridge && !s->not_migrate_acpi_index;
+}
+
 /* qemu-kvm 1.2 uses version 3 but advertised as 2
  * To support incoming qemu-kvm 1.2 migration, change version_id
  * and minimum_version_id to 2 below (which breaks migration from
@@ -299,7 +310,7 @@ static const VMStateDescription vmstate_acpi = {
             struct AcpiPciHpPciStatus),
         VMSTATE_PCI_HOTPLUG(acpi_pci_hotplug, PIIX4PMState,
                             vmstate_test_use_acpi_hotplug_bridge,
-                            vmstate_acpi_pcihp_use_acpi_index),
+                            vmstate_test_migrate_acpi_index),
         VMSTATE_END_OF_LIST()
     },
     .subsections = (const VMStateDescription*[]) {
@@ -654,6 +665,8 @@ static Property piix4_pm_properties[] = {
     DEFINE_PROP_BOOL("memory-hotplug-support", PIIX4PMState,
                      acpi_memory_hotplug.is_enabled, true),
     DEFINE_PROP_BOOL("smm-compat", PIIX4PMState, smm_compat, false),
+    DEFINE_PROP_BOOL("x-not-migrate-acpi-index", PIIX4PMState,
+                      not_migrate_acpi_index, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 76fcabec7a2..2724f6848ab 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -331,6 +331,11 @@ GlobalProperty hw_compat_rhel_7_1[] = {
 };
 const size_t hw_compat_rhel_7_1_len = G_N_ELEMENTS(hw_compat_rhel_7_1);
 
+GlobalProperty hw_compat_6_2[] = {
+    { "PIIX4_PM", "x-not-migrate-acpi-index", "on"},
+};
+const size_t hw_compat_6_2_len = G_N_ELEMENTS(hw_compat_6_2);
+
 GlobalProperty hw_compat_6_1[] = {
     { "vhost-user-vsock-device", "seqpacket", "off" },
     { "nvme-ns", "shared", "off" },
diff --git a/include/hw/acpi/pcihp.h b/include/hw/acpi/pcihp.h
index af1a169fc32..7e268c2c9c9 100644
--- a/include/hw/acpi/pcihp.h
+++ b/include/hw/acpi/pcihp.h
@@ -73,8 +73,6 @@ void acpi_pcihp_reset(AcpiPciHpState *s, bool acpihp_root_off);
 
 extern const VMStateDescription vmstate_acpi_pcihp_pci_status;
 
-bool vmstate_acpi_pcihp_use_acpi_index(void *opaque, int version_id);
-
 #define VMSTATE_PCI_HOTPLUG(pcihp, state, test_pcihp, test_acpi_index) \
         VMSTATE_UINT32_TEST(pcihp.hotplug_select, state, \
                             test_pcihp), \
-- 
Gitee


From e03e084ecddcefd35a5697828acccf03bcd5d8b6 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Wed, 6 Sep 2023 16:29:23 -0400
Subject: [PATCH 321/407] RHEL: Enable "x-not-migrate-acpi-index" for all
 pre-RHEL8 guests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Peter Xu <peterx@redhat.com>
RH-MergeRequest: 343: acpi: fix acpi_index migration
RH-Jira: RHEL-20189
RH-Acked-by: Leonardo Brás <leobras@redhat.com>
RH-Acked-by: Igor Mammedov <imammedo@redhat.com>
RH-Acked-by: Prasad Pandit <None>
RH-Commit: [2/2] 0a26a71236e68dd7feb5d2063254090e3852d6ba

The acpi index migration has always been broken on all pre-RHEL8
branches.  Don't migrate it for any of them.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/core/machine.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 2724f6848ab..6650a3d7b7a 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -44,6 +44,10 @@ GlobalProperty hw_compat_rhel_8_6[] = {
      * we need do disable it downstream on the latest hw_compat_rhel_8.
      */
     { "vhost-vsock-device", "seqpacket", "off" },
+    /*
+     * RHEL-2186: all rhel8 machines should not migrate acpi index.
+     */
+    { "PIIX4_PM", "x-not-migrate-acpi-index", "on"},
 };
 const size_t hw_compat_rhel_8_6_len = G_N_ELEMENTS(hw_compat_rhel_8_6);
 
-- 
Gitee


From 34afc12b59bc4a563a32c72e79f790eeace66233 Mon Sep 17 00:00:00 2001
From: Eric Auger <eric.auger@redhat.com>
Date: Thu, 4 Jan 2024 12:02:31 +0100
Subject: [PATCH 322/407] hw/arm/virt: Do not load efi-virtio.rom for all
 virtio-net-pci variants

RH-Author: Eric Auger <eric.auger@redhat.com>
RH-MergeRequest: 344: hw/arm/virt: Do not load efi-virtio.rom for any virtio-net-pci variants
RH-Jira: RHEL-14870
RH-Acked-by: Gerd Hoffmann <None>
RH-Acked-by: Sebastian Ott <None>
RH-Commit: [1/1] ffeaa78ad0a1cff5b49009dfb32d25e5cadc0e05

Upstream: RHEL-only
Brew: http://brewweb.engineering.redhat.com/brew/taskinfo?taskID=5785640

Currently arm_rhel_compat just sets the romfile to "" for
virtio-net-pci and not for transitional and non transitional
variants. However, on aarch64 RHEL, efi-virtio.rom is not
shipped so transitional and non-transitional variants cannot
be used and the following error is observed:

"Could not open option rom 'efi-virtio.rom': No such file or directory"

In practice, we do not need any rom file for those virtio-net-pci
variants either because edk2 already brings the full functionality.

So let's change the applied compat to cover all the variants. While
at it also change the way arm_rhel_compat is applied. Instead of
applying it from the latest _virt_options(), which is error prone
when upgrading the machine type, let's apply it before calling
*virt_options in the non abstract machine class. That way the setting
will apply to any machine type without any need to add it in any
future machine types.

We don't really care about keeping non-empty romfiles for transitional
and non-transitional devices on previous machine types because this
was not working anyway.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/virt.c | 42 ++++++++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 14 deletions(-)
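
Note (illustration only, not part of the change): a toy model of why
the common compat entries are added before the per-machine-type options
run. It assumes that properties are applied in list order so that a
later entry for the same driver and property wins. All names (Prop,
PropList, props_add, props_lookup, older_compat, "example.rom") are
hypothetical and not QEMU APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Prop {
    const char *driver;
    const char *property;
    const char *value;
} Prop;

typedef struct PropList {
    const Prop *items[16];
    size_t len;
} PropList;

/* Entries shared by every machine type, mirroring the three variants. */
static const Prop base_compat[] = {
    { "virtio-net-pci", "romfile", "" },
    { "virtio-net-pci-transitional", "romfile", "" },
    { "virtio-net-pci-non-transitional", "romfile", "" },
};

/* Hypothetical override from an older machine type, invented for this sketch. */
static const Prop older_compat[] = {
    { "virtio-net-pci", "romfile", "example.rom" },
};

/* Append n entries; entries beyond the fixed capacity are dropped. */
static void props_add(PropList *l, const Prop *p, size_t n)
{
    assert(l && p);
    for (size_t i = 0; i < n && l->len < 16; i++) {
        l->items[l->len++] = &p[i];
    }
}

/* Return the value of the last matching entry, or NULL if none matches. */
static const char *props_lookup(const PropList *l, const char *driver,
                                const char *property)
{
    const char *value = NULL;

    for (size_t i = 0; i < l->len; i++) {
        if (strcmp(l->items[i]->driver, driver) == 0 &&
            strcmp(l->items[i]->property, property) == 0) {
            value = l->items[i]->value;
        }
    }
    return value;
}
```

Adding base_compat first and a machine-specific list second gives every
machine type the empty romfile by default while still letting an older
type override it, which is the ordering the macro change aims for.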

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index dbf0a6d62fe..46c72a96117 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -108,11 +108,39 @@
     DEFINE_VIRT_MACHINE_LATEST(major, minor, false)
 #endif /* disabled for RHEL */
 
+/*
+ * This variable is for changes to properties that are RHEL specific,
+ * different to the current upstream and to be applied to the latest
+ * machine type. They may be overriden by older machine compats.
+ *
+ * virtio-net-pci variant romfiles are not needed because edk2 does
+ * fully support the pxe boot. Besides virtio romfiles are not shipped
+ * on rhel/aarch64.
+ */
+GlobalProperty arm_rhel_compat[] = {
+    {"virtio-net-pci", "romfile", "" },
+    {"virtio-net-pci-transitional", "romfile", "" },
+    {"virtio-net-pci-non-transitional", "romfile", "" },
+};
+const size_t arm_rhel_compat_len = G_N_ELEMENTS(arm_rhel_compat);
+
+/*
+ * This cannot be called from the rhel_virt_class_init() because
+ * TYPE_RHEL_MACHINE is abstract and mc->compat_props g_ptr_array_new()
+ * only is called on virt-rhelm.n.s non abstract class init.
+ */
+static void arm_rhel_compat_set(MachineClass *mc)
+{
+    compat_props_add(mc->compat_props, arm_rhel_compat,
+                     arm_rhel_compat_len);
+}
+
 #define DEFINE_RHEL_MACHINE_LATEST(m, n, s, latest)                     \
     static void rhel##m##n##s##_virt_class_init(ObjectClass *oc,        \
                                                 void *data)             \
     {                                                                   \
         MachineClass *mc = MACHINE_CLASS(oc);                           \
+        arm_rhel_compat_set(mc);                                        \
         rhel##m##n##s##_virt_options(mc);                               \
         mc->desc = "RHEL " # m "." # n "." # s " ARM Virtual Machine";  \
         if (latest) {                                                   \
@@ -136,19 +164,6 @@
 #define DEFINE_RHEL_MACHINE(major, minor, subminor)             \
     DEFINE_RHEL_MACHINE_LATEST(major, minor, subminor, false)
 
-/* This variable is for changes to properties that are RHEL specific,
- * different to the current upstream and to be applied to the latest
- * machine type.
- */
-GlobalProperty arm_rhel_compat[] = {
-    {
-        .driver   = "virtio-net-pci",
-        .property = "romfile",
-        .value    = "",
-    },
-};
-const size_t arm_rhel_compat_len = G_N_ELEMENTS(arm_rhel_compat);
-
 /* Number of external interrupt lines to configure the GIC with */
 #define NUM_IRQS 256
 
@@ -3240,7 +3255,6 @@ type_init(rhel_machine_init);
 
 static void rhel860_virt_options(MachineClass *mc)
 {
-    compat_props_add(mc->compat_props, arm_rhel_compat, arm_rhel_compat_len);
 }
 DEFINE_RHEL_MACHINE_AS_LATEST(8, 6, 0)
 
-- 
Gitee


From ed0a9dd735d609de99b05dd3dab638b55ab0050f Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Mon, 15 Jan 2024 14:00:04 +0100
Subject: [PATCH 323/407] MAINTAINERS: split out s390x sections
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 348: s390x: Provide some more useful information if decryption of a PV image fails
RH-Jira: RHEL-18214
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Commit: [1/5] a71a3c11922481f97c36570e361088d17474e481

JIRA: https://issues.redhat.com/browse/RHEL-18214

commit 56e34834029c7c6862cb0095d95ad83c50485f88
Author: Cornelia Huck <cohuck@redhat.com>
Date:   Wed Dec 22 11:55:48 2021 +0100

    MAINTAINERS: split out s390x sections

    Split out some more specialized devices etc., so that we can build
    smarter lists of people to be put on cc: in the future.

    Signed-off-by: Cornelia Huck <cohuck@redhat.com>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Acked-by: Thomas Huth <thuth@redhat.com>
    Acked-by: Halil Pasic <pasic@linux.ibm.com>
    Acked-by: Eric Farman <farman@linux.ibm.com>
    Message-Id: <20211222105548.356852-1-cohuck@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 MAINTAINERS | 85 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 74 insertions(+), 11 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7543eb4d597..b893206fc3f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -297,7 +297,6 @@ M: David Hildenbrand <david@redhat.com>
 S: Maintained
 F: target/s390x/
 F: target/s390x/tcg
-F: target/s390x/cpu_models_*.[ch]
 F: hw/s390x/
 F: disas/s390.c
 F: tests/tcg/s390x/
@@ -396,16 +395,10 @@ M: Halil Pasic <pasic@linux.ibm.com>
 M: Christian Borntraeger <borntraeger@de.ibm.com>
 S: Supported
 F: target/s390x/kvm/
-F: target/s390x/ioinst.[ch]
 F: target/s390x/machine.c
 F: target/s390x/sigp.c
-F: target/s390x/cpu_features*.[ch]
-F: target/s390x/cpu_models.[ch]
 F: hw/s390x/pv.c
 F: include/hw/s390x/pv.h
-F: hw/intc/s390_flic.c
-F: hw/intc/s390_flic_kvm.c
-F: include/hw/s390x/s390_flic.h
 F: gdb-xml/s390*.xml
 T: git https://github.com/borntraeger/qemu.git s390-next
 L: qemu-s390x@nongnu.org
@@ -1529,12 +1522,8 @@ S390 Virtio-ccw
 M: Halil Pasic <pasic@linux.ibm.com>
 M: Christian Borntraeger <borntraeger@de.ibm.com>
 S: Supported
-F: hw/char/sclp*.[hc]
-F: hw/char/terminal3270.c
 F: hw/s390x/
 F: include/hw/s390x/
-F: hw/watchdog/wdt_diag288.c
-F: include/hw/watchdog/wdt_diag288.h
 F: configs/devices/s390x-softmmu/default.mak
 F: tests/avocado/machine_s390_ccw_virtio.py
 T: git https://github.com/borntraeger/qemu.git s390-next
@@ -1559,6 +1548,37 @@ F: hw/s390x/s390-pci*
 F: include/hw/s390x/s390-pci*
 L: qemu-s390x@nongnu.org
 
+S390 channel subsystem
+M: Halil Pasic <pasic@linux.ibm.com>
+M: Christian Borntraeger <borntraeger@linux.ibm.com>
+S: Supported
+F: hw/s390x/ccw-device.[ch]
+F: hw/s390x/css.c
+F: hw/s390x/css-bridge.c
+F: include/hw/s390x/css.h
+F: include/hw/s390x/css-bridge.h
+F: include/hw/s390x/ioinst.h
+F: target/s390x/ioinst.c
+L: qemu-s390x@nongnu.org
+
+S390 CPU models
+M: David Hildenbrand <david@redhat.com>
+S: Maintained
+F: target/s390x/cpu_features*.[ch]
+F: target/s390x/cpu_models.[ch]
+L: qemu-s390x@nongnu.org
+
+S390 SCLP-backed devices
+M: Halil Pasic <pasic@linux.ibm.com>
+M: Christian Borntraeger <borntraeger@linux.ibm.com>
+S: Supported
+F: include/hw/s390x/event-facility.h
+F: include/hw/s390x/sclp.h
+F: hw/char/sclp*.[hc]
+F: hw/s390x/event-facility.c
+F: hw/s390x/sclp*.c
+L: qemu-s390x@nongnu.org
+
 X86 Machines
 ------------
 PC
@@ -1956,6 +1976,7 @@ M: Halil Pasic <pasic@linux.ibm.com>
 S: Supported
 F: hw/s390x/virtio-ccw*.[hc]
 F: hw/s390x/vhost-vsock-ccw.c
+F: hw/s390x/vhost-user-fs-ccw.c
 T: git https://gitlab.com/cohuck/qemu.git s390-next
 T: git https://github.com/borntraeger/qemu.git s390-next
 L: qemu-s390x@nongnu.org
@@ -2294,6 +2315,48 @@ F: hw/timer/mips_gictimer.c
 F: include/hw/intc/mips_gic.h
 F: include/hw/timer/mips_gictimer.h
 
+S390 3270 device
+M: Halil Pasic <pasic@linux.ibm.com>
+M: Christian Borntraeger <borntraeger@linux.ibm.com>
+S: Odd fixes
+F: include/hw/s390x/3270-ccw.h
+F: hw/char/terminal3270.c
+F: hw/s390x/3270-ccw.c
+L: qemu-s390x@nongnu.org
+
+S390 diag 288 watchdog
+M: Halil Pasic <pasic@linux.ibm.com>
+M: Christian Borntraeger <borntraeger@linux.ibm.com>
+S: Supported
+F: hw/watchdog/wdt_diag288.c
+F: include/hw/watchdog/wdt_diag288.h
+L: qemu-s390x@nongnu.org
+
+S390 storage key device
+M: Halil Pasic <pasic@linux.ibm.com>
+M: Christian Borntraeger <borntraeger@linux.ibm.com>
+S: Supported
+F: hw/s390x/storage-keys.h
+F: hw/390x/s390-skeys*.c
+L: qemu-s390x@nongnu.org
+
+S390 storage attribute device
+M: Halil Pasic <pasic@linux.ibm.com>
+M: Christian Borntraeger <borntraeger@linux.ibm.com>
+S: Supported
+F: hw/s390x/storage-attributes.h
+F: hw/s390/s390-stattrib*.c
+L: qemu-s390x@nongnu.org
+
+S390 floating interrupt controller
+M: Halil Pasic <pasic@linux.ibm.com>
+M: Christian Borntraeger <borntraeger@linux.ibm.com>
+M: David Hildenbrand <david@redhat.com>
+S: Supported
+F: hw/intc/s390_flic*.c
+F: include/hw/s390x/s390_flic.h
+L: qemu-s390x@nongnu.org
+
 Subsystems
 ----------
 Overall Audio backends
-- 
Gitee


From 82d4203964435ec4efe439e691b945da1c4b9c5b Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Mon, 15 Jan 2024 14:00:04 +0100
Subject: [PATCH 324/407] s390x/pv: remove semicolon from macro definition
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 348: s390x: Provide some more useful information if decryption of a PV image fails
RH-Jira: RHEL-18214
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Commit: [2/5] 52a04c945a584746ff30bed516ad97bab75ac821

JIRA: https://issues.redhat.com/browse/RHEL-18214

commit 36c182bbe680d64f0868522bb9256b5b8eccf280
Author: Claudio Imbrenda <imbrenda@linux.ibm.com>
Date:   Mon Oct 10 17:10:41 2022 +0200

    s390x/pv: remove semicolon from macro definition

    Remove spurious semicolon at the end of the macro s390_pv_cmd

    Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Acked-by: Cornelia Huck <cohuck@redhat.com>
    Message-Id: <20221010151041.89071-1-imbrenda@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 hw/s390x/pv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
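
Note (illustration only, not part of the change): a small sketch of why
a trailing semicolon inside a function-like macro is a hazard. The names
(run_cmd, run_cmd_named, CMD_INIT, try_init) are hypothetical and only
mirror the shape of s390_pv_cmd, including the use of the # operator to
pass the command name as a string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CMD_INIT 1

static char last_name[32];

/* Records the command name so that it could be printed on error. */
static int run_cmd_named(int cmd, const char *name)
{
    assert(name);
    snprintf(last_name, sizeof(last_name), "%s", name);
    return cmd < 0 ? -1 : 0;
}

/*
 * No trailing semicolon here. If the definition ended in ';', the
 * statement in try_init() would expand to "rc = ...;;" and the extra
 * empty statement would detach the else from its if, which is a
 * compile error.
 */
#define run_cmd(cmd) run_cmd_named(cmd, #cmd)

static int try_init(int enabled)
{
    int rc;

    if (enabled)
        rc = run_cmd(CMD_INIT);
    else
        rc = -1;
    return rc;
}
```

Because # stringifies the argument as written, last_name holds
"CMD_INIT" after try_init(1), while the value passed as cmd is the
expanded constant.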

diff --git a/hw/s390x/pv.c b/hw/s390x/pv.c
index 749e5db1ce2..8a1c71436ba 100644
--- a/hw/s390x/pv.c
+++ b/hw/s390x/pv.c
@@ -51,7 +51,7 @@ static int __s390_pv_cmd(uint32_t cmd, const char *cmdname, void *data)
  * This macro lets us pass the command as a string to the function so
  * we can print it on an error.
  */
-#define s390_pv_cmd(cmd, data) __s390_pv_cmd(cmd, #cmd, data);
+#define s390_pv_cmd(cmd, data) __s390_pv_cmd(cmd, #cmd, data)
 #define s390_pv_cmd_exit(cmd, data)    \
 {                                      \
     int rc;                            \
-- 
Gitee


From d2be840d5a4d25f6bcdcc92164c9adfdaada7573 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Mon, 15 Jan 2024 14:00:04 +0100
Subject: [PATCH 325/407] hw/s390x/pv: Restrict Protected Virtualization to
 sysemu
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 348: s390x: Provide some more useful information if decryption of a PV image fails
RH-Jira: RHEL-18214
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Commit: [3/5] 17b11f9fd2b53c7d33c09a62f28cfca19b18e798

JIRA: https://issues.redhat.com/browse/RHEL-18214

commit 3ea7e312671686e616efa1b8caa5f5ce2d06543a
Author: Philippe Mathieu-Daudé <philmd@linaro.org>
Date:   Sat Dec 17 16:24:52 2022 +0100

    hw/s390x/pv: Restrict Protected Virtualization to sysemu

    Protected Virtualization is irrelevant in user emulation.

    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Message-Id: <20221217152454.96388-4-philmd@linaro.org>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 target/s390x/cpu_features.c | 4 ++++
 target/s390x/cpu_models.c   | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/target/s390x/cpu_features.c b/target/s390x/cpu_features.c
index 5528acd0828..2e4e11d264b 100644
--- a/target/s390x/cpu_features.c
+++ b/target/s390x/cpu_features.c
@@ -14,7 +14,9 @@
 #include "qemu/osdep.h"
 #include "qemu/module.h"
 #include "cpu_features.h"
+#ifndef CONFIG_USER_ONLY
 #include "hw/s390x/pv.h"
+#endif
 
 #define DEF_FEAT(_FEAT, _NAME, _TYPE, _BIT, _DESC) \
     [S390_FEAT_##_FEAT] = {                        \
@@ -107,6 +109,7 @@ void s390_fill_feat_block(const S390FeatBitmap features, S390FeatType type,
         feat = find_next_bit(features, S390_FEAT_MAX, feat + 1);
     }
 
+#ifndef CONFIG_USER_ONLY
     if (!s390_is_pv()) {
         return;
     }
@@ -147,6 +150,7 @@ void s390_fill_feat_block(const S390FeatBitmap features, S390FeatType type,
     default:
         return;
     }
+#endif
 }
 
 void s390_add_from_feat_block(S390FeatBitmap features, S390FeatType type,
diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 454485e7069..e7c586c76ec 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -22,8 +22,8 @@
 #include "qemu/qemu-print.h"
 #ifndef CONFIG_USER_ONLY
 #include "sysemu/sysemu.h"
-#endif
 #include "hw/s390x/pv.h"
+#endif
 
 #define CPUDEF_INIT(_type, _gen, _ec_ga, _mha_pow, _hmfai, _name, _desc) \
     {                                                                    \
@@ -236,6 +236,7 @@ bool s390_has_feat(S390Feat feat)
         return 0;
     }
 
+#ifndef CONFIG_USER_ONLY
     if (s390_is_pv()) {
         switch (feat) {
         case S390_FEAT_DIAG_318:
@@ -259,6 +260,7 @@ bool s390_has_feat(S390Feat feat)
             break;
         }
     }
+#endif
     return test_bit(feat, cpu->model->features);
 }
 
-- 
Gitee


From c754598abb8bc1b4fed587b0777a9f61f8cf4715 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Mon, 15 Jan 2024 18:52:30 +0100
Subject: [PATCH 326/407] hw/s390x: Move KVM specific PV from hw/ to
 target/s390x/kvm/
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 348: s390x: Provide some more useful information if decryption of a PV image fails
RH-Jira: RHEL-18214
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Commit: [4/5] f6095bfdb89268007a0741665284955db4752d46

JIRA: https://issues.redhat.com/browse/RHEL-18214

commit f5f9c6ea11bc807664fdeb9354915c2c9cdcbd89
Author: Philippe Mathieu-Daudé <philmd@linaro.org>
Date:   Sat Jun 24 22:06:44 2023 +0200

    hw/s390x: Move KVM specific PV from hw/ to target/s390x/kvm/

    Protected Virtualization (PV) is not a real hardware device:
    it is a feature of the firmware on s390x that is exposed to
    userspace via the KVM interface.

    Move the pv.c/pv.h files to target/s390x/kvm/ to make this clearer.

    Suggested-by: Thomas Huth <thuth@redhat.com>
    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Message-Id: <20230624200644.23931-1-philmd@linaro.org>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Conflicts:
    hw/s390x/ipl.c
    hw/s390x/s390-virtio-ccw.c
    target/s390x/diag.c
    (simple contextual conflict due to differences in #include statements)
Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 MAINTAINERS                                 | 2 --
 hw/s390x/ipl.c                              | 2 +-
 hw/s390x/meson.build                        | 1 -
 hw/s390x/s390-pci-kvm.c                     | 2 +-
 hw/s390x/s390-virtio-ccw.c                  | 2 +-
 hw/s390x/tod-kvm.c                          | 2 +-
 target/s390x/arch_dump.c                    | 2 +-
 target/s390x/cpu-sysemu.c                   | 2 +-
 target/s390x/cpu_features.c                 | 2 +-
 target/s390x/cpu_models.c                   | 2 +-
 target/s390x/diag.c                         | 2 +-
 target/s390x/helper.c                       | 2 +-
 target/s390x/ioinst.c                       | 2 +-
 target/s390x/kvm/kvm.c                      | 2 +-
 target/s390x/kvm/meson.build                | 1 +
 {hw/s390x => target/s390x/kvm}/pv.c         | 2 +-
 {include/hw/s390x => target/s390x/kvm}/pv.h | 0
 17 files changed, 14 insertions(+), 16 deletions(-)
 rename {hw/s390x => target/s390x/kvm}/pv.c (99%)
 rename {include/hw/s390x => target/s390x/kvm}/pv.h (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index b893206fc3f..d74ca51154d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -397,8 +397,6 @@ S: Supported
 F: target/s390x/kvm/
 F: target/s390x/machine.c
 F: target/s390x/sigp.c
-F: hw/s390x/pv.c
-F: include/hw/s390x/pv.h
 F: gdb-xml/s390*.xml
 T: git https://github.com/borntraeger/qemu.git s390-next
 L: qemu-s390x@nongnu.org
diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index 9051d8652d6..c25e2474266 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -27,7 +27,7 @@
 #include "hw/s390x/vfio-ccw.h"
 #include "hw/s390x/css.h"
 #include "hw/s390x/ebcdic.h"
-#include "hw/s390x/pv.h"
+#include "target/s390x/kvm/pv.h"
 #include "ipl.h"
 #include "qemu/error-report.h"
 #include "qemu/config-file.h"
diff --git a/hw/s390x/meson.build b/hw/s390x/meson.build
index 6e6e47fcda3..bb3b42f6131 100644
--- a/hw/s390x/meson.build
+++ b/hw/s390x/meson.build
@@ -22,7 +22,6 @@ s390x_ss.add(when: 'CONFIG_KVM', if_true: files(
   'tod-kvm.c',
   's390-skeys-kvm.c',
   's390-stattrib-kvm.c',
-  'pv.c',
   's390-pci-kvm.c',
 ))
 s390x_ss.add(when: 'CONFIG_TCG', if_true: files(
diff --git a/hw/s390x/s390-pci-kvm.c b/hw/s390x/s390-pci-kvm.c
index 9134fe185fe..ff41e4106d1 100644
--- a/hw/s390x/s390-pci-kvm.c
+++ b/hw/s390x/s390-pci-kvm.c
@@ -14,7 +14,7 @@
 #include <linux/kvm.h>
 
 #include "kvm/kvm_s390x.h"
-#include "hw/s390x/pv.h"
+#include "target/s390x/kvm/pv.h"
 #include "hw/s390x/s390-pci-bus.h"
 #include "hw/s390x/s390-pci-kvm.h"
 #include "hw/s390x/s390-pci-inst.h"
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 17146469eec..7bfa5b4e8fc 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -40,7 +40,7 @@
 #include "hw/qdev-properties.h"
 #include "hw/s390x/tod.h"
 #include "sysemu/sysemu.h"
-#include "hw/s390x/pv.h"
+#include "target/s390x/kvm/pv.h"
 #include "migration/blocker.h"
 #include "qapi/visitor.h"
 
diff --git a/hw/s390x/tod-kvm.c b/hw/s390x/tod-kvm.c
index c804c979b5c..9776cda50a3 100644
--- a/hw/s390x/tod-kvm.c
+++ b/hw/s390x/tod-kvm.c
@@ -13,7 +13,7 @@
 #include "qemu/module.h"
 #include "sysemu/runstate.h"
 #include "hw/s390x/tod.h"
-#include "hw/s390x/pv.h"
+#include "target/s390x/kvm/pv.h"
 #include "kvm/kvm_s390x.h"
 
 static void kvm_s390_get_tod_raw(S390TOD *tod, Error **errp)
diff --git a/target/s390x/arch_dump.c b/target/s390x/arch_dump.c
index 3b1f178dc30..2554238c162 100644
--- a/target/s390x/arch_dump.c
+++ b/target/s390x/arch_dump.c
@@ -17,8 +17,8 @@
 #include "s390x-internal.h"
 #include "elf.h"
 #include "sysemu/dump.h"
-#include "hw/s390x/pv.h"
 #include "kvm/kvm_s390x.h"
+#include "target/s390x/kvm/pv.h"
 
 struct S390xUserRegsStruct {
     uint64_t psw[2];
diff --git a/target/s390x/cpu-sysemu.c b/target/s390x/cpu-sysemu.c
index 5471e01ee82..547287a949b 100644
--- a/target/s390x/cpu-sysemu.c
+++ b/target/s390x/cpu-sysemu.c
@@ -32,7 +32,7 @@
 #include "qapi/qapi-visit-run-state.h"
 #include "sysemu/hw_accel.h"
 
-#include "hw/s390x/pv.h"
+#include "target/s390x/kvm/pv.h"
 #include "hw/boards.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/tcg.h"
diff --git a/target/s390x/cpu_features.c b/target/s390x/cpu_features.c
index 2e4e11d264b..ebb155ce1cb 100644
--- a/target/s390x/cpu_features.c
+++ b/target/s390x/cpu_features.c
@@ -15,7 +15,7 @@
 #include "qemu/module.h"
 #include "cpu_features.h"
 #ifndef CONFIG_USER_ONLY
-#include "hw/s390x/pv.h"
+#include "target/s390x/kvm/pv.h"
 #endif
 
 #define DEF_FEAT(_FEAT, _NAME, _TYPE, _BIT, _DESC) \
diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index e7c586c76ec..100c5e7b3a7 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -22,7 +22,7 @@
 #include "qemu/qemu-print.h"
 #ifndef CONFIG_USER_ONLY
 #include "sysemu/sysemu.h"
-#include "hw/s390x/pv.h"
+#include "target/s390x/kvm/pv.h"
 #endif
 
 #define CPUDEF_INIT(_type, _gen, _ec_ga, _mha_pow, _hmfai, _name, _desc) \
diff --git a/target/s390x/diag.c b/target/s390x/diag.c
index 76b01dcd68b..7c8714cc274 100644
--- a/target/s390x/diag.c
+++ b/target/s390x/diag.c
@@ -19,9 +19,9 @@
 #include "sysemu/cpus.h"
 #include "hw/s390x/ipl.h"
 #include "hw/s390x/s390-virtio-ccw.h"
-#include "hw/s390x/pv.h"
 #include "sysemu/kvm.h"
 #include "kvm/kvm_s390x.h"
+#include "target/s390x/kvm/pv.h"
 
 int handle_diag_288(CPUS390XState *env, uint64_t r1, uint64_t r3)
 {
diff --git a/target/s390x/helper.c b/target/s390x/helper.c
index 6e35473c7f8..860977126ab 100644
--- a/target/s390x/helper.c
+++ b/target/s390x/helper.c
@@ -24,7 +24,7 @@
 #include "exec/gdbstub.h"
 #include "qemu/timer.h"
 #include "hw/s390x/ioinst.h"
-#include "hw/s390x/pv.h"
+#include "target/s390x/kvm/pv.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/runstate.h"
 #include "sysemu/tcg.h"
diff --git a/target/s390x/ioinst.c b/target/s390x/ioinst.c
index bdae5090bc8..409f3e3e636 100644
--- a/target/s390x/ioinst.c
+++ b/target/s390x/ioinst.c
@@ -16,7 +16,7 @@
 #include "hw/s390x/ioinst.h"
 #include "trace.h"
 #include "hw/s390x/s390-pci-bus.h"
-#include "hw/s390x/pv.h"
+#include "target/s390x/kvm/pv.h"
 
 /* All I/O instructions but chsc use the s format */
 static uint64_t get_address_from_regs(CPUS390XState *env, uint32_t ipb,
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index a963866ef4b..6d1a6324b9e 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -51,7 +51,7 @@
 #include "exec/memattrs.h"
 #include "hw/s390x/s390-virtio-ccw.h"
 #include "hw/s390x/s390-virtio-hcall.h"
-#include "hw/s390x/pv.h"
+#include "target/s390x/kvm/pv.h"
 
 #ifndef DEBUG_KVM
 #define DEBUG_KVM  0
diff --git a/target/s390x/kvm/meson.build b/target/s390x/kvm/meson.build
index aef52b6686b..739d5b9f545 100644
--- a/target/s390x/kvm/meson.build
+++ b/target/s390x/kvm/meson.build
@@ -1,5 +1,6 @@
 
 s390x_ss.add(when: 'CONFIG_KVM', if_true: files(
+  'pv.c',
   'kvm.c'
 ), if_false: files(
   'stubs.c'
diff --git a/hw/s390x/pv.c b/target/s390x/kvm/pv.c
similarity index 99%
rename from hw/s390x/pv.c
rename to target/s390x/kvm/pv.c
index 8a1c71436ba..e14db4f41a0 100644
--- a/hw/s390x/pv.c
+++ b/target/s390x/kvm/pv.c
@@ -19,9 +19,9 @@
 #include "qom/object_interfaces.h"
 #include "exec/confidential-guest-support.h"
 #include "hw/s390x/ipl.h"
-#include "hw/s390x/pv.h"
 #include "hw/s390x/sclp.h"
 #include "target/s390x/kvm/kvm_s390x.h"
+#include "target/s390x/kvm/pv.h"
 
 static bool info_valid;
 static struct kvm_s390_pv_info_vm info_vm;
diff --git a/include/hw/s390x/pv.h b/target/s390x/kvm/pv.h
similarity index 100%
rename from include/hw/s390x/pv.h
rename to target/s390x/kvm/pv.h
-- 
Gitee


From 73e45ce17321b3715fe3150f833e4c748473f7a3 Mon Sep 17 00:00:00 2001
From: Thomas Huth <thuth@redhat.com>
Date: Wed, 10 Jan 2024 15:29:16 +0100
Subject: [PATCH 327/407] target/s390x/kvm/pv: Provide some more useful
 information if decryption fails
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Thomas Huth <thuth@redhat.com>
RH-MergeRequest: 348: s390x: Provide some more useful information if decryption of a PV image fails
RH-Jira: RHEL-18214
RH-Acked-by: Jon Maloy <jmaloy@redhat.com>
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Commit: [5/5] 087acaecfaa5921b409beb212123214fa79fe50c

JIRA: https://issues.redhat.com/browse/RHEL-18214

commit 7af51621b16ae86646cc2dc9dee30de8176ff761
Author: Thomas Huth <thuth@redhat.com>
Date:   Wed Jan 10 15:29:16 2024 +0100

    target/s390x/kvm/pv: Provide some more useful information if decryption fails

    It's a common scenario to copy guest images from one host to another
    to run the guest on the other machine. This (of course) does not work
    with "secure execution" guests since they are encrypted with one certain
    host key. However, if you still (accidentally) do it, you only get a
    very user-unfriendly error message that looks like this:

     qemu-system-s390x: KVM PV command 2 (KVM_PV_SET_SEC_PARMS) failed:
      header rc 108 rrc 5 IOCTL rc: -22

    Let's provide at least a somewhat nicer hint to the users so that they
    are able to figure out what might have gone wrong.
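
    The approach can be sketched roughly as follows. This is a simplified,
    self-contained model, not the QEMU code: demo_pv_result, demo_pv_cmd and
    demo_set_sec_parms are hypothetical names, and the hint is returned as a
    string instead of going through the Error API. The command helper hands
    the firmware return code back through an optional pointer, and the
    caller adds a hint only for the 0x108 case seen in the message above:

```c
#include <stddef.h>

/* Hypothetical model of one PV command outcome. */
struct demo_pv_result {
    int ioctl_rc;    /* errno-style ioctl return value, 0 on success */
    int header_rc;   /* firmware return code reported alongside it */
};

/* Same shape as the extended helper: pvrc is optional and may be NULL. */
static int demo_pv_cmd(struct demo_pv_result res, int *pvrc)
{
    if (pvrc) {
        *pvrc = res.header_rc;
    }
    return res.ioctl_rc;
}

/*
 * Stores the ioctl result in *ret and returns a hint string when the
 * failure looks like an image encrypted for a different host, else NULL.
 */
static const char *demo_set_sec_parms(struct demo_pv_result res, int *ret)
{
    int pvrc = 0;

    *ret = demo_pv_cmd(res, &pvrc);
    if (*ret && pvrc == 0x108) {
        return "Please check whether the image is correctly encrypted "
               "for this host";
    }
    return NULL;
}
```

    Keeping the pointer optional means existing callers that do not care
    about the firmware code can keep passing NULL.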

    Message-ID: <20240110142916.850605-1-thuth@redhat.com>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Cédric Le Goater <clg@redhat.com>
    Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Conflicts:
    target/s390x/kvm/pv.c
    target/s390x/kvm/pv.h
    (contextual conflict due to missing async-teardown in RHEL8)
Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 hw/s390x/ipl.c             |  5 ++---
 hw/s390x/ipl.h             |  2 +-
 hw/s390x/s390-virtio-ccw.c |  5 ++++-
 target/s390x/kvm/pv.c      | 25 ++++++++++++++++++++-----
 target/s390x/kvm/pv.h      |  5 +++--
 5 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index c25e2474266..c6cefdd3fe3 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -709,7 +709,7 @@ static void s390_ipl_prepare_qipl(S390CPU *cpu)
     cpu_physical_memory_unmap(addr, len, 1, len);
 }
 
-int s390_ipl_prepare_pv_header(void)
+int s390_ipl_prepare_pv_header(Error **errp)
 {
     IplParameterBlock *ipib = s390_ipl_get_iplb_pv();
     IPLBlockPV *ipib_pv = &ipib->pv;
@@ -718,8 +718,7 @@ int s390_ipl_prepare_pv_header(void)
 
     cpu_physical_memory_read(ipib_pv->pv_header_addr, hdr,
                              ipib_pv->pv_header_len);
-    rc = s390_pv_set_sec_parms((uintptr_t)hdr,
-                               ipib_pv->pv_header_len);
+    rc = s390_pv_set_sec_parms((uintptr_t)hdr, ipib_pv->pv_header_len, errp);
     g_free(hdr);
     return rc;
 }
diff --git a/hw/s390x/ipl.h b/hw/s390x/ipl.h
index dfc6dfd89c8..f9cce333301 100644
--- a/hw/s390x/ipl.h
+++ b/hw/s390x/ipl.h
@@ -107,7 +107,7 @@ typedef union IplParameterBlock IplParameterBlock;
 
 int s390_ipl_set_loadparm(uint8_t *loadparm);
 void s390_ipl_update_diag308(IplParameterBlock *iplb);
-int s390_ipl_prepare_pv_header(void);
+int s390_ipl_prepare_pv_header(Error **errp);
 int s390_ipl_pv_unpack(void);
 void s390_ipl_prepare_cpu(S390CPU *cpu);
 IplParameterBlock *s390_ipl_get_iplb(void);
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 7bfa5b4e8fc..94434c3bb19 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -374,7 +374,7 @@ static int s390_machine_protect(S390CcwMachineState *ms)
     }
 
     /* Set SE header and unpack */
-    rc = s390_ipl_prepare_pv_header();
+    rc = s390_ipl_prepare_pv_header(&local_err);
     if (rc) {
         goto out_err;
     }
@@ -393,6 +393,9 @@ static int s390_machine_protect(S390CcwMachineState *ms)
     return rc;
 
 out_err:
+    if (local_err) {
+        error_report_err(local_err);
+    }
     s390_machine_unprotect(ms);
     return rc;
 }
diff --git a/target/s390x/kvm/pv.c b/target/s390x/kvm/pv.c
index e14db4f41a0..ae75063777e 100644
--- a/target/s390x/kvm/pv.c
+++ b/target/s390x/kvm/pv.c
@@ -27,7 +27,8 @@ static bool info_valid;
 static struct kvm_s390_pv_info_vm info_vm;
 static struct kvm_s390_pv_info_dump info_dump;
 
-static int __s390_pv_cmd(uint32_t cmd, const char *cmdname, void *data)
+static int __s390_pv_cmd(uint32_t cmd, const char *cmdname, void *data,
+                         int *pvrc)
 {
     struct kvm_pv_cmd pv_cmd = {
         .cmd = cmd,
@@ -44,6 +45,9 @@ static int __s390_pv_cmd(uint32_t cmd, const char *cmdname, void *data)
                      "IOCTL rc: %d", cmd, cmdname, pv_cmd.rc, pv_cmd.rrc,
                      rc);
     }
+    if (pvrc) {
+        *pvrc = pv_cmd.rc;
+    }
     return rc;
 }
 
@@ -51,12 +55,13 @@ static int __s390_pv_cmd(uint32_t cmd, const char *cmdname, void *data)
  * This macro lets us pass the command as a string to the function so
  * we can print it on an error.
  */
-#define s390_pv_cmd(cmd, data) __s390_pv_cmd(cmd, #cmd, data)
+#define s390_pv_cmd(cmd, data) __s390_pv_cmd(cmd, #cmd, data, NULL)
+#define s390_pv_cmd_pvrc(cmd, data, pvrc) __s390_pv_cmd(cmd, #cmd, data, pvrc)
 #define s390_pv_cmd_exit(cmd, data)    \
 {                                      \
     int rc;                            \
                                        \
-    rc = __s390_pv_cmd(cmd, #cmd, data);\
+    rc = __s390_pv_cmd(cmd, #cmd, data, NULL); \
     if (rc) {                          \
         exit(1);                       \
     }                                  \
@@ -108,14 +113,24 @@ void s390_pv_vm_disable(void)
      s390_pv_cmd_exit(KVM_PV_DISABLE, NULL);
 }
 
-int s390_pv_set_sec_parms(uint64_t origin, uint64_t length)
+int s390_pv_set_sec_parms(uint64_t origin, uint64_t length, Error **errp)
 {
+    int ret, pvrc;
     struct kvm_s390_pv_sec_parm args = {
         .origin = origin,
         .length = length,
     };
 
-    return s390_pv_cmd(KVM_PV_SET_SEC_PARMS, &args);
+    ret = s390_pv_cmd_pvrc(KVM_PV_SET_SEC_PARMS, &args, &pvrc);
+    if (ret) {
+        error_setg(errp, "Failed to set secure execution parameters");
+        if (pvrc == 0x108) {
+            error_append_hint(errp, "Please check whether the image is "
+                                    "correctly encrypted for this host\n");
+        }
+    }
+
+    return ret;
 }
 
 /*
diff --git a/target/s390x/kvm/pv.h b/target/s390x/kvm/pv.h
index 9360aa10914..6868c3f4aca 100644
--- a/target/s390x/kvm/pv.h
+++ b/target/s390x/kvm/pv.h
@@ -41,7 +41,7 @@ static inline bool s390_is_pv(void)
 int s390_pv_query_info(void);
 int s390_pv_vm_enable(void);
 void s390_pv_vm_disable(void);
-int s390_pv_set_sec_parms(uint64_t origin, uint64_t length);
+int s390_pv_set_sec_parms(uint64_t origin, uint64_t length, Error **errp);
 int s390_pv_unpack(uint64_t addr, uint64_t size, uint64_t tweak);
 void s390_pv_prep_reset(void);
 int s390_pv_verify(void);
@@ -60,7 +60,8 @@ static inline bool s390_is_pv(void) { return false; }
 static inline int s390_pv_query_info(void) { return 0; }
 static inline int s390_pv_vm_enable(void) { return 0; }
 static inline void s390_pv_vm_disable(void) {}
-static inline int s390_pv_set_sec_parms(uint64_t origin, uint64_t length) { return 0; }
+static inline int s390_pv_set_sec_parms(uint64_t origin, uint64_t length,
+                                        Error **errp) { return 0; }
 static inline int s390_pv_unpack(uint64_t addr, uint64_t size, uint64_t tweak) { return 0; }
 static inline void s390_pv_prep_reset(void) {}
 static inline int s390_pv_verify(void) { return 0; }
-- 
Gitee


From e38ee1fdd0f0ab3ddf848e94df815529f5ebc3f1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 Jan 2024 13:59:24 +0100
Subject: [PATCH 328/407] s390x/pci: avoid double enable/disable of aif
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 349: s390x: Fix reset ordering of passthrough ISM devices
RH-Jira: RHEL-22411
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [1/3] 450e4ca607d801bce93415994250374d70fb72f6

JIRA: https://issues.redhat.com/browse/RHEL-22411

commit 07b2c8e034d80ff92e202405c494d2ff80fcf848
Author: Matthew Rosato <mjrosato@linux.ibm.com>
Date:   Thu Jan 18 13:51:49 2024 -0500

    s390x/pci: avoid double enable/disable of aif

    Use a flag to keep track of whether AIF is currently enabled.  This can be
    used to avoid enabling/disabling AIF multiple times as well as to determine
    whether or not it should be disabled during reset processing.
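
    A minimal sketch of the flag-guarded pattern, assuming a hypothetical
    device structure and a stubbed backend call in place of the KVM ioctl
    (demo_dev, demo_backend_op and the demo_aif_* functions are
    illustrative names only):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; only the flag matters for this sketch. */
struct demo_dev {
    bool aif;          /* true while AIF is registered */
    int backend_calls; /* counts how often the backend was actually hit */
};

/* Stub standing in for the real ioctl; always succeeds here. */
static int demo_backend_op(struct demo_dev *dev)
{
    dev->backend_calls++;
    return 0;
}

static int demo_aif_enable(struct demo_dev *dev)
{
    int rc;

    if (dev->aif) {
        return -EINVAL;    /* already enabled: do not register twice */
    }

    rc = demo_backend_op(dev);
    if (rc == 0) {
        dev->aif = true;   /* only record success */
    }
    return rc;
}

static int demo_aif_disable(struct demo_dev *dev)
{
    int rc;

    if (!dev->aif) {
        return -EINVAL;    /* nothing to disable */
    }

    rc = demo_backend_op(dev);
    if (rc == 0) {
        dev->aif = false;
    }
    return rc;
}
```

    The flag is updated only when the backend call succeeds, so a failed
    enable or disable leaves the recorded state unchanged and can be
    retried.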

    Fixes: d0bc7091c2 ("s390x/pci: enable adapter event notification for interpreted devices")
    Reported-by: Cédric Le Goater <clg@redhat.com>
    Reviewed-by: Eric Farman <farman@linux.ibm.com>
    Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
    Message-ID: <20240118185151.265329-2-mjrosato@linux.ibm.com>
    Reviewed-by: Cédric Le Goater <clg@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/s390-pci-kvm.c         | 25 +++++++++++++++++++++++--
 include/hw/s390x/s390-pci-bus.h |  1 +
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/s390-pci-kvm.c b/hw/s390x/s390-pci-kvm.c
index ff41e4106d1..1ee510436cb 100644
--- a/hw/s390x/s390-pci-kvm.c
+++ b/hw/s390x/s390-pci-kvm.c
@@ -27,6 +27,7 @@ bool s390_pci_kvm_interp_allowed(void)
 
 int s390_pci_kvm_aif_enable(S390PCIBusDevice *pbdev, ZpciFib *fib, bool assist)
 {
+    int rc;
     struct kvm_s390_zpci_op args = {
         .fh = pbdev->fh,
         .op = KVM_S390_ZPCIOP_REG_AEN,
@@ -38,15 +39,35 @@ int s390_pci_kvm_aif_enable(S390PCIBusDevice *pbdev, ZpciFib *fib, bool assist)
         .u.reg_aen.flags = (assist) ? 0 : KVM_S390_ZPCIOP_REGAEN_HOST
     };
 
-    return kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, &args);
+    if (pbdev->aif) {
+        return -EINVAL;
+    }
+
+    rc = kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, &args);
+    if (rc == 0) {
+        pbdev->aif = true;
+    }
+
+    return rc;
 }
 
 int s390_pci_kvm_aif_disable(S390PCIBusDevice *pbdev)
 {
+    int rc;
+
     struct kvm_s390_zpci_op args = {
         .fh = pbdev->fh,
         .op = KVM_S390_ZPCIOP_DEREG_AEN
     };
 
-    return kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, &args);
+    if (!pbdev->aif) {
+        return -EINVAL;
+    }
+
+    rc = kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, &args);
+    if (rc == 0) {
+        pbdev->aif = false;
+    }
+
+    return rc;
 }
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index e0a9f9385be..7a658f5e30f 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -361,6 +361,7 @@ struct S390PCIBusDevice {
     bool unplug_requested;
     bool interp;
     bool forwarding_assist;
+    bool aif;
     QTAILQ_ENTRY(S390PCIBusDevice) link;
 };
 
-- 
Gitee


From 373def41f91731fad46710d072a16805ff34e843 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 Jan 2024 13:59:24 +0100
Subject: [PATCH 329/407] s390x/pci: refresh fh before disabling aif
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 349: s390x: Fix reset ordering of passthrough ISM devices
RH-Jira: RHEL-22411
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [2/3] 4a7d3fccdac508253bd7e5765973a08482022edb

JIRA: https://issues.redhat.com/browse/RHEL-22411

commit 30e35258e25c75c9d799c34fd89afcafffb37084
Author: Matthew Rosato <mjrosato@linux.ibm.com>
Date:   Thu Jan 18 13:51:50 2024 -0500

    s390x/pci: refresh fh before disabling aif

    Typically we refresh the host fh during CLP enable, however it's possible
    that the device goes through multiple reset events before the guest
    performs another CLP enable.  Let's handle this for now by refreshing the
    host handle from vfio before disabling aif.

    Fixes: 03451953c7 ("s390x/pci: reset ISM passthrough devices on shutdown and system reset")
    Reported-by: Cédric Le Goater <clg@redhat.com>
    Reviewed-by: Eric Farman <farman@linux.ibm.com>
    Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
    Message-ID: <20240118185151.265329-3-mjrosato@linux.ibm.com>
    Reviewed-by: Cédric Le Goater <clg@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/s390-pci-kvm.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/s390x/s390-pci-kvm.c b/hw/s390x/s390-pci-kvm.c
index 1ee510436cb..9eef4fc3ec7 100644
--- a/hw/s390x/s390-pci-kvm.c
+++ b/hw/s390x/s390-pci-kvm.c
@@ -18,6 +18,7 @@
 #include "hw/s390x/s390-pci-bus.h"
 #include "hw/s390x/s390-pci-kvm.h"
 #include "hw/s390x/s390-pci-inst.h"
+#include "hw/s390x/s390-pci-vfio.h"
 #include "cpu_models.h"
 
 bool s390_pci_kvm_interp_allowed(void)
@@ -64,6 +65,14 @@ int s390_pci_kvm_aif_disable(S390PCIBusDevice *pbdev)
         return -EINVAL;
     }
 
+    /*
+     * The device may have already been reset but we still want to relinquish
+     * the guest ISC, so always be sure to use an up-to-date host fh.
+     */
+    if (!s390_pci_get_host_fh(pbdev, &args.fh)) {
+        return -EPERM;
+    }
+
     rc = kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, &args);
     if (rc == 0) {
         pbdev->aif = false;
-- 
Gitee


From ed05d26ae9bf96d40fa095c9d8b8290f12081183 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@redhat.com>
Date: Tue, 23 Jan 2024 13:59:24 +0100
Subject: [PATCH 330/407] s390x/pci: drive ISM reset from subsystem reset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Cédric Le Goater <clg@redhat.com>
RH-MergeRequest: 349: s390x: Fix reset ordering of passthrough ISM devices
RH-Jira: RHEL-22411
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Commit: [3/3] 42e89595dd5e24538a2d3f075391b4534497eece

JIRA: https://issues.redhat.com/browse/RHEL-22411

commit 68c691ca99a2538d6a53a70ce8a9ce06ee307ff1
Author: Matthew Rosato <mjrosato@linux.ibm.com>
Date:   Thu Jan 18 13:51:51 2024 -0500

    s390x/pci: drive ISM reset from subsystem reset

    ISM devices are sensitive to manipulation of the IOMMU, so the ISM device
    needs to be reset before the vfio-pci device is reset (triggering a full
    UNMAP).  In order to ensure this occurs, trigger ISM device resets from
    subsystem_reset before triggering the PCI bus reset (which will also
    trigger vfio-pci reset).  This only needs to be done for ISM devices
    which were enabled for use by the guest.
    Further, ensure that AIF is disabled as part of the reset event.

    Fixes: ef1535901a ("s390x: do a subsystem reset before the unprotect on reboot")
    Fixes: 03451953c7 ("s390x/pci: reset ISM passthrough devices on shutdown and system reset")
    Reported-by: Cédric Le Goater <clg@redhat.com>
    Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
    Message-ID: <20240118185151.265329-4-mjrosato@linux.ibm.com>
    Reviewed-by: Eric Farman <farman@linux.ibm.com>
    Reviewed-by: Cédric Le Goater <clg@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/s390x/s390-pci-bus.c         | 26 +++++++++++++++++---------
 hw/s390x/s390-virtio-ccw.c      |  8 ++++++++
 include/hw/s390x/s390-pci-bus.h |  1 +
 3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 2d92848b0f0..a8953693b9b 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -160,20 +160,12 @@ static void s390_pci_shutdown_notifier(Notifier *n, void *opaque)
     pci_device_reset(pbdev->pdev);
 }
 
-static void s390_pci_reset_cb(void *opaque)
-{
-    S390PCIBusDevice *pbdev = opaque;
-
-    pci_device_reset(pbdev->pdev);
-}
-
 static void s390_pci_perform_unplug(S390PCIBusDevice *pbdev)
 {
     HotplugHandler *hotplug_ctrl;
 
     if (pbdev->pft == ZPCI_PFT_ISM) {
         notifier_remove(&pbdev->shutdown_notifier);
-        qemu_unregister_reset(s390_pci_reset_cb, pbdev);
     }
 
     /* Unplug the PCI device */
@@ -1137,7 +1129,6 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
             if (pbdev->pft == ZPCI_PFT_ISM) {
                 pbdev->shutdown_notifier.notify = s390_pci_shutdown_notifier;
                 qemu_register_shutdown_notifier(&pbdev->shutdown_notifier);
-                qemu_register_reset(s390_pci_reset_cb, pbdev);
             }
         } else {
             pbdev->fh |= FH_SHM_EMUL;
@@ -1284,6 +1275,23 @@ static void s390_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
     pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, s->bus_no, 1);
 }
 
+void s390_pci_ism_reset(void)
+{
+    S390pciState *s = s390_get_phb();
+
+    S390PCIBusDevice *pbdev, *next;
+
+    /* Trigger reset event for each passthrough ISM device currently in-use */
+    QTAILQ_FOREACH_SAFE(pbdev, &s->zpci_devs, link, next) {
+        if (pbdev->interp && pbdev->pft == ZPCI_PFT_ISM &&
+            pbdev->fh & FH_MASK_ENABLE) {
+            s390_pci_kvm_aif_disable(pbdev);
+
+            pci_device_reset(pbdev->pdev);
+        }
+    }
+}
+
 static void s390_pcihost_reset(DeviceState *dev)
 {
     S390pciState *s = S390_PCI_HOST_BRIDGE(dev);
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 94434c3bb19..51e5b398880 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -108,6 +108,14 @@ static void subsystem_reset(void)
     DeviceState *dev;
     int i;
 
+    /*
+     * ISM firmware is sensitive to unexpected changes to the IOMMU, which can
+     * occur during reset of the vfio-pci device (unmap of entire aperture).
+     * Ensure any passthrough ISM devices are reset now, while CPUs are paused
+     * but before vfio-pci cleanup occurs.
+     */
+    s390_pci_ism_reset();
+
     for (i = 0; i < ARRAY_SIZE(reset_dev_types); i++) {
         dev = DEVICE(object_resolve_path_type("", reset_dev_types[i], NULL));
         if (dev) {
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index 7a658f5e30f..2bfad5563a4 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -401,5 +401,6 @@ S390PCIBusDevice *s390_pci_find_dev_by_target(S390pciState *s,
                                               const char *target);
 S390PCIBusDevice *s390_pci_find_next_avail_dev(S390pciState *s,
                                                S390PCIBusDevice *pbdev);
+void s390_pci_ism_reset(void);
 
 #endif
-- 
Gitee


From 8977f66a2ac682424cb8e704919326393f80e099 Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Thu, 18 Jan 2024 09:48:21 -0500
Subject: [PATCH 331/407] iotests: add filter_qmp_generated_node_ids()

RH-Author: Stefan Hajnoczi <stefanha@redhat.com>
RH-MergeRequest: 352: monitor: only run coroutine commands in qemu_aio_context
RH-Jira: RHEL-7353
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Hanna Czenczek <hreitz@redhat.com>
RH-Commit: [1/4] cc276c8ef9e140203afc19fcd8b5b8e20577054d

Add a filter function for QMP responses that contain QEMU's
automatically generated node ids. The ids change between runs and must
be masked in the reference output.

The next commit will use this new function.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240118144823.1497953-2-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit da62b507a20510d819bcfbe8f5e573409b954006)
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/iotests.py | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 2ef493755c6..fd41f93421d 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -521,6 +521,13 @@ def _filter(_key, value):
 def filter_generated_node_ids(msg):
     return re.sub("#block[0-9]+", "NODE_NAME", msg)
 
+def filter_qmp_generated_node_ids(qmsg):
+    def _filter(_key, value):
+        if is_str(value):
+            return filter_generated_node_ids(value)
+        return value
+    return filter_qmp(qmsg, _filter)
+
 def filter_img_info(output, filename):
     lines = []
     for line in output.split('\n'):
-- 
Gitee


From 33211239628ea57d90bae274a0b48a1f38ed4a9b Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Thu, 18 Jan 2024 09:48:22 -0500
Subject: [PATCH 332/407] iotests: port 141 to Python for reliable QMP testing

RH-Author: Stefan Hajnoczi <stefanha@redhat.com>
RH-MergeRequest: 352: monitor: only run coroutine commands in qemu_aio_context
RH-Jira: RHEL-7353
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Hanna Czenczek <hreitz@redhat.com>
RH-Commit: [2/4] ff0899262544b1b61b4c7de2eb798b664fe5202e

The common.qemu bash functions allow tests to interact with the QMP
monitor of a QEMU process. I spent two days trying to update 141 when
the order of the test output changed, but found it would still fail
occasionally because printf() and QMP events race with synchronous QMP
communication.

I gave up and ported 141 to the existing Python API for QMP tests. The
Python API is less affected by the order in which QEMU prints output
because it does not print all QMP traffic by default.
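
A minimal sketch of why this helps, assuming a client that buffers
events while it waits for a command response. SketchClient and run()
are hypothetical and are not the iotests or qemu.qmp API; they only
model the idea that events are logged when the test asks for them, not
when they happen to arrive:

```python
from collections import deque


class SketchClient:
    def __init__(self, wire):
        self.wire = deque(wire)  # messages in the order QEMU sent them
        self.events = []         # events seen while waiting for a response

    def command(self):
        # Return the next command response, buffering events seen before it.
        while self.wire:
            msg = self.wire.popleft()
            if "event" in msg:
                self.events.append(msg)
            else:
                return msg
        raise RuntimeError("no response on the wire")

    def event_wait(self, name):
        # Check buffered events first, then keep reading from the wire.
        for i, ev in enumerate(self.events):
            if ev["event"] == name:
                return self.events.pop(i)
        while self.wire:
            msg = self.wire.popleft()
            if "event" not in msg:
                raise RuntimeError("unexpected response while waiting")
            if msg["event"] == name:
                return msg
            self.events.append(msg)
        return None


def run(wire):
    # The log order is fixed by the test script, not by arrival order.
    client = SketchClient(wire)
    return [client.command(), client.event_wait("BLOCK_JOB_READY")]


response = {"return": {}}
ready = {"event": "BLOCK_JOB_READY"}
print(run([response, ready]) == run([ready, response]))
```

Both arrival orders produce the same log because the event is only
logged at the point where the test explicitly waits for it.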

The next commit changes the order in which QMP messages are received.
Make 141 reliable first.

Cc: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240118144823.1497953-3-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 9ee2dd4c22a3639c5462b3fc20df60c005c3de64)
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Conflicts:
  tests/qemu-iotests/141
  tests/qemu-iotests/141.out

  This commit replaces these files anyway, so apply our changes instead
  of dragging in more dependencies to resolve context conflicts.
---
 tests/qemu-iotests/141     | 307 ++++++++++++++++---------------------
 tests/qemu-iotests/141.out | 204 ++++++------------------
 2 files changed, 178 insertions(+), 333 deletions(-)

diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141
index 115cc1691e5..a7d3985a02e 100755
--- a/tests/qemu-iotests/141
+++ b/tests/qemu-iotests/141
@@ -1,9 +1,12 @@
-#!/usr/bin/env bash
+#!/usr/bin/env python3
 # group: rw auto quick
 #
 # Test case for ejecting BDSs with block jobs still running on them
 #
-# Copyright (C) 2016 Red Hat, Inc.
+# Originally written in bash by Hanna Czenczek, ported to Python by Stefan
+# Hajnoczi.
+#
+# Copyright Red Hat
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -19,177 +22,129 @@
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.
 #
 
-# creator
-owner=mreitz@redhat.com
-
-seq="$(basename $0)"
-echo "QA output created by $seq"
-
-status=1	# failure is the default!
-
-_cleanup()
-{
-    _cleanup_qemu
-    _cleanup_test_img
-    for img in "$TEST_DIR"/{b,m,o}.$IMGFMT; do
-        _rm_test_img "$img"
-    done
-}
-trap "_cleanup; exit \$status" 0 1 2 3 15
-
-# get standard environment, filters and checks
-. ./common.rc
-. ./common.filter
-. ./common.qemu
-
-# Needs backing file and backing format support
-_supported_fmt qcow2 qed
-_supported_proto file
-_supported_os Linux
-
-
-test_blockjob()
-{
-    _send_qemu_cmd $QEMU_HANDLE \
-        "{'execute': 'blockdev-add',
-          'arguments': {
-              'node-name': 'drv0',
-              'driver': '$IMGFMT',
-              'file': {
-                  'driver': 'file',
-                  'filename': '$TEST_IMG'
-              }}}" \
-        'return'
-
-    # If "$2" is an event, we may or may not see it before the
-    # {"return": {}}.  Therefore, filter the {"return": {}} out both
-    # here and in the next command.  (Naturally, if we do not see it
-    # here, we will see it before the next command can be executed,
-    # so it will appear in the next _send_qemu_cmd's output.)
-    _send_qemu_cmd $QEMU_HANDLE \
-        "$1" \
-        "$2" \
-        | _filter_img_create | _filter_qmp_empty_return
-
-    # We want this to return an error because the block job is still running
-    _send_qemu_cmd $QEMU_HANDLE \
-        "{'execute': 'blockdev-del',
-          'arguments': {'node-name': 'drv0'}}" \
-        'error' | _filter_generated_node_ids | _filter_qmp_empty_return
-
-    _send_qemu_cmd $QEMU_HANDLE \
-        "{'execute': 'block-job-cancel',
-          'arguments': {'device': 'job0'}}" \
-        "$3"
-
-    _send_qemu_cmd $QEMU_HANDLE \
-        "{'execute': 'blockdev-del',
-          'arguments': {'node-name': 'drv0'}}" \
-        'return'
-}
-
-
-TEST_IMG="$TEST_DIR/b.$IMGFMT" _make_test_img 1M
-TEST_IMG="$TEST_DIR/m.$IMGFMT" _make_test_img -b "$TEST_DIR/b.$IMGFMT" -F $IMGFMT 1M
-_make_test_img -b "$TEST_DIR/m.$IMGFMT" 1M -F $IMGFMT
-
-_launch_qemu -nodefaults
-
-_send_qemu_cmd $QEMU_HANDLE \
-    "{'execute': 'qmp_capabilities'}" \
-    'return'
-
-echo
-echo '=== Testing drive-backup ==='
-echo
-
-# drive-backup will not send BLOCK_JOB_READY by itself, and cancelling the job
-# will consequently result in BLOCK_JOB_CANCELLED being emitted.
-
-test_blockjob \
-    "{'execute': 'drive-backup',
-      'arguments': {'job-id': 'job0',
-                    'device': 'drv0',
-                    'target': '$TEST_DIR/o.$IMGFMT',
-                    'format': '$IMGFMT',
-                    'sync': 'none'}}" \
-    'return' \
-    '"status": "null"'
-
-echo
-echo '=== Testing drive-mirror ==='
-echo
-
-# drive-mirror will send BLOCK_JOB_READY basically immediately, and cancelling
-# the job will consequently result in BLOCK_JOB_COMPLETED being emitted.
-
-test_blockjob \
-    "{'execute': 'drive-mirror',
-      'arguments': {'job-id': 'job0',
-                    'device': 'drv0',
-                    'target': '$TEST_DIR/o.$IMGFMT',
-                    'format': '$IMGFMT',
-                    'sync': 'none'}}" \
-    'BLOCK_JOB_READY' \
-    '"status": "null"'
-
-echo
-echo '=== Testing active block-commit ==='
-echo
-
-# An active block-commit will send BLOCK_JOB_READY basically immediately, and
-# cancelling the job will consequently result in BLOCK_JOB_COMPLETED being
-# emitted.
-
-test_blockjob \
-    "{'execute': 'block-commit',
-      'arguments': {'job-id': 'job0', 'device': 'drv0'}}" \
-    'BLOCK_JOB_READY' \
-    '"status": "null"'
-
-echo
-echo '=== Testing non-active block-commit ==='
-echo
-
-# Give block-commit something to work on, otherwise it would be done
-# immediately, send a BLOCK_JOB_COMPLETED and ejecting the BDS would work just
-# fine without the block job still running.
-
-$QEMU_IO -c 'write 0 1M' "$TEST_DIR/m.$IMGFMT" | _filter_qemu_io
-
-test_blockjob \
-    "{'execute': 'block-commit',
-      'arguments': {'job-id': 'job0',
-                    'device': 'drv0',
-                    'top':    '$TEST_DIR/m.$IMGFMT',
-                    'speed':  1}}" \
-    'return' \
-    '"status": "null"'
-
-echo
-echo '=== Testing block-stream ==='
-echo
-
-# Give block-stream something to work on, otherwise it would be done
-# immediately, send a BLOCK_JOB_COMPLETED and ejecting the BDS would work just
-# fine without the block job still running.
-
-$QEMU_IO -c 'write 0 1M' "$TEST_DIR/b.$IMGFMT" | _filter_qemu_io
-
-# With some data to stream (and @speed set to 1), block-stream will not complete
-# until we send the block-job-cancel command.
-
-test_blockjob \
-    "{'execute': 'block-stream',
-      'arguments': {'job-id': 'job0',
-                    'device': 'drv0',
-                    'speed': 1}}" \
-    'return' \
-    '"status": "null"'
-
-_cleanup_qemu
-
-# success, all done
-echo "*** done"
-rm -f $seq.full
-status=0
+import iotests
+
+# Common filters to mask values that vary in the test output
+QMP_FILTERS = [iotests.filter_qmp_testfiles, \
+               iotests.filter_qmp_imgfmt]
+
+
+class TestCase:
+    def __init__(self, name, vm, image_path, cancel_event):
+        self.name = name
+        self.vm = vm
+        self.image_path = image_path
+        self.cancel_event = cancel_event
+
+    def __enter__(self):
+        iotests.log(f'=== Testing {self.name} ===')
+        self.vm.qmp_log('blockdev-add', \
+                        node_name='drv0', \
+                        driver=iotests.imgfmt, \
+                        file={'driver': 'file', 'filename': self.image_path}, \
+                        filters=QMP_FILTERS)
+
+    def __exit__(self, *exc_details):
+        # This is expected to fail because the job still exists
+        self.vm.qmp_log('blockdev-del', node_name='drv0', \
+                        filters=[iotests.filter_qmp_generated_node_ids])
+
+        self.vm.qmp_log('block-job-cancel', device='job0')
+        event = self.vm.event_wait(self.cancel_event)
+        iotests.log(event, filters=[iotests.filter_qmp_event])
+
+        # This time it succeeds
+        self.vm.qmp_log('blockdev-del', node_name='drv0')
+
+        # Separate test cases in output
+        iotests.log('')
+
+
+def main() -> None:
+    with iotests.FilePath('bottom', 'middle', 'top', 'target') as \
+            (bottom_path, middle_path, top_path, target_path), \
+         iotests.VM() as vm:
+
+        iotests.log('Creating bottom <- middle <- top backing file chain...')
+        IMAGE_SIZE='1M'
+        iotests.qemu_img_create('-f', iotests.imgfmt, bottom_path, IMAGE_SIZE)
+        iotests.qemu_img_create('-f', iotests.imgfmt, \
+                                '-F', iotests.imgfmt, \
+                                '-b', bottom_path, \
+                                middle_path, \
+                                IMAGE_SIZE)
+        iotests.qemu_img_create('-f', iotests.imgfmt, \
+                                '-F', iotests.imgfmt, \
+                                '-b', middle_path, \
+                                top_path, \
+                                IMAGE_SIZE)
+
+        iotests.log('Starting VM...')
+        vm.add_args('-nodefaults')
+        vm.launch()
+
+        # drive-backup will not send BLOCK_JOB_READY by itself, and cancelling
+        # the job will consequently result in BLOCK_JOB_CANCELLED being
+        # emitted.
+        with TestCase('drive-backup', vm, top_path, 'BLOCK_JOB_CANCELLED'):
+            vm.qmp_log('drive-backup', \
+                       job_id='job0', \
+                       device='drv0', \
+                       target=target_path, \
+                       format=iotests.imgfmt, \
+                       sync='none', \
+                       filters=QMP_FILTERS)
+
+        # drive-mirror will send BLOCK_JOB_READY basically immediately, and
+        # cancelling the job will consequently result in BLOCK_JOB_COMPLETED
+        # being emitted.
+        with TestCase('drive-mirror', vm, top_path, 'BLOCK_JOB_COMPLETED'):
+            vm.qmp_log('drive-mirror', \
+                       job_id='job0', \
+                       device='drv0', \
+                       target=target_path, \
+                       format=iotests.imgfmt, \
+                       sync='none', \
+                       filters=QMP_FILTERS)
+            event = vm.event_wait('BLOCK_JOB_READY')
+            assert event is not None # silence mypy
+            iotests.log(event, filters=[iotests.filter_qmp_event])
+
+        # An active block-commit will send BLOCK_JOB_READY basically
+        # immediately, and cancelling the job will consequently result in
+        # BLOCK_JOB_COMPLETED being emitted.
+        with TestCase('active block-commit', vm, top_path, \
+                      'BLOCK_JOB_COMPLETED'):
+            vm.qmp_log('block-commit', \
+                       job_id='job0', \
+                       device='drv0')
+            event = vm.event_wait('BLOCK_JOB_READY')
+            assert event is not None # silence mypy
+            iotests.log(event, filters=[iotests.filter_qmp_event])
+
+        # Give block-commit something to work on, otherwise it would be done
+        # immediately, send a BLOCK_JOB_COMPLETED and ejecting the BDS would
+        # work just fine without the block job still running.
+        iotests.qemu_io(middle_path, '-c', f'write 0 {IMAGE_SIZE}')
+        with TestCase('non-active block-commit', vm, top_path, \
+                      'BLOCK_JOB_CANCELLED'):
+            vm.qmp_log('block-commit', \
+                       job_id='job0', \
+                       device='drv0', \
+                       top=middle_path, \
+                       speed=1, \
+                       filters=[iotests.filter_qmp_testfiles])
+
+        # Give block-stream something to work on, otherwise it would be done
+        # immediately, send a BLOCK_JOB_COMPLETED and ejecting the BDS would
+        # work just fine without the block job still running.
+        iotests.qemu_io(bottom_path, '-c', f'write 0 {IMAGE_SIZE}')
+        with TestCase('block-stream', vm, top_path, 'BLOCK_JOB_CANCELLED'):
+            vm.qmp_log('block-stream', \
+                       job_id='job0', \
+                       device='drv0', \
+                       speed=1)
+
+if __name__ == '__main__':
+    iotests.script_main(main, supported_fmts=['qcow2', 'qed'],
+                        supported_protocols=['file'])
diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out
index c4c15fb275f..91b7ba50af7 100644
--- a/tests/qemu-iotests/141.out
+++ b/tests/qemu-iotests/141.out
@@ -1,179 +1,69 @@
-QA output created by 141
-Formatting 'TEST_DIR/b.IMGFMT', fmt=IMGFMT size=1048576
-Formatting 'TEST_DIR/m.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/b.IMGFMT backing_fmt=IMGFMT
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/m.IMGFMT backing_fmt=IMGFMT
-{'execute': 'qmp_capabilities'}
-{"return": {}}
-
+Creating bottom <- middle <- top backing file chain...
+Starting VM...
 === Testing drive-backup ===
-
-{'execute': 'blockdev-add',
-          'arguments': {
-              'node-name': 'drv0',
-              'driver': 'IMGFMT',
-              'file': {
-                  'driver': 'file',
-                  'filename': 'TEST_DIR/t.IMGFMT'
-              }}}
-{"return": {}}
-{'execute': 'drive-backup',
-'arguments': {'job-id': 'job0',
-'device': 'drv0',
-'target': 'TEST_DIR/o.IMGFMT',
-'format': 'IMGFMT',
-'sync': 'none'}}
-Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
-{'execute': 'blockdev-del',
-          'arguments': {'node-name': 'drv0'}}
+{"execute": "blockdev-add", "arguments": {"driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-top"}, "node-name": "drv0"}}
+{"return": {}}
+{"execute": "drive-backup", "arguments": {"device": "drv0", "format": "IMGFMT", "job-id": "job0", "sync": "none", "target": "TEST_DIR/PID-target"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments": {"node-name": "drv0"}}
 {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: node is used as backing hd of 'NODE_NAME'"}}
-{'execute': 'block-job-cancel',
-          'arguments': {'device': 'job0'}}
+{"execute": "block-job-cancel", "arguments": {"device": "job0"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "job0", "len": 1048576, "offset": 0, "speed": 0, "type": "backup"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job0"}}
-{'execute': 'blockdev-del',
-          'arguments': {'node-name': 'drv0'}}
+{"data": {"device": "job0", "len": 1048576, "offset": 0, "speed": 0, "type": "backup"}, "event": "BLOCK_JOB_CANCELLED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "blockdev-del", "arguments": {"node-name": "drv0"}}
 {"return": {}}
 
 === Testing drive-mirror ===
-
-{'execute': 'blockdev-add',
-          'arguments': {
-              'node-name': 'drv0',
-              'driver': 'IMGFMT',
-              'file': {
-                  'driver': 'file',
-                  'filename': 'TEST_DIR/t.IMGFMT'
-              }}}
-{"return": {}}
-{'execute': 'drive-mirror',
-'arguments': {'job-id': 'job0',
-'device': 'drv0',
-'target': 'TEST_DIR/o.IMGFMT',
-'format': 'IMGFMT',
-'sync': 'none'}}
-Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}}
-{'execute': 'blockdev-del',
-          'arguments': {'node-name': 'drv0'}}
+{"execute": "blockdev-add", "arguments": {"driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-top"}, "node-name": "drv0"}}
+{"return": {}}
+{"execute": "drive-mirror", "arguments": {"device": "drv0", "format": "IMGFMT", "job-id": "job0", "sync": "none", "target": "TEST_DIR/PID-target"}}
+{"return": {}}
+{"data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "blockdev-del", "arguments": {"node-name": "drv0"}}
 {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: mirror"}}
-{'execute': 'block-job-cancel',
-          'arguments': {'device': 'job0'}}
+{"execute": "block-job-cancel", "arguments": {"device": "job0"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job0"}}
-{'execute': 'blockdev-del',
-          'arguments': {'node-name': 'drv0'}}
+{"data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "blockdev-del", "arguments": {"node-name": "drv0"}}
 {"return": {}}
 
 === Testing active block-commit ===
-
-{'execute': 'blockdev-add',
-          'arguments': {
-              'node-name': 'drv0',
-              'driver': 'IMGFMT',
-              'file': {
-                  'driver': 'file',
-                  'filename': 'TEST_DIR/t.IMGFMT'
-              }}}
-{"return": {}}
-{'execute': 'block-commit',
-'arguments': {'job-id': 'job0', 'device': 'drv0'}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
-{'execute': 'blockdev-del',
-          'arguments': {'node-name': 'drv0'}}
+{"execute": "blockdev-add", "arguments": {"driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-top"}, "node-name": "drv0"}}
+{"return": {}}
+{"execute": "block-commit", "arguments": {"device": "drv0", "job-id": "job0"}}
+{"return": {}}
+{"data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "blockdev-del", "arguments": {"node-name": "drv0"}}
 {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: commit"}}
-{'execute': 'block-job-cancel',
-          'arguments': {'device': 'job0'}}
+{"execute": "block-job-cancel", "arguments": {"device": "job0"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job0"}}
-{'execute': 'blockdev-del',
-          'arguments': {'node-name': 'drv0'}}
+{"data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "blockdev-del", "arguments": {"node-name": "drv0"}}
 {"return": {}}
 
 === Testing non-active block-commit ===
-
-wrote 1048576/1048576 bytes at offset 0
-1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-{'execute': 'blockdev-add',
-          'arguments': {
-              'node-name': 'drv0',
-              'driver': 'IMGFMT',
-              'file': {
-                  'driver': 'file',
-                  'filename': 'TEST_DIR/t.IMGFMT'
-              }}}
-{"return": {}}
-{'execute': 'block-commit',
-'arguments': {'job-id': 'job0',
-'device': 'drv0',
-'top':    'TEST_DIR/m.IMGFMT',
-'speed':  1}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
-{'execute': 'blockdev-del',
-          'arguments': {'node-name': 'drv0'}}
-{"error": {"class": "GenericError", "desc": "Node drv0 is in use"}}
-{'execute': 'block-job-cancel',
-          'arguments': {'device': 'job0'}}
-{"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "job0", "len": 1048576, "offset": 524288, "speed": 1, "type": "commit"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job0"}}
-{'execute': 'blockdev-del',
-          'arguments': {'node-name': 'drv0'}}
+{"execute": "blockdev-add", "arguments": {"driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-top"}, "node-name": "drv0"}}
+{"return": {}}
+{"execute": "block-commit", "arguments": {"device": "drv0", "job-id": "job0", "speed": 1, "top": "TEST_DIR/PID-middle"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments": {"node-name": "drv0"}}
+{"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: commit"}}
+{"execute": "block-job-cancel", "arguments": {"device": "job0"}}
+{"return": {}}
+{"data": {"device": "job0", "len": 1048576, "offset": 524288, "speed": 1, "type": "commit"}, "event": "BLOCK_JOB_CANCELLED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "blockdev-del", "arguments": {"node-name": "drv0"}}
 {"return": {}}
 
 === Testing block-stream ===
-
-wrote 1048576/1048576 bytes at offset 0
-1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-{'execute': 'blockdev-add',
-          'arguments': {
-              'node-name': 'drv0',
-              'driver': 'IMGFMT',
-              'file': {
-                  'driver': 'file',
-                  'filename': 'TEST_DIR/t.IMGFMT'
-              }}}
-{"return": {}}
-{'execute': 'block-stream',
-'arguments': {'job-id': 'job0',
-'device': 'drv0',
-'speed': 1}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
-{'execute': 'blockdev-del',
-          'arguments': {'node-name': 'drv0'}}
+{"execute": "blockdev-add", "arguments": {"driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-top"}, "node-name": "drv0"}}
+{"return": {}}
+{"execute": "block-stream", "arguments": {"device": "drv0", "job-id": "job0", "speed": 1}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments": {"node-name": "drv0"}}
 {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: stream"}}
-{'execute': 'block-job-cancel',
-          'arguments': {'device': 'job0'}}
+{"execute": "block-job-cancel", "arguments": {"device": "job0"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "job0", "len": 1048576, "offset": 524288, "speed": 1, "type": "stream"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job0"}}
-{'execute': 'blockdev-del',
-          'arguments': {'node-name': 'drv0'}}
+{"data": {"device": "job0", "len": 1048576, "offset": 524288, "speed": 1, "type": "stream"}, "event": "BLOCK_JOB_CANCELLED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "blockdev-del", "arguments": {"node-name": "drv0"}}
 {"return": {}}
-*** done
+
-- 
Gitee


From d105308dc2b7a29404f581dc1838f44d955b95d2 Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Thu, 18 Jan 2024 09:48:23 -0500
Subject: [PATCH 333/407] monitor: only run coroutine commands in
 qemu_aio_context

RH-Author: Stefan Hajnoczi <stefanha@redhat.com>
RH-MergeRequest: 352: monitor: only run coroutine commands in qemu_aio_context
RH-Jira: RHEL-7353
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Hanna Czenczek <hreitz@redhat.com>
RH-Commit: [3/4] c79f7b4b6a677beb838fc428871e003ed8ee4e53

monitor_qmp_dispatcher_co() runs in the iohandler AioContext that is not
polled during nested event loops. The coroutine currently reschedules
itself in the main loop's qemu_aio_context AioContext, which is polled
during nested event loops. One known problem is that QMP device-add
calls drain_call_rcu(), which temporarily drops the BQL, leading to all
sorts of havoc like other vCPU threads re-entering device emulation code
while another vCPU thread is waiting in device emulation code with
aio_poll().

Paolo Bonzini suggested running non-coroutine QMP handlers in the
iohandler AioContext. This avoids trouble with nested event loops. His
original idea was to move coroutine rescheduling to
monitor_qmp_dispatch(), but I resorted to moving it to qmp_dispatch()
because we don't know if the QMP handler needs to run in coroutine
context in monitor_qmp_dispatch(). monitor_qmp_dispatch() would have
been nicer since it's associated with the monitor implementation and not
as general as qmp_dispatch(), which is also used by qemu-ga.

A number of qemu-iotests need updated .out files because the order of
QMP events vs QMP responses has changed.

Solves Issue #1933.

Cc: qemu-stable@nongnu.org
Fixes: 7bed89958bfbf40df9ca681cefbdca63abdde39d ("device_core: use drain_call_rcu in in qmp_device_add")
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
Buglink: https://issues.redhat.com/browse/RHEL-17369
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240118144823.1497953-4-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit effd60c878176bcaf97fa7ce2b12d04bb8ead6f7)
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Conflicts:
	tests/qemu-iotests/185
	tests/qemu-iotests/308.out

	These tests are different downstream. Shorten the upstream
	changes to only cover portions that exist downstream.

	tests/qemu-iotests/tests/file-io-error
	tests/qemu-iotests/tests/iothreads-resize.out

	These tests don't exist downstream. Ignore them.
---
 monitor/qmp.c                         | 17 ------
 qapi/qmp-dispatch.c                   | 24 ++++++++-
 tests/qemu-iotests/060.out            |  4 +-
 tests/qemu-iotests/071.out            |  4 +-
 tests/qemu-iotests/081.out            | 16 +++---
 tests/qemu-iotests/087.out            | 12 ++---
 tests/qemu-iotests/108.out            |  2 +-
 tests/qemu-iotests/109                |  4 +-
 tests/qemu-iotests/109.out            | 78 ++++++++++++---------------
 tests/qemu-iotests/117.out            |  2 +-
 tests/qemu-iotests/120.out            |  2 +-
 tests/qemu-iotests/127.out            |  2 +-
 tests/qemu-iotests/140.out            |  2 +-
 tests/qemu-iotests/143.out            |  2 +-
 tests/qemu-iotests/156.out            |  2 +-
 tests/qemu-iotests/176.out            | 16 +++---
 tests/qemu-iotests/182.out            |  2 +-
 tests/qemu-iotests/183.out            |  4 +-
 tests/qemu-iotests/184.out            | 32 +++++------
 tests/qemu-iotests/185.out            | 45 ++++++++++++++--
 tests/qemu-iotests/191.out            | 16 +++---
 tests/qemu-iotests/195.out            | 16 +++---
 tests/qemu-iotests/223.out            | 12 ++---
 tests/qemu-iotests/227.out            | 32 +++++------
 tests/qemu-iotests/247.out            |  2 +-
 tests/qemu-iotests/273.out            |  8 +--
 tests/qemu-iotests/308                |  4 +-
 tests/qemu-iotests/308.out            |  2 +-
 tests/qemu-iotests/tests/qsd-jobs.out |  4 +-
 29 files changed, 198 insertions(+), 170 deletions(-)

diff --git a/monitor/qmp.c b/monitor/qmp.c
index 092c527b6fc..acd0a350c24 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -296,14 +296,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
             qemu_coroutine_yield();
         }
 
-        /*
-         * Move the coroutine from iohandler_ctx to qemu_aio_context for
-         * executing the command handler so that it can make progress if it
-         * involves an AIO_WAIT_WHILE().
-         */
-        aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
-        qemu_coroutine_yield();
-
         /* Process request */
         if (req_obj->req) {
             if (trace_event_get_state(TRACE_MONITOR_QMP_CMD_IN_BAND)) {
@@ -330,15 +322,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
         }
 
         qmp_request_free(req_obj);
-
-        /*
-         * Yield and reschedule so the main loop stays responsive.
-         *
-         * Move back to iohandler_ctx so that nested event loops for
-         * qemu_aio_context don't start new monitor commands.
-         */
-        aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
-        qemu_coroutine_yield();
     }
 }
 
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index d378bccac73..114e2934766 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -207,9 +207,31 @@ QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
     assert(!(oob && qemu_in_coroutine()));
     assert(monitor_cur() == NULL);
     if (!!(cmd->options & QCO_COROUTINE) == qemu_in_coroutine()) {
+        if (qemu_in_coroutine()) {
+            /*
+             * Move the coroutine from iohandler_ctx to qemu_aio_context for
+             * executing the command handler so that it can make progress if it
+             * involves an AIO_WAIT_WHILE().
+             */
+            aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
+            qemu_coroutine_yield();
+        }
+
         monitor_set_cur(qemu_coroutine_self(), cur_mon);
         cmd->fn(args, &ret, &err);
         monitor_set_cur(qemu_coroutine_self(), NULL);
+
+        if (qemu_in_coroutine()) {
+            /*
+             * Yield and reschedule so the main loop stays responsive.
+             *
+             * Move back to iohandler_ctx so that nested event loops for
+             * qemu_aio_context don't start new monitor commands.
+             */
+            aio_co_schedule(iohandler_get_aio_context(),
+                            qemu_coroutine_self());
+            qemu_coroutine_yield();
+        }
     } else {
        /*
         * Actual context doesn't match the one the command needs.
@@ -233,7 +255,7 @@ QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
             .errp       = &err,
             .co         = qemu_coroutine_self(),
         };
-        aio_bh_schedule_oneshot(qemu_get_aio_context(), do_qmp_dispatch_bh,
+        aio_bh_schedule_oneshot(iohandler_get_aio_context(), do_qmp_dispatch_bh,
                                 &data);
         qemu_coroutine_yield();
     }
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index b74540bafbe..9c5fa875cf8 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -421,8 +421,8 @@ QMP_VERSION
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_IMAGE_CORRUPTED", "data": {"device": "none0", "msg": "Preventing invalid write on metadata (overlaps with refcount table)", "offset": 65536, "node-name": "drive", "fatal": true, "size": 65536}}
 write failed: Input/output error
 {"return": ""}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 === Testing incoming inactive corrupted image ===
 
@@ -432,8 +432,8 @@ QMP_VERSION
 qcow2: Image is corrupt: L2 table offset 0x2a2a2a00 unaligned (L1 index: 0); further non-fatal corruption events will be suppressed
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_IMAGE_CORRUPTED", "data": {"device": "", "msg": "L2 table offset 0x2a2a2a00 unaligned (L1 index: 0)", "node-name": "drive", "fatal": false}}
 {"return": ""}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
     corrupt: false
 *** done
diff --git a/tests/qemu-iotests/071.out b/tests/qemu-iotests/071.out
index bca0c02f5c0..a2923b05c29 100644
--- a/tests/qemu-iotests/071.out
+++ b/tests/qemu-iotests/071.out
@@ -45,8 +45,8 @@ QMP_VERSION
 {"return": {}}
 read failed: Input/output error
 {"return": ""}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 
 === Testing blkverify on existing block device ===
@@ -84,9 +84,9 @@ wrote 512/512 bytes at offset 0
 {"return": ""}
 read failed: Input/output error
 {"return": ""}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 QEMU_PROG: Failed to flush the L2 table cache: Input/output error
 QEMU_PROG: Failed to flush the refcount block cache: Input/output error
+{"return": {}}
 
 *** done
diff --git a/tests/qemu-iotests/081.out b/tests/qemu-iotests/081.out
index 615c0835498..aba85ea5644 100644
--- a/tests/qemu-iotests/081.out
+++ b/tests/qemu-iotests/081.out
@@ -35,8 +35,8 @@ QMP_VERSION
 read 10485760/10485760 bytes at offset 0
 10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 {"return": ""}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 
 == using quorum rewrite corrupted mode ==
@@ -67,8 +67,8 @@ QMP_VERSION
 read 10485760/10485760 bytes at offset 0
 10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 {"return": ""}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 -- checking that the image has been corrected --
 read 10485760/10485760 bytes at offset 0
@@ -106,8 +106,8 @@ QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"return": {}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 Testing:
 QMP_VERSION
@@ -115,8 +115,8 @@ QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"error": {"class": "GenericError", "desc": "Cannot add a child to a quorum in blkverify mode"}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 
 == dynamically removing a child from a quorum ==
@@ -125,31 +125,31 @@ QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"return": {}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 Testing:
 QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"error": {"class": "GenericError", "desc": "The number of children cannot be lower than the vote threshold 2"}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 Testing:
 QMP_VERSION
 {"return": {}}
 {"error": {"class": "GenericError", "desc": "blkverify=on can only be set if there are exactly two files and vote-threshold is 2"}}
 {"error": {"class": "GenericError", "desc": "Cannot find device='drive0-quorum' nor node-name='drive0-quorum'"}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 Testing:
 QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"error": {"class": "GenericError", "desc": "The number of children cannot be lower than the vote threshold 2"}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 *** done
diff --git a/tests/qemu-iotests/087.out b/tests/qemu-iotests/087.out
index e1c23a69836..97b6d8036dc 100644
--- a/tests/qemu-iotests/087.out
+++ b/tests/qemu-iotests/087.out
@@ -7,8 +7,8 @@ Testing:
 QMP_VERSION
 {"return": {}}
 {"error": {"class": "GenericError", "desc": "'node-name' must be specified for the root node"}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 
 === Duplicate ID ===
@@ -18,8 +18,8 @@ QMP_VERSION
 {"return": {}}
 {"error": {"class": "GenericError", "desc": "node-name=disk is conflicting with a device id"}}
 {"error": {"class": "GenericError", "desc": "Duplicate nodes with node-name='test-node'"}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 
 === aio=native without O_DIRECT ===
@@ -28,8 +28,8 @@ Testing:
 QMP_VERSION
 {"return": {}}
 {"error": {"class": "GenericError", "desc": "aio=native was specified, but it requires cache.direct=on, which was not specified."}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 
 === Encrypted image QCow ===
@@ -40,8 +40,8 @@ QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"error": {"class": "GenericError", "desc": "Use of AES-CBC encrypted IMGFMT images is no longer supported in system emulators"}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 
 === Encrypted image LUKS ===
@@ -52,8 +52,8 @@ QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"return": {}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 
 === Missing driver ===
@@ -63,7 +63,7 @@ Testing: -S
 QMP_VERSION
 {"return": {}}
 {"error": {"class": "GenericError", "desc": "Parameter 'driver' is missing"}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 *** done
diff --git a/tests/qemu-iotests/108.out b/tests/qemu-iotests/108.out
index b5401d788dc..b9c876b394a 100644
--- a/tests/qemu-iotests/108.out
+++ b/tests/qemu-iotests/108.out
@@ -173,8 +173,8 @@ OK: Reftable is where we expect it
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "create"}}
 {"return": {}}
 { "execute": "quit" }
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/qemu-iotests/109 b/tests/qemu-iotests/109
index e207a555f37..0fb580f9a59 100755
--- a/tests/qemu-iotests/109
+++ b/tests/qemu-iotests/109
@@ -57,13 +57,13 @@ run_qemu()
     _launch_qemu -drive file="${source_img}",format=raw,cache=${CACHEMODE},aio=${AIOMODE},id=src
     _send_qemu_cmd $QEMU_HANDLE "{ 'execute': 'qmp_capabilities' }" "return"
 
-    _send_qemu_cmd $QEMU_HANDLE \
+    capture_events="$qmp_event" _send_qemu_cmd $QEMU_HANDLE \
         "{'execute':'drive-mirror', 'arguments':{
             'device': 'src', 'target': '$raw_img', $qmp_format
             'mode': 'existing', 'sync': 'full'}}" \
         "return"
 
-    _send_qemu_cmd $QEMU_HANDLE '' "$qmp_event"
+    capture_events="$qmp_event JOB_STATUS_CHANGE" _wait_event $QEMU_HANDLE "$qmp_event"
     if test "$qmp_event" = BLOCK_JOB_ERROR; then
         _send_qemu_cmd $QEMU_HANDLE '' '"status": "null"'
     fi
diff --git a/tests/qemu-iotests/109.out b/tests/qemu-iotests/109.out
index e29280015e7..255b81fcdce 100644
--- a/tests/qemu-iotests/109.out
+++ b/tests/qemu-iotests/109.out
@@ -7,7 +7,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=SIZE
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 {'execute':'drive-mirror', 'arguments':{
-            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT',
+            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT', 
             'mode': 'existing', 'sync': 'full'}}
 WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed raw.
          Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
@@ -23,8 +23,8 @@ WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed
 {"execute":"query-block-jobs"}
 {"return": []}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 { 'execute': 'qmp_capabilities' }
@@ -35,12 +35,10 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "src"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 1024, "offset": 1024, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
 {"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 1024, "offset": 1024, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
@@ -48,6 +46,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 1024, "offset": 1024, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
+{"return": {}}
 Images are identical.
 
 === Writing a qcow2 header into raw ===
@@ -57,7 +56,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=SIZE
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 {'execute':'drive-mirror', 'arguments':{
-            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT',
+            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT', 
             'mode': 'existing', 'sync': 'full'}}
 WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed raw.
          Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
@@ -73,8 +72,8 @@ WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed
 {"execute":"query-block-jobs"}
 {"return": []}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 { 'execute': 'qmp_capabilities' }
@@ -85,12 +84,10 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "src"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 197120, "offset": 197120, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
 {"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 197120, "offset": 197120, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
@@ -98,6 +95,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 197120, "offset": 197120, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
+{"return": {}}
 Images are identical.
 
 === Writing a qed header into raw ===
@@ -107,7 +105,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=SIZE
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 {'execute':'drive-mirror', 'arguments':{
-            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT',
+            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT', 
             'mode': 'existing', 'sync': 'full'}}
 WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed raw.
          Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
@@ -123,8 +121,8 @@ WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed
 {"execute":"query-block-jobs"}
 {"return": []}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 { 'execute': 'qmp_capabilities' }
@@ -135,12 +133,10 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "src"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 327680, "offset": 327680, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
 {"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 327680, "offset": 327680, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
@@ -148,6 +144,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 327680, "offset": 327680, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
+{"return": {}}
 Images are identical.
 
 === Writing a vdi header into raw ===
@@ -157,7 +154,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=SIZE
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 {'execute':'drive-mirror', 'arguments':{
-            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT',
+            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT', 
             'mode': 'existing', 'sync': 'full'}}
 WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed raw.
          Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
@@ -173,8 +170,8 @@ WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed
 {"execute":"query-block-jobs"}
 {"return": []}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 { 'execute': 'qmp_capabilities' }
@@ -185,12 +182,10 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "src"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 1024, "offset": 1024, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
 {"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 1024, "offset": 1024, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
@@ -198,6 +193,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 1024, "offset": 1024, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
+{"return": {}}
 Images are identical.
 
 === Writing a vmdk header into raw ===
@@ -207,7 +203,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=SIZE
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 {'execute':'drive-mirror', 'arguments':{
-            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT',
+            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT', 
             'mode': 'existing', 'sync': 'full'}}
 WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed raw.
          Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
@@ -223,8 +219,8 @@ WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed
 {"execute":"query-block-jobs"}
 {"return": []}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 { 'execute': 'qmp_capabilities' }
@@ -235,12 +231,10 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "src"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 65536, "offset": 65536, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
 {"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 65536, "offset": 65536, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
@@ -248,6 +242,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 65536, "offset": 65536, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
+{"return": {}}
 Images are identical.
 
 === Writing a vpc header into raw ===
@@ -257,7 +252,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=SIZE
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 {'execute':'drive-mirror', 'arguments':{
-            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT',
+            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT', 
             'mode': 'existing', 'sync': 'full'}}
 WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed raw.
          Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
@@ -273,8 +268,8 @@ WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed
 {"execute":"query-block-jobs"}
 {"return": []}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 { 'execute': 'qmp_capabilities' }
@@ -285,12 +280,10 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "src"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 2560, "offset": 2560, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
 {"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 2560, "offset": 2560, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
@@ -298,6 +291,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 2560, "offset": 2560, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
+{"return": {}}
 Images are identical.
 
 === Copying sample image empty.bochs into raw ===
@@ -306,7 +300,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=SIZE
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 {'execute':'drive-mirror', 'arguments':{
-            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT',
+            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT', 
             'mode': 'existing', 'sync': 'full'}}
 WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed raw.
          Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
@@ -322,8 +316,8 @@ WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed
 {"execute":"query-block-jobs"}
 {"return": []}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 { 'execute': 'qmp_capabilities' }
@@ -334,12 +328,10 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "src"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 2560, "offset": 2560, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
 {"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 2560, "offset": 2560, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
@@ -347,6 +339,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 2560, "offset": 2560, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
+{"return": {}}
 Images are identical.
 
 === Copying sample image iotest-dirtylog-10G-4M.vhdx into raw ===
@@ -355,7 +348,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=SIZE
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 {'execute':'drive-mirror', 'arguments':{
-            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT',
+            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT', 
             'mode': 'existing', 'sync': 'full'}}
 WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed raw.
          Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
@@ -371,8 +364,8 @@ WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed
 {"execute":"query-block-jobs"}
 {"return": []}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 { 'execute': 'qmp_capabilities' }
@@ -383,12 +376,10 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "src"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 31457280, "offset": 31457280, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
 {"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 31457280, "offset": 31457280, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
@@ -396,6 +387,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 31457280, "offset": 31457280, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
+{"return": {}}
 Images are identical.
 
 === Copying sample image parallels-v1 into raw ===
@@ -404,7 +396,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=SIZE
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 {'execute':'drive-mirror', 'arguments':{
-            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT',
+            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT', 
             'mode': 'existing', 'sync': 'full'}}
 WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed raw.
          Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
@@ -420,8 +412,8 @@ WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed
 {"execute":"query-block-jobs"}
 {"return": []}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 { 'execute': 'qmp_capabilities' }
@@ -432,12 +424,10 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "src"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 327680, "offset": 327680, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
 {"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 327680, "offset": 327680, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
@@ -445,6 +435,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 327680, "offset": 327680, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
+{"return": {}}
 Images are identical.
 
 === Copying sample image simple-pattern.cloop into raw ===
@@ -453,7 +444,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=SIZE
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 {'execute':'drive-mirror', 'arguments':{
-            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT',
+            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT', 
             'mode': 'existing', 'sync': 'full'}}
 WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed raw.
          Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
@@ -469,8 +460,8 @@ WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed
 {"execute":"query-block-jobs"}
 {"return": []}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 { 'execute': 'qmp_capabilities' }
@@ -481,12 +472,10 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "src"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 2048, "offset": 2048, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
 {"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 2048, "offset": 2048, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
@@ -494,6 +483,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 2048, "offset": 2048, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
+{"return": {}}
 Images are identical.
 
 === Write legitimate MBR into raw ===
@@ -502,7 +492,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=SIZE
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 {'execute':'drive-mirror', 'arguments':{
-            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT',
+            'device': 'src', 'target': 'TEST_DIR/t.IMGFMT', 
             'mode': 'existing', 'sync': 'full'}}
 WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed raw.
          Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
@@ -510,12 +500,10 @@ WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "src"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 512, "offset": 512, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
 {"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 512, "offset": 512, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
@@ -523,6 +511,7 @@ WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 512, "offset": 512, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
+{"return": {}}
 Images are identical.
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
@@ -532,12 +521,10 @@ Images are identical.
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "src"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 512, "offset": 512, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
 {"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 512, "offset": 512, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
@@ -545,5 +532,6 @@ Images are identical.
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 512, "offset": 512, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
+{"return": {}}
 Images are identical.
 *** done
diff --git a/tests/qemu-iotests/117.out b/tests/qemu-iotests/117.out
index 735ffd25c65..1cea9e02173 100644
--- a/tests/qemu-iotests/117.out
+++ b/tests/qemu-iotests/117.out
@@ -18,8 +18,8 @@ wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 {"return": ""}
 { 'execute': 'quit' }
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 No errors were found on the image.
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/qemu-iotests/120.out b/tests/qemu-iotests/120.out
index 0744c1f1360..35d84a5bc5a 100644
--- a/tests/qemu-iotests/120.out
+++ b/tests/qemu-iotests/120.out
@@ -5,8 +5,8 @@ QMP_VERSION
 wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 {"return": ""}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 65536/65536 bytes at offset 0
diff --git a/tests/qemu-iotests/127.out b/tests/qemu-iotests/127.out
index 1685c4850a5..dd8c4a8aa94 100644
--- a/tests/qemu-iotests/127.out
+++ b/tests/qemu-iotests/127.out
@@ -28,6 +28,6 @@ wrote 42/42 bytes at offset 0
 { 'execute': 'quit' }
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "mirror"}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 *** done
diff --git a/tests/qemu-iotests/140.out b/tests/qemu-iotests/140.out
index 312f76d5daf..32866440aed 100644
--- a/tests/qemu-iotests/140.out
+++ b/tests/qemu-iotests/140.out
@@ -19,6 +19,6 @@ read 65536/65536 bytes at offset 0
 qemu-io: can't open device nbd+unix:///drv?socket=SOCK_DIR/nbd: Requested export not available
 server reported: export 'drv' not present
 { 'execute': 'quit' }
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 *** done
diff --git a/tests/qemu-iotests/143.out b/tests/qemu-iotests/143.out
index 9ec5888e0ee..d6afa32abc5 100644
--- a/tests/qemu-iotests/143.out
+++ b/tests/qemu-iotests/143.out
@@ -10,6 +10,6 @@ server reported: export 'no_such_export' not present
 qemu-io: can't open device nbd+unix:///aa--aa1?socket=SOCK_DIR/nbd: Requested export not available
 server reported: export 'aa--aa...' not present
 { 'execute': 'quit' }
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 *** done
diff --git a/tests/qemu-iotests/156.out b/tests/qemu-iotests/156.out
index 4a22f0c41af..07e5e83f5d4 100644
--- a/tests/qemu-iotests/156.out
+++ b/tests/qemu-iotests/156.out
@@ -72,8 +72,8 @@ read 65536/65536 bytes at offset 196608
 {"return": ""}
 
 { 'execute': 'quit' }
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/qemu-iotests/176.out b/tests/qemu-iotests/176.out
index 9d09b604520..45e9153ef39 100644
--- a/tests/qemu-iotests/176.out
+++ b/tests/qemu-iotests/176.out
@@ -169,8 +169,8 @@ QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"return": {}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 wrote 196608/196608 bytes at offset 2147287040
 192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 131072/131072 bytes at offset 2147352576
@@ -206,8 +206,8 @@ QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"return": {"sha256": HASH}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 === Test pass bitmap.1 ===
 
@@ -218,8 +218,8 @@ QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"return": {}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 wrote 196608/196608 bytes at offset 2147287040
 192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 131072/131072 bytes at offset 2147352576
@@ -256,8 +256,8 @@ QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"return": {"sha256": HASH}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 === Test pass bitmap.2 ===
 
@@ -268,8 +268,8 @@ QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"return": {}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 wrote 196608/196608 bytes at offset 2147287040
 192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 131072/131072 bytes at offset 2147352576
@@ -306,8 +306,8 @@ QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"return": {"sha256": HASH}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 === Test pass bitmap.3 ===
 
@@ -318,8 +318,8 @@ QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"return": {}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 wrote 196608/196608 bytes at offset 2147287040
 192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 131072/131072 bytes at offset 2147352576
@@ -353,6 +353,6 @@ QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"return": {"sha256": HASH}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 *** done
diff --git a/tests/qemu-iotests/182.out b/tests/qemu-iotests/182.out
index 57f7265458f..83fc1a4797a 100644
--- a/tests/qemu-iotests/182.out
+++ b/tests/qemu-iotests/182.out
@@ -53,6 +53,6 @@ Formatting 'TEST_DIR/t.qcow2.overlay', fmt=qcow2 cluster_size=65536 extended_l2=
 {'execute': 'qmp_capabilities'}
 {"return": {}}
 {'execute': 'quit'}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 *** done
diff --git a/tests/qemu-iotests/183.out b/tests/qemu-iotests/183.out
index fd9c2e52a58..51aa41c8885 100644
--- a/tests/qemu-iotests/183.out
+++ b/tests/qemu-iotests/183.out
@@ -53,11 +53,11 @@ wrote 65536/65536 bytes at offset 1048576
 === Shut down and check image ===
 
 {"execute":"quit"}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"return": {}}
 {"execute":"quit"}
-{"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 No errors were found on the image.
 No errors were found on the image.
 wrote 65536/65536 bytes at offset 1048576
diff --git a/tests/qemu-iotests/184.out b/tests/qemu-iotests/184.out
index 77e5489d653..e8f631f8532 100644
--- a/tests/qemu-iotests/184.out
+++ b/tests/qemu-iotests/184.out
@@ -89,10 +89,6 @@ Testing:
     "return": [
     ]
 }
-{
-    "return": {
-    }
-}
 {
     "timestamp": {
         "seconds":  TIMESTAMP,
@@ -104,6 +100,10 @@ Testing:
         "reason": "host-qmp-quit"
     }
 }
+{
+    "return": {
+    }
+}
 
 
 == property changes in ThrottleGroup ==
@@ -169,10 +169,6 @@ Testing:
         "iops-total-max": 0
     }
 }
-{
-    "return": {
-    }
-}
 {
     "timestamp": {
         "seconds":  TIMESTAMP,
@@ -184,6 +180,10 @@ Testing:
         "reason": "host-qmp-quit"
     }
 }
+{
+    "return": {
+    }
+}
 
 
 == object creation/set errors  ==
@@ -211,10 +211,6 @@ Testing:
         "desc": "bps/iops/max total values and read/write values cannot be used at the same time"
     }
 }
-{
-    "return": {
-    }
-}
 {
     "timestamp": {
         "seconds":  TIMESTAMP,
@@ -226,6 +222,10 @@ Testing:
         "reason": "host-qmp-quit"
     }
 }
+{
+    "return": {
+    }
+}
 
 
 == don't specify group ==
@@ -247,10 +247,6 @@ Testing:
         "desc": "Parameter 'throttle-group' is missing"
     }
 }
-{
-    "return": {
-    }
-}
 {
     "timestamp": {
         "seconds":  TIMESTAMP,
@@ -262,6 +258,10 @@ Testing:
         "reason": "host-qmp-quit"
     }
 }
+{
+    "return": {
+    }
+}
 
 
 *** done
diff --git a/tests/qemu-iotests/185.out b/tests/qemu-iotests/185.out
index 754a6412586..48aa4657762 100644
--- a/tests/qemu-iotests/185.out
+++ b/tests/qemu-iotests/185.out
@@ -40,9 +40,16 @@ Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off comp
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
 {"return": {}}
 { 'execute': 'quit' }
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "disk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 524288, "speed": 65536, "type": "commit"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "disk"}}
+{"return": {}}
 
 === Start active commit job and exit qemu ===
 
@@ -56,9 +63,16 @@ Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off comp
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
 {"return": {}}
 { 'execute': 'quit' }
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "disk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "commit"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "disk"}}
+{"return": {}}
 
 === Start mirror job and exit qemu ===
 
@@ -75,9 +89,16 @@ Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 cluster_size=65536 extended_l2=off
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
 {"return": {}}
 { 'execute': 'quit' }
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "disk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "disk"}}
+{"return": {}}
 
 === Start backup job and exit qemu ===
 
@@ -97,9 +118,16 @@ Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 cluster_size=65536 extended_l2=off
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
 {"return": {}}
 { 'execute': 'quit' }
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "disk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 65536, "speed": 65536, "type": "backup"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "disk"}}
+{"return": {}}
 
 === Start streaming job and exit qemu ===
 
@@ -112,8 +140,15 @@ Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 cluster_size=65536 extended_l2=off
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
 {"return": {}}
 { 'execute': 'quit' }
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "disk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 524288, "speed": 65536, "type": "stream"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "disk"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "disk"}}
+{"return": {}}
 No errors were found on the image.
 *** done
diff --git a/tests/qemu-iotests/191.out b/tests/qemu-iotests/191.out
index ea887773743..c3309e4bc69 100644
--- a/tests/qemu-iotests/191.out
+++ b/tests/qemu-iotests/191.out
@@ -378,10 +378,6 @@ wrote 65536/65536 bytes at offset 1048576
     ]
 }
 { 'execute': 'quit' }
-{
-    "return": {
-    }
-}
 {
     "timestamp": {
         "seconds":  TIMESTAMP,
@@ -393,6 +389,10 @@ wrote 65536/65536 bytes at offset 1048576
         "reason": "host-qmp-quit"
     }
 }
+{
+    "return": {
+    }
+}
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 64 MiB (67108864 bytes)
@@ -796,10 +796,6 @@ wrote 65536/65536 bytes at offset 1048576
     ]
 }
 { 'execute': 'quit' }
-{
-    "return": {
-    }
-}
 {
     "timestamp": {
         "seconds":  TIMESTAMP,
@@ -811,6 +807,10 @@ wrote 65536/65536 bytes at offset 1048576
         "reason": "host-qmp-quit"
     }
 }
+{
+    "return": {
+    }
+}
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 64 MiB (67108864 bytes)
diff --git a/tests/qemu-iotests/195.out b/tests/qemu-iotests/195.out
index ec84df5012a..91717d302e1 100644
--- a/tests/qemu-iotests/195.out
+++ b/tests/qemu-iotests/195.out
@@ -17,10 +17,6 @@ Testing: -drive if=none,file=TEST_DIR/t.IMGFMT,backing.node-name=mid
     "return": {
     }
 }
-{
-    "return": {
-    }
-}
 {
     "timestamp": {
         "seconds":  TIMESTAMP,
@@ -32,6 +28,10 @@ Testing: -drive if=none,file=TEST_DIR/t.IMGFMT,backing.node-name=mid
         "reason": "host-qmp-quit"
     }
 }
+{
+    "return": {
+    }
+}
 
 image: TEST_DIR/t.IMGFMT.mid
 file format: IMGFMT
@@ -55,10 +55,6 @@ Testing: -drive if=none,file=TEST_DIR/t.IMGFMT,node-name=top
     "return": {
     }
 }
-{
-    "return": {
-    }
-}
 {
     "timestamp": {
         "seconds":  TIMESTAMP,
@@ -70,6 +66,10 @@ Testing: -drive if=none,file=TEST_DIR/t.IMGFMT,node-name=top
         "reason": "host-qmp-quit"
     }
 }
+{
+    "return": {
+    }
+}
 
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
diff --git a/tests/qemu-iotests/223.out b/tests/qemu-iotests/223.out
index e58ea5abbd5..5014a381738 100644
--- a/tests/qemu-iotests/223.out
+++ b/tests/qemu-iotests/223.out
@@ -11,8 +11,8 @@ QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"return": {}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 
 === Write part of the file under active bitmap ===
@@ -118,14 +118,14 @@ read 2097152/2097152 bytes at offset 2097152
 
 {"execute":"nbd-server-remove",
   "arguments":{"name":"n"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "n"}}
 {"return": {}}
 {"execute":"nbd-server-remove",
   "arguments":{"name":"n2"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "n"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "n2"}}
 {"return": {}}
 {"execute":"nbd-server-remove",
   "arguments":{"name":"n2"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "n2"}}
 {"error": {"class": "GenericError", "desc": "Export 'n2' is not found"}}
 {"execute":"nbd-server-stop"}
 {"return": {}}
@@ -219,22 +219,22 @@ read 2097152/2097152 bytes at offset 2097152
 
 {"execute":"nbd-server-remove",
   "arguments":{"name":"n"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "n"}}
 {"return": {}}
 {"execute":"nbd-server-remove",
   "arguments":{"name":"n2"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "n"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "n2"}}
 {"return": {}}
 {"execute":"nbd-server-remove",
   "arguments":{"name":"n2"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "n2"}}
 {"error": {"class": "GenericError", "desc": "Export 'n2' is not found"}}
 {"execute":"nbd-server-stop"}
 {"return": {}}
 {"execute":"nbd-server-stop"}
 {"error": {"class": "GenericError", "desc": "NBD server not running"}}
 {"execute":"quit"}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 
 === Use qemu-nbd as server ===
 
diff --git a/tests/qemu-iotests/227.out b/tests/qemu-iotests/227.out
index 9c09ee3917b..26cb68c1ada 100644
--- a/tests/qemu-iotests/227.out
+++ b/tests/qemu-iotests/227.out
@@ -48,10 +48,6 @@ Testing: -drive driver=null-co,read-zeroes=on,if=virtio
         }
     ]
 }
-{
-    "return": {
-    }
-}
 {
     "timestamp": {
         "seconds":  TIMESTAMP,
@@ -63,6 +59,10 @@ Testing: -drive driver=null-co,read-zeroes=on,if=virtio
         "reason": "host-qmp-quit"
     }
 }
+{
+    "return": {
+    }
+}
 
 
 === blockstats with -drive if=none ===
@@ -112,10 +112,6 @@ Testing: -drive driver=null-co,if=none
         }
     ]
 }
-{
-    "return": {
-    }
-}
 {
     "timestamp": {
         "seconds":  TIMESTAMP,
@@ -127,6 +123,10 @@ Testing: -drive driver=null-co,if=none
         "reason": "host-qmp-quit"
     }
 }
+{
+    "return": {
+    }
+}
 
 
 === blockstats with -blockdev ===
@@ -143,10 +143,6 @@ Testing: -blockdev driver=null-co,node-name=null
     "return": [
     ]
 }
-{
-    "return": {
-    }
-}
 {
     "timestamp": {
         "seconds":  TIMESTAMP,
@@ -158,6 +154,10 @@ Testing: -blockdev driver=null-co,node-name=null
         "reason": "host-qmp-quit"
     }
 }
+{
+    "return": {
+    }
+}
 
 
 === blockstats with -blockdev and -device ===
@@ -208,10 +208,6 @@ Testing: -blockdev driver=null-co,read-zeroes=on,node-name=null -device virtio-b
         }
     ]
 }
-{
-    "return": {
-    }
-}
 {
     "timestamp": {
         "seconds":  TIMESTAMP,
@@ -223,5 +219,9 @@ Testing: -blockdev driver=null-co,read-zeroes=on,node-name=null -device virtio-b
         "reason": "host-qmp-quit"
     }
 }
+{
+    "return": {
+    }
+}
 
 *** done
diff --git a/tests/qemu-iotests/247.out b/tests/qemu-iotests/247.out
index e909e839949..7d252e7fe4d 100644
--- a/tests/qemu-iotests/247.out
+++ b/tests/qemu-iotests/247.out
@@ -17,6 +17,6 @@ QMP_VERSION
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", "len": 134217728, "offset": 134217728, "speed": 0, "type": "commit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job0"}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+{"return": {}}
 *** done
diff --git a/tests/qemu-iotests/273.out b/tests/qemu-iotests/273.out
index 4e840b67303..2fd9d9f1953 100644
--- a/tests/qemu-iotests/273.out
+++ b/tests/qemu-iotests/273.out
@@ -286,10 +286,6 @@ Testing: -blockdev file,node-name=base,filename=TEST_DIR/t.IMGFMT.base -blockdev
         ]
     }
 }
-{
-    "return": {
-    }
-}
 {
     "timestamp": {
         "seconds":  TIMESTAMP,
@@ -301,5 +297,9 @@ Testing: -blockdev file,node-name=base,filename=TEST_DIR/t.IMGFMT.base -blockdev
         "reason": "host-qmp-quit"
     }
 }
+{
+    "return": {
+    }
+}
 
 *** done
diff --git a/tests/qemu-iotests/308 b/tests/qemu-iotests/308
index 2e3f8f42824..cdb15075518 100755
--- a/tests/qemu-iotests/308
+++ b/tests/qemu-iotests/308
@@ -77,6 +77,7 @@ fuse_export_add()
 # $1: Export ID
 fuse_export_del()
 {
+    capture_events="BLOCK_EXPORT_DELETED" \
     _send_qemu_cmd $QEMU_HANDLE \
         "{'execute': 'block-export-del',
           'arguments': {
@@ -84,8 +85,7 @@ fuse_export_del()
           } }" \
         'return'
 
-    _send_qemu_cmd $QEMU_HANDLE \
-        '' \
+    _wait_event $QEMU_HANDLE \
         'BLOCK_EXPORT_DELETED'
 }
 
diff --git a/tests/qemu-iotests/308.out b/tests/qemu-iotests/308.out
index fc47bb11a2e..0afbef2f8ea 100644
--- a/tests/qemu-iotests/308.out
+++ b/tests/qemu-iotests/308.out
@@ -165,9 +165,9 @@ OK: Post-truncate image size is as expected
 
 === Tear down ===
 {'execute': 'quit'}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "export-mp"}}
+{"return": {}}
 
 === Compare copy with original ===
 Images are identical.
diff --git a/tests/qemu-iotests/tests/qsd-jobs.out b/tests/qemu-iotests/tests/qsd-jobs.out
index c1bc9b83562..aa6b6d1aefe 100644
--- a/tests/qemu-iotests/tests/qsd-jobs.out
+++ b/tests/qemu-iotests/tests/qsd-jobs.out
@@ -7,8 +7,8 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/
 QMP_VERSION
 {"return": {}}
 {"return": {}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
+{"return": {}}
 
 === Streaming can't get permission on base node ===
 
@@ -17,6 +17,6 @@ QMP_VERSION
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job0"}}
 {"error": {"class": "GenericError", "desc": "Permission conflict on node 'fmt_base': permissions 'write' are both required by an unnamed block device (uses node 'fmt_base' as 'root' child) and unshared by stream job 'job0' (uses node 'fmt_base' as 'intermediate node' child)."}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "export1"}}
+{"return": {}}
 *** done
-- 
Gitee


From 29408ecb4e3a5a3e944c7a05f52a945000100b38 Mon Sep 17 00:00:00 2001
From: Kevin Wolf <kwolf@redhat.com>
Date: Fri, 9 Feb 2024 18:31:03 +0100
Subject: [PATCH 334/407] iotests: Make 144 deterministic again

RH-Author: Stefan Hajnoczi <stefanha@redhat.com>
RH-MergeRequest: 352: monitor: only run coroutine commands in qemu_aio_context
RH-Jira: RHEL-7353
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Hanna Czenczek <hreitz@redhat.com>
RH-Commit: [4/4] 4974a32174abefb509b7c46671a364b4b991449e

Since commit effd60c8 changed how QMP commands are processed, the order
of the block-commit return value and job events in iotests 144 wasn't
fixed any more, which caused the test to fail intermittently.

Change the test to cache events first and then print them in a
predefined order.

Waiting three times for JOB_STATUS_CHANGE is a bit uglier than just
waiting for the JOB_STATUS_CHANGE that has "status": "ready", but the
tooling we have doesn't seem to allow the latter easily.

Fixes: effd60c878176bcaf97fa7ce2b12d04bb8ead6f7
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2126
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20240209173103.239994-1-kwolf@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit cc29c12ec629ba68a4a6cb7d165c94cc8502815a)
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/144     | 12 +++++++++++-
 tests/qemu-iotests/144.out |  2 +-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/144 b/tests/qemu-iotests/144
index 60e9ddd75fb..8c50d6487e4 100755
--- a/tests/qemu-iotests/144
+++ b/tests/qemu-iotests/144
@@ -83,12 +83,22 @@ echo
 echo === Performing block-commit on active layer ===
 echo
 
+capture_events="BLOCK_JOB_READY JOB_STATUS_CHANGE"
+
 # Block commit on active layer, push the new overlay into base
 _send_qemu_cmd $h "{ 'execute': 'block-commit',
                                 'arguments': {
                                                  'device': 'virtio0'
                                               }
-                    }" "READY"
+                    }" "return"
+
+_wait_event $h "JOB_STATUS_CHANGE"
+_wait_event $h "JOB_STATUS_CHANGE"
+_wait_event $h "JOB_STATUS_CHANGE"
+
+_wait_event $h "BLOCK_JOB_READY"
+
+capture_events=
 
 _send_qemu_cmd $h "{ 'execute': 'block-job-complete',
                                 'arguments': {
diff --git a/tests/qemu-iotests/144.out b/tests/qemu-iotests/144.out
index b3b48120150..2245ddfa106 100644
--- a/tests/qemu-iotests/144.out
+++ b/tests/qemu-iotests/144.out
@@ -25,9 +25,9 @@ Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off co
                                                  'device': 'virtio0'
                                               }
                     }
+{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "virtio0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "virtio0"}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "virtio0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "virtio0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
 { 'execute': 'block-job-complete',
-- 
Gitee


From 21cb099438b94d1e97d06ab6e8a046f902217b43 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 5 Mar 2024 11:36:15 -0500
Subject: [PATCH 335/407] glib-compat: Introduce g_memdup2() wrapper
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 353: ui/clipboard: mark type as not available when there is no data
RH-Jira: RHEL-19628
RH-Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
RH-Acked-by: Gerd Hoffmann <None>
RH-Commit: [1/2] f401c63303ef558bfcbb36e4c8fcc8bf2b1c3eb4 (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

JIRA: https://issues.redhat.com/browse/RHEL-19628
CVE: CVE-2023-6683
Upstream: Merged

commit 2c674fada72079583a3f2cc1790b16a0259c4fa0
Author: Philippe Mathieu-Daudé <philmd@linaro.org>
Date:   Fri Sep 3 19:44:44 2021 +0200

    glib-compat: Introduce g_memdup2() wrapper

    When experimenting with raising GLIB_VERSION_MIN_REQUIRED to 2.68
    (Fedora 34 provides GLib 2.68.1) we get:

      hw/virtio/virtio-crypto.c:245:24: error: 'g_memdup' is deprecated: Use 'g_memdup2' instead [-Werror,-Wdeprecated-declarations]
      ...

    g_memdup() has been superseded by g_memdup2() to fix potential security
    issues (the size argument is 32-bit and could be truncated or wrap).
    GLib recommends copying their static inline version of g_memdup2():
    https://discourse.gnome.org/t/port-your-module-from-g-memdup-to-g-memdup2-now/5538

    Our glib-compat.h provides a comment explaining how to deal with
    these deprecated declarations (see commit e71e8cc0355
    "glib: enforce the minimum required version and warn about old APIs").

    Following this comment's suggestion, implement the g_memdup2_qemu()
    wrapper around g_memdup2(), and use the safer inlined equivalent
    when building against pre-2.68 GLib.

    Reported-by: Eric Blake <eblake@redhat.com>
    Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Message-Id: <20210903174510.751630-3-philmd@redhat.com>
    Signed-off-by: Laurent Vivier <laurent@vivier.eu>
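
Illustration (not part of the upstream commit): a minimal sketch of the
truncation described above, assuming only that the legacy g_memdup()
takes its length as an unsigned int while callers usually hold a size_t.
The names legacy_len(), legacy_len_truncates() and memdup2_sketch() are
hypothetical and use plain libc instead of GLib so the block stands alone.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not GLib API): the byte count a g_memdup()-style
 * function sees once a size_t length is squeezed through an unsigned int
 * parameter. Unsigned conversion reduces the value modulo UINT_MAX + 1.
 */
static unsigned int legacy_len(size_t byte_size)
{
    return (unsigned int)byte_size;
}

/* Hypothetical helper: nonzero when the legacy signature loses bits. */
static int legacy_len_truncates(size_t byte_size)
{
    return byte_size > (size_t)UINT_MAX;
}

/*
 * Sketch of the size_t-based fallback: same shape as the inline version
 * in this patch, with malloc() standing in for g_malloc().
 */
static void *memdup2_sketch(const void *mem, size_t byte_size)
{
    void *copy;

    if (mem == NULL || byte_size == 0) {
        return NULL;
    }
    copy = malloc(byte_size);
    assert(copy != NULL); /* g_malloc() aborts on failure; mimic that */
    memcpy(copy, mem, byte_size);
    return copy;
}
```

On an LP64 host, a length of (size_t)UINT_MAX + 1 becomes 0 after the
conversion, so the legacy call would allocate and copy far less than the
caller expects. Keeping the length as a size_t end to end avoids that.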

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 include/glib-compat.h | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/include/glib-compat.h b/include/glib-compat.h
index 9e95c888f54..8d01a8c01fb 100644
--- a/include/glib-compat.h
+++ b/include/glib-compat.h
@@ -68,6 +68,43 @@
  * without generating warnings.
  */
 
+/*
+ * g_memdup2_qemu:
+ * @mem: (nullable): the memory to copy.
+ * @byte_size: the number of bytes to copy.
+ *
+ * Allocates @byte_size bytes of memory, and copies @byte_size bytes into it
+ * from @mem. If @mem is %NULL it returns %NULL.
+ *
+ * This replaces g_memdup(), which was prone to integer overflows when
+ * converting the argument from a #gsize to a #guint.
+ *
+ * This static inline version is a backport of the new public API from
+ * GLib 2.68, kept internal to GLib for backport to older stable releases.
+ * See https://gitlab.gnome.org/GNOME/glib/-/issues/2319.
+ *
+ * Returns: (nullable): a pointer to the newly-allocated copy of the memory,
+ *          or %NULL if @mem is %NULL.
+ */
+static inline gpointer g_memdup2_qemu(gconstpointer mem, gsize byte_size)
+{
+#if GLIB_CHECK_VERSION(2, 68, 0)
+    return g_memdup2(mem, byte_size);
+#else
+    gpointer new_mem;
+
+    if (mem && byte_size != 0) {
+        new_mem = g_malloc(byte_size);
+        memcpy(new_mem, mem, byte_size);
+    } else {
+        new_mem = NULL;
+    }
+
+    return new_mem;
+#endif
+}
+#define g_memdup2(m, s) g_memdup2_qemu(m, s)
+
 #if defined(G_OS_UNIX)
 /*
  * Note: The fallback implementation is not MT-safe, and it returns a copy of
-- 
Gitee


From 97a29fec8d5397a1225ab6eb129892b6659f212a Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 5 Mar 2024 11:36:15 -0500
Subject: [PATCH 336/407] ui/clipboard: mark type as not available when there
 is no data
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 353: ui/clipboard: mark type as not available when there is no data
RH-Jira: RHEL-19628
RH-Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
RH-Acked-by: Gerd Hoffmann <None>
RH-Commit: [2/2] fa0edf7a362a16978e2377cf61f36ff227d186b2 (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

JIRA: https://issues.redhat.com/browse/RHEL-19628
CVE: CVE-2023-6683
Upstream: Merged
Conflicts:
         - The function g_memdup2() is used by this commit, but is not present in
           this code version. It looks safe to introduce it in a preceding commit,
           instead of reverting to the less safe g_memdup(), so that is what we do.
         - There is a second upstream commit covering this CVE:
           commit 9c416582611b ("ui/clipboard: add asserts for update and request")
           which is based on several other previous commits not present in this version.
           Re-applying these, or trying to adapt the code, is too intrusive and risky
           given that it only introduces two diagnostic asserts which are not essential
           for solving the CVE.
           We therefore omit that commit.

commit 405484b29f6548c7b86549b0f961b906337aa68a
Author: Fiona Ebner <f.ebner@proxmox.com>
Date:   Wed Jan 24 11:57:48 2024 +0100

    ui/clipboard: mark type as not available when there is no data

    With VNC, a client can send a non-extended VNC_MSG_CLIENT_CUT_TEXT
    message with len=0. In qemu_clipboard_set_data(), the clipboard info
    will be updated setting data to NULL (because g_memdup(data, size)
    returns NULL when size is 0). If the client does not set the
    VNC_ENCODING_CLIPBOARD_EXT feature when setting up the encodings, then
    the 'request' callback for the clipboard peer is not initialized.
    Later, because data is NULL, qemu_clipboard_request() can be reached
    via vdagent_chr_write() and vdagent_clipboard_recv_request(), and
    there an attempt is made to call the clipboard owner's 'request'
    callback, but that is a NULL pointer.

    In particular, this can happen when using the KRDC (22.12.3) VNC
    client.

    Another scenario leading to the same issue is with two clients (say
    noVNC and KRDC):

    The noVNC client sets the extension VNC_FEATURE_CLIPBOARD_EXT and
    initializes its cbpeer.

    The KRDC client does not, but triggers a vnc_client_cut_text() (note
    it's not the _ext variant). There, a new clipboard info with it as
    the 'owner' is created and qemu_clipboard_set_data() is called,
    which in turn calls qemu_clipboard_update() with that info.

    In qemu_clipboard_update(), the notifier for the noVNC client will be
    called, i.e. vnc_clipboard_notify() and also set vs->cbinfo for the
    noVNC client. The 'owner' in that clipboard info is the clipboard peer
    for the KRDC client, which did not initialize the 'request' function.
    That sounds correct to me, it is the owner of that clipboard info.

    Then when noVNC sends a VNC_MSG_CLIENT_CUT_TEXT message (it did set
    the VNC_FEATURE_CLIPBOARD_EXT feature correctly, so a check for it
    passes), that clipboard info is passed to qemu_clipboard_request() and
    the original segfault still happens.

    Fix the issue by handling updates with size 0 differently. In
    particular, mark in the clipboard info that the type is not available.

    While at it, switch to g_memdup2(), because g_memdup() is deprecated.

    Cc: qemu-stable@nongnu.org
    Fixes: CVE-2023-6683
    Reported-by: Markus Frank <m.frank@proxmox.com>
    Suggested-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
    Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Tested-by: Markus Frank <m.frank@proxmox.com>
    Message-ID: <20240124105749.204610-1-f.ebner@proxmox.com>
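
Illustration (not part of the upstream commit): a minimal standalone
sketch of the size-0 handling described above. struct clip_type_sketch,
clip_set_data_sketch() and clip_needs_request_sketch() are hypothetical
stand-ins that use plain libc; the last one only assumes that a consumer
asks the owner for data when a type is marked available but has no
cached payload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one entry of info->types[]. */
struct clip_type_sketch {
    bool available;
    size_t size;
    void *data;
};

/* Store a copy of the payload, or mark the type unavailable if empty. */
static void clip_set_data_sketch(struct clip_type_sketch *t,
                                 const void *data, size_t size)
{
    free(t->data);
    if (size) {
        assert(data != NULL);
        t->data = malloc(size);
        assert(t->data != NULL);
        memcpy(t->data, data, size);
        t->size = size;
        t->available = true;
    } else {
        /* Empty update: nothing to offer, so do not advertise the type. */
        t->data = NULL;
        t->size = 0;
        t->available = false;
    }
}

/*
 * Hypothetical consumer-side check: data has to be requested from the
 * owner only when the type is advertised but no payload is cached.
 */
static bool clip_needs_request_sketch(const struct clip_type_sketch *t)
{
    return t->available && t->data == NULL;
}
```

With the pre-fix behaviour (available = true, data = NULL for size 0),
clip_needs_request_sketch() would return true and the owner's callback
would be invoked. With the handling above it returns false for an empty
update.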

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 ui/clipboard.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/ui/clipboard.c b/ui/clipboard.c
index d7b008d62a0..b8c795f2e22 100644
--- a/ui/clipboard.c
+++ b/ui/clipboard.c
@@ -123,9 +123,15 @@ void qemu_clipboard_set_data(QemuClipboardPeer *peer,
     }
 
     g_free(info->types[type].data);
-    info->types[type].data = g_memdup(data, size);
-    info->types[type].size = size;
-    info->types[type].available = true;
+    if (size) {
+        info->types[type].data = g_memdup2(data, size);
+        info->types[type].size = size;
+        info->types[type].available = true;
+    } else {
+        info->types[type].data = NULL;
+        info->types[type].size = 0;
+        info->types[type].available = false;
+    }
 
     if (update) {
         qemu_clipboard_update(info);
-- 
Gitee


From 6e36965d4faeafd1680ccfed631687411cd5ae13 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Tue, 5 Mar 2024 16:38:49 -0500
Subject: [PATCH 337/407] virtio-net: correctly copy vnet header when flushing
 TX

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 354: virtio-net: correctly copy vnet header when flushing TX
RH-Jira: RHEL-19496
RH-Acked-by: Jason Wang <jasowang@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Commit: [1/1] 445b601da86a64298b776879fa0f30a4bf6c16f5 (redhat/rhel/src/qemu-kvm/jons-qemu-kvm-2)

JIRA: https://issues.redhat.com/browse/RHEL-19496
CVE: CVE-2023-6693
Upstream: Merged

commit 2220e8189fb94068dbad333228659fbac819abb0
Author: Jason Wang <jasowang@redhat.com>
Date:   Tue Jan 2 11:29:01 2024 +0800

    virtio-net: correctly copy vnet header when flushing TX

    When HASH_REPORT is negotiated, the guest_hdr_len might be larger than
    the size of the mergeable rx buffer header. Using
    virtio_net_hdr_mrg_rxbuf during the header swap might lead to a stack
    buffer overflow in this case. Fix this by using virtio_net_hdr_v1_hash
    instead.

    Reported-by: Xiao Lei <leixiao.nop@zju.edu.cn>
    Cc: Yuri Benditovich <yuri.benditovich@daynix.com>
    Cc: qemu-stable@nongnu.org
    Cc: Mauro Matteo Cascella <mcascell@redhat.com>
    Fixes: CVE-2023-6693
    Fixes: e22f0603fb2f ("virtio-net: reference implementation of hash report")
    Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
    Signed-off-by: Jason Wang <jasowang@redhat.com>
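
Illustration (not part of the upstream commit): a minimal sketch of the
size mismatch described above. The structs are simplified, hypothetical
mirrors of the virtio-net header layouts (12 bytes for the mergeable
rx buffer sized header, 20 bytes once the hash report fields are
appended). guest_hdr_len_sketch() and copy_fits_sketch() are illustrative
helpers, not QEMU functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the 12-byte header used with mergeable rx buffers. */
struct hdr_v1_sketch {
    uint8_t flags;
    uint8_t gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
    uint16_t num_buffers;
};

/* Stand-in for the larger header negotiated with HASH_REPORT. */
struct hdr_v1_hash_sketch {
    struct hdr_v1_sketch hdr;
    uint32_t hash_value;
    uint16_t hash_report;
    uint16_t padding;
};

static_assert(sizeof(struct hdr_v1_sketch) == 12, "unexpected padding");
static_assert(sizeof(struct hdr_v1_hash_sketch) == 20, "unexpected padding");

/* Header length the guest uses, depending on the negotiated feature. */
static size_t guest_hdr_len_sketch(int hash_report)
{
    return hash_report ? sizeof(struct hdr_v1_hash_sketch)
                       : sizeof(struct hdr_v1_sketch);
}

/* Nonzero when copying hdr_len bytes stays inside a buf_size buffer. */
static int copy_fits_sketch(size_t buf_size, size_t hdr_len)
{
    return hdr_len <= buf_size;
}
```

Copying guest_hdr_len_sketch(1) bytes into a 12-byte stack variable does
not fit, while a variable of the larger type holds either header length.
That is the reason for switching the local variable's type.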

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 hw/net/virtio-net.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f5f07f8e639..7d459726d42 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -602,6 +602,11 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
 
     n->mergeable_rx_bufs = mergeable_rx_bufs;
 
+    /*
+     * Note: when extending the vnet header, please make sure to
+     * change the vnet header copying logic in virtio_net_flush_tx()
+     * as well.
+     */
     if (version_1) {
         n->guest_hdr_len = hash_report ?
             sizeof(struct virtio_net_hdr_v1_hash) :
@@ -2535,7 +2540,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
         ssize_t ret;
         unsigned int out_num;
         struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE + 1], *out_sg;
-        struct virtio_net_hdr_mrg_rxbuf mhdr;
+        struct virtio_net_hdr_v1_hash vhdr;
 
         elem = virtqueue_pop(q->tx_vq, sizeof(VirtQueueElement));
         if (!elem) {
@@ -2552,7 +2557,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
         }
 
         if (n->has_vnet_hdr) {
-            if (iov_to_buf(out_sg, out_num, 0, &mhdr, n->guest_hdr_len) <
+            if (iov_to_buf(out_sg, out_num, 0, &vhdr, n->guest_hdr_len) <
                 n->guest_hdr_len) {
                 virtio_error(vdev, "virtio-net header incorrect");
                 virtqueue_detach_element(q->tx_vq, elem, 0);
@@ -2560,8 +2565,8 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
                 return -EINVAL;
             }
             if (n->needs_vnet_hdr_swap) {
-                virtio_net_hdr_swap(vdev, (void *) &mhdr);
-                sg2[0].iov_base = &mhdr;
+                virtio_net_hdr_swap(vdev, (void *) &vhdr);
+                sg2[0].iov_base = &vhdr;
                 sg2[0].iov_len = n->guest_hdr_len;
                 out_num = iov_copy(&sg2[1], ARRAY_SIZE(sg2) - 1,
                                    out_sg, out_num,
-- 
Gitee


From c268e598c6a5f5001debc29bb889eb8babceabaa Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 5 Jun 2024 19:56:51 -0400
Subject: [PATCH 338/407] qcow2: Don't open data_file with BDRV_O_NO_IO
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 5: EMBARGOED CVE-2024-4467 for rhel-8.10.z (PRDSC)
RH-Jira: RHEL-35616
RH-CVE: CVE-2024-4467
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Commit: [1/5] 2e72d21c14d86645cf68eec78f49d5cc5d77581f

Conflicts: qcow2_do_open(): missing boolean 'open_data_file'.
           We assume it to be true.

commit f9843ce5c519901654a7d8ba43ee95ce25ca13c2
Author: Kevin Wolf <kwolf@redhat.com>
Date:   Thu Apr 11 15:06:01 2024 +0200

    qcow2: Don't open data_file with BDRV_O_NO_IO

    One use case for 'qemu-img info' is verifying that untrusted images
    don't reference an unwanted external file, be it as a backing file or an
    external data file. To make sure that calling 'qemu-img info' can't
    already have undesired side effects with a malicious image, just don't
    open the data file at all with BDRV_O_NO_IO. If nothing ever tries to do
    I/O, we don't need to have it open.

    This changes the output of iotests case 061, which used 'qemu-img info'
    to show that opening an image with an invalid data file fails. After
    this patch, it succeeds. Replace this part of the test with a qemu-io
    call, but keep the final 'qemu-img info' to show that the invalid data
    file is correctly displayed in the output.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Upstream: N/A, embargoed
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
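
Illustration (not part of the upstream commit): a minimal sketch of the
decision described above, under simplifying assumptions. The flag values
and all names ending in _sketch or starting with SKETCH_ are hypothetical
and do not match QEMU's definitions. The sketch also ignores an explicit
'data-file' option and every error path.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; QEMU's real constants differ. */
enum { SKETCH_O_NO_IO = 1 << 0 };
enum { SKETCH_INCOMPAT_DATA_FILE = 1 << 0 };

enum data_file_choice_sketch {
    DATA_FILE_NONE,          /* leave the external data file unopened */
    DATA_FILE_SAME_AS_IMAGE, /* guest data lives in the qcow2 file itself */
    DATA_FILE_OPEN_EXTERNAL, /* open the external data file for I/O */
};

/* Decide what the data file should be for a given open mode and image. */
static enum data_file_choice_sketch
choose_data_file_sketch(unsigned open_flags, unsigned incompat_features)
{
    bool external = (incompat_features & SKETCH_INCOMPAT_DATA_FILE) != 0;
    enum data_file_choice_sketch choice;

    if (open_flags & SKETCH_O_NO_IO) {
        /* No I/O will happen, so never touch an external file. */
        choice = external ? DATA_FILE_NONE : DATA_FILE_SAME_AS_IMAGE;
    } else {
        choice = external ? DATA_FILE_OPEN_EXTERNAL
                          : DATA_FILE_SAME_AS_IMAGE;
    }

    /* An image with an external data file never falls back to itself. */
    assert(!(external && choice == DATA_FILE_SAME_AS_IMAGE));
    return choice;
}
```

The point of the no-I/O branch is that inspecting an untrusted image
cannot cause a file named inside that image to be opened.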

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 block/qcow2.c              | 87 +++++++++++++++++++++++---------------
 tests/qemu-iotests/061     |  6 ++-
 tests/qemu-iotests/061.out |  8 +++-
 3 files changed, 62 insertions(+), 39 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d5090167569..6ee19196128 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1613,50 +1613,67 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         goto fail;
     }
 
-    /* Open external data file */
-    s->data_file = bdrv_open_child(NULL, options, "data-file", bs,
-                                   &child_of_bds, BDRV_CHILD_DATA,
-                                   true, errp);
-    if (*errp) {
-        ret = -EINVAL;
-        goto fail;
-    }
+    if (flags & BDRV_O_NO_IO) {
+        /*
+         * Don't open the data file for 'qemu-img info' so that it can be used
+         * to verify that an untrusted qcow2 image doesn't refer to external
+         * files.
+         *
+         * Note: This still makes has_data_file() return true.
+         */
+        if (s->incompatible_features & QCOW2_INCOMPAT_DATA_FILE) {
+            s->data_file = NULL;
+        } else {
+            s->data_file = bs->file;
+        }
+        qdict_extract_subqdict(options, NULL, "data-file.");
+        qdict_del(options, "data-file");
+    } else {
+        /* Open external data file */
+        s->data_file = bdrv_open_child(NULL, options, "data-file", bs,
+                                       &child_of_bds, BDRV_CHILD_DATA,
+                                       true, errp);
+        if (*errp) {
+            ret = -EINVAL;
+            goto fail;
+        }
 
-    if (s->incompatible_features & QCOW2_INCOMPAT_DATA_FILE) {
-        if (!s->data_file && s->image_data_file) {
-            s->data_file = bdrv_open_child(s->image_data_file, options,
-                                           "data-file", bs, &child_of_bds,
-                                           BDRV_CHILD_DATA, false, errp);
+        if (s->incompatible_features & QCOW2_INCOMPAT_DATA_FILE) {
+            if (!s->data_file && s->image_data_file) {
+                s->data_file = bdrv_open_child(s->image_data_file, options,
+                                               "data-file", bs, &child_of_bds,
+                                               BDRV_CHILD_DATA, false, errp);
+                if (!s->data_file) {
+                    ret = -EINVAL;
+                    goto fail;
+                }
+            }
             if (!s->data_file) {
+                error_setg(errp, "'data-file' is required for this image");
                 ret = -EINVAL;
                 goto fail;
             }
-        }
-        if (!s->data_file) {
-            error_setg(errp, "'data-file' is required for this image");
-            ret = -EINVAL;
-            goto fail;
-        }
 
-        /* No data here */
-        bs->file->role &= ~BDRV_CHILD_DATA;
+            /* No data here */
+            bs->file->role &= ~BDRV_CHILD_DATA;
 
-        /* Must succeed because we have given up permissions if anything */
-        bdrv_child_refresh_perms(bs, bs->file, &error_abort);
-    } else {
-        if (s->data_file) {
-            error_setg(errp, "'data-file' can only be set for images with an "
-                             "external data file");
-            ret = -EINVAL;
-            goto fail;
-        }
+            /* Must succeed because we have given up permissions if anything */
+            bdrv_child_refresh_perms(bs, bs->file, &error_abort);
+        } else {
+            if (s->data_file) {
+                error_setg(errp, "'data-file' can only be set for images with an "
+                           "external data file");
+                ret = -EINVAL;
+                goto fail;
+            }
 
-        s->data_file = bs->file;
+            s->data_file = bs->file;
 
-        if (data_file_is_raw(bs)) {
-            error_setg(errp, "data-file-raw requires a data file");
-            ret = -EINVAL;
-            goto fail;
+            if (data_file_is_raw(bs)) {
+                error_setg(errp, "data-file-raw requires a data file");
+                ret = -EINVAL;
+                goto fail;
+            }
         }
     }
 
diff --git a/tests/qemu-iotests/061 b/tests/qemu-iotests/061
index 9507c223bda..6a5bd47efc3 100755
--- a/tests/qemu-iotests/061
+++ b/tests/qemu-iotests/061
@@ -322,12 +322,14 @@ $QEMU_IMG amend -o "data_file=foo" "$TEST_IMG"
 echo
 _make_test_img -o "compat=1.1,data_file=$TEST_IMG.data" 64M
 $QEMU_IMG amend -o "data_file=foo" "$TEST_IMG"
-_img_info --format-specific
+$QEMU_IO -c "read 0 4k" "$TEST_IMG" 2>&1 | _filter_testdir | _filter_imgfmt
+$QEMU_IO -c "open -o data-file.filename=$TEST_IMG.data,file.filename=$TEST_IMG" -c "read 0 4k" | _filter_qemu_io
 TEST_IMG="data-file.filename=$TEST_IMG.data,file.filename=$TEST_IMG" _img_info --format-specific --image-opts
 
 echo
 $QEMU_IMG amend -o "data_file=" --image-opts "data-file.filename=$TEST_IMG.data,file.filename=$TEST_IMG"
-_img_info --format-specific
+$QEMU_IO -c "read 0 4k" "$TEST_IMG" 2>&1 | _filter_testdir | _filter_imgfmt
+$QEMU_IO -c "open -o data-file.filename=$TEST_IMG.data,file.filename=$TEST_IMG" -c "read 0 4k" | _filter_qemu_io
 TEST_IMG="data-file.filename=$TEST_IMG.data,file.filename=$TEST_IMG" _img_info --format-specific --image-opts
 
 echo
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index 7ecbd4dea87..99b2307a23c 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -545,7 +545,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 qemu-img: data-file can only be set for images that use an external data file
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 data_file=TEST_DIR/t.IMGFMT.data
-qemu-img: Could not open 'TEST_DIR/t.IMGFMT': Could not open 'foo': No such file or directory
+qemu-io: can't open device TEST_DIR/t.IMGFMT: Could not open 'foo': No such file or directory
+read 4096/4096 bytes at offset 0
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 64 MiB (67108864 bytes)
@@ -560,7 +562,9 @@ Format specific information:
     corrupt: false
     extended l2: false
 
-qemu-img: Could not open 'TEST_DIR/t.IMGFMT': 'data-file' is required for this image
+qemu-io: can't open device TEST_DIR/t.IMGFMT: 'data-file' is required for this image
+read 4096/4096 bytes at offset 0
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 64 MiB (67108864 bytes)
-- 
Gitee


From c9698d78f5f2d320ae9d50a5827b4f6eb29f1277 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 5 Jun 2024 19:56:51 -0400
Subject: [PATCH 339/407] iotests/244: Don't store data-file with protocol in
 image

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 5: EMBARGOED CVE-2024-4467 for rhel-8.10.z (PRDSC)
RH-Jira: RHEL-35616
RH-CVE: CVE-2024-4467
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Commit: [2/5] a422cfdba938e1bd857008ccbbddc695011ae0ff

commit 92e00dab8be1570b13172353d77d2af44cb4e22b
Author: Kevin Wolf <kwolf@redhat.com>
Date:   Thu Apr 25 14:49:40 2024 +0200

    iotests/244: Don't store data-file with protocol in image

    We want to disable filename parsing for data files because it's too easy
    to abuse in malicious image files. Make the test ready for the change by
    passing the data file explicitly in command line options.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Upstream: N/A, embargoed
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 tests/qemu-iotests/244 | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/244 b/tests/qemu-iotests/244
index 3e61fa25bb6..bb9cc6512f3 100755
--- a/tests/qemu-iotests/244
+++ b/tests/qemu-iotests/244
@@ -215,9 +215,22 @@ $QEMU_IMG convert -f $IMGFMT -O $IMGFMT -n -C "$TEST_IMG.src" "$TEST_IMG"
 $QEMU_IMG compare -f $IMGFMT -F $IMGFMT "$TEST_IMG.src" "$TEST_IMG"
 
 # blkdebug doesn't support copy offloading, so this tests the error path
-$QEMU_IMG amend -f $IMGFMT -o "data_file=blkdebug::$TEST_IMG.data" "$TEST_IMG"
-$QEMU_IMG convert -f $IMGFMT -O $IMGFMT -n -C "$TEST_IMG.src" "$TEST_IMG"
-$QEMU_IMG compare -f $IMGFMT -F $IMGFMT "$TEST_IMG.src" "$TEST_IMG"
+test_img_with_blkdebug="json:{
+    'driver': 'qcow2',
+    'file': {
+        'driver': 'file',
+        'filename': '$TEST_IMG'
+    },
+    'data-file': {
+        'driver': 'blkdebug',
+        'image': {
+            'driver': 'file',
+            'filename': '$TEST_IMG.data'
+        }
+    }
+}"
+$QEMU_IMG convert -f $IMGFMT -O $IMGFMT -n -C "$TEST_IMG.src" "$test_img_with_blkdebug"
+$QEMU_IMG compare -f $IMGFMT -F $IMGFMT "$TEST_IMG.src" "$test_img_with_blkdebug"
 
 echo
 echo "=== Flushing should flush the data file ==="
-- 
Gitee


From 68153d83107e2c0e00af4aca550564b8a4ebaea5 Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 5 Jun 2024 19:56:51 -0400
Subject: [PATCH 340/407] iotests/270: Don't store data-file with json: prefix
 in image

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 5: EMBARGOED CVE-2024-4467 for rhel-8.10.z (PRDSC)
RH-Jira: RHEL-35616
RH-CVE: CVE-2024-4467
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Commit: [3/5] ac08690fd3ea3af6e24b2f6a8beedcfe469917a8

commit 705bcc2819ce8e0f8b9d660a93bc48de26413aec
Author: Kevin Wolf <kwolf@redhat.com>
Date:   Thu Apr 25 14:49:40 2024 +0200

    iotests/270: Don't store data-file with json: prefix in image

    We want to disable filename parsing for data files because it's too easy
    to abuse in malicious image files. Make the test ready for the change by
    passing the data file explicitly in command line options.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Upstream: N/A, embargoed
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 tests/qemu-iotests/270 | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/270 b/tests/qemu-iotests/270
index 74352342db5..c37b674aa2e 100755
--- a/tests/qemu-iotests/270
+++ b/tests/qemu-iotests/270
@@ -60,8 +60,16 @@ _make_test_img -o cluster_size=2M,data_file="$TEST_IMG.orig" \
 # "write" 2G of data without using any space.
 # (qemu-img create does not like it, though, because null-co does not
 # support image creation.)
-$QEMU_IMG amend -o data_file="json:{'driver':'null-co',,'size':'4294967296'}" \
-    "$TEST_IMG"
+test_img_with_null_data="json:{
+    'driver': '$IMGFMT',
+    'file': {
+        'filename': '$TEST_IMG'
+    },
+    'data-file': {
+        'driver': 'null-co',
+        'size':'4294967296'
+    }
+}"
 
 # This gives us a range of:
 #   2^31 - 512 + 768 - 1 = 2^31 + 255 > 2^31
@@ -74,7 +82,7 @@ $QEMU_IMG amend -o data_file="json:{'driver':'null-co',,'size':'4294967296'}" \
 # on L2 boundaries, we need large L2 tables; hence the cluster size of
 # 2 MB.  (Anything from 256 kB should work, though, because then one L2
 # table covers 8 GB.)
-$QEMU_IO -c "write 768 $((2 ** 31 - 512))" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write 768 $((2 ** 31 - 512))" "$test_img_with_null_data" | _filter_qemu_io
 
 _check_test_img
 
-- 
Gitee


From ad6ea9bde1b08f15fdb9a58c90b3d41f6717ec1d Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Sun, 9 Jun 2024 23:08:39 -0400
Subject: [PATCH 341/407] block: introduce bdrv_open_file_child() helper

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 5: EMBARGOED CVE-2024-4467 for rhel-8.10.z (PRDSC)
RH-Jira: RHEL-35616
RH-CVE: CVE-2024-4467
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Commit: [4/5] 9f582a9aff740eb9ec6f64bfec94854038d8545f

Conflicts: - copy-before-write.c::cbw_copy() is an older version than
             upstream, but introduction of the new function is
             straightforward.
           - include/block/block-global-state.h doesn't exist in this
             code version. Adding the prototype to
             include/block/block.h instead.
           - struct BlockDriver has no field 'filtered_child_is_backing'.
             We remove the corresponding assert() in the new function.

commit 83930780325b144a5908c45b3957b9b6457b3831
Author: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Date:   Tue Jul 26 23:11:21 2022 +0300

    block: introduce bdrv_open_file_child() helper

    Almost all drivers call bdrv_open_child() similarly. Let's create a
    helper for this.

    The only drivers calling bdrv_open_child() to set bs->file that are
    not updated are raw-format and snapshot-access:
        raw-format sometimes wants to have a filtered child but
            doesn't set drv->is_filter to true.
        snapshot-access wants only DATA | PRIMARY.

    Possibly we should implement a drv->is_filter_func() handler to consider
    raw-format a filter when it works as a filter. But that's another story.

    Note also that we reduce the number of assignments to bs->file in the
    code: this helps us restrict modification of this field in a later commit.

    Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
    Reviewed-by: Hanna Reitz <hreitz@redhat.com>
    Message-Id: <20220726201134.924743-3-vsementsov@yandex-team.ru>
    Reviewed-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 block.c                   | 18 ++++++++++++++++++
 block/blkdebug.c          |  9 +++------
 block/blklogwrites.c      |  7 ++-----
 block/blkreplay.c         |  7 ++-----
 block/blkverify.c         |  9 +++------
 block/bochs.c             |  7 +++----
 block/cloop.c             |  7 +++----
 block/copy-before-write.c |  9 ++++-----
 block/copy-on-read.c      |  9 ++++-----
 block/crypto.c            | 11 ++++++-----
 block/dmg.c               |  7 +++----
 block/filter-compress.c   |  8 +++-----
 block/parallels.c         |  7 +++----
 block/preallocate.c       |  9 ++++-----
 block/qcow.c              |  6 ++----
 block/qcow2.c             |  8 ++++----
 block/qed.c               |  8 ++++----
 block/replication.c       |  8 +++-----
 block/throttle.c          |  8 +++-----
 block/vdi.c               |  7 +++----
 block/vhdx.c              |  7 +++----
 block/vmdk.c              |  7 +++----
 block/vpc.c               |  7 +++----
 include/block/block.h     |  3 +++
 24 files changed, 92 insertions(+), 101 deletions(-)
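
Reviewer note: the role selection that the new helper centralizes can be shown in a few lines. This is a minimal standalone sketch, not QEMU code. MockNode, mock_pick_role(), mock_open_file_child() and the MOCK_CHILD_* values are hypothetical stand-ins, and the bit values are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for BdrvChildRole bits; values are illustrative. */
enum {
    MOCK_CHILD_DATA     = 1 << 0,
    MOCK_CHILD_METADATA = 1 << 1,
    MOCK_CHILD_FILTERED = 1 << 2,
    MOCK_CHILD_PRIMARY  = 1 << 3,
    MOCK_CHILD_IMAGE    = MOCK_CHILD_DATA | MOCK_CHILD_METADATA |
                          MOCK_CHILD_PRIMARY,
};

typedef struct MockNode {
    bool is_filter;     /* models parent->drv->is_filter */
    const char *file;   /* models parent->file; NULL when nothing is open */
    unsigned file_role; /* role the child would have been opened with */
} MockNode;

/* Same rule as in the patch: filters get FILTERED | PRIMARY, others IMAGE. */
static unsigned mock_pick_role(bool is_filter)
{
    return is_filter ? (MOCK_CHILD_FILTERED | MOCK_CHILD_PRIMARY)
                     : MOCK_CHILD_IMAGE;
}

/* A NULL child models a failed bdrv_open_child() call. */
static int mock_open_file_child(MockNode *parent, const char *child)
{
    assert(parent != NULL);
    parent->file_role = mock_pick_role(parent->is_filter);
    parent->file = child;
    return parent->file ? 0 : -EINVAL;
}
```

Callers then only check the integer return value, which is the shape every converted driver in this patch ends up with.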

diff --git a/block.c b/block.c
index 0ac5b163d2a..889f8785659 100644
--- a/block.c
+++ b/block.c
@@ -3546,6 +3546,24 @@ BdrvChild *bdrv_open_child(const char *filename,
                              errp);
 }
 
+/*
+ * Wrapper on bdrv_open_child() for most popular case: open primary child of bs.
+ */
+int bdrv_open_file_child(const char *filename,
+                         QDict *options, const char *bdref_key,
+                         BlockDriverState *parent, Error **errp)
+{
+    BdrvChildRole role;
+
+    role = parent->drv->is_filter ?
+        (BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY) : BDRV_CHILD_IMAGE;
+
+    parent->file = bdrv_open_child(filename, options, bdref_key, parent,
+                                   &child_of_bds, role, false, errp);
+
+    return parent->file ? 0 : -EINVAL;
+}
+
 /*
  * TODO Future callers may need to specify parent/child_class in order for
  * option inheritance to work. Existing callers use it for the root node.
diff --git a/block/blkdebug.c b/block/blkdebug.c
index bbf29487030..5fcfc8ac6fa 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -503,12 +503,9 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     /* Open the image file */
-    bs->file = bdrv_open_child(qemu_opt_get(opts, "x-image"), options, "image",
-                               bs, &child_of_bds,
-                               BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-                               false, errp);
-    if (!bs->file) {
-        ret = -EINVAL;
+    ret = bdrv_open_file_child(qemu_opt_get(opts, "x-image"), options, "image",
+                               bs, errp);
+    if (ret < 0) {
         goto out;
     }
 
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index f7a251e91f9..f66a617eb3e 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -155,11 +155,8 @@ static int blk_log_writes_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     /* Open the file */
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, false,
-                               errp);
-    if (!bs->file) {
-        ret = -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
         goto fail;
     }
 
diff --git a/block/blkreplay.c b/block/blkreplay.c
index dcbe780ddbd..76a0b8d12ae 100644
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -26,11 +26,8 @@ static int blkreplay_open(BlockDriverState *bs, QDict *options, int flags,
     int ret;
 
     /* Open the image file */
-    bs->file = bdrv_open_child(NULL, options, "image", bs, &child_of_bds,
-                               BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-                               false, errp);
-    if (!bs->file) {
-        ret = -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "image", bs, errp);
+    if (ret < 0) {
         goto fail;
     }
 
diff --git a/block/blkverify.c b/block/blkverify.c
index d1facf5ba90..920e8916841 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -121,12 +121,9 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     /* Open the raw file */
-    bs->file = bdrv_open_child(qemu_opt_get(opts, "x-raw"), options, "raw",
-                               bs, &child_of_bds,
-                               BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-                               false, errp);
-    if (!bs->file) {
-        ret = -EINVAL;
+    ret = bdrv_open_file_child(qemu_opt_get(opts, "x-raw"), options, "raw",
+                               bs, errp);
+    if (ret < 0) {
         goto fail;
     }
 
diff --git a/block/bochs.c b/block/bochs.c
index 4d68658087b..b2dc06bbfdf 100644
--- a/block/bochs.c
+++ b/block/bochs.c
@@ -110,10 +110,9 @@ static int bochs_open(BlockDriverState *bs, QDict *options, int flags,
         return ret;
     }
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_IMAGE, false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     ret = bdrv_pread(bs->file, 0, &bochs, sizeof(bochs));
diff --git a/block/cloop.c b/block/cloop.c
index b8c6d0eccdb..bee87da1734 100644
--- a/block/cloop.c
+++ b/block/cloop.c
@@ -71,10 +71,9 @@ static int cloop_open(BlockDriverState *bs, QDict *options, int flags,
         return ret;
     }
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_IMAGE, false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     /* read header */
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index c30a5ff8dea..8aa2cb6a853 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -150,12 +150,11 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
     BdrvDirtyBitmap *copy_bitmap;
+    int ret;
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-                               false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     s->target = bdrv_open_child(NULL, options, "target", bs, &child_of_bds,
diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 1fc7fb3333b..815ac1d8356 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -41,12 +41,11 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags,
     BDRVStateCOR *state = bs->opaque;
     /* Find a bottom node name, if any */
     const char *bottom_node = qdict_get_try_str(options, "bottom");
+    int ret;
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-                               false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     bs->supported_read_flags = BDRV_REQ_PREFETCH;
diff --git a/block/crypto.c b/block/crypto.c
index c8ba4681e20..abfce39230b 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -260,15 +260,14 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
 {
     BlockCrypto *crypto = bs->opaque;
     QemuOpts *opts = NULL;
-    int ret = -EINVAL;
+    int ret;
     QCryptoBlockOpenOptions *open_opts = NULL;
     unsigned int cflags = 0;
     QDict *cryptoopts = NULL;
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_IMAGE, false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     bs->supported_write_flags = BDRV_REQ_FUA &
@@ -276,6 +275,7 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
 
     opts = qemu_opts_create(opts_spec, NULL, 0, &error_abort);
     if (!qemu_opts_absorb_qdict(opts, options, errp)) {
+        ret = -EINVAL;
         goto cleanup;
     }
 
@@ -284,6 +284,7 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
 
     open_opts = block_crypto_open_opts_init(cryptoopts, errp);
     if (!open_opts) {
+        ret = -EINVAL;
         goto cleanup;
     }
 
diff --git a/block/dmg.c b/block/dmg.c
index 447901fbb87..38c363dd39a 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -439,10 +439,9 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
         return ret;
     }
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_IMAGE, false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     block_module_load_one("dmg-bz2");
diff --git a/block/filter-compress.c b/block/filter-compress.c
index d5be538619a..305716c86cb 100644
--- a/block/filter-compress.c
+++ b/block/filter-compress.c
@@ -30,11 +30,9 @@
 static int compress_open(BlockDriverState *bs, QDict *options, int flags,
                          Error **errp)
 {
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-                               false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    int ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     if (!bs->file->bs->drv || !block_driver_can_compress(bs->file->bs->drv)) {
diff --git a/block/parallels.c b/block/parallels.c
index 6ebad2a2bbc..ed4debd8996 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -735,10 +735,9 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
     Error *local_err = NULL;
     char *buf;
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_IMAGE, false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     ret = bdrv_pread(bs->file, 0, &ph, sizeof(ph));
diff --git a/block/preallocate.c b/block/preallocate.c
index 1d4233f7300..332408bdc9d 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -134,6 +134,7 @@ static int preallocate_open(BlockDriverState *bs, QDict *options, int flags,
                             Error **errp)
 {
     BDRVPreallocateState *s = bs->opaque;
+    int ret;
 
     /*
      * s->data_end and friends should be initialized on permission update.
@@ -141,11 +142,9 @@ static int preallocate_open(BlockDriverState *bs, QDict *options, int flags,
      */
     s->file_end = s->zero_start = s->data_end = -EINVAL;
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-                               false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     if (!preallocate_absorb_opts(&s->opts, options, bs->file->bs, errp)) {
diff --git a/block/qcow.c b/block/qcow.c
index c39940f33eb..544a17261f9 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -120,10 +120,8 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
     qdict_extract_subqdict(options, &encryptopts, "encrypt.");
     encryptfmt = qdict_get_try_str(encryptopts, "format");
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_IMAGE, false, errp);
-    if (!bs->file) {
-        ret = -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
         goto fail;
     }
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 6ee19196128..29ea157e6bc 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1907,11 +1907,11 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         .errp = errp,
         .ret = -EINPROGRESS
     };
+    int ret;
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_IMAGE, false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     /* Initialise locks */
diff --git a/block/qed.c b/block/qed.c
index 558d3646c4b..e3b06a3d009 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -558,11 +558,11 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
         .errp = errp,
         .ret = -EINPROGRESS
     };
+    int ret;
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_IMAGE, false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     bdrv_qed_init_state(bs);
diff --git a/block/replication.c b/block/replication.c
index 55c8f894aa3..2f173977641 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -88,11 +88,9 @@ static int replication_open(BlockDriverState *bs, QDict *options,
     const char *mode;
     const char *top_id;
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-                               false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     ret = -EINVAL;
diff --git a/block/throttle.c b/block/throttle.c
index 6e8d52fa245..4fb5798c27a 100644
--- a/block/throttle.c
+++ b/block/throttle.c
@@ -78,11 +78,9 @@ static int throttle_open(BlockDriverState *bs, QDict *options,
     char *group;
     int ret;
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-                               false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
     bs->supported_write_flags = bs->file->bs->supported_write_flags |
                                 BDRV_REQ_WRITE_UNCHANGED;
diff --git a/block/vdi.c b/block/vdi.c
index bdc58d726ee..c50c0ed61fd 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -376,10 +376,9 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
     int ret;
     QemuUUID uuid_link, uuid_parent;
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_IMAGE, false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     logout("\n");
diff --git a/block/vhdx.c b/block/vhdx.c
index 356ec4c455a..e7d6d7509a7 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -996,10 +996,9 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags,
     uint64_t signature;
     Error *local_err = NULL;
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_IMAGE, false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     s->bat = NULL;
diff --git a/block/vmdk.c b/block/vmdk.c
index 0dfab6e9413..7d7e56b36c8 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1262,10 +1262,9 @@ static int vmdk_open(BlockDriverState *bs, QDict *options, int flags,
     BDRVVmdkState *s = bs->opaque;
     uint32_t magic;
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_IMAGE, false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     buf = vmdk_read_desc(bs->file, 0, errp);
diff --git a/block/vpc.c b/block/vpc.c
index 297a26262ab..430cab1cbb9 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -232,10 +232,9 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
     int ret;
     int64_t bs_size;
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-                               BDRV_CHILD_IMAGE, false, errp);
-    if (!bs->file) {
-        return -EINVAL;
+    ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
     opts = qemu_opts_create(&vpc_runtime_opts, NULL, 0, &error_abort);
diff --git a/include/block/block.h b/include/block/block.h
index e5dd22b0343..f885f113efb 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -376,6 +376,9 @@ BdrvChild *bdrv_open_child(const char *filename,
                            const BdrvChildClass *child_class,
                            BdrvChildRole child_role,
                            bool allow_none, Error **errp);
+int bdrv_open_file_child(const char *filename,
+                         QDict *options, const char *bdref_key,
+                         BlockDriverState *parent, Error **errp);
 BlockDriverState *bdrv_open_blockdev_ref(BlockdevRef *ref, Error **errp);
 int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
                         Error **errp);
-- 
Gitee


From 43d92e2cea012221979132984375a18c9b62b52f Mon Sep 17 00:00:00 2001
From: Jon Maloy <jmaloy@redhat.com>
Date: Wed, 5 Jun 2024 19:56:51 -0400
Subject: [PATCH 342/407] block: Parse filenames only when explicitly requested

RH-Author: Jon Maloy <jmaloy@redhat.com>
RH-MergeRequest: 5: EMBARGOED CVE-2024-4467 for rhel-8.10.z (PRDSC)
RH-Jira: RHEL-35616
RH-CVE: CVE-2024-4467
RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Commit: [5/5] a3e197add64fc6950c4ac576e34d833dfae7ee34

Conflicts: - bdrv_open_child_common(): bdrv_graph_wrlock/unlock()
             don't exist in this code version. We ignore them.
           - bdrv_open_inherit(): no_coroutine_fn/GRAPH_UNLOCKED
             don't exist. We ignore them.
           - Changes to bdrv_open_file_child() didn't apply cleanly,
             but fixing it is straightforward.
           - GLOBAL_STATE_CODE() is not present in this code. Ignoring it.
           - bdrv_open_file_child(): Need to continue setting
             parent->file.

commit f44c2941d4419e60f16dea3e9adca164e75aa78d
Author: Kevin Wolf <kwolf@redhat.com>
Date:   Thu Apr 25 14:56:02 2024 +0200

    block: Parse filenames only when explicitly requested

    When handling image filenames from legacy options such as -drive or from
    tools, these filenames are parsed for protocol prefixes, including for
    the json:{} pseudo-protocol.

    This behaviour is intended for filenames that come directly from the
    command line and for backing files, which may come from the image file
    itself. Higher level management tools generally take care to verify that
    untrusted images don't contain a bad (or any) backing file reference;
    'qemu-img info' is a suitable tool for this.

    However, for other files that can be referenced in images, such as
    qcow2 data files or VMDK extents, the string from the image file is
    usually not verified by management tools - and 'qemu-img info' wouldn't
    be suitable because in contrast to backing files, it already opens these
    other referenced files. So here the string should be interpreted as a
    literal local filename. More complex configurations need to be specified
    explicitly on the command line or in QMP.

    This patch changes bdrv_open_inherit() so that it only parses filenames
    if a new parameter parse_filename is true. It is set for the top level
    in bdrv_open(), for the file child and for the backing file child. All
    other callers pass false and disable filename parsing this way.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
    Upstream: N/A, embargoed
    Signed-off-by: Hanna Czenczek <hreitz@redhat.com>

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
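
Reviewer note (not part of the commit): a minimal, standalone sketch of the
gating idea described above, assuming a much simplified prefix rule. The names
NameKind and classify_filename() are hypothetical and do not exist in QEMU;
the real code also handles cases (such as Windows drive letters) that this
sketch ignores.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical classification of a filename string; not a QEMU type. */
typedef enum {
    NAME_LITERAL,   /* treat the string as a plain local path */
    NAME_JSON,      /* "json:{...}" pseudo-protocol */
    NAME_PROTOCOL   /* "proto:rest" prefix for a protocol driver */
} NameKind;

/*
 * Classify @filename. Prefixes are honoured only when @parse_filename is
 * true; otherwise every string is a literal local filename, which is the
 * behaviour the patch wants for data files, VMDK extents and similar
 * references that come from inside an image.
 */
static NameKind classify_filename(const char *filename, bool parse_filename)
{
    const char *colon;

    if (!parse_filename) {
        return NAME_LITERAL;
    }
    if (strncmp(filename, "json:", 5) == 0) {
        return NAME_JSON;
    }
    /* Simplified assumption: any non-empty text before ':' is a protocol. */
    colon = strchr(filename, ':');
    if (colon != NULL && colon != filename) {
        return NAME_PROTOCOL;
    }
    return NAME_LITERAL;
}
```

With the flag cleared, a string such as json:{...} stored in an image header
is classified as an ordinary (and most likely nonexistent) local file name
instead of being expanded into block driver options.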
 block.c | 81 +++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 56 insertions(+), 25 deletions(-)

diff --git a/block.c b/block.c
index 889f8785659..ddebf50efa0 100644
--- a/block.c
+++ b/block.c
@@ -82,6 +82,7 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
                                            BlockDriverState *parent,
                                            const BdrvChildClass *child_class,
                                            BdrvChildRole child_role,
+                                           bool parse_filename,
                                            Error **errp);
 
 static bool bdrv_recurse_has_child(BlockDriverState *bs,
@@ -1926,7 +1927,8 @@ static void parse_json_protocol(QDict *options, const char **pfilename,
  * block driver has been specified explicitly.
  */
 static int bdrv_fill_options(QDict **options, const char *filename,
-                             int *flags, Error **errp)
+                             int *flags, bool allow_parse_filename,
+                             Error **errp)
 {
     const char *drvname;
     bool protocol = *flags & BDRV_O_PROTOCOL;
@@ -1966,7 +1968,7 @@ static int bdrv_fill_options(QDict **options, const char *filename,
     if (protocol && filename) {
         if (!qdict_haskey(*options, "filename")) {
             qdict_put_str(*options, "filename", filename);
-            parse_filename = true;
+            parse_filename = allow_parse_filename;
         } else {
             error_setg(errp, "Can't specify 'file' and 'filename' options at "
                              "the same time");
@@ -3439,7 +3441,8 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
     }
 
     backing_hd = bdrv_open_inherit(backing_filename, reference, options, 0, bs,
-                                   &child_of_bds, bdrv_backing_role(bs), errp);
+                                   &child_of_bds, bdrv_backing_role(bs), true,
+                                   errp);
     if (!backing_hd) {
         bs->open_flags |= BDRV_O_NO_BACKING;
         error_prepend(errp, "Could not open backing file: ");
@@ -3472,7 +3475,8 @@ free_exit:
 static BlockDriverState *
 bdrv_open_child_bs(const char *filename, QDict *options, const char *bdref_key,
                    BlockDriverState *parent, const BdrvChildClass *child_class,
-                   BdrvChildRole child_role, bool allow_none, Error **errp)
+                   BdrvChildRole child_role, bool allow_none,
+                   bool parse_filename, Error **errp)
 {
     BlockDriverState *bs = NULL;
     QDict *image_options;
@@ -3503,7 +3507,8 @@ bdrv_open_child_bs(const char *filename, QDict *options, const char *bdref_key,
     }
 
     bs = bdrv_open_inherit(filename, reference, image_options, 0,
-                           parent, child_class, child_role, errp);
+                           parent, child_class, child_role, parse_filename,
+                           errp);
     if (!bs) {
         goto done;
     }
@@ -3513,6 +3518,29 @@ done:
     return bs;
 }
 
+static BdrvChild *bdrv_open_child_common(const char *filename,
+                                         QDict *options, const char *bdref_key,
+                                         BlockDriverState *parent,
+                                         const BdrvChildClass *child_class,
+                                         BdrvChildRole child_role,
+                                         bool allow_none, bool parse_filename,
+                                         Error **errp)
+{
+    BlockDriverState *bs;
+    BdrvChild *child;
+
+    bs = bdrv_open_child_bs(filename, options, bdref_key, parent, child_class,
+                            child_role, allow_none, parse_filename, errp);
+    if (bs == NULL) {
+        return NULL;
+    }
+
+    child = bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
+                              errp);
+
+    return child;
+}
+
 /*
  * Opens a disk image whose options are given as BlockdevRef in another block
  * device's options.
@@ -3534,20 +3562,17 @@ BdrvChild *bdrv_open_child(const char *filename,
                            BdrvChildRole child_role,
                            bool allow_none, Error **errp)
 {
-    BlockDriverState *bs;
-
-    bs = bdrv_open_child_bs(filename, options, bdref_key, parent, child_class,
-                            child_role, allow_none, errp);
-    if (bs == NULL) {
-        return NULL;
-    }
-
-    return bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
-                             errp);
+    return bdrv_open_child_common(filename, options, bdref_key, parent,
+                                  child_class, child_role, allow_none, false,
+                                  errp);
 }
 
 /*
- * Wrapper on bdrv_open_child() for most popular case: open primary child of bs.
+ * This does mostly the same as bdrv_open_child(), but for opening the primary
+ * child of a node. A notable difference from bdrv_open_child() is that it
+ * enables filename parsing for protocol names (including json:).
+ *
+ * @parent can move to a different AioContext in this function.
  */
 int bdrv_open_file_child(const char *filename,
                          QDict *options, const char *bdref_key,
@@ -3558,8 +3583,9 @@ int bdrv_open_file_child(const char *filename,
     role = parent->drv->is_filter ?
         (BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY) : BDRV_CHILD_IMAGE;
 
-    parent->file = bdrv_open_child(filename, options, bdref_key, parent,
-                                   &child_of_bds, role, false, errp);
+    parent->file = bdrv_open_child_common(filename, options, bdref_key, parent,
+                                          &child_of_bds, role, false, true,
+                                          errp);
 
     return parent->file ? 0 : -EINVAL;
 }
@@ -3599,7 +3625,8 @@ BlockDriverState *bdrv_open_blockdev_ref(BlockdevRef *ref, Error **errp)
 
     }
 
-    bs = bdrv_open_inherit(NULL, reference, qdict, 0, NULL, NULL, 0, errp);
+    bs = bdrv_open_inherit(NULL, reference, qdict, 0, NULL, NULL, 0, false,
+                           errp);
     obj = NULL;
     qobject_unref(obj);
     visit_free(v);
@@ -3690,6 +3717,7 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
                                            BlockDriverState *parent,
                                            const BdrvChildClass *child_class,
                                            BdrvChildRole child_role,
+                                           bool parse_filename,
                                            Error **errp)
 {
     int ret;
@@ -3733,9 +3761,11 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
     }
 
     /* json: syntax counts as explicit options, as if in the QDict */
-    parse_json_protocol(options, &filename, &local_err);
-    if (local_err) {
-        goto fail;
+    if (parse_filename) {
+        parse_json_protocol(options, &filename, &local_err);
+        if (local_err) {
+            goto fail;
+        }
     }
 
     bs->explicit_options = qdict_clone_shallow(options);
@@ -3760,7 +3790,8 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
                                      parent->open_flags, parent->options);
     }
 
-    ret = bdrv_fill_options(&options, filename, &flags, &local_err);
+    ret = bdrv_fill_options(&options, filename, &flags, parse_filename,
+                            &local_err);
     if (ret < 0) {
         goto fail;
     }
@@ -3829,7 +3860,7 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
 
         file_bs = bdrv_open_child_bs(filename, options, "file", bs,
                                      &child_of_bds, BDRV_CHILD_IMAGE,
-                                     true, &local_err);
+                                     true, true, &local_err);
         if (local_err) {
             goto fail;
         }
@@ -3974,7 +4005,7 @@ BlockDriverState *bdrv_open(const char *filename, const char *reference,
                             QDict *options, int flags, Error **errp)
 {
     return bdrv_open_inherit(filename, reference, options, flags, NULL,
-                             NULL, 0, errp);
+                             NULL, 0, true, errp);
 }
 
 /* Return true if the NULL-terminated @list contains @str */
-- 
Gitee


From d92857d41f8bb626ed40ddf21a668504f364d07a Mon Sep 17 00:00:00 2001
From: Jacob Wang <jacob.wang@openanolis.org>
Date: Mon, 2 Aug 2021 01:16:05 +0800
Subject: [PATCH 343/407] virtiofsd: Adjust limit for minor version

Upstream virtiofsd only supports fuse >= 7.31
(https://github.com/qemu/qemu/commit/72c42e2d65510e073cf78fdc924d121c77fa0080),
while Cloud Kernel has fuse version 7.27, which causes virtiofs to fail to run
due to a version mismatch. This limitation is unnecessary in Cloud Kernel
because we have already backported the mandatory fuse patches to support the
virtiofs frontend in the kernel. Hence, lower the minimum supported version to
7.27 to lift the limitation.

Note that the current fuse implementation in Cloud Kernel might lack certain
capabilities of fuse 7.28 ~ 7.31, which may cause unexpected results. This
patch is merely a workaround to enable virtiofs on the guest kernel side, and
further work is ongoing to make sure the fuse APIs on both sides are 100%
compatible.

Signed-off-by: Jacob Wang <jacob.wang@openanolis.org>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
---
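
Reviewer note (not part of the commit): a standalone sketch of the version
gate changed below, with the minimum minor version turned into a parameter so
both limits can be compared. fuse_version_supported() is a hypothetical
helper, not a virtiofsd function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a client speaking FUSE protocol @major.@minor passes the
 * same check do_init() performs, given a minimum minor version @min_minor
 * for major version 7.
 */
static bool fuse_version_supported(uint32_t major, uint32_t minor,
                                   uint32_t min_minor)
{
    return !(major < 7 || (major == 7 && minor < min_minor));
}
```

A 7.27 client is rejected with min_minor = 31 (the upstream limit) and
accepted with min_minor = 27 (the limit after this patch).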
 tools/virtiofsd/fuse_lowlevel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 57f928463a1..133e4fd40b3 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1929,7 +1929,7 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
     outarg.major = FUSE_KERNEL_VERSION;
     outarg.minor = FUSE_KERNEL_MINOR_VERSION;
 
-    if (arg->major < 7 || (arg->major == 7 && arg->minor < 31)) {
+    if (arg->major < 7 || (arg->major == 7 && arg->minor < 27)) {
         fuse_log(FUSE_LOG_ERR, "fuse: unsupported protocol version: %u.%u\n",
                  arg->major, arg->minor);
         fuse_reply_err(req, EPROTO);
-- 
Gitee


From eb088acc0e773dde9d4e90d4eeea7a4f6f062c46 Mon Sep 17 00:00:00 2001
From: jiangxin <jiangxin@hygon.cn>
Date: Tue, 24 Aug 2021 14:57:28 +0800
Subject: [PATCH 344/407] anolis: csv/i386: add CSV context

CSV is the secure virtualization feature on Hygon CPUs.
It is compatible with AMD SEV/SEV-ES and extends it with
specific functionality enabled by setting bit 6 of the guest policy.

Add the context and the build option.

Change-Id: I02b48c779dd25ddda4546fe3e28c1fe0c8c2e4c6
Signed-off-by: Xin Jiang <jiangxin@hygon.cn>
---
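
Reviewer note (not part of the commit): a standalone sketch of the two checks
that csv_enabled() combines, namely the Hygon vendor test and guest policy
bit 6. It derives the register constants from the vendor string instead of
hard-coding them. pack4(), is_hygon_vendor() and policy_requests_csv() are
hypothetical helpers; the sketch takes register values as arguments rather
than calling host_cpuid(), and it leaves out the sev_es_enabled() condition.

```c
#include <stdbool.h>
#include <stdint.h>

#define GUEST_POLICY_CSV_BIT (1u << 6)

/* Pack four ASCII characters the way CPUID does: first char in the low byte. */
static uint32_t pack4(const char *s)
{
    return (uint32_t)(uint8_t)s[0] |
           ((uint32_t)(uint8_t)s[1] << 8) |
           ((uint32_t)(uint8_t)s[2] << 16) |
           ((uint32_t)(uint8_t)s[3] << 24);
}

/* CPUID leaf 0 returns the 12-character vendor string in EBX, EDX, ECX. */
static bool is_hygon_vendor(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    const char *vendor = "HygonGenuine";

    return ebx == pack4(vendor) &&
           edx == pack4(vendor + 4) &&
           ecx == pack4(vendor + 8);
}

/* Bit 6 of the guest policy selects the CSV extension. */
static bool policy_requests_csv(uint32_t policy)
{
    return (policy & GUEST_POLICY_CSV_BIT) != 0;
}
```

The packed values match the CPUID_VENDOR_HYGON_* constants added in csv.c
below, for example pack4("Hygo") == 0x6f677948.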
 configs/devices/i386-softmmu/default.mak      |  1 +
 .../x86_64-softmmu/x86_64-rh-devices.mak      |  1 +
 hw/i386/Kconfig                               |  5 ++
 target/i386/csv-sysemu-stub.c                 | 16 ++++++
 target/i386/csv.c                             | 51 +++++++++++++++++++
 target/i386/csv.h                             | 35 +++++++++++++
 target/i386/meson.build                       |  1 +
 7 files changed, 110 insertions(+)
 create mode 100644 target/i386/csv-sysemu-stub.c
 create mode 100644 target/i386/csv.c
 create mode 100644 target/i386/csv.h

diff --git a/configs/devices/i386-softmmu/default.mak b/configs/devices/i386-softmmu/default.mak
index 598c6646dfc..db83ffcab9d 100644
--- a/configs/devices/i386-softmmu/default.mak
+++ b/configs/devices/i386-softmmu/default.mak
@@ -23,6 +23,7 @@
 #CONFIG_TPM_TIS_ISA=n
 #CONFIG_VTD=n
 #CONFIG_SGX=n
+#CONFIG_CSV=n
 
 # Boards:
 #
diff --git a/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak b/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
index 31ce08edaba..ee1df7aa52c 100644
--- a/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
+++ b/configs/devices/x86_64-softmmu/x86_64-rh-devices.mak
@@ -102,3 +102,4 @@ CONFIG_TPM_TIS_ISA=y
 CONFIG_TPM_EMULATOR=y
 CONFIG_TPM_PASSTHROUGH=y
 CONFIG_SGX=y
+CONFIG_CSV=y
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index d22ac4a4b95..ed35d762b36 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -10,6 +10,10 @@ config SGX
     bool
     depends on KVM
 
+config CSV
+    bool
+    depends on SEV
+
 config PC
     bool
     imply APPLESMC
@@ -26,6 +30,7 @@ config PC
     imply QXL
     imply SEV
     imply SGX
+    imply CSV
     imply SGA
     imply TEST_DEVICES
     imply TPM_CRB
diff --git a/target/i386/csv-sysemu-stub.c b/target/i386/csv-sysemu-stub.c
new file mode 100644
index 00000000000..a89b2600e79
--- /dev/null
+++ b/target/i386/csv-sysemu-stub.c
@@ -0,0 +1,16 @@
+/*
+ * QEMU CSV support
+ *
+ * Copyright: Hygon Info Technologies Ltd. 2022
+ *
+ * Author:
+ *      Jiang Xin <jiangxin@hygon.cn>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "sev.h"
+#include "csv.h"
diff --git a/target/i386/csv.c b/target/i386/csv.c
new file mode 100644
index 00000000000..aac825d3f92
--- /dev/null
+++ b/target/i386/csv.c
@@ -0,0 +1,51 @@
+/*
+ * QEMU CSV support
+ *
+ * Copyright: Hygon Info Technologies Ltd. 2022
+ *
+ * Author:
+ *      Jiang Xin <jiangxin@hygon.cn>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "cpu.h"
+#include "sev.h"
+#include "csv.h"
+
+CsvGuestState csv_guest = { 0 };
+
+#define CPUID_VENDOR_HYGON_EBX   0x6f677948  /* "Hygo" */
+#define CPUID_VENDOR_HYGON_ECX   0x656e6975  /* "uine" */
+#define CPUID_VENDOR_HYGON_EDX   0x6e65476e  /* "nGen" */
+
+#define GUEST_POLICY_CSV_BIT     (1 << 6)
+
+static bool is_hygon_cpu(void)
+{
+    uint32_t ebx = 0;
+    uint32_t ecx = 0;
+    uint32_t edx = 0;
+
+    host_cpuid(0, 0, NULL, &ebx, &ecx, &edx);
+
+    if (ebx == CPUID_VENDOR_HYGON_EBX &&
+        ecx == CPUID_VENDOR_HYGON_ECX &&
+        edx == CPUID_VENDOR_HYGON_EDX)
+        return true;
+    else
+        return false;
+}
+
+bool
+csv_enabled(void)
+{
+    if (!is_hygon_cpu())
+        return false;
+
+    return sev_es_enabled() && (csv_guest.policy & GUEST_POLICY_CSV_BIT);
+}
diff --git a/target/i386/csv.h b/target/i386/csv.h
new file mode 100644
index 00000000000..057d37d975d
--- /dev/null
+++ b/target/i386/csv.h
@@ -0,0 +1,35 @@
+/*
+ * QEMU CSV support
+ *
+ * Copyright: Hygon Info Technologies Ltd. 2022
+ *
+ * Author:
+ *      Jiang Xin <jiangxin@hygon.cn>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_CSV_H
+#define QEMU_CSV_H
+
+#include "qapi/qapi-commands-misc-target.h"
+
+#ifdef CONFIG_CSV
+bool csv_enabled(void);
+#else
+#define csv_enabled() 0
+#endif
+
+struct CsvGuestState {
+    uint32_t policy;
+    int sev_fd;
+    void *state;
+};
+
+typedef struct CsvGuestState CsvGuestState;
+
+extern struct CsvGuestState csv_guest;
+
+#endif
diff --git a/target/i386/meson.build b/target/i386/meson.build
index ae38dc95635..361dea92905 100644
--- a/target/i386/meson.build
+++ b/target/i386/meson.build
@@ -21,6 +21,7 @@ i386_softmmu_ss.add(files(
   'cpu-sysemu.c',
 ))
 i386_softmmu_ss.add(when: 'CONFIG_SEV', if_true: files('sev.c'), if_false: files('sev-sysemu-stub.c'))
+i386_softmmu_ss.add(when: 'CONFIG_CSV', if_true: files('csv.c'), if_false: files('csv-sysemu-stub.c'))
 
 i386_user_ss = ss.source_set()
 
-- 
Gitee


From 91f00e19e6c828a8a679afe77194517d51c8563a Mon Sep 17 00:00:00 2001
From: jiangxin <jiangxin@hygon.cn>
Date: Wed, 25 Aug 2021 11:07:41 +0800
Subject: [PATCH 345/407] anolis: csv/i386: add command to initialize CSV
 context

When CSV is enabled, the KVM_CSV_INIT command is used to initialize
the platform. It is implemented by reusing the SEV API framework
and extending its functionality.

The KVM_CSV_INIT command should be issued before
any other command.

Signed-off-by: Xin Jiang <jiangxin@hygon.cn>
Change-Id: Ia4201dc90c250c23658e1cf5e19b528df9075330
---
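
Reviewer note (not part of the commit): a standalone sketch of the pattern
used below, where the SEV code hands its ioctl helper to the CSV code through
an ops table and the CSV context only keeps it after a successful init
command. All demo_* names and fake_ioctl() are hypothetical stand-ins; no KVM
interface is called here.

```c
#include <stddef.h>

enum { DEMO_CMD_INIT = 0xc0, DEMO_CMD_ENCRYPT_DATA };

/* Callback table supplied by the caller (stands in for struct sev_ops). */
struct demo_ops {
    int (*ioctl_fn)(int fd, int cmd, void *data, int *error);
};

/* Context filled in by a successful init (stands in for CsvGuestState). */
struct demo_ctx {
    int fd;
    int initialized;
    int (*ioctl_fn)(int fd, int cmd, void *data, int *error);
};

/* Issue the init command first; keep the callback only if it succeeds. */
static int demo_init(struct demo_ctx *ctx, int fd, const struct demo_ops *ops)
{
    int fw_error = 0;

    if (!ops || !ops->ioctl_fn) {
        return -1;
    }
    if (ops->ioctl_fn(fd, DEMO_CMD_INIT, NULL, &fw_error)) {
        return -1;
    }
    ctx->fd = fd;
    ctx->ioctl_fn = ops->ioctl_fn;
    ctx->initialized = 1;
    return 0;
}

/* Any later command fails unless init has already completed. */
static int demo_command(struct demo_ctx *ctx, int cmd, void *data, int *error)
{
    if (!ctx->initialized || !ctx->ioctl_fn) {
        return -1;
    }
    return ctx->ioctl_fn(ctx->fd, cmd, data, error);
}

/* Stand-in backend that always succeeds. */
static int fake_ioctl(int fd, int cmd, void *data, int *error)
{
    (void)fd;
    (void)cmd;
    (void)data;
    *error = 0;
    return 0;
}
```

In this sketch, a command sent through demo_command() before demo_init()
returns -1, which mirrors the "init before any other command" ordering
described above.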
 linux-headers/linux/kvm.h     | 11 +++++++++
 target/i386/csv-sysemu-stub.c |  5 ++++
 target/i386/csv.c             | 44 +++++++++++++++++++++++++++++++++++
 target/i386/csv.h             |  3 +++
 target/i386/sev.c             | 17 ++++++++++++++
 target/i386/sev.h             |  5 ++++
 6 files changed, 85 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index c65930288c7..36fb857a1b4 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1916,6 +1916,17 @@ struct kvm_sev_receive_update_data {
 	__u32 trans_len;
 };
 
+/* CSV command */
+enum csv_cmd_id {
+	KVM_CSV_NR_MIN = 0xc0,
+
+	KVM_CSV_INIT = KVM_CSV_NR_MIN,
+};
+
+struct kvm_csv_init_data {
+	__u64 nodemask;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
diff --git a/target/i386/csv-sysemu-stub.c b/target/i386/csv-sysemu-stub.c
index a89b2600e79..dbd710dc6f5 100644
--- a/target/i386/csv-sysemu-stub.c
+++ b/target/i386/csv-sysemu-stub.c
@@ -14,3 +14,8 @@
 #include "qemu/osdep.h"
 #include "sev.h"
 #include "csv.h"
+
+int csv_init(uint32_t policy, int fd, void *state, struct sev_ops *ops)
+{
+    return 0;
+}
diff --git a/target/i386/csv.c b/target/i386/csv.c
index aac825d3f92..c11f59f30c3 100644
--- a/target/i386/csv.c
+++ b/target/i386/csv.c
@@ -13,6 +13,12 @@
 
 #include "qemu/osdep.h"
 
+#include <linux/kvm.h>
+
+#ifdef CONFIG_NUMA
+#include <numaif.h>
+#endif
+
 #include "cpu.h"
 #include "sev.h"
 #include "csv.h"
@@ -41,6 +47,44 @@ static bool is_hygon_cpu(void)
         return false;
 }
 
+int
+csv_init(uint32_t policy, int fd, void *state, struct sev_ops *ops)
+{
+    int fw_error;
+    int ret;
+    struct kvm_csv_init_data data = { 0 };
+
+#ifdef CONFIG_NUMA
+    int mode;
+    unsigned long nodemask;
+
+    /* Set flags as 0 to retrieve the default NUMA policy. */
+    ret = get_mempolicy(&mode, &nodemask, sizeof(nodemask) * 8, NULL, 0);
+    if (ret == 0 && (mode == MPOL_BIND))
+        data.nodemask = nodemask;
+#endif
+
+    if (!ops || !ops->sev_ioctl || !ops->fw_error_to_str)
+        return -1;
+
+    csv_guest.policy = policy;
+    if (csv_enabled()) {
+        ret = ops->sev_ioctl(fd, KVM_CSV_INIT, &data, &fw_error);
+        if (ret) {
+            csv_guest.policy = 0;
+            error_report("%s: Fail to initialize ret=%d fw_error=%d '%s'",
+                       __func__, ret, fw_error, ops->fw_error_to_str(fw_error));
+            return -1;
+        }
+
+        csv_guest.sev_fd = fd;
+        csv_guest.state = state;
+        csv_guest.sev_ioctl = ops->sev_ioctl;
+        csv_guest.fw_error_to_str = ops->fw_error_to_str;
+    }
+    return 0;
+}
+
 bool
 csv_enabled(void)
 {
diff --git a/target/i386/csv.h b/target/i386/csv.h
index 057d37d975d..886dbb26130 100644
--- a/target/i386/csv.h
+++ b/target/i386/csv.h
@@ -26,10 +26,13 @@ struct CsvGuestState {
     uint32_t policy;
     int sev_fd;
     void *state;
+    int (*sev_ioctl)(int fd, int cmd, void *data, int *error);
+    const char *(*fw_error_to_str)(int code);
 };
 
 typedef struct CsvGuestState CsvGuestState;
 
 extern struct CsvGuestState csv_guest;
+extern int csv_init(uint32_t policy, int fd, void *state, struct sev_ops *ops);
 
 #endif
diff --git a/target/i386/sev.c b/target/i386/sev.c
index ba6a65e90c1..e9e61478dad 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -26,6 +26,7 @@
 #include "crypto/hash.h"
 #include "sysemu/kvm.h"
 #include "sev.h"
+#include "csv.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/runstate.h"
 #include "trace.h"
@@ -43,6 +44,8 @@
 OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST)
 
 
+extern struct sev_ops sev_ops;
+
 /**
  * SevGuestState:
  *
@@ -963,6 +966,15 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
         goto err;
     }
 
+    /* Support CSV */
+    if (!ret && cmd == KVM_SEV_ES_INIT) {
+        ret = csv_init(sev_guest->policy, sev->sev_fd, &sev->state, &sev_ops);
+        if (ret) {
+            error_setg(errp, "%s: failed to init csv context", __func__);
+            goto err;
+        }
+    }
+
     ret = sev_launch_start(sev);
     if (ret) {
         error_setg(errp, "%s: failed to create encryption context", __func__);
@@ -1343,6 +1355,11 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp)
     return ret;
 }
 
+struct sev_ops sev_ops = {
+    .sev_ioctl = sev_ioctl,
+    .fw_error_to_str = fw_error_to_str,
+};
+
 static void
 sev_register_types(void)
 {
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 83e82aa42c4..1c97bff99e0 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -59,4 +59,9 @@ void sev_es_set_reset_vector(CPUState *cpu);
 
 int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
 
+struct sev_ops {
+    int (*sev_ioctl)(int fd, int cmd, void *data, int *error);
+    const char *(*fw_error_to_str)(int code);
+};
+
 #endif
-- 
Gitee


From 1958d93e6da5c4801fe3d5098d674cbbcdd49961 Mon Sep 17 00:00:00 2001
From: jiangxin <jiangxin@hygon.cn>
Date: Wed, 25 Aug 2021 09:59:16 +0800
Subject: [PATCH 346/407] anolis: csv/i386: add command to load data to guest
 memory

The KVM_CSV_LAUNCH_ENCRYPT_DATA command is used to load data into
encrypted guest memory, in an isolated memory region that the guest owns.

Signed-off-by: Xin Jiang <jiangxin@hygon.cn>
Change-Id: I6d4bea4969a0afa87fb8f2e52fb7ab02ec0db2ed
---
 linux-headers/linux/kvm.h     |  7 ++++
 target/i386/csv-sysemu-stub.c |  5 +++
 target/i386/csv.c             | 69 +++++++++++++++++++++++++++++++++++
 target/i386/csv.h             |  2 +
 target/i386/trace-events      |  3 ++
 5 files changed, 86 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 36fb857a1b4..6a838e34f37 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1921,6 +1921,13 @@ enum csv_cmd_id {
 	KVM_CSV_NR_MIN = 0xc0,
 
 	KVM_CSV_INIT = KVM_CSV_NR_MIN,
+	KVM_CSV_LAUNCH_ENCRYPT_DATA,
+};
+
+struct kvm_csv_launch_encrypt_data {
+	__u64 gpa;
+	__u64 uaddr;
+	__u32 len;
 };
 
 struct kvm_csv_init_data {
diff --git a/target/i386/csv-sysemu-stub.c b/target/i386/csv-sysemu-stub.c
index dbd710dc6f5..236a6909d2e 100644
--- a/target/i386/csv-sysemu-stub.c
+++ b/target/i386/csv-sysemu-stub.c
@@ -19,3 +19,8 @@ int csv_init(uint32_t policy, int fd, void *state, struct sev_ops *ops)
 {
     return 0;
 }
+
+int csv_load_data(uint64_t gpa, uint8_t *ptr, uint64_t len, Error **errp)
+{
+    g_assert_not_reached();
+}
diff --git a/target/i386/csv.c b/target/i386/csv.c
index c11f59f30c3..3b3dff51745 100644
--- a/target/i386/csv.c
+++ b/target/i386/csv.c
@@ -14,11 +14,13 @@
 #include "qemu/osdep.h"
 
 #include <linux/kvm.h>
+#include "qapi/error.h"
 
 #ifdef CONFIG_NUMA
 #include <numaif.h>
 #endif
 
+#include "trace.h"
 #include "cpu.h"
 #include "sev.h"
 #include "csv.h"
@@ -93,3 +95,70 @@ csv_enabled(void)
 
     return sev_es_enabled() && (csv_guest.policy & GUEST_POLICY_CSV_BIT);
 }
+
+static bool
+csv_check_state(SevState state)
+{
+    return *((SevState *)csv_guest.state) == state ? true : false;
+}
+
+static int
+csv_ioctl(int cmd, void *data, int *error)
+{
+    if (csv_guest.sev_ioctl)
+        return csv_guest.sev_ioctl(csv_guest.sev_fd, cmd, data, error);
+    else
+        return -1;
+}
+
+static const char *
+fw_error_to_str(int code)
+{
+    if (csv_guest.fw_error_to_str)
+        return csv_guest.fw_error_to_str(code);
+    else
+        return NULL;
+}
+
+static int
+csv_launch_encrypt_data(uint64_t gpa, uint8_t *addr, uint64_t len)
+{
+    int ret, fw_error;
+    struct kvm_csv_launch_encrypt_data update;
+
+    if (!addr || !len) {
+        return 1;
+    }
+
+    update.gpa = (__u64)gpa;
+    update.uaddr = (__u64)(unsigned long)addr;
+    update.len = len;
+    trace_kvm_csv_launch_encrypt_data(gpa, addr, len);
+    ret = csv_ioctl(KVM_CSV_LAUNCH_ENCRYPT_DATA, &update, &fw_error);
+    if (ret) {
+        error_report("%s: CSV LAUNCH_ENCRYPT_DATA ret=%d fw_error=%d '%s'",
+                __func__, ret, fw_error, fw_error_to_str(fw_error));
+    }
+
+    return ret;
+}
+
+int
+csv_load_data(uint64_t gpa, uint8_t *ptr, uint64_t len, Error **errp)
+{
+    int ret = 0;
+
+    if (!csv_enabled()) {
+        error_setg(errp, "%s: CSV is not enabled", __func__);
+        return -1;
+    }
+
+    /* if CSV is in update state then load the data to secure memory */
+    if (csv_check_state(SEV_STATE_LAUNCH_UPDATE)) {
+        ret = csv_launch_encrypt_data(gpa, ptr, len);
+        if (ret)
+            error_setg(errp, "%s: CSV failed to encrypt data", __func__);
+    }
+
+    return ret;
+}
diff --git a/target/i386/csv.h b/target/i386/csv.h
index 886dbb26130..6f7b112d964 100644
--- a/target/i386/csv.h
+++ b/target/i386/csv.h
@@ -35,4 +35,6 @@ typedef struct CsvGuestState CsvGuestState;
 extern struct CsvGuestState csv_guest;
 extern int csv_init(uint32_t policy, int fd, void *state, struct sev_ops *ops);
 
+int csv_load_data(uint64_t gpa, uint8_t *ptr, uint64_t len, Error **errp);
+
 #endif
diff --git a/target/i386/trace-events b/target/i386/trace-events
index 2cd8726eebb..b7da9bd7481 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -11,3 +11,6 @@ kvm_sev_launch_measurement(const char *value) "data %s"
 kvm_sev_launch_finish(void) ""
 kvm_sev_launch_secret(uint64_t hpa, uint64_t hva, uint64_t secret, int len) "hpa 0x%" PRIx64 " hva 0x%" PRIx64 " data 0x%" PRIx64 " len %d"
 kvm_sev_attestation_report(const char *mnonce, const char *data) "mnonce %s data %s"
+
+# csv.c
+kvm_csv_launch_encrypt_data(uint64_t gpa, void *addr, uint64_t len) "gpa 0x%" PRIx64 " addr %p len 0x%" PRIx64
-- 
Gitee


From eefeb2844587499dd296890c4d554a72da24b2b3 Mon Sep 17 00:00:00 2001
From: jiangxin <jiangxin@hygon.cn>
Date: Wed, 25 Aug 2021 12:25:05 +0800
Subject: [PATCH 347/407] anolis: csv/i386: add command to load vmcb to guest
 memory

The KVM_CSV_LAUNCH_ENCRYPT_VMCB command is used to load and encrypt
the initial VMCB data into secure memory, in an isolated region
that the guest owns.

Signed-off-by: Xin Jiang <jiangxin@hygon.cn>
Change-Id: I821bc8ab726f1bd22df36c163196951504eaa8da
---
 linux-headers/linux/kvm.h     |  1 +
 target/i386/csv-sysemu-stub.c |  5 +++++
 target/i386/csv.c             | 21 +++++++++++++++++++++
 target/i386/csv.h             |  1 +
 target/i386/sev.c             |  7 +++++--
 5 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 6a838e34f37..36350607f54 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1922,6 +1922,7 @@ enum csv_cmd_id {
 
 	KVM_CSV_INIT = KVM_CSV_NR_MIN,
 	KVM_CSV_LAUNCH_ENCRYPT_DATA,
+	KVM_CSV_LAUNCH_ENCRYPT_VMCB,
 };
 
 struct kvm_csv_launch_encrypt_data {
diff --git a/target/i386/csv-sysemu-stub.c b/target/i386/csv-sysemu-stub.c
index 236a6909d2e..a5ce986e3c1 100644
--- a/target/i386/csv-sysemu-stub.c
+++ b/target/i386/csv-sysemu-stub.c
@@ -24,3 +24,8 @@ int csv_load_data(uint64_t gpa, uint8_t *ptr, uint64_t len, Error **errp)
 {
     g_assert_not_reached();
 }
+
+int csv_launch_encrypt_vmcb(void)
+{
+    g_assert_not_reached();
+}
diff --git a/target/i386/csv.c b/target/i386/csv.c
index 3b3dff51745..d166d3775eb 100644
--- a/target/i386/csv.c
+++ b/target/i386/csv.c
@@ -162,3 +162,24 @@ csv_load_data(uint64_t gpa, uint8_t *ptr, uint64_t len, Error **errp)
 
     return ret;
 }
+
+int
+csv_launch_encrypt_vmcb(void)
+{
+    int ret, fw_error;
+
+    if (!csv_enabled()) {
+        error_report("%s: CSV is not enabled", __func__);
+        return -1;
+    }
+
+    ret = csv_ioctl(KVM_CSV_LAUNCH_ENCRYPT_VMCB, NULL, &fw_error);
+    if (ret) {
+        error_report("%s: CSV LAUNCH_ENCRYPT_VMCB ret=%d fw_error=%d '%s'",
+                     __func__, ret, fw_error, fw_error_to_str(fw_error));
+        goto err;
+    }
+
+err:
+    return ret;
+}
diff --git a/target/i386/csv.h b/target/i386/csv.h
index 6f7b112d964..7dc5c75366f 100644
--- a/target/i386/csv.h
+++ b/target/i386/csv.h
@@ -34,6 +34,7 @@ typedef struct CsvGuestState CsvGuestState;
 
 extern struct CsvGuestState csv_guest;
 extern int csv_init(uint32_t policy, int fd, void *state, struct sev_ops *ops);
+extern int csv_launch_encrypt_vmcb(void);
 
 int csv_load_data(uint64_t gpa, uint8_t *ptr, uint64_t len, Error **errp);
 
diff --git a/target/i386/sev.c b/target/i386/sev.c
index e9e61478dad..188787a060b 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -770,8 +770,11 @@ sev_launch_get_measure(Notifier *notifier, void *unused)
     }
 
     if (sev_es_enabled()) {
-        /* measure all the VM save areas before getting launch_measure */
-        ret = sev_launch_update_vmsa(sev);
+        if (csv_enabled())
+            ret = csv_launch_encrypt_vmcb();
+        else
+            /* measure all the VM save areas before getting launch_measure */
+            ret = sev_launch_update_vmsa(sev);
         if (ret) {
             exit(1);
         }
-- 
Gitee


From c7f2f559184fc08e4a222506c5991f59da929f15 Mon Sep 17 00:00:00 2001
From: jiangxin <jiangxin@hygon.cn>
Date: Tue, 24 Aug 2021 17:31:28 +0800
Subject: [PATCH 348/407] anolis: cpu/i386: populate CPUID 0x8000_001F when CSV
 is active

On Hygon platforms, bit 30 of EAX in CPUID 0x8000_001F indicates
whether CSV is supported in hardware.

When CSV is active, populate that bit so CPUID 0x8000_001F
reports the feature to the guest.

Signed-off-by: Xin Jiang <jiangxin@hygon.cn>
Change-Id: Ifdd2a20f7cb4a079ba918928f012e5f64f7059e6
---
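
Reviewer note (not part of the commit): a standalone sketch of how EAX of
CPUID 0x8000_001F is composed after this change. cpuid_8000001f_eax() is a
hypothetical helper that takes the three feature states as arguments instead
of calling sev_enabled(), sev_es_enabled() and csv_enabled().

```c
#include <stdbool.h>
#include <stdint.h>

/* Build the EAX value reported for CPUID leaf 0x8000001F. */
static uint32_t cpuid_8000001f_eax(bool sev, bool sev_es, bool csv)
{
    uint32_t eax = 0;

    if (sev) {
        eax = 0x2;                      /* bit 1: SEV */
        eax |= sev_es ? 0x8 : 0;        /* bit 3: SEV-ES */
        eax |= csv ? (1u << 30) : 0;    /* bit 30: CSV (Hygon) */
    }
    return eax;
}
```

As in the patch, the CSV bit is only reported inside the SEV branch, so a
guest without SEV never sees bit 30 set.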
 target/i386/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 726814ee2e8..7d779c6a8ca 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -27,6 +27,7 @@
 #include "sysemu/hvf.h"
 #include "kvm/kvm_i386.h"
 #include "sev.h"
+#include "csv.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-machine.h"
 #include "qapi/qmp/qerror.h"
@@ -5844,6 +5845,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         if (sev_enabled()) {
             *eax = 0x2;
             *eax |= sev_es_enabled() ? 0x8 : 0;
+            *eax |= csv_enabled() ? 0x40000000 : 0; /* bit 30 for CSV */
             *ebx = sev_get_cbit_position() & 0x3f; /* EBX[5:0] */
             *ebx |= (sev_get_reduced_phys_bits() & 0x3f) << 6; /* EBX[11:6] */
         }
-- 
Gitee


From d4cd2ad8cd9e562eeb86bc6dd828b0be9e55924b Mon Sep 17 00:00:00 2001
From: jiangxin <jiangxin@hygon.cn>
Date: Wed, 25 Aug 2021 12:36:00 +0800
Subject: [PATCH 349/407] anolis: csv/i386: CSV guest does not need to
 register/unregister guest secure memory

CSV guest memory is allocated by the firmware in the secure processor
from dedicated memory reserved at system boot. Consequently, it is not
necessary to add a notifier to pin/unpin memory.

Signed-off-by: Xin Jiang <jiangxin@hygon.cn>
Change-Id: I10d5b5ee8dbc3a1bf9ed1935c006c61a094f1e8d
---
 target/i386/sev.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 188787a060b..8fa0dcf10fb 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -984,7 +984,10 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
         goto err;
     }
 
-    ram_block_notifier_add(&sev_ram_notifier);
+    /* CSV guest needs no notifier to reg/unreg memory */
+    if (!csv_enabled()) {
+        ram_block_notifier_add(&sev_ram_notifier);
+    }
     qemu_add_machine_init_done_notifier(&sev_machine_done_notify);
     qemu_add_vm_change_state_handler(sev_vm_state_change, sev);
 
-- 
Gitee


From 998f1394d6ece4dd955a2143e6666e9bb8d3bdb5 Mon Sep 17 00:00:00 2001
From: Xin Jiang <jiangxin@hygon.cn>
Date: Fri, 25 Aug 2023 14:51:16 +0800
Subject: [PATCH 350/407] anolis: target/i386: csv: load initial image to
 private memory

The initial image of a CSV guest must be loaded into private memory
before the guest boots.

When CSV is enabled, call csv_load_data() on the flash image instead of
sev_encrypt_flash().

Signed-off-by: Xin Jiang <jiangxin@hygon.cn>
Change-Id: I708e7521bfe079216c0e0aea619a9c6bb1d7af04
---
 hw/i386/pc_sysfw.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index c8b17af9535..84aad306dc7 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -38,6 +38,7 @@
 #include "hw/block/flash.h"
 #include "sysemu/kvm.h"
 #include "sev.h"
+#include "csv.h"
 
 #define FLASH_SECTOR_SIZE 4096
 
@@ -208,7 +209,10 @@ static void pc_system_flash_map(PCMachineState *pcms,
                     exit(1);
                 }
 
-                sev_encrypt_flash(flash_ptr, flash_size, &error_fatal);
+                if (csv_enabled())
+                    csv_load_data(flash_mem->addr, flash_ptr, flash_size, &error_fatal);
+                else
+                    sev_encrypt_flash(flash_ptr, flash_size, &error_fatal);
             }
         }
     }
-- 
Gitee


From 3fdb069c77e775af04a64136cf621fbe553b7ca5 Mon Sep 17 00:00:00 2001
From: Xin Jiang <jiangxin@hygon.cn>
Date: Thu, 13 Jul 2023 09:35:10 +0800
Subject: [PATCH 351/407] anolis: vga: force full update for CSV guest

Because the nested page table (NPT) of a CSV guest is managed by
firmware, it is hard for the VMM to track dirty pages of the VGA buffer.
The VMM could issue a firmware command to update the read/write
attributes of the VGA buffer in the NPT, but the round trip between the
VMM and firmware costs more time. The simplest approach is therefore to
always do a full update of the VGA buffer.

Signed-off-by: Xin Jiang <jiangxin@hygon.cn>
Change-Id: I51ff00440483011fb9088a75005644beb378f8dd
---
 hw/display/vga.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/display/vga.c b/hw/display/vga.c
index 9d1f66af402..be4282a2f53 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -36,6 +36,9 @@
 #include "migration/vmstate.h"
 #include "trace.h"
 
+#include "target/i386/sev.h"
+#include "target/i386/csv.h"
+
 //#define DEBUG_VGA_MEM
 //#define DEBUG_VGA_REG
 
@@ -1779,6 +1782,11 @@ static void vga_update_display(void *opaque)
             s->cursor_blink_time = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
             full_update = 1;
         }
+
+        /* Force to full update in CSV guest. */
+        if (csv_enabled())
+            full_update = 1;
+
         switch(graphic_mode) {
         case GMODE_TEXT:
             vga_draw_text(s, full_update);
-- 
Gitee


From 8c5cc9cf9301f3e18522fbbfe3731641366e0389 Mon Sep 17 00:00:00 2001
From: Brijesh Singh <brijesh.singh@amd.com>
Date: Thu, 7 May 2020 22:26:17 +0000
Subject: [PATCH 352/407] doc: update AMD SEV to include Live migration flow

cherry-picked from https://github.com/AMDESE/qemu/commit/0e2b3d80e3.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 docs/amd-memory-encryption.txt | 40 +++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/docs/amd-memory-encryption.txt b/docs/amd-memory-encryption.txt
index ffca382b5f5..17e2714d935 100644
--- a/docs/amd-memory-encryption.txt
+++ b/docs/amd-memory-encryption.txt
@@ -126,7 +126,45 @@ TODO
 
 Live Migration
 ----------------
-TODO
+AMD SEV encrypts the memory of VMs, and because a different key is used
+in each VM, the hypervisor cannot simply copy the ciphertext from one VM
+to another to migrate the VM. Instead, the AMD SEV Key Management API
+provides a set of functions which the hypervisor can use to package a
+guest page for migration while maintaining the confidentiality
+provided by AMD SEV.
+
+SEV guest VMs have the concept of private and shared memory. The private
+memory is encrypted with the guest-specific key, while shared memory may
+be encrypted with the hypervisor key. The migration APIs provided by the
+SEV API spec should be used for migrating the private pages. The
+KVM_GET_PAGE_ENC_BITMAP ioctl can be used to get the guest page encryption
+bitmap. The bitmap can be used to check if the given guest page is
+private or shared.
+
+Before initiating the migration, we need to know the target machine's Platform
+Diffie-Hellman key (PDH) and certificate chain. They can be retrieved
+with the 'query-sev-capabilities' QMP command or using the sev-tool. The
+'migrate-set-parameters' QMP command can be used to pass the target
+machine's PDH and certificate chain.
+
+During the migration flow, SEND_START is called on the source hypervisor
+to create an outgoing encryption context. The SEV guest policy dictates whether
+the certificates passed through the 'migrate-set-parameters' command will be
+validated. SEND_UPDATE_DATA is called to encrypt the guest private pages.
+After migration is completed, SEND_FINISH is called to destroy the encryption
+context and make the VM non-runnable to protect it against cloning.
+
+On the target machine, RECEIVE_START is called first to create an
+incoming encryption context. The RECEIVE_UPDATE_DATA is called to copy
+the received encrypted page into guest memory. After migration has
+completed, RECEIVE_FINISH is called to make the VM runnable.
+
+For more information about the migration see SEV API Appendix A
+Usage flow (Live migration section).
+
+NOTE:
+To protect against memory cloning, the SEV APIs are designed to make the VM
+unrunnable if the migration fails.
 
 References
 -----------------
-- 
Gitee


From 043621434ea791fb6c1532e51d5fd927847fd5d6 Mon Sep 17 00:00:00 2001
From: Brijesh Singh <brijesh.singh@amd.com>
Date: Tue, 27 Jul 2021 11:27:00 +0000
Subject: [PATCH 353/407] migration.json: add AMD SEV specific migration
 parameters

cherry-picked from https://github.com/AMDESE/qemu/commit/d6a23bde6b6e.

The AMD SEV migration flow requires that the target machine's Platform
Diffie-Hellman key (PDH) and certificate chain be passed before
initiating the guest migration. The user can pass them with the QMP
'migrate-set-parameters' command. They are used while creating the
outgoing encryption context.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 migration/migration.c | 66 +++++++++++++++++++++++++++++++++++++++++++
 monitor/hmp-cmds.c    | 18 ++++++++++++
 qapi/migration.json   | 40 ++++++++++++++++++++++++--
 3 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 1ad82e63f00..b1ae6a9ff80 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -932,6 +932,12 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->announce_rounds = s->parameters.announce_rounds;
     params->has_announce_step = true;
     params->announce_step = s->parameters.announce_step;
+    params->has_sev_pdh = true;
+    params->sev_pdh = g_strdup(s->parameters.sev_pdh);
+    params->has_sev_plat_cert = true;
+    params->sev_plat_cert = g_strdup(s->parameters.sev_plat_cert);
+    params->has_sev_amd_cert = true;
+    params->sev_amd_cert = g_strdup(s->parameters.sev_amd_cert);
 
     if (s->parameters.has_block_bitmap_mapping) {
         params->has_block_bitmap_mapping = true;
@@ -1638,6 +1644,19 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
         dest->has_block_bitmap_mapping = true;
         dest->block_bitmap_mapping = params->block_bitmap_mapping;
     }
+
+    if (params->has_sev_pdh) {
+        assert(params->sev_pdh->type == QTYPE_QSTRING);
+        dest->sev_pdh = g_strdup(params->sev_pdh->u.s);
+    }
+    if (params->has_sev_plat_cert) {
+        assert(params->sev_plat_cert->type == QTYPE_QSTRING);
+        dest->sev_plat_cert = g_strdup(params->sev_plat_cert->u.s);
+    }
+    if (params->has_sev_amd_cert) {
+        assert(params->sev_amd_cert->type == QTYPE_QSTRING);
+        dest->sev_amd_cert = g_strdup(params->sev_amd_cert->u.s);
+    }
 }
 
 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1760,6 +1779,22 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
             QAPI_CLONE(BitmapMigrationNodeAliasList,
                        params->block_bitmap_mapping);
     }
+
+    if (params->has_sev_pdh) {
+        g_free(s->parameters.sev_pdh);
+        assert(params->sev_pdh->type == QTYPE_QSTRING);
+        s->parameters.sev_pdh = g_strdup(params->sev_pdh->u.s);
+    }
+    if (params->has_sev_plat_cert) {
+        g_free(s->parameters.sev_plat_cert);
+        assert(params->sev_plat_cert->type == QTYPE_QSTRING);
+        s->parameters.sev_plat_cert = g_strdup(params->sev_plat_cert->u.s);
+    }
+    if (params->has_sev_amd_cert) {
+        g_free(s->parameters.sev_amd_cert);
+        assert(params->sev_amd_cert->type == QTYPE_QSTRING);
+        s->parameters.sev_amd_cert = g_strdup(params->sev_amd_cert->u.s);
+    }
 }
 
 void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
@@ -1780,6 +1815,27 @@ void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
         params->tls_hostname->type = QTYPE_QSTRING;
         params->tls_hostname->u.s = strdup("");
     }
+    /* TODO Rewrite "" to null instead */
+    if (params->has_sev_pdh
+        && params->sev_pdh->type == QTYPE_QNULL) {
+        qobject_unref(params->sev_pdh->u.n);
+        params->sev_pdh->type = QTYPE_QSTRING;
+        params->sev_pdh->u.s = strdup("");
+    }
+    /* TODO Rewrite "" to null instead */
+    if (params->has_sev_plat_cert
+        && params->sev_plat_cert->type == QTYPE_QNULL) {
+        qobject_unref(params->sev_plat_cert->u.n);
+        params->sev_plat_cert->type = QTYPE_QSTRING;
+        params->sev_plat_cert->u.s = strdup("");
+    }
+    /* TODO Rewrite "" to null instead */
+    if (params->has_sev_amd_cert
+        && params->sev_amd_cert->type == QTYPE_QNULL) {
+        qobject_unref(params->sev_amd_cert->u.n);
+        params->sev_amd_cert->type = QTYPE_QSTRING;
+        params->sev_amd_cert->u.s = strdup("");
+    }
 
     migrate_params_test_apply(params, &tmp);
 
@@ -4339,6 +4395,9 @@ static void migration_instance_finalize(Object *obj)
     qemu_mutex_destroy(&ms->qemu_file_lock);
     g_free(params->tls_hostname);
     g_free(params->tls_creds);
+    g_free(params->sev_pdh);
+    g_free(params->sev_plat_cert);
+    g_free(params->sev_amd_cert);
     qemu_sem_destroy(&ms->wait_unplug_sem);
     qemu_sem_destroy(&ms->rate_limit_sem);
     qemu_sem_destroy(&ms->pause_sem);
@@ -4390,6 +4449,13 @@ static void migration_instance_init(Object *obj)
     params->has_tls_hostname = true;
     params->has_tls_authz = true;
 
+    params->sev_pdh = g_strdup("");
+    params->sev_plat_cert = g_strdup("");
+    params->sev_amd_cert = g_strdup("");
+    params->has_sev_pdh = true;
+    params->has_sev_plat_cert = true;
+    params->has_sev_amd_cert = true;
+
     qemu_sem_init(&ms->postcopy_pause_sem, 0);
     qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
     qemu_sem_init(&ms->rp_state.rp_sem, 0);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index f7216ab5d02..409bd15a68d 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1349,6 +1349,24 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         error_setg(&err, "The block-bitmap-mapping parameter can only be set "
                    "through QMP");
         break;
+    case MIGRATION_PARAMETER_SEV_PDH:
+        p->has_sev_pdh = true;
+        p->sev_pdh = g_new0(StrOrNull, 1);
+        p->sev_pdh->type = QTYPE_QSTRING;
+        visit_type_str(v, param, &p->sev_pdh->u.s, &err);
+        break;
+    case MIGRATION_PARAMETER_SEV_PLAT_CERT:
+        p->has_sev_plat_cert = true;
+        p->sev_plat_cert = g_new0(StrOrNull, 1);
+        p->sev_plat_cert->type = QTYPE_QSTRING;
+        visit_type_str(v, param, &p->sev_plat_cert->u.s, &err);
+        break;
+    case MIGRATION_PARAMETER_SEV_AMD_CERT:
+        p->has_sev_amd_cert = true;
+        p->sev_amd_cert = g_new0(StrOrNull, 1);
+        p->sev_amd_cert->type = QTYPE_QSTRING;
+        visit_type_str(v, param, &p->sev_amd_cert->u.s, &err);
+        break;
     default:
         assert(0);
     }
diff --git a/qapi/migration.json b/qapi/migration.json
index 94bc5c69dbb..bad53a2f8db 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -774,6 +774,15 @@
 #                        block device name if there is one, and to their node name
 #                        otherwise. (Since 5.2)
 #
+# @sev-pdh: The target host platform diffie-hellman key encoded in base64
+#           (Since 4.2)
+#
+# @sev-plat-cert: The target host platform certificate chain encoded in base64
+#                 (Since 4.2)
+#
+# @sev-amd-cert: AMD certificate chain which include ASK and OCA encoded in
+#                base64 (Since 4.2)
+#
 # Features:
 # @unstable: Member @x-checkpoint-delay is experimental.
 #
@@ -794,7 +803,8 @@
            'xbzrle-cache-size', 'max-postcopy-bandwidth',
            'max-cpu-throttle', 'multifd-compression',
            'multifd-zlib-level' ,'multifd-zstd-level',
-           'block-bitmap-mapping' ] }
+           'block-bitmap-mapping',
+           'sev-pdh', 'sev-plat-cert', 'sev-amd-cert' ] }
 
 ##
 # @MigrateSetParameters:
@@ -939,6 +949,15 @@
 #                        block device name if there is one, and to their node name
 #                        otherwise. (Since 5.2)
 #
+# @sev-pdh: The target host platform diffie-hellman key encoded in base64
+#           (Since 4.2)
+#
+# @sev-plat-cert: The target host platform certificate chain encoded in base64
+#                 (Since 4.2)
+#
+# @sev-amd-cert: AMD certificate chain which include ASK and OCA encoded in
+#                base64 (Since 4.2)
+#
 # Features:
 # @unstable: Member @x-checkpoint-delay is experimental.
 #
@@ -974,7 +993,10 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
-            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
+            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
+            '*sev-pdh':'StrOrNull',
+            '*sev-plat-cert': 'StrOrNull',
+            '*sev-amd-cert' : 'StrOrNull' } }
 
 ##
 # @migrate-set-parameters:
@@ -1139,6 +1161,15 @@
 #                        block device name if there is one, and to their node name
 #                        otherwise. (Since 5.2)
 #
+# @sev-pdh: The target host platform diffie-hellman key encoded in base64
+#           (Since 4.2)
+#
+# @sev-plat-cert: The target host platform certificate chain encoded in base64
+#                 (Since 4.2)
+#
+# @sev-amd-cert: AMD certificate chain which include ASK and OCA encoded in
+#                base64 (Since 4.2)
+#
 # Features:
 # @unstable: Member @x-checkpoint-delay is experimental.
 #
@@ -1172,7 +1203,10 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
-            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
+            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
+            '*sev-pdh':'str',
+            '*sev-plat-cert': 'str',
+            '*sev-amd-cert' : 'str'} }
 
 ##
 # @query-migrate-parameters:
-- 
Gitee


From b14be3ce577531fb50c98e142e9bcf9428cfd583 Mon Sep 17 00:00:00 2001
From: Brijesh Singh <brijesh.singh@amd.com>
Date: Tue, 27 Jul 2021 11:41:37 +0000
Subject: [PATCH 354/407] confidential guest support: introduce
 ConfidentialGuestMemoryEncryptionOps for encrypted VMs

cherry-picked from https://github.com/AMDESE/qemu/commit/74fce7be9bd.

When memory encryption is enabled in a VM, the guest RAM is encrypted
with the guest-specific key. To protect the confidentiality of data
while in transit, we need platform-specific hooks to save or migrate the
guest RAM.

Introduce the new ConfidentialGuestMemoryEncryptionOps in this patch,
which will later be used by the encrypted guest for migration.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 include/exec/confidential-guest-support.h | 27 +++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/include/exec/confidential-guest-support.h b/include/exec/confidential-guest-support.h
index ba2dd4b5dfc..343f686fc2b 100644
--- a/include/exec/confidential-guest-support.h
+++ b/include/exec/confidential-guest-support.h
@@ -53,8 +53,35 @@ struct ConfidentialGuestSupport {
     bool ready;
 };
 
+/**
+ * The functions registers with ConfidentialGuestMemoryEncryptionOps will be
+ * used during the encrypted guest migration.
+ */
+struct ConfidentialGuestMemoryEncryptionOps {
+    /* Initialize the platform specific state before starting the migration */
+    int (*save_setup)(const char *pdh, const char *plat_cert,
+                      const char *amd_cert);
+
+    /* Write the encrypted page and metadata associated with it */
+    int (*save_outgoing_page)(QEMUFile *f, uint8_t *ptr, uint32_t size,
+                              uint64_t *bytes_sent);
+
+    /* Load the incoming encrypted page into guest memory */
+    int (*load_incoming_page)(QEMUFile *f, uint8_t *ptr);
+
+    /* Check if gfn is in shared/unencrypted region */
+    bool (*is_gfn_in_unshared_region)(unsigned long gfn);
+
+    /* Write the shared regions list */
+    int (*save_outgoing_shared_regions_list)(QEMUFile *f);
+
+    /* Load the shared regions list */
+    int (*load_incoming_shared_regions_list)(QEMUFile *f);
+};
+
 typedef struct ConfidentialGuestSupportClass {
     ObjectClass parent;
+    struct ConfidentialGuestMemoryEncryptionOps *memory_encryption_ops;
 } ConfidentialGuestSupportClass;
 
 #endif /* !CONFIG_USER_ONLY */
-- 
Gitee


From 1291b7305b144ffc9db269b386fd777786ee57a6 Mon Sep 17 00:00:00 2001
From: Brijesh Singh <brijesh.singh@amd.com>
Date: Tue, 27 Jul 2021 12:10:23 +0000
Subject: [PATCH 355/407] target/i386: sev: provide callback to setup outgoing
 context

cherry-picked from https://github.com/AMDESE/qemu/commit/7521883afc0.

The user provides the target machine's Platform Diffie-Hellman key (PDH)
and certificate chain before starting the SEV guest migration. Cache
them, as we need them while creating the outgoing context.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 target/i386/sev.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
 target/i386/sev.h |  2 ++
 2 files changed, 61 insertions(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 8fa0dcf10fb..300d2db2886 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -76,6 +76,12 @@ struct SevGuestState {
     int sev_fd;
     SevState state;
     gchar *measurement;
+    guchar *remote_pdh;
+    size_t remote_pdh_len;
+    guchar *remote_plat_cert;
+    size_t remote_plat_cert_len;
+    guchar *amd_cert;
+    size_t amd_cert_len;
 
     uint32_t reset_cs;
     uint32_t reset_ip;
@@ -160,6 +166,12 @@ static const char *const sev_fw_errlist[] = {
 
 #define SEV_FW_MAX_ERROR      ARRAY_SIZE(sev_fw_errlist)
 
+#define SEV_FW_BLOB_MAX_SIZE            0x4000          /* 16KB */
+
+static struct ConfidentialGuestMemoryEncryptionOps sev_memory_encryption_ops = {
+    .save_setup = sev_save_setup,
+};
+
 static int
 sev_ioctl(int fd, int cmd, void *data, int *error)
 {
@@ -872,6 +884,48 @@ sev_vm_state_change(void *opaque, bool running, RunState state)
     }
 }
 
+static inline bool check_blob_length(size_t value)
+{
+    if (value > SEV_FW_BLOB_MAX_SIZE) {
+        error_report("invalid length max=%d got=%ld",
+                     SEV_FW_BLOB_MAX_SIZE, value);
+        return false;
+    }
+
+    return true;
+}
+
+int sev_save_setup(const char *pdh, const char *plat_cert,
+                   const char *amd_cert)
+{
+    SevGuestState *s = sev_guest;
+
+    s->remote_pdh = g_base64_decode(pdh, &s->remote_pdh_len);
+    if (!check_blob_length(s->remote_pdh_len)) {
+        goto error;
+    }
+
+    s->remote_plat_cert = g_base64_decode(plat_cert,
+                                          &s->remote_plat_cert_len);
+    if (!check_blob_length(s->remote_plat_cert_len)) {
+        goto error;
+    }
+
+    s->amd_cert = g_base64_decode(amd_cert, &s->amd_cert_len);
+    if (!check_blob_length(s->amd_cert_len)) {
+        goto error;
+    }
+
+    return 0;
+
+error:
+    g_free(s->remote_pdh);
+    g_free(s->remote_plat_cert);
+    g_free(s->amd_cert);
+
+    return 1;
+}
+
 int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
     SevGuestState *sev
@@ -886,6 +940,9 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
         return 0;
     }
 
+    ConfidentialGuestSupportClass *cgs_class =
+        (ConfidentialGuestSupportClass *) object_get_class(OBJECT(cgs));
+
     ret = ram_block_discard_disable(true);
     if (ret) {
         error_report("%s: cannot disable RAM discard", __func__);
@@ -991,6 +1048,8 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
     qemu_add_machine_init_done_notifier(&sev_machine_done_notify);
     qemu_add_vm_change_state_handler(sev_vm_state_change, sev);
 
+    cgs_class->memory_encryption_ops = &sev_memory_encryption_ops;
+
     cgs->ready = true;
 
     return 0;
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 1c97bff99e0..14801ec44db 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -51,6 +51,8 @@ extern uint32_t sev_get_reduced_phys_bits(void);
 extern bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp);
 
 int sev_encrypt_flash(uint8_t *ptr, uint64_t len, Error **errp);
+int sev_save_setup(const char *pdh, const char *plat_cert,
+                   const char *amd_cert);
 int sev_inject_launch_secret(const char *hdr, const char *secret,
                              uint64_t gpa, Error **errp);
 
-- 
Gitee


From f4db607f0f817f28a3b433aa959e277290cd4835 Mon Sep 17 00:00:00 2001
From: Brijesh Singh <brijesh.singh@amd.com>
Date: Tue, 27 Jul 2021 12:16:09 +0000
Subject: [PATCH 356/407] target/i386: sev: do not create launch context for an
 incoming guest

cherry-picked from https://github.com/AMDESE/qemu/commit/b85694233495.

LAUNCH_START is used to create an encryption context for a newly
created guest. For an incoming guest, RECEIVE_START should be used
instead.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 target/i386/sev.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 300d2db2886..0ef02d141a9 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1035,10 +1035,16 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
         }
     }
 
-    ret = sev_launch_start(sev);
-    if (ret) {
-        error_setg(errp, "%s: failed to create encryption context", __func__);
-        goto err;
+    /*
+     * The LAUNCH context is used for new guest, if its an incoming guest
+     * then RECEIVE context will be created after the connection is established.
+     */
+    if (!runstate_check(RUN_STATE_INMIGRATE)) {
+        ret = sev_launch_start(sev);
+        if (ret) {
+            error_setg(errp, "%s: failed to create encryption context", __func__);
+            goto err;
+        }
     }
 
     /* CSV guest needs no notifier to reg/unreg memory */
-- 
Gitee


From 90646abdd6c0964ed32bedf3835dcb7057463e85 Mon Sep 17 00:00:00 2001
From: Brijesh Singh <brijesh.singh@amd.com>
Date: Tue, 27 Jul 2021 12:55:25 +0000
Subject: [PATCH 357/407] target/i386: sev: add support to encrypt the outgoing
 page

cherry-picked from https://github.com/AMDESE/qemu/commit/5187c6f86bd.

sev_save_outgoing_page() provides the implementation to encrypt the
guest private pages in transit. The routine uses the SEND_START
command to create the outgoing encryption context on the first call, then
uses the SEND_UPDATE_DATA command to encrypt the data before writing it
to the socket. While encrypting the data, SEND_UPDATE_DATA produces some
metadata (e.g. MAC, IV), which is also sent to the target machine.
After migration is completed, we issue the SEND_FINISH command to transition
the SEV guest state from sending to unrunnable.
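
For illustration only, a minimal sketch of the record that
sev_send_start() writes to the stream: a big-endian policy, the
length-prefixed PDH of the source machine, then the length-prefixed
session blob. It packs into a flat buffer instead of a QEMUFile, and the
helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value big-endian, matching qemu_put_be32() on the wire. */
static size_t put_be32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
    return 4;
}

/*
 * Hypothetical helper: lay out the SEND_START record in a flat buffer.
 * Returns the number of bytes written: 12 + pdh_len + session_len.
 */
static size_t pack_send_start(uint8_t *out, size_t cap, uint32_t policy,
                              const uint8_t *pdh, uint32_t pdh_len,
                              const uint8_t *session, uint32_t session_len)
{
    size_t off = 0;

    /* Three 4-byte fields plus the two variable-length blobs. */
    assert(cap >= 12u + (size_t)pdh_len + session_len);

    off += put_be32(out + off, policy);
    off += put_be32(out + off, pdh_len);
    memcpy(out + off, pdh, pdh_len);
    off += pdh_len;
    off += put_be32(out + off, session_len);
    memcpy(out + off, session, session_len);
    off += session_len;

    return off;
}
```

Each SEND_UPDATE_DATA record follows the same length-prefixed pattern
(header length, header, transport length, transport data), so it adds
8 + hdr_len + trans_len bytes to the stream.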

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 target/i386/sev.c        | 221 +++++++++++++++++++++++++++++++++++++++
 target/i386/sev.h        |   2 +
 target/i386/trace-events |   3 +
 3 files changed, 226 insertions(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 0ef02d141a9..491bdc91c27 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -31,6 +31,8 @@
 #include "sysemu/runstate.h"
 #include "trace.h"
 #include "migration/blocker.h"
+#include "migration/qemu-file.h"
+#include "migration/misc.h"
 #include "qom/object.h"
 #include "monitor/monitor.h"
 #include "monitor/hmp-target.h"
@@ -82,6 +84,8 @@ struct SevGuestState {
     size_t remote_plat_cert_len;
     guchar *amd_cert;
     size_t amd_cert_len;
+    gchar *send_packet_hdr;
+    size_t send_packet_hdr_len;
 
     uint32_t reset_cs;
     uint32_t reset_ip;
@@ -170,6 +174,7 @@ static const char *const sev_fw_errlist[] = {
 
 static struct ConfidentialGuestMemoryEncryptionOps sev_memory_encryption_ops = {
     .save_setup = sev_save_setup,
+    .save_outgoing_page = sev_save_outgoing_page,
 };
 
 static int
@@ -926,6 +931,40 @@ error:
     return 1;
 }
 
+static void
+sev_send_finish(void)
+{
+    int ret, error;
+
+    trace_kvm_sev_send_finish();
+    ret = sev_ioctl(sev_guest->sev_fd, KVM_SEV_SEND_FINISH, 0, &error);
+    if (ret) {
+        error_report("%s: SEND_FINISH ret=%d fw_error=%d '%s'",
+                     __func__, ret, error, fw_error_to_str(error));
+    }
+
+    g_free(sev_guest->send_packet_hdr);
+    sev_set_guest_state(sev_guest, SEV_STATE_RUNNING);
+}
+
+static void
+sev_migration_state_notifier(Notifier *notifier, void *data)
+{
+    MigrationState *s = data;
+
+    if (migration_has_finished(s) ||
+        migration_in_postcopy_after_devices(s) ||
+        migration_has_failed(s)) {
+        if (sev_check_state(sev_guest, SEV_STATE_SEND_UPDATE)) {
+            sev_send_finish();
+        }
+    }
+}
+
+static Notifier sev_migration_state_notify = {
+    .notify = sev_migration_state_notifier,
+};
+
 int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
     SevGuestState *sev
@@ -1053,6 +1092,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
     }
     qemu_add_machine_init_done_notifier(&sev_machine_done_notify);
     qemu_add_vm_change_state_handler(sev_vm_state_change, sev);
+    add_migration_state_change_notifier(&sev_migration_state_notify);
 
     cgs_class->memory_encryption_ops = &sev_memory_encryption_ops;
 
@@ -1294,6 +1334,187 @@ int sev_es_save_reset_vector(void *flash_ptr, uint64_t flash_size)
     return 0;
 }
 
+static int
+sev_get_send_session_length(void)
+{
+    int ret, fw_err = 0;
+    struct kvm_sev_send_start start = {};
+
+    ret = sev_ioctl(sev_guest->sev_fd, KVM_SEV_SEND_START, &start, &fw_err);
+    if (fw_err != SEV_RET_INVALID_LEN) {
+        ret = -1;
+        error_report("%s: failed to get session length ret=%d fw_error=%d '%s'",
+                     __func__, ret, fw_err, fw_error_to_str(fw_err));
+        goto err;
+    }
+
+    ret = start.session_len;
+err:
+    return ret;
+}
+
+static int
+sev_send_start(SevGuestState *s, QEMUFile *f, uint64_t *bytes_sent)
+{
+    gsize pdh_len = 0, plat_cert_len;
+    int session_len, ret, fw_error;
+    struct kvm_sev_send_start start = { };
+    guchar *pdh = NULL, *plat_cert = NULL, *session = NULL;
+    Error *local_err = NULL;
+
+    if (!s->remote_pdh || !s->remote_plat_cert || !s->amd_cert_len) {
+        error_report("%s: missing remote PDH or PLAT_CERT", __func__);
+        return 1;
+    }
+
+   start.pdh_cert_uaddr = (uintptr_t) s->remote_pdh;
+   start.pdh_cert_len = s->remote_pdh_len;
+
+   start.plat_certs_uaddr = (uintptr_t)s->remote_plat_cert;
+   start.plat_certs_len = s->remote_plat_cert_len;
+
+   start.amd_certs_uaddr = (uintptr_t)s->amd_cert;
+   start.amd_certs_len = s->amd_cert_len;
+
+    /* get the session length */
+   session_len = sev_get_send_session_length();
+   if (session_len < 0) {
+       ret = 1;
+       goto err;
+   }
+
+   session = g_new0(guchar, session_len);
+   start.session_uaddr = (unsigned long)session;
+   start.session_len = session_len;
+
+   /* Get our PDH certificate */
+   ret = sev_get_pdh_info(s->sev_fd, &pdh, &pdh_len,
+                          &plat_cert, &plat_cert_len, &local_err);
+   if (ret) {
+       error_report("Failed to get our PDH cert");
+       goto err;
+   }
+
+   trace_kvm_sev_send_start(start.pdh_cert_uaddr, start.pdh_cert_len,
+                            start.plat_certs_uaddr, start.plat_certs_len,
+                            start.amd_certs_uaddr, start.amd_certs_len);
+
+   ret = sev_ioctl(s->sev_fd, KVM_SEV_SEND_START, &start, &fw_error);
+   if (ret < 0) {
+       error_report("%s: SEND_START ret=%d fw_error=%d '%s'",
+               __func__, ret, fw_error, fw_error_to_str(fw_error));
+       goto err;
+   }
+
+   qemu_put_be32(f, start.policy);
+   qemu_put_be32(f, pdh_len);
+   qemu_put_buffer(f, (uint8_t *)pdh, pdh_len);
+   qemu_put_be32(f, start.session_len);
+   qemu_put_buffer(f, (uint8_t *)start.session_uaddr, start.session_len);
+   *bytes_sent = 12 + pdh_len + start.session_len;
+
+   sev_set_guest_state(s, SEV_STATE_SEND_UPDATE);
+
+err:
+   g_free(pdh);
+   g_free(plat_cert);
+   return ret;
+}
+
+static int
+sev_send_get_packet_len(int *fw_err)
+{
+    int ret;
+    struct kvm_sev_send_update_data update = { 0, };
+
+    ret = sev_ioctl(sev_guest->sev_fd, KVM_SEV_SEND_UPDATE_DATA,
+                    &update, fw_err);
+    if (*fw_err != SEV_RET_INVALID_LEN) {
+        ret = -1;
+        error_report("%s: failed to get session length ret=%d fw_error=%d '%s'",
+                    __func__, ret, *fw_err, fw_error_to_str(*fw_err));
+        goto err;
+    }
+
+    ret = update.hdr_len;
+
+err:
+    return ret;
+}
+
+static int
+sev_send_update_data(SevGuestState *s, QEMUFile *f, uint8_t *ptr, uint32_t size,
+                     uint64_t *bytes_sent)
+{
+    int ret, fw_error;
+    guchar *trans;
+    struct kvm_sev_send_update_data update = { };
+
+    /*
+     * If this is the first call, query the packet header length and
+     * allocate the packet buffer.
+     */
+    if (!s->send_packet_hdr) {
+        s->send_packet_hdr_len = sev_send_get_packet_len(&fw_error);
+        if (s->send_packet_hdr_len < 1) {
+            error_report("%s: SEND_UPDATE fw_error=%d '%s'",
+                         __func__, fw_error, fw_error_to_str(fw_error));
+            return 1;
+        }
+
+        s->send_packet_hdr = g_new(gchar, s->send_packet_hdr_len);
+    }
+
+    /* allocate transport buffer */
+    trans = g_new(guchar, size);
+
+    update.hdr_uaddr = (uintptr_t)s->send_packet_hdr;
+    update.hdr_len = s->send_packet_hdr_len;
+    update.guest_uaddr = (uintptr_t)ptr;
+    update.guest_len = size;
+    update.trans_uaddr = (uintptr_t)trans;
+    update.trans_len = size;
+
+    trace_kvm_sev_send_update_data(ptr, trans, size);
+
+    ret = sev_ioctl(s->sev_fd, KVM_SEV_SEND_UPDATE_DATA, &update, &fw_error);
+    if (ret) {
+        error_report("%s: SEND_UPDATE_DATA ret=%d fw_error=%d '%s'",
+                     __func__, ret, fw_error, fw_error_to_str(fw_error));
+        goto err;
+    }
+
+    qemu_put_be32(f, update.hdr_len);
+    qemu_put_buffer(f, (uint8_t *)update.hdr_uaddr, update.hdr_len);
+    *bytes_sent = 4 + update.hdr_len;
+
+    qemu_put_be32(f, update.trans_len);
+    qemu_put_buffer(f, (uint8_t *)update.trans_uaddr, update.trans_len);
+    *bytes_sent += (4 + update.trans_len);
+
+err:
+    g_free(trans);
+    return ret;
+}
+
+int sev_save_outgoing_page(QEMUFile *f, uint8_t *ptr,
+                           uint32_t sz, uint64_t *bytes_sent)
+{
+    SevGuestState *s = sev_guest;
+
+    /*
+     * If this is the first buffer, create the outgoing encryption context
+     * and write our PDH, policy and session data.
+     */
+    if (!sev_check_state(s, SEV_STATE_SEND_UPDATE) &&
+        sev_send_start(s, f, bytes_sent)) {
+        error_report("Failed to create outgoing context");
+        return 1;
+    }
+
+    return sev_send_update_data(s, f, ptr, sz, bytes_sent);
+}
+
 static const QemuUUID sev_hash_table_header_guid = {
     .data = UUID_LE(0x9438d606, 0x4f22, 0x4cc9, 0xb4, 0x79, 0xa7, 0x93,
                     0xd4, 0x11, 0xfd, 0x21)
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 14801ec44db..19d3136dd80 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -53,6 +53,8 @@ extern bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **er
 int sev_encrypt_flash(uint8_t *ptr, uint64_t len, Error **errp);
 int sev_save_setup(const char *pdh, const char *plat_cert,
                    const char *amd_cert);
+int sev_save_outgoing_page(QEMUFile *f, uint8_t *ptr,
+                           uint32_t size, uint64_t *bytes_sent);
 int sev_inject_launch_secret(const char *hdr, const char *secret,
                              uint64_t gpa, Error **errp);
 
diff --git a/target/i386/trace-events b/target/i386/trace-events
index b7da9bd7481..c165e373710 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -11,6 +11,9 @@ kvm_sev_launch_measurement(const char *value) "data %s"
 kvm_sev_launch_finish(void) ""
 kvm_sev_launch_secret(uint64_t hpa, uint64_t hva, uint64_t secret, int len) "hpa 0x%" PRIx64 " hva 0x%" PRIx64 " data 0x%" PRIx64 " len %d"
 kvm_sev_attestation_report(const char *mnonce, const char *data) "mnonce %s data %s"
+kvm_sev_send_start(uint64_t pdh, int l1, uint64_t plat, int l2, uint64_t amd, int l3) "pdh 0x%" PRIx64 " len %d plat 0x%" PRIx64 " len %d amd 0x%" PRIx64 " len %d"
+kvm_sev_send_update_data(void *src, void *dst, int len) "guest %p trans %p len %d"
+kvm_sev_send_finish(void) ""
 
 # csv.c
 kvm_csv_launch_encrypt_data(uint64_t gpa, void *addr, uint64_t len) "gpa 0x%" PRIx64 "addr %p len 0x%" PRIu64
-- 
Gitee


From 6d2c35000b02b1195b0292858ace3d625204f83a Mon Sep 17 00:00:00 2001
From: Brijesh Singh <brijesh.singh@amd.com>
Date: Tue, 27 Jul 2021 13:00:50 +0000
Subject: [PATCH 358/407] target/i386: sev: add support to load incoming
 encrypted page

cherry-picked from https://github.com/AMDESE/qemu/commit/e86e5dccb045.

sev_load_incoming_page() provides the implementation to read the
incoming guest private pages from the socket and load them into guest
memory. The routine uses the RECEIVE_START command to create the
incoming encryption context on the first call, then uses the
RECEIVE_UPDATE_DATA command to load the encrypted pages into guest
memory. After migration is completed, we issue the RECEIVE_FINISH command
to transition the SEV guest to the runnable state so that it can be
executed.
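
As a rough illustration of the call sequence described above (not the
QEMU code itself), the following self-contained C sketch models the
receiver as a small state machine. The enum, struct and function names
are hypothetical stand-ins, and counters take the place of the real
KVM ioctls.

```c
#include <assert.h>

/* Hypothetical receiver states; these are not QEMU's SevState values. */
enum rx_state { RX_IDLE, RX_RECEIVE_UPDATE, RX_RUNNING };

struct rx_ctx {
    enum rx_state state;
    int starts;    /* times RECEIVE_START would have been issued */
    int updates;   /* times RECEIVE_UPDATE_DATA would have been issued */
    int finishes;  /* times RECEIVE_FINISH would have been issued */
};

/* Called once per incoming encrypted page. */
static int rx_load_page(struct rx_ctx *c)
{
    /* Only the first page creates the incoming encryption context. */
    if (c->state != RX_RECEIVE_UPDATE) {
        c->starts++;
        c->state = RX_RECEIVE_UPDATE;
    }
    c->updates++;
    return 0;
}

/* Called when the VM is about to run after migration completes. */
static void rx_vm_start(struct rx_ctx *c)
{
    if (c->state == RX_RECEIVE_UPDATE) {
        c->finishes++;
        c->state = RX_RUNNING;
    }
}
```

Calling rx_load_page() three times and then rx_vm_start() once should
leave the counters at one start, three updates and one finish.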

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 target/i386/sev.c        | 137 ++++++++++++++++++++++++++++++++++++++-
 target/i386/sev.h        |   1 +
 target/i386/trace-events |   3 +
 3 files changed, 140 insertions(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 491bdc91c27..9420d463383 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -175,6 +175,7 @@ static const char *const sev_fw_errlist[] = {
 static struct ConfidentialGuestMemoryEncryptionOps sev_memory_encryption_ops = {
     .save_setup = sev_save_setup,
     .save_outgoing_page = sev_save_outgoing_page,
+    .load_incoming_page = sev_load_incoming_page,
 };
 
 static int
@@ -877,13 +878,33 @@ sev_launch_finish(SevGuestState *sev)
     migrate_add_blocker(sev_mig_blocker, &error_fatal);
 }
 
+static int
+sev_receive_finish(SevGuestState *s)
+{
+    int error, ret = 1;
+
+    trace_kvm_sev_receive_finish();
+    ret = sev_ioctl(s->sev_fd, KVM_SEV_RECEIVE_FINISH, 0, &error);
+    if (ret) {
+        error_report("%s: RECEIVE_FINISH ret=%d fw_error=%d '%s'",
+                     __func__, ret, error, fw_error_to_str(error));
+        goto err;
+    }
+
+    sev_set_guest_state(s, SEV_STATE_RUNNING);
+err:
+    return ret;
+}
+
 static void
 sev_vm_state_change(void *opaque, bool running, RunState state)
 {
     SevGuestState *sev = opaque;
 
     if (running) {
-        if (!sev_check_state(sev, SEV_STATE_RUNNING)) {
+        if (sev_check_state(sev, SEV_STATE_RECEIVE_UPDATE)) {
+            sev_receive_finish(sev);
+        } else if (!sev_check_state(sev, SEV_STATE_RUNNING)) {
             sev_launch_finish(sev);
         }
     }
@@ -1515,6 +1536,120 @@ int sev_save_outgoing_page(QEMUFile *f, uint8_t *ptr,
     return sev_send_update_data(s, f, ptr, sz, bytes_sent);
 }
 
+static int
+sev_receive_start(SevGuestState *sev, QEMUFile *f)
+{
+    int ret = 1;
+    int fw_error;
+    struct kvm_sev_receive_start start = { };
+    gchar *session = NULL, *pdh_cert = NULL;
+
+    /* get SEV guest handle */
+    start.handle = object_property_get_int(OBJECT(sev), "handle",
+                                           &error_abort);
+
+    /* get the source policy */
+    start.policy = qemu_get_be32(f);
+
+    /* get source PDH key */
+    start.pdh_len = qemu_get_be32(f);
+    if (!check_blob_length(start.pdh_len)) {
+        return 1;
+    }
+
+    pdh_cert = g_new(gchar, start.pdh_len);
+    qemu_get_buffer(f, (uint8_t *)pdh_cert, start.pdh_len);
+    start.pdh_uaddr = (uintptr_t)pdh_cert;
+
+    /* get source session data */
+    start.session_len = qemu_get_be32(f);
+    if (!check_blob_length(start.session_len)) {
+        goto err;
+    }
+    session = g_new(gchar, start.session_len);
+    qemu_get_buffer(f, (uint8_t *)session, start.session_len);
+    start.session_uaddr = (uintptr_t)session;
+
+    trace_kvm_sev_receive_start(start.policy, session, pdh_cert);
+
+    ret = sev_ioctl(sev_guest->sev_fd, KVM_SEV_RECEIVE_START,
+                    &start, &fw_error);
+    if (ret < 0) {
+        error_report("Error RECEIVE_START ret=%d fw_error=%d '%s'",
+                      ret, fw_error, fw_error_to_str(fw_error));
+        goto err;
+    }
+
+    object_property_set_int(OBJECT(sev), "handle", start.handle, &error_abort);
+    sev_set_guest_state(sev, SEV_STATE_RECEIVE_UPDATE);
+err:
+    g_free(session);
+    g_free(pdh_cert);
+
+    return ret;
+}
+
+static int sev_receive_update_data(QEMUFile *f, uint8_t *ptr)
+{
+    int ret = 1, fw_error = 0;
+    gchar *hdr = NULL, *trans = NULL;
+    struct kvm_sev_receive_update_data update = {};
+
+    /* get packet header */
+    update.hdr_len = qemu_get_be32(f);
+    if (!check_blob_length(update.hdr_len)) {
+        return 1;
+    }
+
+    hdr = g_new(gchar, update.hdr_len);
+    qemu_get_buffer(f, (uint8_t *)hdr, update.hdr_len);
+    update.hdr_uaddr = (uintptr_t)hdr;
+
+    /* get transport buffer */
+    update.trans_len = qemu_get_be32(f);
+    if (!check_blob_length(update.trans_len)) {
+        goto err;
+    }
+
+    trans = g_new(gchar, update.trans_len);
+    update.trans_uaddr = (uintptr_t)trans;
+    qemu_get_buffer(f, (uint8_t *)update.trans_uaddr, update.trans_len);
+
+    update.guest_uaddr = (uintptr_t) ptr;
+    update.guest_len = update.trans_len;
+
+    trace_kvm_sev_receive_update_data(trans, ptr, update.guest_len,
+            hdr, update.hdr_len);
+
+    ret = sev_ioctl(sev_guest->sev_fd, KVM_SEV_RECEIVE_UPDATE_DATA,
+                    &update, &fw_error);
+    if (ret) {
+        error_report("Error RECEIVE_UPDATE_DATA ret=%d fw_error=%d '%s'",
+                     ret, fw_error, fw_error_to_str(fw_error));
+        goto err;
+    }
+err:
+    g_free(trans);
+    g_free(hdr);
+    return ret;
+}
+
+int sev_load_incoming_page(QEMUFile *f, uint8_t *ptr)
+{
+    SevGuestState *s = sev_guest;
+
+    /*
+     * If this is the first buffer and SEV is not in the receiving state,
+     * use the RECEIVE_START command to create an encryption context.
+     */
+    if (!sev_check_state(s, SEV_STATE_RECEIVE_UPDATE) &&
+        sev_receive_start(s, f)) {
+        return 1;
+    }
+
+    return sev_receive_update_data(f, ptr);
+}
+
 static const QemuUUID sev_hash_table_header_guid = {
     .data = UUID_LE(0x9438d606, 0x4f22, 0x4cc9, 0xb4, 0x79, 0xa7, 0x93,
                     0xd4, 0x11, 0xfd, 0x21)
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 19d3136dd80..1040b0cb2e1 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -55,6 +55,7 @@ int sev_save_setup(const char *pdh, const char *plat_cert,
                    const char *amd_cert);
 int sev_save_outgoing_page(QEMUFile *f, uint8_t *ptr,
                            uint32_t size, uint64_t *bytes_sent);
+int sev_load_incoming_page(QEMUFile *f, uint8_t *ptr);
 int sev_inject_launch_secret(const char *hdr, const char *secret,
                              uint64_t gpa, Error **errp);
 
diff --git a/target/i386/trace-events b/target/i386/trace-events
index c165e373710..e32b0319ba3 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -14,6 +14,9 @@ kvm_sev_attestation_report(const char *mnonce, const char *data) "mnonce %s data
 kvm_sev_send_start(uint64_t pdh, int l1, uint64_t plat, int l2, uint64_t amd, int l3) "pdh 0x%" PRIx64 " len %d plat 0x%" PRIx64 " len %d amd 0x%" PRIx64 " len %d"
 kvm_sev_send_update_data(void *src, void *dst, int len) "guest %p trans %p len %d"
 kvm_sev_send_finish(void) ""
+kvm_sev_receive_start(int policy, void *session, void *pdh) "policy 0x%x session %p pdh %p"
+kvm_sev_receive_update_data(void *src, void *dst, int len, void *hdr, int hdr_len) "guest %p trans %p len %d hdr %p hdr_len %d"
+kvm_sev_receive_finish(void) ""
 
 # csv.c
 kvm_csv_launch_encrypt_data(uint64_t gpa, void *addr, uint64_t len) "gpa 0x%" PRIx64 "addr %p len 0x%" PRIu64
-- 
Gitee


From 7746f784ff016c2fb610716dc5cbb1310d89e9dd Mon Sep 17 00:00:00 2001
From: Ashish Kalra <ashish.kalra@amd.com>
Date: Tue, 27 Jul 2021 15:05:49 +0000
Subject: [PATCH 359/407] kvm: Add support for SEV shared regions list and
 KVM_EXIT_HYPERCALL.

cherry-picked from https://github.com/AMDESE/qemu/commit/fcbbd9b19ac.

The KVM_HC_MAP_GPA_RANGE hypercall is used by the SEV guest to notify
the hypervisor of a change in the page encryption status. The hypercall
should be invoked only when the encryption attribute changes from
encrypted to decrypted or vice versa. By default, all guest pages are
considered encrypted.

The hypercall exits to userspace with the KVM_EXIT_HYPERCALL exit code.
Currently this is used only by SEV guests for guest page encryption
status tracking. Add support to handle this exit and invoke the SEV
shared regions list handlers.

Add support for SEV guest shared regions and implementation of the
SEV shared regions list.
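
The exit handling described above can be sketched as follows. This is
a simplified illustration only: it swaps the sorted regions list used
by this patch for a fixed-size boolean array, assumes 4 KiB guest
pages, and every name in it (SKETCH_PAGE_BITS, SKETCH_MAX_GFN,
sketch_map_gpa_range) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12    /* assumes 4 KiB guest pages */
#define SKETCH_MAX_GFN   1024  /* toy guest size, illustration only */

/* false means encrypted, which is the default for every page. */
static bool shared[SKETCH_MAX_GFN];

/* Apply one notification carrying (gpa, npages, enc). */
static int sketch_map_gpa_range(uint64_t gpa, uint64_t npages, bool enc)
{
    uint64_t gfn_start = gpa >> SKETCH_PAGE_BITS;
    uint64_t gfn_end = gfn_start + npages;  /* exclusive upper bound */

    /* Reject ranges that wrap around or exceed the toy guest. */
    if (gfn_end < gfn_start || gfn_end > SKETCH_MAX_GFN) {
        return -1;
    }
    for (uint64_t gfn = gfn_start; gfn < gfn_end; gfn++) {
        shared[gfn] = !enc;
    }
    return 0;
}
```

A range list scales better than a per-page array for large guests,
which is presumably why the patch tracks [gfn_start, gfn_end) pairs
instead.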

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 linux-headers/linux/kvm.h |   3 ++
 target/i386/kvm/kvm.c     |  48 +++++++++++++++++
 target/i386/sev.c         | 105 ++++++++++++++++++++++++++++++++++++++
 target/i386/sev.h         |   3 ++
 4 files changed, 159 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 36350607f54..8ed722236ee 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -344,6 +344,7 @@ struct kvm_run {
 		} mmio;
 		/* KVM_EXIT_HYPERCALL */
 		struct {
+#define KVM_HC_MAP_GPA_RANGE 12
 			__u64 nr;
 			__u64 args[6];
 			__u64 ret;
@@ -1154,6 +1155,8 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 
+#define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
+
 #ifdef KVM_CAP_IRQ_ROUTING
 
 struct kvm_irq_routing_irqchip {
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a06221d3e53..5262806c47c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -127,6 +127,7 @@ static int has_xsave2;
 static int has_xcrs;
 static int has_pit_state2;
 static int has_exception_payload;
+static int has_map_gpa_range;
 
 static bool has_msr_mcg_ext_ctl;
 
@@ -2052,6 +2053,17 @@ int kvm_arch_init_vcpu(CPUState *cs)
         c->eax = MAX(c->eax, KVM_CPUID_SIGNATURE | 0x10);
     }
 
+    if (sev_enabled()) {
+        c = cpuid_find_entry(&cpuid_data.cpuid,
+                             KVM_CPUID_FEATURES | kvm_base, 0);
+        if (c) {
+            c->eax |= (1 << KVM_FEATURE_MIGRATION_CONTROL);
+            if (has_map_gpa_range) {
+                c->eax |= (1 << KVM_FEATURE_HC_MAP_GPA_RANGE);
+            }
+        }
+    }
+
     cpuid_data.cpuid.nent = cpuid_i;
 
     cpuid_data.cpuid.padding = 0;
@@ -2397,6 +2409,17 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         }
     }
 
+    has_map_gpa_range = kvm_check_extension(s, KVM_CAP_EXIT_HYPERCALL);
+    if (has_map_gpa_range) {
+        ret = kvm_vm_enable_cap(s, KVM_CAP_EXIT_HYPERCALL, 0,
+                                KVM_EXIT_HYPERCALL_VALID_MASK);
+        if (ret < 0) {
+            error_report("kvm: Failed to enable MAP_GPA_RANGE cap: %s",
+                         strerror(-ret));
+            return ret;
+        }
+    }
+
     ret = kvm_get_supported_msrs(s);
     if (ret < 0) {
         return ret;
@@ -4612,6 +4635,28 @@ static int kvm_handle_tpr_access(X86CPU *cpu)
     return 1;
 }
 
+static int kvm_handle_exit_hypercall(X86CPU *cpu, struct kvm_run *run)
+{
+    /*
+     * Currently this exit is only used by SEV guests for
+     * guest page encryption status tracking.
+     */
+    if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) {
+        unsigned long enc = run->hypercall.args[2];
+        unsigned long gpa = run->hypercall.args[0];
+        unsigned long npages = run->hypercall.args[1];
+        unsigned long gfn_start = gpa >> TARGET_PAGE_BITS;
+        unsigned long gfn_end = gfn_start + npages;
+
+        if (enc) {
+            sev_remove_shared_regions_list(gfn_start, gfn_end);
+        } else {
+            sev_add_shared_regions_list(gfn_start, gfn_end);
+        }
+    }
+    return 0;
+}
+
 int kvm_arch_insert_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
 {
     static const uint8_t int3 = 0xcc;
@@ -4902,6 +4947,9 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         /* already handled in kvm_arch_post_run */
         ret = 0;
         break;
+    case KVM_EXIT_HYPERCALL:
+        ret = kvm_handle_exit_hypercall(cpu, run);
+        break;
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 9420d463383..1aa100f809f 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -45,6 +45,10 @@
 #define TYPE_SEV_GUEST "sev-guest"
 OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST)
 
+struct shared_region {
+    unsigned long gfn_start, gfn_end;
+    QTAILQ_ENTRY(shared_region) list;
+};
 
 extern struct sev_ops sev_ops;
 
@@ -90,6 +94,8 @@ struct SevGuestState {
     uint32_t reset_cs;
     uint32_t reset_ip;
     bool reset_data_valid;
+
+    QTAILQ_HEAD(, shared_region) shared_regions_list;
 };
 
 #define DEFAULT_GUEST_POLICY    0x1 /* disable debug */
@@ -1116,6 +1122,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
     add_migration_state_change_notifier(&sev_migration_state_notify);
 
     cgs_class->memory_encryption_ops = &sev_memory_encryption_ops;
+    QTAILQ_INIT(&sev->shared_regions_list);
 
     cgs->ready = true;
 
@@ -1650,6 +1657,104 @@ int sev_load_incoming_page(QEMUFile *f, uint8_t *ptr)
     return sev_receive_update_data(f, ptr);
 }
 
+int sev_remove_shared_regions_list(unsigned long start, unsigned long end)
+{
+    SevGuestState *s = sev_guest;
+    struct shared_region *pos;
+
+    QTAILQ_FOREACH(pos, &s->shared_regions_list, list) {
+        unsigned long l, r;
+        unsigned long curr_gfn_end = pos->gfn_end;
+
+        /*
+         * Check whether the two ranges intersect.
+         * l is the left bound of the intersecting segment.
+         */
+        l = MAX(start, pos->gfn_start);
+        /* right bound for intersecting segment */
+        r = MIN(end, pos->gfn_end);
+        if (l <= r) {
+            if (pos->gfn_start == l && pos->gfn_end == r) {
+                QTAILQ_REMOVE(&s->shared_regions_list, pos, list);
+            } else if (l == pos->gfn_start) {
+                pos->gfn_start = r;
+            } else if (r == pos->gfn_end) {
+                pos->gfn_end = l;
+            } else {
+                /* Do a de-merge -- split linked list nodes */
+                struct shared_region *shrd_region;
+
+                pos->gfn_end = l;
+                shrd_region = g_malloc0(sizeof(*shrd_region));
+                if (!shrd_region) {
+                    return 0;
+                }
+                shrd_region->gfn_start = r;
+                shrd_region->gfn_end = curr_gfn_end;
+                QTAILQ_INSERT_AFTER(&s->shared_regions_list, pos,
+                                    shrd_region, list);
+            }
+        }
+        if (end <= curr_gfn_end) {
+            break;
+        }
+    }
+    return 0;
+}
+
+int sev_add_shared_regions_list(unsigned long start, unsigned long end)
+{
+    struct shared_region *shrd_region;
+    struct shared_region *pos;
+    SevGuestState *s = sev_guest;
+
+    if (QTAILQ_EMPTY(&s->shared_regions_list)) {
+        shrd_region = g_malloc0(sizeof(*shrd_region));
+        if (!shrd_region) {
+            return -1;
+        }
+        shrd_region->gfn_start = start;
+        shrd_region->gfn_end = end;
+        QTAILQ_INSERT_TAIL(&s->shared_regions_list, shrd_region, list);
+        return 0;
+    }
+
+    /*
+     * The shared regions list is sorted in ascending order of guest
+     * PAs, and consecutive ranges of guest PAs are merged.
+     */
+    QTAILQ_FOREACH(pos, &s->shared_regions_list, list) {
+        /* handle duplicate overlapping regions */
+        if (start >= pos->gfn_start && end <= pos->gfn_end) {
+            return 0;
+        }
+        if (pos->gfn_end < start) {
+            continue;
+        }
+        /* merge consecutive guest PA(s) -- forward merge */
+        if (pos->gfn_start <= start && pos->gfn_end >= start) {
+            pos->gfn_end = end;
+            return 0;
+        }
+        break;
+    }
+    /*
+     * Add a new node
+     */
+    shrd_region = g_malloc0(sizeof(*shrd_region));
+    if (!shrd_region) {
+        return -1;
+    }
+    shrd_region->gfn_start = start;
+    shrd_region->gfn_end = end;
+    if (pos) {
+        QTAILQ_INSERT_BEFORE(pos, shrd_region, list);
+    } else {
+        QTAILQ_INSERT_TAIL(&s->shared_regions_list, shrd_region, list);
+    }
+    return 1;
+}
+
 static const QemuUUID sev_hash_table_header_guid = {
     .data = UUID_LE(0x9438d606, 0x4f22, 0x4cc9, 0xb4, 0x79, 0xa7, 0x93,
                     0xd4, 0x11, 0xfd, 0x21)
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 1040b0cb2e1..8423b552245 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -61,6 +61,9 @@ int sev_inject_launch_secret(const char *hdr, const char *secret,
 
 int sev_es_save_reset_vector(void *flash_ptr, uint64_t flash_size);
 void sev_es_set_reset_vector(CPUState *cpu);
+int sev_remove_shared_regions_list(unsigned long gfn_start,
+                                   unsigned long gfn_end);
+int sev_add_shared_regions_list(unsigned long gfn_start, unsigned long gfn_end);
 
 int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
 
-- 
Gitee


From 9cd783f9cdf51f35d7990afd0da82626af6d5338 Mon Sep 17 00:00:00 2001
From: Brijesh Singh <brijesh.singh@amd.com>
Date: Tue, 27 Jul 2021 16:31:36 +0000
Subject: [PATCH 360/407] migration: add support to migrate shared regions list

cherry-picked from https://github.com/AMDESE/qemu/commit/9236f522e48b6.

When memory encryption is enabled, the hypervisor maintains a shared
regions list, which it consults during migration to check whether a
page is private or shared. This list is built during VM boot and
must be migrated to the target host so that the hypervisor on the
target host can use it for future migrations.
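
The stream layout produced for the list is a sequence of
(CONT, gfn_start, gfn_end) triples of big-endian 32-bit words followed
by a single END word. The sketch below encodes and decodes that layout
against a plain byte buffer rather than a QEMUFile. The helper names
and struct range are hypothetical, and unlike the loader in this patch
the sketch rejects truncated input and unknown tags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIST_CONT 0x1u  /* same value as SHARED_REGION_LIST_CONT */
#define LIST_END  0x2u  /* same value as SHARED_REGION_LIST_END */

struct range { uint32_t gfn_start, gfn_end; };

static size_t put_be32(uint8_t *buf, size_t off, uint32_t v)
{
    buf[off]     = (uint8_t)(v >> 24);
    buf[off + 1] = (uint8_t)(v >> 16);
    buf[off + 2] = (uint8_t)(v >> 8);
    buf[off + 3] = (uint8_t)v;
    return off + 4;
}

static uint32_t get_be32(const uint8_t *buf, size_t *off)
{
    uint32_t v = ((uint32_t)buf[*off] << 24) |
                 ((uint32_t)buf[*off + 1] << 16) |
                 ((uint32_t)buf[*off + 2] << 8) |
                 (uint32_t)buf[*off + 3];
    *off += 4;
    return v;
}

/* Serialize n ranges; buf must hold at least 12 * n + 4 bytes. */
static size_t encode_list(uint8_t *buf, const struct range *r, size_t n)
{
    size_t off = 0;

    for (size_t i = 0; i < n; i++) {
        off = put_be32(buf, off, LIST_CONT);
        off = put_be32(buf, off, r[i].gfn_start);
        off = put_be32(buf, off, r[i].gfn_end);
    }
    return put_be32(buf, off, LIST_END);
}

/* Parse up to the END marker; returns the range count or -1 on error. */
static int decode_list(const uint8_t *buf, size_t len,
                       struct range *out, size_t max)
{
    size_t off = 0, count = 0;

    for (;;) {
        if (len - off < 4) {
            return -1;  /* truncated before a tag */
        }
        uint32_t tag = get_be32(buf, &off);
        if (tag == LIST_END) {
            return (int)count;
        }
        if (tag != LIST_CONT || len - off < 8 || count == max) {
            return -1;  /* unknown tag, truncated pair, or out[] full */
        }
        out[count].gfn_start = get_be32(buf, &off);
        out[count].gfn_end = get_be32(buf, &off);
        count++;
    }
}
```

Note that 32-bit fields cannot represent frame numbers above 2^32 - 1,
so the sketch inherits the same range limit as the be32 encoding.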

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 target/i386/sev.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 target/i386/sev.h |  2 ++
 2 files changed, 45 insertions(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 1aa100f809f..42b3a6ef79d 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -178,10 +178,15 @@ static const char *const sev_fw_errlist[] = {
 
 #define SEV_FW_BLOB_MAX_SIZE            0x4000          /* 16KB */
 
+#define SHARED_REGION_LIST_CONT     0x1
+#define SHARED_REGION_LIST_END      0x2
+
 static struct ConfidentialGuestMemoryEncryptionOps sev_memory_encryption_ops = {
     .save_setup = sev_save_setup,
     .save_outgoing_page = sev_save_outgoing_page,
     .load_incoming_page = sev_load_incoming_page,
+    .save_outgoing_shared_regions_list = sev_save_outgoing_shared_regions_list,
+    .load_incoming_shared_regions_list = sev_load_incoming_shared_regions_list,
 };
 
 static int
@@ -1755,6 +1760,44 @@ int sev_add_shared_regions_list(unsigned long start, unsigned long end)
     return 1;
 }
 
+int sev_save_outgoing_shared_regions_list(QEMUFile *f)
+{
+    SevGuestState *s = sev_guest;
+    struct shared_region *pos;
+
+    QTAILQ_FOREACH(pos, &s->shared_regions_list, list) {
+        qemu_put_be32(f, SHARED_REGION_LIST_CONT);
+        qemu_put_be32(f, pos->gfn_start);
+        qemu_put_be32(f, pos->gfn_end);
+    }
+
+    qemu_put_be32(f, SHARED_REGION_LIST_END);
+    return 0;
+}
+
+int sev_load_incoming_shared_regions_list(QEMUFile *f)
+{
+    SevGuestState *s = sev_guest;
+    struct shared_region *shrd_region;
+    int status;
+
+    status = qemu_get_be32(f);
+    while (status == SHARED_REGION_LIST_CONT) {
+
+        shrd_region = g_malloc0(sizeof(*shrd_region));
+        if (!shrd_region) {
+            return 0;
+        }
+        shrd_region->gfn_start = qemu_get_be32(f);
+        shrd_region->gfn_end = qemu_get_be32(f);
+
+        QTAILQ_INSERT_TAIL(&s->shared_regions_list, shrd_region, list);
+
+        status = qemu_get_be32(f);
+    }
+    return 0;
+}
+
 static const QemuUUID sev_hash_table_header_guid = {
     .data = UUID_LE(0x9438d606, 0x4f22, 0x4cc9, 0xb4, 0x79, 0xa7, 0x93,
                     0xd4, 0x11, 0xfd, 0x21)
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 8423b552245..34814b9f21b 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -64,6 +64,8 @@ void sev_es_set_reset_vector(CPUState *cpu);
 int sev_remove_shared_regions_list(unsigned long gfn_start,
                                    unsigned long gfn_end);
 int sev_add_shared_regions_list(unsigned long gfn_start, unsigned long gfn_end);
+int sev_save_outgoing_shared_regions_list(QEMUFile *f);
+int sev_load_incoming_shared_regions_list(QEMUFile *f);
 
 int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
 
-- 
Gitee


From 7fd875536eb6b8dc9907c472b1096b216f0e9071 Mon Sep 17 00:00:00 2001
From: Brijesh Singh <brijesh.singh@amd.com>
Date: Tue, 27 Jul 2021 16:53:19 +0000
Subject: [PATCH 361/407] migration/ram: add support to send encrypted pages

cherry-picked from https://github.com/AMDESE/qemu/commit/2d6bda0d4cf.

When memory encryption is enabled, the guest memory is encrypted with
the guest-specific key. This patch introduces the
RAM_SAVE_FLAG_ENCRYPTED_DATA flag to distinguish encrypted data from
plaintext. Encrypted pages may need special handling:
sev_save_outgoing_page() is used by the sender to write the encrypted
pages onto the socket, and sev_load_incoming_page() is used by the
target to read the encrypted pages from the socket and load them into
guest memory.
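
The sender's choice between the encrypted and the plaintext path can
be sketched roughly as below. The sketch assumes 4 KiB target pages
and half-open [start, end) shared ranges, and it replaces the
memory_encryption_ops callback with a linear search over an array.
All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12  /* assumes 4 KiB target pages */

struct gfn_range { uint64_t start, end; };  /* half-open: [start, end) */

static bool gfn_is_shared(const struct gfn_range *r, size_t n, uint64_t gfn)
{
    for (size_t i = 0; i < n; i++) {
        if (gfn >= r[i].start && gfn < r[i].end) {
            return true;
        }
    }
    return false;
}

/*
 * block_gpa is the guest-physical base of the RAM block and page is the
 * page index inside that block.
 */
static bool page_needs_encrypted_path(bool is_rom, uint64_t block_gpa,
                                      uint64_t page,
                                      const struct gfn_range *shared,
                                      size_t n)
{
    uint64_t gfn;

    /* ROM contents are not encrypted, so they use the normal path. */
    if (is_rom) {
        return false;
    }
    gfn = page + (block_gpa >> SKETCH_PAGE_BITS);
    return !gfn_is_shared(shared, n, gfn);
}
```

Pages for which this predicate is false fall through to the existing
compression and normal-page paths unchanged.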

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 migration/migration.h |   2 +
 migration/ram.c       | 163 +++++++++++++++++++++++++++++++++++++++++-
 target/i386/sev.c     |  14 ++++
 target/i386/sev.h     |   4 ++
 4 files changed, 182 insertions(+), 1 deletion(-)

diff --git a/migration/migration.h b/migration/migration.h
index 0ae2133326b..596668d806b 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -403,4 +403,6 @@ void migration_cancel(const Error *error);
 
 void populate_vfio_info(MigrationInfo *info);
 
+bool memcrypt_enabled(void);
+
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index 93cdb456ac8..57f2c01bfee 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -55,6 +55,10 @@
 #include "qemu/iov.h"
 #include "multifd.h"
 #include "sysemu/runstate.h"
+#include "exec/confidential-guest-support.h"
+
+/* Defines RAM_SAVE_ENCRYPTED_PAGE and RAM_SAVE_SHARED_REGIONS_LIST */
+#include "target/i386/sev.h"
 
 #include "hw/boards.h" /* for machine_dump_guest_core() */
 
@@ -80,6 +84,16 @@
 #define RAM_SAVE_FLAG_XBZRLE   0x40
 /* 0x80 is reserved in migration.h start with 0x100 next */
 #define RAM_SAVE_FLAG_COMPRESS_PAGE    0x100
+#define RAM_SAVE_FLAG_ENCRYPTED_DATA   0x200
+
+bool memcrypt_enabled(void)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    if (ms->cgs) {
+        return ms->cgs->ready;
+    }
+    return false;
+}
 
 static inline bool is_zero_range(uint8_t *p, uint64_t size)
 {
@@ -468,6 +482,8 @@ static QemuCond decomp_done_cond;
 
 static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
                                  ram_addr_t offset, uint8_t *source_buf);
+static int ram_save_encrypted_page(RAMState *rs, PageSearchStatus *pss,
+                                   bool last_stage);
 
 static void *do_data_compress(void *opaque)
 {
@@ -1301,6 +1317,80 @@ static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
     return 1;
 }
 
+/**
+ * ram_save_encrypted_page - send the given encrypted page to the stream
+ */
+static int ram_save_encrypted_page(RAMState *rs, PageSearchStatus *pss,
+                                   bool last_stage)
+{
+    int ret;
+    uint8_t *p;
+    RAMBlock *block = pss->block;
+    ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
+    uint64_t bytes_xmit;
+    MachineState *ms = MACHINE(qdev_get_machine());
+    ConfidentialGuestSupportClass *cgs_class =
+        (ConfidentialGuestSupportClass *) object_get_class(OBJECT(ms->cgs));
+    struct ConfidentialGuestMemoryEncryptionOps *ops =
+        cgs_class->memory_encryption_ops;
+
+    p = block->host + offset;
+
+    ram_counters.transferred +=
+        save_page_header(rs, rs->f, block,
+                    offset | RAM_SAVE_FLAG_ENCRYPTED_DATA);
+
+    qemu_put_be32(rs->f, RAM_SAVE_ENCRYPTED_PAGE);
+    ret = ops->save_outgoing_page(rs->f, p, TARGET_PAGE_SIZE, &bytes_xmit);
+    if (ret) {
+        return -1;
+    }
+
+    ram_counters.transferred += bytes_xmit;
+    ram_counters.normal++;
+
+    return 1;
+}
+
+/**
+ * ram_save_shared_region_list: send the shared region list
+ */
+static int ram_save_shared_region_list(RAMState *rs, QEMUFile *f)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    ConfidentialGuestSupportClass *cgs_class =
+        (ConfidentialGuestSupportClass *) object_get_class(OBJECT(ms->cgs));
+    struct ConfidentialGuestMemoryEncryptionOps *ops =
+        cgs_class->memory_encryption_ops;
+
+    save_page_header(rs, rs->f, rs->last_seen_block,
+                     RAM_SAVE_FLAG_ENCRYPTED_DATA);
+    qemu_put_be32(rs->f, RAM_SAVE_SHARED_REGIONS_LIST);
+    return ops->save_outgoing_shared_regions_list(rs->f);
+}
+
+static int load_encrypted_data(QEMUFile *f, uint8_t *ptr)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    ConfidentialGuestSupportClass *cgs_class =
+        (ConfidentialGuestSupportClass *) object_get_class(OBJECT(ms->cgs));
+    struct ConfidentialGuestMemoryEncryptionOps *ops =
+        cgs_class->memory_encryption_ops;
+
+    int flag;
+
+    flag = qemu_get_be32(f);
+
+    if (flag == RAM_SAVE_ENCRYPTED_PAGE) {
+        return ops->load_incoming_page(f, ptr);
+    } else if (flag == RAM_SAVE_SHARED_REGIONS_LIST) {
+        return ops->load_incoming_shared_regions_list(f);
+    } else {
+        error_report("unknown encrypted flag %x", flag);
+        return 1;
+    }
+}
+
 /**
  * ram_save_page: send the given page to the stream
  *
@@ -2144,6 +2234,35 @@ static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
     return false;
 }
 
+/**
+ * encrypted_test_list: check if the page is encrypted
+ *
+ * Returns a bool indicating whether the page is encrypted.
+ */
+static bool encrypted_test_list(RAMState *rs, RAMBlock *block,
+                                  unsigned long page)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    ConfidentialGuestSupportClass *cgs_class =
+        (ConfidentialGuestSupportClass *) object_get_class(OBJECT(ms->cgs));
+    struct ConfidentialGuestMemoryEncryptionOps *ops =
+        cgs_class->memory_encryption_ops;
+    unsigned long gfn;
+
+    /* ROM devices contain unencrypted data */
+    if (memory_region_is_rom(block->mr)) {
+        return false;
+    }
+
+    /*
+     * Translate page in ram_addr_t address space to GPA address
+     * space using memory region.
+     */
+    gfn = page + (block->mr->addr >> TARGET_PAGE_BITS);
+
+    return ops->is_gfn_in_unshared_region(gfn);
+}
+
 /**
  * ram_save_target_page: save one target page
  *
@@ -2164,6 +2283,17 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
         return res;
     }
 
+    /*
+     * If memory encryption is enabled then use memory encryption APIs
+     * to write the outgoing buffer to the wire. The encryption APIs
+     * will take care of accessing the guest memory and re-encrypt it
+     * for the transport purposes.
+     */
+    if (memcrypt_enabled() &&
+        encrypted_test_list(rs, pss->block, pss->page)) {
+        return ram_save_encrypted_page(rs, pss, last_stage);
+    }
+
     if (save_compress_page(rs, block, offset)) {
         return 1;
     }
@@ -2990,6 +3120,18 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
     }
 }
 
+static int ram_encrypted_save_setup(void)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    ConfidentialGuestSupportClass *cgs_class =
+        (ConfidentialGuestSupportClass *) object_get_class(OBJECT(ms->cgs));
+    struct ConfidentialGuestMemoryEncryptionOps *ops =
+        cgs_class->memory_encryption_ops;
+    MigrationParameters *p = &migrate_get_current()->parameters;
+
+    return ops->save_setup(p->sev_pdh, p->sev_plat_cert, p->sev_amd_cert);
+}
+
 /*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
@@ -3025,6 +3167,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     (*rsp)->f = f;
 
     WITH_RCU_READ_LOCK_GUARD() {
+
+        if (memcrypt_enabled()) {
+            if (ram_encrypted_save_setup()) {
+                return -1;
+            }
+        }
+
         qemu_put_be64(f, ram_bytes_total_common(true) | RAM_SAVE_FLAG_MEM_SIZE);
 
         RAMBLOCK_FOREACH_MIGRATABLE(block) {
@@ -3217,6 +3366,11 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 
         flush_compressed_data(rs);
         ram_control_after_iterate(f, RAM_CONTROL_FINISH);
+
+        /* send the shared regions list */
+        if (memcrypt_enabled()) {
+            ret = ram_save_shared_region_list(rs, f);
+        }
     }
 
     if (ret < 0) {
@@ -4038,7 +4192,8 @@ static int ram_load_precopy(QEMUFile *f)
         }
 
         if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
-                     RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
+                     RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE |
+                     RAM_SAVE_FLAG_ENCRYPTED_DATA)) {
             RAMBlock *block = ram_block_from_stream(f, flags);
 
             host = host_from_ram_block_offset(block, addr);
@@ -4167,6 +4322,12 @@ static int ram_load_precopy(QEMUFile *f)
                 break;
             }
             break;
+        case RAM_SAVE_FLAG_ENCRYPTED_DATA:
+            if (load_encrypted_data(f, host)) {
+                    error_report("Failed to load encrypted data");
+                    ret = -EINVAL;
+            }
+            break;
         case RAM_SAVE_FLAG_EOS:
             /* normal exit */
             multifd_recv_sync_main();
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 42b3a6ef79d..fac81ba72cd 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -185,6 +185,7 @@ static struct ConfidentialGuestMemoryEncryptionOps sev_memory_encryption_ops = {
     .save_setup = sev_save_setup,
     .save_outgoing_page = sev_save_outgoing_page,
     .load_incoming_page = sev_load_incoming_page,
+    .is_gfn_in_unshared_region = sev_is_gfn_in_unshared_region,
     .save_outgoing_shared_regions_list = sev_save_outgoing_shared_regions_list,
     .load_incoming_shared_regions_list = sev_load_incoming_shared_regions_list,
 };
@@ -1798,6 +1799,19 @@ int sev_load_incoming_shared_regions_list(QEMUFile *f)
     return 0;
 }
 
+bool sev_is_gfn_in_unshared_region(unsigned long gfn)
+{
+    SevGuestState *s = sev_guest;
+    struct shared_region *pos;
+
+    QTAILQ_FOREACH(pos, &s->shared_regions_list, list) {
+        if (gfn >= pos->gfn_start && gfn < pos->gfn_end) {
+            return false;
+        }
+    }
+    return true;
+}
+
 static const QemuUUID sev_hash_table_header_guid = {
     .data = UUID_LE(0x9438d606, 0x4f22, 0x4cc9, 0xb4, 0x79, 0xa7, 0x93,
                     0xd4, 0x11, 0xfd, 0x21)
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 34814b9f21b..b8ad840141f 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -38,6 +38,9 @@ typedef struct SevKernelLoaderContext {
     size_t cmdline_size;
 } SevKernelLoaderContext;
 
+#define RAM_SAVE_ENCRYPTED_PAGE           0x1
+#define RAM_SAVE_SHARED_REGIONS_LIST      0x2
+
 #ifdef CONFIG_SEV
 bool sev_enabled(void);
 bool sev_es_enabled(void);
@@ -66,6 +69,7 @@ int sev_remove_shared_regions_list(unsigned long gfn_start,
 int sev_add_shared_regions_list(unsigned long gfn_start, unsigned long gfn_end);
 int sev_save_outgoing_shared_regions_list(QEMUFile *f);
 int sev_load_incoming_shared_regions_list(QEMUFile *f);
+bool sev_is_gfn_in_unshared_region(unsigned long gfn);
 
 int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
 
-- 
Gitee


From a4c3d580f2d2a340d769e599c9870e2a097d47d2 Mon Sep 17 00:00:00 2001
From: Ashish Kalra <ashish.kalra@amd.com>
Date: Tue, 27 Jul 2021 18:05:25 +0000
Subject: [PATCH 362/407] migration/ram: Force encrypted status for flash0 &
 flash1 devices.

cherry-picked from https://github.com/AMDESE/qemu/commit/803d6a4c8d.

Currently OVMF clears the C-bit and marks NonExistent memory space
as decrypted in the page encryption bitmap. Marking the NonExistent
memory space as decrypted guarantees that any future MMIO adds will
work correctly, but it also marks the flash0 device space as decrypted.
At reset the SEV core is in forced encrypted state, so this decrypted
marking of the flash0 device space causes VCPU reset to fail because
the flash0 device pages are migrated incorrectly.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 migration/ram.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 57f2c01bfee..da7a809945c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2254,6 +2254,14 @@ static bool encrypted_test_list(RAMState *rs, RAMBlock *block,
         return false;
     }
 
+    if (!strcmp(memory_region_name(block->mr), "system.flash0")) {
+        return true;
+    }
+
+    if (!strcmp(memory_region_name(block->mr), "system.flash1")) {
+        return false;
+    }
+
     /*
      * Translate page in ram_addr_t address space to GPA address
      * space using memory region.
-- 
Gitee


From c3f4c510fb2c5278e7fbc2882e433193af2a3369 Mon Sep 17 00:00:00 2001
From: Ashish Kalra <ashish.kalra@amd.com>
Date: Tue, 3 Aug 2021 16:06:27 +0000
Subject: [PATCH 363/407] migration: for SEV live migration bump downtime limit
 to 1s.

cherry-picked from https://github.com/AMDESE/qemu/commit/b1468803a2200.

QEMU has a default expected downtime of 300 ms, and SEV live migration
has a bandwidth of 350-450 pages per second (SEV live migration is
generally slow because guest RAM pages are encrypted by the security
processor before they are migrated).
With an expected downtime of 300 ms and a bandwidth of 350-450 pps,
the threshold size is less than 1/3 of the pps bandwidth, i.e. ~100 pages.

Now, this threshold size is the maximum pages/bytes that can be
sent in the final completion phase of Live migration
(where the source VM is stopped) with the expected downtime.
Therefore, with the threshold size computed above,
the migration completion phase which halts the source VM
and then transfers the leftover dirty pages,
is only reached in the SEV live migration case when the number of dirty pages is ~100.

The dirty-pages-rate with larger guest RAM configuration like 4G, 8G, etc.
is much higher, typically in the range of 300-400+ pages, hence,
we always remain in the "dirty-sync" phase of migration and never
reach the migration completion phase with above guest RAM configs.

To summarize, with larger guest RAM configs,
the dirty-pages-rate > threshold_size (with the default qemu expected downtime of 300ms).

So, the fix is to increase qemu's expected downtime.

This is a tweakable parameter which can be set using "migrate_set_downtime".

With a downtime of 1 second, we get a threshold size of ~350-450 pages,
which will handle the "dirty-pages-rate" of 300+ pages and complete
the migration process, so we bump the default downtime to 1s in case
of SEV live migration being active.
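
The arithmetic above can be sketched as follows. This is only an
illustration of the reasoning in pages per second; the helper names
are hypothetical, and QEMU itself computes threshold_size in bytes
from the measured bandwidth.

```c
#include <stdint.h>

/*
 * Pages that can be sent within the allowed downtime, given a transfer
 * rate in pages per second and a downtime limit in milliseconds.
 * (Hypothetical helper, not a QEMU function.)
 */
static uint64_t threshold_pages(uint64_t pages_per_sec, uint64_t downtime_ms)
{
    return pages_per_sec * downtime_ms / 1000;
}

/*
 * The completion phase is only entered once the remaining dirty pages
 * fit into the threshold.
 */
static int can_converge(uint64_t dirty_pages, uint64_t pages_per_sec,
                        uint64_t downtime_ms)
{
    return dirty_pages <= threshold_pages(pages_per_sec, downtime_ms);
}
```

With 350 pps, a 300 ms limit gives a threshold of 105 pages, so a guest
that keeps 300 pages dirty never converges; with a 1000 ms limit the
threshold is 350 pages and the same guest does.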

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 migration/migration.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index b1ae6a9ff80..4d5083b6f57 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3670,6 +3670,10 @@ static void migration_update_counters(MigrationState *s,
     transferred = current_bytes - s->iteration_initial_bytes;
     time_spent = current_time - s->iteration_start_time;
     bandwidth = (double)transferred / time_spent;
+    if (memcrypt_enabled() &&
+        s->parameters.downtime_limit < 1000) {
+        s->parameters.downtime_limit = 1000;
+    }
     s->threshold_size = bandwidth * s->parameters.downtime_limit;
 
     s->mbps = (((double) transferred * 8.0) /
-- 
Gitee


From d3f66f2d8c8437d1a8f1ebeb8649373fc35d9e58 Mon Sep 17 00:00:00 2001
From: Alexander Graf <agraf@csgraf.de>
Date: Wed, 5 Oct 2022 00:56:42 +0200
Subject: [PATCH 364/407] i386: kvm: Add support for MSR filtering

commit 860054d8ce4067ef2bc3deb2a98cf93350fc03e4 upstream.

KVM has grown support to deflect arbitrary MSRs to user space since
Linux 5.10. For now we don't expect to make a lot of use of this
feature, so let's expose it the easiest way possible: With up to 16
individually maskable MSRs.

This patch adds a kvm_filter_msr() function that other code can call
to install a hook on KVM MSR reads or writes.
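
As a rough model of the bookkeeping behind kvm_filter_msr(), the sketch
below keeps a fixed table of 16 slots, claims the first free slot on
registration and looks the MSR up again on a read exit. All example_*
names are hypothetical, the CPU argument is dropped, and the KVM ioctl
that installs the filter is left out. As in the patch, an msr value
of 0 marks a free slot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_RANGES 16   /* mirrors the "up to 16" limit above */

typedef bool example_rdmsr_fn(uint32_t msr, uint64_t *val);
typedef bool example_wrmsr_fn(uint32_t msr, uint64_t val);

struct example_msr_handler {
    uint32_t msr;               /* 0 marks an unused slot */
    example_rdmsr_fn *rdmsr;
    example_wrmsr_fn *wrmsr;
};

static struct example_msr_handler example_handlers[EXAMPLE_MAX_RANGES];

/* Claim the first free slot; fail when all slots are taken. */
static bool example_filter_msr(uint32_t msr, example_rdmsr_fn *rd,
                               example_wrmsr_fn *wr)
{
    for (size_t i = 0; i < EXAMPLE_MAX_RANGES; i++) {
        if (!example_handlers[i].msr) {
            example_handlers[i] = (struct example_msr_handler){ msr, rd, wr };
            return true;
        }
    }
    return false;
}

/* Dispatch a read exit; returns false if no read hook is registered. */
static bool example_handle_rdmsr(uint32_t msr, uint64_t *val)
{
    for (size_t i = 0; i < EXAMPLE_MAX_RANGES; i++) {
        if (example_handlers[i].msr == msr && example_handlers[i].rdmsr) {
            return example_handlers[i].rdmsr(msr, val);
        }
    }
    return false;
}

/* Sample hook used only to exercise the table. */
static bool example_read_const(uint32_t msr, uint64_t *val)
{
    (void)msr;
    *val = 0x1234;
    return true;
}
```

A caller would register example_read_const for some MSR index and get
0x1234 back from example_handle_rdmsr() for that index; a 17th
registration fails because the table is full.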

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-Id: <20221004225643.65036-3-agraf@csgraf.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/kvm/kvm.c      | 123 +++++++++++++++++++++++++++++++++++++
 target/i386/kvm/kvm_i386.h |  11 ++++
 2 files changed, 134 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5262806c47c..607469fb3e7 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -135,6 +135,8 @@ static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_cpuid2 *hv_cpuid_cache;
 static struct kvm_msr_list *kvm_feature_msrs;
 
+static KVMMSRHandlers msr_handlers[KVM_MSR_FILTER_MAX_RANGES];
+
 #define BUS_LOCK_SLICE_TIME 1000000000ULL /* ns */
 static RateLimit bus_lock_ratelimit_ctrl;
 
@@ -2524,6 +2526,15 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
                                 x86ms->bus_lock_ratelimit, BUS_LOCK_SLICE_TIME);
         }
     }
+    if (kvm_vm_check_extension(s, KVM_CAP_X86_USER_SPACE_MSR)) {
+        ret = kvm_vm_enable_cap(s, KVM_CAP_X86_USER_SPACE_MSR, 0,
+                                KVM_MSR_EXIT_REASON_FILTER);
+        if (ret) {
+            error_report("Could not enable user space MSRs: %s",
+                         strerror(-ret));
+            exit(1);
+        }
+    }
 
     return 0;
 }
@@ -4848,6 +4859,108 @@ void kvm_arch_update_guest_debug(CPUState *cpu, struct kvm_guest_debug *dbg)
     }
 }
 
+static bool kvm_install_msr_filters(KVMState *s)
+{
+    uint64_t zero = 0;
+    struct kvm_msr_filter filter = {
+        .flags = KVM_MSR_FILTER_DEFAULT_ALLOW,
+    };
+    int r, i, j = 0;
+
+    for (i = 0; i < KVM_MSR_FILTER_MAX_RANGES; i++) {
+        KVMMSRHandlers *handler = &msr_handlers[i];
+        if (handler->msr) {
+            struct kvm_msr_filter_range *range = &filter.ranges[j++];
+
+            *range = (struct kvm_msr_filter_range) {
+                .flags = 0,
+                .nmsrs = 1,
+                .base = handler->msr,
+                .bitmap = (__u8 *)&zero,
+            };
+
+            if (handler->rdmsr) {
+                range->flags |= KVM_MSR_FILTER_READ;
+            }
+
+            if (handler->wrmsr) {
+                range->flags |= KVM_MSR_FILTER_WRITE;
+            }
+        }
+    }
+
+    r = kvm_vm_ioctl(s, KVM_X86_SET_MSR_FILTER, &filter);
+    if (r) {
+        return false;
+    }
+
+    return true;
+}
+
+bool kvm_filter_msr(KVMState *s, uint32_t msr, QEMURDMSRHandler *rdmsr,
+                    QEMUWRMSRHandler *wrmsr)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(msr_handlers); i++) {
+        if (!msr_handlers[i].msr) {
+            msr_handlers[i] = (KVMMSRHandlers) {
+                .msr = msr,
+                .rdmsr = rdmsr,
+                .wrmsr = wrmsr,
+            };
+
+            if (!kvm_install_msr_filters(s)) {
+                msr_handlers[i] = (KVMMSRHandlers) { };
+                return false;
+            }
+
+            return true;
+        }
+    }
+
+    return false;
+}
+
+static int kvm_handle_rdmsr(X86CPU *cpu, struct kvm_run *run)
+{
+    int i;
+    bool r;
+
+    for (i = 0; i < ARRAY_SIZE(msr_handlers); i++) {
+        KVMMSRHandlers *handler = &msr_handlers[i];
+        if (run->msr.index == handler->msr) {
+            if (handler->rdmsr) {
+                r = handler->rdmsr(cpu, handler->msr,
+                                   (uint64_t *)&run->msr.data);
+                run->msr.error = r ? 0 : 1;
+                return 0;
+            }
+        }
+    }
+
+    assert(false);
+}
+
+static int kvm_handle_wrmsr(X86CPU *cpu, struct kvm_run *run)
+{
+    int i;
+    bool r;
+
+    for (i = 0; i < ARRAY_SIZE(msr_handlers); i++) {
+        KVMMSRHandlers *handler = &msr_handlers[i];
+        if (run->msr.index == handler->msr) {
+            if (handler->wrmsr) {
+                r = handler->wrmsr(cpu, handler->msr, run->msr.data);
+                run->msr.error = r ? 0 : 1;
+                return 0;
+            }
+        }
+    }
+
+    assert(false);
+}
+
 static bool has_sgx_provisioning;
 
 static bool __kvm_enable_sgx_provisioning(KVMState *s)
@@ -4947,6 +5060,16 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         /* already handled in kvm_arch_post_run */
         ret = 0;
         break;
+    case KVM_EXIT_X86_RDMSR:
+        /* We only enable MSR filtering, any other exit is bogus */
+        assert(run->msr.reason == KVM_MSR_EXIT_REASON_FILTER);
+        ret = kvm_handle_rdmsr(cpu, run);
+        break;
+    case KVM_EXIT_X86_WRMSR:
+        /* We only enable MSR filtering, any other exit is bogus */
+        assert(run->msr.reason == KVM_MSR_EXIT_REASON_FILTER);
+        ret = kvm_handle_wrmsr(cpu, run);
+        break;
     case KVM_EXIT_HYPERCALL:
         ret = kvm_handle_exit_hypercall(cpu, run);
         break;
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index 4124912c202..2ed586c11b7 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -54,4 +54,15 @@ uint64_t kvm_swizzle_msi_ext_dest_id(uint64_t address);
 bool kvm_enable_sgx_provisioning(KVMState *s);
 void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask);
 
+typedef bool QEMURDMSRHandler(X86CPU *cpu, uint32_t msr, uint64_t *val);
+typedef bool QEMUWRMSRHandler(X86CPU *cpu, uint32_t msr, uint64_t val);
+typedef struct kvm_msr_handlers {
+    uint32_t msr;
+    QEMURDMSRHandler *rdmsr;
+    QEMUWRMSRHandler *wrmsr;
+} KVMMSRHandlers;
+
+bool kvm_filter_msr(KVMState *s, uint32_t msr, QEMURDMSRHandler *rdmsr,
+                    QEMUWRMSRHandler *wrmsr);
+
 #endif
-- 
Gitee


From 389463c50ebb0894dcd747ab0e09279ff5e6ec29 Mon Sep 17 00:00:00 2001
From: Ashish Kalra <ashish.kalra@amd.com>
Date: Tue, 27 Jul 2021 17:59:33 +0000
Subject: [PATCH 365/407] kvm: Add support for userspace MSR filtering and
 handling of MSR_KVM_MIGRATION_CONTROL.

cherry-picked from https://github.com/AMDESE/qemu/commit/67935c3fd5f.

Add support for userspace MSR filtering using the KVM_X86_SET_MSR_FILTER
ioctl and for handling MSRs in userspace. Currently this is only used
for SEV guests, which use MSR_KVM_MIGRATION_CONTROL to indicate whether
the guest is enabled and ready for migration.

KVM arch code calls into SEV guest specific code to delete the
SEV migrate blocker which was set up at SEV_LAUNCH_FINISH.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 target/i386/kvm/kvm.c | 37 +++++++++++++++++++++++++++++++++++++
 target/i386/sev.c     |  6 ++++++
 target/i386/sev.h     |  1 +
 3 files changed, 44 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 607469fb3e7..a833219d0a6 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2330,6 +2330,32 @@ static int kvm_get_supported_msrs(KVMState *s)
     return ret;
 }
 
+/*
+ * Currently this exit is only used by SEV guests for
+ * MSR_KVM_MIGRATION_CONTROL to indicate if the guest
+ * is ready for migration.
+ */
+static uint64_t msr_kvm_migration_control;
+
+static bool kvm_rdmsr_kvm_migration_control(X86CPU *cpu, uint32_t msr,
+                                            uint64_t *val)
+{
+    *val = msr_kvm_migration_control;
+
+    return true;
+}
+
+static bool kvm_wrmsr_kvm_migration_control(X86CPU *cpu, uint32_t msr,
+                                            uint64_t val)
+{
+    msr_kvm_migration_control = val;
+
+    if (val == KVM_MIGRATION_READY)
+        sev_del_migrate_blocker();
+
+    return true;
+}
+
 static Notifier smram_machine_done;
 static KVMMemoryListener smram_listener;
 static AddressSpace smram_address_space;
@@ -2527,6 +2553,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         }
     }
     if (kvm_vm_check_extension(s, KVM_CAP_X86_USER_SPACE_MSR)) {
+        bool r;
+
         ret = kvm_vm_enable_cap(s, KVM_CAP_X86_USER_SPACE_MSR, 0,
                                 KVM_MSR_EXIT_REASON_FILTER);
         if (ret) {
@@ -2534,6 +2562,15 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
                          strerror(-ret));
             exit(1);
         }
+
+        r = kvm_filter_msr(s, MSR_KVM_MIGRATION_CONTROL,
+                           kvm_rdmsr_kvm_migration_control,
+                           kvm_wrmsr_kvm_migration_control);
+        if (!r) {
+            error_report("Could not install MSR_KVM_MIGRATION_CONTROL "
+                         "handler");
+            exit(1);
+        }
     }
 
     return 0;
diff --git a/target/i386/sev.c b/target/i386/sev.c
index fac81ba72cd..a5537072719 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -890,6 +890,12 @@ sev_launch_finish(SevGuestState *sev)
     migrate_add_blocker(sev_mig_blocker, &error_fatal);
 }
 
+void
+sev_del_migrate_blocker(void)
+{
+    migrate_del_blocker(sev_mig_blocker);
+}
+
 static int
 sev_receive_finish(SevGuestState *s)
 {
diff --git a/target/i386/sev.h b/target/i386/sev.h
index b8ad840141f..7e53f84616b 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -70,6 +70,7 @@ int sev_add_shared_regions_list(unsigned long gfn_start, unsigned long gfn_end);
 int sev_save_outgoing_shared_regions_list(QEMUFile *f);
 int sev_load_incoming_shared_regions_list(QEMUFile *f);
 bool sev_is_gfn_in_unshared_region(unsigned long gfn);
+void sev_del_migrate_blocker(void);
 
 int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
 
-- 
Gitee


From fcb0d45523e50fd77bb692000868cb483fb42629 Mon Sep 17 00:00:00 2001
From: hanliyang <hanliyang@hygon.cn>
Date: Tue, 8 Dec 2020 22:57:46 -0500
Subject: [PATCH 366/407] anolis: migration/ram: Force encrypted status for VGA
 vram

The VGA vram memory region acts as the frame buffer of the VM. This
memory is decrypted in the QEMU process. For CSV VM live migration, we
should avoid the memory encryption status check on VGA vram.
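
A minimal sketch of the name-based overrides in encrypted_test_list()
as of this patch, including the flash0/flash1 rules added earlier in
this series. The enum and the function name are hypothetical; only the
region names and their outcomes are taken from the patches.

```c
#include <string.h>

enum example_verdict {
    EXAMPLE_SHARED,      /* migrate as plain (decrypted) memory */
    EXAMPLE_ENCRYPTED,   /* always migrate through the encryption path */
    EXAMPLE_CHECK_LIST   /* fall through to the shared regions list */
};

/* Decide by memory region name before consulting the shared regions list. */
static enum example_verdict example_classify(const char *region_name)
{
    if (!strcmp(region_name, "system.flash0")) {
        return EXAMPLE_ENCRYPTED;
    }
    if (!strcmp(region_name, "system.flash1") ||
        !strcmp(region_name, "vga.vram")) {
        return EXAMPLE_SHARED;
    }
    return EXAMPLE_CHECK_LIST;
}
```

Any other region name, for example an ordinary RAM block, still goes
through the gfn lookup.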

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 migration/ram.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index da7a809945c..fb05ff44ed8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2262,6 +2262,10 @@ static bool encrypted_test_list(RAMState *rs, RAMBlock *block,
         return false;
     }
 
+    if (!strcmp(memory_region_name(block->mr), "vga.vram")) {
+        return false;
+    }
+
     /*
      * Translate page in ram_addr_t address space to GPA address
      * space using memory region.
-- 
Gitee


From 2787d58772580af75efd7424d642d78332530436 Mon Sep 17 00:00:00 2001
From: hanliyang <hanliyang@hygon.cn>
Date: Sun, 16 Jan 2022 19:57:58 -0500
Subject: [PATCH 367/407] anolis: target/i386: sev: Clear shared_regions_list
 when reboot CSV Guest

Also fix memory leak in sev_remove_shared_regions_list().
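
The reason for switching to QTAILQ_FOREACH_SAFE once nodes are freed
can be sketched with a plain singly linked list: the next pointer has
to be read before the current node is released. The example_* names
are hypothetical and the list type is simplified; this is not the
QTAILQ implementation.

```c
#include <stdlib.h>

struct example_region {
    unsigned long gfn_start;
    unsigned long gfn_end;
    struct example_region *next;
};

/* Push a region at the head; on allocation failure the list is unchanged. */
static struct example_region *example_push(struct example_region *head,
                                           unsigned long start,
                                           unsigned long end)
{
    struct example_region *r = malloc(sizeof(*r));

    if (!r) {
        return head;
    }
    r->gfn_start = start;
    r->gfn_end = end;
    r->next = head;
    return r;
}

/* Remove and free every region; returns the number of nodes freed. */
static int example_clear(struct example_region **head)
{
    int freed = 0;
    struct example_region *pos = *head, *next;

    while (pos) {
        next = pos->next;   /* saved before free(), as FOREACH_SAFE does */
        free(pos);
        freed++;
        pos = next;
    }
    *head = NULL;
    return freed;
}
```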

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 target/i386/kvm/kvm.c | 6 ++++++
 target/i386/sev.c     | 5 +++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a833219d0a6..f0b61ec3cfd 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2143,6 +2143,12 @@ void kvm_arch_reset_vcpu(X86CPU *cpu)
 
         hyperv_x86_synic_reset(cpu);
     }
+
+    if (cpu_is_bsp(cpu) &&
+        sev_enabled() && has_map_gpa_range) {
+        sev_remove_shared_regions_list(0, -1);
+    }
+
     /* enabled by default */
     env->poll_control_msr = 1;
 
diff --git a/target/i386/sev.c b/target/i386/sev.c
index a5537072719..671214e8ba9 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1672,9 +1672,9 @@ int sev_load_incoming_page(QEMUFile *f, uint8_t *ptr)
 int sev_remove_shared_regions_list(unsigned long start, unsigned long end)
 {
     SevGuestState *s = sev_guest;
-    struct shared_region *pos;
+    struct shared_region *pos, *next_pos;
 
-    QTAILQ_FOREACH(pos, &s->shared_regions_list, list) {
+    QTAILQ_FOREACH_SAFE(pos, &s->shared_regions_list, list, next_pos) {
         unsigned long l, r;
         unsigned long curr_gfn_end = pos->gfn_end;
 
@@ -1688,6 +1688,7 @@ int sev_remove_shared_regions_list(unsigned long start, unsigned long end)
         if (l <= r) {
             if (pos->gfn_start == l && pos->gfn_end == r) {
                 QTAILQ_REMOVE(&s->shared_regions_list, pos, list);
+                g_free(pos);
             } else if (l == pos->gfn_start) {
                 pos->gfn_start = r;
             } else if (r == pos->gfn_end) {
-- 
Gitee


From 46b4d9933853b24fc9ee4af021555220d47b25b6 Mon Sep 17 00:00:00 2001
From: hanliyang <hanliyang@hygon.cn>
Date: Sun, 16 Jan 2022 20:05:02 -0500
Subject: [PATCH 368/407] anolis: migration/ram: Fix calculation of gfn
 corresponding to a page in ramblock

A RAMBlock contains a host memory region which may consist of many
discontiguous MemoryRegions in the AddressSpace of a guest, so the gpa
cannot be derived from MemoryRegion.addr. Since the KVM memslots record
the relationship between gpa and hva, we can pass the hva of the page in
the RAMBlock to kvm_physical_memory_addr_from_host() to get the expected gpa.
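
The lookup can be pictured as a walk over a slot table that maps host
virtual ranges to guest physical ranges. The sketch below uses a
hypothetical example_memslot structure and helper, not the KVM data
structures themselves.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_PAGE_BITS 12

struct example_memslot {
    uintptr_t hva_start;   /* host virtual address of the slot */
    uint64_t  gpa_start;   /* guest physical address it is mapped at */
    uint64_t  size;        /* length in bytes */
};

/* Translate an hva to a gpa; returns false if no slot covers the hva. */
static bool example_gpa_from_hva(const struct example_memslot *slots,
                                 size_t n, uintptr_t hva, uint64_t *gpa)
{
    for (size_t i = 0; i < n; i++) {
        if (hva >= slots[i].hva_start &&
            hva - slots[i].hva_start < slots[i].size) {
            *gpa = slots[i].gpa_start + (hva - slots[i].hva_start);
            return true;
        }
    }
    return false;
}
```

Two slots that are adjacent in host memory can sit at unrelated guest
physical addresses, which is why a single base offset per RAMBlock is
not enough; the gfn is then gpa >> EXAMPLE_PAGE_BITS.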

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 migration/ram.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index fb05ff44ed8..33e2c9d5fbd 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -59,6 +59,7 @@
 
 /* Defines RAM_SAVE_ENCRYPTED_PAGE and RAM_SAVE_SHARED_REGION_LIST */
 #include "target/i386/sev.h"
+#include "sysemu/kvm.h"
 
 #include "hw/boards.h" /* for machine_dump_guest_core() */
 
@@ -2248,6 +2249,8 @@ static bool encrypted_test_list(RAMState *rs, RAMBlock *block,
     struct ConfidentialGuestMemoryEncryptionOps *ops =
         cgs_class->memory_encryption_ops;
     unsigned long gfn;
+    hwaddr paddr = 0;
+    int ret;
 
     /* ROM devices contains the unencrypted data */
     if (memory_region_is_rom(block->mr)) {
@@ -2270,7 +2273,14 @@ static bool encrypted_test_list(RAMState *rs, RAMBlock *block,
      * Translate page in ram_addr_t address space to GPA address
      * space using memory region.
      */
-    gfn = page + (block->mr->addr >> TARGET_PAGE_BITS);
+    if (kvm_enabled()) {
+        ret = kvm_physical_memory_addr_from_host(kvm_state,
+                           block->host + (page << TARGET_PAGE_BITS), &paddr);
+        if (ret == 0) {
+            return false;
+        }
+    }
+    gfn = paddr >> TARGET_PAGE_BITS;
 
     return ops->is_gfn_in_unshared_region(gfn);
 }
-- 
Gitee


From 05d7e04e9d493793487364a801945b8297d80a75 Mon Sep 17 00:00:00 2001
From: hanliyang <hanliyang@hygon.cn>
Date: Wed, 1 Nov 2023 06:00:11 +0000
Subject: [PATCH 369/407] anolis: target/i386: csv: Move is_hygon_cpu() to
 header file csv.h

The helper function is_hygon_cpu() will be called by functions outside
target/i386/csv.c. Move it to the header file csv.h so that it can be
used as a common helper function.
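
For reference, the three register constants compared by is_hygon_cpu()
spell out the CPUID vendor string when their bytes are read in the
order EBX, EDX, ECX. The sketch below only decodes the constants; the
example_* helpers are hypothetical and do not execute CPUID.

```c
#include <stdint.h>
#include <string.h>

/* Values copied from the patch. */
#define CPUID_VENDOR_HYGON_EBX   0x6f677948u  /* "Hygo" */
#define CPUID_VENDOR_HYGON_ECX   0x656e6975u  /* "uine" */
#define CPUID_VENDOR_HYGON_EDX   0x6e65476e   /* "nGen" */

/* Assemble the 12-character vendor string in CPUID order: EBX, EDX, ECX. */
static void example_vendor_string(uint32_t ebx, uint32_t ecx, uint32_t edx,
                                  char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };

    for (int i = 0; i < 3; i++) {
        for (int b = 0; b < 4; b++) {
            /* Lowest byte of each register comes first. */
            out[i * 4 + b] = (char)((regs[i] >> (8 * b)) & 0xff);
        }
    }
    out[12] = '\0';
}

/* Compare a decoded vendor string against the Hygon one. */
static int example_is_hygon_vendor(const char *vendor)
{
    return strcmp(vendor, "HygonGenuine") == 0;
}
```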

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 target/i386/csv.c | 20 --------------------
 target/i386/csv.h | 22 ++++++++++++++++++++++
 2 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/target/i386/csv.c b/target/i386/csv.c
index d166d3775eb..161cad39ae9 100644
--- a/target/i386/csv.c
+++ b/target/i386/csv.c
@@ -27,28 +27,8 @@
 
 CsvGuestState csv_guest = { 0 };
 
-#define CPUID_VENDOR_HYGON_EBX   0x6f677948  /* "Hygo" */
-#define CPUID_VENDOR_HYGON_ECX   0x656e6975  /* "uine" */
-#define CPUID_VENDOR_HYGON_EDX   0x6e65476e  /* "nGen" */
-
 #define GUEST_POLICY_CSV_BIT     (1 << 6)
 
-static bool is_hygon_cpu(void)
-{
-    uint32_t ebx = 0;
-    uint32_t ecx = 0;
-    uint32_t edx = 0;
-
-    host_cpuid(0, 0, NULL, &ebx, &ecx, &edx);
-
-    if (ebx == CPUID_VENDOR_HYGON_EBX &&
-        ecx == CPUID_VENDOR_HYGON_ECX &&
-        edx == CPUID_VENDOR_HYGON_EDX)
-        return true;
-   else
-        return false;
-}
-
 int
 csv_init(uint32_t policy, int fd, void *state, struct sev_ops *ops)
 {
diff --git a/target/i386/csv.h b/target/i386/csv.h
index 7dc5c75366f..4f5b2773c4a 100644
--- a/target/i386/csv.h
+++ b/target/i386/csv.h
@@ -16,6 +16,28 @@
 
 #include "qapi/qapi-commands-misc-target.h"
 
+#include "cpu.h"
+
+#define CPUID_VENDOR_HYGON_EBX   0x6f677948  /* "Hygo" */
+#define CPUID_VENDOR_HYGON_ECX   0x656e6975  /* "uine" */
+#define CPUID_VENDOR_HYGON_EDX   0x6e65476e  /* "nGen" */
+
+static __attribute__((unused)) bool is_hygon_cpu(void)
+{
+    uint32_t ebx = 0;
+    uint32_t ecx = 0;
+    uint32_t edx = 0;
+
+    host_cpuid(0, 0, NULL, &ebx, &ecx, &edx);
+
+    if (ebx == CPUID_VENDOR_HYGON_EBX &&
+        ecx == CPUID_VENDOR_HYGON_ECX &&
+        edx == CPUID_VENDOR_HYGON_EDX)
+        return true;
+    else
+        return false;
+}
+
 #ifdef CONFIG_CSV
 bool csv_enabled(void);
 #else
-- 
Gitee


From c9ecb567c7eccdfe3edf516a90d48ee534d12c36 Mon Sep 17 00:00:00 2001
From: hanliyang <hanliyang@hygon.cn>
Date: Mon, 13 Nov 2023 21:55:33 +0000
Subject: [PATCH 370/407] anolis: target/i386: csv: Read cert chain from file
 when prepared for CSV live migration

The cert chain is too long when encoded with base64, so use the filename
of the cert chain instead of the encoded string when preparing for CSV
live migration.

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 qapi/migration.json | 24 +++++++++++++++---------
 target/i386/sev.c   | 29 +++++++++++++++++++++++++----
 2 files changed, 40 insertions(+), 13 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index bad53a2f8db..8632abb2068 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -774,14 +774,16 @@
 #                        block device name if there is one, and to their node name
 #                        otherwise. (Since 5.2)
 #
-# @sev-pdh: The target host platform diffie-hellman key encoded in base64
+# @sev-pdh: The target host platform diffie-hellman key encoded in base64, or
+#           pdh filename for hygon
 #           (Since 4.2)
 #
-# @sev-plat-cert: The target host platform certificate chain encoded in base64
+# @sev-plat-cert: The target host platform certificate chain encoded in base64,
+#                 or plat cert filename for hygon
 #                 (Since 4.2)
 #
 # @sev-amd-cert: AMD certificate chain which include ASK and OCA encoded in
-#                base64 (Since 4.2)
+#                base64, or vendor cert filename for hygon (Since 4.2)
 #
 # Features:
 # @unstable: Member @x-checkpoint-delay is experimental.
@@ -949,14 +951,16 @@
 #                        block device name if there is one, and to their node name
 #                        otherwise. (Since 5.2)
 #
-# @sev-pdh: The target host platform diffie-hellman key encoded in base64
+# @sev-pdh: The target host platform diffie-hellman key encoded in base64, or
+#           pdh filename for hygon
 #           (Since 4.2)
 #
-# @sev-plat-cert: The target host platform certificate chain encoded in base64
+# @sev-plat-cert: The target host platform certificate chain encoded in base64,
+#                 or plat cert filename for hygon
 #                 (Since 4.2)
 #
 # @sev-amd-cert: AMD certificate chain which include ASK and OCA encoded in
-#                base64 (Since 4.2)
+#                base64, or vendor cert filename for hygon (Since 4.2)
 #
 # Features:
 # @unstable: Member @x-checkpoint-delay is experimental.
@@ -1161,14 +1165,16 @@
 #                        block device name if there is one, and to their node name
 #                        otherwise. (Since 5.2)
 #
-# @sev-pdh: The target host platform diffie-hellman key encoded in base64
+# @sev-pdh: The target host platform diffie-hellman key encoded in base64, or
+#           pdh filename for hygon
 #           (Since 4.2)
 #
-# @sev-plat-cert: The target host platform certificate chain encoded in base64
+# @sev-plat-cert: The target host platform certificate chain encoded in base64,
+#                 or plat cert filename for hygon
 #                 (Since 4.2)
 #
 # @sev-amd-cert: AMD certificate chain which include ASK and OCA encoded in
-#                base64 (Since 4.2)
+#                base64, or vendor cert filename for hygon (Since 4.2)
 #
 # Features:
 # @unstable: Member @x-checkpoint-delay is experimental.
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 671214e8ba9..ac56c4d3f93 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -944,18 +944,39 @@ int sev_save_setup(const char *pdh, const char *plat_cert,
 {
     SevGuestState *s = sev_guest;
 
-    s->remote_pdh = g_base64_decode(pdh, &s->remote_pdh_len);
+    if (is_hygon_cpu()) {
+        if (sev_read_file_base64(pdh, &s->remote_pdh,
+                                 &s->remote_pdh_len) < 0) {
+            goto error;
+        }
+    } else {
+        s->remote_pdh = g_base64_decode(pdh, &s->remote_pdh_len);
+    }
     if (!check_blob_length(s->remote_pdh_len)) {
         goto error;
     }
 
-    s->remote_plat_cert = g_base64_decode(plat_cert,
-                                          &s->remote_plat_cert_len);
+    if (is_hygon_cpu()) {
+        if (sev_read_file_base64(plat_cert, &s->remote_plat_cert,
+                                 &s->remote_plat_cert_len) < 0) {
+            goto error;
+        }
+    } else {
+        s->remote_plat_cert = g_base64_decode(plat_cert,
+                                              &s->remote_plat_cert_len);
+    }
     if (!check_blob_length(s->remote_plat_cert_len)) {
         goto error;
     }
 
-    s->amd_cert = g_base64_decode(amd_cert, &s->amd_cert_len);
+    if (is_hygon_cpu()) {
+        if (sev_read_file_base64(amd_cert, &s->amd_cert,
+                                 &s->amd_cert_len) < 0) {
+            goto error;
+        }
+    } else {
+        s->amd_cert = g_base64_decode(amd_cert, &s->amd_cert_len);
+    }
     if (!check_blob_length(s->amd_cert_len)) {
         goto error;
     }
-- 
Gitee


From a49730959a15931aca2d339802fa283fc702951c Mon Sep 17 00:00:00 2001
From: fangbaoshun <fangbaoshun@hygon.cn>
Date: Mon, 2 Aug 2021 11:00:07 +0800
Subject: [PATCH 371/407] anolis: target/i386: csv: add support to queue the
 outgoing page into a list

The csv_queue_outgoing_page() function provides the implementation to
queue the guest private pages during transmission. The routine queues the
outgoing pages into a list, and then issues the KVM_CSV_COMMAND_BATCH
command to encrypt the pages together before writing them to the socket.
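
A minimal sketch of the queueing idea, assuming nodes are chained
through next_cmd_addr and tracked with head/tail pointers as in
CsvBatchCmdList. The example_* names are hypothetical, and how the
real code fills cmd_data_addr is not shown here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Same three fields as struct kvm_csv_batch_list_node in the patch. */
struct example_batch_node {
    uint64_t cmd_data_addr;
    uint64_t addr;
    uint64_t next_cmd_addr;   /* address of the next node, 0 at the tail */
};

struct example_batch_list {
    struct example_batch_node *head;
    struct example_batch_node *tail;
};

/* Append one node; returns 0 on success, -1 on allocation failure. */
static int example_queue(struct example_batch_list *l,
                         uint64_t cmd_data_addr, uint64_t addr)
{
    struct example_batch_node *n = calloc(1, sizeof(*n));

    if (!n) {
        return -1;
    }
    n->cmd_data_addr = cmd_data_addr;
    n->addr = addr;
    if (l->tail) {
        l->tail->next_cmd_addr = (uint64_t)(uintptr_t)n;
    } else {
        l->head = n;
    }
    l->tail = n;
    return 0;
}

/* Walk the chain through next_cmd_addr, free every node, return the count. */
static int example_flush(struct example_batch_list *l)
{
    int count = 0;
    struct example_batch_node *n = l->head;

    while (n) {
        struct example_batch_node *next =
            (struct example_batch_node *)(uintptr_t)n->next_cmd_addr;
        free(n);
        count++;
        n = next;
    }
    l->head = l->tail = NULL;
    return count;
}
```

Queueing pages one by one and handing the whole chain over in a single
batch keeps the number of round trips to the security processor low,
which is the motivation stated above.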

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 include/exec/confidential-guest-support.h |   3 +
 linux-headers/linux/kvm.h                 |   6 +
 target/i386/sev.c                         | 172 ++++++++++++++++++++++
 target/i386/sev.h                         |   2 +
 4 files changed, 183 insertions(+)

diff --git a/include/exec/confidential-guest-support.h b/include/exec/confidential-guest-support.h
index 343f686fc2b..1f32519d425 100644
--- a/include/exec/confidential-guest-support.h
+++ b/include/exec/confidential-guest-support.h
@@ -77,6 +77,9 @@ struct ConfidentialGuestMemoryEncryptionOps {
 
     /* Load the shared regions list */
     int (*load_incoming_shared_regions_list)(QEMUFile *f);
+
+    /* Queue the encrypted page and metadata associated with it into a list */
+    int (*queue_outgoing_page)(uint8_t *ptr, uint32_t size, uint64_t addr);
 };
 
 typedef struct ConfidentialGuestSupportClass {
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 8ed722236ee..7a04c993ad1 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1938,6 +1938,12 @@ struct kvm_csv_init_data {
 	__u64 nodemask;
 };
 
+struct kvm_csv_batch_list_node {
+    __u64 cmd_data_addr;
+    __u64 addr;
+    __u64 next_cmd_addr;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
diff --git a/target/i386/sev.c b/target/i386/sev.c
index ac56c4d3f93..74768275704 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -50,6 +50,15 @@ struct shared_region {
     QTAILQ_ENTRY(shared_region) list;
 };
 
+typedef struct CsvBatchCmdList CsvBatchCmdList;
+typedef void (*CsvDestroyCmdNodeFn) (void *data);
+
+struct CsvBatchCmdList {
+    struct kvm_csv_batch_list_node *head;
+    struct kvm_csv_batch_list_node *tail;
+    CsvDestroyCmdNodeFn destroy_fn;
+};
+
 extern struct sev_ops sev_ops;
 
 /**
@@ -96,6 +105,9 @@ struct SevGuestState {
     bool reset_data_valid;
 
     QTAILQ_HEAD(, shared_region) shared_regions_list;
+
+    /* linked list used for HYGON CSV */
+    CsvBatchCmdList *csv_batch_cmd_list;
 };
 
 #define DEFAULT_GUEST_POLICY    0x1 /* disable debug */
@@ -188,6 +200,7 @@ static struct ConfidentialGuestMemoryEncryptionOps sev_memory_encryption_ops = {
     .is_gfn_in_unshared_region = sev_is_gfn_in_unshared_region,
     .save_outgoing_shared_regions_list = sev_save_outgoing_shared_regions_list,
     .load_incoming_shared_regions_list = sev_load_incoming_shared_regions_list,
+    .queue_outgoing_page = csv_queue_outgoing_page,
 };
 
 static int
@@ -1840,6 +1853,165 @@ bool sev_is_gfn_in_unshared_region(unsigned long gfn)
     return true;
 }
 
+#include "csv.h"
+
+static CsvBatchCmdList *
+csv_batch_cmd_list_create(struct kvm_csv_batch_list_node *head,
+                          CsvDestroyCmdNodeFn func)
+{
+    CsvBatchCmdList *csv_batch_cmd_list =
+                        g_malloc0(sizeof(*csv_batch_cmd_list));
+
+    if (!csv_batch_cmd_list) {
+        return NULL;
+    }
+
+    csv_batch_cmd_list->head = head;
+    csv_batch_cmd_list->tail = head;
+    csv_batch_cmd_list->destroy_fn = func;
+
+    return csv_batch_cmd_list;
+}
+
+static int
+csv_batch_cmd_list_add_after(CsvBatchCmdList *list,
+                             struct kvm_csv_batch_list_node *new_node)
+{
+    list->tail->next_cmd_addr = (__u64)new_node;
+    list->tail = new_node;
+
+    return 0;
+}
+
+static struct kvm_csv_batch_list_node *
+csv_batch_cmd_list_node_create(uint64_t cmd_data_addr, uint64_t addr)
+{
+    struct kvm_csv_batch_list_node *new_node =
+                        g_malloc0(sizeof(struct kvm_csv_batch_list_node));
+
+    if (!new_node) {
+        return NULL;
+    }
+
+    new_node->cmd_data_addr = cmd_data_addr;
+    new_node->addr = addr;
+    new_node->next_cmd_addr = 0;
+
+    return new_node;
+}
+
+static int csv_batch_cmd_list_destroy(CsvBatchCmdList *list)
+{
+    struct kvm_csv_batch_list_node *node = list->head;
+
+    while (node != NULL) {
+        if (list->destroy_fn != NULL)
+            list->destroy_fn((void *)node->cmd_data_addr);
+
+        list->head = (struct kvm_csv_batch_list_node *)node->next_cmd_addr;
+        g_free(node);
+        node = list->head;
+    }
+
+    g_free(list);
+    return 0;
+}
+
+static void send_update_data_free(void *data)
+{
+    struct kvm_sev_send_update_data *update =
+                        (struct kvm_sev_send_update_data *)data;
+    g_free((guchar *)update->hdr_uaddr);
+    g_free((guchar *)update->trans_uaddr);
+    g_free(update);
+}
+
+static int
+csv_send_queue_data(SevGuestState *s, uint8_t *ptr,
+                    uint32_t size, uint64_t addr)
+{
+    int ret = 0;
+    int fw_error;
+    guchar *trans;
+    guchar *packet_hdr;
+    struct kvm_sev_send_update_data *update;
+    struct kvm_csv_batch_list_node *new_node = NULL;
+
+    /* If this is first call then query the packet header bytes and allocate
+     * the packet buffer.
+     */
+    if (s->send_packet_hdr_len < 1) {
+        s->send_packet_hdr_len = sev_send_get_packet_len(&fw_error);
+        if (s->send_packet_hdr_len < 1) {
+            error_report("%s: SEND_UPDATE fw_error=%d '%s'",
+                    __func__, fw_error, fw_error_to_str(fw_error));
+            return 1;
+        }
+    }
+
+    packet_hdr = g_new(guchar, s->send_packet_hdr_len);
+    memset(packet_hdr, 0, s->send_packet_hdr_len);
+
+    update = g_new0(struct kvm_sev_send_update_data, 1);
+
+    /* allocate transport buffer */
+    trans = g_new(guchar, size);
+
+    update->hdr_uaddr = (unsigned long)packet_hdr;
+    update->hdr_len = s->send_packet_hdr_len;
+    update->guest_uaddr = (unsigned long)ptr;
+    update->guest_len = size;
+    update->trans_uaddr = (unsigned long)trans;
+    update->trans_len = size;
+
+    new_node = csv_batch_cmd_list_node_create((uint64_t)update, addr);
+    if (!new_node) {
+        ret = -ENOMEM;
+        goto err;
+    }
+
+    if (s->csv_batch_cmd_list == NULL) {
+        s->csv_batch_cmd_list = csv_batch_cmd_list_create(new_node,
+                                                send_update_data_free);
+        if (s->csv_batch_cmd_list == NULL) {
+            ret = -ENOMEM;
+            goto err;
+        }
+    } else {
+        /* Add new_node's command address to the last_node */
+        csv_batch_cmd_list_add_after(s->csv_batch_cmd_list, new_node);
+    }
+
+    trace_kvm_sev_send_update_data(ptr, trans, size);
+
+    return ret;
+
+err:
+    g_free(trans);
+    g_free(update);
+    g_free(packet_hdr);
+    g_free(new_node);
+    if (s->csv_batch_cmd_list) {
+        csv_batch_cmd_list_destroy(s->csv_batch_cmd_list);
+        s->csv_batch_cmd_list = NULL;
+    }
+    return ret;
+}
+
+int
+csv_queue_outgoing_page(uint8_t *ptr, uint32_t sz, uint64_t addr)
+{
+    SevGuestState *s = sev_guest;
+
+    /* Only support for HYGON CSV */
+    if (!is_hygon_cpu()) {
+        error_report("Only support enqueue pages for HYGON CSV");
+        return -EINVAL;
+    }
+
+    return csv_send_queue_data(s, ptr, sz, addr);
+}
+
 static const QemuUUID sev_hash_table_header_guid = {
     .data = UUID_LE(0x9438d606, 0x4f22, 0x4cc9, 0xb4, 0x79, 0xa7, 0x93,
                     0xd4, 0x11, 0xfd, 0x21)
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 7e53f84616b..1eba67bcf7f 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -74,6 +74,8 @@ void sev_del_migrate_blocker(void);
 
 int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
 
+int csv_queue_outgoing_page(uint8_t *ptr, uint32_t sz, uint64_t addr);
+
 struct sev_ops {
     int (*sev_ioctl)(int fd, int cmd, void *data, int *error);
     const char *(*fw_error_to_str)(int code);
-- 
Gitee


From 493719dec9535d40307e32070c3836e691144797 Mon Sep 17 00:00:00 2001
From: fangbaoshun <fangbaoshun@hygon.cn>
Date: Mon, 2 Aug 2021 11:41:58 +0800
Subject: [PATCH 372/407] anolis: target/i386: csv: add support to encrypt the
 outgoing pages in the list queued before.

csv_save_queued_outgoing_pages() provides the implementation to encrypt
the guest private pages during transmission. The routine uses the SEND_START
command to create the outgoing encryption context on the first call, then
uses the COMMAND_BATCH command to issue the SEND_UPDATE_DATA commands queued
in the list, encrypting the data before it is written to the socket. While
encrypting the data, SEND_UPDATE_DATA produces some metadata (e.g. MAC, IV).
The metadata is also sent to the target machine. After migration is completed,
we issue the SEND_FINISH command to transition the SEV guest state from sending
to unrunnable state.
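
As an illustration only (not part of this patch), the number of bytes the
batch writer accounts for can be sketched as below. The queued_page type
and batch_stream_bytes() are hypothetical names. The layout follows
csv_send_update_data_batch(): every page carries a 4-byte header length,
the header, a 4-byte transport length and the transport buffer; every
page except the first is additionally preceded by an 8-byte address and
a 4-byte flag, because the first page's header and flag are written by
the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one queued SEND_UPDATE_DATA command. */
struct queued_page {
    uint32_t hdr_len;   /* length of the packet header (metadata) */
    uint32_t trans_len; /* length of the encrypted transport buffer */
};

/* Bytes written to the stream for a batch of n queued pages. */
static uint64_t batch_stream_bytes(const struct queued_page *pages, size_t n)
{
    uint64_t total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (i != 0) {
            total += 8; /* be64 page address */
            total += 4; /* be32 BATCH or BATCH_END flag */
        }
        total += 4 + (uint64_t)pages[i].hdr_len;   /* be32 length + header */
        total += 4 + (uint64_t)pages[i].trans_len; /* be32 length + payload */
    }
    return total;
}
```

For example, with an assumed 52-byte header and 4096-byte pages, one page
accounts for 4156 bytes and two pages for 8324 bytes.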

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 include/exec/confidential-guest-support.h |  4 +
 linux-headers/linux/kvm.h                 |  8 ++
 target/i386/sev.c                         | 89 +++++++++++++++++++++++
 target/i386/sev.h                         |  4 +
 4 files changed, 105 insertions(+)

diff --git a/include/exec/confidential-guest-support.h b/include/exec/confidential-guest-support.h
index 1f32519d425..1ff4823b6a9 100644
--- a/include/exec/confidential-guest-support.h
+++ b/include/exec/confidential-guest-support.h
@@ -80,6 +80,10 @@ struct ConfidentialGuestMemoryEncryptionOps {
 
     /* Queue the encrypted page and metadata associated with it into a list */
     int (*queue_outgoing_page)(uint8_t *ptr, uint32_t size, uint64_t addr);
+
+    /* Write the list queued with encrypted pages and metadata associated
+     * with them */
+    int (*save_queued_outgoing_pages)(QEMUFile *f, uint64_t *bytes_sent);
 };
 
 typedef struct ConfidentialGuestSupportClass {
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 7a04c993ad1..2289dc7c090 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1823,6 +1823,9 @@ enum sev_cmd_id {
 	/* Guest Migration Extension */
 	KVM_SEV_SEND_CANCEL,
 
+	/* Hygon CSV batch command */
+	KVM_CSV_COMMAND_BATCH = 0x18,
+
 	KVM_SEV_NR_MAX,
 };
 
@@ -1944,6 +1947,11 @@ struct kvm_csv_batch_list_node {
     __u64 next_cmd_addr;
 };
 
+struct kvm_csv_command_batch {
+	__u32 command_id;
+	__u64 csv_batch_list_uaddr;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 74768275704..dd8d8358ae0 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -201,6 +201,7 @@ static struct ConfidentialGuestMemoryEncryptionOps sev_memory_encryption_ops = {
     .save_outgoing_shared_regions_list = sev_save_outgoing_shared_regions_list,
     .load_incoming_shared_regions_list = sev_load_incoming_shared_regions_list,
     .queue_outgoing_page = csv_queue_outgoing_page,
+    .save_queued_outgoing_pages = csv_save_queued_outgoing_pages,
 };
 
 static int
@@ -1998,6 +1999,70 @@ err:
     return ret;
 }
 
+static int
+csv_command_batch(uint32_t cmd_id, uint64_t head_uaddr, int *fw_err)
+{
+    int ret;
+    struct kvm_csv_command_batch command_batch = { };
+
+    command_batch.command_id = cmd_id;
+    command_batch.csv_batch_list_uaddr = head_uaddr;
+
+    ret = sev_ioctl(sev_guest->sev_fd, KVM_CSV_COMMAND_BATCH,
+                    &command_batch, fw_err);
+    if (ret) {
+        error_report("%s: COMMAND_BATCH ret=%d fw_err=%d '%s'",
+                __func__, ret, *fw_err, fw_error_to_str(*fw_err));
+    }
+
+    return ret;
+}
+
+static int
+csv_send_update_data_batch(SevGuestState *s, QEMUFile *f, uint64_t *bytes_sent)
+{
+    int ret, fw_error = 0;
+    struct kvm_sev_send_update_data *update;
+    struct kvm_csv_batch_list_node *node;
+
+    ret = csv_command_batch(KVM_SEV_SEND_UPDATE_DATA,
+                            (uint64_t)s->csv_batch_cmd_list->head, &fw_error);
+    if (ret) {
+        error_report("%s: csv_command_batch ret=%d fw_error=%d '%s'",
+                __func__, ret, fw_error, fw_error_to_str(fw_error));
+        goto err;
+    }
+
+    *bytes_sent = 0;
+    for (node = s->csv_batch_cmd_list->head;
+         node != NULL;
+         node = (struct kvm_csv_batch_list_node *)node->next_cmd_addr) {
+        if (node != s->csv_batch_cmd_list->head) {
+            /* head's page header is saved before send_update_data */
+            qemu_put_be64(f, node->addr);
+            *bytes_sent += 8;
+            if (node->next_cmd_addr != 0)
+                qemu_put_be32(f, RAM_SAVE_ENCRYPTED_PAGE_BATCH);
+            else
+                qemu_put_be32(f, RAM_SAVE_ENCRYPTED_PAGE_BATCH_END);
+            *bytes_sent += 4;
+        }
+        update = (struct kvm_sev_send_update_data *)node->cmd_data_addr;
+        qemu_put_be32(f, update->hdr_len);
+        qemu_put_buffer(f, (uint8_t *)update->hdr_uaddr, update->hdr_len);
+        *bytes_sent += (4 + update->hdr_len);
+
+        qemu_put_be32(f, update->trans_len);
+        qemu_put_buffer(f, (uint8_t *)update->trans_uaddr, update->trans_len);
+        *bytes_sent += (4 + update->trans_len);
+    }
+
+err:
+    csv_batch_cmd_list_destroy(s->csv_batch_cmd_list);
+    s->csv_batch_cmd_list = NULL;
+    return ret;
+}
+
 int
 csv_queue_outgoing_page(uint8_t *ptr, uint32_t sz, uint64_t addr)
 {
@@ -2012,6 +2077,30 @@ csv_queue_outgoing_page(uint8_t *ptr, uint32_t sz, uint64_t addr)
     return csv_send_queue_data(s, ptr, sz, addr);
 }
 
+int
+csv_save_queued_outgoing_pages(QEMUFile *f, uint64_t *bytes_sent)
+{
+    SevGuestState *s = sev_guest;
+
+    /* Only support for HYGON CSV */
+    if (!is_hygon_cpu()) {
+        error_report("Only support transfer queued pages for HYGON CSV");
+        return -EINVAL;
+    }
+
+    /*
+     * If this is a first buffer then create outgoing encryption context
+     * and write our PDH, policy and session data.
+     */
+    if (!sev_check_state(s, SEV_STATE_SEND_UPDATE) &&
+        sev_send_start(s, f, bytes_sent)) {
+        error_report("Failed to create outgoing context");
+        return 1;
+    }
+
+    return csv_send_update_data_batch(s, f, bytes_sent);
+}
+
 static const QemuUUID sev_hash_table_header_guid = {
     .data = UUID_LE(0x9438d606, 0x4f22, 0x4cc9, 0xb4, 0x79, 0xa7, 0x93,
                     0xd4, 0x11, 0xfd, 0x21)
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 1eba67bcf7f..37ed17f3212 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -41,6 +41,9 @@ typedef struct SevKernelLoaderContext {
 #define RAM_SAVE_ENCRYPTED_PAGE           0x1
 #define RAM_SAVE_SHARED_REGIONS_LIST      0x2
 
+#define RAM_SAVE_ENCRYPTED_PAGE_BATCH     0x4
+#define RAM_SAVE_ENCRYPTED_PAGE_BATCH_END 0x5
+
 #ifdef CONFIG_SEV
 bool sev_enabled(void);
 bool sev_es_enabled(void);
@@ -75,6 +78,7 @@ void sev_del_migrate_blocker(void);
 int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
 
 int csv_queue_outgoing_page(uint8_t *ptr, uint32_t sz, uint64_t addr);
+int csv_save_queued_outgoing_pages(QEMUFile *f, uint64_t *bytes_sent);
 
 struct sev_ops {
     int (*sev_ioctl)(int fd, int cmd, void *data, int *error);
-- 
Gitee


From 1a838c86ca5a14e6addbd56aeda85bbf18cf64df Mon Sep 17 00:00:00 2001
From: fangbaoshun <fangbaoshun@hygon.cn>
Date: Mon, 2 Aug 2021 13:49:48 +0800
Subject: [PATCH 373/407] anolis: target/i386: csv: add support to queue the
 incoming page into a list

csv_queue_incoming_page() provides the implementation to queue the
guest private pages during transmission. The routine reads the incoming
data from the socket, which contains the guest private pages, and queues
it into a list; the COMMAND_BATCH command is then used to load the
encrypted pages into the guest memory.

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 include/exec/confidential-guest-support.h |  3 +
 target/i386/sev.c                         | 92 +++++++++++++++++++++++
 target/i386/sev.h                         |  1 +
 3 files changed, 96 insertions(+)

diff --git a/include/exec/confidential-guest-support.h b/include/exec/confidential-guest-support.h
index 1ff4823b6a9..530b5057b91 100644
--- a/include/exec/confidential-guest-support.h
+++ b/include/exec/confidential-guest-support.h
@@ -84,6 +84,9 @@ struct ConfidentialGuestMemoryEncryptionOps {
     /* Write the list queued with encrypted pages and metadata associated
      * with them */
     int (*save_queued_outgoing_pages)(QEMUFile *f, uint64_t *bytes_sent);
+
+    /* Queue the incoming encrypted page into a list */
+    int (*queue_incoming_page)(QEMUFile *f, uint8_t *ptr);
 };
 
 typedef struct ConfidentialGuestSupportClass {
diff --git a/target/i386/sev.c b/target/i386/sev.c
index dd8d8358ae0..30afa5d58c6 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -202,6 +202,7 @@ static struct ConfidentialGuestMemoryEncryptionOps sev_memory_encryption_ops = {
     .load_incoming_shared_regions_list = sev_load_incoming_shared_regions_list,
     .queue_outgoing_page = csv_queue_outgoing_page,
     .save_queued_outgoing_pages = csv_save_queued_outgoing_pages,
+    .queue_incoming_page = csv_queue_incoming_page,
 };
 
 static int
@@ -1927,6 +1928,15 @@ static void send_update_data_free(void *data)
     g_free(update);
 }
 
+static void receive_update_data_free(void *data)
+{
+    struct kvm_sev_receive_update_data *update =
+                        (struct kvm_sev_receive_update_data *)data;
+    g_free((guchar *)update->hdr_uaddr);
+    g_free((guchar *)update->trans_uaddr);
+    g_free(update);
+}
+
 static int
 csv_send_queue_data(SevGuestState *s, uint8_t *ptr,
                     uint32_t size, uint64_t addr)
@@ -1999,6 +2009,66 @@ err:
     return ret;
 }
 
+static int
+csv_receive_queue_data(SevGuestState *s, QEMUFile *f, uint8_t *ptr)
+{
+    int ret = 0;
+    gchar *hdr = NULL, *trans = NULL;
+    struct kvm_sev_receive_update_data *update;
+    struct kvm_csv_batch_list_node *new_node = NULL;
+
+    update = g_new0(struct kvm_sev_receive_update_data, 1);
+    /* get packet header */
+    update->hdr_len = qemu_get_be32(f);
+    hdr = g_new(gchar, update->hdr_len);
+    qemu_get_buffer(f, (uint8_t *)hdr, update->hdr_len);
+    update->hdr_uaddr = (unsigned long)hdr;
+
+    /* get transport buffer */
+    update->trans_len = qemu_get_be32(f);
+    trans = g_new(gchar, update->trans_len);
+    update->trans_uaddr = (unsigned long)trans;
+    qemu_get_buffer(f, (uint8_t *)update->trans_uaddr, update->trans_len);
+
+    /* set guest address, guest len is page_size */
+    update->guest_uaddr = (uint64_t)ptr;
+    update->guest_len = TARGET_PAGE_SIZE;
+
+    new_node = csv_batch_cmd_list_node_create((uint64_t)update, 0);
+    if (!new_node) {
+        ret = -ENOMEM;
+        goto err;
+    }
+
+    if (s->csv_batch_cmd_list == NULL) {
+        s->csv_batch_cmd_list = csv_batch_cmd_list_create(new_node,
+                                                receive_update_data_free);
+        if (s->csv_batch_cmd_list == NULL) {
+            ret = -ENOMEM;
+            goto err;
+        }
+    } else {
+        /* Add new_node's command address to the last_node */
+        csv_batch_cmd_list_add_after(s->csv_batch_cmd_list, new_node);
+    }
+
+    trace_kvm_sev_receive_update_data(trans, (void *)ptr, update->guest_len,
+            (void *)hdr, update->hdr_len);
+
+    return ret;
+
+err:
+    g_free(trans);
+    g_free(update);
+    g_free(hdr);
+    g_free(new_node);
+    if (s->csv_batch_cmd_list) {
+        csv_batch_cmd_list_destroy(s->csv_batch_cmd_list);
+        s->csv_batch_cmd_list = NULL;
+    }
+    return ret;
+}
+
 static int
 csv_command_batch(uint32_t cmd_id, uint64_t head_uaddr, int *fw_err)
 {
@@ -2077,6 +2147,28 @@ csv_queue_outgoing_page(uint8_t *ptr, uint32_t sz, uint64_t addr)
     return csv_send_queue_data(s, ptr, sz, addr);
 }
 
+int csv_queue_incoming_page(QEMUFile *f, uint8_t *ptr)
+{
+    SevGuestState *s = sev_guest;
+
+    /* Only support for HYGON CSV */
+    if (!is_hygon_cpu()) {
+        error_report("Only support enqueue received pages for HYGON CSV");
+        return -EINVAL;
+    }
+
+    /*
+     * If this is the first buffer and SEV is not in the receiving state,
+     * use the RECEIVE_START command to create an encryption context.
+     */
+    if (!sev_check_state(s, SEV_STATE_RECEIVE_UPDATE) &&
+        sev_receive_start(s, f)) {
+        return 1;
+    }
+
+    return csv_receive_queue_data(s, f, ptr);
+}
+
 int
 csv_save_queued_outgoing_pages(QEMUFile *f, uint64_t *bytes_sent)
 {
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 37ed17f3212..6166be42019 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -79,6 +79,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
 
 int csv_queue_outgoing_page(uint8_t *ptr, uint32_t sz, uint64_t addr);
 int csv_save_queued_outgoing_pages(QEMUFile *f, uint64_t *bytes_sent);
+int csv_queue_incoming_page(QEMUFile *f, uint8_t *ptr);
 
 struct sev_ops {
     int (*sev_ioctl)(int fd, int cmd, void *data, int *error);
-- 
Gitee


From a32dc3e2797efc2ff58eea75f8d8b3346a6dfc37 Mon Sep 17 00:00:00 2001
From: fangbaoshun <fangbaoshun@hygon.cn>
Date: Mon, 2 Aug 2021 14:11:43 +0800
Subject: [PATCH 374/407] anolis: target/i386: csv: add support to load
 incoming encrypted pages queued in the CMD list

csv_load_queued_incoming_pages() provides the implementation to load the
incoming guest private pages, read from the socket and queued in the CMD
list, into the guest memory. The routine uses the RECEIVE_START command to
create the incoming encryption context on the first call, then uses the
COMMAND_BATCH command carrying RECEIVE_UPDATE_DATA commands to load the
encrypted pages into the guest memory. After migration is completed, we
issue the RECEIVE_FINISH command to transition the SEV guest to the
runnable state so that it can be executed.

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 include/exec/confidential-guest-support.h |  3 +++
 target/i386/sev.c                         | 32 +++++++++++++++++++++++
 target/i386/sev.h                         |  1 +
 3 files changed, 36 insertions(+)

diff --git a/include/exec/confidential-guest-support.h b/include/exec/confidential-guest-support.h
index 530b5057b91..f6a30fab3f6 100644
--- a/include/exec/confidential-guest-support.h
+++ b/include/exec/confidential-guest-support.h
@@ -87,6 +87,9 @@ struct ConfidentialGuestMemoryEncryptionOps {
 
     /* Queue the incoming encrypted page into a list */
     int (*queue_incoming_page)(QEMUFile *f, uint8_t *ptr);
+
+    /* Load the incoming encrypted pages queued in list into guest memory */
+    int (*load_queued_incoming_pages)(QEMUFile *f);
 };
 
 typedef struct ConfidentialGuestSupportClass {
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 30afa5d58c6..a4598839aba 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -203,6 +203,7 @@ static struct ConfidentialGuestMemoryEncryptionOps sev_memory_encryption_ops = {
     .queue_outgoing_page = csv_queue_outgoing_page,
     .save_queued_outgoing_pages = csv_save_queued_outgoing_pages,
     .queue_incoming_page = csv_queue_incoming_page,
+    .load_queued_incoming_pages = csv_load_queued_incoming_pages,
 };
 
 static int
@@ -2133,6 +2134,24 @@ err:
     return ret;
 }
 
+static int
+csv_receive_update_data_batch(SevGuestState *s)
+{
+    int ret;
+    int fw_error;
+
+    ret = csv_command_batch(KVM_SEV_RECEIVE_UPDATE_DATA,
+                            (uint64_t)s->csv_batch_cmd_list->head, &fw_error);
+    if (ret) {
+        error_report("%s: csv_command_batch ret=%d fw_error=%d '%s'",
+                __func__, ret, fw_error, fw_error_to_str(fw_error));
+    }
+
+    csv_batch_cmd_list_destroy(s->csv_batch_cmd_list);
+    s->csv_batch_cmd_list = NULL;
+    return ret;
+}
+
 int
 csv_queue_outgoing_page(uint8_t *ptr, uint32_t sz, uint64_t addr)
 {
@@ -2193,6 +2212,19 @@ csv_save_queued_outgoing_pages(QEMUFile *f, uint64_t *bytes_sent)
     return csv_send_update_data_batch(s, f, bytes_sent);
 }
 
+int csv_load_queued_incoming_pages(QEMUFile *f)
+{
+    SevGuestState *s = sev_guest;
+
+    /* Only support for HYGON CSV */
+    if (!is_hygon_cpu()) {
+        error_report("Only support load queued pages for HYGON CSV");
+        return -EINVAL;
+    }
+
+    return csv_receive_update_data_batch(s);
+}
+
 static const QemuUUID sev_hash_table_header_guid = {
     .data = UUID_LE(0x9438d606, 0x4f22, 0x4cc9, 0xb4, 0x79, 0xa7, 0x93,
                     0xd4, 0x11, 0xfd, 0x21)
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 6166be42019..c53f96b0474 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -80,6 +80,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
 int csv_queue_outgoing_page(uint8_t *ptr, uint32_t sz, uint64_t addr);
 int csv_save_queued_outgoing_pages(QEMUFile *f, uint64_t *bytes_sent);
 int csv_queue_incoming_page(QEMUFile *f, uint8_t *ptr);
+int csv_load_queued_incoming_pages(QEMUFile *f);
 
 struct sev_ops {
     int (*sev_ioctl)(int fd, int cmd, void *data, int *error);
-- 
Gitee


From 072bc6ee40d58fcf68ad7d524cb902b9700bcfa1 Mon Sep 17 00:00:00 2001
From: fangbaoshun <fangbaoshun@hygon.cn>
Date: Mon, 2 Aug 2021 14:35:51 +0800
Subject: [PATCH 375/407] anolis: migration/ram: Accelerate the transmission of
 CSV guest's encrypted pages

When memory encryption is enabled, the guest memory is encrypted with
the guest-specific key. This patch introduces an acceleration scheme that
queues the pages into a list and sends them together with COMMAND_BATCH.
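
As an illustration only (not part of this patch), the flag choice and the
window limit can be sketched as below. batch_flag_for() and
batches_needed() are hypothetical helpers. The sketch assumes a contiguous
run of dirty encrypted pages inside one RAMBlock, which is a
simplification: the real loop also stops at the end of the block and
handles unencrypted pages in between.

```c
#include <stddef.h>
#include <stdint.h>

#define RAM_SAVE_ENCRYPTED_PAGE_BATCH     0x4
#define RAM_SAVE_ENCRYPTED_PAGE_BATCH_END 0x5

/* CSV_OUTGOING_PAGE_WINDOW_SIZE expressed in pages instead of bytes. */
#define WINDOW_PAGES 4094

/* Flag written for the page at position index in a batch of count pages. */
static uint32_t batch_flag_for(size_t index, size_t count)
{
    /* Only the last page of a batch carries BATCH_END. */
    return (index + 1 == count) ? RAM_SAVE_ENCRYPTED_PAGE_BATCH_END
                                : RAM_SAVE_ENCRYPTED_PAGE_BATCH;
}

/* Batches needed for a contiguous run of encrypted pages (assumption). */
static size_t batches_needed(size_t pages)
{
    return (pages + WINDOW_PAGES - 1) / WINDOW_PAGES;
}
```

A batch that holds a single page is therefore sent with BATCH_END right
away, which matches the host_len check in the sender.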

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 configs/devices/i386-softmmu/default.mak |   1 +
 hw/i386/Kconfig                          |   5 ++
 migration/ram.c                          | 106 +++++++++++++++++++++++
 target/i386/csv.h                        |  11 +++
 target/i386/sev.h                        |   1 +
 5 files changed, 124 insertions(+)

diff --git a/configs/devices/i386-softmmu/default.mak b/configs/devices/i386-softmmu/default.mak
index db83ffcab9d..e948e54e4e9 100644
--- a/configs/devices/i386-softmmu/default.mak
+++ b/configs/devices/i386-softmmu/default.mak
@@ -24,6 +24,7 @@
 #CONFIG_VTD=n
 #CONFIG_SGX=n
 #CONFIG_CSV=n
+#CONFIG_HYGON_CSV_MIG_ACCEL=n
 
 # Boards:
 #
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index ed35d762b36..8876b360b68 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -4,6 +4,7 @@ config X86_FW_OVMF
 config SEV
     bool
     select X86_FW_OVMF
+    select HYGON_CSV_MIG_ACCEL
     depends on KVM
 
 config SGX
@@ -14,6 +15,10 @@ config CSV
     bool
     depends on SEV
 
+config HYGON_CSV_MIG_ACCEL
+    bool
+    depends on SEV
+
 config PC
     bool
     imply APPLESMC
diff --git a/migration/ram.c b/migration/ram.c
index 33e2c9d5fbd..7e2d4e7df34 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -59,6 +59,7 @@
 
 /* Defines RAM_SAVE_ENCRYPTED_PAGE and RAM_SAVE_SHARED_REGION_LIST */
 #include "target/i386/sev.h"
+#include "target/i386/csv.h"
 #include "sysemu/kvm.h"
 
 #include "hw/boards.h" /* for machine_dump_guest_core() */
@@ -2348,6 +2349,99 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
     return ram_save_page(rs, pss, last_stage);
 }
 
+#ifdef CONFIG_HYGON_CSV_MIG_ACCEL
+/**
+ * ram_save_encrypted_pages_in_batch - send the given encrypted pages to
+ *                                     the stream.
+ */
+static int
+ram_save_encrypted_pages_in_batch(RAMState *rs, PageSearchStatus *pss, bool last_stage)
+{
+    int ret;
+    int tmppages = 0, pages = 0;
+    RAMBlock *block = pss->block;
+    ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
+    ram_addr_t start_offset = 0;
+    uint32_t host_len = 0;
+    uint8_t *p;
+    uint64_t bytes_xmit;
+    MachineState *ms = MACHINE(qdev_get_machine());
+    ConfidentialGuestSupportClass *cgs_class =
+        (ConfidentialGuestSupportClass *)object_get_class(OBJECT(ms->cgs));
+    struct ConfidentialGuestMemoryEncryptionOps *ops =
+        cgs_class->memory_encryption_ops;
+
+    do {
+        /* Check the pages is dirty and if it is send it */
+        if (!migration_bitmap_clear_dirty(rs, block, pss->page)) {
+            pss->page++;
+            continue;
+        }
+
+        /* Process the unencrypted page */
+        if (!encrypted_test_list(rs, block, pss->page)) {
+            tmppages = ram_save_target_page(rs, pss, last_stage);
+        } else {
+            /* Calculate the offset and host virtual address of the page */
+            offset = pss->page << TARGET_PAGE_BITS;
+            p = block->host + offset;
+
+            /* Record the offset and host virtual address of the first
+             * page in this loop which will be used below.
+             */
+            if (host_len == 0) {
+                start_offset = offset | RAM_SAVE_FLAG_ENCRYPTED_DATA;
+            } else {
+                offset |= (RAM_SAVE_FLAG_ENCRYPTED_DATA | RAM_SAVE_FLAG_CONTINUE);
+            }
+
+            /* Queue the outgoing page if the page is not zero page.
+             * If the queued pages are up to the outgoing page window size,
+             * process them below.
+             */
+            if (ops->queue_outgoing_page(p, TARGET_PAGE_SIZE, offset))
+                return -1;
+
+            tmppages = 1;
+            host_len += TARGET_PAGE_SIZE;
+            ram_counters.normal++;
+        }
+
+        if (tmppages < 0) {
+            return tmppages;
+        }
+
+        pages += tmppages;
+
+        pss->page++;
+    } while (offset_in_ramblock(block, pss->page << TARGET_PAGE_BITS) &&
+             host_len < CSV_OUTGOING_PAGE_WINDOW_SIZE);
+
+    /* Check if there are any queued pages */
+    if (host_len != 0) {
+        ram_counters.transferred +=
+            save_page_header(rs, rs->f, block, start_offset);
+        /* if only one page queued, flag is BATCH_END, else flag is BATCH */
+        if (host_len > TARGET_PAGE_SIZE)
+            qemu_put_be32(rs->f, RAM_SAVE_ENCRYPTED_PAGE_BATCH);
+        else
+            qemu_put_be32(rs->f, RAM_SAVE_ENCRYPTED_PAGE_BATCH_END);
+        ram_counters.transferred += 4;
+        /* Process the queued pages in batch */
+        ret = ops->save_queued_outgoing_pages(rs->f, &bytes_xmit);
+        if (ret) {
+            return -1;
+        }
+        ram_counters.transferred += bytes_xmit;
+    }
+
+    /* The offset we leave with is the last one we looked at */
+    pss->page--;
+
+    return pages;
+}
+#endif
+
 /**
  * ram_save_host_page: save a whole host page
  *
@@ -2382,6 +2476,18 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
         return 0;
     }
 
+#ifdef CONFIG_HYGON_CSV_MIG_ACCEL
+    /*
+     * If command_batch function is enabled and memory encryption is enabled
+     * then use command batch APIs to accelerate the sending process
+     * to write the outgoing buffer to the wire. The encryption APIs
+     * will re-encrypt the data with transport key so that data is protected
+     * on the wire.
+     */
+    if (memcrypt_enabled() && is_hygon_cpu() && !migration_in_postcopy())
+        return ram_save_encrypted_pages_in_batch(rs, pss, last_stage);
+#endif
+
     do {
         /* Check the pages is dirty and if it is send it */
         if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
diff --git a/target/i386/csv.h b/target/i386/csv.h
index 4f5b2773c4a..e51cc830226 100644
--- a/target/i386/csv.h
+++ b/target/i386/csv.h
@@ -16,6 +16,8 @@
 
 #include "qapi/qapi-commands-misc-target.h"
 
+#ifdef CONFIG_SEV
+
 #include "cpu.h"
 
 #define CPUID_VENDOR_HYGON_EBX   0x6f677948  /* "Hygo" */
@@ -38,6 +40,15 @@ static __attribute__((unused)) bool is_hygon_cpu(void)
         return false;
 }
 
+#else
+
+static __attribute__((unused)) bool is_hygon_cpu(void)
+{
+    return false;
+}
+
+#endif
+
 #ifdef CONFIG_CSV
 bool csv_enabled(void);
 #else
diff --git a/target/i386/sev.h b/target/i386/sev.h
index c53f96b0474..345dffac5b5 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -43,6 +43,7 @@ typedef struct SevKernelLoaderContext {
 
 #define RAM_SAVE_ENCRYPTED_PAGE_BATCH     0x4
 #define RAM_SAVE_ENCRYPTED_PAGE_BATCH_END 0x5
+#define CSV_OUTGOING_PAGE_WINDOW_SIZE     (4094 * TARGET_PAGE_SIZE)
 
 #ifdef CONFIG_SEV
 bool sev_enabled(void);
-- 
Gitee


From de20eb1c0e3c43cf016c4a3b26d1958479e95af1 Mon Sep 17 00:00:00 2001
From: fangbaoshun <fangbaoshun@hygon.cn>
Date: Mon, 2 Aug 2021 14:49:45 +0800
Subject: [PATCH 376/407] anolis: migration/ram: Accelerate the loading of CSV
 guest's encrypted pages

When memory encryption is enabled, the guest memory is encrypted with
the guest-specific key. This patch introduces an acceleration scheme
that queues the incoming pages in a list and loads them together with
COMMAND_BATCH.
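
As a rough illustration of the queue-then-flush idea (a sketch, not the
actual QEMU code): pages tagged RAM_SAVE_ENCRYPTED_PAGE_BATCH are only
queued, while the page tagged RAM_SAVE_ENCRYPTED_PAGE_BATCH_END is queued
and then triggers one batched load. The struct batch_state type and the
demo_* helpers are hypothetical stand-ins for the ops callbacks.

```c
#include <stddef.h>

#define RAM_SAVE_ENCRYPTED_PAGE_BATCH     0x4
#define RAM_SAVE_ENCRYPTED_PAGE_BATCH_END 0x5

/* Hypothetical bookkeeping; the real code keeps a firmware command list. */
struct batch_state {
    size_t queued;   /* pages waiting for the next batched load */
    size_t loaded;   /* pages handed over by batched loads so far */
    size_t flushes;  /* number of batched loads issued */
};

/* Stand-in for ops->queue_incoming_page(): remember one more page. */
static int demo_queue_page(struct batch_state *s)
{
    s->queued++;
    return 0;
}

/* Stand-in for ops->load_queued_incoming_pages(): one call per batch. */
static int demo_load_queued(struct batch_state *s)
{
    s->loaded += s->queued;
    s->queued = 0;
    s->flushes++;
    return 0;
}

/* Mirrors the two new branches added to load_encrypted_data(). */
static int demo_handle_flag(struct batch_state *s, unsigned int flag)
{
    if (flag == RAM_SAVE_ENCRYPTED_PAGE_BATCH) {
        return demo_queue_page(s);
    } else if (flag == RAM_SAVE_ENCRYPTED_PAGE_BATCH_END) {
        if (demo_queue_page(s)) {
            return -1;
        }
        return demo_load_queued(s);
    }
    return 1; /* unknown flag */
}
```

Feeding two BATCH flags followed by one BATCH_END flag leaves nothing
queued and issues a single batched load covering all three pages.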

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 migration/ram.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 7e2d4e7df34..90f1bda839c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1387,6 +1387,14 @@ static int load_encrypted_data(QEMUFile *f, uint8_t *ptr)
         return ops->load_incoming_page(f, ptr);
     } else if (flag == RAM_SAVE_SHARED_REGIONS_LIST) {
         return ops->load_incoming_shared_regions_list(f);
+    } else if (flag == RAM_SAVE_ENCRYPTED_PAGE_BATCH) {
+        return ops->queue_incoming_page(f, ptr);
+    } else if (flag == RAM_SAVE_ENCRYPTED_PAGE_BATCH_END) {
+        if (ops->queue_incoming_page(f, ptr)) {
+            error_report("Failed to queue incoming data");
+            return -EINVAL;
+        }
+        return ops->load_queued_incoming_pages(f);
     } else {
         error_report("unknown encrypted flag %x", flag);
         return 1;
-- 
Gitee


From bce2b69fd9118b0a18483ec01fde415114ef90f5 Mon Sep 17 00:00:00 2001
From: hanliyang <hanliyang@hygon.cn>
Date: Tue, 7 Jun 2022 15:19:32 +0800
Subject: [PATCH 377/407] anolis: target/i386/csv: Add support for migrate VMSA
 for CSV2 guest

CSV2 can protect a guest's cpu state through memory encryption. Each
vcpu has its own save area, called the VMSA, which is encrypted with
the guest-specific encryption key.

When a CSV2 guest exits to the host, the vcpu's state is encrypted
and saved to the VMSA; the VMSA is decrypted and loaded back into the
cpu the next time the guest's vcpu runs.

To migrate a CSV2 guest to a target machine, the VMSAs of its vcpus
must be migrated as well. The CSV firmware provides the
SEND_UPDATE_VMSA/RECEIVE_UPDATE_VMSA APIs, through which a VMSA can be
converted into secure data and transmitted to the remote end (for
example, over the network).

The migrated cpu state is identified by CPUState.cpu_index, which
may not equal the vcpu id from KVM's perspective.

When migrating a VMSA, the source QEMU invokes SEND_UPDATE_VMSA to
generate the data corresponding to the VMSA. After the target QEMU
receives the data, it looks up the target vcpu id in KVM by
CPUState.cpu_index and then invokes RECEIVE_UPDATE_VMSA to restore
the VMSA of that vcpu.
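
The per-vcpu stream layout used by this patch can be sketched as follows.
This is a standalone illustration over a plain byte buffer, not the
QEMUFile API: put_be32(), get_be32(), put_blob(), put_vmsa_record() and
count_vmsa_records() are hypothetical helpers. Each record is a CONT
marker followed by three length-prefixed blobs (cpu_index, packet
header, transport data), and the stream is closed by an END marker. As
in the patch, cpu_index is copied as raw bytes rather than converted to
big-endian.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define ENCRYPTED_CPU_STATE_CONT 0x1
#define ENCRYPTED_CPU_STATE_END  0x2

/* Append a 32-bit big-endian value, as qemu_put_be32() does on the wire. */
static size_t put_be32(uint8_t *buf, size_t off, uint32_t v)
{
    buf[off]     = (uint8_t)(v >> 24);
    buf[off + 1] = (uint8_t)(v >> 16);
    buf[off + 2] = (uint8_t)(v >> 8);
    buf[off + 3] = (uint8_t)v;
    return off + 4;
}

/* Read a 32-bit big-endian value and advance the offset. */
static uint32_t get_be32(const uint8_t *buf, size_t *off)
{
    const uint8_t *p = buf + *off;

    *off += 4;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Append one length-prefixed blob. */
static size_t put_blob(uint8_t *buf, size_t off, const void *data,
                       uint32_t len)
{
    off = put_be32(buf, off, len);
    memcpy(buf + off, data, len);
    return off + len;
}

/* One per-vcpu record: CONT marker, cpu_index, packet header, VMSA data. */
static size_t put_vmsa_record(uint8_t *buf, size_t off, uint32_t cpu_index,
                              const void *hdr, uint32_t hdr_len,
                              const void *trans, uint32_t trans_len)
{
    off = put_be32(buf, off, ENCRYPTED_CPU_STATE_CONT);
    off = put_blob(buf, off, &cpu_index, sizeof(cpu_index)); /* raw bytes */
    off = put_blob(buf, off, hdr, hdr_len);
    return put_blob(buf, off, trans, trans_len);
}

/* Walk the stream and count records until a non-CONT marker is seen. */
static int count_vmsa_records(const uint8_t *buf)
{
    size_t off = 0;
    int n = 0;

    while (get_be32(buf, &off) == ENCRYPTED_CPU_STATE_CONT) {
        for (int i = 0; i < 3; i++) {   /* cpu_index, hdr, trans */
            uint32_t len = get_be32(buf, &off);
            off += len;
        }
        n++;
    }
    return n;
}
```

The sketch trusts the length fields; the real receive path validates
them with check_blob_length() before allocating buffers.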

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 include/exec/confidential-guest-support.h |   6 +
 linux-headers/linux/kvm.h                 |  16 ++
 migration/ram.c                           |  30 ++++
 target/i386/sev.c                         | 196 ++++++++++++++++++++++
 target/i386/sev.h                         |   3 +
 target/i386/trace-events                  |   2 +
 6 files changed, 253 insertions(+)

diff --git a/include/exec/confidential-guest-support.h b/include/exec/confidential-guest-support.h
index f6a30fab3f6..a069bad9d96 100644
--- a/include/exec/confidential-guest-support.h
+++ b/include/exec/confidential-guest-support.h
@@ -90,6 +90,12 @@ struct ConfidentialGuestMemoryEncryptionOps {
 
     /* Load the incoming encrypted pages queued in list into guest memory */
     int (*load_queued_incoming_pages)(QEMUFile *f);
+
+    /* Write the encrypted cpu state */
+    int (*save_outgoing_cpu_state)(QEMUFile *f);
+
+    /* Load the encrypted cpu state */
+    int (*load_incoming_cpu_state)(QEMUFile *f);
 };
 
 typedef struct ConfidentialGuestSupportClass {
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 2289dc7c090..c8a1f961c54 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1904,6 +1904,14 @@ struct kvm_sev_send_update_data {
 	__u32 trans_len;
 };
 
+struct kvm_sev_send_update_vmsa {
+	__u32 vcpu_id;
+	__u64 hdr_uaddr;
+	__u32 hdr_len;
+	__u64 trans_uaddr;
+	__u32 trans_len;
+};
+
 struct kvm_sev_receive_start {
 	__u32 handle;
 	__u32 policy;
@@ -1922,6 +1930,14 @@ struct kvm_sev_receive_update_data {
 	__u32 trans_len;
 };
 
+struct kvm_sev_receive_update_vmsa {
+	__u32 vcpu_id;
+	__u64 hdr_uaddr;
+	__u32 hdr_len;
+	__u64 trans_uaddr;
+	__u32 trans_len;
+};
+
 /* CSV command */
 enum csv_cmd_id {
 	KVM_CSV_NR_MIN = 0xc0,
diff --git a/migration/ram.c b/migration/ram.c
index 90f1bda839c..8c61eecb976 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1371,6 +1371,23 @@ static int ram_save_shared_region_list(RAMState *rs, QEMUFile *f)
     return ops->save_outgoing_shared_regions_list(rs->f);
 }
 
+/**
+ * ram_save_encrypted_cpu_state: send the encrypted cpu state
+ */
+static int ram_save_encrypted_cpu_state(RAMState *rs, QEMUFile *f)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    ConfidentialGuestSupportClass *cgs_class =
+        (ConfidentialGuestSupportClass *) object_get_class(OBJECT(ms->cgs));
+    struct ConfidentialGuestMemoryEncryptionOps *ops =
+        cgs_class->memory_encryption_ops;
+
+    save_page_header(rs, rs->f, rs->last_seen_block,
+                     RAM_SAVE_FLAG_ENCRYPTED_DATA);
+    qemu_put_be32(rs->f, RAM_SAVE_ENCRYPTED_CPU_STATE);
+    return ops->save_outgoing_cpu_state(rs->f);
+}
+
 static int load_encrypted_data(QEMUFile *f, uint8_t *ptr)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
@@ -1395,6 +1412,8 @@ static int load_encrypted_data(QEMUFile *f, uint8_t *ptr)
             return -EINVAL;
         }
         return ops->load_queued_incoming_pages(f);
+    } else if (flag == RAM_SAVE_ENCRYPTED_CPU_STATE) {
+        return ops->load_incoming_cpu_state(f);
     } else {
         error_report("unknown encrypted flag %x", flag);
         return 1;
@@ -3507,6 +3526,17 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         if (memcrypt_enabled()) {
             ret = ram_save_shared_region_list(rs, f);
         }
+
+        /*
+         * send the encrypted cpu state, for example, CSV2 guest's
+         * vmsa for each vcpu.
+         */
+        if (!ret && memcrypt_enabled() && is_hygon_cpu()) {
+            ret = ram_save_encrypted_cpu_state(rs, f);
+            if (ret) {
+                error_report("Failed to save encrypted cpu state");
+            }
+        }
     }
 
     if (ret < 0) {
diff --git a/target/i386/sev.c b/target/i386/sev.c
index a4598839aba..0d303d04cb7 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -100,6 +100,10 @@ struct SevGuestState {
     gchar *send_packet_hdr;
     size_t send_packet_hdr_len;
 
+    /* needed by live migration of HYGON CSV2 guest */
+    gchar *send_vmsa_packet_hdr;
+    size_t send_vmsa_packet_hdr_len;
+
     uint32_t reset_cs;
     uint32_t reset_ip;
     bool reset_data_valid;
@@ -193,6 +197,9 @@ static const char *const sev_fw_errlist[] = {
 #define SHARED_REGION_LIST_CONT     0x1
 #define SHARED_REGION_LIST_END      0x2
 
+#define ENCRYPTED_CPU_STATE_CONT    0x1
+#define ENCRYPTED_CPU_STATE_END     0x2
+
 static struct ConfidentialGuestMemoryEncryptionOps sev_memory_encryption_ops = {
     .save_setup = sev_save_setup,
     .save_outgoing_page = sev_save_outgoing_page,
@@ -204,6 +211,8 @@ static struct ConfidentialGuestMemoryEncryptionOps sev_memory_encryption_ops = {
     .save_queued_outgoing_pages = csv_save_queued_outgoing_pages,
     .queue_incoming_page = csv_queue_incoming_page,
     .load_queued_incoming_pages = csv_load_queued_incoming_pages,
+    .save_outgoing_cpu_state = csv_save_outgoing_cpu_state,
+    .load_incoming_cpu_state = csv_load_incoming_cpu_state,
 };
 
 static int
@@ -1020,6 +1029,9 @@ sev_send_finish(void)
     }
 
     g_free(sev_guest->send_packet_hdr);
+    if (sev_es_enabled() && is_hygon_cpu()) {
+        g_free(sev_guest->send_vmsa_packet_hdr);
+    }
     sev_set_guest_state(sev_guest, SEV_STATE_RUNNING);
 }
 
@@ -2225,6 +2237,190 @@ int csv_load_queued_incoming_pages(QEMUFile *f)
     return csv_receive_update_data_batch(s);
 }
 
+static int
+sev_send_vmsa_get_packet_len(int *fw_err)
+{
+    int ret;
+    struct kvm_sev_send_update_vmsa update = { 0, };
+
+    ret = sev_ioctl(sev_guest->sev_fd, KVM_SEV_SEND_UPDATE_VMSA,
+                    &update, fw_err);
+    if (*fw_err != SEV_RET_INVALID_LEN) {
+        ret = -1;
+        error_report("%s: failed to get session length ret=%d fw_error=%d '%s'",
+                     __func__, ret, *fw_err, fw_error_to_str(*fw_err));
+        goto err;
+    }
+
+    ret = update.hdr_len;
+
+err:
+    return ret;
+}
+
+static int
+sev_send_update_vmsa(SevGuestState *s, QEMUFile *f, uint32_t cpu_id,
+                     uint32_t cpu_index, uint32_t size)
+{
+    int ret, fw_error;
+    guchar *trans = NULL;
+    struct kvm_sev_send_update_vmsa update = {};
+
+    /*
+     * If this is first call then query the packet header bytes and allocate
+     * the packet buffer.
+     */
+    if (!s->send_vmsa_packet_hdr) {
+        s->send_vmsa_packet_hdr_len = sev_send_vmsa_get_packet_len(&fw_error);
+        if ((int)s->send_vmsa_packet_hdr_len < 1) {
+            error_report("%s: SEND_UPDATE_VMSA fw_error=%d '%s'",
+                         __func__, fw_error, fw_error_to_str(fw_error));
+            return 1;
+        }
+
+        s->send_vmsa_packet_hdr = g_new(gchar, s->send_vmsa_packet_hdr_len);
+    }
+
+    /* allocate transport buffer */
+    trans = g_new(guchar, size);
+
+    update.vcpu_id = cpu_id;
+    update.hdr_uaddr = (uintptr_t)s->send_vmsa_packet_hdr;
+    update.hdr_len = s->send_vmsa_packet_hdr_len;
+    update.trans_uaddr = (uintptr_t)trans;
+    update.trans_len = size;
+
+    trace_kvm_sev_send_update_vmsa(cpu_id, cpu_index, trans, size);
+
+    ret = sev_ioctl(s->sev_fd, KVM_SEV_SEND_UPDATE_VMSA, &update, &fw_error);
+    if (ret) {
+        error_report("%s: SEND_UPDATE_VMSA ret=%d fw_error=%d '%s'",
+                     __func__, ret, fw_error, fw_error_to_str(fw_error));
+        goto err;
+    }
+
+    /*
+     * Migration of vCPU's VMState according to the instance_id
+     * (i.e. CPUState.cpu_index)
+     */
+    qemu_put_be32(f, sizeof(uint32_t));
+    qemu_put_buffer(f, (uint8_t *)&cpu_index, sizeof(uint32_t));
+
+    qemu_put_be32(f, update.hdr_len);
+    qemu_put_buffer(f, (uint8_t *)update.hdr_uaddr, update.hdr_len);
+
+    qemu_put_be32(f, update.trans_len);
+    qemu_put_buffer(f, (uint8_t *)update.trans_uaddr, update.trans_len);
+
+err:
+    g_free(trans);
+    return ret;
+}
+
+int csv_save_outgoing_cpu_state(QEMUFile *f)
+{
+    SevGuestState *s = sev_guest;
+    CPUState *cpu;
+    int ret = 0;
+
+    /* Only support migrating VMSAs for HYGON CSV2 guests */
+    if (!sev_es_enabled() || !is_hygon_cpu()) {
+        return 0;
+    }
+
+    CPU_FOREACH(cpu) {
+        qemu_put_be32(f, ENCRYPTED_CPU_STATE_CONT);
+        ret = sev_send_update_vmsa(s, f, kvm_arch_vcpu_id(cpu),
+                                   cpu->cpu_index, TARGET_PAGE_SIZE);
+        if (ret) {
+            goto err;
+        }
+    }
+
+    qemu_put_be32(f, ENCRYPTED_CPU_STATE_END);
+
+err:
+    return ret;
+}
+
+static int sev_receive_update_vmsa(QEMUFile *f)
+{
+    int ret = 1, fw_error = 0;
+    CPUState *cpu;
+    uint32_t cpu_index, cpu_id = 0;
+    gchar *hdr = NULL, *trans = NULL;
+    struct kvm_sev_receive_update_vmsa update = {};
+
+    /* get cpu index buffer */
+    assert(qemu_get_be32(f) == sizeof(uint32_t));
+    qemu_get_buffer(f, (uint8_t *)&cpu_index, sizeof(uint32_t));
+
+    CPU_FOREACH(cpu) {
+        if (cpu->cpu_index == cpu_index) {
+            cpu_id = kvm_arch_vcpu_id(cpu);
+            break;
+        }
+    }
+    update.vcpu_id = cpu_id;
+
+    /* get packet header */
+    update.hdr_len = qemu_get_be32(f);
+    if (!check_blob_length(update.hdr_len)) {
+        return 1;
+    }
+
+    hdr = g_new(gchar, update.hdr_len);
+    qemu_get_buffer(f, (uint8_t *)hdr, update.hdr_len);
+    update.hdr_uaddr = (uintptr_t)hdr;
+
+    /* get transport buffer */
+    update.trans_len = qemu_get_be32(f);
+    if (!check_blob_length(update.trans_len)) {
+        goto err;
+    }
+
+    trans = g_new(gchar, update.trans_len);
+    update.trans_uaddr = (uintptr_t)trans;
+    qemu_get_buffer(f, (uint8_t *)update.trans_uaddr, update.trans_len);
+
+    trace_kvm_sev_receive_update_vmsa(cpu_id, cpu_index,
+                trans, update.trans_len, hdr, update.hdr_len);
+
+    ret = sev_ioctl(sev_guest->sev_fd, KVM_SEV_RECEIVE_UPDATE_VMSA,
+                    &update, &fw_error);
+    if (ret) {
+        error_report("Error RECEIVE_UPDATE_VMSA ret=%d fw_error=%d '%s'",
+                     ret, fw_error, fw_error_to_str(fw_error));
+    }
+
+err:
+    g_free(trans);
+    g_free(hdr);
+    return ret;
+}
+
+int csv_load_incoming_cpu_state(QEMUFile *f)
+{
+    int status, ret = 0;
+
+    /* Only support migrating VMSAs for HYGON CSV2 guests */
+    if (!sev_es_enabled() || !is_hygon_cpu()) {
+        return 0;
+    }
+
+    status = qemu_get_be32(f);
+    while (status == ENCRYPTED_CPU_STATE_CONT) {
+        ret = sev_receive_update_vmsa(f);
+        if (ret) {
+            break;
+        }
+
+        status = qemu_get_be32(f);
+    }
+
+    return ret;
+}
+
 static const QemuUUID sev_hash_table_header_guid = {
     .data = UUID_LE(0x9438d606, 0x4f22, 0x4cc9, 0xb4, 0x79, 0xa7, 0x93,
                     0xd4, 0x11, 0xfd, 0x21)
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 345dffac5b5..8b38567c3d1 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -44,6 +44,7 @@ typedef struct SevKernelLoaderContext {
 #define RAM_SAVE_ENCRYPTED_PAGE_BATCH     0x4
 #define RAM_SAVE_ENCRYPTED_PAGE_BATCH_END 0x5
 #define CSV_OUTGOING_PAGE_WINDOW_SIZE     (4094 * TARGET_PAGE_SIZE)
+#define RAM_SAVE_ENCRYPTED_CPU_STATE      0x6
 
 #ifdef CONFIG_SEV
 bool sev_enabled(void);
@@ -82,6 +83,8 @@ int csv_queue_outgoing_page(uint8_t *ptr, uint32_t sz, uint64_t addr);
 int csv_save_queued_outgoing_pages(QEMUFile *f, uint64_t *bytes_sent);
 int csv_queue_incoming_page(QEMUFile *f, uint8_t *ptr);
 int csv_load_queued_incoming_pages(QEMUFile *f);
+int csv_save_outgoing_cpu_state(QEMUFile *f);
+int csv_load_incoming_cpu_state(QEMUFile *f);
 
 struct sev_ops {
     int (*sev_ioctl)(int fd, int cmd, void *data, int *error);
diff --git a/target/i386/trace-events b/target/i386/trace-events
index e32b0319ba3..60a4609c0f5 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -17,6 +17,8 @@ kvm_sev_send_finish(void) ""
 kvm_sev_receive_start(int policy, void *session, void *pdh) "policy 0x%x session %p pdh %p"
 kvm_sev_receive_update_data(void *src, void *dst, int len, void *hdr, int hdr_len) "guest %p trans %p len %d hdr %p hdr_len %d"
 kvm_sev_receive_finish(void) ""
+kvm_sev_send_update_vmsa(uint32_t cpu_id, uint32_t cpu_index, void *dst, int len) "cpu_id %d cpu_index %d trans %p len %d"
+kvm_sev_receive_update_vmsa(uint32_t cpu_id, uint32_t cpu_index, void *src, int len, void *hdr, int hdr_len) "cpu_id %d cpu_index %d trans %p len %d hdr %p hdr_len %d"
 
 # csv.c
 kvm_csv_launch_encrypt_data(uint64_t gpa, void *addr, uint64_t len) "gpa 0x%" PRIx64 "addr %p len 0x%" PRIu64
-- 
Gitee


From 54a97efbe7c4a2174188a65b49aacabd91a31167 Mon Sep 17 00:00:00 2001
From: panpingsheng <panpingsheng@hygon.cn>
Date: Sat, 12 Jun 2021 15:15:29 +0800
Subject: [PATCH 378/407] anolis: target/i386: get/set/migrate GHCB state

GHCB state is required by a CSV2 guest when it is migrated to the
target.

Add the GHCB-related definitions, along with the corresponding parts
of kvm_get_msrs()/kvm_put_msrs() and vmstate.

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 include/sysemu/kvm.h      |  1 +
 linux-headers/linux/kvm.h |  1 +
 target/i386/cpu.h         |  4 ++++
 target/i386/kvm/kvm.c     | 11 +++++++++++
 target/i386/machine.c     | 24 ++++++++++++++++++++++++
 target/i386/sev.c         | 10 ++++++++++
 6 files changed, 51 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 7b22aeb6ae1..9f8099f4876 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -46,6 +46,7 @@ extern bool kvm_readonly_mem_allowed;
 extern bool kvm_direct_msi_allowed;
 extern bool kvm_ioeventfd_any_length_allowed;
 extern bool kvm_msi_use_devid;
+extern bool kvm_has_msr_ghcb;
 
 #define kvm_enabled()           (kvm_allowed)
 /**
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index c8a1f961c54..8e92682ffee 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1154,6 +1154,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_PROTECTED_DUMP 217
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
+#define KVM_CAP_SEV_ES_GHCB 500
 
 #define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
 
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 9b7d664ee7c..1415a33fbbc 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -505,6 +505,8 @@ typedef enum X86Seg {
 
 #define MSR_VM_HSAVE_PA                 0xc0010117
 
+#define MSR_AMD64_SEV_ES_GHCB           0xc0010130
+
 #define MSR_IA32_XFD                    0x000001c4
 #define MSR_IA32_XFD_ERR                0x000001c5
 
@@ -1735,6 +1737,8 @@ typedef struct CPUX86State {
     TPRAccess tpr_access_type;
 
     unsigned nr_dies;
+
+    uint64_t ghcb_gpa;
 } CPUX86State;
 
 struct kvm_msrs;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f0b61ec3cfd..fb85cc69813 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3346,6 +3346,10 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
         }
     }
 
+    if (kvm_has_msr_ghcb) {
+        kvm_msr_entry_add(cpu, MSR_AMD64_SEV_ES_GHCB, env->ghcb_gpa);
+    }
+
     return kvm_buf_set_msrs(cpu);
 }
 
@@ -3686,6 +3690,10 @@ static int kvm_get_msrs(X86CPU *cpu)
         kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR, 0);
     }
 
+    if (kvm_has_msr_ghcb) {
+        kvm_msr_entry_add(cpu, MSR_AMD64_SEV_ES_GHCB, 0);
+    }
+
     ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, cpu->kvm_msr_buf);
     if (ret < 0) {
         return ret;
@@ -3991,6 +3999,9 @@ static int kvm_get_msrs(X86CPU *cpu)
         case MSR_IA32_XFD_ERR:
             env->msr_xfd_err = msrs[i].data;
             break;
+        case MSR_AMD64_SEV_ES_GHCB:
+            env->ghcb_gpa = msrs[i].data;
+            break;
         }
     }
 
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 3977e9d8f8d..8aa54432d7d 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -1497,6 +1497,27 @@ static const VMStateDescription vmstate_amx_xtile = {
 };
 #endif
 
+#if defined(CONFIG_KVM) && defined(TARGET_X86_64)
+static bool msr_ghcb_gpa_needed(void *opaque)
+{
+    X86CPU *cpu = opaque;
+    CPUX86State *env = &cpu->env;
+
+    return env->ghcb_gpa != 0;
+}
+
+static const VMStateDescription vmstate_msr_ghcb_gpa = {
+    .name = "cpu/svm_msr_ghcb_gpa",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = msr_ghcb_gpa_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(env.ghcb_gpa, X86CPU),
+        VMSTATE_END_OF_LIST()
+    }
+};
+#endif
+
 const VMStateDescription vmstate_x86_cpu = {
     .name = "cpu",
     .version_id = 12,
@@ -1638,6 +1659,9 @@ const VMStateDescription vmstate_x86_cpu = {
         &vmstate_msr_xfd,
 #ifdef TARGET_X86_64
         &vmstate_amx_xtile,
+#endif
+#if defined(CONFIG_KVM) && defined(TARGET_X86_64)
+        &vmstate_msr_ghcb_gpa,
 #endif
         NULL
     }
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 0d303d04cb7..76fb835f26d 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -215,6 +215,8 @@ static struct ConfidentialGuestMemoryEncryptionOps sev_memory_encryption_ops = {
     .load_incoming_cpu_state = csv_load_incoming_cpu_state,
 };
 
+bool kvm_has_msr_ghcb;
+
 static int
 sev_ioctl(int fd, int cmd, void *data, int *error)
 {
@@ -1185,6 +1187,14 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
     cgs_class->memory_encryption_ops = &sev_memory_encryption_ops;
     QTAILQ_INIT(&sev->shared_regions_list);
 
+    /* Determine whether KVM supports MSR_AMD64_SEV_ES_GHCB */
+    if (sev_es_enabled()) {
+        kvm_has_msr_ghcb =
+            kvm_vm_check_extension(kvm_state, KVM_CAP_SEV_ES_GHCB);
+    } else {
+        kvm_has_msr_ghcb = false;
+    }
+
     cgs->ready = true;
 
     return 0;
-- 
Gitee


From dcb0c5eb3e807b0f8949271462e3bde93b09119a Mon Sep 17 00:00:00 2001
From: hanliyang <hanliyang@hygon.cn>
Date: Sun, 19 Jun 2022 16:49:45 +0800
Subject: [PATCH 379/407] anolis: target/i386/kvm: Return resettable when
 emulate HYGON CSV2 guest

QEMU terminates an SEV-ES guest when it receives a reboot request.
To support reboot for CSV2 guests, report them as resettable in
kvm_arch_cpu_check_are_resettable().
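
The new condition can be read as a small truth table. The function below
is a hypothetical pure-function restatement of the expression in this
patch, with the three runtime checks passed in as booleans; it is meant
only to make the cases explicit.

```c
#include <stdbool.h>

/*
 * Hypothetical restatement of the patched expression:
 *   sev_es - result of sev_es_enabled()
 *   hygon  - result of is_hygon_cpu()
 *   csv    - result of csv_enabled()
 */
static bool demo_cpus_resettable(bool sev_es, bool hygon, bool csv)
{
    return !(sev_es && !hygon) && !csv;
}
```

With this expression, an SEV-ES guest on a non-HYGON cpu is still not
resettable, an SEV-ES guest on a HYGON cpu (that is, CSV2) is
resettable, and any guest for which csv_enabled() is true is not.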

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 target/i386/kvm/kvm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index fb85cc69813..19c622bb084 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -30,6 +30,7 @@
 #include "sysemu/runstate.h"
 #include "kvm_i386.h"
 #include "sev.h"
+#include "csv.h"
 #include "hyperv.h"
 #include "hyperv-proto.h"
 
@@ -5363,7 +5364,7 @@ bool kvm_has_waitpkg(void)
 
 bool kvm_arch_cpu_check_are_resettable(void)
 {
-    return !sev_es_enabled();
+    return !(sev_es_enabled() && !is_hygon_cpu()) && !csv_enabled();
 }
 
 #define ARCH_REQ_XCOMP_GUEST_PERM       0x1025
-- 
Gitee


From ba1d9acd161ac0eab65b5b334967556b30866d4b Mon Sep 17 00:00:00 2001
From: hanliyang <hanliyang@hygon.cn>
Date: Thu, 15 Apr 2021 08:32:24 -0400
Subject: [PATCH 380/407] anolis: kvm: Add support for CSV2 reboot

Linux sets vcpu.arch.guest_state_protected to true after
LAUNCH_UPDATE_VMSA executes successfully, and KVM then prevents any
changes to the VMCB State Save Area.

To support CSV2 guest reboot, call cpus_control_pre_system_reset()
to set vcpu.arch.guest_state_protected to false, and call
cpus_control_post_system_reset() to restore the VMSA of each guest
vcpu with the data generated by LAUNCH_UPDATE_VMSA.

In addition, a memory-encrypted guest may require additional work
during system reset, such as flushing the cache. The function
cpus_control_post_system_reset() hints Linux to flush the caches of
guest memory.
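
The hook ordering can be sketched in isolation. In the snippet below,
struct demo_accel_ops, demo_log and the demo_* functions are
hypothetical; only the pattern (optional callbacks invoked before and
after the reset work, skipped when an accelerator leaves them NULL)
reflects this patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-down version of the two new AccelOpsClass members. */
struct demo_accel_ops {
    void (*control_pre_system_reset)(void);
    void (*control_post_system_reset)(void);
};

/* Records the order in which the steps ran. */
static char demo_log[64];

static void demo_note(const char *step)
{
    strcat(demo_log, step);
}

static void demo_pre(void)
{
    demo_note("pre,");
}

static void demo_post(void)
{
    demo_note("post,");
}

/* Mirrors the call order added to qemu_system_reset(). */
static void demo_system_reset(const struct demo_accel_ops *ops)
{
    if (ops->control_pre_system_reset) {
        ops->control_pre_system_reset();
    }
    demo_note("reset,");    /* stands in for the machine reset work */
    if (ops->control_post_system_reset) {
        ops->control_post_system_reset();
    }
}
```

An accelerator that does not set the callbacks sees no change in
behaviour, which is why only the KVM ops class is touched.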

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 accel/kvm/kvm-accel-ops.c  |  3 +++
 accel/kvm/kvm-all.c        | 10 ++++++++++
 accel/kvm/kvm-cpus.h       |  3 +++
 include/sysemu/accel-ops.h |  3 +++
 include/sysemu/cpus.h      |  2 ++
 linux-headers/linux/kvm.h  |  4 ++++
 softmmu/cpus.c             | 14 ++++++++++++++
 softmmu/runstate.c         |  4 ++++
 8 files changed, 43 insertions(+)

diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
index 7516c67a3f5..9602f9609fc 100644
--- a/accel/kvm/kvm-accel-ops.c
+++ b/accel/kvm/kvm-accel-ops.c
@@ -83,6 +83,9 @@ static void kvm_accel_ops_class_init(ObjectClass *oc, void *data)
     ops->synchronize_post_init = kvm_cpu_synchronize_post_init;
     ops->synchronize_state = kvm_cpu_synchronize_state;
     ops->synchronize_pre_loadvm = kvm_cpu_synchronize_pre_loadvm;
+
+    ops->control_pre_system_reset = kvm_cpus_control_pre_system_reset;
+    ops->control_post_system_reset = kvm_cpus_control_post_system_reset;
 }
 
 static const TypeInfo kvm_accel_ops_type = {
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 3b7bc39823a..decc77b472e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2848,6 +2848,16 @@ void kvm_cpu_synchronize_pre_loadvm(CPUState *cpu)
     run_on_cpu(cpu, do_kvm_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
 }
 
+void kvm_cpus_control_pre_system_reset(void)
+{
+    kvm_vm_ioctl(kvm_state, KVM_CONTROL_VCPU_PRE_SYSTEM_RESET, NULL);
+}
+
+void kvm_cpus_control_post_system_reset(void)
+{
+    kvm_vm_ioctl(kvm_state, KVM_CONTROL_VCPU_POST_SYSTEM_RESET, NULL);
+}
+
 #ifdef KVM_HAVE_MCE_INJECTION
 static __thread void *pending_sigbus_addr;
 static __thread int pending_sigbus_code;
diff --git a/accel/kvm/kvm-cpus.h b/accel/kvm/kvm-cpus.h
index bf0bd1bee4e..8ba363e0b91 100644
--- a/accel/kvm/kvm-cpus.h
+++ b/accel/kvm/kvm-cpus.h
@@ -19,4 +19,7 @@ void kvm_cpu_synchronize_post_reset(CPUState *cpu);
 void kvm_cpu_synchronize_post_init(CPUState *cpu);
 void kvm_cpu_synchronize_pre_loadvm(CPUState *cpu);
 
+void kvm_cpus_control_pre_system_reset(void);
+void kvm_cpus_control_post_system_reset(void);
+
 #endif /* KVM_CPUS_H */
diff --git a/include/sysemu/accel-ops.h b/include/sysemu/accel-ops.h
index 032f6979d76..43893c1a46d 100644
--- a/include/sysemu/accel-ops.h
+++ b/include/sysemu/accel-ops.h
@@ -40,6 +40,9 @@ struct AccelOpsClass {
 
     int64_t (*get_virtual_clock)(void);
     int64_t (*get_elapsed_ticks)(void);
+
+    void (*control_pre_system_reset)(void);
+    void (*control_post_system_reset)(void);
 };
 
 #endif /* ACCEL_OPS_H */
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 868f1192dee..70d6e8ad373 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -42,6 +42,8 @@ extern int icount_align_option;
 void qemu_cpu_kick_self(void);
 
 bool cpus_are_resettable(void);
+void cpus_control_pre_system_reset(void);
+void cpus_control_post_system_reset(void);
 
 void cpu_synchronize_all_states(void);
 void cpu_synchronize_all_post_reset(void);
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 8e92682ffee..73c3c520385 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1528,6 +1528,10 @@ struct kvm_s390_ucas_mapping {
 #define KVM_GET_DEVICE_ATTR	  _IOW(KVMIO,  0xe2, struct kvm_device_attr)
 #define KVM_HAS_DEVICE_ATTR	  _IOW(KVMIO,  0xe3, struct kvm_device_attr)
 
+/* ioctls for controlling vcpu setup during system reset */
+#define KVM_CONTROL_VCPU_PRE_SYSTEM_RESET  _IO(KVMIO, 0xe8)
+#define KVM_CONTROL_VCPU_POST_SYSTEM_RESET _IO(KVMIO, 0xe9)
+
 /*
  * ioctls for vcpu fds
  */
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 071085f840b..319a72d9201 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -200,6 +200,20 @@ bool cpus_are_resettable(void)
     return cpu_check_are_resettable();
 }
 
+void cpus_control_pre_system_reset(void)
+{
+    if (cpus_accel->control_pre_system_reset) {
+        cpus_accel->control_pre_system_reset();
+    }
+}
+
+void cpus_control_post_system_reset(void)
+{
+    if (cpus_accel->control_post_system_reset) {
+        cpus_accel->control_post_system_reset();
+    }
+}
+
 int64_t cpus_get_virtual_clock(void)
 {
     /*
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index 10d9b7365aa..4b9c9f9e074 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -437,6 +437,8 @@ void qemu_system_reset(ShutdownCause reason)
 
     mc = current_machine ? MACHINE_GET_CLASS(current_machine) : NULL;
 
+    cpus_control_pre_system_reset();
+
     cpu_synchronize_all_states();
 
     if (mc && mc->reset) {
@@ -448,6 +450,8 @@ void qemu_system_reset(ShutdownCause reason)
         qapi_event_send_reset(shutdown_caused_by_guest(reason), reason);
     }
     cpu_synchronize_all_post_reset();
+
+    cpus_control_post_system_reset();
 }
 
 /*
-- 
Gitee


From 237ec25776b4fc70100eddda6a5fc39fee938520 Mon Sep 17 00:00:00 2001
From: liuyafei <liuyafei@hygon.cn>
Date: Mon, 22 May 2023 20:37:40 +0800
Subject: [PATCH 381/407] anolis: vfio: only map shared region for CSV virtual
 machine

The QEMU vfio listener maps/unmaps all of the virtual machine's memory.
This does not work for a CSV virtual machine, because only shared
memory should be accessed by the device.

Change-Id: I3f281c28166a36f2bed8ea193523715af9ff3271
---
 hw/vfio/common.c              |  40 +++++++++-
 include/exec/memory.h         |  11 +++
 softmmu/memory.c              |  18 +++++
 target/i386/csv-sysemu-stub.c |  10 +++
 target/i386/csv.c             | 134 ++++++++++++++++++++++++++++++++++
 target/i386/csv.h             |  12 +++
 target/i386/kvm/kvm.c         |   2 +
 7 files changed, 225 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 080046e3f51..7cc03a7ef97 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -41,6 +41,9 @@
 #include "qapi/error.h"
 #include "migration/migration.h"
 
+#include "target/i386/sev.h"
+#include "target/i386/csv.h"
+
 VFIOGroupList vfio_group_list =
     QLIST_HEAD_INITIALIZER(vfio_group_list);
 static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
@@ -1251,6 +1254,30 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
     vfio_set_dirty_page_tracking(container, false);
 }
 
+static SharedRegionListener *g_shl;
+static void shared_memory_listener_register(MemoryListener *listener, AddressSpace *as)
+{
+    SharedRegionListener *shl;
+
+    shl = g_new0(SharedRegionListener, 1);
+
+    shl->listener = listener;
+    shl->as = as;
+
+    shared_region_register_listener(shl);
+    g_shl = shl;
+}
+
+static void shared_memory_listener_unregister(void)
+{
+    SharedRegionListener *shl = g_shl;
+
+    shared_region_unregister_listener(shl);
+
+    g_free(shl);
+    g_shl = NULL;
+}
+
 static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
                                  uint64_t size, ram_addr_t ram_addr)
 {
@@ -1453,7 +1480,12 @@ static const MemoryListener vfio_memory_listener = {
 
 static void vfio_listener_release(VFIOContainer *container)
 {
-    memory_listener_unregister(&container->listener);
+    if (csv_enabled()) {
+        shared_memory_listener_unregister();
+    } else {
+        memory_listener_unregister(&container->listener);
+    }
+
     if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
         memory_listener_unregister(&container->prereg_listener);
     }
@@ -2183,7 +2215,11 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 
     container->listener = vfio_memory_listener;
 
-    memory_listener_register(&container->listener, container->space->as);
+    if (csv_enabled()) {
+        shared_memory_listener_register(&container->listener, container->space->as);
+    } else {
+        memory_listener_register(&container->listener, container->space->as);
+    }
 
     if (container->error) {
         ret = -1;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index e089f90f9bc..26dcb6d1bbb 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -710,6 +710,17 @@ void ram_discard_manager_register_listener(RamDiscardManager *rdm,
 void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
                                              RamDiscardListener *rdl);
 
+typedef struct SharedRegionListener SharedRegionListener;
+struct SharedRegionListener {
+    MemoryListener *listener;
+    AddressSpace *as;
+    QTAILQ_ENTRY(SharedRegionListener) next;
+};
+
+void shared_region_register_listener(SharedRegionListener *shl);
+void shared_region_unregister_listener(SharedRegionListener *shl);
+void *shared_region_listeners_get(void);
+
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
 typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
 
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 6b986153572..3598dfc6955 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -47,6 +47,9 @@ static QTAILQ_HEAD(, MemoryListener) memory_listeners
 static QTAILQ_HEAD(, AddressSpace) address_spaces
     = QTAILQ_HEAD_INITIALIZER(address_spaces);
 
+static QTAILQ_HEAD(, SharedRegionListener) shared_region_listeners
+    = QTAILQ_HEAD_INITIALIZER(shared_region_listeners);
+
 static GHashTable *flat_views;
 
 typedef struct AddrRange AddrRange;
@@ -2129,6 +2132,21 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
     rdmc->unregister_listener(rdm, rdl);
 }
 
+void shared_region_register_listener(SharedRegionListener *shl)
+{
+    QTAILQ_INSERT_TAIL(&shared_region_listeners, shl, next);
+}
+
+void shared_region_unregister_listener(SharedRegionListener *shl)
+{
+    QTAILQ_REMOVE(&shared_region_listeners, shl, next);
+}
+
+void *shared_region_listeners_get(void)
+{
+    return &shared_region_listeners;
+}
+
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
     uint8_t mask = 1 << client;
diff --git a/target/i386/csv-sysemu-stub.c b/target/i386/csv-sysemu-stub.c
index a5ce986e3c1..96ca055270b 100644
--- a/target/i386/csv-sysemu-stub.c
+++ b/target/i386/csv-sysemu-stub.c
@@ -29,3 +29,13 @@ int csv_launch_encrypt_vmcb(void)
 {
     g_assert_not_reached();
 }
+
+int csv_shared_region_dma_map(uint64_t start, uint64_t end)
+{
+    return 0;
+}
+
+void csv_shared_region_dma_unmap(uint64_t start, uint64_t end)
+{
+
+}
diff --git a/target/i386/csv.c b/target/i386/csv.c
index 161cad39ae9..6d05382fb23 100644
--- a/target/i386/csv.c
+++ b/target/i386/csv.c
@@ -24,6 +24,7 @@
 #include "cpu.h"
 #include "sev.h"
 #include "csv.h"
+#include "exec/address-spaces.h"
 
 CsvGuestState csv_guest = { 0 };
 
@@ -63,6 +64,8 @@ csv_init(uint32_t policy, int fd, void *state, struct sev_ops *ops)
         csv_guest.state = state;
         csv_guest.sev_ioctl = ops->sev_ioctl;
         csv_guest.fw_error_to_str = ops->fw_error_to_str;
+        QTAILQ_INIT(&csv_guest.dma_map_regions_list);
+        qemu_mutex_init(&csv_guest.dma_map_regions_list_mutex);
     }
     return 0;
 }
@@ -163,3 +166,134 @@ csv_launch_encrypt_vmcb(void)
 err:
     return ret;
 }
+
+int csv_shared_region_dma_map(uint64_t start, uint64_t end)
+{
+    MemoryRegionSection section;
+    AddressSpace *as;
+    QTAILQ_HEAD(, SharedRegionListener) *shared_region_listeners;
+    SharedRegionListener *shl;
+    MemoryListener *listener;
+    uint64_t size;
+    CsvGuestState *s = &csv_guest;
+    struct dma_map_region *region, *pos;
+    int ret = 0;
+
+    if (!csv_enabled())
+        return 0;
+
+    if (end <= start)
+        return 0;
+
+    shared_region_listeners = shared_region_listeners_get();
+    if (QTAILQ_EMPTY(shared_region_listeners))
+        return 0;
+
+    size = end - start;
+
+    qemu_mutex_lock(&s->dma_map_regions_list_mutex);
+    QTAILQ_FOREACH(pos, &s->dma_map_regions_list, list) {
+        if (start >= (pos->start + pos->size)) {
+            continue;
+        } else if ((start + size) <= pos->start) {
+            break;
+        } else {
+            goto end;
+        }
+    }
+    QTAILQ_FOREACH(shl, shared_region_listeners, next) {
+        listener = shl->listener;
+        as = shl->as;
+        section = memory_region_find(as->root, start, size);
+        if (!section.mr) {
+            goto end;
+        }
+
+        if (!memory_region_is_ram(section.mr)) {
+            memory_region_unref(section.mr);
+            goto end;
+        }
+
+        if (listener->region_add) {
+            listener->region_add(listener, &section);
+        }
+        memory_region_unref(section.mr);
+    }
+
+    region = g_malloc0(sizeof(*region));
+    if (!region) {
+        ret = -1;
+        goto end;
+    }
+    region->start = start;
+    region->size = size;
+
+    if (pos) {
+        QTAILQ_INSERT_BEFORE(pos, region, list);
+    } else {
+        QTAILQ_INSERT_TAIL(&s->dma_map_regions_list, region, list);
+    }
+
+end:
+    qemu_mutex_unlock(&s->dma_map_regions_list_mutex);
+    return ret;
+}
+
+void csv_shared_region_dma_unmap(uint64_t start, uint64_t end)
+{
+    MemoryRegionSection section;
+    AddressSpace *as;
+    QTAILQ_HEAD(, SharedRegionListener) *shared_region_listeners;
+    SharedRegionListener *shl;
+    MemoryListener *listener;
+    uint64_t size;
+    CsvGuestState *s = &csv_guest;
+    struct dma_map_region *pos, *next_pos;
+
+    if (!csv_enabled())
+        return;
+
+    if (end <= start)
+        return;
+
+    shared_region_listeners = shared_region_listeners_get();
+    if (QTAILQ_EMPTY(shared_region_listeners))
+        return;
+
+    size = end - start;
+
+    qemu_mutex_lock(&s->dma_map_regions_list_mutex);
+    QTAILQ_FOREACH_SAFE(pos, &s->dma_map_regions_list, list, next_pos) {
+        uint64_t l, r;
+        uint64_t curr_end = pos->start + pos->size;
+
+        l = MAX(start, pos->start);
+        r = MIN(start + size, pos->start + pos->size);
+        if (l < r) {
+            if ((start <= pos->start) && (start + size >= pos->start + pos->size)) {
+                QTAILQ_FOREACH(shl, shared_region_listeners, next) {
+                    listener = shl->listener;
+                    as = shl->as;
+                    section = memory_region_find(as->root, pos->start, pos->size);
+                    if (!section.mr) {
+                        goto end;
+                    }
+                    if (listener->region_del) {
+                        listener->region_del(listener, &section);
+                    }
+                    memory_region_unref(section.mr);
+                }
+
+                QTAILQ_REMOVE(&s->dma_map_regions_list, pos, list);
+                g_free(pos);
+            }
+            break;
+        }
+        if ((start + size) <= curr_end) {
+            break;
+        }
+    }
+end:
+    qemu_mutex_unlock(&s->dma_map_regions_list_mutex);
+    return;
+}
diff --git a/target/i386/csv.h b/target/i386/csv.h
index e51cc830226..6412c0c70b6 100644
--- a/target/i386/csv.h
+++ b/target/i386/csv.h
@@ -15,6 +15,8 @@
 #define QEMU_CSV_H
 
 #include "qapi/qapi-commands-misc-target.h"
+#include "qemu/thread.h"
+#include "qemu/queue.h"
 
 #ifdef CONFIG_SEV
 
@@ -55,12 +57,19 @@ bool csv_enabled(void);
 #define csv_enabled() 0
 #endif
 
+struct dma_map_region {
+    uint64_t start, size;
+    QTAILQ_ENTRY(dma_map_region) list;
+};
+
 struct CsvGuestState {
     uint32_t policy;
     int sev_fd;
     void *state;
     int (*sev_ioctl)(int fd, int cmd, void *data, int *error);
     const char *(*fw_error_to_str)(int code);
+    QTAILQ_HEAD(, dma_map_region) dma_map_regions_list;
+    QemuMutex dma_map_regions_list_mutex;
 };
 
 typedef struct CsvGuestState CsvGuestState;
@@ -71,4 +80,7 @@ extern int csv_launch_encrypt_vmcb(void);
 
 int csv_load_data(uint64_t gpa, uint8_t *ptr, uint64_t len, Error **errp);
 
+int csv_shared_region_dma_map(uint64_t start, uint64_t end);
+void csv_shared_region_dma_unmap(uint64_t start, uint64_t end);
+
 #endif
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 19c622bb084..cecc3d035d8 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4716,8 +4716,10 @@ static int kvm_handle_exit_hypercall(X86CPU *cpu, struct kvm_run *run)
 
         if (enc) {
             sev_remove_shared_regions_list(gfn_start, gfn_end);
+            csv_shared_region_dma_unmap(gpa, gfn_end << TARGET_PAGE_BITS);
          } else {
             sev_add_shared_regions_list(gfn_start, gfn_end);
+            csv_shared_region_dma_map(gpa, gfn_end << TARGET_PAGE_BITS);
          }
     }
     return 0;
-- 
Gitee


From 85fb37d40e533a757f3bb756aea0655a09dc3101 Mon Sep 17 00:00:00 2001
From: jiangxin <jiangxin@hygon.cn>
Date: Fri, 17 Jun 2022 09:25:19 +0800
Subject: [PATCH 382/407] anolis: linux-headers: update kernel headers to
 include CSV migration cmds

Four new migration commands are added to support CSV migration.

KVM_CSV_SEND_ENCRYPT_DATA/KVM_CSV_RECEIVE_ENCRYPT_DATA cmds are
used to migrate the guest's pages.

KVM_CSV_SEND_ENCRYPT_CONTEXT/KVM_CSV_RECEIVE_ENCRYPT_CONTEXT cmds
are used to migrate the guest's runtime context.
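
As a rough illustration of how the new command ids and the
KVM_CSV_NR_MAX sentinel could be used for range checks, here is a
minimal sketch. The CSV_* names, the base value 0xc0 and the helper
csv_cmd_is_valid() are assumptions made for this example only; they
are not part of the header.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative mirror of enum csv_cmd_id; the base value is a placeholder. */
enum csv_cmd_id_sketch {
    CSV_NR_MIN = 0xc0,               /* assumed base, not taken from the header */
    CSV_INIT = CSV_NR_MIN,
    CSV_LAUNCH_ENCRYPT_DATA,
    CSV_LAUNCH_ENCRYPT_VMCB,
    CSV_SEND_ENCRYPT_DATA,           /* new in this patch */
    CSV_SEND_ENCRYPT_CONTEXT,        /* new in this patch */
    CSV_RECEIVE_ENCRYPT_DATA,        /* new in this patch */
    CSV_RECEIVE_ENCRYPT_CONTEXT,     /* new in this patch */

    CSV_NR_MAX,                      /* sentinel: one past the last command */
};

/* The sentinel makes the number of commands checkable at compile time. */
static_assert(CSV_NR_MAX - CSV_NR_MIN == 7, "seven CSV commands expected");

/* Hypothetical helper: true if id falls inside the CSV command range. */
static bool csv_cmd_is_valid(unsigned int id)
{
    return id >= CSV_NR_MIN && id < CSV_NR_MAX;
}
```

Because the sentinel sits after the last command, appending another
command id keeps such a check correct without further changes.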

Change-Id: Ib3b733c7b5713aa6a6648c65e03cf8c9618ff1af
Signed-off-by: Xin Jiang <jiangxin@hygon.cn>
---
 linux-headers/linux/kvm.h | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 73c3c520385..854bd3bcb6f 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1950,6 +1950,12 @@ enum csv_cmd_id {
 	KVM_CSV_INIT = KVM_CSV_NR_MIN,
 	KVM_CSV_LAUNCH_ENCRYPT_DATA,
 	KVM_CSV_LAUNCH_ENCRYPT_VMCB,
+	KVM_CSV_SEND_ENCRYPT_DATA,
+	KVM_CSV_SEND_ENCRYPT_CONTEXT,
+	KVM_CSV_RECEIVE_ENCRYPT_DATA,
+	KVM_CSV_RECEIVE_ENCRYPT_CONTEXT,
+
+	KVM_CSV_NR_MAX,
 };
 
 struct kvm_csv_launch_encrypt_data {
@@ -1973,6 +1979,38 @@ struct kvm_csv_command_batch {
 	__u64 csv_batch_list_uaddr;
 };
 
+struct kvm_csv_send_encrypt_data {
+	__u64 hdr_uaddr;
+	__u32 hdr_len;
+	__u64 guest_addr_data;
+	__u32 guest_addr_len;
+	__u64 trans_uaddr;
+	__u32 trans_len;
+};
+
+struct kvm_csv_send_encrypt_context {
+	__u64 hdr_uaddr;
+	__u32 hdr_len;
+	__u64 trans_uaddr;
+	__u32 trans_len;
+};
+
+struct kvm_csv_receive_encrypt_data {
+	__u64 hdr_uaddr;
+	__u32 hdr_len;
+	__u64 guest_addr_data;
+	__u32 guest_addr_len;
+	__u64 trans_uaddr;
+	__u32 trans_len;
+};
+
+struct kvm_csv_receive_encrypt_context {
+	__u64 hdr_uaddr;
+	__u32 hdr_len;
+	__u64 trans_uaddr;
+	__u32 trans_len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
Gitee


From b024c861fd38f19b24df5f0becf8fe2765c89126 Mon Sep 17 00:00:00 2001
From: jiangxin <jiangxin@hygon.cn>
Date: Fri, 17 Jun 2022 09:37:56 +0800
Subject: [PATCH 383/407] anolis: csv/i386: add support to migrate the outgoing
 page

csv_send_encrypt_data() encrypts the guest's private pages during
migration. The routine is similar to CSV2's. It usually starts with a
SEND_START command to create the migration context. SEND_ENCRYPT_DATA
commands are then issued to encrypt the guest pages. After migration
completes, a SEND_FINISH command is issued to the firmware.
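
For each batch, csv_send_encrypt_data() writes three length-prefixed
sections to the stream: the packet header, the guest address table
and the transport buffer. The sketch below mimics that layout on a
plain byte buffer and keeps a running byte total. put_section() and
csv_pack_batch() are hypothetical helpers standing in for the
qemu_put_be32()/qemu_put_buffer() calls; the caller is assumed to
provide a large enough output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append a big-endian 32-bit length followed by the payload. */
static size_t put_section(uint8_t *out, const void *data, uint32_t len)
{
    assert(out != NULL && (data != NULL || len == 0));

    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    if (len) {
        memcpy(out + 4, data, len);
    }
    return 4 + (size_t)len;
}

/* Pack header, guest address table and transport buffer in that order. */
static size_t csv_pack_batch(uint8_t *out,
                             const void *hdr, uint32_t hdr_len,
                             const void *addrs, uint32_t addrs_len,
                             const void *trans, uint32_t trans_len)
{
    size_t sent = 0;

    sent += put_section(out + sent, hdr, hdr_len);
    sent += put_section(out + sent, addrs, addrs_len);
    sent += put_section(out + sent, trans, trans_len);
    return sent; /* 3 * 4 length bytes plus the three payloads */
}
```

Accumulating the total per section keeps the transferred-bytes
counter in step with what was actually written.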

Change-Id: I6781119890036636d8f5c0a19c6647fa8a33a37d
---
 migration/ram.c          |  83 ++++++++++++++++++
 target/i386/csv.c        | 185 +++++++++++++++++++++++++++++++++++++++
 target/i386/csv.h        |  22 +++++
 target/i386/sev.c        |  13 ++-
 target/i386/sev.h        |   1 +
 target/i386/trace-events |   1 +
 6 files changed, 304 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 8c61eecb976..727fe801dbe 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2469,6 +2469,86 @@ ram_save_encrypted_pages_in_batch(RAMState *rs, PageSearchStatus *pss, bool last
 }
 #endif
 
+/**
+ * ram_save_csv_pages - send the given csv VM pages to the stream
+ */
+static int ram_save_csv_pages(RAMState *rs, PageSearchStatus *pss,
+                              bool last_stage)
+{
+    int ret;
+    int tmppages = 0, pages = 0;
+    RAMBlock *block = pss->block;
+    ram_addr_t offset = 0;
+    hwaddr paddr = RAM_ADDR_INVALID;
+    uint32_t host_len = 0;
+    uint8_t *p;
+    uint64_t bytes_xmit;
+    MachineState *ms = MACHINE(qdev_get_machine());
+    ConfidentialGuestSupportClass *cgs_class =
+        (ConfidentialGuestSupportClass *) object_get_class(OBJECT(ms->cgs));
+    struct ConfidentialGuestMemoryEncryptionOps *ops =
+        cgs_class->memory_encryption_ops;
+
+    if (!csv_enabled())
+        return 0;
+
+    do {
+        /* Check whether the page is dirty and, if so, send it */
+        if (!migration_bitmap_clear_dirty(rs, block, pss->page)) {
+            pss->page++;
+            continue;
+        }
+
+        ret = kvm_physical_memory_addr_from_host(kvm_state,
+                   block->host + (pss->page << TARGET_PAGE_BITS), &paddr);
+        /* Process ROM or MMIO */
+        if (paddr == RAM_ADDR_INVALID || memory_region_is_rom(block->mr))
+            tmppages = ram_save_target_page(rs, pss, last_stage);
+        else {
+            /* Calculate the offset and host virtual address of the page */
+            offset = pss->page << TARGET_PAGE_BITS;
+            p = block->host + offset;
+
+            if (ops->queue_outgoing_page(p, TARGET_PAGE_SIZE, offset))
+                return -1;
+
+            tmppages = 1;
+            host_len += TARGET_PAGE_SIZE;
+            ram_counters.normal++;
+        }
+
+        if (tmppages < 0) {
+            return tmppages;
+        }
+
+        pages += tmppages;
+
+        pss->page++;
+    } while (offset_in_ramblock(block, pss->page << TARGET_PAGE_BITS) &&
+             host_len < CSV3_OUTGOING_PAGE_WINDOW_SIZE);
+
+    /* Check if there are any queued pages */
+    if (host_len != 0) {
+        /* Always set offset as 0 for csv. */
+        ram_counters.transferred +=
+            save_page_header(rs, rs->f, block, 0 | RAM_SAVE_FLAG_ENCRYPTED_DATA);
+
+        qemu_put_be32(rs->f, RAM_SAVE_ENCRYPTED_PAGE);
+        ram_counters.transferred += 4;
+        /* Process the queued pages in batch */
+        ret = ops->save_queued_outgoing_pages(rs->f, &bytes_xmit);
+        if (ret) {
+            return -1;
+        }
+        ram_counters.transferred += bytes_xmit;
+    }
+
+    /* The offset we leave with is the last one we looked at */
+    pss->page--;
+
+    return pages;
+}
+
 /**
  * ram_save_host_page: save a whole host page
  *
@@ -2503,6 +2583,9 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
         return 0;
     }
 
+    if (csv_enabled())
+        return ram_save_csv_pages(rs, pss, last_stage);
+
 #ifdef CONFIG_HYGON_CSV_MIG_ACCEL
     /*
      * If command_batch function is enabled and memory encryption is enabled
diff --git a/target/i386/csv.c b/target/i386/csv.c
index 6d05382fb23..c5a2bc99244 100644
--- a/target/i386/csv.c
+++ b/target/i386/csv.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 
+#include <linux/psp-sev.h>
 #include <linux/kvm.h>
 #include "qapi/error.h"
 
@@ -20,12 +21,30 @@
 #include <numaif.h>
 #endif
 
+#include "migration/blocker.h"
+#include "migration/qemu-file.h"
+#include "migration/misc.h"
+#include "monitor/monitor.h"
+#include "sysemu/kvm.h"
+
 #include "trace.h"
 #include "cpu.h"
 #include "sev.h"
 #include "csv.h"
 #include "exec/address-spaces.h"
 
+struct ConfidentialGuestMemoryEncryptionOps csv_memory_encryption_ops = {
+    .save_setup = sev_save_setup,
+    .save_outgoing_page = NULL,
+    .is_gfn_in_unshared_region = NULL,
+    .save_outgoing_shared_regions_list = sev_save_outgoing_shared_regions_list,
+    .load_incoming_shared_regions_list = sev_load_incoming_shared_regions_list,
+    .queue_outgoing_page = csv3_queue_outgoing_page,
+    .save_queued_outgoing_pages = csv3_save_queued_outgoing_pages,
+};
+
+#define CSV_OUTGOING_PAGE_NUM (CSV3_OUTGOING_PAGE_WINDOW_SIZE/TARGET_PAGE_SIZE)
+
 CsvGuestState csv_guest = { 0 };
 
 #define GUEST_POLICY_CSV_BIT     (1 << 6)
@@ -66,6 +85,7 @@ csv_init(uint32_t policy, int fd, void *state, struct sev_ops *ops)
         csv_guest.fw_error_to_str = ops->fw_error_to_str;
         QTAILQ_INIT(&csv_guest.dma_map_regions_list);
         qemu_mutex_init(&csv_guest.dma_map_regions_list_mutex);
+        csv_guest.sev_send_start = ops->sev_send_start;
     }
     return 0;
 }
@@ -297,3 +317,168 @@ end:
     qemu_mutex_unlock(&s->dma_map_regions_list_mutex);
     return;
 }
+
+static inline hwaddr csv_hva_to_gfn(uint8_t *ptr)
+{
+    ram_addr_t offset = RAM_ADDR_INVALID;
+
+    kvm_physical_memory_addr_from_host(kvm_state, ptr, &offset);
+
+    return offset >> TARGET_PAGE_BITS;
+}
+
+static int
+csv_send_start(QEMUFile *f, uint64_t *bytes_sent)
+{
+    if (csv_guest.sev_send_start)
+        return csv_guest.sev_send_start(f, bytes_sent);
+    else
+        return -1;
+}
+
+static int
+csv_send_get_packet_len(int *fw_err)
+{
+    int ret;
+    struct kvm_csv_send_encrypt_data update = {0};
+
+    update.hdr_len = 0;
+    update.trans_len = 0;
+    ret = csv_ioctl(KVM_CSV_SEND_ENCRYPT_DATA, &update, fw_err);
+    if (*fw_err != SEV_RET_INVALID_LEN) {
+        error_report("%s: failed to get session length ret=%d fw_error=%d '%s'",
+                    __func__, ret, *fw_err, fw_error_to_str(*fw_err));
+        ret = 0;
+        goto err;
+    }
+
+    if (update.hdr_len <= INT_MAX)
+        ret = update.hdr_len;
+    else
+        ret = 0;
+
+err:
+    return ret;
+}
+
+static int
+csv_send_encrypt_data(CsvGuestState *s, QEMUFile *f, uint8_t *ptr, uint32_t size,
+                     uint64_t *bytes_sent)
+{
+    int ret, fw_error = 0;
+    guchar *trans;
+    uint32_t guest_addr_entry_num;
+    uint32_t i;
+    struct kvm_csv_send_encrypt_data update = { };
+
+    /*
+     * If this is the first call, query the packet header length and
+     * allocate the packet buffer.
+     */
+    if (!s->send_packet_hdr) {
+        s->send_packet_hdr_len = csv_send_get_packet_len(&fw_error);
+        if (s->send_packet_hdr_len < 1) {
+            error_report("%s: SEND_UPDATE fw_error=%d '%s'",
+                         __func__, fw_error, fw_error_to_str(fw_error));
+            return 1;
+        }
+
+        s->send_packet_hdr = g_new(gchar, s->send_packet_hdr_len);
+    }
+
+    if (!s->guest_addr_len || !s->guest_addr_data) {
+        error_report("%s: invalid host address or size", __func__);
+        return 1;
+    } else {
+        guest_addr_entry_num = s->guest_addr_len / sizeof(struct guest_addr_entry);
+    }
+
+    /* allocate transport buffer */
+    trans = g_new(guchar, guest_addr_entry_num * TARGET_PAGE_SIZE);
+
+    update.hdr_uaddr = (uintptr_t)s->send_packet_hdr;
+    update.hdr_len = s->send_packet_hdr_len;
+    update.guest_addr_data = (uintptr_t)s->guest_addr_data;
+    update.guest_addr_len = s->guest_addr_len;
+    update.trans_uaddr = (uintptr_t)trans;
+    update.trans_len = guest_addr_entry_num * TARGET_PAGE_SIZE;
+
+    trace_kvm_csv_send_encrypt_data(trans, update.trans_len);
+
+    ret = csv_ioctl(KVM_CSV_SEND_ENCRYPT_DATA, &update, &fw_error);
+    if (ret) {
+        error_report("%s: SEND_ENCRYPT_DATA ret=%d fw_error=%d '%s'",
+                     __func__, ret, fw_error, fw_error_to_str(fw_error));
+        goto err;
+    }
+
+    for (i = 0; i < guest_addr_entry_num; i++) {
+        if (s->guest_addr_data[i].share)
+            memcpy(trans + i * TARGET_PAGE_SIZE, (guchar *)s->guest_hva_data[i].hva,
+                   TARGET_PAGE_SIZE);
+    }
+
+    qemu_put_be32(f, update.hdr_len);
+    qemu_put_buffer(f, (uint8_t *)update.hdr_uaddr, update.hdr_len);
+    *bytes_sent = 4 + update.hdr_len;
+
+    qemu_put_be32(f, update.guest_addr_len);
+    qemu_put_buffer(f, (uint8_t *)update.guest_addr_data, update.guest_addr_len);
+    *bytes_sent += 4 + update.guest_addr_len;
+
+    qemu_put_be32(f, update.trans_len);
+    qemu_put_buffer(f, (uint8_t *)update.trans_uaddr, update.trans_len);
+    *bytes_sent += (4 + update.trans_len);
+
+err:
+    s->guest_addr_len = 0;
+    g_free(trans);
+    return ret;
+}
+
+int
+csv3_queue_outgoing_page(uint8_t *ptr, uint32_t sz, uint64_t addr)
+{
+    CsvGuestState *s = &csv_guest;
+    uint32_t i = 0;
+
+    (void) addr;
+
+    if (!s->guest_addr_data) {
+        s->guest_hva_data = g_new0(struct guest_hva_entry, CSV_OUTGOING_PAGE_NUM);
+        s->guest_addr_data = g_new0(struct guest_addr_entry, CSV_OUTGOING_PAGE_NUM);
+        s->guest_addr_len = 0;
+    }
+
+    if (s->guest_addr_len >= sizeof(struct guest_addr_entry) * CSV_OUTGOING_PAGE_NUM) {
+        error_report("Failed to queue outgoing page");
+        return 1;
+    }
+
+    i = s->guest_addr_len / sizeof(struct guest_addr_entry);
+    s->guest_hva_data[i].hva = (uintptr_t)ptr;
+    s->guest_addr_data[i].share = 0;
+    s->guest_addr_data[i].reserved = 0;
+    s->guest_addr_data[i].gfn = csv_hva_to_gfn(ptr);
+    s->guest_addr_len += sizeof(struct guest_addr_entry);
+
+    return 0;
+}
+
+int
+csv3_save_queued_outgoing_pages(QEMUFile *f, uint64_t *bytes_sent)
+{
+    CsvGuestState *s = &csv_guest;
+
+    /*
+     * If this is the first buffer, create the outgoing encryption context
+     * and write our PDH, policy and session data.
+     */
+    if (!csv_check_state(SEV_STATE_SEND_UPDATE) &&
+        csv_send_start(f, bytes_sent)) {
+        error_report("Failed to create outgoing context");
+        return 1;
+    }
+
+    return csv_send_encrypt_data(s, f, NULL, 0, bytes_sent);
+}
diff --git a/target/i386/csv.h b/target/i386/csv.h
index 6412c0c70b6..273d69d12c4 100644
--- a/target/i386/csv.h
+++ b/target/i386/csv.h
@@ -62,6 +62,18 @@ struct dma_map_region {
     QTAILQ_ENTRY(dma_map_region) list;
 };
 
+#define CSV3_OUTGOING_PAGE_WINDOW_SIZE (512 * TARGET_PAGE_SIZE)
+
+struct guest_addr_entry {
+    uint64_t share:    1;
+    uint64_t reserved: 11;
+    uint64_t gfn:      52;
+};
+
+struct guest_hva_entry {
+    uint64_t  hva;
+};
+
 struct CsvGuestState {
     uint32_t policy;
     int sev_fd;
@@ -70,6 +82,13 @@ struct CsvGuestState {
     const char *(*fw_error_to_str)(int code);
     QTAILQ_HEAD(, dma_map_region) dma_map_regions_list;
     QemuMutex dma_map_regions_list_mutex;
+    gchar *send_packet_hdr;
+    size_t send_packet_hdr_len;
+    struct guest_hva_entry *guest_hva_data;
+    struct guest_addr_entry *guest_addr_data;
+    size_t guest_addr_len;
+
+    int (*sev_send_start)(QEMUFile *f, uint64_t *bytes_sent);
 };
 
 typedef struct CsvGuestState CsvGuestState;
@@ -77,10 +96,13 @@ typedef struct CsvGuestState CsvGuestState;
 extern struct CsvGuestState csv_guest;
 extern int csv_init(uint32_t policy, int fd, void *state, struct sev_ops *ops);
 extern int csv_launch_encrypt_vmcb(void);
+extern struct ConfidentialGuestMemoryEncryptionOps csv_memory_encryption_ops;
 
 int csv_load_data(uint64_t gpa, uint8_t *ptr, uint64_t len, Error **errp);
 
 int csv_shared_region_dma_map(uint64_t start, uint64_t end);
 void csv_shared_region_dma_unmap(uint64_t start, uint64_t end);
+int csv3_queue_outgoing_page(uint8_t *ptr, uint32_t sz, uint64_t addr);
+int csv3_save_queued_outgoing_pages(QEMUFile *f, uint64_t *bytes_sent);
 
 #endif
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 76fb835f26d..f026fa1f092 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -1184,7 +1184,10 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
     qemu_add_vm_change_state_handler(sev_vm_state_change, sev);
     add_migration_state_change_notifier(&sev_migration_state_notify);
 
-    cgs_class->memory_encryption_ops = &sev_memory_encryption_ops;
+    if (csv_enabled())
+        cgs_class->memory_encryption_ops = &csv_memory_encryption_ops;
+    else
+        cgs_class->memory_encryption_ops = &sev_memory_encryption_ops;
     QTAILQ_INIT(&sev->shared_regions_list);
 
     /* Determine whether support MSR_AMD64_SEV_ES_GHCB */
@@ -2563,9 +2566,17 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp)
     return ret;
 }
 
+static int _sev_send_start(QEMUFile *f, uint64_t *bytes_sent)
+{
+    SevGuestState *s = sev_guest;
+
+    return sev_send_start(s, f, bytes_sent);
+}
+
 struct sev_ops sev_ops = {
     .sev_ioctl = sev_ioctl,
     .fw_error_to_str = fw_error_to_str,
+    .sev_send_start = _sev_send_start,
 };
 
 static void
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 8b38567c3d1..6d1a918413c 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -89,6 +89,7 @@ int csv_load_incoming_cpu_state(QEMUFile *f);
 struct sev_ops {
     int (*sev_ioctl)(int fd, int cmd, void *data, int *error);
     const char *(*fw_error_to_str)(int code);
+    int (*sev_send_start)(QEMUFile *f, uint64_t *bytes_sent);
 };
 
 #endif
diff --git a/target/i386/trace-events b/target/i386/trace-events
index 60a4609c0f5..8db3e363859 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -22,3 +22,4 @@ kvm_sev_receive_update_vmsa(uint32_t cpu_id, uint32_t cpu_index, void *src, int
 
 # csv.c
 kvm_csv_launch_encrypt_data(uint64_t gpa, void *addr, uint64_t len) "gpa 0x%" PRIx64 "addr %p len 0x%" PRIu64
+kvm_csv_send_encrypt_data(void *dst, int len) "trans %p len %d"
-- 
Gitee


From 5d0e07d936ab5a2981719d3203edd46a22d5d1c3 Mon Sep 17 00:00:00 2001
From: jiangxin <jiangxin@hygon.cn>
Date: Fri, 17 Jun 2022 09:45:45 +0800
Subject: [PATCH 384/407] anolis: csv/i386: add support to migrate the incoming
 page

csv_receive_encrypt_data() reads incoming guest private pages from
the socket and loads them into guest memory. The routine is similar
to CSV2's. It usually starts with a RECEIVE START command to create
the migration context. RECEIVE ENCRYPT DATA commands are then issued
to let the firmware load the incoming pages into guest memory. After
migration completes, a RECEIVE FINISH command is issued to the
firmware.
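
The incoming stream carries the same three length-prefixed sections
as the outgoing side. The sketch below parses them from a byte
buffer and rejects truncated input. The bounds checks are an
assumption added for illustration (the patch reads directly from the
QEMUFile), and struct section, get_section() and csv_unpack_batch()
are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct section {
    const uint8_t *data;
    uint32_t len;
};

/* Read one big-endian length plus payload; returns 0, or -1 if truncated. */
static int get_section(const uint8_t *buf, size_t buf_len, size_t *pos,
                       struct section *out)
{
    uint32_t len;

    assert(buf != NULL && pos != NULL && out != NULL);

    if (*pos > buf_len || buf_len - *pos < 4) {
        return -1;
    }
    len = ((uint32_t)buf[*pos] << 24) | ((uint32_t)buf[*pos + 1] << 16) |
          ((uint32_t)buf[*pos + 2] << 8) | (uint32_t)buf[*pos + 3];
    if (buf_len - *pos - 4 < len) {
        return -1;
    }
    out->data = buf + *pos + 4;
    out->len = len;
    *pos += 4 + (size_t)len;
    return 0;
}

/* Parse header, guest address table and transport buffer in that order. */
static int csv_unpack_batch(const uint8_t *buf, size_t buf_len,
                            struct section *hdr, struct section *addrs,
                            struct section *trans)
{
    size_t pos = 0;

    if (get_section(buf, buf_len, &pos, hdr) ||
        get_section(buf, buf_len, &pos, addrs) ||
        get_section(buf, buf_len, &pos, trans)) {
        return -1;
    }
    return 0;
}
```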

Change-Id: I53bc350e379fa747c7b3c3b3be9f2fef0bf1af9f
---
 target/i386/csv.c        | 87 ++++++++++++++++++++++++++++++++++++++++
 target/i386/csv.h        |  2 +
 target/i386/sev.c        |  8 ++++
 target/i386/sev.h        |  1 +
 target/i386/trace-events |  1 +
 5 files changed, 99 insertions(+)

diff --git a/target/i386/csv.c b/target/i386/csv.c
index c5a2bc99244..00ff7d20a54 100644
--- a/target/i386/csv.c
+++ b/target/i386/csv.c
@@ -36,11 +36,14 @@
 struct ConfidentialGuestMemoryEncryptionOps csv_memory_encryption_ops = {
     .save_setup = sev_save_setup,
     .save_outgoing_page = NULL,
+    .load_incoming_page = csv3_load_incoming_page,
     .is_gfn_in_unshared_region = NULL,
     .save_outgoing_shared_regions_list = sev_save_outgoing_shared_regions_list,
     .load_incoming_shared_regions_list = sev_load_incoming_shared_regions_list,
     .queue_outgoing_page = csv3_queue_outgoing_page,
     .save_queued_outgoing_pages = csv3_save_queued_outgoing_pages,
+    .queue_incoming_page = NULL,
+    .load_queued_incoming_pages = NULL,
 };
 
 #define CSV_OUTGOING_PAGE_NUM (CSV3_OUTGOING_PAGE_WINDOW_SIZE/TARGET_PAGE_SIZE)
@@ -86,6 +89,7 @@ csv_init(uint32_t policy, int fd, void *state, struct sev_ops *ops)
         QTAILQ_INIT(&csv_guest.dma_map_regions_list);
         qemu_mutex_init(&csv_guest.dma_map_regions_list_mutex);
         csv_guest.sev_send_start = ops->sev_send_start;
+        csv_guest.sev_receive_start = ops->sev_receive_start;
     }
     return 0;
 }
@@ -482,3 +486,86 @@ csv3_save_queued_outgoing_pages(QEMUFile *f, uint64_t *bytes_sent)
 
     return csv_send_encrypt_data(s, f, NULL, 0, bytes_sent);
 }
+
+static int
+csv_receive_start(QEMUFile *f)
+{
+    if (csv_guest.sev_receive_start)
+        return csv_guest.sev_receive_start(f);
+    else
+        return -1;
+}
+
+static int csv_receive_encrypt_data(QEMUFile *f, uint8_t *ptr)
+{
+    int ret = 1, fw_error = 0;
+    uint32_t i, guest_addr_entry_num;
+    gchar *hdr = NULL, *trans = NULL;
+    struct guest_addr_entry *guest_addr_data;
+    struct kvm_csv_receive_encrypt_data update = {};
+    void *hva = NULL;
+    MemoryRegion *mr = NULL;
+
+    /* get packet header */
+    update.hdr_len = qemu_get_be32(f);
+
+    hdr = g_new(gchar, update.hdr_len);
+    qemu_get_buffer(f, (uint8_t *)hdr, update.hdr_len);
+    update.hdr_uaddr = (uintptr_t)hdr;
+
+    /* get guest addr data */
+    update.guest_addr_len = qemu_get_be32(f);
+
+    guest_addr_data = (struct guest_addr_entry *)g_new(gchar, update.guest_addr_len);
+    qemu_get_buffer(f, (uint8_t *)guest_addr_data, update.guest_addr_len);
+    update.guest_addr_data = (uintptr_t)guest_addr_data;
+
+    /* get transport buffer */
+    update.trans_len = qemu_get_be32(f);
+
+    trans = g_new(gchar, update.trans_len);
+    update.trans_uaddr = (uintptr_t)trans;
+    qemu_get_buffer(f, (uint8_t *)update.trans_uaddr, update.trans_len);
+
+    /* update share memory. */
+    guest_addr_entry_num = update.guest_addr_len / sizeof(struct guest_addr_entry);
+    for (i = 0; i < guest_addr_entry_num; i++) {
+        if (guest_addr_data[i].share) {
+            hva = gpa2hva(&mr,
+                          ((uint64_t)guest_addr_data[i].gfn << TARGET_PAGE_BITS),
+                          TARGET_PAGE_SIZE,
+                          NULL);
+            if (hva)
+                memcpy(hva, trans + i * TARGET_PAGE_SIZE, TARGET_PAGE_SIZE);
+        }
+    }
+
+    trace_kvm_csv_receive_encrypt_data(trans, update.trans_len, hdr, update.hdr_len);
+
+    ret = csv_ioctl(KVM_CSV_RECEIVE_ENCRYPT_DATA, &update, &fw_error);
+    if (ret) {
+        error_report("Error RECEIVE_ENCRYPT_DATA ret=%d fw_error=%d '%s'",
+                     ret, fw_error, fw_error_to_str(fw_error));
+        goto err;
+    }
+
+err:
+    g_free(trans);
+    g_free(guest_addr_data);
+    g_free(hdr);
+    return ret;
+}
+
+int csv3_load_incoming_page(QEMUFile *f, uint8_t *ptr)
+{
+    /*
+     * If this is the first buffer and SEV is not in the receiving state,
+     * use the RECEIVE_START command to create an encryption context.
+     */
+    if (!csv_check_state(SEV_STATE_RECEIVE_UPDATE) &&
+        csv_receive_start(f)) {
+        return 1;
+    }
+
+    return csv_receive_encrypt_data(f, ptr);
+}
diff --git a/target/i386/csv.h b/target/i386/csv.h
index 273d69d12c4..c8639cfa9a1 100644
--- a/target/i386/csv.h
+++ b/target/i386/csv.h
@@ -89,6 +89,7 @@ struct CsvGuestState {
     size_t guest_addr_len;
 
     int (*sev_send_start)(QEMUFile *f, uint64_t *bytes_sent);
+    int (*sev_receive_start)(QEMUFile *f);
 };
 
 typedef struct CsvGuestState CsvGuestState;
@@ -102,6 +103,7 @@ int csv_load_data(uint64_t gpa, uint8_t *ptr, uint64_t len, Error **errp);
 
 int csv_shared_region_dma_map(uint64_t start, uint64_t end);
 void csv_shared_region_dma_unmap(uint64_t start, uint64_t end);
+int csv3_load_incoming_page(QEMUFile *f, uint8_t *ptr);
 int csv3_queue_outgoing_page(uint8_t *ptr, uint32_t sz, uint64_t addr);
 int csv3_save_queued_outgoing_pages(QEMUFile *f, uint64_t *bytes_sent);
 
diff --git a/target/i386/sev.c b/target/i386/sev.c
index f026fa1f092..03a91e5b309 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -2573,10 +2573,18 @@ static int _sev_send_start(QEMUFile *f, uint64_t *bytes_sent)
     return sev_send_start(s, f, bytes_sent);
 }
 
+static int _sev_receive_start(QEMUFile *f)
+{
+    SevGuestState *s = sev_guest;
+
+    return sev_receive_start(s, f);
+}
+
 struct sev_ops sev_ops = {
     .sev_ioctl = sev_ioctl,
     .fw_error_to_str = fw_error_to_str,
     .sev_send_start = _sev_send_start,
+    .sev_receive_start = _sev_receive_start,
 };
 
 static void
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 6d1a918413c..9e5c6506c87 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -90,6 +90,7 @@ struct sev_ops {
     int (*sev_ioctl)(int fd, int cmd, void *data, int *error);
     const char *(*fw_error_to_str)(int code);
     int (*sev_send_start)(QEMUFile *f, uint64_t *bytes_sent);
+    int (*sev_receive_start)(QEMUFile *f);
 };
 
 #endif
diff --git a/target/i386/trace-events b/target/i386/trace-events
index 8db3e363859..1854356fc5b 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -23,3 +23,4 @@ kvm_sev_receive_update_vmsa(uint32_t cpu_id, uint32_t cpu_index, void *src, int
 # csv.c
 kvm_csv_launch_encrypt_data(uint64_t gpa, void *addr, uint64_t len) "gpa 0x%" PRIx64 "addr %p len 0x%" PRIu64
 kvm_csv_send_encrypt_data(void *dst, int len) "trans %p len %d"
+kvm_csv_receive_encrypt_data(void *dst, int len, void *hdr, int hdr_len) "trans %p len %d hdr %p hdr_len %d"
-- 
Gitee


From 3537ef32e75d8564ead9f2383e9a75d1f11f1cd0 Mon Sep 17 00:00:00 2001
From: jiangxin <jiangxin@hygon.cn>
Date: Fri, 17 Jun 2022 09:52:31 +0800
Subject: [PATCH 385/407] anolis: csv/i386: add support to migrate the outgoing
 context

CSV needs to migrate the guest CPUs' context pages. Before migrating
the context, QEMU queries the transfer buffer length and the header
length with a SEND ENCRYPT CONTEXT command. A new migration flag,
RAM_SAVE_ENCRYPTED_CSV_CONTEXT, is defined for CSV.
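
The length query follows a probe-then-send pattern: the command is
first issued with zero lengths, the firmware answers with an
invalid-length error and the required sizes, and the command is then
repeated with buffers of those sizes. The sketch below models this
with a fake firmware call. fake_send_context(), the FW_* values and
the sizes 64/4096 are placeholders for illustration only and do not
reflect the real firmware interface.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FW_OK          0
#define FW_INVALID_LEN 1 /* placeholder for SEV_RET_INVALID_LEN */

struct ctx_req {
    void *hdr;
    uint32_t hdr_len;
    void *trans;
    uint32_t trans_len;
};

/* Fake firmware: reports required sizes if the buffers are too small. */
static int fake_send_context(struct ctx_req *req, int *fw_err)
{
    const uint32_t need_hdr = 64, need_trans = 4096;

    if (req->hdr_len < need_hdr || req->trans_len < need_trans) {
        req->hdr_len = need_hdr;
        req->trans_len = need_trans;
        *fw_err = FW_INVALID_LEN;
        return -1;
    }
    memset(req->hdr, 0x5a, need_hdr);
    memset(req->trans, 0xa5, need_trans);
    req->hdr_len = need_hdr;
    req->trans_len = need_trans;
    *fw_err = FW_OK;
    return 0;
}

/* Probe for the lengths, allocate, then issue the real request. */
static int send_context(struct ctx_req *out)
{
    struct ctx_req req = {0};
    int fw_err = 0;

    assert(out != NULL);

    /* Step 1: zero-length probe; only FW_INVALID_LEN is acceptable. */
    fake_send_context(&req, &fw_err);
    if (fw_err != FW_INVALID_LEN || !req.hdr_len || !req.trans_len) {
        return -1;
    }

    /* Step 2: allocate the reported sizes and send for real. */
    req.hdr = malloc(req.hdr_len);
    req.trans = malloc(req.trans_len);
    if (!req.hdr || !req.trans || fake_send_context(&req, &fw_err)) {
        free(req.hdr);
        free(req.trans);
        return -1;
    }
    *out = req; /* caller owns and frees hdr and trans */
    return 0;
}
```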

Change-Id: I1989e76bd5c91c78074fb31eff075deacfa16078
---
 target/i386/csv.c        | 79 ++++++++++++++++++++++++++++++++++++++++
 target/i386/csv.h        |  1 +
 target/i386/trace-events |  1 +
 3 files changed, 81 insertions(+)

diff --git a/target/i386/csv.c b/target/i386/csv.c
index 00ff7d20a54..271c48867e0 100644
--- a/target/i386/csv.c
+++ b/target/i386/csv.c
@@ -44,6 +44,7 @@ struct ConfidentialGuestMemoryEncryptionOps csv_memory_encryption_ops = {
     .save_queued_outgoing_pages = csv3_save_queued_outgoing_pages,
     .queue_incoming_page = NULL,
     .load_queued_incoming_pages = NULL,
+    .save_outgoing_cpu_state = csv3_save_outgoing_context,
 };
 
 #define CSV_OUTGOING_PAGE_NUM (CSV3_OUTGOING_PAGE_WINDOW_SIZE/TARGET_PAGE_SIZE)
@@ -569,3 +570,81 @@ int csv3_load_incoming_page(QEMUFile *f, uint8_t *ptr)
 
     return csv_receive_encrypt_data(f, ptr);
 }
+
+static int
+csv_send_get_context_len(int *fw_err, int *context_len, int *hdr_len)
+{
+    int ret = 0;
+    struct kvm_csv_send_encrypt_context update = {};
+
+    ret = csv_ioctl(KVM_CSV_SEND_ENCRYPT_CONTEXT, &update, fw_err);
+    if (*fw_err != SEV_RET_INVALID_LEN) {
+        error_report("%s: failed to get context length ret=%d fw_error=%d '%s'",
+                    __func__, ret, *fw_err, fw_error_to_str(*fw_err));
+        ret = -1;
+        goto err;
+    }
+
+    if (update.trans_len <= INT_MAX && update.hdr_len <= INT_MAX) {
+        *context_len = update.trans_len;
+        *hdr_len = update.hdr_len;
+    }
+    ret = 0;
+err:
+    return ret;
+}
+
+static int
+csv_send_encrypt_context(CsvGuestState *s, QEMUFile *f)
+{
+    int ret, fw_error = 0;
+    int context_len = 0;
+    int hdr_len = 0;
+    guchar *trans;
+    guchar *hdr;
+    struct kvm_csv_send_encrypt_context update = { };
+
+    ret = csv_send_get_context_len(&fw_error, &context_len, &hdr_len);
+    if (context_len < 1 || hdr_len < 1) {
+        error_report("%s: fail to get context length fw_error=%d '%s'",
+                     __func__, fw_error, fw_error_to_str(fw_error));
+        return 1;
+    }
+
+    /* allocate transport buffer */
+    trans = g_new(guchar, context_len);
+    hdr = g_new(guchar, hdr_len);
+
+    update.hdr_uaddr = (uintptr_t)hdr;
+    update.hdr_len = hdr_len;
+    update.trans_uaddr = (uintptr_t)trans;
+    update.trans_len = context_len;
+
+    trace_kvm_csv_send_encrypt_context(trans, update.trans_len);
+
+    ret = csv_ioctl(KVM_CSV_SEND_ENCRYPT_CONTEXT, &update, &fw_error);
+    if (ret) {
+        error_report("%s: SEND_ENCRYPT_CONTEXT ret=%d fw_error=%d '%s'",
+                     __func__, ret, fw_error, fw_error_to_str(fw_error));
+        goto err;
+    }
+
+    qemu_put_be32(f, update.hdr_len);
+    qemu_put_buffer(f, (uint8_t *)update.hdr_uaddr, update.hdr_len);
+
+    qemu_put_be32(f, update.trans_len);
+    qemu_put_buffer(f, (uint8_t *)update.trans_uaddr, update.trans_len);
+
+err:
+    g_free(trans);
+    g_free(hdr);
+    return ret;
+}
+
+int csv3_save_outgoing_context(QEMUFile *f)
+{
+    CsvGuestState *s = &csv_guest;
+
+    /* send csv context. */
+    return csv_send_encrypt_context(s, f);
+}
diff --git a/target/i386/csv.h b/target/i386/csv.h
index c8639cfa9a1..9def1611d88 100644
--- a/target/i386/csv.h
+++ b/target/i386/csv.h
@@ -103,6 +103,7 @@ int csv_load_data(uint64_t gpa, uint8_t *ptr, uint64_t len, Error **errp);
 
 int csv_shared_region_dma_map(uint64_t start, uint64_t end);
 void csv_shared_region_dma_unmap(uint64_t start, uint64_t end);
+int csv3_save_outgoing_context(QEMUFile *f);
 int csv3_load_incoming_page(QEMUFile *f, uint8_t *ptr);
 int csv3_queue_outgoing_page(uint8_t *ptr, uint32_t sz, uint64_t addr);
 int csv3_save_queued_outgoing_pages(QEMUFile *f, uint64_t *bytes_sent);
diff --git a/target/i386/trace-events b/target/i386/trace-events
index 1854356fc5b..08a782ce15c 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -23,4 +23,5 @@ kvm_sev_receive_update_vmsa(uint32_t cpu_id, uint32_t cpu_index, void *src, int
 # csv.c
 kvm_csv_launch_encrypt_data(uint64_t gpa, void *addr, uint64_t len) "gpa 0x%" PRIx64 "addr %p len 0x%" PRIu64
 kvm_csv_send_encrypt_data(void *dst, int len) "trans %p len %d"
+kvm_csv_send_encrypt_context(void *dst, int len) "trans %p len %d"
 kvm_csv_receive_encrypt_data(void *dst, int len, void *hdr, int hdr_len) "trans %p len %d hdr %p hdr_len %d"
-- 
Gitee


From 5b43e81295cdf43165db39494716a2593790c333 Mon Sep 17 00:00:00 2001
From: jiangxin <jiangxin@hygon.cn>
Date: Fri, 17 Jun 2022 10:00:46 +0800
Subject: [PATCH 386/407] anolis: csv/i386: add support to migrate the incoming
 context

csv3_load_incoming_context() reads the incoming guest's context from
the socket and loads it into guest private memory. This is the last
step of migration; the RECEIVE FINISH command is then performed to
complete the whole migration.

Change-Id: I7290e0e9527bb819eee5813038110d981908a880
---
 target/i386/csv.c        | 45 ++++++++++++++++++++++++++++++++++++++++
 target/i386/csv.h        |  1 +
 target/i386/trace-events |  1 +
 3 files changed, 47 insertions(+)
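
Reviewer note: the stream layout used for the context is a pair of
length-prefixed records (a big-endian 32-bit length followed by the payload,
first for the header and then for the transport buffer). A minimal sketch of
writing and parsing one such record is shown below. The helper names
frame_put() and frame_get() and the max_len cap are assumptions made for this
note; they are not QEMU APIs, and the patch itself reads the lengths without
an upper bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store v as 4 big-endian bytes. */
static void put_be32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/* Load 4 big-endian bytes. */
static uint32_t get_be32(const uint8_t *src)
{
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8) | (uint32_t)src[3];
}

/*
 * Append one [be32 length][payload] record to buf.
 * Returns the number of bytes written, or 0 if it does not fit.
 */
static size_t frame_put(uint8_t *buf, size_t cap,
                        const uint8_t *payload, uint32_t len)
{
    assert(buf != NULL && payload != NULL);
    if (cap < 4 || cap - 4 < len) {
        return 0;
    }
    put_be32(buf, len);
    memcpy(buf + 4, payload, len);
    return 4 + (size_t)len;
}

/*
 * Parse one record from buf. The length is rejected when it is zero,
 * larger than max_len, or larger than the bytes actually available.
 * Returns the number of bytes consumed, or 0 on rejection.
 */
static size_t frame_get(const uint8_t *buf, size_t avail, uint32_t max_len,
                        const uint8_t **payload, uint32_t *len)
{
    uint32_t n;

    assert(buf != NULL && payload != NULL && len != NULL);
    if (avail < 4) {
        return 0;
    }
    n = get_be32(buf);
    if (n == 0 || n > max_len || avail - 4 < n) {
        return 0;
    }
    *payload = buf + 4;
    *len = n;
    return 4 + (size_t)n;
}
```

Validating the length before allocating is the design point here: a length
taken straight from the wire can otherwise drive an arbitrarily large
allocation on the receiving side.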

diff --git a/target/i386/csv.c b/target/i386/csv.c
index 271c48867e0..b0ca16980d5 100644
--- a/target/i386/csv.c
+++ b/target/i386/csv.c
@@ -45,6 +45,7 @@ struct ConfidentialGuestMemoryEncryptionOps csv_memory_encryption_ops = {
     .queue_incoming_page = NULL,
     .load_queued_incoming_pages = NULL,
     .save_outgoing_cpu_state = csv3_save_outgoing_context,
+    .load_incoming_cpu_state = csv3_load_incoming_context,
 };
 
 #define CSV_OUTGOING_PAGE_NUM (CSV3_OUTGOING_PAGE_WINDOW_SIZE/TARGET_PAGE_SIZE)
@@ -641,6 +642,42 @@ err:
     return ret;
 }
 
+static int
+csv_receive_encrypt_context(CsvGuestState *s, QEMUFile *f)
+{
+    int ret = 1, fw_error = 0;
+    gchar *hdr = NULL, *trans = NULL;
+    struct kvm_csv_receive_encrypt_context update = {};
+
+    /* get packet header */
+    update.hdr_len = qemu_get_be32(f);
+
+    hdr = g_new(gchar, update.hdr_len);
+    qemu_get_buffer(f, (uint8_t *)hdr, update.hdr_len);
+    update.hdr_uaddr = (uintptr_t)hdr;
+
+    /* get transport buffer */
+    update.trans_len = qemu_get_be32(f);
+
+    trans = g_new(gchar, update.trans_len);
+    update.trans_uaddr = (uintptr_t)trans;
+    qemu_get_buffer(f, (uint8_t *)update.trans_uaddr, update.trans_len);
+
+    trace_kvm_csv_receive_encrypt_context(trans, update.trans_len, hdr, update.hdr_len);
+
+    ret = csv_ioctl(KVM_CSV_RECEIVE_ENCRYPT_CONTEXT, &update, &fw_error);
+    if (ret) {
+        error_report("Error RECEIVE_ENCRYPT_CONTEXT ret=%d fw_error=%d '%s'",
+                     ret, fw_error, fw_error_to_str(fw_error));
+        goto err;
+    }
+
+err:
+    g_free(trans);
+    g_free(hdr);
+    return ret;
+}
+
 int csv3_save_outgoing_context(QEMUFile *f)
 {
     CsvGuestState *s = &csv_guest;
@@ -648,3 +685,11 @@ int csv3_save_outgoing_context(QEMUFile *f)
     /* send csv context. */
     return csv_send_encrypt_context(s, f);
 }
+
+int csv3_load_incoming_context(QEMUFile *f)
+{
+    CsvGuestState *s = &csv_guest;
+
+    /* receive csv context. */
+    return csv_receive_encrypt_context(s, f);
+}
diff --git a/target/i386/csv.h b/target/i386/csv.h
index 9def1611d88..2e0506313d9 100644
--- a/target/i386/csv.h
+++ b/target/i386/csv.h
@@ -104,6 +104,7 @@ int csv_load_data(uint64_t gpa, uint8_t *ptr, uint64_t len, Error **errp);
 int csv_shared_region_dma_map(uint64_t start, uint64_t end);
 void csv_shared_region_dma_unmap(uint64_t start, uint64_t end);
 int csv3_save_outgoing_context(QEMUFile *f);
+int csv3_load_incoming_context(QEMUFile *f);
 int csv3_load_incoming_page(QEMUFile *f, uint8_t *ptr);
 int csv3_queue_outgoing_page(uint8_t *ptr, uint32_t sz, uint64_t addr);
 int csv3_save_queued_outgoing_pages(QEMUFile *f, uint64_t *bytes_sent);
diff --git a/target/i386/trace-events b/target/i386/trace-events
index 08a782ce15c..47ab390de69 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -25,3 +25,4 @@ kvm_csv_launch_encrypt_data(uint64_t gpa, void *addr, uint64_t len) "gpa 0x%" PR
 kvm_csv_send_encrypt_data(void *dst, int len) "trans %p len %d"
 kvm_csv_send_encrypt_context(void *dst, int len) "trans %p len %d"
 kvm_csv_receive_encrypt_data(void *dst, int len, void *hdr, int hdr_len) "trans %p len %d hdr %p hdr_len %d"
+kvm_csv_receive_encrypt_context(void *dst, int len, void *hdr, int hdr_len) "trans %p len %d hdr %p hdr_len %d"
-- 
Gitee


From 372cd3b2ef17f1011866a64e07c4a4eb18ee432a Mon Sep 17 00:00:00 2001
From: appleLin <appleLin@hygon.cn>
Date: Wed, 3 Aug 2022 21:02:41 +0800
Subject: [PATCH 387/407] anolis: target/i386/sev: Add support for reuse ASID
 for different CSV guests

If you want to reuse one ASID for many CSV guests, provide a label
(i.e. userid) and the length of the label when launching the CSV guest.
CSV guests that are given the same userid share the same ASID.

Signed-off-by: hanliyang <hanliyang@hygon.cn>
---
 linux-headers/linux/kvm.h |  5 +++++
 qapi/qom.json             |  5 ++++-
 qemu-options.hx           |  5 ++++-
 target/i386/csv.c         |  2 --
 target/i386/csv.h         |  3 +++
 target/i386/sev.c         | 47 ++++++++++++++++++++++++++++++++++++++-
 6 files changed, 62 insertions(+), 5 deletions(-)
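
Reviewer note: the decision this patch makes in sev_kvm_init() can be read as
a single predicate over the CPU vendor, the policy bits, and the user id. The
sketch below restates it outside QEMU. The two policy macros copy the values
added to csv.h in this patch; struct guest_cfg and wants_asid_reuse() are
hypothetical names used only for this note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in target/i386/csv.h by this patch. */
#define GUEST_POLICY_CSV_BIT     (1u << 6)
#define GUEST_POLICY_REUSE_ASID  (1u << 7)

/* Hypothetical bundle of the inputs the check depends on. */
struct guest_cfg {
    bool hygon_cpu;        /* host CPU vendor is Hygon */
    uint32_t policy;       /* guest policy bits */
    const char *user_id;   /* may be NULL or empty when not configured */
};

/*
 * True when a user id should be handed to the init command:
 * Hygon host, REUSE_ASID set, CSV_BIT clear, and a non-empty user id.
 */
static bool wants_asid_reuse(const struct guest_cfg *cfg)
{
    assert(cfg != NULL);
    return cfg->hygon_cpu &&
           (cfg->policy & GUEST_POLICY_REUSE_ASID) != 0 &&
           (cfg->policy & GUEST_POLICY_CSV_BIT) == 0 &&
           cfg->user_id != NULL && cfg->user_id[0] != '\0';
}
```

In every other case the patch falls back to the init call without a command
buffer, so a guest that sets REUSE_ASID but supplies no user-id is launched
as before.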

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 854bd3bcb6f..f06bc701eec 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -2011,6 +2011,11 @@ struct kvm_csv_receive_encrypt_context {
 	__u32 trans_len;
 };
 
+struct kvm_csv_init {
+	__u64 userid_addr;
+	__u32 len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
diff --git a/qapi/qom.json b/qapi/qom.json
index eeb5395ff3b..387c0a142df 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -773,6 +773,8 @@
 #                 designated guest firmware page for measured boot
 #                 with -kernel (default: false) (since 6.2)
 #
+# @user-id: the user id of the guest owner; only supported on Hygon CPUs
+#
 # Since: 2.12
 ##
 { 'struct': 'SevGuestProperties',
@@ -783,7 +785,8 @@
             '*handle': 'uint32',
             '*cbitpos': 'uint32',
             'reduced-phys-bits': 'uint32',
-            '*kernel-hashes': 'bool' } }
+            '*kernel-hashes': 'bool',
+            '*user-id': 'str' } }
 
 ##
 # @ObjectType:
diff --git a/qemu-options.hx b/qemu-options.hx
index 981248e283e..6eaa529d4b5 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -5189,7 +5189,7 @@ SRST
                  -object secret,id=sec0,keyid=secmaster0,format=base64,\\
                      data=$SECRET,iv=$(<iv.b64)
 
-    ``-object sev-guest,id=id,cbitpos=cbitpos,reduced-phys-bits=val,[sev-device=string,policy=policy,handle=handle,dh-cert-file=file,session-file=file,kernel-hashes=on|off]``
+    ``-object sev-guest,id=id,cbitpos=cbitpos,reduced-phys-bits=val,[sev-device=string,policy=policy,handle=handle,dh-cert-file=file,session-file=file,kernel-hashes=on|off,user-id=id]``
         Create a Secure Encrypted Virtualization (SEV) guest object,
         which can be used to provide the guest memory encryption support
         on AMD processors.
@@ -5233,6 +5233,9 @@ SRST
         cmdline to a designated guest firmware page for measured Linux
         boot with -kernel. The default is off. (Since 6.2)
 
+        The ``user-id`` sets the user id of the guest owner; this is only
+        supported on Hygon CPUs.
+
         e.g to launch a SEV guest
 
         .. parsed-literal::
diff --git a/target/i386/csv.c b/target/i386/csv.c
index b0ca16980d5..0e48982964d 100644
--- a/target/i386/csv.c
+++ b/target/i386/csv.c
@@ -52,8 +52,6 @@ struct ConfidentialGuestMemoryEncryptionOps csv_memory_encryption_ops = {
 
 CsvGuestState csv_guest = { 0 };
 
-#define GUEST_POLICY_CSV_BIT     (1 << 6)
-
 int
 csv_init(uint32_t policy, int fd, void *state, struct sev_ops *ops)
 {
diff --git a/target/i386/csv.h b/target/i386/csv.h
index 2e0506313d9..86f46d30c2c 100644
--- a/target/i386/csv.h
+++ b/target/i386/csv.h
@@ -22,6 +22,9 @@
 
 #include "cpu.h"
 
+#define GUEST_POLICY_CSV_BIT     (1 << 6)
+#define GUEST_POLICY_REUSE_ASID  (1 << 7)
+
 #define CPUID_VENDOR_HYGON_EBX   0x6f677948  /* "Hygo" */
 #define CPUID_VENDOR_HYGON_ECX   0x656e6975  /* "uine" */
 #define CPUID_VENDOR_HYGON_EDX   0x6e65476e  /* "nGen" */
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 03a91e5b309..813f0de7161 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -82,6 +82,7 @@ struct SevGuestState {
     uint32_t cbitpos;
     uint32_t reduced_phys_bits;
     bool kernel_hashes;
+    char *user_id;
 
     /* runtime state */
     uint32_t handle;
@@ -383,6 +384,22 @@ sev_guest_set_dh_cert_file(Object *obj, const char *value, Error **errp)
     s->dh_cert_file = g_strdup(value);
 }
 
+static char *
+sev_guest_get_user_id(Object *obj, Error **errp)
+{
+    SevGuestState *s = SEV_GUEST(obj);
+
+    return g_strdup(s->user_id);
+}
+
+static void
+sev_guest_set_user_id(Object *obj, const char *value, Error **errp)
+{
+    SevGuestState *s = SEV_GUEST(obj);
+
+    s->user_id = g_strdup(value);
+}
+
 static char *
 sev_guest_get_sev_device(Object *obj, Error **errp)
 {
@@ -436,6 +453,11 @@ sev_guest_class_init(ObjectClass *oc, void *data)
                                    sev_guest_set_kernel_hashes);
     object_class_property_set_description(oc, "kernel-hashes",
             "add kernel hashes to guest firmware for measured Linux boot");
+    object_class_property_add_str(oc, "user-id",
+                                  sev_guest_get_user_id,
+                                  sev_guest_set_user_id);
+    object_class_property_set_description(oc, "user-id",
+            "user id of the guest owner");
 }
 
 static void
@@ -1148,7 +1170,30 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
     }
 
     trace_kvm_sev_init();
-    ret = sev_ioctl(sev->sev_fd, cmd, NULL, &fw_error);
+
+    /* ASID reuse is only supported for CSV/CSV2 guests */
+    if (is_hygon_cpu() &&
+        (sev_guest->policy & GUEST_POLICY_REUSE_ASID) &&
+        !(sev_guest->policy & GUEST_POLICY_CSV_BIT)) {
+        char *user_id = NULL;
+        struct kvm_csv_init *init_cmd_buf = NULL;
+
+        user_id = object_property_get_str(OBJECT(sev), "user-id", NULL);
+        if (user_id && strlen(user_id)) {
+            init_cmd_buf = g_new0(struct kvm_csv_init, 1);
+            init_cmd_buf->len = strlen(user_id);
+            init_cmd_buf->userid_addr = (__u64)(uintptr_t)user_id;
+        }
+        ret = sev_ioctl(sev->sev_fd, cmd, init_cmd_buf, &fw_error);
+
+        if (user_id) {
+            g_free(user_id);
+            g_free(init_cmd_buf);
+        }
+    } else {
+        ret = sev_ioctl(sev->sev_fd, cmd, NULL, &fw_error);
+    }
+
     if (ret) {
         error_setg(errp, "%s: failed to initialize ret=%d fw_error=%d '%s'",
                    __func__, ret, fw_error, fw_error_to_str(fw_error));
-- 
Gitee


From a1e91ebda76b6206617bcb046fdb0d8d20d1d614 Mon Sep 17 00:00:00 2001
From: xiongmengbiao <xiongmengbiao@hygon.cn>
Date: Wed, 6 Mar 2024 17:43:57 +0800
Subject: [PATCH 388/407] support vpsp

Simulate a PSP misc device to support TKM key isolation.

Signed-off-by: xiongmengbiao <xiongmengbiao@hygon.cn>
---
 hw/misc/Kconfig     |   4 ++
 hw/misc/meson.build |   1 +
 hw/misc/psp.c       | 141 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 146 insertions(+)
 create mode 100644 hw/misc/psp.c

diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
index 507058d8bff..d1d05442d0b 100644
--- a/hw/misc/Kconfig
+++ b/hw/misc/Kconfig
@@ -171,4 +171,8 @@ config SIFIVE_U_PRCI
 config VIRT_CTRL
     bool
 
+config PSP_DEV
+    bool
+    default y
+
 source macio/Kconfig
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 3f41a3a5b27..39e58363160 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -10,6 +10,7 @@ softmmu_ss.add(when: 'CONFIG_UNIMP', if_true: files('unimp.c'))
 softmmu_ss.add(when: 'CONFIG_EMPTY_SLOT', if_true: files('empty_slot.c'))
 softmmu_ss.add(when: 'CONFIG_LED', if_true: files('led.c'))
 softmmu_ss.add(when: 'CONFIG_PVPANIC_COMMON', if_true: files('pvpanic.c'))
+softmmu_ss.add(when: 'CONFIG_PSP_DEV', if_true: files('psp.c'))
 
 # ARM devices
 softmmu_ss.add(when: 'CONFIG_PL310', if_true: files('arm_l2x0.c'))
diff --git a/hw/misc/psp.c b/hw/misc/psp.c
new file mode 100644
index 00000000000..1cfbab859e7
--- /dev/null
+++ b/hw/misc/psp.c
@@ -0,0 +1,141 @@
+/*
+ * hygon psp device emulation
+ *
+ * Copyright 2024 HYGON Corp.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/compiler.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "migration/vmstate.h"
+#include "hw/qdev-properties.h"
+#include "sysemu/runstate.h"
+#include <sys/ioctl.h>
+
+#define TYPE_PSP_DEV "psp"
+OBJECT_DECLARE_SIMPLE_TYPE(PSPDevState, PSP_DEV)
+
+struct PSPDevState {
+    /* Private */
+    DeviceState pdev;
+
+    /* Public */
+    Notifier shutdown_notifier;
+    int dev_fd;
+    uint8_t enabled;
+
+    /**
+     * vid is used to identify a virtual machine in qemu.
+     * When a virtual machine accesses a tkm key,
+     * the TKM module uses different key spaces based on different vids.
+    */
+    uint32_t vid;
+};
+
+#define PSP_DEV_PATH "/dev/hygon_psp_config"
+#define HYGON_PSP_IOC_TYPE      'H'
+#define PSP_IOC_MUTEX_ENABLE    _IOWR(HYGON_PSP_IOC_TYPE, 1, NULL)
+#define PSP_IOC_MUTEX_DISABLE   _IOWR(HYGON_PSP_IOC_TYPE, 2, NULL)
+#define PSP_IOC_VPSP_OPT        _IOWR(HYGON_PSP_IOC_TYPE, 3, NULL)
+
+enum VPSP_DEV_CTRL_OPCODE {
+    VPSP_OP_VID_ADD,
+    VPSP_OP_VID_DEL,
+};
+
+struct psp_dev_ctrl {
+    unsigned char op;
+    union {
+        unsigned int vid;
+        unsigned char reserved[128];
+    } data;
+};
+
+static void psp_dev_destroy(PSPDevState *state)
+{
+    struct psp_dev_ctrl ctrl = { 0 };
+    if (state && state->dev_fd) {
+        if (state->enabled) {
+            ctrl.op = VPSP_OP_VID_DEL;
+            if (ioctl(state->dev_fd, PSP_IOC_VPSP_OPT, &ctrl) < 0) {
+                error_report("VPSP_OP_VID_DEL: %d", -errno);
+            } else {
+                state->enabled = false;
+            }
+        }
+        qemu_close(state->dev_fd);
+        state->dev_fd = 0;
+    }
+}
+
+/**
+ * The guest OS shuts down through the 'shutdown' and 'powerdown' events.
+ * A 'powerdown' event eventually triggers 'shutdown' as well,
+ * so only the 'shutdown' event needs to be handled.
+ *
+ * When the guest OS triggers a 'reboot' or 'reset' event, do nothing.
+*/
+static void psp_dev_shutdown_notify(Notifier *notifier, void *data)
+{
+    PSPDevState *state = container_of(notifier, PSPDevState, shutdown_notifier);
+    psp_dev_destroy(state);
+}
+
+static void psp_dev_realize(DeviceState *dev, Error **errp)
+{
+    struct psp_dev_ctrl ctrl = { 0 };
+    PSPDevState *state = PSP_DEV(dev);
+
+    state->dev_fd = qemu_open_old(PSP_DEV_PATH, O_RDWR);
+    if (state->dev_fd < 0) {
+        error_setg(errp, "fail to open %s, errno %d.", PSP_DEV_PATH, errno);
+        goto end;
+    }
+
+    ctrl.op = VPSP_OP_VID_ADD;
+    ctrl.data.vid = state->vid;
+    if (ioctl(state->dev_fd, PSP_IOC_VPSP_OPT, &ctrl) < 0) {
+        error_setg(errp, "psp_dev_realize VPSP_OP_VID_ADD vid %d, return %d", ctrl.data.vid, -errno);
+        goto end;
+    }
+
+    state->enabled = true;
+    state->shutdown_notifier.notify = psp_dev_shutdown_notify;
+    qemu_register_shutdown_notifier(&state->shutdown_notifier);
+end:
+    return;
+}
+
+static struct Property psp_dev_properties[] = {
+    DEFINE_PROP_UINT32("vid", PSPDevState, vid, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void psp_dev_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->desc = "PSP Device";
+    dc->realize = psp_dev_realize;
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    device_class_set_props(dc, psp_dev_properties);
+}
+
+static const TypeInfo psp_dev_info = {
+    .name = TYPE_PSP_DEV,
+    .parent = TYPE_DEVICE,
+    .instance_size = sizeof(PSPDevState),
+    .class_init = psp_dev_class_init,
+};
+
+static void psp_dev_register_types(void)
+{
+    type_register_static(&psp_dev_info);
+}
+
+type_init(psp_dev_register_types)
-- 
Gitee


From fc12000269b080045c1c1f20ac7184c49d9d5641 Mon Sep 17 00:00:00 2001
From: Yanjing Zhou <zhouyanjing@hygon.cn>
Date: Tue, 11 Jun 2024 14:28:12 +0800
Subject: [PATCH 389/407] target/i386: Add Hygon Dhyana-v3 CPU model

Add the following feature bits for Dhyana CPU model:
perfctr-core, clzero, xsaveerptr, aes, pclmulqdq, sha-ni

Disable xsaves feature bit for Erratum 1386

Signed-off-by: Yanjing Zhou <zhouyanjing@hygon.cn>
---
 target/i386/cpu.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 7d779c6a8ca..0a20ac38fd7 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4094,6 +4094,20 @@ static const X86CPUDefinition builtin_x86_defs[] = {
                   { /* end of list */ }
               },
             },
+            { .version = 3,
+              .props = (PropValue[]) {
+                  { "xsaves", "off" },
+                  { "perfctr-core", "on" },
+                  { "clzero", "on" },
+                  { "xsaveerptr", "on" },
+                  { "aes", "on" },
+                  { "pclmulqdq", "on" },
+                  { "sha-ni", "on" },
+                  { "model-id",
+                      "Hygon Dhyana-v3 processor" },
+                  { /* end of list */ }
+              },
+            },
             { /* end of list */ }
         }
     },
-- 
Gitee


From a1ec1c6fe146c0aa6c7758df9da969d2c9e74efa Mon Sep 17 00:00:00 2001
From: Yanjing Zhou <zhouyanjing@hygon.cn>
Date: Tue, 11 Jun 2024 14:29:34 +0800
Subject: [PATCH 390/407] target/i386: Add new Hygon 'Dharma' CPU model

Add the following feature bits compared to the Dhyana CPU model:
stibp, ibrs, umip, ssbd

Signed-off-by: Yanjing Zhou <zhouyanjing@hygon.cn>
---
 target/i386/cpu.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 99 insertions(+)
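
Reviewer note: each cache entry added below should satisfy
size = line_size * associativity * partitions * sets. The sketch below is a
standalone way to check that relation; struct cache_geom and
cache_geom_consistent() are hypothetical names for this note and only mirror
the numeric fields of the CPUCacheInfo initializers in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the cache description: numeric geometry only. */
struct cache_geom {
    uint32_t size;           /* total bytes */
    uint32_t line_size;      /* bytes per line */
    uint32_t associativity;  /* ways */
    uint32_t partitions;
    uint32_t sets;
};

/* True when the advertised size matches the product of the geometry. */
static bool cache_geom_consistent(const struct cache_geom *c)
{
    assert(c != NULL);
    /* Widen before multiplying so the product cannot wrap in 32 bits. */
    return (uint64_t)c->line_size * c->associativity *
           c->partitions * c->sets == (uint64_t)c->size;
}
```

With the values from this patch the relation holds for all four levels, for
example 64 * 8 * 1 * 64 = 32 KiB for L1 and 64 * 16 * 1 * 16384 = 16 MiB
for L3.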

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 0a20ac38fd7..12aabc24027 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1760,6 +1760,56 @@ static const CPUCaches epyc_milan_cache_info = {
     },
 };
 
+static const CPUCaches dharma_cache_info = {
+    .l1d_cache = &(CPUCacheInfo) {
+        .type = DATA_CACHE,
+        .level = 1,
+        .size = 32 * KiB,
+        .line_size = 64,
+        .associativity = 8,
+        .partitions = 1,
+        .sets = 64,
+        .lines_per_tag = 1,
+        .self_init = 1,
+        .no_invd_sharing = true,
+    },
+    .l1i_cache = &(CPUCacheInfo) {
+        .type = INSTRUCTION_CACHE,
+        .level = 1,
+        .size = 32 * KiB,
+        .line_size = 64,
+        .associativity = 8,
+        .partitions = 1,
+        .sets = 64,
+        .lines_per_tag = 1,
+        .self_init = 1,
+        .no_invd_sharing = true,
+    },
+    .l2_cache = &(CPUCacheInfo) {
+        .type = UNIFIED_CACHE,
+        .level = 2,
+        .size = 512 * KiB,
+        .line_size = 64,
+        .associativity = 8,
+        .partitions = 1,
+        .sets = 1024,
+        .lines_per_tag = 1,
+    },
+    .l3_cache = &(CPUCacheInfo) {
+        .type = UNIFIED_CACHE,
+        .level = 3,
+        .size = 16 * MiB,
+        .line_size = 64,
+        .associativity = 16,
+        .partitions = 1,
+        .sets = 16384,
+        .lines_per_tag = 1,
+        .self_init = true,
+        .inclusive = true,
+        .complex_indexing = true,
+    },
+};
+
 /* The following VMX features are not supported by KVM and are left out in the
  * CPU definitions:
  *
@@ -4228,6 +4278,55 @@ static const X86CPUDefinition builtin_x86_defs[] = {
         .model_id = "AMD EPYC-Milan Processor",
         .cache_info = &epyc_milan_cache_info,
     },
+    {
+        .name = "Dharma",
+        .level = 0xd,
+        .vendor = CPUID_VENDOR_HYGON,
+        .family = 24,
+        .model = 4,
+        .stepping = 0,
+        .features[FEAT_1_EDX] =
+            CPUID_SSE2 | CPUID_SSE | CPUID_FXSR | CPUID_MMX | CPUID_CLFLUSH |
+            CPUID_PSE36 | CPUID_PAT | CPUID_CMOV | CPUID_MCA | CPUID_PGE |
+            CPUID_MTRR | CPUID_SEP | CPUID_APIC | CPUID_CX8 | CPUID_MCE |
+            CPUID_PAE | CPUID_MSR | CPUID_TSC | CPUID_PSE | CPUID_DE |
+            CPUID_VME | CPUID_FP87,
+        .features[FEAT_1_ECX] =
+            CPUID_EXT_RDRAND | CPUID_EXT_F16C | CPUID_EXT_AVX |
+            CPUID_EXT_XSAVE | CPUID_EXT_AES | CPUID_EXT_POPCNT |
+            CPUID_EXT_MOVBE | CPUID_EXT_SSE42 | CPUID_EXT_SSE41 |
+            CPUID_EXT_CX16 | CPUID_EXT_FMA | CPUID_EXT_SSSE3 |
+            CPUID_EXT_MONITOR | CPUID_EXT_PCLMULQDQ | CPUID_EXT_SSE3,
+        .features[FEAT_8000_0001_EDX] =
+            CPUID_EXT2_LM | CPUID_EXT2_RDTSCP | CPUID_EXT2_PDPE1GB |
+            CPUID_EXT2_FFXSR | CPUID_EXT2_MMXEXT | CPUID_EXT2_NX |
+            CPUID_EXT2_SYSCALL,
+        .features[FEAT_8000_0001_ECX] =
+            CPUID_EXT3_OSVW | CPUID_EXT3_3DNOWPREFETCH |
+            CPUID_EXT3_MISALIGNSSE | CPUID_EXT3_SSE4A | CPUID_EXT3_ABM |
+            CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM |
+            CPUID_EXT3_TOPOEXT | CPUID_EXT3_PERFCORE,
+        .features[FEAT_8000_0008_EBX] =
+            CPUID_8000_0008_EBX_CLZERO | CPUID_8000_0008_EBX_XSAVEERPTR |
+            CPUID_8000_0008_EBX_IBPB | CPUID_8000_0008_EBX_IBRS |
+            CPUID_8000_0008_EBX_STIBP | CPUID_8000_0008_EBX_AMD_SSBD,
+        .features[FEAT_7_0_EBX] =
+            CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_AVX2 |
+            CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_RDSEED |
+            CPUID_7_0_EBX_ADX | CPUID_7_0_EBX_SMAP | CPUID_7_0_EBX_CLFLUSHOPT |
+            CPUID_7_0_EBX_SHA_NI,
+        .features[FEAT_7_0_ECX] = CPUID_7_0_ECX_UMIP,
+        .features[FEAT_XSAVE] =
+            CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
+            CPUID_XSAVE_XGETBV1,
+        .features[FEAT_6_EAX] =
+            CPUID_6_EAX_ARAT,
+        .features[FEAT_SVM] =
+            CPUID_SVM_NPT | CPUID_SVM_NRIPSAVE,
+        .xlevel = 0x8000001E,
+        .model_id = "Hygon Dharma Processor",
+        .cache_info = &dharma_cache_info,
+    },
 };
 
 /*
-- 
Gitee


From 04b7c258c88c8034b620d6824c836438f25c0e1d Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Mon, 27 Feb 2023 10:57:09 +0100
Subject: [PATCH 391/407] target/i386: add FSRM to TCG

commit c0728d4e3d23356691e4182eac54c67e1ca26618 upstream.

Fast short REP MOVS can be added to TCG, since a trivial translation
of string operation is a good option for short lengths.

Intel-SIG: commit c0728d4e3d23 target/i386: add FSRM to TCG.
Add SPR/GNR/SRA new ISAs backporting

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 12aabc24027..b13f57d4f4e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -661,7 +661,7 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
 #define TCG_7_0_ECX_FEATURES (CPUID_7_0_ECX_PKU | \
           /* CPUID_7_0_ECX_OSPKE is dynamic */ \
           CPUID_7_0_ECX_LA57 | CPUID_7_0_ECX_PKS)
-#define TCG_7_0_EDX_FEATURES 0
+#define TCG_7_0_EDX_FEATURES CPUID_7_0_EDX_FSRM
 #define TCG_7_1_EAX_FEATURES 0
 #define TCG_APM_FEATURES 0
 #define TCG_6_EAX_FEATURES CPUID_6_EAX_ARAT
-- 
Gitee


From 14fd15969ad8eacd03fd5113e6749fb978d3b274 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Mon, 27 Feb 2023 10:55:46 +0100
Subject: [PATCH 392/407] target/i386: add FZRM, FSRS, FSRC

commit 58794f644e43ef8e60ed05395c58099311c1fcd1 upstream.

These are three more markers for string operation optimizations.
They can all be added to TCG, whose string operations are more or
less as fast as they can be for short lengths.

Intel-SIG: commit 58794f644e43 target/i386: add FZRM, FSRS, FSRC.
Add SPR/GNR/SRA new ISAs backporting

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.c | 7 ++++---
 target/i386/cpu.h | 7 +++++++
 2 files changed, 11 insertions(+), 3 deletions(-)
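
Reviewer note: the three new flags live in CPUID.(EAX=7,ECX=1):EAX at bits
10, 11 and 12, and the feat_names slots in the diff line up with those bit
positions. The sketch below decodes a raw EAX value into the same names. The
macros copy the values added to cpu.h by this patch; describe_7_1_eax() is a
hypothetical helper written for this note, not a QEMU function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Values as added to target/i386/cpu.h by this patch. */
#define CPUID_7_1_EAX_FZRM (1U << 10)
#define CPUID_7_1_EAX_FSRS (1U << 11)
#define CPUID_7_1_EAX_FSRC (1U << 12)

/*
 * Write the space-separated names of the flags set in eax into out.
 * Returns the number of names written; stops early if out is too small.
 */
static int describe_7_1_eax(uint32_t eax, char *out, size_t cap)
{
    static const struct {
        uint32_t bit;
        const char *name;
    } flags[] = {
        { CPUID_7_1_EAX_FZRM, "fzrm" },
        { CPUID_7_1_EAX_FSRS, "fsrs" },
        { CPUID_7_1_EAX_FSRC, "fsrc" },
    };
    size_t used = 0;
    int count = 0;

    assert(out != NULL && cap > 0);
    out[0] = '\0';

    for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        int n;

        if (!(eax & flags[i].bit)) {
            continue;
        }
        n = snprintf(out + used, cap - used, "%s%s",
                     count ? " " : "", flags[i].name);
        if (n < 0 || (size_t)n >= cap - used) {
            out[used] = '\0';   /* drop the partially written name */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

On a real host the EAX value would come from executing CPUID with leaf 7 and
subleaf 1; that step is left out here so the sketch stays portable.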

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index b13f57d4f4e..22132e16c89 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -662,7 +662,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
           /* CPUID_7_0_ECX_OSPKE is dynamic */ \
           CPUID_7_0_ECX_LA57 | CPUID_7_0_ECX_PKS)
 #define TCG_7_0_EDX_FEATURES CPUID_7_0_EDX_FSRM
-#define TCG_7_1_EAX_FEATURES 0
+#define TCG_7_1_EAX_FEATURES (CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS | \
+          CPUID_7_1_EAX_FSRC)
 #define TCG_APM_FEATURES 0
 #define TCG_6_EAX_FEATURES CPUID_6_EAX_ARAT
 #define TCG_XSAVE_FEATURES (CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XGETBV1)
@@ -872,8 +873,8 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         .feat_names = {
             NULL, NULL, NULL, NULL,
             "avx-vnni", "avx512-bf16", NULL, NULL,
-            NULL, NULL, NULL, NULL,
-            NULL, NULL, NULL, NULL,
+            NULL, NULL, "fzrm", "fsrs",
+            "fsrc", NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 1415a33fbbc..98f885ace36 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -879,6 +879,13 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_1_EAX_AVX_VNNI          (1U << 4)
 /* AVX512 BFloat16 Instruction */
 #define CPUID_7_1_EAX_AVX512_BF16       (1U << 5)
+/* Fast Zero REP MOVS */
+#define CPUID_7_1_EAX_FZRM              (1U << 10)
+/* Fast Short REP STOS */
+#define CPUID_7_1_EAX_FSRS              (1U << 11)
+/* Fast Short REP CMPS/SCAS */
+#define CPUID_7_1_EAX_FSRC              (1U << 12)
+
 /* XFD Extend Feature Disabled */
 #define CPUID_D_1_EAX_XFD               (1U << 4)
 
-- 
Gitee


From 74944bb93a19009374cc9a12e3fe07d9f9685eb9 Mon Sep 17 00:00:00 2001
From: "Wang, Lei" <lei4.wang@intel.com>
Date: Thu, 11 Aug 2022 22:57:51 -0700
Subject: [PATCH 393/407] i386: Add new CPU model SapphireRapids

commit 7eb061b06e97af9a8da7f31b839d78997ae737fc upstream.

The new CPU model mostly inherits features from Icelake-Server, while
adding new features:
 - AMX (Advanced Matrix Extensions)
 - Bus Lock Debug Exception
and new instructions:
 - AVX VNNI (Vector Neural Network Instruction):
    - VPDPBUSD: Multiply and Add Unsigned and Signed Bytes
    - VPDPBUSDS: Multiply and Add Unsigned and Signed Bytes with Saturation
    - VPDPWSSD: Multiply and Add Signed Word Integers
    - VPDPWSSDS: Multiply and Add Signed Word Integers with Saturation
 - FP16: Replicates existing AVX512 computational SP (FP32) instructions
   using FP16 instead of FP32 for ~2X performance gain
 - SERIALIZE: Provide software with a simple way to force the processor to
   complete all modifications, faster, allowed in all privilege levels and
   not causing an unconditional VM exit
 - TSX Suspend Load Address Tracking: Allows programmers to choose which
   memory accesses do not need to be tracked in the TSX read set
 - AVX512_BF16: Vector Neural Network Instructions supporting BFLOAT16
   inputs and conversion instructions from IEEE single precision
 - fast zero-length MOVSB (KVM doesn't support yet)
 - fast short STOSB (KVM doesn't support yet)
 - fast short CMPSB, SCASB (KVM doesn't support yet)

Features that may be added in future versions:
 - CET (virtualization support hasn't been merged)

Intel-SIG: commit 7eb061b06e97 i386: Add new CPU model SapphireRapids.
Add SPR/GNR/SRA new ISAs backporting

Signed-off-by: Wang, Lei <lei4.wang@intel.com>
Reviewed-by: Robert Hoo <robert.hu@linux.intel.com>
Message-Id: <20220812055751.14553-1-lei4.wang@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.c | 133 +++++++++++++++++++++++++++++++++++++++++++++-
 target/i386/cpu.h |   4 ++
 2 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 22132e16c89..86697039c9a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3596,6 +3596,135 @@ static const X86CPUDefinition builtin_x86_defs[] = {
             { /* end of list */ }
         }
     },
+    {
+        .name = "SapphireRapids",
+        .level = 0x20,
+        .vendor = CPUID_VENDOR_INTEL,
+        .family = 6,
+        .model = 143,
+        .stepping = 4,
+        /*
+         * please keep the ascending order so that we can have a clear view of
+         * bit position of each feature.
+         */
+        .features[FEAT_1_EDX] =
+            CPUID_FP87 | CPUID_VME | CPUID_DE | CPUID_PSE | CPUID_TSC |
+            CPUID_MSR | CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC |
+            CPUID_SEP | CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV |
+            CPUID_PAT | CPUID_PSE36 | CPUID_CLFLUSH | CPUID_MMX | CPUID_FXSR |
+            CPUID_SSE | CPUID_SSE2,
+        .features[FEAT_1_ECX] =
+            CPUID_EXT_SSE3 | CPUID_EXT_PCLMULQDQ | CPUID_EXT_SSSE3 |
+            CPUID_EXT_FMA | CPUID_EXT_CX16 | CPUID_EXT_PCID | CPUID_EXT_SSE41 |
+            CPUID_EXT_SSE42 | CPUID_EXT_X2APIC | CPUID_EXT_MOVBE |
+            CPUID_EXT_POPCNT | CPUID_EXT_TSC_DEADLINE_TIMER | CPUID_EXT_AES |
+            CPUID_EXT_XSAVE | CPUID_EXT_AVX | CPUID_EXT_F16C | CPUID_EXT_RDRAND,
+        .features[FEAT_8000_0001_EDX] =
+            CPUID_EXT2_SYSCALL | CPUID_EXT2_NX | CPUID_EXT2_PDPE1GB |
+            CPUID_EXT2_RDTSCP | CPUID_EXT2_LM,
+        .features[FEAT_8000_0001_ECX] =
+            CPUID_EXT3_LAHF_LM | CPUID_EXT3_ABM | CPUID_EXT3_3DNOWPREFETCH,
+        .features[FEAT_8000_0008_EBX] =
+            CPUID_8000_0008_EBX_WBNOINVD,
+        .features[FEAT_7_0_EBX] =
+            CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_HLE |
+            CPUID_7_0_EBX_AVX2 | CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 |
+            CPUID_7_0_EBX_ERMS | CPUID_7_0_EBX_INVPCID | CPUID_7_0_EBX_RTM |
+            CPUID_7_0_EBX_AVX512F | CPUID_7_0_EBX_AVX512DQ |
+            CPUID_7_0_EBX_RDSEED | CPUID_7_0_EBX_ADX | CPUID_7_0_EBX_SMAP |
+            CPUID_7_0_EBX_AVX512IFMA | CPUID_7_0_EBX_CLFLUSHOPT |
+            CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_AVX512CD | CPUID_7_0_EBX_SHA_NI |
+            CPUID_7_0_EBX_AVX512BW | CPUID_7_0_EBX_AVX512VL,
+        .features[FEAT_7_0_ECX] =
+            CPUID_7_0_ECX_AVX512_VBMI | CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_PKU |
+            CPUID_7_0_ECX_AVX512_VBMI2 | CPUID_7_0_ECX_GFNI |
+            CPUID_7_0_ECX_VAES | CPUID_7_0_ECX_VPCLMULQDQ |
+            CPUID_7_0_ECX_AVX512VNNI | CPUID_7_0_ECX_AVX512BITALG |
+            CPUID_7_0_ECX_AVX512_VPOPCNTDQ | CPUID_7_0_ECX_LA57 |
+            CPUID_7_0_ECX_RDPID | CPUID_7_0_ECX_BUS_LOCK_DETECT,
+        .features[FEAT_7_0_EDX] =
+            CPUID_7_0_EDX_FSRM | CPUID_7_0_EDX_SERIALIZE |
+            CPUID_7_0_EDX_TSX_LDTRK | CPUID_7_0_EDX_AMX_BF16 |
+            CPUID_7_0_EDX_AVX512_FP16 | CPUID_7_0_EDX_AMX_TILE |
+            CPUID_7_0_EDX_AMX_INT8 | CPUID_7_0_EDX_SPEC_CTRL |
+            CPUID_7_0_EDX_ARCH_CAPABILITIES | CPUID_7_0_EDX_SPEC_CTRL_SSBD,
+        .features[FEAT_ARCH_CAPABILITIES] =
+            MSR_ARCH_CAP_RDCL_NO | MSR_ARCH_CAP_IBRS_ALL |
+            MSR_ARCH_CAP_SKIP_L1DFL_VMENTRY | MSR_ARCH_CAP_MDS_NO |
+            MSR_ARCH_CAP_PSCHANGE_MC_NO | MSR_ARCH_CAP_TAA_NO,
+        .features[FEAT_XSAVE] =
+            CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
+            CPUID_XSAVE_XGETBV1 | CPUID_XSAVE_XSAVES | CPUID_D_1_EAX_XFD,
+        .features[FEAT_6_EAX] =
+            CPUID_6_EAX_ARAT,
+        .features[FEAT_7_1_EAX] =
+            CPUID_7_1_EAX_AVX_VNNI | CPUID_7_1_EAX_AVX512_BF16 |
+            CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS | CPUID_7_1_EAX_FSRC,
+        .features[FEAT_VMX_BASIC] =
+            MSR_VMX_BASIC_INS_OUTS | MSR_VMX_BASIC_TRUE_CTLS,
+        .features[FEAT_VMX_ENTRY_CTLS] =
+            VMX_VM_ENTRY_LOAD_DEBUG_CONTROLS | VMX_VM_ENTRY_IA32E_MODE |
+            VMX_VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
+            VMX_VM_ENTRY_LOAD_IA32_PAT | VMX_VM_ENTRY_LOAD_IA32_EFER,
+        .features[FEAT_VMX_EPT_VPID_CAPS] =
+            MSR_VMX_EPT_EXECONLY |
+            MSR_VMX_EPT_PAGE_WALK_LENGTH_4 | MSR_VMX_EPT_PAGE_WALK_LENGTH_5 |
+            MSR_VMX_EPT_WB | MSR_VMX_EPT_2MB | MSR_VMX_EPT_1GB |
+            MSR_VMX_EPT_INVEPT | MSR_VMX_EPT_AD_BITS |
+            MSR_VMX_EPT_INVEPT_SINGLE_CONTEXT | MSR_VMX_EPT_INVEPT_ALL_CONTEXT |
+            MSR_VMX_EPT_INVVPID | MSR_VMX_EPT_INVVPID_SINGLE_ADDR |
+            MSR_VMX_EPT_INVVPID_SINGLE_CONTEXT |
+            MSR_VMX_EPT_INVVPID_ALL_CONTEXT |
+            MSR_VMX_EPT_INVVPID_SINGLE_CONTEXT_NOGLOBALS,
+        .features[FEAT_VMX_EXIT_CTLS] =
+            VMX_VM_EXIT_SAVE_DEBUG_CONTROLS |
+            VMX_VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
+            VMX_VM_EXIT_ACK_INTR_ON_EXIT | VMX_VM_EXIT_SAVE_IA32_PAT |
+            VMX_VM_EXIT_LOAD_IA32_PAT | VMX_VM_EXIT_SAVE_IA32_EFER |
+            VMX_VM_EXIT_LOAD_IA32_EFER | VMX_VM_EXIT_SAVE_VMX_PREEMPTION_TIMER,
+        .features[FEAT_VMX_MISC] =
+            MSR_VMX_MISC_STORE_LMA | MSR_VMX_MISC_ACTIVITY_HLT |
+            MSR_VMX_MISC_VMWRITE_VMEXIT,
+        .features[FEAT_VMX_PINBASED_CTLS] =
+            VMX_PIN_BASED_EXT_INTR_MASK | VMX_PIN_BASED_NMI_EXITING |
+            VMX_PIN_BASED_VIRTUAL_NMIS | VMX_PIN_BASED_VMX_PREEMPTION_TIMER |
+            VMX_PIN_BASED_POSTED_INTR,
+        .features[FEAT_VMX_PROCBASED_CTLS] =
+            VMX_CPU_BASED_VIRTUAL_INTR_PENDING |
+            VMX_CPU_BASED_USE_TSC_OFFSETING | VMX_CPU_BASED_HLT_EXITING |
+            VMX_CPU_BASED_INVLPG_EXITING | VMX_CPU_BASED_MWAIT_EXITING |
+            VMX_CPU_BASED_RDPMC_EXITING | VMX_CPU_BASED_RDTSC_EXITING |
+            VMX_CPU_BASED_CR3_LOAD_EXITING | VMX_CPU_BASED_CR3_STORE_EXITING |
+            VMX_CPU_BASED_CR8_LOAD_EXITING | VMX_CPU_BASED_CR8_STORE_EXITING |
+            VMX_CPU_BASED_TPR_SHADOW | VMX_CPU_BASED_VIRTUAL_NMI_PENDING |
+            VMX_CPU_BASED_MOV_DR_EXITING | VMX_CPU_BASED_UNCOND_IO_EXITING |
+            VMX_CPU_BASED_USE_IO_BITMAPS | VMX_CPU_BASED_MONITOR_TRAP_FLAG |
+            VMX_CPU_BASED_USE_MSR_BITMAPS | VMX_CPU_BASED_MONITOR_EXITING |
+            VMX_CPU_BASED_PAUSE_EXITING |
+            VMX_CPU_BASED_ACTIVATE_SECONDARY_CONTROLS,
+        .features[FEAT_VMX_SECONDARY_CTLS] =
+            VMX_SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
+            VMX_SECONDARY_EXEC_ENABLE_EPT | VMX_SECONDARY_EXEC_DESC |
+            VMX_SECONDARY_EXEC_RDTSCP |
+            VMX_SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
+            VMX_SECONDARY_EXEC_ENABLE_VPID | VMX_SECONDARY_EXEC_WBINVD_EXITING |
+            VMX_SECONDARY_EXEC_UNRESTRICTED_GUEST |
+            VMX_SECONDARY_EXEC_APIC_REGISTER_VIRT |
+            VMX_SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
+            VMX_SECONDARY_EXEC_RDRAND_EXITING |
+            VMX_SECONDARY_EXEC_ENABLE_INVPCID |
+            VMX_SECONDARY_EXEC_ENABLE_VMFUNC | VMX_SECONDARY_EXEC_SHADOW_VMCS |
+            VMX_SECONDARY_EXEC_RDSEED_EXITING | VMX_SECONDARY_EXEC_ENABLE_PML |
+            VMX_SECONDARY_EXEC_XSAVES,
+        .features[FEAT_VMX_VMFUNC] =
+            MSR_VMX_VMFUNC_EPT_SWITCHING,
+        .xlevel = 0x80000008,
+        .model_id = "Intel Xeon Processor (SapphireRapids)",
+        .versions = (X86CPUVersionDefinition[]) {
+            { .version = 1 },
+            { /* end of list */ },
+        },
+    },
     {
         .name = "Denverton",
         .level = 21,
@@ -5729,7 +5858,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         break;
     }
     case 0x1D: {
-        /* AMX TILE */
+        /* AMX TILE, for now hardcoded for Sapphire Rapids*/
         *eax = 0;
         *ebx = 0;
         *ecx = 0;
@@ -5750,7 +5879,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         break;
     }
     case 0x1E: {
-        /* AMX TMUL */
+        /* AMX TMUL, for now hardcoded for Sapphire Rapids */
         *eax = 0;
         *ebx = 0;
         *ecx = 0;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 98f885ace36..d156ccd17ea 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -858,10 +858,14 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_0_EDX_SERIALIZE         (1U << 14)
 /* TSX Suspend Load Address Tracking instruction */
 #define CPUID_7_0_EDX_TSX_LDTRK         (1U << 16)
+/* AMX_BF16 instruction */
+#define CPUID_7_0_EDX_AMX_BF16          (1U << 22)
 /* AVX512_FP16 instruction */
 #define CPUID_7_0_EDX_AVX512_FP16       (1U << 23)
 /* AMX tile (two-dimensional register) */
 #define CPUID_7_0_EDX_AMX_TILE          (1U << 24)
+/* AMX_INT8 instruction */
+#define CPUID_7_0_EDX_AMX_INT8          (1U << 25)
 /* Speculation Control */
 #define CPUID_7_0_EDX_SPEC_CTRL         (1U << 26)
 /* Single Thread Indirect Branch Predictors */
-- 
Gitee


From b2a50fe4aa32ea55683244b1d7048f6a2f05187b Mon Sep 17 00:00:00 2001
From: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Date: Fri, 3 Mar 2023 14:59:08 +0800
Subject: [PATCH 394/407] target/i386: Add support for CMPCCXADD in CPUID
 enumeration

commit a9ce107fd0f2017af84255a9cf6542fa3eb3e214 upstream.

CMPccXADD is a new set of instructions in the latest Intel platform,
Sierra Forest. This new instruction set includes a semaphore operation
that can compare and add the operands if the condition is met, which can
improve database performance.

The bit definition:
CPUID.(EAX=7,ECX=1):EAX[bit 7]

Add CPUID definition for CMPCCXADD.
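
For illustration only (a standalone sketch, not QEMU code): the bit
position above is also the slot index in the 32-entry name table for
CPUID.(EAX=7,ECX=1):EAX, which is why "cmpccxadd" lands in slot 7. The
table copy and the helper feat_name_for_bit() below are hypothetical
names introduced for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the bit definition above: CPUID.(EAX=7,ECX=1):EAX[bit 7]. */
#define CPUID_7_1_EAX_CMPCCXADD (1U << 7)

/* Hypothetical local copy of the name table; slot N names bit N. */
static const char *const feat_names_7_1_eax[32] = {
    [4] = "avx-vnni", [5] = "avx512-bf16", [7] = "cmpccxadd",
    [10] = "fzrm", [11] = "fsrs", [12] = "fsrc",
};

/* Return the name of a single-bit mask, or NULL if unnamed or invalid. */
static const char *feat_name_for_bit(uint32_t mask)
{
    unsigned bit = 0;

    if (mask == 0 || (mask & (mask - 1)) != 0) {
        return NULL; /* not exactly one bit set */
    }
    while (!(mask & 1U)) {
        mask >>= 1;
        bit++;
    }
    return feat_names_7_1_eax[bit];
}
```

Keeping the table indexed by bit position means a new feature only needs
one string in the right slot plus one mask definition.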

Intel-SIG: commit a9ce107fd0f2 target/i386: Add support for CMPCCXADD in CPUID enumeration.
Add SPR/GNR/SRA new ISAs backporting

Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20230303065913.1246327-2-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.c | 2 +-
 target/i386/cpu.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 86697039c9a..989d3213cb0 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -872,7 +872,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         .type = CPUID_FEATURE_WORD,
         .feat_names = {
             NULL, NULL, NULL, NULL,
-            "avx-vnni", "avx512-bf16", NULL, NULL,
+            "avx-vnni", "avx512-bf16", NULL, "cmpccxadd",
             NULL, NULL, "fzrm", "fsrs",
             "fsrc", NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d156ccd17ea..f8f5c1d8571 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -883,6 +883,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_1_EAX_AVX_VNNI          (1U << 4)
 /* AVX512 BFloat16 Instruction */
 #define CPUID_7_1_EAX_AVX512_BF16       (1U << 5)
+/* CMPCCXADD Instructions */
+#define CPUID_7_1_EAX_CMPCCXADD         (1U << 7)
 /* Fast Zero REP MOVS */
 #define CPUID_7_1_EAX_FZRM              (1U << 10)
 /* Fast Short REP STOS */
-- 
Gitee


From 40179d58d2ddcb2a840eac25126117567b9c02cc Mon Sep 17 00:00:00 2001
From: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Date: Fri, 3 Mar 2023 14:59:09 +0800
Subject: [PATCH 395/407] target/i386: Add support for AMX-FP16 in CPUID
 enumeration

commit 99ed8445ea27742a4df40f51a3a5fbd6f8e76fa5 upstream.

The latest Intel platform, Granite Rapids, introduces a new instruction,
AMX-FP16, which computes dot products of two FP16 tiles and accumulates
the results into a packed single-precision tile. AMX-FP16 adds FP16
capability and allows an FP16 GPU-trained model to run faster without
loss of accuracy or added software overhead.

The bit definition:
CPUID.(EAX=7,ECX=1):EAX[bit 21]

Add CPUID definition for AMX-FP16.

Intel-SIG: commit 99ed8445ea27 target/i386: Add support for AMX-FP16 in CPUID enumeration.
Add SPR/GNR/SRA new ISAs backporting

Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20230303065913.1246327-3-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.c | 2 +-
 target/i386/cpu.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 989d3213cb0..8420624950e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -876,7 +876,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
             NULL, NULL, "fzrm", "fsrs",
             "fsrc", NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
-            NULL, NULL, NULL, NULL,
+            NULL, "amx-fp16", NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
         },
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f8f5c1d8571..62f61cefb37 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -891,6 +891,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_1_EAX_FSRS              (1U << 11)
 /* Fast Short REP CMPS/SCAS */
 #define CPUID_7_1_EAX_FSRC              (1U << 12)
+/* Support Tile Computational Operations on FP16 Numbers */
+#define CPUID_7_1_EAX_AMX_FP16          (1U << 21)
 
 /* XFD Extend Feature Disabled */
 #define CPUID_D_1_EAX_XFD               (1U << 4)
-- 
Gitee


From efa846baa2087c154d37e8ec50cdb6ca2d3e9eb8 Mon Sep 17 00:00:00 2001
From: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Date: Fri, 3 Mar 2023 14:59:10 +0800
Subject: [PATCH 396/407] target/i386: Add support for AVX-IFMA in CPUID
 enumeration

commit a957a88416ecbec51e147cba9fe89b93f6646b3b upstream.

AVX-IFMA is a new instruction in the latest Intel platform, Sierra
Forest. This instruction performs packed multiplication of unsigned
52-bit integers and adds the low/high 52-bit products to qword
accumulators.

The bit definition:
CPUID.(EAX=7,ECX=1):EAX[bit 23]

Add CPUID definition for AVX-IFMA.

Intel-SIG: commit a957a88416ec target/i386: Add support for AVX-IFMA in CPUID enumeration.
Add SPR/GNR/SRA new ISAs backporting

Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20230303065913.1246327-4-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.c | 2 +-
 target/i386/cpu.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 8420624950e..af5d6841c7b 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -876,7 +876,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
             NULL, NULL, "fzrm", "fsrs",
             "fsrc", NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
-            NULL, "amx-fp16", NULL, NULL,
+            NULL, "amx-fp16", NULL, "avx-ifma",
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
         },
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 62f61cefb37..4f5e1f35d4f 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -893,6 +893,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_1_EAX_FSRC              (1U << 12)
 /* Support Tile Computational Operations on FP16 Numbers */
 #define CPUID_7_1_EAX_AMX_FP16          (1U << 21)
+/* Support for VPMADD52[H,L]UQ */
+#define CPUID_7_1_EAX_AVX_IFMA          (1U << 23)
 
 /* XFD Extend Feature Disabled */
 #define CPUID_D_1_EAX_XFD               (1U << 4)
-- 
Gitee


From b3767c0b00b85164d3564af7aa7097be0ef3e5b0 Mon Sep 17 00:00:00 2001
From: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Date: Fri, 3 Mar 2023 14:59:11 +0800
Subject: [PATCH 397/407] target/i386: Add support for AVX-VNNI-INT8 in CPUID
 enumeration

commit eaaa197d5b112ea2758b54df58881a2626de3af5 upstream.

AVX-VNNI-INT8 is a new set of instructions in the latest Intel platform,
Sierra Forest, which aims to give the platform superior AI capabilities.
These instructions multiply the individual bytes of two signed or
unsigned source operands, then add and accumulate the results into the
destination dword element size operand.

The bit definition:
CPUID.(EAX=7,ECX=1):EDX[bit 4]

AVX-VNNI-INT8 is enumerated in a register (EDX) that has no feature word
yet. Add a CPUID feature word FEAT_7_1_EDX for it.

Add CPUID definition for AVX-VNNI-INT8.

Intel-SIG: commit eaaa197d5b11 target/i386: Add support for AVX-VNNI-INT8 in CPUID enumeration.
Add SPR/GNR/SRA new ISAs backporting

Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20230303065913.1246327-5-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.c | 22 +++++++++++++++++++++-
 target/i386/cpu.h |  4 ++++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index af5d6841c7b..ce74d7ce4a2 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -664,6 +664,7 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
 #define TCG_7_0_EDX_FEATURES CPUID_7_0_EDX_FSRM
 #define TCG_7_1_EAX_FEATURES (CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS | \
           CPUID_7_1_EAX_FSRC)
+#define TCG_7_1_EDX_FEATURES 0
 #define TCG_APM_FEATURES 0
 #define TCG_6_EAX_FEATURES CPUID_6_EAX_ARAT
 #define TCG_XSAVE_FEATURES (CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XGETBV1)
@@ -887,6 +888,25 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         },
         .tcg_features = TCG_7_1_EAX_FEATURES,
     },
+    [FEAT_7_1_EDX] = {
+        .type = CPUID_FEATURE_WORD,
+        .feat_names = {
+            NULL, NULL, NULL, NULL,
+            "avx-vnni-int8", NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+        },
+        .cpuid = {
+            .eax = 7,
+            .needs_ecx = true, .ecx = 1,
+            .reg = R_EDX,
+        },
+        .tcg_features = TCG_7_1_EDX_FEATURES,
+    },
     [FEAT_8000_0007_EDX] = {
         .type = CPUID_FEATURE_WORD,
         .feat_names = {
@@ -5638,9 +5658,9 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             }
         } else if (count == 1) {
             *eax = env->features[FEAT_7_1_EAX];
+            *edx = env->features[FEAT_7_1_EDX];
             *ebx = 0;
             *ecx = 0;
-            *edx = 0;
         } else {
             *eax = 0;
             *ebx = 0;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 4f5e1f35d4f..79e456d47c3 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -602,6 +602,7 @@ typedef enum FeatureWord {
     FEAT_SGX_12_0_EAX,  /* CPUID[EAX=0x12,ECX=0].EAX (SGX) */
     FEAT_SGX_12_0_EBX,  /* CPUID[EAX=0x12,ECX=0].EBX (SGX MISCSELECT[31:0]) */
     FEAT_SGX_12_1_EAX,  /* CPUID[EAX=0x12,ECX=1].EAX (SGX ATTRIBUTES[31:0]) */
+    FEAT_7_1_EDX,       /* CPUID[EAX=7,ECX=1].EDX */
     FEATURE_WORDS,
 } FeatureWord;
 
@@ -896,6 +897,9 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 /* Support for VPMADD52[H,L]UQ */
 #define CPUID_7_1_EAX_AVX_IFMA          (1U << 23)
 
+/* Support for VPDPB[SU,UU,SS]D[,S] */
+#define CPUID_7_1_EDX_AVX_VNNI_INT8     (1U << 4)
+
 /* XFD Extend Feature Disabled */
 #define CPUID_D_1_EAX_XFD               (1U << 4)
 
-- 
Gitee


From fef212ecf240b128ea27b3c076205abac91cee82 Mon Sep 17 00:00:00 2001
From: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Date: Fri, 3 Mar 2023 14:59:12 +0800
Subject: [PATCH 398/407] target/i386: Add support for AVX-NE-CONVERT in CPUID
 enumeration

commit ecd2e6ca037d7bf3673c5478590d686d5cd6135a upstream.

AVX-NE-CONVERT is a new set of instructions which can convert
low-precision floating point formats such as BF16/FP16 to the
higher-precision FP32, as well as convert FP32 elements to BF16. These
instructions give the platform improved AI capabilities and better
compatibility.

The bit definition:
CPUID.(EAX=7,ECX=1):EDX[bit 5]

Add CPUID definition for AVX-NE-CONVERT.

Intel-SIG: commit ecd2e6ca037d target/i386: Add support for AVX-NE-CONVERT in CPUID enumeration.
Add SPR/GNR/SRA new ISAs backporting

Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20230303065913.1246327-6-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.c | 2 +-
 target/i386/cpu.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index ce74d7ce4a2..7ab531cb288 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -892,7 +892,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         .type = CPUID_FEATURE_WORD,
         .feat_names = {
             NULL, NULL, NULL, NULL,
-            "avx-vnni-int8", NULL, NULL, NULL,
+            "avx-vnni-int8", "avx-ne-convert", NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 79e456d47c3..4757479acdc 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -899,6 +899,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 
 /* Support for VPDPB[SU,UU,SS]D[,S] */
 #define CPUID_7_1_EDX_AVX_VNNI_INT8     (1U << 4)
+/* AVX NE CONVERT Instructions */
+#define CPUID_7_1_EDX_AVX_NE_CONVERT    (1U << 5)
 
 /* XFD Extend Feature Disabled */
 #define CPUID_D_1_EAX_XFD               (1U << 4)
-- 
Gitee


From 3c27c45aa5594ca9a811f44ccc3f34e81ddb9277 Mon Sep 17 00:00:00 2001
From: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Date: Fri, 3 Mar 2023 14:59:13 +0800
Subject: [PATCH 399/407] target/i386: Add support for PREFETCHIT0/1 in CPUID
 enumeration

commit d1a1111514333e46a98b136235f71eef90d610fa upstream.

Latest Intel platform Granite Rapids has introduced a new instruction -
PREFETCHIT0/1, which moves code to memory (cache) closer to the
processor depending on specific hints.

The bit definition:
CPUID.(EAX=7,ECX=1):EDX[bit 14]

Add CPUID definition for PREFETCHIT0/1.

Intel-SIG: commit d1a111151433 target/i386: Add support for PREFETCHIT0/1 in CPUID enumeration.
Add SPR/GNR/SRA new ISAs backporting

Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20230303065913.1246327-7-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.c | 2 +-
 target/i386/cpu.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 7ab531cb288..abd15ad5ded 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -894,7 +894,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
             NULL, NULL, NULL, NULL,
             "avx-vnni-int8", "avx-ne-convert", NULL, NULL,
             NULL, NULL, NULL, NULL,
-            NULL, NULL, NULL, NULL,
+            NULL, NULL, "prefetchiti", NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 4757479acdc..435f458cf33 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -901,6 +901,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_1_EDX_AVX_VNNI_INT8     (1U << 4)
 /* AVX NE CONVERT Instructions */
 #define CPUID_7_1_EDX_AVX_NE_CONVERT    (1U << 5)
+/* PREFETCHIT0/1 Instructions */
+#define CPUID_7_1_EDX_PREFETCHITI       (1U << 14)
 
 /* XFD Extend Feature Disabled */
 #define CPUID_D_1_EAX_XFD               (1U << 4)
-- 
Gitee


From 50bf8a166502db43fe291ccf4a881b33754c5d0e Mon Sep 17 00:00:00 2001
From: Tao Su <tao1.su@linux.intel.com>
Date: Thu, 6 Jul 2023 13:49:44 +0800
Subject: [PATCH 400/407] target/i386: Adjust feature level according to
 FEAT_7_1_EDX

commit 8731336e90dea3dd04948127e775c9f087f97a4c upstream.

If FEAT_7_1_EAX is 0 and FEAT_7_1_EDX is non-zero, as is the case
with a Granite Rapids host and
'-cpu host,-avx-vnni,-avx512-bf16,-fzrm,-fsrs,-fsrc,-amx-fp16', the guest
can't get the CPUID_7_1 leaf even though CPUID_7_1_EDX has a non-zero
value.

Update cpuid_level_func7 according to CPUID_7_1_EDX, otherwise the
guest may report the wrong maximum number of sub-leaves in leaf 07H.
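
A simplified model of the rule, for illustration only (this is not the
QEMU implementation; struct leaf7_words and leaf7_max_subleaf() are
hypothetical names, and sub-leaves above 1 are ignored here):

```c
#include <stdint.h>

/* Hypothetical, simplified view of the leaf-7 sub-leaf 1 feature words. */
struct leaf7_words {
    uint32_t f7_1_eax; /* CPUID.(EAX=7,ECX=1):EAX */
    uint32_t f7_1_edx; /* CPUID.(EAX=7,ECX=1):EDX */
};

/* Highest leaf-7 sub-leaf that has to be visible to the guest. */
static uint32_t leaf7_max_subleaf(const struct leaf7_words *w)
{
    /*
     * Sub-leaf 1 must be reported when either register carries a
     * feature bit. Looking at EAX alone would return 0 for a guest
     * whose only sub-leaf 1 features live in EDX.
     */
    if (w->f7_1_eax != 0 || w->f7_1_edx != 0) {
        return 1;
    }
    return 0;
}
```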

Fixes: eaaa197d5b11 ("target/i386: Add support for AVX-VNNI-INT8 in CPUID enumeration")
Intel-SIG: commit 8731336e90de target/i386: Adjust feature level according to FEAT_7_1_EDX.
Add SPR/GNR/SRA new ISAs backporting

Cc: qemu-stable@nongnu.org
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20230706054949.66556-2-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index abd15ad5ded..ea10fce1014 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6482,6 +6482,7 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
         x86_cpu_adjust_feat_level(cpu, FEAT_6_EAX);
         x86_cpu_adjust_feat_level(cpu, FEAT_7_0_ECX);
         x86_cpu_adjust_feat_level(cpu, FEAT_7_1_EAX);
+        x86_cpu_adjust_feat_level(cpu, FEAT_7_1_EDX);
         x86_cpu_adjust_feat_level(cpu, FEAT_8000_0001_EDX);
         x86_cpu_adjust_feat_level(cpu, FEAT_8000_0001_ECX);
         x86_cpu_adjust_feat_level(cpu, FEAT_8000_0007_EDX);
-- 
Gitee


From 5b3ffb6f490e174a8db396c23d8fb8ab02b85069 Mon Sep 17 00:00:00 2001
From: Tao Su <tao1.su@linux.intel.com>
Date: Thu, 6 Jul 2023 13:49:47 +0800
Subject: [PATCH 401/407] target/i386: Add new bit definitions of
 MSR_IA32_ARCH_CAPABILITIES

commit 6c43ec3b206956a8a3008accafe9eb2dfd885190 upstream.

Currently, bits 13, 14, 15 and 24 of MSR_IA32_ARCH_CAPABILITIES are
disclosed for fixing security issues, so add those bit definitions.
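
As a sketch of how the four bits combine (standalone example; the macro
names match the definitions added by this patch, while
new_arch_cap_bits() and has_arch_cap() are hypothetical helpers used
only here):

```c
#include <stdint.h>

/* Bit positions as listed above. */
#define MSR_ARCH_CAP_SBDR_SSDP_NO (1U << 13)
#define MSR_ARCH_CAP_FBSDP_NO     (1U << 14)
#define MSR_ARCH_CAP_PSDP_NO      (1U << 15)
#define MSR_ARCH_CAP_PBRSB_NO     (1U << 24)

/* Mask covering the four newly defined bits. */
static uint64_t new_arch_cap_bits(void)
{
    return (uint64_t)MSR_ARCH_CAP_SBDR_SSDP_NO | MSR_ARCH_CAP_FBSDP_NO |
           MSR_ARCH_CAP_PSDP_NO | MSR_ARCH_CAP_PBRSB_NO;
}

/* Non-zero when every bit of 'bits' is set in the MSR value. */
static int has_arch_cap(uint64_t msr_value, uint64_t bits)
{
    return (msr_value & bits) == bits;
}
```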

Intel-SIG: commit 6c43ec3b2069 target/i386: Add new bit definitions of MSR_IA32_ARCH_CAPABILITIES.
Add SPR/GNR/SRA new ISAs backporting

Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-ID: <20230706054949.66556-5-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 435f458cf33..c65133ab6b6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -977,7 +977,11 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define MSR_ARCH_CAP_PSCHANGE_MC_NO     (1U << 6)
 #define MSR_ARCH_CAP_TSX_CTRL_MSR       (1U << 7)
 #define MSR_ARCH_CAP_TAA_NO             (1U << 8)
+#define MSR_ARCH_CAP_SBDR_SSDP_NO       (1U << 13)
+#define MSR_ARCH_CAP_FBSDP_NO           (1U << 14)
+#define MSR_ARCH_CAP_PSDP_NO            (1U << 15)
 #define MSR_ARCH_CAP_FB_CLEAR           (1U << 17)
+#define MSR_ARCH_CAP_PBRSB_NO           (1U << 24)
 
 #define MSR_CORE_CAP_SPLIT_LOCK_DETECT  (1U << 5)
 
-- 
Gitee


From a836e9bcb29c6f524737a503cbc2205447b8ac2a Mon Sep 17 00:00:00 2001
From: Tao Su <tao1.su@linux.intel.com>
Date: Thu, 6 Jul 2023 13:49:45 +0800
Subject: [PATCH 402/407] target/i386: Add support for MCDT_NO in CPUID
 enumeration

commit 9dd8b71091f47bac395f543779269c14d8d93c60 upstream.

CPUID.(EAX=7,ECX=2):EDX[bit 5] enumerates MCDT_NO. Processors that
enumerate this bit as 1 do not exhibit MXCSR Configuration Dependent
Timing (MCDT) behavior and do not need to be mitigated to avoid
data-dependent behavior for certain instructions.

Since MCDT_NO is in a new sub-leaf, add a new CPUID feature word
FEAT_7_2_EDX. Also update cpuid_level_func7 by FEAT_7_2_EDX.
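
A minimal model of the resulting sub-leaf dispatch, for illustration
only (struct cpuid_regs, struct leaf7_features and leaf7_subleaf() are
hypothetical names; sub-leaf 0 is left out, so it reads as zero here):

```c
#include <stdint.h>

struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical container for the feature words of sub-leaves 1 and 2. */
struct leaf7_features {
    uint32_t f7_1_eax; /* CPUID.(EAX=7,ECX=1):EAX */
    uint32_t f7_1_edx; /* CPUID.(EAX=7,ECX=1):EDX */
    uint32_t f7_2_edx; /* CPUID.(EAX=7,ECX=2):EDX */
};

/* Register values for CPUID leaf 7, sub-leaf 'count' (1 and 2 only). */
static struct cpuid_regs leaf7_subleaf(const struct leaf7_features *f,
                                       uint32_t count)
{
    struct cpuid_regs r = { 0, 0, 0, 0 };

    if (count == 1) {
        r.eax = f->f7_1_eax;
        r.edx = f->f7_1_edx;
    } else if (count == 2) {
        r.edx = f->f7_2_edx; /* MCDT_NO is bit 5 of this word */
    }
    /* Any other sub-leaf in this model reads as all zeroes. */
    return r;
}
```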

Intel-SIG: commit 9dd8b71091f4 target/i386: Add support for MCDT_NO in CPUID enumeration.
Add SPR/GNR/SRA new ISAs backporting

Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20230706054949.66556-3-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.c | 26 ++++++++++++++++++++++++++
 target/i386/cpu.h |  4 ++++
 2 files changed, 30 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index ea10fce1014..271f5c0d578 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -665,6 +665,7 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
 #define TCG_7_1_EAX_FEATURES (CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS | \
           CPUID_7_1_EAX_FSRC)
 #define TCG_7_1_EDX_FEATURES 0
+#define TCG_7_2_EDX_FEATURES 0
 #define TCG_APM_FEATURES 0
 #define TCG_6_EAX_FEATURES CPUID_6_EAX_ARAT
 #define TCG_XSAVE_FEATURES (CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XGETBV1)
@@ -907,6 +908,25 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         },
         .tcg_features = TCG_7_1_EDX_FEATURES,
     },
+    [FEAT_7_2_EDX] = {
+        .type = CPUID_FEATURE_WORD,
+        .feat_names = {
+            NULL, NULL, NULL, NULL,
+            NULL, "mcdt-no", NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+        },
+        .cpuid = {
+            .eax = 7,
+            .needs_ecx = true, .ecx = 2,
+            .reg = R_EDX,
+        },
+        .tcg_features = TCG_7_2_EDX_FEATURES,
+    },
     [FEAT_8000_0007_EDX] = {
         .type = CPUID_FEATURE_WORD,
         .feat_names = {
@@ -5661,6 +5681,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             *edx = env->features[FEAT_7_1_EDX];
             *ebx = 0;
             *ecx = 0;
+        } else if (count == 2) {
+            *edx = env->features[FEAT_7_2_EDX];
+            *eax = 0;
+            *ebx = 0;
+            *ecx = 0;
         } else {
             *eax = 0;
             *ebx = 0;
@@ -6483,6 +6508,7 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
         x86_cpu_adjust_feat_level(cpu, FEAT_7_0_ECX);
         x86_cpu_adjust_feat_level(cpu, FEAT_7_1_EAX);
         x86_cpu_adjust_feat_level(cpu, FEAT_7_1_EDX);
+        x86_cpu_adjust_feat_level(cpu, FEAT_7_2_EDX);
         x86_cpu_adjust_feat_level(cpu, FEAT_8000_0001_EDX);
         x86_cpu_adjust_feat_level(cpu, FEAT_8000_0001_ECX);
         x86_cpu_adjust_feat_level(cpu, FEAT_8000_0007_EDX);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index c65133ab6b6..26572b84623 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -603,6 +603,7 @@ typedef enum FeatureWord {
     FEAT_SGX_12_0_EBX,  /* CPUID[EAX=0x12,ECX=0].EBX (SGX MISCSELECT[31:0]) */
     FEAT_SGX_12_1_EAX,  /* CPUID[EAX=0x12,ECX=1].EAX (SGX ATTRIBUTES[31:0]) */
     FEAT_7_1_EDX,       /* CPUID[EAX=7,ECX=1].EDX */
+    FEAT_7_2_EDX,       /* CPUID[EAX=7,ECX=2].EDX */
     FEATURE_WORDS,
 } FeatureWord;
 
@@ -904,6 +905,9 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 /* PREFETCHIT0/1 Instructions */
 #define CPUID_7_1_EDX_PREFETCHITI       (1U << 14)
 
+/* Do not exhibit MXCSR Configuration Dependent Timing (MCDT) behavior */
+#define CPUID_7_2_EDX_MCDT_NO           (1U << 5)
+
 /* XFD Extend Feature Disabled */
 #define CPUID_D_1_EAX_XFD               (1U << 4)
 
-- 
Gitee


From 9c3095cbf75baae22df7fee7717e6b75a866a711 Mon Sep 17 00:00:00 2001
From: Tao Su <tao1.su@linux.intel.com>
Date: Thu, 6 Jul 2023 13:49:49 +0800
Subject: [PATCH 403/407] target/i386: Add new CPU model GraniteRapids

commit 6d5e9694ef374159072984c0958c3eaab6dd1d52 upstream.

The GraniteRapids CPU model mainly adds the following new features
based on SapphireRapids:
- PREFETCHITI CPUID.(EAX=7,ECX=1):EDX[bit 14]
- AMX-FP16 CPUID.(EAX=7,ECX=1):EAX[bit 21]

It also enumerates the following bits, each of which reports that the CPU
is not affected by the corresponding vulnerability:
- MCDT_NO CPUID.(EAX=7,ECX=2):EDX[bit 5]
- SBDR_SSDP_NO MSR_IA32_ARCH_CAPABILITIES[bit 13]
- FBSDP_NO MSR_IA32_ARCH_CAPABILITIES[bit 14]
- PSDP_NO MSR_IA32_ARCH_CAPABILITIES[bit 15]
- PBRSB_NO MSR_IA32_ARCH_CAPABILITIES[bit 24]
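
For illustration only, the bit positions listed above can be checked against
raw register values roughly as follows. This is a minimal sketch and not part
of the patch: the EX_* macros, struct ex_gnr_regs and ex_has_gnr_additions()
are hypothetical names, and the caller is assumed to have read the CPUID
leaves and the MSR already.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; the bit positions are copied from the list above. */
#define EX_7_1_EAX_AMX_FP16       (1U << 21)
#define EX_7_1_EDX_PREFETCHITI    (1U << 14)
#define EX_7_2_EDX_MCDT_NO        (1U << 5)
#define EX_ARCH_CAP_SBDR_SSDP_NO  (1ULL << 13)
#define EX_ARCH_CAP_FBSDP_NO      (1ULL << 14)
#define EX_ARCH_CAP_PSDP_NO       (1ULL << 15)
#define EX_ARCH_CAP_PBRSB_NO      (1ULL << 24)

/* Hypothetical container for register values obtained elsewhere. */
struct ex_gnr_regs {
    uint32_t leaf7_1_eax;   /* CPUID.(EAX=7,ECX=1):EAX */
    uint32_t leaf7_1_edx;   /* CPUID.(EAX=7,ECX=1):EDX */
    uint32_t leaf7_2_edx;   /* CPUID.(EAX=7,ECX=2):EDX */
    uint64_t arch_caps;     /* MSR_IA32_ARCH_CAPABILITIES */
};

/* True only if every bit named in the commit message is set. */
static bool ex_has_gnr_additions(const struct ex_gnr_regs *r)
{
    const uint64_t caps = EX_ARCH_CAP_SBDR_SSDP_NO | EX_ARCH_CAP_FBSDP_NO |
                          EX_ARCH_CAP_PSDP_NO | EX_ARCH_CAP_PBRSB_NO;

    return (r->leaf7_1_eax & EX_7_1_EAX_AMX_FP16) &&
           (r->leaf7_1_edx & EX_7_1_EDX_PREFETCHITI) &&
           (r->leaf7_2_edx & EX_7_2_EDX_MCDT_NO) &&
           (r->arch_caps & caps) == caps;
}
```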

Intel-SIG: commit 6d5e9694ef37 target/i386: Add new CPU model GraniteRapids.
Add SPR/GNR/SRA new ISAs backporting

Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20230706054949.66556-7-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 136 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 271f5c0d578..6fa32fd9b99 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4016,6 +4016,142 @@ static const X86CPUDefinition builtin_x86_defs[] = {
             { /* end of list */ },
         },
     },
+    {
+        .name = "GraniteRapids",
+        .level = 0x20,
+        .vendor = CPUID_VENDOR_INTEL,
+        .family = 6,
+        .model = 173,
+        .stepping = 0,
+        /*
+         * please keep the ascending order so that we can have a clear view of
+         * bit position of each feature.
+         */
+        .features[FEAT_1_EDX] =
+            CPUID_FP87 | CPUID_VME | CPUID_DE | CPUID_PSE | CPUID_TSC |
+            CPUID_MSR | CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC |
+            CPUID_SEP | CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV |
+            CPUID_PAT | CPUID_PSE36 | CPUID_CLFLUSH | CPUID_MMX | CPUID_FXSR |
+            CPUID_SSE | CPUID_SSE2,
+        .features[FEAT_1_ECX] =
+            CPUID_EXT_SSE3 | CPUID_EXT_PCLMULQDQ | CPUID_EXT_SSSE3 |
+            CPUID_EXT_FMA | CPUID_EXT_CX16 | CPUID_EXT_PCID | CPUID_EXT_SSE41 |
+            CPUID_EXT_SSE42 | CPUID_EXT_X2APIC | CPUID_EXT_MOVBE |
+            CPUID_EXT_POPCNT | CPUID_EXT_TSC_DEADLINE_TIMER | CPUID_EXT_AES |
+            CPUID_EXT_XSAVE | CPUID_EXT_AVX | CPUID_EXT_F16C | CPUID_EXT_RDRAND,
+        .features[FEAT_8000_0001_EDX] =
+            CPUID_EXT2_SYSCALL | CPUID_EXT2_NX | CPUID_EXT2_PDPE1GB |
+            CPUID_EXT2_RDTSCP | CPUID_EXT2_LM,
+        .features[FEAT_8000_0001_ECX] =
+            CPUID_EXT3_LAHF_LM | CPUID_EXT3_ABM | CPUID_EXT3_3DNOWPREFETCH,
+        .features[FEAT_8000_0008_EBX] =
+            CPUID_8000_0008_EBX_WBNOINVD,
+        .features[FEAT_7_0_EBX] =
+            CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_HLE |
+            CPUID_7_0_EBX_AVX2 | CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 |
+            CPUID_7_0_EBX_ERMS | CPUID_7_0_EBX_INVPCID | CPUID_7_0_EBX_RTM |
+            CPUID_7_0_EBX_AVX512F | CPUID_7_0_EBX_AVX512DQ |
+            CPUID_7_0_EBX_RDSEED | CPUID_7_0_EBX_ADX | CPUID_7_0_EBX_SMAP |
+            CPUID_7_0_EBX_AVX512IFMA | CPUID_7_0_EBX_CLFLUSHOPT |
+            CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_AVX512CD | CPUID_7_0_EBX_SHA_NI |
+            CPUID_7_0_EBX_AVX512BW | CPUID_7_0_EBX_AVX512VL,
+        .features[FEAT_7_0_ECX] =
+            CPUID_7_0_ECX_AVX512_VBMI | CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_PKU |
+            CPUID_7_0_ECX_AVX512_VBMI2 | CPUID_7_0_ECX_GFNI |
+            CPUID_7_0_ECX_VAES | CPUID_7_0_ECX_VPCLMULQDQ |
+            CPUID_7_0_ECX_AVX512VNNI | CPUID_7_0_ECX_AVX512BITALG |
+            CPUID_7_0_ECX_AVX512_VPOPCNTDQ | CPUID_7_0_ECX_LA57 |
+            CPUID_7_0_ECX_RDPID | CPUID_7_0_ECX_BUS_LOCK_DETECT,
+        .features[FEAT_7_0_EDX] =
+            CPUID_7_0_EDX_FSRM | CPUID_7_0_EDX_SERIALIZE |
+            CPUID_7_0_EDX_TSX_LDTRK | CPUID_7_0_EDX_AMX_BF16 |
+            CPUID_7_0_EDX_AVX512_FP16 | CPUID_7_0_EDX_AMX_TILE |
+            CPUID_7_0_EDX_AMX_INT8 | CPUID_7_0_EDX_SPEC_CTRL |
+            CPUID_7_0_EDX_ARCH_CAPABILITIES | CPUID_7_0_EDX_SPEC_CTRL_SSBD,
+        .features[FEAT_ARCH_CAPABILITIES] =
+            MSR_ARCH_CAP_RDCL_NO | MSR_ARCH_CAP_IBRS_ALL |
+            MSR_ARCH_CAP_SKIP_L1DFL_VMENTRY | MSR_ARCH_CAP_MDS_NO |
+            MSR_ARCH_CAP_PSCHANGE_MC_NO | MSR_ARCH_CAP_TAA_NO |
+            MSR_ARCH_CAP_SBDR_SSDP_NO | MSR_ARCH_CAP_FBSDP_NO |
+            MSR_ARCH_CAP_PSDP_NO | MSR_ARCH_CAP_PBRSB_NO,
+        .features[FEAT_XSAVE] =
+            CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
+            CPUID_XSAVE_XGETBV1 | CPUID_XSAVE_XSAVES | CPUID_D_1_EAX_XFD,
+        .features[FEAT_6_EAX] =
+            CPUID_6_EAX_ARAT,
+        .features[FEAT_7_1_EAX] =
+            CPUID_7_1_EAX_AVX_VNNI | CPUID_7_1_EAX_AVX512_BF16 |
+            CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS | CPUID_7_1_EAX_FSRC |
+            CPUID_7_1_EAX_AMX_FP16,
+        .features[FEAT_7_1_EDX] =
+            CPUID_7_1_EDX_PREFETCHITI,
+        .features[FEAT_7_2_EDX] =
+            CPUID_7_2_EDX_MCDT_NO,
+        .features[FEAT_VMX_BASIC] =
+            MSR_VMX_BASIC_INS_OUTS | MSR_VMX_BASIC_TRUE_CTLS,
+        .features[FEAT_VMX_ENTRY_CTLS] =
+            VMX_VM_ENTRY_LOAD_DEBUG_CONTROLS | VMX_VM_ENTRY_IA32E_MODE |
+            VMX_VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
+            VMX_VM_ENTRY_LOAD_IA32_PAT | VMX_VM_ENTRY_LOAD_IA32_EFER,
+        .features[FEAT_VMX_EPT_VPID_CAPS] =
+            MSR_VMX_EPT_EXECONLY |
+            MSR_VMX_EPT_PAGE_WALK_LENGTH_4 | MSR_VMX_EPT_PAGE_WALK_LENGTH_5 |
+            MSR_VMX_EPT_WB | MSR_VMX_EPT_2MB | MSR_VMX_EPT_1GB |
+            MSR_VMX_EPT_INVEPT | MSR_VMX_EPT_AD_BITS |
+            MSR_VMX_EPT_INVEPT_SINGLE_CONTEXT | MSR_VMX_EPT_INVEPT_ALL_CONTEXT |
+            MSR_VMX_EPT_INVVPID | MSR_VMX_EPT_INVVPID_SINGLE_ADDR |
+            MSR_VMX_EPT_INVVPID_SINGLE_CONTEXT |
+            MSR_VMX_EPT_INVVPID_ALL_CONTEXT |
+            MSR_VMX_EPT_INVVPID_SINGLE_CONTEXT_NOGLOBALS,
+        .features[FEAT_VMX_EXIT_CTLS] =
+            VMX_VM_EXIT_SAVE_DEBUG_CONTROLS |
+            VMX_VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
+            VMX_VM_EXIT_ACK_INTR_ON_EXIT | VMX_VM_EXIT_SAVE_IA32_PAT |
+            VMX_VM_EXIT_LOAD_IA32_PAT | VMX_VM_EXIT_SAVE_IA32_EFER |
+            VMX_VM_EXIT_LOAD_IA32_EFER | VMX_VM_EXIT_SAVE_VMX_PREEMPTION_TIMER,
+        .features[FEAT_VMX_MISC] =
+            MSR_VMX_MISC_STORE_LMA | MSR_VMX_MISC_ACTIVITY_HLT |
+            MSR_VMX_MISC_VMWRITE_VMEXIT,
+        .features[FEAT_VMX_PINBASED_CTLS] =
+            VMX_PIN_BASED_EXT_INTR_MASK | VMX_PIN_BASED_NMI_EXITING |
+            VMX_PIN_BASED_VIRTUAL_NMIS | VMX_PIN_BASED_VMX_PREEMPTION_TIMER |
+            VMX_PIN_BASED_POSTED_INTR,
+        .features[FEAT_VMX_PROCBASED_CTLS] =
+            VMX_CPU_BASED_VIRTUAL_INTR_PENDING |
+            VMX_CPU_BASED_USE_TSC_OFFSETING | VMX_CPU_BASED_HLT_EXITING |
+            VMX_CPU_BASED_INVLPG_EXITING | VMX_CPU_BASED_MWAIT_EXITING |
+            VMX_CPU_BASED_RDPMC_EXITING | VMX_CPU_BASED_RDTSC_EXITING |
+            VMX_CPU_BASED_CR3_LOAD_EXITING | VMX_CPU_BASED_CR3_STORE_EXITING |
+            VMX_CPU_BASED_CR8_LOAD_EXITING | VMX_CPU_BASED_CR8_STORE_EXITING |
+            VMX_CPU_BASED_TPR_SHADOW | VMX_CPU_BASED_VIRTUAL_NMI_PENDING |
+            VMX_CPU_BASED_MOV_DR_EXITING | VMX_CPU_BASED_UNCOND_IO_EXITING |
+            VMX_CPU_BASED_USE_IO_BITMAPS | VMX_CPU_BASED_MONITOR_TRAP_FLAG |
+            VMX_CPU_BASED_USE_MSR_BITMAPS | VMX_CPU_BASED_MONITOR_EXITING |
+            VMX_CPU_BASED_PAUSE_EXITING |
+            VMX_CPU_BASED_ACTIVATE_SECONDARY_CONTROLS,
+        .features[FEAT_VMX_SECONDARY_CTLS] =
+            VMX_SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
+            VMX_SECONDARY_EXEC_ENABLE_EPT | VMX_SECONDARY_EXEC_DESC |
+            VMX_SECONDARY_EXEC_RDTSCP |
+            VMX_SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
+            VMX_SECONDARY_EXEC_ENABLE_VPID | VMX_SECONDARY_EXEC_WBINVD_EXITING |
+            VMX_SECONDARY_EXEC_UNRESTRICTED_GUEST |
+            VMX_SECONDARY_EXEC_APIC_REGISTER_VIRT |
+            VMX_SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
+            VMX_SECONDARY_EXEC_RDRAND_EXITING |
+            VMX_SECONDARY_EXEC_ENABLE_INVPCID |
+            VMX_SECONDARY_EXEC_ENABLE_VMFUNC | VMX_SECONDARY_EXEC_SHADOW_VMCS |
+            VMX_SECONDARY_EXEC_RDSEED_EXITING | VMX_SECONDARY_EXEC_ENABLE_PML |
+            VMX_SECONDARY_EXEC_XSAVES,
+        .features[FEAT_VMX_VMFUNC] =
+            MSR_VMX_VMFUNC_EPT_SWITCHING,
+        .xlevel = 0x80000008,
+        .model_id = "Intel Xeon Processor (GraniteRapids)",
+        .versions = (X86CPUVersionDefinition[]) {
+            { .version = 1 },
+            { /* end of list */ },
+        },
+    },
     {
         .name = "KnightsMill",
         .level = 0xd,
-- 
Gitee


From 8c7aaf0743963a31dedb447306eb76ab818d0748 Mon Sep 17 00:00:00 2001
From: Tao Su <tao1.su@linux.intel.com>
Date: Wed, 30 Aug 2023 15:43:24 +0800
Subject: [PATCH 404/407] target/i386: Add support for AMX-COMPLEX in CPUID
 enumeration

commit 3e76bafb28c8292be5c4a32cab873b3a82cbcc87 upstream.

The latest Intel platform, GraniteRapids-D, introduces AMX-COMPLEX, which adds
two instructions to perform matrix multiplication of two tiles containing
complex elements and accumulate the results into a packed single precision
tile.

AMX-COMPLEX is enumerated via CPUID.(EAX=7,ECX=1):EDX[bit 8]. Add the CPUID
definition for AMX-COMPLEX. The feature is enabled automatically when
'-cpu host' is used and KVM advertises AMX-COMPLEX to userspace.
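
As a rough illustration of why the name lands where it does in the diff: the
feature name table has one slot per register bit, so "amx-complex" sits at
index 8 to match EDX[bit 8]. The sketch below mirrors only the four names
visible in this hunk; ex_7_1_edx_names and ex_bit_for_name() are hypothetical
and are not QEMU symbols.

```c
#include <string.h>

/* One slot per bit of CPUID.(EAX=7,ECX=1):EDX; unnamed bits stay NULL. */
static const char *const ex_7_1_edx_names[32] = {
    [4]  = "avx-vnni-int8",
    [5]  = "avx-ne-convert",
    [8]  = "amx-complex",
    [14] = "prefetchiti",
};

/* Return the bit number for a feature name, or -1 if it is not named. */
static int ex_bit_for_name(const char *name)
{
    for (int bit = 0; bit < 32; bit++) {
        if (ex_7_1_edx_names[bit] &&
            strcmp(ex_7_1_edx_names[bit], name) == 0) {
            return bit;
        }
    }
    return -1;
}
```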

Intel-SIG: commit 3e76bafb28c8 target/i386: Add support for AMX-COMPLEX in CPUID enumeration.
Add SPR/GNR/SRA new ISAs backporting

Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20230830074324.84059-1-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.c | 2 +-
 target/i386/cpu.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6fa32fd9b99..a3d0ce99819 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -894,7 +894,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         .feat_names = {
             NULL, NULL, NULL, NULL,
             "avx-vnni-int8", "avx-ne-convert", NULL, NULL,
-            NULL, NULL, NULL, NULL,
+            "amx-complex", NULL, NULL, NULL,
             NULL, NULL, "prefetchiti", NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 26572b84623..e84cb8265c7 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -902,6 +902,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_1_EDX_AVX_VNNI_INT8     (1U << 4)
 /* AVX NE CONVERT Instructions */
 #define CPUID_7_1_EDX_AVX_NE_CONVERT    (1U << 5)
+/* AMX COMPLEX Instructions */
+#define CPUID_7_1_EDX_AMX_COMPLEX       (1U << 8)
 /* PREFETCHIT0/1 Instructions */
 #define CPUID_7_1_EDX_PREFETCHITI       (1U << 14)
 
-- 
Gitee


From 1e085029388cdc3758f5e4641386afa434f51416 Mon Sep 17 00:00:00 2001
From: Tao Su <tao1.su@linux.intel.com>
Date: Wed, 20 Mar 2024 10:10:44 +0800
Subject: [PATCH 405/407] target/i386: Add new CPU model SierraForest
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 6e82d3b6220777667968a04c87e1667f164ebe88 upstream.

According to table 1-2 in Intel Architecture Instruction Set Extensions and
Future Features (rev 051) [1], SierraForest has the following new features
which have already been virtualized:

- CMPCCXADD CPUID.(EAX=7,ECX=1):EAX[bit 7]
- AVX-IFMA CPUID.(EAX=7,ECX=1):EAX[bit 23]
- AVX-VNNI-INT8 CPUID.(EAX=7,ECX=1):EDX[bit 4]
- AVX-NE-CONVERT CPUID.(EAX=7,ECX=1):EDX[bit 5]

Add the above features to the new CPU model SierraForest. Compared with the
GraniteRapids CPU model, bare-metal SierraForest lacks the following features:

- HLE CPUID.(EAX=7,ECX=0):EBX[bit 4]
- RTM CPUID.(EAX=7,ECX=0):EBX[bit 11]
- AVX512F CPUID.(EAX=7,ECX=0):EBX[bit 16]
- AVX512DQ CPUID.(EAX=7,ECX=0):EBX[bit 17]
- AVX512_IFMA CPUID.(EAX=7,ECX=0):EBX[bit 21]
- AVX512CD CPUID.(EAX=7,ECX=0):EBX[bit 28]
- AVX512BW CPUID.(EAX=7,ECX=0):EBX[bit 30]
- AVX512VL CPUID.(EAX=7,ECX=0):EBX[bit 31]
- AVX512_VBMI CPUID.(EAX=7,ECX=0):ECX[bit 1]
- AVX512_VBMI2 CPUID.(EAX=7,ECX=0):ECX[bit 6]
- AVX512_VNNI CPUID.(EAX=7,ECX=0):ECX[bit 11]
- AVX512_BITALG CPUID.(EAX=7,ECX=0):ECX[bit 12]
- AVX512_VPOPCNTDQ CPUID.(EAX=7,ECX=0):ECX[bit 14]
- LA57 CPUID.(EAX=7,ECX=0):ECX[bit 16]
- TSXLDTRK CPUID.(EAX=7,ECX=0):EDX[bit 16]
- AMX-BF16 CPUID.(EAX=7,ECX=0):EDX[bit 22]
- AVX512_FP16 CPUID.(EAX=7,ECX=0):EDX[bit 23]
- AMX-TILE CPUID.(EAX=7,ECX=0):EDX[bit 24]
- AMX-INT8 CPUID.(EAX=7,ECX=0):EDX[bit 25]
- AVX512_BF16 CPUID.(EAX=7,ECX=1):EAX[bit 5]
- fast zero-length MOVSB CPUID.(EAX=7,ECX=1):EAX[bit 10]
- fast short CMPSB, SCASB CPUID.(EAX=7,ECX=1):EAX[bit 12]
- AMX-FP16 CPUID.(EAX=7,ECX=1):EAX[bit 21]
- PREFETCHI CPUID.(EAX=7,ECX=1):EDX[bit 14]
- XFD CPUID.(EAX=0xD,ECX=1):EAX[bit 4]
- EPT_PAGE_WALK_LENGTH_5 VMX_EPT_VPID_CAP(0x48c)[bit 7]

Add all features of the GraniteRapids CPU model, except those listed above,
to the SierraForest CPU model.
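
The "GraniteRapids minus the list above" relationship can be pictured for one
feature word as clearing the removed bits from a mask. The sketch below does
this for CPUID.(EAX=7,ECX=0):EBX only, using the bit numbers from the list.
It is an illustration, not how the model is defined in the patch (which spells
out every feature word explicitly); the EX_* names and
ex_strip_removed_7_0_ebx() are hypothetical.

```c
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EBX bits that the list above removes. */
#define EX_REMOVED_7_0_EBX ( \
    (1U << 4)  |  /* HLE */         \
    (1U << 11) |  /* RTM */         \
    (1U << 16) |  /* AVX512F */     \
    (1U << 17) |  /* AVX512DQ */    \
    (1U << 21) |  /* AVX512_IFMA */ \
    (1U << 28) |  /* AVX512CD */    \
    (1U << 30) |  /* AVX512BW */    \
    (1U << 31))   /* AVX512VL */

/* Clear the removed bits and leave every other bit untouched. */
static uint32_t ex_strip_removed_7_0_ebx(uint32_t gnr_ebx)
{
    return gnr_ebx & ~EX_REMOVED_7_0_EBX;
}
```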

SierraForest doesn’t support TSX and RTM but supports TAA_NO. When RTM is
not enabled on the host, KVM does not report TAA_NO, so TAA_NO is left out
of the SierraForest CPU model.

[1] https://cdrdv2.intel.com/v1/dl/getContent/671368

Intel-SIG: commit 6e82d3b62207 target/i386: Add new CPU model SierraForest.
Add SPR/GNR/SRA new ISAs backporting

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Message-ID: <20240320021044.508263-1-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a3d0ce99819..1097ee92113 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3765,6 +3765,132 @@ static const X86CPUDefinition builtin_x86_defs[] = {
             { /* end of list */ },
         },
     },
+    {
+        .name = "SierraForest",
+        .level = 0x23,
+        .vendor = CPUID_VENDOR_INTEL,
+        .family = 6,
+        .model = 175,
+        .stepping = 0,
+        /*
+         * please keep the ascending order so that we can have a clear view of
+         * bit position of each feature.
+         */
+        .features[FEAT_1_EDX] =
+            CPUID_FP87 | CPUID_VME | CPUID_DE | CPUID_PSE | CPUID_TSC |
+            CPUID_MSR | CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC |
+            CPUID_SEP | CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV |
+            CPUID_PAT | CPUID_PSE36 | CPUID_CLFLUSH | CPUID_MMX | CPUID_FXSR |
+            CPUID_SSE | CPUID_SSE2,
+        .features[FEAT_1_ECX] =
+            CPUID_EXT_SSE3 | CPUID_EXT_PCLMULQDQ | CPUID_EXT_SSSE3 |
+            CPUID_EXT_FMA | CPUID_EXT_CX16 | CPUID_EXT_PCID | CPUID_EXT_SSE41 |
+            CPUID_EXT_SSE42 | CPUID_EXT_X2APIC | CPUID_EXT_MOVBE |
+            CPUID_EXT_POPCNT | CPUID_EXT_TSC_DEADLINE_TIMER | CPUID_EXT_AES |
+            CPUID_EXT_XSAVE | CPUID_EXT_AVX | CPUID_EXT_F16C | CPUID_EXT_RDRAND,
+        .features[FEAT_8000_0001_EDX] =
+            CPUID_EXT2_SYSCALL | CPUID_EXT2_NX | CPUID_EXT2_PDPE1GB |
+            CPUID_EXT2_RDTSCP | CPUID_EXT2_LM,
+        .features[FEAT_8000_0001_ECX] =
+            CPUID_EXT3_LAHF_LM | CPUID_EXT3_ABM | CPUID_EXT3_3DNOWPREFETCH,
+        .features[FEAT_8000_0008_EBX] =
+            CPUID_8000_0008_EBX_WBNOINVD,
+        .features[FEAT_7_0_EBX] =
+            CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_AVX2 |
+            CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_ERMS |
+            CPUID_7_0_EBX_INVPCID | CPUID_7_0_EBX_RDSEED | CPUID_7_0_EBX_ADX |
+            CPUID_7_0_EBX_SMAP | CPUID_7_0_EBX_CLFLUSHOPT | CPUID_7_0_EBX_CLWB |
+            CPUID_7_0_EBX_SHA_NI,
+        .features[FEAT_7_0_ECX] =
+            CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_PKU | CPUID_7_0_ECX_GFNI |
+            CPUID_7_0_ECX_VAES | CPUID_7_0_ECX_VPCLMULQDQ |
+            CPUID_7_0_ECX_RDPID | CPUID_7_0_ECX_BUS_LOCK_DETECT,
+        .features[FEAT_7_0_EDX] =
+            CPUID_7_0_EDX_FSRM | CPUID_7_0_EDX_SERIALIZE |
+            CPUID_7_0_EDX_SPEC_CTRL | CPUID_7_0_EDX_ARCH_CAPABILITIES |
+            CPUID_7_0_EDX_SPEC_CTRL_SSBD,
+        .features[FEAT_ARCH_CAPABILITIES] =
+            MSR_ARCH_CAP_RDCL_NO | MSR_ARCH_CAP_IBRS_ALL |
+            MSR_ARCH_CAP_SKIP_L1DFL_VMENTRY | MSR_ARCH_CAP_MDS_NO |
+            MSR_ARCH_CAP_PSCHANGE_MC_NO | MSR_ARCH_CAP_SBDR_SSDP_NO |
+            MSR_ARCH_CAP_FBSDP_NO | MSR_ARCH_CAP_PSDP_NO |
+            MSR_ARCH_CAP_PBRSB_NO,
+        .features[FEAT_XSAVE] =
+            CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
+            CPUID_XSAVE_XGETBV1 | CPUID_XSAVE_XSAVES,
+        .features[FEAT_6_EAX] =
+            CPUID_6_EAX_ARAT,
+        .features[FEAT_7_1_EAX] =
+            CPUID_7_1_EAX_AVX_VNNI | CPUID_7_1_EAX_CMPCCXADD |
+            CPUID_7_1_EAX_FSRS | CPUID_7_1_EAX_AVX_IFMA,
+        .features[FEAT_7_1_EDX] =
+            CPUID_7_1_EDX_AVX_VNNI_INT8 | CPUID_7_1_EDX_AVX_NE_CONVERT,
+        .features[FEAT_7_2_EDX] =
+            CPUID_7_2_EDX_MCDT_NO,
+        .features[FEAT_VMX_BASIC] =
+            MSR_VMX_BASIC_INS_OUTS | MSR_VMX_BASIC_TRUE_CTLS,
+        .features[FEAT_VMX_ENTRY_CTLS] =
+            VMX_VM_ENTRY_LOAD_DEBUG_CONTROLS | VMX_VM_ENTRY_IA32E_MODE |
+            VMX_VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
+            VMX_VM_ENTRY_LOAD_IA32_PAT | VMX_VM_ENTRY_LOAD_IA32_EFER,
+        .features[FEAT_VMX_EPT_VPID_CAPS] =
+            MSR_VMX_EPT_EXECONLY | MSR_VMX_EPT_PAGE_WALK_LENGTH_4 |
+            MSR_VMX_EPT_WB | MSR_VMX_EPT_2MB | MSR_VMX_EPT_1GB |
+            MSR_VMX_EPT_INVEPT | MSR_VMX_EPT_AD_BITS |
+            MSR_VMX_EPT_INVEPT_SINGLE_CONTEXT | MSR_VMX_EPT_INVEPT_ALL_CONTEXT |
+            MSR_VMX_EPT_INVVPID | MSR_VMX_EPT_INVVPID_SINGLE_ADDR |
+            MSR_VMX_EPT_INVVPID_SINGLE_CONTEXT |
+            MSR_VMX_EPT_INVVPID_ALL_CONTEXT |
+            MSR_VMX_EPT_INVVPID_SINGLE_CONTEXT_NOGLOBALS,
+        .features[FEAT_VMX_EXIT_CTLS] =
+            VMX_VM_EXIT_SAVE_DEBUG_CONTROLS |
+            VMX_VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
+            VMX_VM_EXIT_ACK_INTR_ON_EXIT | VMX_VM_EXIT_SAVE_IA32_PAT |
+            VMX_VM_EXIT_LOAD_IA32_PAT | VMX_VM_EXIT_SAVE_IA32_EFER |
+            VMX_VM_EXIT_LOAD_IA32_EFER | VMX_VM_EXIT_SAVE_VMX_PREEMPTION_TIMER,
+        .features[FEAT_VMX_MISC] =
+            MSR_VMX_MISC_STORE_LMA | MSR_VMX_MISC_ACTIVITY_HLT |
+            MSR_VMX_MISC_VMWRITE_VMEXIT,
+        .features[FEAT_VMX_PINBASED_CTLS] =
+            VMX_PIN_BASED_EXT_INTR_MASK | VMX_PIN_BASED_NMI_EXITING |
+            VMX_PIN_BASED_VIRTUAL_NMIS | VMX_PIN_BASED_VMX_PREEMPTION_TIMER |
+            VMX_PIN_BASED_POSTED_INTR,
+        .features[FEAT_VMX_PROCBASED_CTLS] =
+            VMX_CPU_BASED_VIRTUAL_INTR_PENDING |
+            VMX_CPU_BASED_USE_TSC_OFFSETING | VMX_CPU_BASED_HLT_EXITING |
+            VMX_CPU_BASED_INVLPG_EXITING | VMX_CPU_BASED_MWAIT_EXITING |
+            VMX_CPU_BASED_RDPMC_EXITING | VMX_CPU_BASED_RDTSC_EXITING |
+            VMX_CPU_BASED_CR3_LOAD_EXITING | VMX_CPU_BASED_CR3_STORE_EXITING |
+            VMX_CPU_BASED_CR8_LOAD_EXITING | VMX_CPU_BASED_CR8_STORE_EXITING |
+            VMX_CPU_BASED_TPR_SHADOW | VMX_CPU_BASED_VIRTUAL_NMI_PENDING |
+            VMX_CPU_BASED_MOV_DR_EXITING | VMX_CPU_BASED_UNCOND_IO_EXITING |
+            VMX_CPU_BASED_USE_IO_BITMAPS | VMX_CPU_BASED_MONITOR_TRAP_FLAG |
+            VMX_CPU_BASED_USE_MSR_BITMAPS | VMX_CPU_BASED_MONITOR_EXITING |
+            VMX_CPU_BASED_PAUSE_EXITING |
+            VMX_CPU_BASED_ACTIVATE_SECONDARY_CONTROLS,
+        .features[FEAT_VMX_SECONDARY_CTLS] =
+            VMX_SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
+            VMX_SECONDARY_EXEC_ENABLE_EPT | VMX_SECONDARY_EXEC_DESC |
+            VMX_SECONDARY_EXEC_RDTSCP |
+            VMX_SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
+            VMX_SECONDARY_EXEC_ENABLE_VPID | VMX_SECONDARY_EXEC_WBINVD_EXITING |
+            VMX_SECONDARY_EXEC_UNRESTRICTED_GUEST |
+            VMX_SECONDARY_EXEC_APIC_REGISTER_VIRT |
+            VMX_SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
+            VMX_SECONDARY_EXEC_RDRAND_EXITING |
+            VMX_SECONDARY_EXEC_ENABLE_INVPCID |
+            VMX_SECONDARY_EXEC_ENABLE_VMFUNC | VMX_SECONDARY_EXEC_SHADOW_VMCS |
+            VMX_SECONDARY_EXEC_RDSEED_EXITING | VMX_SECONDARY_EXEC_ENABLE_PML |
+            VMX_SECONDARY_EXEC_XSAVES,
+        .features[FEAT_VMX_VMFUNC] =
+            MSR_VMX_VMFUNC_EPT_SWITCHING,
+        .xlevel = 0x80000008,
+        .model_id = "Intel Xeon Processor (SierraForest)",
+        .versions = (X86CPUVersionDefinition[]) {
+            { .version = 1 },
+            { /* end of list */ },
+        },
+    },
     {
         .name = "Denverton",
         .level = 21,
-- 
Gitee


From dd0a5346f015afa2db89d600970ca82ca7e6a9cc Mon Sep 17 00:00:00 2001
From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date: Wed, 13 Mar 2024 07:53:23 -0700
Subject: [PATCH 406/407] target/i386: Export RFDS bit to guests

commit 41bdd9812863c150284a9339a048ed88c40f4df7 upstream.

Register File Data Sampling (RFDS) is a CPU side-channel vulnerability
that may expose stale register values. CPUs that set the RFDS_NO bit in MSR
IA32_ARCH_CAPABILITIES indicate that they are not vulnerable to RFDS.
Similarly, RFDS_CLEAR indicates that the CPU is affected by RFDS and has
the microcode to help mitigate it.

Make RFDS_CLEAR and RFDS_NO bits available to guests.
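
A guest-side reading of the two bits might look like the sketch below. The bit
numbers (27 for RFDS_NO, 28 for RFDS_CLEAR) are inferred from where the names
are added in the name table in this patch and should be treated as an
assumption; the enum and ex_classify_rfds() are hypothetical names, and the
caller is assumed to have read the MSR value already.

```c
#include <stdint.h>

/* Assumed bit positions, inferred from the table slots changed below. */
#define EX_ARCH_CAP_RFDS_NO     (1ULL << 27)
#define EX_ARCH_CAP_RFDS_CLEAR  (1ULL << 28)

enum ex_rfds_state {
    EX_RFDS_NOT_AFFECTED,   /* RFDS_NO set */
    EX_RFDS_MITIGATABLE,    /* affected, microcode mitigation available */
    EX_RFDS_UNKNOWN,        /* neither bit reported */
};

/* Classify a raw IA32_ARCH_CAPABILITIES value as described above. */
static enum ex_rfds_state ex_classify_rfds(uint64_t arch_caps)
{
    if (arch_caps & EX_ARCH_CAP_RFDS_NO) {
        return EX_RFDS_NOT_AFFECTED;
    }
    if (arch_caps & EX_ARCH_CAP_RFDS_CLEAR) {
        return EX_RFDS_MITIGATABLE;
    }
    return EX_RFDS_UNKNOWN;
}
```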

Intel-SIG: commit 41bdd9812863 target/i386: Export RFDS bit to guests.
Add SPR/GNR/SRA new ISAs backporting

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <9a38877857392b5c2deae7e7db1b170d15510314.1710341348.git.pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
---
 target/i386/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1097ee92113..0a3c768546a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1025,8 +1025,8 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
             NULL, NULL, NULL, NULL,
             NULL, "fb-clear", NULL, NULL,
             NULL, NULL, NULL, NULL,
-            NULL, NULL, NULL, NULL,
-            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, "rfds-no",
+            "rfds-clear", NULL, NULL, NULL,
         },
         .msr = {
             .index = MSR_IA32_ARCH_CAPABILITIES,
-- 
Gitee


From 9f815d491423a9926255ee22c3a6aeb9a7569c51 Mon Sep 17 00:00:00 2001
From: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Date: Sun, 28 Jul 2024 13:18:25 +0800
Subject: [PATCH 407/407] hw/arm: include arm v7m

Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
---
 hw/arm/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index 87ed4dd914e..721a8eb8bed 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -31,7 +31,7 @@ arm_ss.add(when: 'CONFIG_VEXPRESS', if_true: files('vexpress.c'))
 arm_ss.add(when: 'CONFIG_ZYNQ', if_true: files('xilinx_zynq.c'))
 arm_ss.add(when: 'CONFIG_SABRELITE', if_true: files('sabrelite.c'))
 
-#arm_ss.add(when: 'CONFIG_ARM_V7M', if_true: files('armv7m.c'))
+arm_ss.add(when: 'CONFIG_ARM_V7M', if_true: files('armv7m.c'))
 arm_ss.add(when: 'CONFIG_EXYNOS4', if_true: files('exynos4210.c'))
 arm_ss.add(when: 'CONFIG_PXA2XX', if_true: files('pxa2xx.c', 'pxa2xx_gpio.c', 'pxa2xx_pic.c'))
 arm_ss.add(when: 'CONFIG_DIGIC', if_true: files('digic.c'))
-- 
Gitee