Skip to content

Kola: add support for multipath devices#1296

Merged
openshift-merge-robot merged 1 commit intocoreos:masterfrom
darkmuggle:pr/multipath
Apr 2, 2020
Merged

Kola: add support for multipath devices#1296
openshift-merge-robot merged 1 commit intocoreos:masterfrom
darkmuggle:pr/multipath

Conversation

@darkmuggle
Copy link
Copy Markdown
Contributor

While looking at providing support for Multipath, I needed the ability to setup Multipathed disks.

@darkmuggle
Copy link
Copy Markdown
Contributor Author

darkmuggle commented Mar 28, 2020

Example for an RHCOS instance:

:/# lsblk --output wwn,name,maj:min,size,type
WWN                NAME   MAJ:MIN  SIZE TYPE
0x5000c500d1d1d1d1 sda      8:0     16G disk
0x5000c500d1d1d1d1 |-sda1   8:1    384M part
0x5000c500d1d1d1d1 |-sda2   8:2    127M part
0x5000c500d1d1d1d1 |-sda3   8:3      1M part
0x5000c500d1d1d1d1 `-sda4   8:4    3.1G part
0x5000c500d1d1d1d1 sdb      8:16    16G disk
0x5000c500d1d1d1d1 |-sdb1   8:17   384M part
0x5000c500d1d1d1d1 |-sdb2   8:18   127M part
0x5000c500d1d1d1d1 |-sdb3   8:19     1M part
0x5000c500d1d1d1d1 `-sdb4   8:20   3.1G part

:/# mkdir -p /etc/multipath
:/# mpathconf --enable   
:/# systemctl start multipathd.service
 [  149.993496] multipathd[963]: --------start up--------
[  149.995191] multipathd[963]: read /etc/multipath.conf
[  149.996942] multipathd[963]: path checkers start up
[  150.027039] device-mapper: multipath service-time: version 0.3.0 loaded
[  150.045634] multipathd[963]: mpatha: load table [0 33554432 multipath 0 0 2 1 service-time 0 1 1 8:0 1 service-time 0 1 1 8:16 1

:/# dmsetup table
mpatha4: 0 6582272 linear 253:0 1050624
mpatha: 0 33554432 multipath 0 0 2 1 service-time 0 1 2 8:0 1 1 service-time 0 1 2 8:16 1 1 
mpatha3: 0 2048 linear 253:0 1048576
mpatha2: 0 260096 linear 253:0 788480
mpatha1: 0 786432 linear 253:0 2048

Comment thread mantle/platform/qemu.go
Comment thread mantle/platform/qemu.go Outdated
Comment thread mantle/platform/qemu.go Outdated
@darkmuggle
Copy link
Copy Markdown
Contributor Author

Okay, lifting the WIP/Draft tags.

A couple of changes:

  • switched to virtio-scsi-pci because we need the SCSI features like WWN ID's
  • handle MPIO on it own

Once I get MPIO working in Dracut I plan on submitting a test for this. To play with this in COSA, you can use cosa run --multipath.

NOTE: Right now RHCOS and FCOS can't use root on Multipath since its in use by the time multipath tools come up. This is PR simply gives a way to test support later.

@darkmuggle darkmuggle marked this pull request as ready for review March 31, 2020 17:49
@darkmuggle darkmuggle removed the WIP PR still being worked on label Mar 31, 2020
@darkmuggle
Copy link
Copy Markdown
Contributor Author

/test sanity

Comment thread mantle/platform/qemu.go Outdated
id := fmt.Sprintf("d%d", builder.diskId)

if disk.MultiPathDisk {
builder.Append("-device", "virtio-scsi-pci,id=scsi")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we try to add multiple multipath disks - will this fail since we're appending multiple with the same id=scsi? I think we either need to only append this once, or create a separate bus per disk?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that is true. Testing a fix now.

Comment thread mantle/platform/qemu.go Outdated
Comment thread mantle/platform/qemu.go
@cgwalters
Copy link
Copy Markdown
Member

I think the Prow jobs are doing some sort of bad merge with master that's not getting the updated Makefile bits from #1303 - try rebasing on master?

@cgwalters
Copy link
Copy Markdown
Member

Maybe this is useful, I just did a google search for "libvirt multipath" and found:

@darkmuggle
Copy link
Copy Markdown
Contributor Author

@cgwalters I'm exploring using qemu-nbd to serve a copy of the disk, but that's not ready. If this is acceptable, then I'll continue to explore the qemu-nbd path

@cgwalters
Copy link
Copy Markdown
Member

From the CI run:

qemu-img: /builddir/build/BUILD/qemu-4.1.1/block.c:5767: bdrv_img_create: Assertion `full_backing' failed.
--- FAIL: coreos.ignition.mount.disks (2.02s)
        mount.go:165: adding additional disk: signal: aborted

@cgwalters
Copy link
Copy Markdown
Member

@cgwalters I'm exploring using qemu-nbd to serve a copy of the disk, but that's not ready. If this is acceptable, then I'll continue to explore the qemu-nbd path

Oh yeah totally fine by me as is. The main reason I care is that I do want cosa kola run speed to be reasonable to do locally and converting the disk image is a hit. (But...eventually we will add enough tests it won't be reasonable to run all of them for every change anyways)

Add support for "faking" multipath devices. In order to do this, Kola
converts QCOW2 to a raw device and adds the device twice with attributes
sufficient to trick multipath tools into seeing the devices as two paths
to the same device.

This feature is needed in order to adequately test root on multipath
devices for RHCOS.
@cgwalters
Copy link
Copy Markdown
Member

/lgtm

@openshift-ci-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, darkmuggle

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,darkmuggle]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 72458e6 into coreos:master Apr 2, 2020
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 23, 2020
I ran out of space in `/tmp` today and it turned out I had
a whole lot of `/tmp/mantle-qemu` files, including some
really big disk images.  This is a combination of changes
from coreos#1338
and the multipath PR coreos#1296

Basically if we get interrupted by Ctrl-C which is a lot more
likely now when qemu isn't owning the serial console from the
start, we will leak our tmpfiles.

We probably need to install a `SIGINT` handler but that's a whole
mess that's avoided if we just let the kernel clean up our
resources for us, which is what was happening before the multipath
PR (well, at least for files and not the swtpm tmpdir, but that's
small).

And actually
coreos#1380
does install a `SIGINT` handler so someone please review that
and we can maybe fix this more after that lands.
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 24, 2020
I ran out of space in `/tmp` today and it turned out I had
a whole lot of `/tmp/mantle-qemu` files, including some
really big disk images.  This is a combination of changes
from coreos#1338
and the multipath PR coreos#1296

Basically if we get interrupted by Ctrl-C which is a lot more
likely now when qemu isn't owning the serial console from the
start, we will leak our tmpfiles.

We probably need to install a `SIGINT` handler but that's a whole
mess that's avoided if we just let the kernel clean up our
resources for us, which is what was happening before the multipath
PR (well, at least for files and not the swtpm tmpdir, but that's
small).

And actually
coreos#1380
does install a `SIGINT` handler so someone please review that
and we can maybe fix this more after that lands.
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 24, 2020
I ran out of space in `/tmp` today and it turned out I had
a whole lot of `/tmp/mantle-qemu` files, including some
really big disk images.  This is a combination of changes
from coreos#1338
and the multipath PR coreos#1296

Basically if we get interrupted by Ctrl-C which is a lot more
likely now when qemu isn't owning the serial console from the
start, we will leak our tmpfiles.

We probably need to install a `SIGINT` handler but that's a whole
mess that's avoided if we just let the kernel clean up our
resources for us, which is what was happening before the multipath
PR (well, at least for files and not the swtpm tmpdir, but that's
small).

And actually
coreos#1380
does install a `SIGINT` handler so someone please review that
and we can maybe fix this more after that lands.
openshift-merge-robot pushed a commit that referenced this pull request Apr 24, 2020
I ran out of space in `/tmp` today and it turned out I had
a whole lot of `/tmp/mantle-qemu` files, including some
really big disk images.  This is a combination of changes
from #1338
and the multipath PR #1296

Basically if we get interrupted by Ctrl-C which is a lot more
likely now when qemu isn't owning the serial console from the
start, we will leak our tmpfiles.

We probably need to install a `SIGINT` handler but that's a whole
mess that's avoided if we just let the kernel clean up our
resources for us, which is what was happening before the multipath
PR (well, at least for files and not the swtpm tmpdir, but that's
small).

And actually
#1380
does install a `SIGINT` handler so someone please review that
and we can maybe fix this more after that lands.
@darkmuggle darkmuggle deleted the pr/multipath branch September 21, 2020 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants