ubuntu 2204 kvm multipath iscsi reservation issue

daniel.shafer
8 Posts
March 3, 2025, 3:48 pm
I'm hoping someone could assist me with an iSCSI volume issue on Ubuntu using multipath.
With Red Hat, I was able to set up a 10TB volume across a three-node PetaSAN cluster using 6 paths, two per PetaSAN server.
Now I'm trying to do the same thing with Ubuntu as the host, and I'm having a little trouble with GFS2 and PetaSAN.
My first issue: when I added the 4th host to the KVM cluster, things started to break at the iSCSI reservation level.
So we changed gears and tried three smaller volumes, each with two paths. The first iSCSI volume added fine; pacemaker didn't fence anything out.
Then we added the second volume and started seeing the following error in the log:
Mar 3 06:33:25 ftc-mvm-001 pacemaker-fenced[19540]: notice: fence_scsi_off_2[1978926] error output [ 2025-03-03 06:33:25,544 ERROR: Failed: keys cannot be same. You can not fence yourself. ]
The techs helping me with this suggested:
I'm really suspecting that the SAN is running multiple tgtd daemons and the reservations are not tracked consistently across them.
I'd like to set this up again and, in a failed state, see what the reservations are.
targetctl isn't installed on the PetaSAN nodes, so I'm not sure what I would use to take a look.
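One thing we can try from the initiator side is sg_persist from sg3_utils; a minimal sketch (the device path is a placeholder for whatever multipath device the LUN shows up as):
# list the PR keys registered on the LUN
sg_persist --in --read-keys --device=/dev/mapper/<mpath-device>
# show the active reservation, if any
sg_persist --in --read-reservation --device=/dev/mapper/<mpath-device>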
Can someone help with this, please? I'm not sure what I need to look at here to troubleshoot the issue.
Thanks!
Last edited on March 3, 2025, 3:48 pm by daniel.shafer · #1

admin
2,973 Posts
March 4, 2025, 10:04 am
It is not clear what is being fenced by pacemaker. Fencing of the iSCSI server is done by PetaSAN using Consul; it is PetaSAN's job to provide an HA disk for you. My understanding is that your pacemaker fencing would be working to provide HA for KVM?
PetaSAN does support Persistent Reservations.
targetcli-fb is installed on the nodes.
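If it helps, a quick sanity check on a PetaSAN node is to dump the target configuration tree; targetcli manages the kernel LIO target, so there is a single configuration rather than per-daemon state:
# show the full LIO tree: backstores, iSCSI targets, LUNs, portals
targetcli ls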
Last edited on March 4, 2025, 10:41 am by admin · #2

daniel.shafer
8 Posts
March 5, 2025, 2:06 pmQuote from daniel.shafer on March 5, 2025, 2:06 pmthe kvm solution we are using has pacemaker running to monitor the virtual environment. the mvm (morpheus) setup watches the mounted volumes that are used for running VM's and if a server loses it's connection (say network issue) it will take that kvm server offline and move the VM's to other hypervisor servers.
I've never had issue with petasan and ovirt leveraging redhat kvm. we are just moving to a new solution that uses mvm and ubuntu and seeing some weird connection problems with ubuntu now doing the same things with the same setup we were using that just worked and we are trying to troubleshoot those issues.
Hope that helps clear up where we are seeing the issue, it's on the host side.
I'm getting errors about multiple volumes with the same name in ubnutu syslog and I want to have proof it's not the san, so I was wondering if it's possible to see the iscsi reservation map from the petasan side of things?
We are also seeing issues when we use multipath with anything more than 2 paths to petasan that I'm working on. I have a alma88 server with 8 paths to a 12 node petasan array and it's rock solid, so I've got to find the different between the two OS's and why weird things are happening in ubuntu2204.
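One way to check whether those "same name" volumes are really the same LUN is to compare the SCSI WWID each path reports; a small sketch (device names are placeholders):
# print the WWID each sd device reports
/lib/udev/scsi_id -g -u /dev/sdd
/lib/udev/scsi_id -g -u /dev/sde
# identical WWIDs = two paths to one LUN; different WWIDs = genuinely distinct volumes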

daniel.shafer
8 Posts
March 6, 2025, 1:06 pm
Quick question: do you guys have a multipath config that we should be using for PetaSAN?
Currently I have this set up, but I think maybe I need failback and no_path_retry set too?
devices {
    device {
        vendor "PETASAN"
        product ".*"
        path_selector "round-robin 0"
        path_grouping_policy multibus
        rr_min_io 1
    }
}
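For comparison, a minimal sketch of what adding those two settings could look like; the values are examples only, not an official PetaSAN recommendation:
devices {
    device {
        vendor "PETASAN"
        product ".*"
        path_selector "round-robin 0"
        path_grouping_policy multibus
        rr_min_io 1
        failback immediate    # example: return to the preferred path group when it recovers
        no_path_retry 18      # example: queue I/O for 18 checker intervals before failing
    }
}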

daniel.shafer
8 Posts
March 13, 2025, 11:43 am
I can recreate this issue every time I set up an iSCSI volume with more than one path to an Ubuntu 22.04 KVM server cluster.
The command being used to set up the GFS2 volume is:
sudo mkfs.gfs2 -O -K -p lock_dlm -r 2048 -t ftc_mvm_cluster2:mvm2_iscsi_01 -j 32 /dev/mapper/36001405092e33d800002000000000000
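For readers, the flag meanings (per the mkfs.gfs2 man page):
# -p lock_dlm : use the DLM cluster locking protocol
# -t ftc_mvm_cluster2:mvm2_iscsi_01 : lock table, clustername:fsname
# -j 32 : create 32 journals (one per node that may mount the fs)
# -r 2048 : 2048MB resource groups
# -O : don't prompt for confirmation; -K : don't discard (TRIM) blocks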
I will start seeing iSCSI reservation errors after about 5 minutes, right after the KVM system puts the volume in use by the cluster.
With a single iSCSI path the volume stays running, but performance is not as good as it could be if multiple paths were running.
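For context, the extra paths are brought up with the usual open-iscsi workflow, roughly like this (the portal IPs are placeholders for the PetaSAN iSCSI IPs):
# discover the target on each portal
iscsiadm -m discovery -t sendtargets -p 10.0.0.11:3260
iscsiadm -m discovery -t sendtargets -p 10.0.0.12:3260
# log in to all discovered portals; multipathd then sees one path per session
iscsiadm -m node --login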
Could really use a little help with this one, as the KVM cluster doesn't have an issue with multiple paths connected to an HPE Alletra SAN using this multipath configuration:
devices {
    device {
        vendor "3PARdata"
        product "VV"
        path_grouping_policy "group_by_prio"
        path_selector "round-robin 0"
        path_checker "tur"
        features "0"
        hardware_handler "1 alua"
        prio "alua"
        failback immediate
        rr_weight "uniform"
        no_path_retry 18
        fast_io_fail_tmo 10
        dev_loss_tmo "infinity"
    }
}
I'd really like to use PetaSAN for my KVM cluster too; can you help?
I'll provide anything you need to help troubleshoot this issue. I think more people may be experiencing this based on how reproducible it is.

daniel.shafer
8 Posts
March 13, 2025, 2:13 pm
Here is the error message we see on one of the nodes as iSCSI disconnects:
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.624321] gfs2: GFS2 installed
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.625673] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01: Trying to join cluster "lock_dlm", "ftc_mvm_cluster_02:mvm2_iscsi_01"
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.625797] dlm: Using TCP for communications
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.626144] dlm: mvm2_iscsi_01: joining the lockspace group...
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.634680] dlm: mvm2_iscsi_01: dlm_recover 5489011170463395644
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.634701] dlm: mvm2_iscsi_01: group event done 0
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.634769] dlm: mvm2_iscsi_01: add member 3
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.634773] dlm: mvm2_iscsi_01: add member 2
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.634777] dlm: mvm2_iscsi_01: add member 1
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.635006] dlm: connecting to 3
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.635084] dlm: connecting to 1
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.635252] dlm: got connection from 3
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.743077] dlm: got connection from 1
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.743921] dlm: version 0x00030002 for node 1 detected
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.744433] dlm: version 0x00030002 for node 3 detected
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.744520] dlm: mvm2_iscsi_01: dlm_recover_members 3 nodes
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.766375] dlm: mvm2_iscsi_01: generation 2 slots 3 1:3 2:1 3:2
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.766380] dlm: mvm2_iscsi_01: dlm_recover_directory
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.767079] dlm: mvm2_iscsi_01: dlm_recover_directory 2 in 2 new
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.789451] dlm: mvm2_iscsi_01: dlm_recover_directory 0 out 2 messages
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.833427] dlm: mvm2_iscsi_01: dlm_recover 5489011170463395644 generation 2 done: 67 ms
Mar 13 07:43:18 ftc-mvm-010 kernel: [ 4875.833483] dlm: mvm2_iscsi_01: join complete
Mar 13 07:43:41 ftc-mvm-010 kernel: [ 4898.518372] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01: Joined cluster. Now mounting FS (format 1802)...
Mar 13 07:43:41 ftc-mvm-010 kernel: [ 4899.402820] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: journal 2 mapped with 1 extents in 0ms
Mar 13 07:43:41 ftc-mvm-010 kernel: [ 4899.424339] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: jid=2, already locked for use
Mar 13 07:43:41 ftc-mvm-010 kernel: [ 4899.424349] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: jid=2: Looking at journal...
Mar 13 07:43:42 ftc-mvm-010 kernel: [ 4899.927769] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: jid=2: Journal head lookup took 503ms
Mar 13 07:43:42 ftc-mvm-010 kernel: [ 4899.927837] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: jid=2: Done
Mar 13 07:43:42 ftc-mvm-010 pacemaker-controld[13176]: notice: Result of start operation for mvm2_iscsi_01 on ftc-mvm-010: ok
Mar 13 08:05:00 ftc-mvm-010 kernel: [ 6178.085829] sd 2:0:0:0: reservation conflict
Mar 13 08:05:00 ftc-mvm-010 kernel: [ 6178.085864] sd 2:0:0:0: [sde] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Mar 13 08:05:00 ftc-mvm-010 kernel: [ 6178.085874] sd 2:0:0:0: [sde] tag#12 CDB: Write(16) 8a 00 00 00 00 00 00 0c 68 d0 00 00 00 38 00 00
Mar 13 08:05:00 ftc-mvm-010 kernel: [ 6178.085878] reservation conflict error, dev sde, sector 813264 op 0x1:(WRITE) flags 0x4200 phys_seg 7 prio class 0
Mar 13 08:05:00 ftc-mvm-010 kernel: [ 6178.086682] reservation conflict error, dev dm-0, sector 813264 op 0x1:(WRITE) flags 0x0 phys_seg 7 prio class 0
Mar 13 08:05:00 ftc-mvm-010 kernel: [ 6178.086736] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: Error 6 writing to journal, jid=2
Mar 13 08:05:00 ftc-mvm-010 kernel: [ 6178.086799] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: about to withdraw this file system
Mar 13 08:05:05 ftc-mvm-010 python3[40871]: ansible-ansible.legacy.command Invoked with _raw_params=ls /mnt/59b28a94-cc13-4514-ad64-4f2cf603bc32 _uses_shell=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
Mar 13 08:05:05 ftc-mvm-010 kernel: [ 6183.305714] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: Requesting recovery of jid 2.
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.249700] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: Journal recovery complete for jid 2.
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.249709] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: Glock dequeues delayed: 0
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.254860] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: telling LM to unmount
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.254921] dlm: mvm2_iscsi_01: leaving the lockspace group...
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.256341] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: recover_prep ignored due to withdraw.
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.256491] dlm: mvm2_iscsi_01: group event done 0
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260381] dlm: mvm2_iscsi_01: release_lockspace final free
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260492] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: File system withdrawn
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260515] CPU: 10 PID: 28458 Comm: gfs2_logd/ftc_m Not tainted 6.8.0-52-generic #53~22.04.1-Ubuntu
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260519] Hardware name: HP ProLiant XL170r Gen9/ProLiant XL170r Gen9, BIOS U14 08/29/2024
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260521] Call Trace:
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260524] <TASK>
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260529] dump_stack_lvl+0x76/0xa0
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260537] dump_stack+0x10/0x20
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260543] gfs2_withdraw+0xd5/0x160 [gfs2]
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260617] gfs2_logd+0x1ef/0x330 [gfs2]
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260635] ? __pfx_autoremove_wake_function+0x10/0x10
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260641] ? __pfx_gfs2_logd+0x10/0x10 [gfs2]
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260659] kthread+0xf2/0x120
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260663] ? __pfx_kthread+0x10/0x10
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260666] ret_from_fork+0x47/0x70
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260670] ? __pfx_kthread+0x10/0x10
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260672] ret_from_fork_asm+0x1b/0x30
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260676] </TASK>
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260696] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: Error -5 syncing glock
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260722] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: G: s:EX n:2/2a5f7 f:lDpfIo t:SH d:SH/6176000 a:0 v:0 r:3 m:200 p:1
Mar 13 08:05:06 ftc-mvm-010 kernel: [ 6184.260766] gfs2: fsid=ftc_mvm_cluster_02:mvm2_iscsi_01.2: I: n:136/173559 t:4 f:0x00 d:0x00000001 s:3864 p:0
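To tie the failing sd device (sde above) back to a specific iSCSI session and portal, the standard open-iscsi tooling should be enough:
# print all sessions with their attached SCSI devices (look for sde)
iscsiadm -m session -P 3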

daniel.shafer
8 Posts
March 13, 2025, 2:14 pm
The previous error happens with only two paths configured to a single iSCSI volume running on a 3-node PetaSAN 3.30 cluster.
OS details of the KVM setup were given previously.

admin
2,973 Posts
March 13, 2025, 2:17 pm
We work well with KVM. My understanding is that your KVM setup with Red Hat oVirt was working, but not with MVM (Morpheus) on Ubuntu. The problem is we have no experience with MVM (Morpheus); maybe if you compare the working configuration from Red Hat with MVM, it may give you leads.
What is the output of:
multipath -ll
What is the content of:
/etc/multipath.conf
Where/who is specifying multipath to use round-robin vs failover?
What reservation errors do you see?
Who is generating the errors?
Are they SCSI persistent reservation errors? Version 3 or 2?
Do they happen when you access the volume or during failover?
How many clients are accessing the volume?
If you have only 1 KVM accessing the volume, do you still see errors?
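A quick way to gather most of this in one pass on a host; a minimal sketch (the mapper path is a placeholder):
# multipath topology and the config as multipathd actually parsed it
multipath -ll
multipathd show config
# persistent reservation keys as seen through the multipath device
mpathpersist --in -k /dev/mapper/<mpath-device>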

daniel.shafer
8 Posts
March 13, 2025, 7:50 pm
Since I hadn't heard back, I spoke with some of my friends in the storage world and have been trying a few things.
Most of the questions you asked I already answered in the thread above (the multipath.conf file, round robin, the reservation error), so please see above. To answer the rest: the errors didn't happen during failover but right after setup of the cluster. Three KVM hosts are connecting to three PetaSAN nodes using GFS2/multipathd/open-iscsi. We haven't tried only 1 KVM host, since this is a locking issue and we figured we needed at least two to test properly.
Currently we have two paths working from each KVM server via GFS2/multipathd/open-iscsi on Ubuntu 22.04, so here is an update with new data and an answer as to when it was happening:
As MVM is setting up the iSCSI volume, it uses GFS2 to format the disk. The reservation errors would happen right after install: pacemaker would get grumpy, a kernel error about the reservation was seen, and then fencing would occur to drop one of the KVM hosts' volume mount points offline. pcs status showed an unhealthy cluster with the host fenced after this. This error would only happen when we were running more than a single path to the PetaSAN array; single path (1 iSCSI session) worked fine from multiple KVM hosts during all the different tests we ran.
So, after a few different attempts at multipath settings, we came up with this multipath.conf data for PetaSAN:
devices {
    device {
        vendor "PETASAN"
        product ".*"
        path_grouping_policy "group_by_prio"
        path_selector "round-robin 0"
        path_checker "tur"
        features "0"
        hardware_handler "1 alua"
        prio "alua"
        failback "immediate"
        rr_weight "uniform"
        no_path_retry "18"
        fast_io_fail_tmo "10"
        dev_loss_tmo "infinity"
    }
}
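To double-check that multipathd actually merged this device section, something like the following should show it in the runtime config:
# show the merged runtime configuration, then re-read the maps
multipathd show config | grep -A 15 PETASAN
multipath -r
multipath -ll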
Please let me know if you see something we should not be using, or something we should add to make this 'mo betta. Would appreciate it!

admin
2,973 Posts
March 13, 2025, 9:40 pm
So things are working? How was it fixed? I also see changes in the vendor/product fields.