Petasan Cluster issue - Module 'devicehealth' has failed: unknown operation

I am running a Proxmox cluster with 4 Petasan nodes clustered for storage. Since the weekend, 23 of my 32 drives have been showing as offline. This is what I get when I check the status. How can I fix this?

 

root@uccsan01:~# ceph status
cluster:
id: 9774be8a-c28d-4b1a-b614-e99a0151d001
health: HEALTH_ERR
Module 'devicehealth' has failed: unknown operation
Reduced data availability: 100 pgs inactive, 59 pgs down, 11 pgs stale
Degraded data redundancy: 4961154/14820801 objects degraded (33.474%), 63 pgs degraded, 63 pgs undersized
2 slow ops, oldest one blocked for 38 sec, osd.5 has slow ops

services:
mon: 3 daemons, quorum uccsan03,uccsan01,uccsan02 (age 9h)
mgr: uccsan02(active, since 4M), standbys: uccsan03, uccsan01
osd: 32 osds: 9 up (since 2h), 9 in (since 2h); 64 remapped pgs

data:
pools: 2 pools, 129 pgs
objects: 4.94M objects, 19 TiB
usage: 34 TiB used, 98 TiB / 132 TiB avail
pgs: 77.519% pgs not active
4961154/14820801 objects degraded (33.474%)
461988/14820801 objects misplaced (3.117%)
46 down
39 undersized+degraded+remapped+backfill_wait+peered
21 active+undersized+degraded+remapped+backfill_wait
11 stale+down
7 active+clean
2 undersized+degraded+remapped+backfilling+peered
2 down+remapped
1 active+undersized+degraded

io:
recovery: 48 MiB/s, 11 objects/s

progress:
Global Recovery Event (2d)
[=...........................] (remaining: 5w)

I am running version 3.3

The issue is not related to "Module 'devicehealth' has failed: unknown operation". You can ignore this error.

The problem is that the OSDs are down, so the primary focus is to get them back up. Many things could lead to this, including hardware or network problems. I can see from the status that even the monitors appear to have restarted recently (the quorum age is only 9h), so it may not be just the OSDs. First try rebooting the cluster. If that does not help, you will need to look at the logs in /var/log/ceph and /var/log/syslog, and at the dmesg output.
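As a rough sketch of the log-checking step, the helpers below could be pasted into a root shell on one of the storage nodes. The function names down_osds and inspect_osd are made up for illustration. The ceph-osd@<id> systemd unit name and the grep patterns are assumptions that may not match every PetaSAN install, so adjust them to what is actually on the node.

```shell
# Count OSDs that are not "up", reading `ceph status` output on stdin.
# Relies on the "osd: N osds: M up ..." line format shown in the post above.
down_osds() {
    awk '$1 == "osd:" && $3 == "osds:" { print $2 - $4; found = 1 }
         END { exit !found }'
}

# Gather the usual evidence for one OSD id (hypothetical helper, run on the
# node that hosts that OSD).
inspect_osd() {
    id="$1"
    # Which OSDs does the cluster consider down, and on which hosts?
    ceph osd tree down
    # Assumed unit name; it is the usual one for package-based Ceph installs.
    systemctl status "ceph-osd@${id}" --no-pager
    journalctl -u "ceph-osd@${id}" --since yesterday --no-pager | tail -n 50
    # Most recently written Ceph log files first.
    ls -lt /var/log/ceph | head
    # Kernel messages hinting at disk or network trouble (patterns are illustrative).
    dmesg -T | grep -iE 'error|fail|timeout' | tail -n 50
}
```

For example, `ceph status | down_osds` would print 23 for the status shown above (32 OSDs, 9 up), and `inspect_osd 5` would collect the service state and recent logs for osd.5, the one reporting slow ops.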