3 Node Cluster - 1 of the nodes was down because of a CPU issue

nocstaff@urbancom.net
9 Posts
August 11, 2023, 2:33 pm
I have a 3-node cluster and lost one of my nodes because of a CPU failure. It took almost two weeks to get the parts and restore the node. I brought the node back online, and after almost a day I am still getting warnings. The restored node has 8 OSDs, but only 3 are showing as up.
Here is the current health:
ceph health
HEALTH_WARN 2 nearfull osd(s); Low space hindering backfill (add storage if this doesn't resolve itself): 15 pgs backfill_toofull; Degraded data redundancy: 1127223/9042237 objects degraded (12.466%), 193 pgs degraded, 193 pgs undersized; 513 pgs not deep-scrubbed in time; 513 pgs not scrubbed in time; 2 pool(s) nearfull
Should I leave it alone and let it resolve on its own, or is there something I should be doing? I am concerned because of the low space warning.
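For reference, the standard Ceph commands to identify the down OSDs and check per-OSD fullness (output will vary per cluster):
ceph health detail   # expands the warning, listing the nearfull OSDs and backfill_toofull PGs
ceph osd tree        # shows each OSD's up/down status and crush weight, grouped by host
ceph osd df          # per-OSD utilization, to see how close each OSD is to the nearfull ratio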

admin
2,974 Posts
August 11, 2023, 7:49 pm
I would lower the OSD crush weight on all the OSDs in the problem node, then try to start the 5 down OSDs. If they fail to start, look at their logs to try to find the problem. If they have been damaged by the initial failure, you will need to replace them.
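A rough sketch of those steps, assuming systemd-managed OSDs; osd.5 is a placeholder ID and 0.5 a placeholder weight, so substitute the real values for each down OSD from ceph osd tree:
ceph osd crush reweight osd.5 0.5    # lower the crush weight so less data is mapped to this OSD
systemctl start ceph-osd@5           # attempt to start the down OSD daemon
journalctl -u ceph-osd@5 -e          # inspect the service log if it fails to start
less /var/log/ceph/ceph-osd.5.log    # the OSD's own log, at the default log path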