3 Node Cluster - 1 of the nodes was down because of a CPU issue

I have a 3-node cluster and lost one of my nodes to a CPU failure. It took almost 2 weeks to get the parts and restore the node. I brought the node back online almost a day ago and I am still getting warnings. The restored node has 8 OSDs, but only 3 are showing as up.

Here is the current health:

ceph health
HEALTH_WARN 2 nearfull osd(s); Low space hindering backfill (add storage if this doesn't resolve itself): 15 pgs backfill_toofull; Degraded data redundancy: 1127223/9042237 objects degraded (12.466%), 193 pgs degraded, 193 pgs undersized; 513 pgs not deep-scrubbed in time; 513 pgs not scrubbed in time; 2 pool(s) nearfull

Should I leave it alone and let it resolve on its own, or is there something I should be doing? I am concerned because of the low space warning.

I would lower the CRUSH weight on all the OSDs in the problem node, then try to start the 5 down OSDs. If they fail to start, look at their logs to try to find the problem. If they were damaged by the initial failure, you will need to replace them.
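
The steps above could look roughly like the sketch below. The OSD ids and the reduced weight are placeholders, not values from this cluster, so check `ceph osd tree` for the real ones first. It also assumes the OSDs run as `ceph-osd@<id>` systemd units, which may not match every deployment. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the suggested steps. The OSD ids and the reduced weight below are
# placeholders (assumptions), not values taken from this cluster.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"
NODE_OSDS="${NODE_OSDS:-10 11 12 13 14 15 16 17}"  # hypothetical: all 8 OSDs on the restored node
DOWN_OSDS="${DOWN_OSDS:-13 14 15 16 17}"           # hypothetical: the 5 OSDs that are down
NEW_WEIGHT="${NEW_WEIGHT:-0.5}"                    # hypothetical reduced CRUSH weight

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show current CRUSH weights and up/down state before changing anything.
run ceph osd tree

# 1. Lower the CRUSH weight of every OSD on the problem node.
for id in $NODE_OSDS; do
    run ceph osd crush reweight "osd.$id" "$NEW_WEIGHT"
done

# 2. Try to start each down OSD; if the start command reports failure,
#    show the most recent log lines for that OSD.
for id in $DOWN_OSDS; do
    if ! run systemctl start "ceph-osd@$id"; then
        run journalctl -u "ceph-osd@$id" -n 50 --no-pager
    fi
done

# Re-check cluster health afterwards.
run ceph health detail
```

Review the printed commands, substitute the real OSD ids and weight, and only then rerun with `DRY_RUN=0`. A start command can succeed even if the OSD crashes shortly after, so it is still worth confirming with `ceph osd tree` that the OSDs stay up.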