PetaSAN 2-node cluster issues after 1 node failure

Ramadass K
10 Posts
January 7, 2025, 7:42 am
We have a 3-node PetaSAN cluster running version 3.2.1. One of the nodes crashed, and we have since rebalanced the data.
After rebalancing, we are seeing the following issues:
- 9 PGs are down.
- We cannot bring up RBD disks from the PetaSAN front end; the error shown is "http code 504; Gateway timeout".
- We are trying to add a 4th node, but it fails to join; the front end shows "Error joining cluster; not all ceph monitors are up".
- Only 1 manager is up.
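For context on the monitor errors: Ceph monitors can only make progress while a strict majority of the monitors in the monmap form a quorum. The minimal Python sketch below works through that arithmetic. The function names are illustrative only and are not part of Ceph or PetaSAN; the counts (3 monitors, 2 in quorum) are copied from the status output that follows.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of `monitors`."""
    return monitors // 2 + 1


def tolerable_failures(monitors: int) -> int:
    """Number of monitors that can be down while a quorum is still possible."""
    return monitors - quorum_size(monitors)


# Counts from the `ceph -s` output: 3 monitors in the monmap, 2 in quorum.
total_mons = 3
mons_in_quorum = 2

needed = quorum_size(total_mons)
print(f"quorum needs {needed} of {total_mons} monitors")
print(f"a {total_mons}-monitor cluster tolerates {tolerable_failures(total_mons)} monitor failure(s)")
print(f"remaining margin with {mons_in_quorum} up: {mons_in_quorum - needed}")
```

With NODE-3 out of quorum the margin is already zero: if NODE-1 or NODE-2 also went down, the cluster would be left without a monitor quorum.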
Here are the details:
root@NODE-1:~# ceph -s
  cluster:
    id:     01d2f513-5979-4fb8-9d7d-d46d4027275c
    health: HEALTH_ERR
            1/3 mons down, quorum NODE-1,NODE-2
            noout flag(s) set
            full ratio(s) out of order
            Reduced data availability: 9 pgs inactive, 9 pgs down
            Degraded data redundancy: 17 pgs undersized

  services:
    mon: 3 daemons, quorum NODE-1,NODE-2 (age 15h), out of quorum: NODE-3
    mgr: NODE-1(active, since 15h)
    osd: 30 osds: 25 up (since 15h), 25 in (since 25h); 16 remapped pgs
         flags noout

  data:
    pools:   3 pools, 1057 pgs
    objects: 1.43M objects, 5.4 TiB
    usage:   11 TiB used, 2.8 TiB / 14 TiB avail
    pgs:     0.851% pgs not active
             2/2862888 objects misplaced (0.000%)
             1015 active+clean
             17   active+undersized
             16   active+clean+remapped
             9    down
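As a consistency check on the figures above, the sketch below tallies the PG state counts from the `data:` section and recomputes the "pgs not active" percentage. That percentage comes entirely from the 9 down PGs. The sketch also encodes the ordering Ceph expects between the OSD fullness thresholds (nearfull < backfillfull < full), which Ceph reports as "full ratio(s) out of order" when it is violated. The ratio values in the example are Ceph's defaults (0.85, 0.90, 0.95) plus one hypothetical misordered set, not values read from this cluster. The real values appear as `nearfull_ratio`, `backfillfull_ratio` and `full_ratio` in `ceph osd dump`. The helper function names are illustrative only.

```python
# PG state counts copied from the `ceph -s` output above.
pg_states = {
    "active+clean": 1015,
    "active+undersized": 17,
    "active+clean+remapped": 16,
    "down": 9,
}
total_pgs = 1057  # from "pools: 3 pools, 1057 pgs"


def not_active_percent(states: dict[str, int], total: int) -> float:
    """Percentage of PGs whose state flags do not include 'active'."""
    inactive = sum(n for state, n in states.items() if "active" not in state.split("+"))
    return 100.0 * inactive / total


def full_ratios_in_order(nearfull: float, backfillfull: float, full: float) -> bool:
    """True when nearfull < backfillfull < full, the ordering Ceph expects."""
    return nearfull < backfillfull < full


# Every PG should be accounted for by exactly one state line.
assert sum(pg_states.values()) == total_pgs
print(f"{not_active_percent(pg_states, total_pgs):.3f}% pgs not active")  # prints "0.851% pgs not active"

# Ceph's default thresholds (assumed; this cluster's actual values are unknown).
print(full_ratios_in_order(0.85, 0.90, 0.95))  # True
# Hypothetical misconfiguration: nearfull raised above backfillfull.
print(full_ratios_in_order(0.92, 0.90, 0.95))  # False
```

The recomputed 0.851% matches the status line, so the 9 down PGs are the only inactive ones. I/O to objects stored in those PGs blocks until they become active again.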
Last edited on January 7, 2025, 9:37 am by Ramadass K · #1