
Pools going inactive

Hi,

I have 6 nodes and I am using erasure-coded (4+2p) pools for iSCSI and SMB.

For the metadata pools I am using replicated pools of size 3.

After shutting down 2 nodes, some of the pools go into an inactive state, so I can't access iSCSI and SMB.
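
For reference, the EC profile and the per-pool min_size can be checked like this (a minimal sketch; the profile and pool names are taken from the listing below):

ceph osd erasure-code-profile get ec-42-profile
ceph osd pool get File_Share_Data_SSD min_size
ceph osd pool get ISCSI_Data_HDD min_size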

Pool details:

  • ceph osd pool ls
    .mgr
    File_Share_Meta_SSD
    File_Share_Default_SSD
    File_Share_Data_SSD
    File_Share_Meta_HDD
    File_Share_Default_HDD
    File_Share_Data_HDD
    ISCSI_Data_SSD
    ISCSI_Meta_SSD
    ISCSI_Data_HDD
    ISCSI_Meta_HDD

* Pool Name: .mgr
size: 3
min_size: 2
pg_num: 1
pgp_num: 1
crush_rule: by-host-ssd
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
fast_read: 0
compression_mode: none
compression_algorithm: none
pg_autoscale_mode: off
pg_num_min: 1
eio: false
bulk: false
pg_num_max: 32

* Pool Name: File_Share_Meta_SSD
size: 3
min_size: 2
pg_num: 2048
pgp_num: 2048
crush_rule: by-host-ssd
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
fast_read: 0
recovery_priority: 5
compression_mode: none
compression_algorithm: none
pg_autoscale_mode: on
pg_num_min: 16
target_size_ratio: 1
pg_autoscale_bias: 4
eio: false
bulk: false

* Pool Name: File_Share_Default_SSD
size: 3
min_size: 2
pg_num: 512
pgp_num: 512
crush_rule: by-host-ssd
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
fast_read: 0
compression_mode: none
compression_algorithm: none
pg_autoscale_mode: on
target_size_ratio: 1
eio: false
bulk: false

* Pool Name: File_Share_Data_SSD
size: 6
min_size: 5
pg_num: 64
pgp_num: 64
crush_rule: ec-by-host-ssd
hashpspool: true
allow_ec_overwrites: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
erasure_code_profile: ec-42-profile
fast_read: 0
compression_mode: none
compression_algorithm: none
pg_autoscale_mode: on
target_size_ratio: 1
eio: false
bulk: false

* Pool Name: File_Share_Meta_HDD
size: 3
min_size: 2
pg_num: 4096
pgp_num: 4096
crush_rule: by-host-hdd
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
fast_read: 0
recovery_priority: 5
compression_mode: none
compression_algorithm: none
pg_autoscale_mode: on
pg_num_min: 16
target_size_ratio: 1
pg_autoscale_bias: 4
eio: false
bulk: false

* Pool Name: File_Share_Default_HDD
size: 3
min_size: 2
pg_num: 512
pgp_num: 512
crush_rule: by-host-hdd
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
fast_read: 0
compression_mode: none
compression_algorithm: none
pg_autoscale_mode: on
target_size_ratio: 1
eio: false
bulk: false

* Pool Name: File_Share_Data_HDD
size: 6
min_size: 5
pg_num: 256
pgp_num: 256
crush_rule: ec-by-host-hdd
hashpspool: true
allow_ec_overwrites: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
erasure_code_profile: ec-42-profile
fast_read: 0
compression_mode: none
compression_algorithm: none
pg_autoscale_mode: on
target_size_ratio: 1
eio: false
bulk: false

* Pool Name: ISCSI_Data_SSD
size: 6
min_size: 5
pg_num: 64
pgp_num: 64
crush_rule: ec-by-host-ssd
hashpspool: true
allow_ec_overwrites: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
erasure_code_profile: ec-42-profile
fast_read: 0
compression_mode: none
compression_algorithm: none
pg_autoscale_mode: on
target_size_ratio: 1
eio: false
bulk: false

* Pool Name: ISCSI_Meta_SSD
size: 3
min_size: 2
pg_num: 64
pgp_num: 64
crush_rule: by-host-ssd
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
fast_read: 0
compression_mode: none
compression_algorithm: none
pg_autoscale_mode: on
target_size_ratio: 1
eio: false
bulk: false

* Pool Name: ISCSI_Data_HDD
size: 6
min_size: 5
pg_num: 64
pgp_num: 64
crush_rule: ec-by-host-hdd
hashpspool: true
allow_ec_overwrites: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
erasure_code_profile: ec-42-profile
fast_read: 0
compression_mode: none
compression_algorithm: none
pg_autoscale_mode: on
target_size_ratio: 1
eio: false
bulk: false

* Pool Name: ISCSI_Meta_HDD
size: 3
min_size: 2
pg_num: 256
pgp_num: 256
crush_rule: by-host-hdd
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
fast_read: 0
compression_mode: none
compression_algorithm: none
pg_autoscale_mode: on
target_size_ratio: 1
eio: false
bulk: false

 

Which 2 nodes were deleted? Were any of the first 3 nodes (management) deleted, or nodes 4-6 (storage)? Did you delete the disks in the deleted nodes before deleting the nodes, or are they still available? What is the output of ceph status?
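
A quick way to pull that information together on one of the surviving nodes (a minimal sketch; ceph osd tree will show which hosts and OSDs are currently down):

ceph -s
ceph osd tree
ceph health detail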

 

I didn't shut down the first 3 nodes.

I shut down nodes 5 and 6.

I didn't remove the disks; I just shut down the 2 nodes to test the 4+2p erasure coding and the replication of 3.

So is it OK to shut down 2 nodes with 4+2p erasure coding?

root@storage-01:~# ceph -s
cluster:
id: 09648ac5-073f-48c0-bbcd-45a88dad2e9a
health: HEALTH_WARN
1 MDSs report slow metadata IOs
1/5 mons down, quorum storage-03,storage-01,storage-02,storage-04
48 osds down
2 hosts (48 osds) down
Reduced data availability: 1936 pgs inactive
Degraded data redundancy: 8040/24084 objects degraded (33.383%), 488 pgs degraded, 6433 pgs undersized

services:
mon: 5 daemons, quorum storage-03,storage-01,storage-02,storage-04 (age 4m), out of quorum: storage-05
mgr: storage-03(active, since 79m), standbys: storage-01, storage-02
mds: 2/2 daemons up, 1 standby
osd: 144 osds: 96 up (since 3m), 144 in (since 64m)

data:
volumes: 2/2 healthy
pools: 11 pools, 7937 pgs
objects: 4.07k objects, 12 GiB
usage: 29 GiB used, 1.8 PiB / 1.8 PiB avail
pgs: 24.392% pgs not active
8040/24084 objects degraded (33.383%)
4440 active+undersized
1505 undersized+peered
1504 active+clean
431 undersized+degraded+peered
57 active+undersized+degraded

root@storage-01:~# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 1/5 mons down, quorum storage-03,storage-01,storage-02,storage-04; 48 osds down; 2 hosts (48 osds) down; Reduced data availability: 1936 pgs inactive; Degraded data redundancy: 8040/24084 objects degraded (33.383%), 488 pgs degraded, 6433 pgs undersized
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
mds.storage-02(mds.0): 2 slow metadata IOs are blocked > 30 secs, oldest blocked for 255 secs
[WRN] MON_DOWN: 1/5 mons down, quorum storage-03,storage-01,storage-02,storage-04
mon.storage-05 (rank 4) addr [v2:10.10.128.95:3300/0,v1:10.10.128.95:6789/0] is down (out of quorum)
[WRN] OSD_DOWN: 48 osds down
osd.72 (root=default,host=storage-06) is down
osd.73 (root=default,host=storage-06) is down
osd.74 (root=default,host=storage-06) is down
osd.75 (root=default,host=storage-06) is down
osd.76 (root=default,host=storage-06) is down
osd.77 (root=default,host=storage-06) is down
osd.78 (root=default,host=storage-06) is down
osd.79 (root=default,host=storage-06) is down
osd.80 (root=default,host=storage-06) is down
osd.96 (root=default,host=storage-05) is down
osd.97 (root=default,host=storage-05) is down
osd.98 (root=default,host=storage-05) is down
osd.108 (root=default,host=storage-05) is down
osd.109 (root=default,host=storage-05) is down
osd.110 (root=default,host=storage-05) is down
osd.111 (root=default,host=storage-05) is down
osd.112 (root=default,host=storage-05) is down
osd.113 (root=default,host=storage-05) is down
osd.114 (root=default,host=storage-05) is down
osd.115 (root=default,host=storage-05) is down
osd.116 (root=default,host=storage-05) is down
osd.117 (root=default,host=storage-05) is down
osd.118 (root=default,host=storage-05) is down
osd.119 (root=default,host=storage-05) is down
osd.120 (root=default,host=storage-05) is down
osd.121 (root=default,host=storage-05) is down
osd.122 (root=default,host=storage-05) is down
osd.123 (root=default,host=storage-05) is down
osd.124 (root=default,host=storage-05) is down
osd.125 (root=default,host=storage-05) is down
osd.126 (root=default,host=storage-05) is down
osd.127 (root=default,host=storage-05) is down
osd.128 (root=default,host=storage-05) is down
osd.129 (root=default,host=storage-06) is down
osd.130 (root=default,host=storage-06) is down
osd.131 (root=default,host=storage-06) is down
osd.132 (root=default,host=storage-06) is down
osd.133 (root=default,host=storage-06) is down
osd.134 (root=default,host=storage-06) is down
osd.135 (root=default,host=storage-06) is down
osd.136 (root=default,host=storage-06) is down
osd.137 (root=default,host=storage-06) is down
osd.138 (root=default,host=storage-06) is down
osd.139 (root=default,host=storage-06) is down
osd.140 (root=default,host=storage-06) is down
osd.141 (root=default,host=storage-06) is down
osd.142 (root=default,host=storage-06) is down
osd.143 (root=default,host=storage-06) is down
[WRN] OSD_HOST_DOWN: 2 hosts (48 osds) down
host storage-06 (root=default) (24 osds) is down
host storage-05 (root=default) (24 osds) is down
[WRN] PG_AVAILABILITY: Reduced data availability: 1936 pgs inactive
pg 46.ecf is stuck inactive for 4m, current state undersized+degraded+peered, last acting [67]
pg 46.ed2 is stuck inactive for 4m, current state undersized+peered, last acting [68]
pg 46.ed5 is stuck inactive for 4m, current state undersized+peered, last acting [29]
pg 46.ed7 is stuck inactive for 4m, current state undersized+peered, last acting [46]
pg 46.edb is stuck inactive for 4m, current state undersized+peered, last acting [66]
pg 46.ee2 is stuck inactive for 4m, current state undersized+peered, last acting [50]
pg 46.ee6 is stuck inactive for 4m, current state undersized+peered, last acting [46]
pg 46.eea is stuck inactive for 4m, current state undersized+peered, last acting [83]
pg 46.efc is stuck inactive for 4m, current state undersized+peered, last acting [52]
pg 46.f00 is stuck inactive for 4m, current state undersized+peered, last acting [63]
pg 46.f0c is stuck inactive for 4m, current state undersized+peered, last acting [50]
pg 46.f3b is stuck inactive for 4m, current state undersized+peered, last acting [68]
pg 46.f41 is stuck inactive for 4m, current state undersized+peered, last acting [39]
pg 46.f43 is stuck inactive for 4m, current state undersized+peered, last acting [50]
pg 46.f44 is stuck inactive for 4m, current state undersized+peered, last acting [34]
pg 46.f49 is stuck inactive for 4m, current state undersized+peered, last acting [85]
pg 46.f4a is stuck inactive for 76m, current state undersized+peered, last acting [54]
pg 46.f4b is stuck inactive for 4m, current state undersized+peered, last acting [68]
pg 46.f54 is stuck inactive for 4m, current state undersized+peered, last acting [71]
pg 46.f61 is stuck inactive for 4m, current state undersized+peered, last acting [44]
pg 46.f69 is stuck inactive for 4m, current state undersized+peered, last acting [57]
pg 46.f72 is stuck inactive for 79m, current state undersized+peered, last acting [43]
pg 46.f75 is stuck inactive for 4m, current state undersized+peered, last acting [53]
pg 46.f78 is stuck inactive for 4m, current state undersized+peered, last acting [53]
pg 46.f79 is stuck inactive for 4m, current state undersized+peered, last acting [31]
pg 46.f7f is stuck inactive for 4m, current state undersized+peered, last acting [61]
pg 46.f80 is stuck inactive for 79m, current state undersized+peered, last acting [30]
pg 46.f8d is stuck inactive for 4m, current state undersized+peered, last acting [42]
pg 46.f8e is stuck inactive for 4m, current state undersized+peered, last acting [31]
pg 46.f90 is stuck inactive for 79m, current state undersized+peered, last acting [46]
pg 46.f91 is stuck inactive for 4m, current state undersized+peered, last acting [30]
pg 46.f94 is stuck inactive for 4m, current state undersized+peered, last acting [68]
pg 46.f97 is stuck inactive for 79m, current state undersized+peered, last acting [67]
pg 46.fb1 is stuck inactive for 4m, current state undersized+peered, last acting [46]
pg 46.fb6 is stuck inactive for 4m, current state undersized+peered, last acting [38]
pg 46.fb7 is stuck inactive for 4m, current state undersized+peered, last acting [49]
pg 46.fb9 is stuck inactive for 4m, current state undersized+peered, last acting [38]
pg 46.fc5 is stuck inactive for 4m, current state undersized+peered, last acting [56]
pg 46.fc9 is stuck inactive for 4m, current state undersized+peered, last acting [35]
pg 46.fca is stuck inactive for 4m, current state undersized+peered, last acting [60]
pg 46.fcb is stuck inactive for 4m, current state undersized+peered, last acting [53]
pg 46.fcc is stuck inactive for 4m, current state undersized+peered, last acting [55]
pg 46.fce is stuck inactive for 4m, current state undersized+peered, last acting [55]
pg 46.fcf is stuck inactive for 4m, current state undersized+peered, last acting [34]
pg 46.fd7 is stuck inactive for 4m, current state undersized+peered, last acting [49]
pg 46.fe5 is stuck inactive for 4m, current state undersized+peered, last acting [88]
pg 46.fe8 is stuck inactive for 4m, current state undersized+peered, last acting [44]
pg 46.fef is stuck inactive for 4m, current state undersized+peered, last acting [53]
pg 46.ff5 is stuck inactive for 4m, current state undersized+peered, last acting [54]
pg 46.ff6 is stuck inactive for 4m, current state undersized+peered, last acting [28]
pg 46.ffe is stuck inactive for 70m, current state undersized+peered, last acting [55]
[WRN] PG_DEGRADED: Degraded data redundancy: 8040/24084 objects degraded (33.383%), 488 pgs degraded, 6433 pgs undersized
pg 46.fc1 is stuck undersized for 4m, current state active+undersized, last acting [43,57]
pg 46.fc2 is stuck undersized for 4m, current state active+undersized, last acting [51,61]
pg 46.fc3 is stuck undersized for 4m, current state active+undersized, last acting [37,95]
pg 46.fc4 is stuck undersized for 4m, current state active+undersized, last acting [86,33]
pg 46.fc5 is stuck undersized for 4m, current state undersized+peered, last acting [56]
pg 46.fc6 is stuck undersized for 4m, current state active+undersized, last acting [43,56]
pg 46.fc7 is stuck undersized for 4m, current state active+undersized, last acting [68,31]
pg 46.fc9 is stuck undersized for 4m, current state undersized+peered, last acting [35]
pg 46.fca is stuck undersized for 4m, current state undersized+peered, last acting [60]
pg 46.fcb is stuck undersized for 4m, current state undersized+peered, last acting [53]
pg 46.fcc is stuck undersized for 4m, current state undersized+peered, last acting [55]
pg 46.fcd is stuck undersized for 4m, current state active+undersized, last acting [91,70]
pg 46.fce is stuck undersized for 4m, current state undersized+peered, last acting [55]
pg 46.fcf is stuck undersized for 4m, current state undersized+peered, last acting [34]
pg 46.fd1 is stuck undersized for 4m, current state active+undersized, last acting [64,34]
pg 46.fd2 is stuck undersized for 4m, current state active+undersized, last acting [71,95]
pg 46.fd3 is stuck undersized for 4m, current state active+undersized, last acting [67,94]
pg 46.fd4 is stuck undersized for 4m, current state active+undersized, last acting [88,39]
pg 46.fd5 is stuck undersized for 4m, current state active+undersized, last acting [89,65]
pg 46.fd7 is stuck undersized for 4m, current state undersized+peered, last acting [49]
pg 46.fd8 is stuck undersized for 4m, current state active+undersized, last acting [59,71]
pg 46.fd9 is stuck undersized for 4m, current state active+undersized, last acting [89,64]
pg 46.fda is stuck undersized for 4m, current state active+undersized, last acting [87,43]
pg 46.fdb is stuck undersized for 4m, current state active+undersized, last acting [63,54]
pg 46.fde is stuck undersized for 4m, current state active+undersized+degraded, last acting [27,47]
pg 46.fdf is stuck undersized for 4m, current state active+undersized, last acting [61,48]
pg 46.fe0 is stuck undersized for 4m, current state active+undersized, last acting [29,64]
pg 46.fe2 is stuck undersized for 4m, current state active+undersized, last acting [71,89]
pg 46.fe3 is stuck undersized for 4m, current state active+undersized, last acting [52,35]
pg 46.fe5 is stuck undersized for 4m, current state undersized+peered, last acting [88]
pg 46.fe6 is stuck undersized for 4m, current state active+undersized, last acting [36,70]
pg 46.fe7 is stuck undersized for 4m, current state active+undersized, last acting [31,88]
pg 46.fe8 is stuck undersized for 4m, current state undersized+peered, last acting [44]
pg 46.fe9 is stuck undersized for 4m, current state active+undersized, last acting [28,63]
pg 46.fea is stuck undersized for 4m, current state active+undersized, last acting [44,34]
pg 46.fec is stuck undersized for 4m, current state active+undersized, last acting [66,31]
pg 46.fed is stuck undersized for 4m, current state active+undersized, last acting [91,62]
pg 46.fee is stuck undersized for 4m, current state active+undersized, last acting [46,40]
pg 46.fef is stuck undersized for 4m, current state undersized+peered, last acting [53]
pg 46.ff0 is stuck undersized for 4m, current state active+undersized, last acting [51,35]
pg 46.ff1 is stuck undersized for 4m, current state active+undersized, last acting [34,43]
pg 46.ff2 is stuck undersized for 4m, current state active+undersized, last acting [69,51]
pg 46.ff3 is stuck undersized for 4m, current state active+undersized, last acting [92,64]
pg 46.ff5 is stuck undersized for 4m, current state undersized+peered, last acting [54]
pg 46.ff6 is stuck undersized for 4m, current state undersized+peered, last acting [28]
pg 46.ff8 is stuck undersized for 4m, current state active+undersized, last acting [39,59]
pg 46.ff9 is stuck undersized for 4m, current state active+undersized, last acting [28,54]
pg 46.ffa is stuck undersized for 4m, current state active+undersized, last acting [27,41]
pg 46.ffb is stuck undersized for 4m, current state active+undersized, last acting [87,37]
pg 46.ffc is stuck undersized for 4m, current state active+undersized, last acting [59,50]
pg 46.ffe is stuck undersized for 4m, current state undersized+peered, last acting [55]

 

 

Lower the min_size of the EC pools from 5 to 4.
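
With a 4+2 profile, Ceph defaults min_size to k+1 = 5, so once two of the six hosts are down only 4 chunks remain per PG and the EC pools stop serving I/O even though no data is lost. A minimal sketch of the change, using the EC data pools from the listing above (repeat for any other EC pool as needed):

ceph osd pool set File_Share_Data_SSD min_size 4
ceph osd pool set File_Share_Data_HDD min_size 4
ceph osd pool set ISCSI_Data_SSD min_size 4
ceph osd pool set ISCSI_Data_HDD min_size 4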

What about the replicated pools that are also going into an inactive state?

Issue resolved by editing min_size. Thank you!

If this is a production system,  i would recommend you leave the min_size as original even if pool is inactive until until the recovery process rebuilds the lost replicas from the 2 nodes down. If this was a test, i would start up 1 of the 2 down nodes. If you do go for reducing min_size, this will solve the pool inactive but you have no redundancy and may cause issues like data inconsistency if OSD go up and down for any reason, since you are allowing data changes to occur without any redundancy.