
PetaSAN Ceph HEALTH_ERR – PG 16.90a active+clean+inconsistent (Possible Data Damage)

Hi Team,

We are observing a Ceph / PetaSAN health error on the production cluster and need guidance to safely fix it.

cluster:
id: 928aada2-f7e9-4bd5-9e61-7711b881f0f5
health: HEALTH_ERR
1 scrub errors
Possible data damage: 1 pg inconsistent

services:
mon: 3 daemons, quorum mys-dc-gc1-mon03,mys-dc-gc1-mon01,mys-dc-gc1-mon02 (age 16m)
mgr: mys-dc-gc1-mon03(active, since 19M), standbys: mys-dc-gc1-mon02, mys-dc-gc1-mon01
mds: 1/1 daemons up, 2 standby
osd: 117 osds: 117 up (since 2w), 117 in (since 2w)

data:
volumes: 1/1 healthy
pools: 7 pools, 4769 pgs
objects: 39.28M objects, 134 TiB
usage: 400 TiB used, 313 TiB / 713 TiB avail
pgs: 4763 active+clean
4 active+clean+scrubbing
1 active+clean+scrubbing+deep
1 active+clean+inconsistent

io:
client: 4.0 MiB/s rd, 60 MiB/s wr, 120 op/s rd, 3.52k op/s wr

root@osd06:/var/log/ceph# ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
[ERR] OSD_SCRUB_ERRORS: 1 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
pg 16.90a is active+clean+inconsistent, acting [70,26,35]
root@mys-dc-gc1-osd06:/var/log/ceph#

Find the error details and identify which OSD and object are involved:
rados list-inconsistent-obj 16.90a --format=json-pretty
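The JSON output lists, for each inconsistent object, which shard (i.e. which OSD in the acting set) reported errors. A minimal sketch of how to summarize that output — the sample JSON below is illustrative only, not captured from this cluster; field names follow the documented `list-inconsistent-obj` schema:

```python
import json

# Illustrative sample of `rados list-inconsistent-obj <pgid> --format=json-pretty`
# output. NOT real data from the cluster above; object name and errors are made up.
sample = '''
{
  "epoch": 12345,
  "inconsistents": [
    {
      "object": {"name": "rbd_data.abc123", "nspace": "", "snap": "head", "version": 42},
      "errors": [],
      "union_shard_errors": ["read_error"],
      "shards": [
        {"osd": 70, "primary": true,  "errors": []},
        {"osd": 26, "primary": false, "errors": ["read_error"]},
        {"osd": 35, "primary": false, "errors": []}
      ]
    }
  ]
}
'''

report = json.loads(sample)
for obj in report["inconsistents"]:
    name = obj["object"]["name"]
    # Keep only the shards that reported errors -- these point at the bad OSD.
    bad = [(s["osd"], s["errors"]) for s in obj["shards"] if s["errors"]]
    for osd, errs in bad:
        print(f"object {name}: osd.{osd} reports {','.join(errs)}")
```

In this made-up sample the replica on osd.26 is the one with a read error; on a real cluster the shard errors (read_error, data_digest_mismatch, etc.) tell you whether the problem is a bad disk sector on one OSD or a deeper inconsistency.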

Attempt to repair the error, then re-run a deep scrub:
ceph pg repair 16.90a
ceph pg deep-scrub 16.90a

Print the scrub finish time:
ceph pg dump | grep 16.90a | tr -s ' ' | cut -d ' ' -f23
or, depending on the Ceph version:
ceph pg dump | grep 16.90a | tr -s ' ' | cut -d ' ' -f24
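The hard-coded column number in `cut` shifts between Ceph releases because `ceph pg dump` adds and reorders columns. Locating the column by its header name is more robust; a sketch of the idea, where the header and row strings are abbreviated, made-up examples of `ceph pg dump` output (real output has many more columns, and its timestamps may contain a space, which would need a smarter split):

```python
# Find a column by its header name instead of a hard-coded position.
# Header/row below are abbreviated illustrations, not real `ceph pg dump` output.
header = "PG_STAT STATE UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP"
row    = "16.90a active+clean+inconsistent [70,26,35] [70,26,35] 2024-05-01T10:00:00 2024-04-30T02:00:00"

cols = header.split()
idx = cols.index("DEEP_SCRUB_STAMP")   # position survives column reordering
print(row.split()[idx])
```

Alternatively, `ceph pg 16.90a query` emits JSON, so the scrub timestamps can be read from its stats fields without any column guessing at all.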

If the error persists after the second scrub, replace the affected OSD.

Note: Please consider buying support.