
Can't get MDS services started

MDS services appear to be down and not starting.

See below; it looks like it is having an issue with the ${CLUSTER} variable (a couple of quick checks follow the output)...


root@ceph-public1:~# systemctl status ceph-mds@ceph-public1
ceph-mds@ceph-public1.service - Ceph metadata server daemon
Loaded: loaded (/lib/systemd/system/ceph-mds@.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2024-03-29 20:08:41 EDT; 3h 31min ago
Process: 1235714 ExecStart=/usr/bin/ceph-mds -f --cluster ${CLUSTER} --id ceph-public1 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Main PID: 1235714 (code=exited, status=1/FAILURE)

Mar 29 20:08:41 ceph-public1 systemd[1]: ceph-mds@ceph-public1.service: Service hold-off time over, scheduling restart.
Mar 29 20:08:41 ceph-public1 systemd[1]: ceph-mds@ceph-public1.service: Scheduled restart job, restart counter is at 3.
Mar 29 20:08:41 ceph-public1 systemd[1]: Stopped Ceph metadata server daemon.
Mar 29 20:08:41 ceph-public1 systemd[1]: ceph-mds@ceph-public1.service: Start request repeated too quickly.
Mar 29 20:08:41 ceph-public1 systemd[1]: ceph-mds@ceph-public1.service: Failed with result 'exit-code'.
Mar 29 20:08:41 ceph-public1 systemd[1]: Failed to start Ceph metadata server daemon.
root@ceph-public1:~#
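
For what it's worth, the unexpanded ${CLUSTER} in the ExecStart line is usually cosmetic: in the stock ceph-mds@.service unit it is normally filled in from an Environment=CLUSTER=ceph setting (or /etc/default/ceph), and systemd simply prints the raw string in status output. The daemon's real error should be in the journal or the MDS log. A quick sketch of checks (unit name and host taken from the output above; the log path assumes the default cluster name "ceph"):

# show the effective unit file, including any Environment= lines
systemctl cat ceph-mds@ceph-public1

# show the daemon's own error output from the failed start attempts
journalctl -u ceph-mds@ceph-public1 -n 50 --no-pager

# check the MDS log directly
tail -n 50 /var/log/ceph/ceph-mds.ceph-public1.log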

Magid, I forgot to mention that the above is on a 2.8.1 cluster; for internal reasons we are hesitant to migrate to 3.2.1 (the latest), as we are working on other issues underlying this cluster with regard to iSCSI ...

As an update: I followed your instructions from the thread quoted below to recreate the MDS services.

Quote

1) Can you run:
ceph versions

2) Was this a fresh 2.6 install, or was this an older cluster that was upgraded?

3) Did you try to manually add other mds servers yourself?

4) Try to re-create the mds servers. On the management node:

# edit installed flag
nano /opt/petasan/config/flags/flags.json
change the line
"ceph_mds_installed": true
to
"ceph_mds_installed": false

# delete key
ceph auth del mds.HOSTNAME

# recreate mds
/opt/petasan/scripts/create_mds.py

# check if up

systemctl status ceph-mds@HOSTNAME

If it works, do the same on the other two management nodes.
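
In case it helps anyone scripting this, here is a non-interactive sketch of step 4 (my own variant, not part of the original instructions: sed stands in for the manual nano edit, and it assumes the flags.json line looks exactly as quoted above and that each MDS id matches the node's hostname, as in mds.ceph-public1):

# flip the installed flag from true to false
sed -i 's/"ceph_mds_installed": true/"ceph_mds_installed": false/' /opt/petasan/config/flags/flags.json

# delete the old key for this node
ceph auth del mds.$(hostname)

# recreate the mds
/opt/petasan/scripts/create_mds.py

# check if up
systemctl status ceph-mds@$(hostname)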