Clarification on PetaSAN_Online_Upgrade_Guide

neiltorda
99 Posts
July 27, 2023, 6:32 pm
Client I/O appears to be working. I logged into a system that has an iSCSI multipath disk from this system mounted. I could go into the mounted folder and read and write data to it. It is reporting the correct sizes, etc.
I ran the commands on node psan4 (the one that has not yet been updated)
root@psan4:~# ceph versions
{
"mon": {
"ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 1,
"ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 2
},
"mgr": {
"ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 1,
"ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 2
},
"osd": {
"ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 36,
"ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 114
},
"mds": {
"ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 1,
"ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 2
},
"overall": {
"ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)": 39,
"ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 120
}
}
root@psan4:~# ceph osd dump | grep release
require_osd_release octopus
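For reference, a minimal pre-flight sketch from any node that holds the admin keyring, using only the two commands already shown above; the octopus count should drop to zero once the last node is upgraded:
ceph versions | grep -c octopus            # should report 0 after all four nodes are on quincy
ceph osd dump | grep require_osd_release   # stays "octopus" until the flag is raised manually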

admin
2,974 Posts
July 27, 2023, 6:35 pm
Looks good, I would go ahead with the psan4 upgrade.

neiltorda
99 Posts
July 27, 2023, 6:44 pm
I am running the update script now on psan4.
Once complete, do I run this:
ceph osd require-osd-release quincy
from just one node, or on all four nodes?
Thanks,
Neil
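For what it is worth, require-osd-release is a cluster-wide osdmap flag, so it only needs to be issued once from any node that can reach the monitors; a short sketch:
ceph osd require-osd-release quincy        # sets the flag for the whole cluster, no need to repeat per node
ceph osd dump | grep require_osd_release   # should now print quincy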

neiltorda
99 Posts
July 27, 2023, 7:13 pm
After running the updates and running ceph osd require-osd-release quincy on one of the nodes, everything comes up except for the graphs at the bottom of the web UI.
There is a red triangle in the top right corner that, when moused over, says "Internal server error".

admin
2,974 Posts
July 28, 2023, 9:01 am
If you reboot after the upgrade, the graphs may work by themselves.
You can also try the following. Get the stats server IP from:
/opt/petasan/scripts/util/get_cluster_leader.py
On the node which is the stats server, try:
/opt/petasan/scripts/stats-stop.sh
/opt/petasan/scripts/stats-setup.sh
/opt/petasan/scripts/stats-start.sh
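One hedged way to confirm the stats service is writing metrics again after the restart; the whisper path below is the shared graphite directory that appears later in this thread, so treat the exact layout as an assumption:
find /opt/petasan/config/shared/graphite/whisper -name '*.wsp' -mmin -5 | head   # recently touched metric files
# if nothing prints, metrics are not being collected, which would leave the dashboard graphs empty or erroring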

neiltorda
99 Posts
July 28, 2023, 2:54 pm
I had rebooted all the nodes yesterday evening, so I first tried the commands you provided here. (The cluster leader is being shown as psan1, 172.16.32.X, where X is the correct IP for psan1.)
I had the same result. I then tried rebooting all the nodes again, to see if that brought it back. Still same red triangle with internal server error.
So I ran the above commands a 2nd time on the appropriate node (again, still psan1), with the same results.
The top part of the web UI is reporting properly, and other pages in the UI (the iSCSI disk list and path assignment, for example) are also working. It is just the graph at the bottom of the dashboard that is still showing the error.
Any other ideas?
Thanks so much!
Neil

admin
2,974 Posts
July 30, 2023, 9:14 am
Try to switch the current stats server.
Get the current stats server IP from:
/opt/petasan/scripts/util/get_cluster_leader.py
On the node which is the stats server, stop the stats:
systemctl stop petasan-cluster-leader
systemctl stop petasan-notification
/opt/petasan/scripts/stats-stop.sh
Refresh the dashboard; it should show "bad gateway".
consul kv delete PetaSAN/Services/ClusterLeader
Wait approximately 1 minute, then refresh the dashboard.
Check that the stats server is now a new node:
/opt/petasan/scripts/util/get_cluster_leader.py
If you still have an error even with the new stats server, I would suspect the stats data. On the new server:
/opt/petasan/scripts/stats-stop.sh
mv /opt/petasan/config/shared/graphite /opt/petasan/config/shared/graphite_backup
/opt/petasan/scripts/stats-setup.sh
/opt/petasan/scripts/stats-start.sh
If this works and you really need the old stats data, you can move the old stats files back in groups (there is one file for each metric) and perhaps find the corrupt metric file causing this.
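A hedged check that the leader really moved after the consul key is deleted (the key name comes from the steps above; whether its value is directly readable with consul kv get is an assumption):
consul kv get PetaSAN/Services/ClusterLeader      # assumption: shows the new holder once re-elected
/opt/petasan/scripts/util/get_cluster_leader.py   # should now print a different node than before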

neiltorda
99 Posts
July 31, 2023, 2:16 pm
After moving to the new node (psan2), I am still showing the same error.
So I started going through the next set of steps, but the stats-setup.sh script is throwing the error shown below:
root@psan2:~# /opt/petasan/scripts/stats-stop.sh
root@psan2:~# mv /opt/petasan/config/shared/graphite/ /opt/petasan/config/shared/graphite_backup
root@psan2:~# /opt/petasan/scripts/stats-setup.sh
mv: cannot stat '/opt/petasan/config/shared/graphite/whisper/PetaSAN/ClusterStats/ceph-ceph/cluster': No such file or directory

admin
2,974 Posts
July 31, 2023, 6:28 pm
You can ignore it; proceed with the next step:
/opt/petasan/scripts/stats-start.sh
and see if the graphs start to show.
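The mv message apparently comes from a step inside stats-setup.sh that migrates an old metrics subtree; after the graphite directory was renamed, that subtree no longer exists, so the complaint is harmless. A quick way to see this (the path is taken verbatim from the error above):
ls /opt/petasan/config/shared/graphite/whisper/PetaSAN/ClusterStats/ceph-ceph/cluster 2>/dev/null \
  || echo "path absent, which is why the mv inside stats-setup.sh complains and can be ignored"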

neiltorda
99 Posts
August 1, 2023, 5:35 pm
Ran the steps again to be safe, same issue.
Here is the terminal output:
root@psan2:~# /opt/petasan/scripts/util/get_cluster_leader.py
{'psan2': '172.16.32.10'}
root@psan2:~# /opt/petasan/scripts/stats-stop.sh
root@psan2:~# mv /opt/petasan/config/shared/graphite /opt/petasan/config/shared/graphite_backup2
root@psan2:~# /opt/petasan/scripts/stats-setup.sh
mv: cannot stat '/opt/petasan/config/shared/graphite/whisper/PetaSAN/ClusterStats/ceph-ceph/cluster': No such file or directory
root@psan2:~# /opt/petasan/scripts/stats-start.sh
volume set: success
root@psan2:~#
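If the error persists with a freshly rebuilt graphite tree, the next place to look is the logs on the stats node around a dashboard refresh; a hedged sketch (journalctl is generic, while the PetaSAN application log path is an assumption):
journalctl -e --since "10 minutes ago" | grep -iE 'error|traceback'   # recent errors on the stats node
tail -n 100 /opt/petasan/log/PetaSAN.log                               # assumed PetaSAN log location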