PetaSAN v3.3.0 Kernel command line cstate settings

Hello there

PetaSAN is a really great solution! Thank you very much for all the efforts put into it!

After playing with PetaSAN in a virtual environment, we moved to 3 physical servers. It works very well. The "problem" is that these are re-purposed servers that have 2 rather powerful CPUs built in. Most of the time they are 100% idle, since we do not have that many transactions and no EC pools. Most of the work is done in hardware and its associated offloading mechanisms.

But the environment and our electricity bill are not that happy about the CPUs running at full speed. We noticed that the CPUs run at 100% clock speed all the time. We verified the BIOS settings, tried to lower the frequency of some cores, and so on, to no avail. It turned out that PetaSAN sets some kernel parameters that prevent any power-saving mechanisms from kicking in:

BOOT_IMAGE=/boot/vmlinuz-5.14.21-08-petasan root=UUID=1a10e09e-e3cb-4886-a461-47aad6f28f54 ro quiet net.ifnames=1 intel_idle.max_cstate=1 processor.max_cstate=1 pti=off spectre_v2=off l1tf=off nospec_store_bypass_disable no_stf_barrier

where the relevant settings are:

intel_idle.max_cstate=1
processor.max_cstate=1
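
To check which of these limits is active on a given node, the kernel command line can be parsed. This is only a minimal Python sketch: the helper name `cstate_limits` is made up for illustration and is not part of PetaSAN. On a running node the input comes from /proc/cmdline.

```python
def cstate_limits(cmdline: str) -> dict:
    """Return the *.max_cstate limits found on a kernel command line."""
    limits = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Only keep parameters such as intel_idle.max_cstate=1
        if sep and key.endswith(".max_cstate"):
            limits[key] = int(value)
    return limits


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel.
    with open("/proc/cmdline") as f:
        print(cstate_limits(f.read()))
```

An empty result means no max_cstate limit was passed at boot.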

Although I can understand the idea behind this choice (no latency due to frequency / efficiency scaling), it is not optimal in terms of energy, environment and cost, especially if the CPU resources are heavily overprovisioned.

So my question is: why did the creators of PetaSAN choose these settings? What happens if higher cstates are allowed?

For us, if stability is not endangered, we might either run at higher cstates, or configure a few cores to run at 100% frequency all the time and let the others go into power save.

If the cstates cannot be altered, the only option would be to remove one of the physical CPUs, but this then gets complex regarding which of the PCI slots remain usable and which do not.

Thank you

 

 

 

Thank you for your nice feedback, it is a lot of work under the hood to make things work 🙂

This CPU setting does make a significant difference in latency, almost 40% better, so the performance of a single thread is much improved. It is one of the top OS/kernel tuning parameters for Ceph. There are many online references to this in the Ceph community.

Of course you can change this if you really want:

In PetaSAN 4.0 you can do so via the UI under Performance Profiles. You can create your own profiles and apply them centrally at any time. It could be one reason to upgrade.

If you do not upgrade, you can do it manually. On ALL nodes, edit:

/opt/petasan/config/tuning/current/post_deploy_script

and remove the cpu settings, then reboot ALL nodes.

... and constantly changing, security updates and so on. That is, it is ongoing work and well maintained, not only kept alive, as version 4 just arrived. Looking at the upgrade guide, the path from 3.3.0 to 4.0.0 appears to be straightforward.

Note: it is a bit strange to read section 7 "Upgrading from PetaSAN 4.X and above" as 4.0.0 just hit the public ...

I indeed did not research the cstate settings in relation to Ceph, as PetaSAN might have had its own reasons for this, being more than only Ceph. But yes, there are many references to this.

Does PetaSAN v4 implement these profiles via Tuned?

If you already planned for this in v4, I trust this is the best / long-term way to address this. So we will do the upgrade and see how much energy can be conserved and how the latency is affected (if this is easy to measure). If this does not work, we will remove one of the physical CPUs.
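
One way to see whether the deeper cstates are actually entered after such a change is to read the per-state residency counters that the Linux cpuidle subsystem exposes in sysfs. This is a minimal Python sketch, not a PetaSAN tool: the function name `idle_residency` is hypothetical, and it assumes the standard layout /sys/devices/system/cpu/cpuN/cpuidle/stateM/ with its `name` and `time` files (time is cumulative, in microseconds).

```python
from pathlib import Path


def idle_residency(cpu_dir: str = "/sys/devices/system/cpu/cpu0") -> dict:
    """Map each idle state name of one CPU to its cumulative residency in microseconds."""
    result = {}
    for state in sorted(Path(cpu_dir, "cpuidle").glob("state*")):
        name = (state / "name").read_text().strip()
        result[name] = int((state / "time").read_text())
    return result


if __name__ == "__main__":
    for name, usec in idle_residency().items():
        print(f"{name}: {usec} us")
```

Taking two readings some time apart and comparing them shows where the CPU spent its idle time in between.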

BTW: my plan to run only a few cores at 100% and the others at less would be too complex, as processes like the OSDs would need to be pinned to these CPUs, plus other services I'm not aware of. If a service is restarted, the pinning would need to happen again. So I just dumped that idea ..

Thank you

 

We did the update to 4.0.0, which was a smooth transition. We then tried to apply the performance profile.

We are not sure what needs to be done to apply the changed settings, as "Apply" does not update / rebuild grub in this case.

BUT: after the upgrade from 3.3.0 to 4.0.0, we did not restart the nodes. To see if the new settings of the performance profile were applied, we rebooted one of the nodes. It ends up at the GRUB> prompt and no longer starts the OS.

By using set prefix= and issuing the "normal" command, we were able to launch the grub menu, which in turn allows a normal boot of the node ...

What is the "official" procedure to fix this condition?

 

The Performance Profiles run the different scripts defined in the profile. Nearly all of them do not require a reboot (at least in the PetaSAN default profile). The CPU speed / C-state settings do not require reboots either, and can be applied/changed while running (you would need to set up your own script/profile and change the parameters passed to the cpupower command line). So you can have low and high performance profiles and apply them at runtime without reboots.
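
As a rough illustration of what such a custom script could pass to cpupower, here is a minimal Python sketch. The profile names `low_latency` and `power_save`, the function name and the default latency threshold are assumptions made for this example; the actual contents of the PetaSAN profile scripts are not shown in this thread. It relies only on the `cpupower idle-set` options `-D <latency>` (disable all idle states with an equal or higher exit latency, in microseconds) and `-E` (enable all idle states). The sketch only builds and prints the command, it does not run it.

```python
import shlex


def idle_set_command(profile: str, max_latency_us: int = 10) -> list:
    """Build the cpupower command line for a (hypothetical) profile name."""
    if profile == "low_latency":
        # Disable every idle state whose exit latency is >= max_latency_us.
        return ["cpupower", "idle-set", "-D", str(max_latency_us)]
    if profile == "power_save":
        # Re-enable all idle states known to the running kernel.
        return ["cpupower", "idle-set", "-E"]
    raise ValueError(f"unknown profile: {profile}")


if __name__ == "__main__":
    for profile in ("low_latency", "power_save"):
        print(shlex.join(idle_set_command(profile)))
```

To apply a setting, the returned list could be handed to subprocess.run() as root on each node.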

As for the system not starting after a reboot, so that you had to change the grub prefix: no, that is not the official way 🙂 and of course it should work without issue. Maybe you had some problems in your upgrade. If you can reproduce this in a different environment or with a setup that we can replicate, please report it (preferably in the bug topic) and we will try to reproduce it.