
pg autoscaler and default rule

I don't know if this is a bug, but I had this scenario:

  • create a replicated rule (or it was already present, I don't remember)
  • create an erasure-coded rule using replicated-by-host-hdd

This makes my pools use overlapping roots, so the autoscaler can't be used (a sketch of this setup follows below).

I think a constraint, or a warning to use only a single device class, would be useful.
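
For reference, here is a minimal sketch of the kind of setup that triggers this; the rule, profile, and pool names are only examples, not taken from a real cluster:

# Replicated rule on the whole "default" root, spanning all device classes:
ceph osd crush rule create-replicated replicated_rule default host

# Erasure-code profile pinned to the hdd class, which maps to the
# shadow root "default~hdd":
ceph osd erasure-code-profile set ec-by-host-hdd \
    k=2 m=1 crush-failure-domain=host crush-device-class=hdd

# Pools using these two rules now sit on overlapping roots
# ("default" contains "default~hdd"), so the autoscaler skips them:
ceph osd pool create rep-pool 32 32 replicated replicated_rule
ceph osd pool create ec-pool 32 32 erasure ec-by-host-hdd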

Regards, Fabrizio

 

huh... well... I am going to double down on this issue, and I too am now seeing the same problem...

 

2023-11-15T21:06:33.407-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 1 contains an overlapping root -1... skipping scaling
2023-11-15T21:06:33.407-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 6 contains an overlapping root -1... skipping scaling
2023-11-15T21:06:33.407-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 7 contains an overlapping root -1... skipping scaling
2023-11-15T21:06:33.407-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 8 contains an overlapping root -2... skipping scaling
2023-11-15T21:06:33.407-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 9 contains an overlapping root -1... skipping scaling
2023-11-15T21:06:33.407-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 10 contains an overlapping root -15... skipping scaling
2023-11-15T21:06:33.407-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 11 contains an overlapping root -2... skipping scaling
2023-11-15T21:06:33.407-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 12 contains an overlapping root -1... skipping scaling
2023-11-15T21:06:33.411-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 13 contains an overlapping root -15... skipping scaling
2023-11-15T21:06:33.411-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 14 contains an overlapping root -2... skipping scaling
2023-11-15T21:06:33.411-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 15 contains an overlapping root -1... skipping scaling
2023-11-15T21:06:33.411-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 16 contains an overlapping root -15... skipping scaling
2023-11-15T21:06:33.411-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 17 contains an overlapping root -2... skipping scaling
2023-11-15T21:06:33.411-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 18 contains an overlapping root -15... skipping scaling
2023-11-15T21:06:33.411-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 19 contains an overlapping root -2... skipping scaling
2023-11-15T21:06:33.411-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 20 contains an overlapping root -15... skipping scaling
2023-11-15T21:06:33.415-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 21 contains an overlapping root -15... skipping scaling
2023-11-15T21:06:33.415-0500 7f9ce911e700 0 [pg_autoscaler WARNING root] pool 25 contains an overlapping root -15... skipping scaling
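
(Side note: to map those pool IDs to pool names and the CRUSH rule each one uses, something like this should work; the jq filter is just one way to slice the JSON, and the exact field names are an assumption:)

ceph osd pool ls detail
# or just id / name / rule from the JSON form:
ceph osd pool ls detail -f json | jq '.[] | {pool_id, pool_name, crush_rule}'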

Per the tree below, we do NOT have an ID "-15" or "-2":

 

ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         30.55756  root default
 -9          6.91190      host cephdev1
  8    hdd   1.81897          osd.8          up   1.00000  0.50000
  9    hdd   1.81897          osd.9          up   1.00000  0.50000
 10    hdd   1.81897          osd.10         up   1.00000  0.50000
 12    ssd   1.45499          osd.12         up   1.00000  1.00000
 -3          8.36688      host cephdev2
  0    hdd   1.81897          osd.0          up   1.00000  0.50000
  3    hdd   1.81897          osd.3          up   1.00000  0.50000
 11    hdd   1.81897          osd.11         up   1.00000  0.50000
 13    ssd   1.45499          osd.13         up   1.00000  1.00000
 16    ssd   1.45499          osd.16         up   1.00000  1.00000
 -5          6.91190      host cephdev3
  1    hdd   1.81897          osd.1          up   1.00000  0.50000
  4    hdd   1.81897          osd.4          up   1.00000  0.50000
  6    hdd   1.81897          osd.6          up   1.00000  0.50000
 14    ssd   1.45499          osd.14         up   1.00000  1.00000
 -7          8.36688      host cephdev4
  2    hdd   1.81897          osd.2          up   1.00000  0.50000
  5    hdd   1.81897          osd.5          up   1.00000  0.50000
  7    hdd   1.81897          osd.7          up   1.00000  0.50000
 21    ssd   1.45499          osd.21         up   1.00000  1.00000
 22    ssd   1.45499          osd.22         up   1.00000  1.00000
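
Note: those IDs do exist, just not in the plain tree. When a CRUSH rule references a device class, Ceph creates hidden "shadow" buckets (e.g. "default~hdd") with their own negative IDs, and that is what the autoscaler's "overlapping root" warnings point at. A way to see them (the sample output is sketched from the weights above; the -15 / default~ssd pairing is an assumption, the -2 / default~hdd one matches the rule dump below):

ceph osd crush tree --show-shadow
# expected to show entries roughly like:
# -15   ssd    7.27495  root default~ssd
#  -2   hdd   21.82764  root default~hdd
#  -1         30.55756  root default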

Type:
ceph osd crush rule dump

You will see how your rules are defined:

ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "type": 1,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 1,
        "rule_name": "ec-by-host-hdd",
        "type": 3,
        "steps": [
            {
                "op": "set_chooseleaf_tries",
                "num": 5
            },
            {
                "op": "set_choose_tries",
                "num": 100
            },
            {
                "op": "take",
                "item": -2,
                "item_name": "default~hdd"
            },
            {
                "op": "chooseleaf_indep",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 2,
        "rule_name": "replicate-by-host-hdd",
        "type": 1,
        "steps": [
            {
                "op": "take",
                "item": -2,
                "item_name": "default~hdd"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]

As you can see, my rules include one with item_name "default~hdd" and another with plain "default". If you have pools that use both kinds of rules, you get the overlapping-root problem, because "default" contains "default~hdd".
So, simply build your rules only on the device-class roots "default~hdd" and "default~ssd", not on plain "default"; then change the rule on your pools to the SSD or HDD rule instead of the default one. It is safe to do this on a live cluster; it will rebalance your data.
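
A sketch of that change, assuming a pool named "mypool" (hypothetical) and that an SSD twin of the HDD rule still needs creating; only replicate-by-host-hdd exists in the dump above:

# Create a replicated rule restricted to the ssd class, if not present yet:
ceph osd crush rule create-replicated replicate-by-host-ssd default host ssd

# Point each pool at a class-restricted rule instead of the plain default one:
ceph osd pool set mypool crush_rule replicate-by-host-hdd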