
OOM on a single Node

Hello,

on a degraded cluster, I have an issue where, on one node, 2 OSD services keep crashing with an out-of-memory dmesg entry roughly every 10 minutes.

The cluster has been recovering for a week now.

The node has 128 GB RAM and 12 OSDs running.

Any ideas? Or is this a bug?

[code]

[6353773.121658] libceph: osd32 up
[6353785.569423] atopacctd invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[6353785.569436] CPU: 7 PID: 3320 Comm: atopacctd Tainted: G E N 5.14.21-02-petasan #1 SLE15-SP4 (unreleased) 42312f06abc57ff126981fb8ed0c809bb5c3b9b3
[6353785.569443] Hardware name: Supermicro SSG-5018D8-AR12L/X10SDV-7TP4F, BIOS 2.1a 02/12/2020
[6353785.569446] Call Trace:
[6353785.569450] <TASK>
[6353785.569454] dump_stack_lvl+0x46/0x5e
[6353785.569463] dump_header+0x4a/0x1fe
[6353785.569468] oom_kill_process.cold+0xb/0x10
[6353785.569472] out_of_memory+0x1bd/0x500
[6353785.569480] __alloc_pages_slowpath.constprop.0+0xc00/0xce0
[6353785.569492] __alloc_pages+0x2d5/0x320
[6353785.569498] pagecache_get_page+0x1b3/0x490
[6353785.569504] filemap_fault+0x526/0xb10
[6353785.569509] ? filemap_map_pages+0x13d/0x590
[6353785.569514] __do_fault+0x35/0xb0
[6353785.569519] __handle_mm_fault+0xecd/0x14c0
[6353785.569525] ? switch_fpu_return+0x49/0xd0
[6353785.569531] handle_mm_fault+0xd5/0x2b0
[6353785.569536] do_user_addr_fault+0x1c2/0x690
[6353785.569542] exc_page_fault+0x68/0x150
[6353785.569550] ? asm_exc_page_fault+0x8/0x30
[6353785.569556] asm_exc_page_fault+0x1e/0x30
[6353785.569561] RIP: 0033:0x55e7c22833cb
[6353785.569570] Code: Unable to access opcode bytes at RIP 0x55e7c22833a1.
[6353785.569572] RSP: 002b:00007ffe61c18fe0 EFLAGS: 00010206
[6353785.569577] RAX: 0000000000000000 RBX: 0000000000000007 RCX: 00007fe0feb371b4
[6353785.569580] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[6353785.569582] RBP: 00007ffe61c19010 R08: 0000000000000000 R09: 0000000000000000
[6353785.569585] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[6353785.569588] R13: 00007ffe61c19000 R14: 00007ffe61c1d170 R15: 00007ffe61c1cf28
[6353785.569594] </TASK>
[6353785.569596] Mem-Info:
[6353785.569599] active_anon:7250 inactive_anon:8010391 isolated_anon:0
active_file:53 inactive_file:497 isolated_file:3
unevictable:7596 dirty:111 writeback:0
slab_reclaimable:204980 slab_unreclaimable:24316918
mapped:1846 shmem:997 pagetables:24122 bounce:0
free:270369 free_pcp:62 free_cma:0
[6353785.569608] Node 0 active_anon:29000kB inactive_anon:32041564kB active_file:212kB inactive_file:1988kB unevictable:30384kB isolated(anon):0kB isolated(file):12kB mapped:7384kB dirty:444kB writeback:0kB shmem:3988kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 124928kB writeback_tmp:0kB kernel_stack:18720kB pagetables:96488kB all_unreclaimable? no
[6353785.569616] Node 0 DMA free:11264kB boost:0kB min:4kB low:16kB high:28kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[6353785.569626] lowmem_reserve[]: 0 1792 128613 128613 128613
[6353785.569634] Node 0 DMA32 free:510188kB boost:0kB min:940kB low:2772kB high:4604kB reserved_highatomic:2048KB active_anon:648kB inactive_anon:803768kB active_file:0kB inactive_file:136kB unevictable:21880kB writepending:0kB present:1975720kB managed:1910184kB mlocked:21880kB bounce:0kB free_pcp:248kB local_pcp:0kB free_cma:0kB
[6353785.569644] lowmem_reserve[]: 0 0 126820 126820 126820
[6353785.569652] Node 0 Normal free:560024kB boost:489540kB min:556172kB low:686036kB high:815900kB reserved_highatomic:4096KB active_anon:28352kB inactive_anon:31238516kB active_file:428kB inactive_file:1560kB unevictable:8504kB writepending:752kB present:132120576kB managed:129872088kB mlocked:8504kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[6353785.569662] lowmem_reserve[]: 0 0 0 0 0
[6353785.569669] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
[6353785.569694] Node 0 DMA32: 5815*4kB (UME) 3579*8kB (UME) 4514*16kB (UME) 3265*32kB (UME) 1432*64kB (UE) 493*128kB (UME) 99*256kB (UME) 5*512kB (UE) 1*1024kB (M) 10*2048kB (M) 19*4096kB (M) = 510580kB
[6353785.569725] Node 0 Normal: 1239*4kB (UME) 27197*8kB (UME) 16490*16kB (UME) 2449*32kB (UME) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 564740kB
[6353785.569751] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[6353785.569754] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[6353785.569757] 3652 total pagecache pages
[6353785.569759] 0 pages in swap cache
[6353785.569761] Swap cache stats: add 0, delete 0, find 0/0
[6353785.569763] Free swap = 0kB
[6353785.569765] Total swap = 0kB
[6353785.569767] 33528066 pages RAM
[6353785.569768] 0 pages HighMem/MovableOnly
[6353785.569771] 578658 pages reserved
[6353785.569772] 0 pages cma reserved
[6353785.569774] 0 pages hwpoisoned
[6353785.569775] Unreclaimable slab info:
[6353785.569777] Name Used Total
[6353785.569779] t10_alua_tg_pt_gp_cache 31KB 31KB
[6353785.569783] rbd_obj_request 2354KB 2354KB
[6353785.569786] rbd_img_request 95565503KB 95565503KB
[6353785.569789] ceph_osd_request 2504KB 2531KB
[6353785.569792] ceph_msg 1808KB 1808KB
[6353785.569795] fuse_request 61KB 61KB
[6353785.569801] scsi_sense_cache 2168KB 2168KB
[6353785.569804] PINGv6 432KB 432KB
[6353785.569807] RAWv6 308KB 308KB
[6353785.569809] UDPv6 504KB 504KB
[6353785.569812] TCPv6 494KB 494KB
[6353785.569815] mqueue_inode_cache 32KB 32KB
[6353785.569819] UNIX 605KB 605KB
[6353785.569822] PING 448KB 448KB
[6353785.569824] RAW 544KB 544KB
[6353785.569826] tw_sock_TCP 903KB 903KB
[6353785.569828] request_sock_TCP 239KB 239KB
[6353785.569849] TCP 7967KB 10269KB
[6353785.569851] hugetlbfs_inode_cache 31KB 31KB
[6353785.569854] ep_head 156KB 156KB
[6353785.569857] bio_crypt_ctx 27KB 27KB
[6353785.569859] request_queue 472KB 472KB
[6353785.569862] biovec-max 864KB 928KB
[6353785.569865] biovec-128 608KB 704KB
[6353785.569868] biovec-64 544KB 704KB
[6353785.569870] khugepaged_mm_slot 63KB 63KB
[6353785.569873] dmaengine-unmap-256 30KB 30KB
[6353785.569876] dmaengine-unmap-128 31KB 31KB
[6353785.569878] skbuff_ext_cache 351KB 385KB
[6353785.569881] skbuff_fclone_cache 2032KB 2032KB
[6353785.569884] skbuff_head_cache 428KB 440KB
[6353785.569886] file_lock_cache 124KB 124KB
[6353785.569889] file_lock_ctx 63KB 63KB
[6353785.569891] fsnotify_mark_connector 92KB 92KB
[6353785.569894] net_namespace 255KB 255KB
[6353785.569897] task_delay_info 195KB 195KB
[6353785.569899] taskstats 253KB 253KB
[6353785.569901] proc_dir_entry 433KB 433KB
[6353785.569904] pde_opener 63KB 63KB
[6353785.569907] seq_file 63KB 63KB
[6353785.569909] shmem_inode_cache 2709KB 2709KB
[6353785.569921] kernfs_node_cache 8268KB 8372KB
[6353785.569924] mnt_cache 1968KB 1968KB
[6353785.569940] filp 1933KB 2408KB
[6353785.569942] names_cache 512KB 512KB
[6353785.569945] lsm_file_cache 553KB 553KB
[6353785.569947] uts_namespace 249KB 249KB
[6353785.569950] vm_area_struct 3144KB 3144KB
[6353785.569952] mm_struct 573KB 573KB
[6353785.569955] files_cache 537KB 537KB
[6353785.569958] signal_cache 1833KB 1890KB
[6353785.569961] sighand_cache 2227KB 2227KB
[6353785.569964] task_struct 12393KB 12575KB
[6353785.569967] cred_jar 2260KB 2260KB
[6353785.569975] anon_vma_chain 1020KB 1140KB
[6353785.569979] anon_vma 982KB 1025KB
[6353785.569981] pid 988KB 988KB
[6353785.569984] Acpi-Operand 854KB 854KB
[6353785.569986] Acpi-ParseExt 47KB 47KB
[6353785.569989] Acpi-State 63KB 63KB
[6353785.569991] numa_policy 7KB 7KB
[6353785.569995] perf_event 1226KB 1414KB
[6353785.569997] trace_event_file 517KB 517KB
[6353785.570000] ftrace_event_field 541KB 541KB
[6353785.570002] pool_workqueue 515KB 528KB
[6353785.570005] task_group 252KB 252KB
[6353785.570008] vmap_area 682KB 688KB
[6353785.570012] kmalloc-cg-8k 512KB 512KB
[6353785.570015] kmalloc-cg-4k 512KB 512KB
[6353785.570017] kmalloc-cg-2k 1248KB 1248KB
[6353785.570020] kmalloc-cg-1k 2048KB 2048KB
[6353785.570022] kmalloc-cg-512 304KB 304KB
[6353785.570024] kmalloc-cg-256 128KB 128KB
[6353785.570026] kmalloc-cg-192 133KB 133KB
[6353785.570029] kmalloc-cg-96 63KB 63KB
[6353785.570031] kmalloc-cg-64 64KB 64KB
[6353785.570033] kmalloc-cg-32 64KB 64KB
[6353785.570036] kmalloc-cg-16 64KB 64KB
[6353785.570038] kmalloc-cg-8 308KB 308KB
[6353785.570040] kmalloc-8k 7824KB 7968KB
[6353785.570045] kmalloc-4k 6512KB 6720KB
[6353785.570049] kmalloc-2k 4192KB 4576KB
[6353785.570053] kmalloc-1k 3622KB 3968KB
[6353785.570130] kmalloc-512 3965KB 14272KB
[6353785.570469] kmalloc-256 14052KB 30240KB
[6353785.570472] kmalloc-192 417KB 417KB
[6353785.570475] kmalloc-128 380KB 380KB
[6353785.570477] kmalloc-96 606KB 606KB
[6353785.570484] kmalloc-64 1413KB 1520KB
[6353785.570487] kmalloc-32 1712KB 1712KB
[6353785.570489] kmalloc-16 488KB 488KB
[6353785.570491] kmalloc-8 156KB 156KB
[6353785.570493] kmem_cache_node 36KB 36KB
[6353785.570495] kmem_cache 88KB 88KB
[6353785.570497] Tasks state (memory values in pages):
[6353785.570498] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[6353785.570511] [ 515] 0 515 90038 741 737280 0 -250 systemd-journal
[6353785.570517] [ 548] 0 548 5487 776 65536 0 -1000 systemd-udevd
[6353785.570523] [ 3292] 116 3292 1777 478 53248 0 0 rpcbind
[6353785.570528] [ 3296] 102 3296 6172 1035 94208 0 0 systemd-resolve
[6353785.570533] [ 3320] 0 3320 627 367 40960 0 0 atopacctd
[6353785.570538] [ 3353] 111 3353 1790 579 53248 0 -900 dbus-daemon
[6353785.570543] [ 3363] 108 3363 56028 695 90112 0 0 rsyslogd
[6353785.570547] [ 3365] 0 3365 2848 950 53248 0 0 smartd
[6353785.570551] [ 3368] 0 3368 4383 451 69632 0 0 systemd-logind
[6353785.570556] [ 3474] 0 3474 20565 9035 200704 0 0 deploy.py
[6353785.570560] [ 3480] 107 3480 5731 844 90112 0 0 zabbix_agentd
[6353785.570565] [ 3483] 0 3483 1706 528 57344 0 0 cron
[6353785.570569] [ 3498] 107 3498 5731 663 90112 0 0 zabbix_agentd
[6353785.570574] [ 3499] 107 3499 5731 524 81920 0 0 zabbix_agentd
[6353785.570578] [ 3500] 107 3500 5731 524 81920 0 0 zabbix_agentd
[6353785.570582] [ 3501] 107 3501 5731 524 81920 0 0 zabbix_agentd
[6353785.570586] [ 3502] 107 3502 5731 715 86016 0 0 zabbix_agentd
[6353785.570590] [ 3503] 0 3503 3046 525 69632 0 -1000 sshd
[6353785.570594] [ 3626] 0 3626 8736 6467 98304 0 0 collectl
[6353785.570599] [ 4841] 0 4841 1459 373 49152 0 0 agetty
[6353785.570603] [ 4854] 0 4854 8834 5096 110592 0 0 mount_sharedfs.
[6353785.570608] [ 4965] 103 4965 18564 552 57344 0 0 ntpd
[6353785.570613] [ 4972] 0 4972 46468 6659 274432 0 0 consul
[6353785.570617] [ 5006] 0 5006 9054 5274 106496 0 0 files_sync.py
[6353785.570621] [ 5024] 0 5024 195666 11707 274432 0 0 iscsi_service.p
[6353785.570626] [ 5027] 0 5027 20333 8990 188416 0 0 iscsi_export_sn
[6353785.570630] [ 5030] 0 5030 19814 8505 192512 0 0 node_stats.py
[6353785.570635] [ 5696] 0 5696 4407 1972 81920 0 0 qperf.py
[6353785.570639] [ 5701] 0 5701 654 119 40960 0 0 sh
[6353785.570643] [ 5702] 0 5702 924 113 45056 0 0 qperf
[6353785.570648] [ 5746] 64045 5746 2040127 705987 15441920 0 0 ceph-osd
[6353785.570653] [ 7323] 0 7323 654 119 45056 0 0 sh
[6353785.570657] [ 7329] 0 7329 1368 94 45056 0 0 openvt
[6353785.570662] [ 7330] 0 7330 20637 9160 200704 0 0 console.py
[6353785.570666] [ 7366] 0 7366 2246 991 61440 0 0 dialog
[6353785.570674] [4040561] 64045 4040561 1284582 689006 9437184 0 0 ceph-osd
[6353785.570680] [4072035] 64045 4072035 1172819 710171 8654848 0 0 ceph-osd
[6353785.570685] [4132572] 64045 4132572 911820 628570 6574080 0 0 ceph-osd
[6353785.570689] [ 195633] 0 195633 7747 7593 98304 0 0 atop
[6353785.570694] [ 198644] 64045 198644 948932 696522 6860800 0 0 ceph-osd
[6353785.570699] [ 210997] 64045 210997 870563 656507 6254592 0 0 ceph-osd
[6353785.570703] [ 245128] 64045 245128 1250703 636700 8732672 0 0 ceph-osd
[6353785.570708] [ 255276] 64045 255276 994877 702902 7237632 0 0 ceph-osd
[6353785.570713] [ 258310] 64045 258310 923723 598137 6668288 0 0 ceph-osd
[6353785.570717] [ 284983] 64045 284983 832974 547315 5967872 0 0 ceph-osd
[6353785.570722] [ 414681] 0 414681 114787 1745 155648 0 0 glusterfs
[6353785.570729] [ 467256] 64045 467256 1183530 1014859 8720384 0 0 ceph-osd
[6353785.570734] [ 470262] 64045 470262 498771 351876 3227648 0 0 ceph-osd
[6353785.570739] [ 474837] 0 474837 654 119 45056 0 0 sh
[6353785.570743] [ 474838] 0 474838 654 402 45056 0 0 detect-disks.sh
[6353785.570748] [ 475049] 0 475049 654 29 45056 0 0 detect-disks.sh
[6353785.570753] [ 475050] 0 475050 2130 199 53248 0 0 udevadm
[6353785.570757] [ 475051] 0 475051 1611 65 45056 0 0 grep
[6353785.570761] [ 475052] 0 475052 1374 74 49152 0 0 cut
[6353785.570765] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/system-ceph\x2dosd.slice/ceph-osd@23.service,task=ceph-osd,pid=467256,uid=64045
[6353785.570883] Out of memory: Killed process 467256 (ceph-osd) total-vm:4734120kB, anon-rss:4059436kB, file-rss:0kB, shmem-rss:0kB, UID:64045 pgtables:8516kB oom_score_adj:0
[6353785.903656] oom_reaper: reaped process 467256 (ceph-osd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[6353785.911183] libceph: osd23 (1)192.168.216.12:6889 socket closed (con state OPEN)
[6353787.053825] libceph: osd23 (1)192.168.216.12:6889 socket closed (con state V1_BANNER)
[6353787.305530] libceph: osd23 (1)192.168.216.12:6889 socket error on write
[6353787.410629] libceph: osd23 down

[/code]
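
For reference, a rough sanity check of the OSD memory budget against the 128 GB of RAM would look something like this (plain Ceph/Linux commands, nothing PetaSAN-specific assumed):

[code]

# configured per-OSD memory target; the default is 4 GiB, so 12 OSDs would normally want ~48 GB plus overhead
ceph config get osd osd_memory_target

# actual resident memory of the OSD processes, largest first
ps -C ceph-osd -o pid,rss,cmd --sort=-rss | head -n 15

# overall memory picture on the node
free -h

[/code]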

Thanks

1- Is this only on node 2?

2- What is the output of the following on that node:

uname -r

3- What is the backfill speed set to on the UI page?

4- If you run "ceph status" a few times, what average recovery traffic do you see? (Example commands below.)
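
Something like this should cover points 2 and 4 (plain Ceph CLI; adjust the sampling interval as needed):

[code]

uname -r

# sample client/recovery throughput a few times, ~10 seconds apart
for i in 1 2 3 4 5; do ceph status | grep -E 'client|recovery'; sleep 10; done

[/code]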

 

Hi,

That was only on node 2; I rebooted the whole node this morning.

But there was a strange issue a week earlier: node 5 crashed and ended up in a crash loop, going down again right after every boot.
The node was reachable via SSH and I could connect, but about 2 minutes later it crashed again, just as the OSDs from that node were coming back online in the cluster.

Node 5 is still in the cluster, but offline, so 12 OSDs have been missing for a week.

uname output: 5.14.21-02-petasan

Backfill speed is set to "fast".

Average recovery speed was around 900 MB/s, but over the days it has dropped to about 6 MB/s; I think that is because only a few objects are left to recover. Current ceph status is below, plus a quick loop I could use to keep watching the counters.

[code]

  cluster:
    id:     7e26c1d0-1d17-4084-a119-062e280cc3fb
    health: HEALTH_WARN
            1 nearfull osd(s)
            Degraded data redundancy: 1840171/206561091 objects degraded (0.891%), 21 pgs degraded, 21 pgs undersized
            4717 pgs not deep-scrubbed in time
            11 pool(s) nearfull

  services:
    mon: 3 daemons, quorum la01-cmon-03,la01-cmon-01,la01-cmon-02 (age 2w)
    mgr: la01-cmon-03(active, since 7M), standbys: la01-cmon-02, la01-cmon-01
    mds: 1/1 daemons up, 2 standby
    osd: 72 osds: 60 up (since 3h), 60 in (since 9h); 22 remapped pgs
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 5122 pgs
    objects: 103.27M objects, 244 TiB
    usage:   455 TiB used, 312 TiB / 767 TiB avail
    pgs:     1840171/206561091 objects degraded (0.891%)
             33559/206561091 objects misplaced (0.016%)
             5096 active+clean
             21   active+undersized+degraded+remapped+backfilling
             4    active+clean+scrubbing+deep
             1    active+remapped+backfilling

  io:
    client:   5.9 MiB/s rd, 15 MiB/s wr, 3.36k op/s rd, 1.51k op/s wr
    recovery: 6.8 MiB/s, 17 objects/s

  progress:
    Global Recovery Event (2w)
      [===========================.] (remaining: 113m)

[/code]
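
A quick way to keep watching those degraded/misplaced counters would be something like:

[code]

# refresh every minute; the degraded/misplaced counters should keep shrinking
watch -n 60 "ceph status | grep -E 'degraded|misplaced|recovery'"

[/code]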

 

Make sure you update to version 3.2.1, as 3.2.0 had a small kernel memory leak that typically only shows up after months of operation; it is possible you are hitting it. Perform the online upgrade to 3.2.1 and reboot each node as instructed at the end of the upgrade.
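
For what it is worth, the OOM dump above already points in that direction: nearly all of the node's memory is sitting in unreclaimable kernel slab, with rbd_img_request alone at roughly 95 GB. If you want to confirm the leak before and after the upgrade, you can watch the slab caches with standard Linux tools, for example:

[code]

# top kernel slab caches by size (run as root); watch whether rbd_img_request keeps growing
slabtop -o -s c | head -n 15

# raw numbers for the suspect cache
grep rbd_img_request /proc/slabinfo

# total unreclaimable slab memory
grep SUnreclaim /proc/meminfo

[/code]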