"To know what cannot be helped and to accept it calmly as fate is the height of virtue." (Zhuangzi, Inner Chapters, "In the World of Men")
Preface

This post is a brief introduction to memory-leak analysis with the memleak tool from the BCC toolkit. It covers:

- a quick look at what the memleak script itself does
- memory-leak tracing demos in kernel mode (a kernel module) and user mode (Java, Python, C)

If I got anything wrong, corrections are welcome :) and keep at it!
I regularly share hands-on technical material; follow along if you are interested ^_^
Analyzing Linux memory leaks with BPF, here mainly with the memleak tool from the BCC toolkit.

memleak(8) is a BCC tool that traces memory allocation and free events and records the call stack behind each one. Over time, it reports the allocations that are still outstanding: memory that remains unreleased after a reasonable interval is likely leaked.
Tool source code:
https://github.com/iovisor/bcc/blob/master/tools/memleak.py
Tool help document:
https://github.com/iovisor/bcc/blob/master/tools/memleak_example.txt
```
EXAMPLES:

./memleak -p $(pidof allocs)
        Trace allocations and display a summary of "leaked" (outstanding)
        allocations every 5 seconds
./memleak -p $(pidof allocs) -t
        Trace allocations and display each individual allocator function call
./memleak -ap $(pidof allocs) 10
        Trace allocations and display allocated addresses, sizes, and stacks
        every 10 seconds for outstanding allocations
./memleak -c "./allocs"
        Run the specified command and trace its allocations
./memleak
        Trace allocations in kernel mode and display a summary of outstanding
        allocations every 5 seconds
./memleak -o 60000
        Trace allocations in kernel mode and display a summary of outstanding
        allocations that are at least one minute (60 seconds) old
./memleak -s 5
        Trace roughly every 5th allocation, to reduce overhead
./memleak --sort count
        Trace allocations in kernel mode and display a summary of outstanding
        allocations that are sorted in count order
```
A quick summary of what the script does. It handles memory tracing in both kernel mode and user mode. When no process ID is given (pid == -1) and no command is specified (command is None), memleak defaults to tracing kernel memory events; when a pid or command is given, it traces only that user-space process:
```python
kernel_trace = (pid == -1 and command is None)
```
For user-mode allocations, it traces allocation functions such as malloc/calloc/realloc/mmap, plus the matching free/munmap release functions:
```python
def attach_probes(sym, fn_prefix=None, can_fail=False, need_uretprobe=True):
    if fn_prefix is None:
        fn_prefix = sym
    if args.symbols_prefix is not None:
        sym = args.symbols_prefix + sym
    try:
        bpf.attach_uprobe(name=obj, sym=sym,
                          fn_name=fn_prefix + "_enter", pid=pid)
        if need_uretprobe:
            bpf.attach_uretprobe(name=obj, sym=sym,
                                 fn_name=fn_prefix + "_exit", pid=pid)
    except Exception:
        if can_fail:
            return
        else:
            raise

attach_probes("malloc")
attach_probes("calloc")
attach_probes("realloc")
attach_probes("mmap", can_fail=True)
attach_probes("posix_memalign")
attach_probes("valloc", can_fail=True)
attach_probes("memalign")
attach_probes("pvalloc", can_fail=True)
attach_probes("aligned_alloc", can_fail=True)
attach_probes("free", need_uretprobe=False)
attach_probes("munmap", can_fail=True, need_uretprobe=False)
```
For kernel-mode allocations, it uses kernel tracepoints to monitor allocation and free paths such as kmalloc/kfree/kmem_cache_alloc, including physical page allocation (e.g. __get_free_pages, via the mm_page_alloc tracepoint):
```c
bpf_source_kernel = """
TRACEPOINT_PROBE(kmem, kmalloc) {
        if (WORKAROUND_MISSING_FREE)
            gen_free_enter((struct pt_regs *)args, (void *)args->ptr);
        gen_alloc_enter((struct pt_regs *)args, args->bytes_alloc, KERNEL);
        return gen_alloc_exit2((struct pt_regs *)args, (size_t)args->ptr, KERNEL);
}

TRACEPOINT_PROBE(kmem, kfree) {
        return gen_free_enter((struct pt_regs *)args, (void *)args->ptr);
}

TRACEPOINT_PROBE(kmem, kmem_cache_alloc) {
        if (WORKAROUND_MISSING_FREE)
            gen_free_enter((struct pt_regs *)args, (void *)args->ptr);
        gen_alloc_enter((struct pt_regs *)args, args->bytes_alloc, KERNEL);
        return gen_alloc_exit2((struct pt_regs *)args, (size_t)args->ptr, KERNEL);
}

TRACEPOINT_PROBE(kmem, kmem_cache_free) {
        return gen_free_enter((struct pt_regs *)args, (void *)args->ptr);
}

TRACEPOINT_PROBE(kmem, mm_page_alloc) {
        gen_alloc_enter((struct pt_regs *)args, PAGE_SIZE << args->order, KERNEL);
        return gen_alloc_exit2((struct pt_regs *)args, args->pfn, KERNEL);
}

TRACEPOINT_PROBE(kmem, mm_page_free) {
        return gen_free_enter((struct pt_regs *)args, (void *)args->pfn);
}
"""
```
Leak detection: allocations that never see a matching free are recorded, and the size, count, and call stack of the "unreleased" memory are aggregated:
```c
static inline void update_statistics_add(u64 stack_id, u64 sz) {
        struct combined_alloc_info_t *existing_cinfo;
        struct combined_alloc_info_t cinfo = {0, 0};

        existing_cinfo = combined_allocs.lookup(&stack_id);
        if (!existing_cinfo) {
                combined_allocs.update(&stack_id, &cinfo);
                existing_cinfo = combined_allocs.lookup(&stack_id);
                if (!existing_cinfo)
                        return;
        }

        __sync_fetch_and_add(&existing_cinfo->total_size, sz);
        __sync_fetch_and_add(&existing_cinfo->number_of_allocs, 1);
}

static inline void update_statistics_del(u64 stack_id, u64 sz) {
        struct combined_alloc_info_t *existing_cinfo;

        existing_cinfo = combined_allocs.lookup(&stack_id);
        if (!existing_cinfo)
                return;

        if (existing_cinfo->number_of_allocs > 1) {
                __sync_fetch_and_sub(&existing_cinfo->total_size, sz);
                __sync_fetch_and_sub(&existing_cinfo->number_of_allocs, 1);
        } else {
                combined_allocs.delete(&stack_id);
        }
}
```
Filtering and sampling: events can be filtered by allocation size (--min-size/--max-size) and sampled at a rate (-s) to reduce overhead; a sketch of how those options take effect follows below.
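To get a feel for how that works, here is a minimal sketch, paraphrased from the script's approach rather than quoted verbatim (the placeholder names SIZE_FILTER and SAMPLE_EVERY_N follow the upstream source, but details vary between versions). The filter values are spliced into the embedded BPF C source as text before compilation, so unwanted events are discarded in kernel space instead of being shipped to user space:

```python
# Sketch: how memleak-style size filtering and sampling can be spliced
# into the embedded BPF C program before it is compiled (paraphrased
# from the upstream script; names and details vary between versions).
def build_bpf_source(bpf_source, min_size, max_size, sample_every_n):
    if min_size is not None and max_size is not None:
        size_filter = "if (size < %d || size > %d) return 0;" % (min_size, max_size)
    elif min_size is not None:
        size_filter = "if (size < %d) return 0;" % min_size
    elif max_size is not None:
        size_filter = "if (size > %d) return 0;" % max_size
    else:
        size_filter = ""
    # SIZE_FILTER sits at the top of the allocation handler, so
    # out-of-range allocations are dropped inside the kernel.
    bpf_source = bpf_source.replace("SIZE_FILTER", size_filter)
    # SAMPLE_EVERY_N makes the allocation handler keep roughly 1 in N events.
    return bpf_source.replace("SAMPLE_EVERY_N", str(sample_every_n))
```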
The demos below walk through kernel-mode and user-mode memory-leak tracing. They use the latest version of the tool; if you have special requirements, the script is straightforward to customize. Give it a try, and feel free to leave a comment.
Kernel-Mode Memory Leak Analysis

Here a kernel module simulates the problem: memory_leak is a module that leaks on purpose, periodically allocating kernel memory from a timer and never freeing it.
```
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$cat Makefile
obj-m += memory_leak.o
CFLAGS_memory_leak.o += -g

all:
        make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
        make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$ls
Makefile       memory_leak.c      memory_leak.mod    memory_leak.mod.o  modules.order  Module.symvers
memory_leak.ko memory_leak.mod.c  memory_leak.o
```
The module code is shown below.

In the init function (memory_leak_init), it sets up the timer leak_timer, binds the callback leak_timer_callback, arms the first expiry for one second after load (msecs_to_jiffies(1000)), and logs "Memory leak module loaded". The timer callback (leak_timer_callback) allocates 1 MB on every expiry and re-arms itself to fire again 10 ms later. The core of the leak: char *ptr = kmalloc(1024 * 1024, GFP_KERNEL); // allocate 1 MB, never freed
```c
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$cat memory_leak.c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/timer.h>

static struct timer_list leak_timer;
static int leak_count = 0;

static void leak_timer_callback(struct timer_list *t)
{
    char *ptr = kmalloc(1024 * 1024, GFP_KERNEL);
    if (ptr) {
        printk(KERN_INFO "Leaked memory %d at %p\n", leak_count++, ptr);
    }
    mod_timer(&leak_timer, jiffies + msecs_to_jiffies(10));
}

static int __init memory_leak_init(void)
{
    timer_setup(&leak_timer, leak_timer_callback, 0);
    mod_timer(&leak_timer, jiffies + msecs_to_jiffies(1000));
    printk(KERN_INFO "Memory leak module loaded\n");
    return 0;
}

static void __exit memory_leak_exit(void)
{
    del_timer_sync(&leak_timer);
    printk(KERN_INFO "Memory leak module unloaded (but memory not freed!)\n");
}

module_init(memory_leak_init);
module_exit(memory_leak_exit);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Memory Leak Demo with Timer");
```
Build it and load the module. This is strictly a test module, so run it only in a disposable test environment. After loading, watch the kernel log:
```
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$make
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$dmesg -C
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$insmod memory_leak.ko
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$dmesg -T --follow
```
Below is the allocation log captured with the BCC memleak tool. About the flags: --top 5 shows the top five stacks sorted by outstanding size, and --min-size 1048000 keeps only allocations of at least that many bytes (roughly 1 MB):
```
┌──[root@liruilongs.github.io]-[/usr/share/bcc/tools]
└─$./memleak --top 5 --min-size 1048000
Attaching to kernel allocators, Ctrl+C to quit.
[19:33:40] Top 5 stacks with outstanding allocations:
        1048576 bytes in 1 allocations from stack
                0xffffffff9a1ab368 __alloc_pages+0x188 [kernel]
                0xffffffff9a1ab368 __alloc_pages+0x188 [kernel]
                0xffffffff9a169059 __kmalloc_large_node+0x79 [kernel]
                0xffffffff9a1695d9 kmalloc_large+0x19 [kernel]
                0xffffffffc0cd7024 leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84 call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356 run_timer_softirq+0x26 [kernel]
                0xffffffff9aa978d7 __do_softirq+0xc7 [kernel]
                0xffffffff99f10df1 __irq_exit_rcu+0xa1 [kernel]
                0xffffffff9aa81a93 common_interrupt+0x43 [kernel]
                0xffffffff9ac00d62 asm_common_interrupt+0x22 [kernel]
        4194304 bytes in 4 allocations from stack
                0xffffffff9a169648 kmalloc_large+0x88 [kernel]
                0xffffffff9a169648 kmalloc_large+0x88 [kernel]
                0xffffffffc0cd7024 leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84 call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356 run_timer_softirq+0x26 [kernel]
                0xffffffff9aa978d7 __do_softirq+0xc7 [kernel]
                0xffffffff99f10df1 __irq_exit_rcu+0xa1 [kernel]
                0xffffffff9aa81ad0 common_interrupt+0x80 [kernel]
                0xffffffff9ac00d62 asm_common_interrupt+0x22 [kernel]
                0xffffffff9aa86850 acpi_safe_halt+0x20 [kernel]
                0xffffffff9aa8689f acpi_idle_do_entry+0x2f [kernel]
                0xffffffff9aa86bab acpi_idle_enter+0x7b [kernel]
                0xffffffff9aa85911 cpuidle_enter_state+0x81 [kernel]
                0xffffffff9a764639 cpuidle_enter+0x29 [kernel]
                0xffffffff99f6c03a cpuidle_idle_call+0xfa [kernel]
                0xffffffff99f6c12b do_idle+0x7b [kernel]
                0xffffffff99f6c379 cpu_startup_entry+0x19 [kernel]
                0xffffffff9aa86e0a rest_init+0xca [kernel]
                0xffffffff9c48f766 arch_call_rest_init+0xa [kernel]
                0xffffffff9c48fc67 start_kernel+0x4a3 [kernel]
                0xffffffff99e00159 secondary_startup_64_no_verify+0xe4 [kernel]
        4194304 bytes in 4 allocations from stack
                0xffffffff9a1ab368 __alloc_pages+0x188 [kernel]
                0xffffffff9a1ab368 __alloc_pages+0x188 [kernel]
                0xffffffff9a169059 __kmalloc_large_node+0x79 [kernel]
                0xffffffff9a1695d9 kmalloc_large+0x19 [kernel]
                0xffffffffc0cd7024 leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84 call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356 run_timer_softirq+0x26 [kernel]
                ....................
                0xffffffff9c48fc67 start_kernel+0x4a3 [kernel]
                0xffffffff99e00159 secondary_startup_64_no_verify+0xe4 [kernel]
        454033408 bytes in 433 allocations from stack
                0xffffffff9a1ab368 __alloc_pages+0x188 [kernel]
                0xffffffff9a1ab368 __alloc_pages+0x188 [kernel]
                0xffffffff9a169059 __kmalloc_large_node+0x79 [kernel]
                0xffffffff9a1695d9 kmalloc_large+0x19 [kernel]
                0xffffffffc0cd7024 leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84 call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356 run_timer_softirq+0x26 [kernel]
                0xffffffff9aa978d7 __do_softirq+0xc7 [kernel]
                ....................
                0xffffffff9c48fc67 start_kernel+0x4a3 [kernel]
                0xffffffff99e00159 secondary_startup_64_no_verify+0xe4 [kernel]
        455081984 bytes in 434 allocations from stack
                0xffffffff9a169648 kmalloc_large+0x88 [kernel]
                0xffffffff9a169648 kmalloc_large+0x88 [kernel]
                0xffffffffc0cd7024 leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84 call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356 run_timer_softirq+0x26 [kernel]
                0xffffffff9aa978d7 __do_softirq+0xc7 [kernel]
                0xffffffff99f10df1 __irq_exit_rcu+0xa1 [kernel]
                ....................
                0xffffffff9c48fc67 start_kernel+0x4a3 [kernel]
                0xffffffff99e00159 secondary_startup_64_no_verify+0xe4 [kernel]
[19:33:46] Top 5 stacks with outstanding allocations:
        5242880 bytes in 5 allocations from stack
                0xffffffff9a169648 kmalloc_large+0x88 [kernel]
                0xffffffff9a169648 kmalloc_large+0x88 [kernel]
                0xffffffffc0cd7024 leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84 call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356 run_timer_softirq+0x26 [kernel]
                0xffffffff9aa978d7 __do_softirq+0xc7 [kernel]
                0xffffffff99f10df1 __irq_exit_rcu+0xa1 [kernel]
                0xffffffff9aa81ad0 common_interrupt+0x80 [kernel]
                ....................
                0xffffffff9c48fc67 start_kernel+0x4a3 [kernel]
                0xffffffff99e00159 secondary_startup_64_no_verify+0xe4 [kernel]
        5242880 bytes in 5 allocations from stack
                0xffffffff9a1ab368 __alloc_pages+0x188 [kernel]
                0xffffffff9a1ab368 __alloc_pages+0x188 [kernel]
                0xffffffff9a169059 __kmalloc_large_node+0x79 [kernel]
                0xffffffff9a1695d9 kmalloc_large+0x19 [kernel]
                0xffffffffc0cd7024 leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84 call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356 run_timer_softirq+0x26 [kernel]
                0xffffffff9aa978d7 __do_softirq+0xc7 [kernel]
                0xffffffff99f10df1 __irq_exit_rcu+0xa1 [kernel]
                ....................
                0xffffffff9c48fc67 start_kernel+0x4a3 [kernel]
                0xffffffff99e00159 secondary_startup_64_no_verify+0xe4 [kernel]
        16777216 bytes in 8 allocations from stack
                0xffffffff9a1ab368 __alloc_pages+0x188 [kernel]
                0xffffffff9a1ab368 __alloc_pages+0x188 [kernel]
                0xffffffff9a1abb47 __folio_alloc+0x17 [kernel]
                0xffffffff9a1d0401 vma_alloc_folio+0x281 [kernel]
                0xffffffff9a1f1006 do_huge_pmd_anonymous_page+0xb6 [kernel]
                0xffffffff9a1843c1 __handle_mm_fault+0x661 [kernel]
                0xffffffff9a1844ad handle_mm_fault+0xcd [kernel]
                0xffffffff99e8ac94 do_user_addr_fault+0x1b4 [kernel]
                0xffffffff9aa84ab2 exc_page_fault+0x62 [kernel]
                0xffffffff9ac00bc2 asm_exc_page_fault+0x22 [kernel]
        929038336 bytes in 886 allocations from stack
                0xffffffff9a1ab368 __alloc_pages+0x188 [kernel]
                0xffffffff9a1ab368 __alloc_pages+0x188 [kernel]
                0xffffffff9a169059 __kmalloc_large_node+0x79 [kernel]
                0xffffffff9a1695d9 kmalloc_large+0x19 [kernel]
                0xffffffffc0cd7024 leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84 call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356 run_timer_softirq+0x26 [kernel]
                ....................
                0xffffffff9c48fc67 start_kernel+0x4a3 [kernel]
                0xffffffff99e00159 secondary_startup_64_no_verify+0xe4 [kernel]
        931135488 bytes in 888 allocations from stack
                0xffffffff9a169648 kmalloc_large+0x88 [kernel]
                0xffffffff9a169648 kmalloc_large+0x88 [kernel]
                0xffffffffc0cd7024 leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84 call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356 run_timer_softirq+0x26 [kernel]
                ....................
                0xffffffff9c48f766 arch_call_rest_init+0xa [kernel]
                0xffffffff9c48fc67 start_kernel+0x4a3 [kernel]
                0xffffffff99e00159 secondary_startup_64_no_verify+0xe4 [kernel]
^C┌──[root@liruilongs.github.io]-[/usr/share/bcc/tools]
└─$
```
The key frame that recurs throughout the output, leak_timer_callback, is the leak point:

```
0xffffffffc0cd7024 leak_timer_callback+0x14 [memory_leak]
```

The [memory_leak] tag tells us exactly which kernel module is responsible. The complete call stack:
```
0xffffffff9a1ab368 __alloc_pages+0x188 [kernel]
0xffffffff9a1ab368 __alloc_pages+0x188 [kernel]
0xffffffff9a169059 __kmalloc_large_node+0x79 [kernel]
0xffffffff9a1695d9 kmalloc_large+0x19 [kernel]
0xffffffffc0cd7024 leak_timer_callback+0x14 [memory_leak]
0xffffffff99fc4f84 call_timer_fn+0x24 [kernel]
```
This maps directly onto the leaking function in the memory_leak.c module code, static void leak_timer_callback(struct timer_list *t): every timer expiry (every 10 ms) allocates 1 MB via kmalloc(1024 * 1024, GFP_KERNEL) and never frees it.
First sample (19:33:40): the leaked memory comes through the kernel page allocator, which confirms what an earlier post covered; for large allocations, kmalloc goes straight to the page allocator for physically contiguous pages:
```
__alloc_pages+0x188 [kernel]
kmalloc_large+0x19 [kernel]
```
When the test is done, be sure to unload the module, or the machine will OOM before long. dmesg shows the module's allocation log:
```
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$ rmmod memory_leak
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$dmesg -T | head -10
[Tue Jun  3 19:32:24 2025] Memory leak module loaded
[Tue Jun  3 19:32:25 2025] Leaked memory 0 at 000000002eeb6860
[Tue Jun  3 19:32:25 2025] Leaked memory 1 at 0000000094c9050b
[Tue Jun  3 19:32:25 2025] Leaked memory 2 at 00000000733807b5
[Tue Jun  3 19:32:25 2025] Leaked memory 3 at 0000000002beb3fc
[Tue Jun  3 19:32:25 2025] Leaked memory 4 at 0000000003eb80f2
[Tue Jun  3 19:32:25 2025] Leaked memory 5 at 00000000ab0f3c38
[Tue Jun  3 19:32:25 2025] Leaked memory 6 at 0000000061366508
[Tue Jun  3 19:32:25 2025] Leaked memory 7 at 0000000093260330
[Tue Jun  3 19:32:25 2025] Leaked memory 8 at 00000000f4913b97
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$
```
The drain is also easy to see with the free command: the free column shrinks steadily, at roughly 0.1 GiB per second, which matches the module's 1 MB-per-10 ms leak rate. Note that kmalloc hands out physically backed memory, so used really does grow by the same amount; it only looks flat because free -h rounds to three significant digits (12Gi hides a 0.1 GiB change that 3.2Gi to 3.1Gi shows clearly):
```
┌──[root@liruilongs.github.io]-[~]
└─$free -s 1 -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi        11Gi       3.2Gi        14Mi       485Mi       3.4Gi
Swap:          2.0Gi          0B       2.0Gi
               total        used        free      shared  buff/cache   available
Mem:            15Gi        12Gi       3.1Gi        14Mi       485Mi       3.3Gi
Swap:          2.0Gi          0B       2.0Gi
               total        used        free      shared  buff/cache   available
Mem:            15Gi        12Gi       3.0Gi        14Mi       485Mi       3.2Gi
Swap:          2.0Gi          0B       2.0Gi
               total        used        free      shared  buff/cache   available
Mem:            15Gi        12Gi       2.9Gi        14Mi       485Mi       3.1Gi
Swap:          2.0Gi          0B       2.0Gi
               total        used        free      shared  buff/cache   available
Mem:            15Gi        12Gi       2.8Gi        14Mi       485Mi       3.0Gi
Swap:          2.0Gi          0B       2.0Gi
               total        used        free      shared  buff/cache   available
Mem:            15Gi        12Gi       2.7Gi        14Mi       485Mi       2.9Gi
Swap:          2.0Gi          0B       2.0Gi
               total        used        free      shared  buff/cache   available
Mem:            15Gi        12Gi       2.6Gi        14Mi       486Mi       2.8Gi
Swap:          2.0Gi          0B       2.0Gi
```
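A quick back-of-the-envelope check (plain arithmetic, nothing tool-specific) confirms the drain rate implied by the module against the free output above:

```python
# Expected leak rate: the timer callback allocates 1 MiB every 10 ms.
alloc_bytes = 1024 * 1024
period_s = 0.010
rate = alloc_bytes / period_s            # bytes per second
print(f"{rate / 2**20:.0f} MiB/s ~= {rate * 60 / 2**30:.1f} GiB/min")
# -> 100 MiB/s ~= 5.9 GiB/min, matching the ~0.1 GiB/s drop in 'free'
```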
Of course this is only a demo. Real leaks are usually far more complicated and have to be worked out by reading the call stacks alongside the code.
User-Mode Memory Leak Analysis

Java Memory Leak Analysis

Off-Heap (Direct) Memory

The JDK version used:
```
[developer@developer ~]$ java --show-version
openjdk 17.0.13 2024-10-15
OpenJDK Runtime Environment BiSheng (build 17.0.13+11)
OpenJDK 64-Bit Server VM BiSheng (build 17.0.13+11, mixed mode, sharing)
```
The test demo is a classic Java off-heap leak: it keeps allocating direct memory without ever releasing it, eventually exhausting memory.

Memory requested through ByteBuffer.allocateDirect() is off-heap direct memory. It is not managed by the JVM GC and must be released manually or via the Cleaner mechanism.

The demo code:
```java
[root@liruilongs.github.io ~]# cat memLeakDemo.java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class MemoryLeakJava {

    private static final List<Object> leakList = new ArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            ByteBuffer directBuffer = ByteBuffer.allocateDirect(10 * 1024 * 1024);
            leakList.add(directBuffer);
            Thread.sleep(1000);
        }
    }
}
[root@liruilongs.github.io ~]#
```
Run the demo above, then attach the BCC tool and trace it:
```
[root@liruilongs.github.io tools]
Attaching to pid 1722308, Ctrl+C to quit.
[16:44:22] Top 10 stacks with outstanding allocations:
[16:44:32] Top 10 stacks with outstanding allocations:
```
Flag reference:

- -p $(pgrep java): the process ID to trace
- -s 3: sample every third allocation
- -a: print the address and size of each outstanding allocation along with the full call stack, every 10 seconds
- -o 20000: only show allocations that have stayed unreleased for at least 20,000 ms (20 s), filtering out short-lived transient allocations
In the output above, the first two intervals fall inside that 20 s window, so no allocation/free events or call stacks are printed yet. The third interval, shown below, records the blocks still alive 20 s after allocation, with their addresses and sizes: six allocations in total, the two main stacks accounting for roughly 20 MB and 40 MB, about 60 MB outstanding altogether, each block around 10 MB (10485760 bytes ≈ 10 MB). These correspond exactly to the direct memory allocated above.
Beyond the raw block data, memleak also records the allocating call stack, which lets us pinpoint the method responsible directly:
```
[16:44:42] Top 10 stacks with outstanding allocations:
        addr = ffff0dbf6010 size = 10485760
        addr = ffff0a9f1010 size = 10485760
        addr = ffff0f9f9010 size = 10485760
        addr = ffff0b3f2010 size = 10485760
        addr = ffff0b3f2000 size = 10489856
        addr = ffff0f9f9000 size = 10489856
        20979712 bytes in 2 allocations from stack
                0x0000ffff96f2f730 [unknown] [libc.so.6]
                0x0000ffff96f303b4 [unknown] [libc.so.6]
                0x0000ffff96f31598 [unknown] [libc.so.6]
                0x0000ffff96f31e84 malloc+0xa4 [libc.so.6]
                0x0000fffffffff000 [unknown] [[uprobes]]
                0x0000ffff9695d798 Unsafe_AllocateMemory0+0x74 [libjvm.so]
                0x0000ffff78ef9b38 [unknown]
                0x0000ffff78ef5f04 [unknown]
                0x0000ffff78ef5f04 [unknown]
                0x0000ffff78ef6018 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef0140 [unknown]
                0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
                0x0000ffff96767a30 invoke(InstanceKlass*, methodHandle const&, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, JavaThread*) [clone .constprop.0]+0x4fc [libjvm.so]
                0x0000ffff967687e8 Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, JavaThread*)+0x148 [libjvm.so]
                0x0000ffff963ed5b4 JVM_InvokeMethod+0x124 [libjvm.so]
                0x0000ffff78ef9b38 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef6330 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef6018 [unknown]
                0x0000ffff78ef6018 [unknown]
                0x0000ffff78ef0140 [unknown]
                0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
                0x0000ffff963b3da4 jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) [clone .constprop.1]+0x1c4 [libjvm.so]
                0x0000ffff963b6878 jni_CallStaticVoidMethod+0x118 [libjvm.so]
                0x0000ffff9705434c JavaMain+0xc5c [libjli.so]
                0x0000ffff9705743c ThreadJavaMain+0xc [libjli.so]
                0x0000ffff96f22518 [unknown] [libc.so.6]
                0x0000ffff96f89d5c [unknown] [libc.so.6]
        41943040 bytes in 4 allocations from stack
                0x0000ffff966cf658 os::malloc(unsigned long, MEMFLAGS)+0xb8 [libjvm.so]
                0x0000ffff9695d798 Unsafe_AllocateMemory0+0x74 [libjvm.so]
                0x0000ffff78ef9b38 [unknown]
                0x0000ffff78ef5f04 [unknown]
                0x0000ffff78ef5f04 [unknown]
                0x0000ffff78ef6018 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef0140 [unknown]
                0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
                0x0000ffff96767a30 invoke(InstanceKlass*, methodHandle const&, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, JavaThread*) [clone .constprop.0]+0x4fc [libjvm.so]
                0x0000ffff967687e8 Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, JavaThread*)+0x148 [libjvm.so]
                0x0000ffff963ed5b4 JVM_InvokeMethod+0x124 [libjvm.so]
                0x0000ffff78ef9b38 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef6330 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef6018 [unknown]
                0x0000ffff78ef6018 [unknown]
                0x0000ffff78ef0140 [unknown]
                0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
                0x0000ffff963b3da4 jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) [clone .constprop.1]+0x1c4 [libjvm.so]
                0x0000ffff963b6878 jni_CallStaticVoidMethod+0x118 [libjvm.so]
                0x0000ffff9705434c JavaMain+0xc5c [libjli.so]
                0x0000ffff9705743c ThreadJavaMain+0xc [libjli.so]
                0x0000ffff96f22518 [unknown] [libc.so.6]
                0x0000ffff96f89d5c [unknown] [libc.so.6]
```
A quick read, taking the 10485760-byte blocks as the example. In the last stack in the output above, the first line printed is the innermost frame. The allocation goes through libjvm.so, the JVM's core implementation, which in turn calls glibc's malloc to obtain the memory:

```
├─ native memory allocation (the key leak point)
│  └─ Unsafe_AllocateMemory0 (libjvm.so)   // backs Java's Unsafe.allocateMemory()
│     └─ os::malloc (libjvm.so)            // the JVM's malloc wrapper
```
This stack leads straight to the corresponding allocation call in our code, which is the leak site:

```java
ByteBuffer directBuffer = ByteBuffer.allocateDirect(10 * 1024 * 1024);
```
How do we know? Constructing a DirectByteBuffer triggers Unsafe.allocateMemory; the JVM allocates through os::malloc and associates the returned address with the DirectByteBuffer object.

Part of the JDK source:
```java
DirectByteBuffer(int cap) {
    super(-1, 0, cap, cap);
    boolean pa = VM.isDirectMemoryPageAligned();
    int ps = Bits.pageSize();
    long size = Math.max(1L, (long)cap + (pa ? ps : 0));
    Bits.reserveMemory(size, cap);

    long base = 0;
    try {
        base = unsafe.allocateMemory(size);
    } catch (OutOfMemoryError x) {
        Bits.unreserveMemory(size, cap);
        throw x;
    }
    unsafe.setMemory(base, size, (byte) 0);
    if (pa && (base % ps != 0)) {
        address = base + ps - (base & (ps - 1));
    } else {
        address = base;
    }
    cleaner = Cleaner.create(this, new Deallocator(base, size, cap));
    att = null;
    ...................
```
One detail the attentive reader may have spotted: the blocks are all roughly 10 MB, yet the stacks are not the same. Two of the allocations bottom out directly in libc's malloc:
```
20979712 bytes in 2 allocations from stack
        0x0000ffff96f2f730 [unknown] [libc.so.6]
        0x0000ffff96f303b4 [unknown] [libc.so.6]
        0x0000ffff96f31598 [unknown] [libc.so.6]
        0x0000ffff96f31e84 malloc+0xa4 [libc.so.6]
        0x0000fffffffff000 [unknown] [[uprobes]]
        0x0000ffff9695d798 Unsafe_AllocateMemory0+0x74 [libjvm.so]
```
while the other four go through the JVM's os::malloc wrapper:
```
41943040 bytes in 4 allocations from stack
        0x0000ffff966cf658 os::malloc(unsigned long, MEMFLAGS)+0xb8 [libjvm.so]
        0x0000ffff9695d798 Unsafe_AllocateMemory0+0x74 [libjvm.so]
```
Why the difference? I searched quite a bit and could not find the reason; comments welcome ^_^. My guess was that each allocation actually happened once but fired two probe points, yet that does not explain why the counts differ.
Below is the fourth interval, 10 s later:
```
[16:44:52] Top 10 stacks with outstanding allocations:
        addr = ffff0dbf6010 size = 10485760
        addr = ffff063ea010 size = 10485760
        addr = ffff081ed010 size = 10485760
        addr = ffff04fe8010 size = 10485760
        addr = ffff06deb010 size = 10485760
        addr = ffff0a9f1010 size = 10485760
        addr = ffff08bee010 size = 10485760
        addr = ffff0f9f9010 size = 10485760
        addr = ffff095ef010 size = 10485760
        addr = ffff0b3f2010 size = 10485760
        addr = ffff081ed000 size = 10489856
        addr = ffff077ec000 size = 10489856
        addr = ffff0b3f2000 size = 10489856
        addr = ffff0f9f9000 size = 10489856
        addr = ffff04fe8000 size = 10489856
        addr = ffff08bee000 size = 10489856
        62939136 bytes in 6 allocations from stack
                0x0000ffff96f2f730 [unknown] [libc.so.6]
                0x0000ffff96f303b4 [unknown] [libc.so.6]
                0x0000ffff96f31598 [unknown] [libc.so.6]
                0x0000ffff96f31e84 malloc+0xa4 [libc.so.6]
                0x0000fffffffff000 [unknown] [[uprobes]]
                0x0000ffff9695d798 Unsafe_AllocateMemory0+0x74 [libjvm.so]
                0x0000ffff78ef9b38 [unknown]
                0x0000ffff78ef5f04 [unknown]
                0x0000ffff78ef5f04 [unknown]
                0x0000ffff78ef6018 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef0140 [unknown]
                0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
                0x0000ffff96767a30 invoke(InstanceKlass*, methodHandle const&, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, JavaThread*) [clone .constprop.0]+0x4fc [libjvm.so]
                0x0000ffff967687e8 Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, JavaThread*)+0x148 [libjvm.so]
                0x0000ffff963ed5b4 JVM_InvokeMethod+0x124 [libjvm.so]
                0x0000ffff78ef9b38 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef6330 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef6018 [unknown]
                0x0000ffff78ef6018 [unknown]
                0x0000ffff78ef0140 [unknown]
                0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
                0x0000ffff963b3da4 jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) [clone .constprop.1]+0x1c4 [libjvm.so]
                0x0000ffff963b6878 jni_CallStaticVoidMethod+0x118 [libjvm.so]
                0x0000ffff9705434c JavaMain+0xc5c [libjli.so]
                0x0000ffff9705743c ThreadJavaMain+0xc [libjli.so]
                0x0000ffff96f22518 [unknown] [libc.so.6]
                0x0000ffff96f89d5c [unknown] [libc.so.6]
        104857600 bytes in 10 allocations from stack
                0x0000ffff966cf658 os::malloc(unsigned long, MEMFLAGS)+0xb8 [libjvm.so]
                0x0000ffff9695d798 Unsafe_AllocateMemory0+0x74 [libjvm.so]
                0x0000ffff78ef9b38 [unknown]
                0x0000ffff78ef5f04 [unknown]
                0x0000ffff78ef5f04 [unknown]
                0x0000ffff78ef6018 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef0140 [unknown]
                0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
                0x0000ffff96767a30 invoke(InstanceKlass*, methodHandle const&, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, JavaThread*) [clone .constprop.0]+0x4fc [libjvm.so]
                0x0000ffff967687e8 Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, JavaThread*)+0x148 [libjvm.so]
                0x0000ffff963ed5b4 JVM_InvokeMethod+0x124 [libjvm.so]
                0x0000ffff78ef9b38 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef6330 [unknown]
                0x0000ffff78ef5db8 [unknown]
                0x0000ffff78ef6018 [unknown]
                0x0000ffff78ef6018 [unknown]
                0x0000ffff78ef0140 [unknown]
                0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
                0x0000ffff963b3da4 jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) [clone .constprop.1]+0x1c4 [libjvm.so]
                0x0000ffff963b6878 jni_CallStaticVoidMethod+0x118 [libjvm.so]
                0x0000ffff9705434c JavaMain+0xc5c [libjli.so]
                0x0000ffff9705743c ThreadJavaMain+0xc [libjli.so]
                0x0000ffff96f22518 [unknown] [libc.so.6]
                0x0000ffff96f89d5c [unknown] [libc.so.6]
```
The call stacks stay consistent across intervals, block sizes included, which confirms the leak point we identified above.
In practice, for most production memory incidents the first step is establishing whether a leak exists at all, and only then hunting for the leak site. There is also a simpler trick: directly compare the block addresses collected in each sample. If the set keeps growing across several intervals while very little is ever released, a leak is likely. Below are the blocks resident within one minute; a simple comparison settles the question, and here the trend is glaring. A small script for automating that comparison follows the listing.
```
[16:44:42] Top 10 stacks with outstanding allocations:
        addr = ffff0dbf6010 size = 10485760
        addr = ffff0a9f1010 size = 10485760
        addr = ffff0f9f9010 size = 10485760
        addr = ffff0b3f2010 size = 10485760
        addr = ffff0b3f2000 size = 10489856
        addr = ffff0f9f9000 size = 10489856
..............
[16:45:42] Top 10 stacks with outstanding allocations:
        addr = ffff34156aa0 size = 40
        addr = ffff031e5010 size = 10485760
        addr = fffee79b9010 size = 10485760
        addr = fffef37cc010 size = 10485760
        addr = fffefa5d7010 size = 10485760
        addr = ffff0dbf6010 size = 10485760
        addr = ffff063ea010 size = 10485760
        addr = ffff081ed010 size = 10485760
        addr = ffff04fe8010 size = 10485760
        addr = ffff06deb010 size = 10485760
        addr = fffefe1dd010 size = 10485760
        addr = fffeff5df010 size = 10485760
        addr = fffee51b5010 size = 10485760
        addr = ffff009e1010 size = 10485760
        addr = fffef5fd0010 size = 10485760
        addr = fffee65b7010 size = 10485760
        addr = fffefffe0010 size = 10485760
        addr = ffff0a9f1010 size = 10485760
        addr = fffeef1c5010 size = 10485760
        addr = ffff08bee010 size = 10485760
        addr = ffff0f9f9010 size = 10485760
        addr = fffefb9d9010 size = 10485760
        addr = ffff095ef010 size = 10485760
        addr = ffff0b3f2010 size = 10485760
        addr = fffeefbc6010 size = 10485760
        addr = ffff027e4010 size = 10485760
        addr = fffef4bce010 size = 10485760
        addr = fffeee7c4010 size = 10485760
        addr = fffef87d4010 size = 10485760
        addr = fffefffe0000 size = 10489856
        addr = fffeebfc0000 size = 10489856
        addr = fffef7dd3000 size = 10489856
        addr = ffff081ed000 size = 10489856
        addr = fffeeb5bf000 size = 10489856
        addr = ffff077ec000 size = 10489856
        addr = ffff0b3f2000 size = 10489856
        addr = fffee65b7000 size = 10489856
        addr = ffff0f9f9000 size = 10489856
        addr = fffef19c9000 size = 10489856
        addr = fffee97bc000 size = 10489856
        addr = fffef69d1000 size = 10489856
        addr = fffee79b9000 size = 10489856
        addr = ffff009e1000 size = 10489856
        addr = fffeee7c4000 size = 10489856
        addr = fffef55cf000 size = 10489856
        addr = fffefafd8000 size = 10489856
        addr = ffff04fe8000 size = 10489856
        addr = fffeefbc6000 size = 10489856
        addr = fffeed3c2000 size = 10489856
        addr = ffff08bee000 size = 10489856
.............
```
Every address from the first sample is still present a minute later, and dozens of new blocks have piled up on top of them; not one of the earlier allocations was released.
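To automate that comparison, here is a minimal sketch (a hypothetical helper, not part of memleak): save two -a listings to files, then diff the "addr = ... size = ..." lines to count the blocks that persisted, appeared, and disappeared between samples:

```python
import re
import sys

# Matches memleak -a lines such as: "addr = ffff0dbf6010 size = 10485760"
ADDR_RE = re.compile(r"addr\s*=\s*([0-9a-f]+)\s+size\s*=\s*(\d+)")

def read_blocks(path):
    """Parse outstanding blocks from a saved memleak -a capture."""
    with open(path) as f:
        return {m.group(1): int(m.group(2)) for m in ADDR_RE.finditer(f.read())}

def main(old_path, new_path):
    old, new = read_blocks(old_path), read_blocks(new_path)
    added = new.keys() - old.keys()
    freed = old.keys() - new.keys()
    growth = sum(new[a] for a in added) - sum(old[a] for a in freed)
    print(f"persisted: {len(old.keys() & new.keys())}, "
          f"added: {len(added)}, freed: {len(freed)}, "
          f"net growth: {growth} bytes")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

A steadily positive net growth across successive captures, with almost nothing freed, is the signature of a leak.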
On-Heap Memory

Now the other case: tracing heap memory. The memory allocated by the demo above was not managed by the Java heap; a small change moves the allocation from native memory into the Java heap.
```java
[root@liruilongs.github.io ~]# cat memLeakDemo.java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class MemoryLeakJava {

    private static final List<Object> leakList = new ArrayList<>();
    private static final Random random = new Random();

    static class DataObject {
        private byte[] data;

        public DataObject(int sizeInBytes) {
            this.data = new byte[sizeInBytes];
            random.nextBytes(this.data);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            DataObject data = new DataObject(10 * 1024 * 1024);
            leakList.add(data);
            Thread.sleep(1000);
        }
    }
}
[root@liruilongs.github.io ~]#
```
Each DataObject created holds about 10 MB. On the JVM heap, an object that large effectively goes straight into the old generation (under G1 it is a humongous allocation). Below is a trace using the same memleak command:
```
[root@liruilongs.github.io tools]
Attaching to pid 1743032, Ctrl+C to quit.
[19:29:29] Top 3 stacks with outstanding allocations:
[19:29:39] Top 3 stacks with outstanding allocations:
[19:29:49] Top 3 stacks with outstanding allocations:
        addr = ffff4041d4f0 size = 26
        addr = ffff40401ed0 size = 26
        addr = ffff403ec3d0 size = 26
        addr = ffff40402490 size = 26
        addr = ffff404189c0 size = 26
        ..........................................
        addr = ffff6455e000 size = 4096
        addr = ffff648a4000 size = 4096
        addr = ffff648b4000 size = 4096
        addr = ffff64581000 size = 4096
        addr = ffff64221000 size = 4096
        addr = ffff648bd000 size = 4096
        addr = ffff648c1000 size = 4096
        addr = ffff648a7000 size = 4096
        addr = ffff6237b000 size = 294912
        addr = ffff6249f000 size = 311296
        addr = fd700000 size = 18874368
        addr = fa000000 size = 18874368
        86016 bytes in 21 allocations from stack
                0x0000ffff80f38534 os::pd_commit_memory_or_exit(char*, unsigned long, unsigned long, bool, char const*)+0x44 [libjvm.so]
                0x0000ffff80f31edc os::commit_memory_or_exit(char*, unsigned long, unsigned long, bool, char const*)+0x38 [libjvm.so]
                0x0000ffff80a80608 G1PageBasedVirtualSpace::commit(unsigned long, unsigned long)+0x178 [libjvm.so]
                0x0000ffff80a955bc G1RegionsSmallerThanCommitSizeMapper::commit_regions(unsigned int, unsigned long, WorkGang*)+0x9c [libjvm.so]
                0x0000ffff80b32cd4 HeapRegionManager::commit_regions(unsigned int, unsigned long, WorkGang*)+0xb4 [libjvm.so]
                0x0000ffff80b34b94 HeapRegionManager::expand(unsigned int, unsigned int, WorkGang*)+0x34 [libjvm.so]
                0x0000ffff80b34da0 HeapRegionManager::expand_by(unsigned int, WorkGang*)+0x70 [libjvm.so]
                0x0000ffff80a378bc G1CollectedHeap::expand(unsigned long, WorkGang*, double*)+0x10c [libjvm.so]
                0x0000ffff80a3a788 G1CollectedHeap::resize_heap_if_necessary()+0x58 [libjvm.so]
                0x0000ffff80a48090 G1ConcurrentMark::remark()+0x3a0 [libjvm.so]
                0x0000ffff80abad54 VM_G1PauseConcurrent::doit()+0x164 [libjvm.so]
                0x0000ffff8120fe58 VM_Operation::evaluate()+0xe8 [libjvm.so]
                0x0000ffff81211a3c VMThread::evaluate_operation(VM_Operation*)+0xfc [libjvm.so]
                0x0000ffff81211f88 VMThread::inner_execute(VM_Operation*)+0x1f8 [libjvm.so]
                0x0000ffff812122f8 VMThread::run()+0xd4 [libjvm.so]
                0x0000ffff81193434 Thread::call_run()+0xc4 [libjvm.so]
                0x0000ffff80f3b7ac thread_native_entry(Thread*)+0xdc [libjvm.so]
                0x0000ffff81782518 [unknown] [libc.so.6]
                0x0000ffff817e9d5c [unknown] [libc.so.6]
        606208 bytes in 2 allocations from stack
                0x0000ffff80f38534 os::pd_commit_memory_or_exit(char*, unsigned long, unsigned long, bool, char const*)+0x44 [libjvm.so]
                0x0000ffff80f31edc os::commit_memory_or_exit(char*, unsigned long, unsigned long, bool, char const*)+0x38 [libjvm.so]
                0x0000ffff80a80608 G1PageBasedVirtualSpace::commit(unsigned long, unsigned long)+0x178 [libjvm.so]
                0x0000ffff80a952b4 G1RegionsLargerThanCommitSizeMapper::commit_regions(unsigned int, unsigned long, WorkGang*)+0x194 [libjvm.so]
                0x0000ffff80b32cb8 HeapRegionManager::commit_regions(unsigned int, unsigned long, WorkGang*)+0x98 [libjvm.so]
                0x0000ffff80b34b94 HeapRegionManager::expand(unsigned int, unsigned int, WorkGang*)+0x34 [libjvm.so]
                0x0000ffff80b34da0 HeapRegionManager::expand_by(unsigned int, WorkGang*)+0x70 [libjvm.so]
                0x0000ffff80a378bc G1CollectedHeap::expand(unsigned long, WorkGang*, double*)+0x10c [libjvm.so]
                0x0000ffff80a3a788 G1CollectedHeap::resize_heap_if_necessary()+0x58 [libjvm.so]
                0x0000ffff80a48090 G1ConcurrentMark::remark()+0x3a0 [libjvm.so]
                0x0000ffff80abad54 VM_G1PauseConcurrent::doit()+0x164 [libjvm.so]
                0x0000ffff8120fe58 VM_Operation::evaluate()+0xe8 [libjvm.so]
                0x0000ffff81211a3c VMThread::evaluate_operation(VM_Operation*)+0xfc [libjvm.so]
                0x0000ffff81211f88 VMThread::inner_execute(VM_Operation*)+0x1f8 [libjvm.so]
                0x0000ffff812122f8 VMThread::run()+0xd4 [libjvm.so]
                0x0000ffff81193434 Thread::call_run()+0xc4 [libjvm.so]
                0x0000ffff80f3b7ac thread_native_entry(Thread*)+0xdc [libjvm.so]
                0x0000ffff81782518 [unknown] [libc.so.6]
                0x0000ffff817e9d5c [unknown] [libc.so.6]
        37748736 bytes in 2 allocations from stack
                0x0000ffff80f38534 os::pd_commit_memory_or_exit(char*, unsigned long, unsigned long, bool, char const*)+0x44 [libjvm.so]
                0x0000ffff80f31edc os::commit_memory_or_exit(char*, unsigned long, unsigned long, bool, char const*)+0x38 [libjvm.so]
                0x0000ffff80a80608 G1PageBasedVirtualSpace::commit(unsigned long, unsigned long)+0x178 [libjvm.so]
                0x0000ffff80a952b4 G1RegionsLargerThanCommitSizeMapper::commit_regions(unsigned int, unsigned long, WorkGang*)+0x194 [libjvm.so]
                0x0000ffff80b32c80 HeapRegionManager::commit_regions(unsigned int, unsigned long, WorkGang*)+0x60 [libjvm.so]
                0x0000ffff80b34b94 HeapRegionManager::expand(unsigned int, unsigned int, WorkGang*)+0x34 [libjvm.so]
                0x0000ffff80b34da0 HeapRegionManager::expand_by(unsigned int, WorkGang*)+0x70 [libjvm.so]
                0x0000ffff80a378bc G1CollectedHeap::expand(unsigned long, WorkGang*, double*)+0x10c [libjvm.so]
                0x0000ffff80a3a788 G1CollectedHeap::resize_heap_if_necessary()+0x58 [libjvm.so]
                0x0000ffff80a48090 G1ConcurrentMark::remark()+0x3a0 [libjvm.so]
                0x0000ffff80abad54 VM_G1PauseConcurrent::doit()+0x164 [libjvm.so]
                0x0000ffff8120fe58 VM_Operation::evaluate()+0xe8 [libjvm.so]
                0x0000ffff81211a3c VMThread::evaluate_operation(VM_Operation*)+0xfc [libjvm.so]
                0x0000ffff81211f88 VMThread::inner_execute(VM_Operation*)+0x1f8 [libjvm.so]
                0x0000ffff812122f8 VMThread::run()+0xd4 [libjvm.so]
                0x0000ffff81193434 Thread::call_run()+0xc4 [libjvm.so]
                0x0000ffff80f3b7ac thread_native_entry(Thread*)+0xdc [libjvm.so]
                0x0000ffff81782518 [unknown] [libc.so.6]
                0x0000ffff817e9d5c [unknown] [libc.so.6]
[19:29:59] Top 3 stacks with outstanding allocations:
        addr = ffff4041d4f0 size = 26
        addr = ffff40401ed0 size = 26
        addr = ffff403ec3d0 size = 26
        addr = ffff40402490 size = 26
        addr = ffff404189c0 size = 26
        ........................
```
The static list leakList holds strong references to every data instance, so the heap objects can never be reclaimed, and G1 keeps expanding the heap. That much can be read from the stacks above:

```
G1CollectedHeap::expand → HeapRegionManager::commit_regions → os::commit_memory
```

But what happens inside the Java heap itself is invisible from here.
So for on-heap memory, the BCC memleak tool cannot trace the allocations; you need the Java ecosystem's own tooling instead, such as VisualVM, jstat, jmap, jconsole, or JProfiler.
Below is the GC-visualization plugin in VisualVM, showing GC monitoring data typical of a memory leak. If you are interested, see my earlier post:
https://mp.weixin.qq.com/s/Eoda9WACipRtaRFS-0G6Rw
Python Memory Leak Analysis

Below is a Python memory-leak demo:
```python
[root@liruilongs.github.io ~]
import time
import os
import psutil

leaked_objects = []

def leak_memory(step=1024 * 1024):
    """Keep allocating memory while holding on to the references."""
    while True:
        large_data = bytearray(step)
        leaked_objects.append(large_data)
        time.sleep(1)

if __name__ == "__main__":
    print(f"Python process PID: {os.getpid()}")
    leak_memory()
[root@liruilongs.github.io ~]
```
Monitoring it with the same command, you can see that some libc.so.6 frames are [unknown], along with frames from the interpreter library libpython3.9.so.1.0. So for Python, too, memleak cannot map allocations directly back to the allocating source line. But because our demo is so simple, it is still plain to see that all of the leaked memory comes from call stacks through the PyByteArray_Resize function in libpython3.9.so.1.0:
```
[root@liruilongs.github.io tools]
Attaching to pid 1757033, Ctrl+C to quit.
[20:22:00] Top 10 stacks with outstanding allocations:
[20:22:10] Top 10 stacks with outstanding allocations:
[20:22:20] Top 10 stacks with outstanding allocations:
        addr = ffffaeab4010 size = 1048577
        addr = ffffae2ac000 size = 1052672
        addr = ffffaeab4000 size = 1052672
        addr = ffffae4ae000 size = 1052672
        addr = ffffae8b2000 size = 1052672
        1048577 bytes in 1 allocations from stack
                0x0000ffffba410000 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba3bc8ec PyByteArray_Resize+0xcc [libpython3.9.so.1.0]
                0x0000ffffba3bcbb8 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba41fe00 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba3c9278 _PyObject_MakeTpCall+0x98 [libpython3.9.so.1.0]
                0x0000ffffba38d9f8 _PyEval_EvalFrameDefault+0x5b98 [libpython3.9.so.1.0]
                0x0000ffffba387068 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba38d428 _PyEval_EvalFrameDefault+0x55c8 [libpython3.9.so.1.0]
                0x0000ffffba48f7e4 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba48fc44 _PyEval_EvalCodeWithName+0x64 [libpython3.9.so.1.0]
                0x0000ffffba48fc90 PyEval_EvalCodeEx+0x40 [libpython3.9.so.1.0]
                0x0000ffffba48fccc PyEval_EvalCode+0x2c [libpython3.9.so.1.0]
                0x0000ffffba4cafbc [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cb1d4 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cdcd0 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cde84 PyRun_SimpleFileExFlags+0x124 [libpython3.9.so.1.0]
                0x0000ffffba4e8a20 Py_RunMain+0x5f0 [libpython3.9.so.1.0]
                0x0000ffffba4e8f10 Py_BytesMain+0x5c [libpython3.9.so.1.0]
                0x0000ffffba08b000 [unknown] [libc.so.6]
                0x0000ffffba08b0d8 __libc_start_main+0x94 [libc.so.6]
                0x0000aaaada4b08b0 _start+0x30 [python3.9]
        4210688 bytes in 4 allocations from stack
                0x0000ffffba0ef730 [unknown] [libc.so.6]
                0x0000ffffba0f03b4 [unknown] [libc.so.6]
                0x0000ffffba0f1598 [unknown] [libc.so.6]
                0x0000fffffffff000 [unknown] [[uprobes]]
                0x0000ffffba3bc8ec PyByteArray_Resize+0xcc [libpython3.9.so.1.0]
                0x0000ffffba3bcbb8 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba41fe00 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba3c9278 _PyObject_MakeTpCall+0x98 [libpython3.9.so.1.0]
                0x0000ffffba38d9f8 _PyEval_EvalFrameDefault+0x5b98 [libpython3.9.so.1.0]
                0x0000ffffba387068 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba38d428 _PyEval_EvalFrameDefault+0x55c8 [libpython3.9.so.1.0]
                0x0000ffffba48f7e4 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba48fc44 _PyEval_EvalCodeWithName+0x64 [libpython3.9.so.1.0]
                0x0000ffffba48fc90 PyEval_EvalCodeEx+0x40 [libpython3.9.so.1.0]
                0x0000ffffba48fccc PyEval_EvalCode+0x2c [libpython3.9.so.1.0]
                0x0000ffffba4cafbc [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cb1d4 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cdcd0 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cde84 PyRun_SimpleFileExFlags+0x124 [libpython3.9.so.1.0]
                0x0000ffffba4e8a20 Py_RunMain+0x5f0 [libpython3.9.so.1.0]
                0x0000ffffba4e8f10 Py_BytesMain+0x5c [libpython3.9.so.1.0]
                0x0000ffffba08b000 [unknown] [libc.so.6]
                0x0000ffffba08b0d8 __libc_start_main+0x94 [libc.so.6]
                0x0000aaaada4b08b0 _start+0x30 [python3.9]
[20:22:30] Top 10 stacks with outstanding allocations:
        addr = ffffaeab4010 size = 1048577
        addr = ffffadca6010 size = 1048577
        addr = ffffae2ac000 size = 1052672
        addr = ffffadda7000 size = 1052672
        addr = ffffaeab4000 size = 1052672
        addr = ffffae4ae000 size = 1052672
        addr = ffffad7a1000 size = 1052672
        addr = ffffae8b2000 size = 1052672
        addr = ffffadea8000 size = 1052672
        addr = ffffadca6000 size = 1052672
        2097154 bytes in 2 allocations from stack
                0x0000ffffba410000 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba3bc8ec PyByteArray_Resize+0xcc [libpython3.9.so.1.0]
                0x0000ffffba3bcbb8 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba41fe00 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba3c9278 _PyObject_MakeTpCall+0x98 [libpython3.9.so.1.0]
                0x0000ffffba38d9f8 _PyEval_EvalFrameDefault+0x5b98 [libpython3.9.so.1.0]
                0x0000ffffba387068 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba38d428 _PyEval_EvalFrameDefault+0x55c8 [libpython3.9.so.1.0]
                0x0000ffffba48f7e4 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba48fc44 _PyEval_EvalCodeWithName+0x64 [libpython3.9.so.1.0]
                0x0000ffffba48fc90 PyEval_EvalCodeEx+0x40 [libpython3.9.so.1.0]
                0x0000ffffba48fccc PyEval_EvalCode+0x2c [libpython3.9.so.1.0]
                0x0000ffffba4cafbc [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cb1d4 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cdcd0 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cde84 PyRun_SimpleFileExFlags+0x124 [libpython3.9.so.1.0]
                0x0000ffffba4e8a20 Py_RunMain+0x5f0 [libpython3.9.so.1.0]
                0x0000ffffba4e8f10 Py_BytesMain+0x5c [libpython3.9.so.1.0]
                0x0000ffffba08b000 [unknown] [libc.so.6]
                0x0000ffffba08b0d8 __libc_start_main+0x94 [libc.so.6]
                0x0000aaaada4b08b0 _start+0x30 [python3.9]
        8421376 bytes in 8 allocations from stack
                0x0000ffffba0ef730 [unknown] [libc.so.6]
                0x0000ffffba0f03b4 [unknown] [libc.so.6]
                0x0000ffffba0f1598 [unknown] [libc.so.6]
                0x0000fffffffff000 [unknown] [[uprobes]]
                0x0000ffffba3bc8ec PyByteArray_Resize+0xcc [libpython3.9.so.1.0]
                0x0000ffffba3bcbb8 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba41fe00 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba3c9278 _PyObject_MakeTpCall+0x98 [libpython3.9.so.1.0]
                0x0000ffffba38d9f8 _PyEval_EvalFrameDefault+0x5b98 [libpython3.9.so.1.0]
                0x0000ffffba387068 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba38d428 _PyEval_EvalFrameDefault+0x55c8 [libpython3.9.so.1.0]
                0x0000ffffba48f7e4 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba48fc44 _PyEval_EvalCodeWithName+0x64 [libpython3.9.so.1.0]
                0x0000ffffba48fc90 PyEval_EvalCodeEx+0x40 [libpython3.9.so.1.0]
                0x0000ffffba48fccc PyEval_EvalCode+0x2c [libpython3.9.so.1.0]
                0x0000ffffba4cafbc [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cb1d4 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cdcd0 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cde84 PyRun_SimpleFileExFlags+0x124 [libpython3.9.so.1.0]
                0x0000ffffba4e8a20 Py_RunMain+0x5f0 [libpython3.9.so.1.0]
                0x0000ffffba4e8f10 Py_BytesMain+0x5c [libpython3.9.so.1.0]
                0x0000ffffba08b000 [unknown] [libc.so.6]
                0x0000ffffba08b0d8 __libc_start_main+0x94 [libc.so.6]
                0x0000aaaada4b08b0 _start+0x30 [python3.9]
[20:22:40] Top 10 stacks with outstanding allocations:
        addr = ffffad49e010 size = 1048577
        addr = ffffaeab4010 size = 1048577
        addr = ffffad29c010 size = 1048577
        addr = ffffadca6010 size = 1048577
        addr = ffffad59f010 size = 1048577
        addr = ffffad6a0010 size = 1048577
        addr = ffffad19b010 size = 1048577
        addr = ffffae2ac000 size = 1052672
        addr = ffffadda7000 size = 1052672
        addr = ffffaeab4000 size = 1052672
        addr = ffffae4ae000 size = 1052672
        addr = ffffacd97000 size = 1052672
        addr = ffffad39d000 size = 1052672
        addr = ffffad59f000 size = 1052672
        addr = ffffad09a000 size = 1052672
        addr = ffffad7a1000 size = 1052672
        addr = ffffae8b2000 size = 1052672
        addr = ffffacf99000 size = 1052672
        addr = ffffadea8000 size = 1052672
        addr = ffffadca6000 size = 1052672
        addr = ffffad49e000 size = 1052672
        7340039 bytes in 7 allocations from stack
                0x0000ffffba410000 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba3bc8ec PyByteArray_Resize+0xcc [libpython3.9.so.1.0]
                0x0000ffffba3bcbb8 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba41fe00 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba3c9278 _PyObject_MakeTpCall+0x98 [libpython3.9.so.1.0]
                0x0000ffffba38d9f8 _PyEval_EvalFrameDefault+0x5b98 [libpython3.9.so.1.0]
                0x0000ffffba387068 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba38d428 _PyEval_EvalFrameDefault+0x55c8 [libpython3.9.so.1.0]
                0x0000ffffba48f7e4 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba48fc44 _PyEval_EvalCodeWithName+0x64 [libpython3.9.so.1.0]
                0x0000ffffba48fc90 PyEval_EvalCodeEx+0x40 [libpython3.9.so.1.0]
                0x0000ffffba48fccc PyEval_EvalCode+0x2c [libpython3.9.so.1.0]
                0x0000ffffba4cafbc [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cb1d4 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cdcd0 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cde84 PyRun_SimpleFileExFlags+0x124 [libpython3.9.so.1.0]
                0x0000ffffba4e8a20 Py_RunMain+0x5f0 [libpython3.9.so.1.0]
                0x0000ffffba4e8f10 Py_BytesMain+0x5c [libpython3.9.so.1.0]
                0x0000ffffba08b000 [unknown] [libc.so.6]
                0x0000ffffba08b0d8 __libc_start_main+0x94 [libc.so.6]
                0x0000aaaada4b08b0 _start+0x30 [python3.9]
        14737408 bytes in 14 allocations from stack
                0x0000ffffba0ef730 [unknown] [libc.so.6]
                0x0000ffffba0f03b4 [unknown] [libc.so.6]
                0x0000ffffba0f1598 [unknown] [libc.so.6]
                0x0000fffffffff000 [unknown] [[uprobes]]
                0x0000ffffba3bc8ec PyByteArray_Resize+0xcc [libpython3.9.so.1.0]
                0x0000ffffba3bcbb8 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba41fe00 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba3c9278 _PyObject_MakeTpCall+0x98 [libpython3.9.so.1.0]
                0x0000ffffba38d9f8 _PyEval_EvalFrameDefault+0x5b98 [libpython3.9.so.1.0]
                0x0000ffffba387068 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba38d428 _PyEval_EvalFrameDefault+0x55c8 [libpython3.9.so.1.0]
                0x0000ffffba48f7e4 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba48fc44 _PyEval_EvalCodeWithName+0x64 [libpython3.9.so.1.0]
                0x0000ffffba48fc90 PyEval_EvalCodeEx+0x40 [libpython3.9.so.1.0]
                0x0000ffffba48fccc PyEval_EvalCode+0x2c [libpython3.9.so.1.0]
                0x0000ffffba4cafbc [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cb1d4 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cdcd0 [unknown] [libpython3.9.so.1.0]
                0x0000ffffba4cde84 PyRun_SimpleFileExFlags+0x124 [libpython3.9.so.1.0]
                0x0000ffffba4e8a20 Py_RunMain+0x5f0 [libpython3.9.so.1.0]
                0x0000ffffba4e8f10 Py_BytesMain+0x5c [libpython3.9.so.1.0]
                0x0000ffffba08b000 [unknown] [libc.so.6]
                0x0000ffffba08b0d8 __libc_start_main+0x94 [libc.so.6]
                0x0000aaaada4b08b0 _start+0x30 [python3.9]
............................................................
^C[root@liruilongs.github.io tools]
```
From the block sizes and the key path in the stacks, the leaked memory is the bytearray buffers themselves, allocated via PyByteArray_Resize from the script's main loop and kept alive by the leaked_objects list:

```
PyByteArray_Resize+0xcc
→ _PyObject_MakeTpCall+0x98
→ _PyEval_EvalFrameDefault+0x5b98
→ PyRun_SimpleFileExFlags+0x124
→ Py_RunMain+0x5f0
→ Py_BytesMain+0x5c
```
So Python, too, cannot really be traced at source level through memleak. In practice you would turn to Python's own memory tooling, such as tracemalloc, the standard-library memory tracer for monitoring and analyzing a Python program's allocation behavior; a short example follows.
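As a minimal sketch of that approach (standard-library API only; the allocation loop here just mimics the demo above rather than instrumenting it):

```python
import tracemalloc

tracemalloc.start(10)                 # keep up to 10 frames per allocation

leaked_objects = []
for _ in range(100):                  # mimic the leaking loop above
    leaked_objects.append(bytearray(1024 * 1024))

snapshot = tracemalloc.take_snapshot()
# Group allocations by source line; the leaking line dominates the output.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```

Unlike memleak, tracemalloc reports the exact Python source line doing the allocating, at the cost of running inside the process.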
C Memory Leak Analysis

As the walk through the script earlier showed, memleak instruments the standard user-space allocation functions and the kernel allocation tracepoints directly, so among user-space programs it suits projects written in C best. Let's look at a demo; if you have special needs, custom development is also an option, and comments are welcome.
```
┌──[root@liruilongs.github.io]-[~]
└─$vim memory_leak_demo.c
┌──[root@liruilongs.github.io]-[~]
└─$vim memory_leak_demo.c
┌──[root@liruilongs.github.io]-[~]
└─$gcc -g memory_leak_demo.c -o leak_demo
```
```c
┌──[root@liruilongs.github.io]-[~]
└─$cat memory_leak_demo.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>

static int allocation_count = 0;

void *allocate_memory(size_t size)
{
    void *ptr = malloc(size);
    if (ptr) {
        allocation_count++;
        time_t now = time(NULL);
        struct tm *tm_info = localtime(&now);
        char time_buf[20];
        strftime(time_buf, 20, "%Y-%m-%d %H:%M:%S", tm_info);

        printf("[%s] allocation #%d: %zu bytes at address %p\n",
               time_buf, allocation_count, size, ptr);
    } else {
        perror("memory allocation failed");
    }
    return ptr;
}

void memory_leak_demo()
{
    int *data_buffer = NULL;
    for (int i = 0; i < 1000; i++) {
        data_buffer = (int *)allocate_memory(1024 * 1024);
        if (data_buffer) {
            data_buffer[0] = i;
            printf("wrote value: %d\n", data_buffer[0]);
        }
        sleep(1);
    }
}

int main()
{
    printf("===== memory leak demo start =====\n");
    memory_leak_demo();
    printf("===== demo finished (leaked %d blocks of memory) =====\n",
           allocation_count);
    return 0;
}
```
Observe the memory problem with memleak; the output below shows
```
┌──[root@liruilongs.github.io]-[/usr/share/bcc/tools]
└─$./memleak -p $(pgrep leak_demo) --top 3 -s 3 -a 10 -o 20000
Attaching to pid 16369, Ctrl+C to quit.
[10:43:05] Top 3 stacks with outstanding allocations:
        addr = 7fb86c2b7010 size = 15
        addr = 7fb86ada2000 size = 1048576
        addr = 7fb86b3a8000 size = 1048576
        addr = 7fb86b7ac000 size = 1048576
        addr = 7fb86b6ab000 size = 1048576
        addr = 7fb86bbb0010 size = 1048576
        addr = 7fb86b8ad000 size = 1052672
        addr = 7fb86b1a6000 size = 1052672
        addr = 7fb86b5aa000 size = 1052672
        addr = 7fb86afa4000 size = 1052672
        addr = 7fb86b9ae010 size = 1052672
        addr = 7fb86baaf010 size = 1052672
        addr = 7fb86b0a5000 size = 1052672
        3153935 bytes in 4 allocations from stack
                0x00000000004011ae allocate_memory+0x18 [leak_demo]
                0x000000000040125f memory_leak_demo+0x23 [leak_demo]
                0x00000000004012bd main+0x18 [leak_demo]
                0x00007fb870c29590 __libc_start_call_main+0x80 [libc.so.6]
        9457664 bytes in 9 allocations from stack
                0x00007fb870c980dd sysmalloc+0x7ed [libc.so.6]
[10:43:18] Top 3 stacks with outstanding allocations:
        addr = 7fb86c2b7010 size = 15
        ....................................
                0x00007fb870c980dd sysmalloc+0x7ed [libc.so.6]
[10:43:28] Top 3 stacks with outstanding allocations:
        addr = 7fb86c2b7010 size = 15
        addr = 7fb86ada2000 size = 1048576
        addr = 7fb86b3a8000 size = 1048576
        addr = 7fb86b7ac000 size = 1048576
        addr = 7fb86a095000 size = 1048576
        addr = 7fb86a99e000 size = 1048576
        addr = 7fb869f94000 size = 1048576
        addr = 7fb86b6ab000 size = 1048576
        addr = 7fb86bbb0010 size = 1048576
        addr = 7fb86b8ad000 size = 1052672
        addr = 7fb86a79c000 size = 1052672
        addr = 7fb86a89d000 size = 1052672
        addr = 7fb86b1a6000 size = 1052672
        addr = 7fb86a398000 size = 1052672
        addr = 7fb86b5aa000 size = 1052672
        addr = 7fb86afa4000 size = 1052672
        addr = 7fb869e93000 size = 1052672
        addr = 7fb869c91000 size = 1052672
        addr = 7fb86a196000 size = 1052672
        addr = 7fb86b9ae010 size = 1052672
        addr = 7fb86baaf010 size = 1052672
        addr = 7fb86b0a5000 size = 1052672
        addr = 7fb869a8f000 size = 1052672
        addr = 7fb86a297000 size = 1052672
        3153935 bytes in 4 allocations from stack
                0x00000000004011ae allocate_memory+0x18 [leak_demo]
                0x000000000040125f memory_leak_demo+0x23 [leak_demo]
                0x00000000004012bd main+0x18 [leak_demo]
                0x00007fb870c29590 __libc_start_call_main+0x80 [libc.so.6]
        21024768 bytes in 20 allocations from stack
                0x00007fb870c980dd sysmalloc+0x7ed [libc.so.6]
[10:43:38] Top 3 stacks with outstanding allocations:
        addr = 7fb86c2b7010 size = 15
        addr = 7fb86ada2000 size = 1048576
        addr = 7fb86b3a8000 size = 1048576
        addr = 7fb86b7ac000 size = 1048576
        addr = 7fb86a095000 size = 1048576
        addr = 7fb86a99e000 size = 1048576
        addr = 7fb869f94000 size = 1048576
        addr = 7fb86b6ab000 size = 1048576
        addr = 7fb86bbb0010 size = 1048576
        addr = 7fb86b8ad000 size = 1052672
        addr = 7fb86a79c000 size = 1052672
        addr = 7fb86a89d000 size = 1052672
        addr = 7fb86958a000 size = 1052672
        addr = 7fb86b1a6000 size = 1052672
        addr = 7fb86a398000 size = 1052672
        addr = 7fb86b5aa000 size = 1052672
        addr = 7fb86afa4000 size = 1052672
        addr = 7fb869e93000 size = 1052672
        addr = 7fb869c91000 size = 1052672
        addr = 7fb86a196000 size = 1052672
        addr = 7fb869085000 size = 1052672
        addr = 7fb86b9ae010 size = 1052672
        addr = 7fb86baaf010 size = 1052672
        addr = 7fb86b0a5000 size = 1052672
        addr = 7fb869186000 size = 1052672
        addr = 7fb869a8f000 size = 1052672
        addr = 7fb86a297000 size = 1052672
        3153935 bytes in 4 allocations from stack
                0x00000000004011ae allocate_memory+0x18 [leak_demo]
                0x000000000040125f memory_leak_demo+0x23 [leak_demo]
                0x00000000004012bd main+0x18 [leak_demo]
                0x00007fb870c29590 __libc_start_call_main+0x80 [libc.so.6]
        24182784 bytes in 23 allocations from stack
                0x00007fb870c980dd sysmalloc+0x7ed [libc.so.6]
^C┌──[root@liruilongs.github.io]-[/usr/share/bcc/tools]
└─$
```
steadily growing unreleased blocks (e.g. the repeated allocations of 1052672 bytes, about 1 MB each), with the stack traces printed by memleak pointing at the allocate_memory+0x18 and memory_leak_demo+0x23 frames:
```
3153935 bytes in 4 allocations from stack
        0x00000000004011ae allocate_memory+0x18 [leak_demo]
        0x000000000040125f memory_leak_demo+0x23 [leak_demo]
        0x00000000004012bd main+0x18 [leak_demo]
        0x00007fb870c29590 __libc_start_call_main+0x80 [libc.so.6]
```
These are exactly the demo's calling function memory_leak_demo() and allocation function allocate_memory.
That wraps up this look at memory-leak analysis with the BCC memleak tool. Everything above is demo material, intended only to show how the tool is used; real-world analysis is far messier and always comes back to reading the call stacks against the code.
Parts of this post draw on the reference below © copyright of any linked content belongs to its original authors; please let me know of any infringement :)

《BPF Performance Tools》

© 2018-present liruilonger@gmail.com, licensed under CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike).