| # Debugging memory usage on Android |
| |
| In this guide, you'll learn how to: |
| |
| - Use `dumpsys meminfo` to get a high-level overview of memory usage. |
| - Understand the basics of Linux memory management. |
| - Use Perfetto to investigate memory usage over time. |
| - Analyze native heap profiles and Java heap dumps to identify memory leaks. |
| |
| ## Prerequisites |
| |
| - A host running macOS or Linux. |
| - [ADB](https://developer.android.com/studio/command-line/adb) installed and in |
| PATH. |
| - A device running Android 11+. |
| |
| If you are profiling your own app and are not running a userdebug build of |
| Android, your app needs to be marked as profileable or debuggable in its |
| manifest. See the |
| [heapprofd documentation](/docs/data-sources/native-heap-profiler.md#heapprofd-targets) |
| for more details on which applications can be targeted. |
| |
| ## dumpsys meminfo |
| |
| A good place to start investigating the memory usage of a process is |
| `dumpsys meminfo`, which gives a high-level overview of how much of each type |
| of memory the process is using. |
| |
| ```bash |
| $ adb shell dumpsys meminfo com.android.systemui |
| |
| Applications Memory Usage (in Kilobytes): |
| Uptime: 2030149 Realtime: 2030149 |
| |
| ** MEMINFO in pid 1974 [com.android.systemui] ** |
| Pss Private Private SwapPss Rss Heap Heap Heap |
| Total Dirty Clean Dirty Total Size Alloc Free |
| ------ ------ ------ ------ ------ ------ ------ ------ |
| Native Heap 16840 16804 0 6764 19428 34024 25037 5553 |
| Dalvik Heap 9110 9032 0 136 13164 36444 9111 27333 |
| |
| [more stuff...] |
| ``` |
| |
| Looking at the "Private Dirty" column of Dalvik Heap (= Java Heap) and Native |
| Heap, we can see that SystemUI uses about 9 MB on the Java heap and about |
| 17 MB on the native heap. |
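
If you want to track these numbers in a script, the two rows can be pulled out of the text output. The following is a minimal sketch, assuming the column layout of the sample output above (other Android releases may print different columns); `COLUMNS`, `SAMPLE` and `parse_heap_rows` are illustrative names, not part of any Android tool.

```python
import re

# Column order as printed in the sample output above (an assumption: the
# layout of `dumpsys meminfo` is not stable across Android releases).
COLUMNS = ["pss_total", "private_dirty", "private_clean", "swap_pss_dirty",
           "rss_total", "heap_size", "heap_alloc", "heap_free"]

# The two heap rows from the sample output, values in kilobytes.
SAMPLE = """\
  Native Heap    16840    16804        0     6764    19428    34024    25037     5553
  Dalvik Heap     9110     9032        0      136    13164    36444     9111    27333
"""

def parse_heap_rows(text):
    """Return {row name: {column name: value}} for the two heap rows."""
    rows = {}
    for line in text.splitlines():
        match = re.match(r"\s*(Native Heap|Dalvik Heap)((?:\s+\d+)+)\s*$", line)
        if match:
            values = [int(v) for v in match.group(2).split()]
            rows[match.group(1)] = dict(zip(COLUMNS, values))
    return rows

rows = parse_heap_rows(SAMPLE)
for name, columns in rows.items():
    print(f"{name}: {columns['private_dirty']} kB private dirty")
```

On a real device you would feed the function the output of `adb shell dumpsys meminfo <package>` instead of `SAMPLE`.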
| |
| ## Linux memory management |
| |
| But what do _clean_, _dirty_, _Rss_, _Pss_ and _Swap_ actually mean? To answer |
| this question, we need to delve into Linux memory management a bit. |
| |
| From the kernel's point of view, memory is split into equally sized blocks |
| called _pages_. These are generally 4 KiB. |
| |
| Pages are organized in virtually contiguous ranges called VMAs (Virtual Memory |
| Areas). |
| |
| VMAs are created when a process requests a new pool of memory pages through the |
| [mmap() system call](https://man7.org/linux/man-pages/man2/mmap.2.html). |
| Applications rarely call `mmap()` directly. Those calls are typically mediated |
| by the allocator (`malloc()` / `operator new()`) for native processes or by the |
| Android Runtime (ART) for Java apps. |
| |
| VMAs can be of two types: file-backed and anonymous. |
| |
| **File-backed VMAs** are a view of a file in memory. They are obtained by |
| passing a file descriptor to `mmap()`. The kernel will serve page faults on the |
| VMA through the passed file, so reading through a pointer into the VMA becomes |
| the equivalent of a `read()` on the file. File-backed VMAs are used, for |
| instance, by the dynamic linker (`ld`) when executing new processes or |
| dynamically loading libraries, or by the Android framework when loading a new |
| .dex library or accessing resources in the APK. |
| |
| **Anonymous VMAs** are memory-only areas not backed by any file. This is the way |
| allocators request dynamic memory from the kernel. Anonymous VMAs are obtained |
| by calling `mmap(... MAP_ANONYMOUS ...)`. |
| |
| Physical memory is only allocated, in page granularity, once the application |
| tries to read from or write to a VMA. If you allocate 32 MiB worth of pages but |
| only touch one byte, your process's memory usage will only go up by 4 KiB. You |
| will have increased your process's _virtual memory_ by 32 MiB, but its resident |
| _physical memory_ by only 4 KiB. |
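
The effect can be observed from user space. The sketch below is an illustration, not a benchmark: it assumes a Linux host with procfs (it reports nothing elsewhere), reads the resident size from `/proc/self/statm`, and maps 32 MiB of anonymous memory. The helper names `resident_bytes` and `measure` are made up for this example. Note that with transparent huge pages enabled, the first touch may fault in more than a single 4 KiB page.

```python
import mmap
import os

SIZE = 32 * 1024 * 1024  # 32 MiB of virtual address space

def resident_bytes():
    """Resident set size of this process, or None without Linux procfs."""
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
    except OSError:
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE")

def measure():
    """Return RSS growth after touching one byte and after touching all pages."""
    before = resident_bytes()
    if before is None:
        return None
    # Anonymous private mapping: virtual memory only, nothing resident yet.
    region = mmap.mmap(-1, SIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    region[0] = 1  # first write: the kernel backs only the touched page
    after_one = resident_bytes()
    for offset in range(0, SIZE, mmap.PAGESIZE):
        region[offset] = 1  # fault in every page of the mapping
    after_all = resident_bytes()
    region.close()
    return after_one - before, after_all - before

result = measure()
if result is not None:
    print(f"after touching one byte: {result[0] // 1024} KiB more resident")
    print(f"after touching all pages: {result[1] // 1024} KiB more resident")
```

The exact numbers vary from run to run because the interpreter allocates memory of its own, but the first figure should stay far below 32 MiB while the second should be close to it.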
| |
| When optimizing memory use of programs, we are interested in reducing their |
| footprint in _physical memory_. High _virtual memory_ use is generally not a |
| cause for concern on modern platforms (unless you run out of address space, |
| which is very hard to do on 64-bit systems). |
| |
| We call the amount of a process's memory that is resident in _physical memory_ |
| its **RSS** (Resident Set Size). Not all resident memory is equal though. |
| |
| From a memory-consumption viewpoint, individual pages within a VMA can have the |
| following states: |
| |
| - **Resident**: the page is mapped to a physical memory page. Resident pages can |
| be in two states: |
| - **Clean** (only for file-backed pages): the contents of the page are the |
| same as the contents on disk. The kernel can evict clean pages more easily |
| in case of memory pressure, because if they are needed again the kernel can |
| re-create their contents by reading them from the underlying file. |
| - **Dirty**: the contents of the page diverge from the disk or, in most |
| cases, the page has no disk backing (i.e. it's _anonymous_). Dirty pages |
| cannot be evicted because doing so would cause data loss. However, they can |
| be swapped out to disk or ZRAM, if present. |
| - **Swapped**: a dirty page can be written to the swap file on disk (on most |
| Linux desktop distributions) or compressed (on Android and CrOS through |
| [ZRAM](https://source.android.com/devices/tech/perf/low-ram#zram)). The page |
| will stay swapped until a new page fault on its virtual address happens, at |
| which point the kernel will bring it back into main memory. |
| - **Not present**: no page fault ever happened on the page or the page was clean |
| and later was evicted. |
| |
| It is generally more important to reduce the amount of _dirty_ memory, as it |
| cannot be reclaimed like _clean_ memory and, on Android, even if swapped out to |
| ZRAM, will still eat part of the system memory budget. This is why we looked at |
| _Private Dirty_ in the `dumpsys meminfo` example. |
| |
| _Shared_ memory can be mapped into more than one process. This means VMAs in |
| different processes refer to the same physical memory. This typically happens |
| with file-backed memory of commonly used libraries (e.g., libc.so, |
| framework.dex) or, more rarely, when a process `fork()`s and a child process |
| inherits dirty memory from its parent. |
| |
| This introduces the concept of **PSS** (Proportional Set Size). In **PSS**, |
| memory that is resident in multiple processes is proportionally attributed to |
| each of them. If we map one 4 KiB page into four processes, each of their |
| **PSS** will increase by 1 KiB. |
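
The difference between RSS and PSS can be made concrete with a toy model. This is a sketch under simplifying assumptions (4 KiB pages, and a hand-written list saying which processes map each resident page); the process names and `rss_and_pss` are hypothetical.

```python
from collections import defaultdict

PAGE_KIB = 4  # assumed page size

def rss_and_pss(page_mappings):
    """page_mappings holds one entry per resident physical page: the list of
    processes mapping that page. Returns {process: (rss_kib, pss_kib)}."""
    rss = defaultdict(float)
    pss = defaultdict(float)
    for sharers in page_mappings:
        for process in sharers:
            rss[process] += PAGE_KIB                  # counted in full
            pss[process] += PAGE_KIB / len(sharers)   # split between sharers
    return {process: (rss[process], pss[process]) for process in rss}

# One page shared by four processes, plus one page private to "a".
pages = [["a", "b", "c", "d"], ["a"]]
usage = rss_and_pss(pages)
for process, (rss_kib, pss_kib) in sorted(usage.items()):
    print(f"{process}: RSS {rss_kib} KiB, PSS {pss_kib} KiB")
```

Summing RSS over all processes counts the shared page four times, while summing PSS counts every physical page exactly once, which is why PSS is the better figure for comparing processes.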
| |
| ### Recap |
| |
| - Dynamically allocated memory, whether allocated through C's `malloc()`, C++'s |
| `operator new()` or Java's `new X()`, always starts as _anonymous_ and _dirty_, |
| unless it is never used. |
| - If this memory is not read or written for a while, or in case of memory |
| pressure, it gets swapped out to ZRAM and becomes _swapped_. |
| - Anonymous memory, whether _resident_ (and hence _dirty_) or _swapped_, is |
| always a resource hog and should be avoided if unnecessary. |
| - File-mapped memory comes from code (Java or native), libraries and resources |
| and is almost always _clean_. Clean memory also erodes the system memory |
| budget, but application developers typically have less control over it. |
| |
| ## Memory over time |
| |
| `dumpsys meminfo` is good for getting a snapshot of the current memory usage, |
| but even very short memory spikes can lead to low-memory situations, which in |
| turn lead to [LMKs](#lmk). We have two tools to investigate situations like |
| this: |
| |
| - RSS High Watermark. |
| - Memory tracepoints. |
| |
| ### RSS High Watermark |
| |
| We can get a lot of information from the `/proc/[pid]/status` file, including |
| memory information. `VmHWM` shows the maximum RSS usage the process has seen |
| since it was started. This value is kept updated by the kernel. |
| |
| ```bash |
| $ adb shell cat '/proc/$(pidof com.android.systemui)/status' |
| [...] |
| VmHWM: 256972 kB |
| VmRSS: 195272 kB |
| RssAnon: 30184 kB |
| RssFile: 164420 kB |
| RssShmem: 668 kB |
| VmSwap: 43960 kB |
| [...] |
| ``` |
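
To compare the high watermark with the current usage programmatically, the `kB` fields can be parsed from the file contents. A minimal sketch follows, using the sample values above as input; `parse_status_kb` is an illustrative helper, not an existing API.

```python
# The memory fields from the sample output above.
SAMPLE_STATUS = """\
VmHWM:    256972 kB
VmRSS:    195272 kB
RssAnon:   30184 kB
RssFile:  164420 kB
RssShmem:    668 kB
VmSwap:    43960 kB
"""

def parse_status_kb(text):
    """Return {field name: value in kB} for every 'Name: <n> kB' line."""
    fields = {}
    for line in text.splitlines():
        key, separator, rest = line.partition(":")
        parts = rest.split()
        if separator and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields

status = parse_status_kb(SAMPLE_STATUS)
below_peak = status["VmHWM"] - status["VmRSS"]
print(f"current RSS is {below_peak} kB below the high watermark")
```

In this sample, `VmRSS` is the sum of `RssAnon`, `RssFile` and `RssShmem`, and the process has at some point used about 60 MB more resident memory than it does now.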
| |
| ### Memory tracepoints |
| |
| NOTE: For detailed instructions about the memory trace points see the |
| [Data sources > Memory > Counters and events](/docs/data-sources/memory-counters.md) |
| page. |
| |
| We can use Perfetto to get information about memory management events from the |
| kernel. |
| |
| ```bash |
| $ adb shell perfetto \ |
| -c - --txt \ |
| -o /data/misc/perfetto-traces/trace \ |
| <<EOF |
| |
| buffers: { |
| size_kb: 8960 |
| fill_policy: DISCARD |
| } |
| buffers: { |
| size_kb: 1280 |
| fill_policy: DISCARD |
| } |
| data_sources: { |
| config { |
| name: "linux.process_stats" |
| target_buffer: 1 |
| process_stats_config { |
| scan_all_processes_on_start: true |
| } |
| } |
| } |
| data_sources: { |
| config { |
| name: "linux.ftrace" |
| ftrace_config { |
| ftrace_events: "mm_event/mm_event_record" |
| ftrace_events: "kmem/rss_stat" |
| ftrace_events: "kmem/ion_heap_grow" |
| ftrace_events: "kmem/ion_heap_shrink" |
| } |
| } |
| } |
| duration_ms: 30000 |
| |
| EOF |
| ``` |
| |
| While the trace is being recorded, take a photo if you are following along. |
| |
| Pull the file using `adb pull /data/misc/perfetto-traces/trace ~/mem-trace` and |
| upload it to the [Perfetto UI](https://ui.perfetto.dev). This will show overall |
| stats about system [ION](#ion) usage, and per-process stats to expand. Scroll |
| down to (or press Ctrl-F and search for) `com.google.android.GoogleCamera` and |
| expand it. This will show a timeline of various memory stats for the camera. |
| |
|  |
| |
| We can see that around 2/3 into the trace, the memory spiked (in the |
| mem.rss.anon track). This is where I took a photo. This is a good way to see how |
| the memory usage of an application reacts to different triggers. |
| |
| ## Which tool to use |
| |
| If you want to drill down into _anonymous_ memory allocated by Java code, |
| labeled by `dumpsys meminfo` as `Dalvik Heap`, see the |
| [Analyzing the Java Heap](#java-hprof) section. |
| |
| If you want to drill down into _anonymous_ memory allocated by native code, |
| labeled by `dumpsys meminfo` as `Native Heap`, see the |
| [Analyzing the Native Heap](#heapprofd) section. Note that it's common to end |
| up with native memory even if your app doesn't have any C/C++ code. This is |
| because some framework APIs (e.g. Regex) are implemented internally in native |
| code. |
| |
| If you want to drill down into file-mapped memory, the best option is to use |
| `adb shell showmap PID` (on Android) or to inspect `/proc/PID/smaps`. |
| |
| ## {#lmk} Low-memory kills |
| |
| When an Android device becomes low on memory, a daemon called `lmkd` will start |
| killing processes in order to free up memory. Devices' strategies differ, but in |
| general processes will be killed in order of descending `oom_score_adj` score |
| (i.e. background apps and processes first, foreground processes last). |
| |
| Apps on Android are not killed when switching away from them. They instead |
| remain _cached_ even after the user finishes using them. This is to make |
| subsequent starts of the app faster. Such apps will generally be killed first |
| (because they have a higher `oom_score_adj`). |
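
The ordering described above can be sketched as a sort on `oom_score_adj`. This is a deliberate simplification: real `lmkd` policies differ between devices and take more signals into account, and the process names and scores below are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Proc:
    name: str
    oom_score_adj: int

def kill_order(procs):
    """Highest oom_score_adj first: a simplified model of kill priority."""
    return sorted(procs, key=lambda p: p.oom_score_adj, reverse=True)

# Hypothetical processes with illustrative scores.
procs = [
    Proc("foreground_app", 0),
    Proc("cached_app", 900),
    Proc("system_daemon", -900),
    Proc("background_service", 500),
]
order = [p.name for p in kill_order(procs)]
print(" -> ".join(order))
```

In this model the cached app is the first candidate and the system daemon the last, matching the "background first, foreground last" rule of thumb.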
| |
| We can collect information about LMKs and `oom_score_adj` using Perfetto. |
| |
| ```bash |
| $ adb shell perfetto \ |
| -c - --txt \ |
| -o /data/misc/perfetto-traces/trace \ |
| <<EOF |
| |
| buffers: { |
| size_kb: 8960 |
| fill_policy: DISCARD |
| } |
| buffers: { |
| size_kb: 1280 |
| fill_policy: DISCARD |
| } |
| data_sources: { |
| config { |
| name: "linux.process_stats" |
| target_buffer: 1 |
| process_stats_config { |
| scan_all_processes_on_start: true |
| } |
| } |
| } |
| data_sources: { |
| config { |
| name: "linux.ftrace" |
| ftrace_config { |
| ftrace_events: "lowmemorykiller/lowmemory_kill" |
| ftrace_events: "oom/oom_score_adj_update" |
| ftrace_events: "ftrace/print" |
| atrace_apps: "lmkd" |
| } |
| } |
| } |
| duration_ms: 60000 |
| |
| EOF |
| ``` |
| |
| Pull the file using `adb pull /data/misc/perfetto-traces/trace ~/oom-trace` and |
| upload it to the [Perfetto UI](https://ui.perfetto.dev). |
| |
|  |
| |
| We can see that the OOM score of Camera gets reduced (making it less likely to |
| be killed) when it is opened, and gets increased again once it is closed. |
| |
| ## {#heapprofd} Analyzing the Native Heap |
| |
| **Native Heap Profiles require Android 10.** |
| |
| NOTE: For detailed instructions about the native heap profiler and |
| troubleshooting see the |
| [Data sources > Heap profiler](/docs/data-sources/native-heap-profiler.md) page. |
| |
| Applications usually get memory through `malloc` or C++'s `new` rather than |
| directly getting it from the kernel. The allocator makes sure that your memory |
| is more efficiently handled (i.e. there are not many gaps) and that the overhead |
| from asking the kernel remains low. |
| |
| We can log the native allocations and frees that a process does using |
| _heapprofd_. The resulting profile can be used to attribute memory usage to |
| particular function callstacks, supporting a mix of both native and Java code. |
| The profile _will only show allocations done while it was running_; any |
| allocations done before will not be shown. |
| |
| ### {#capture-profile-native} Capturing the profile |
| |
| Use the `tools/heap_profile` script to profile a process. If you are having |
| trouble, make sure you are using the |
| [latest version](https://raw.githubusercontent.com/google/perfetto/main/tools/heap_profile). |
| See all the arguments using `tools/heap_profile -h`, or use the defaults and |
| just profile a process (e.g. `system_server`): |
| |
| ```bash |
| $ tools/heap_profile -n system_server |
| |
| Profiling active. Press Ctrl+C to terminate. |
| You may disconnect your device. |
| |
| Wrote profiles to /tmp/profile-1283e247-2170-4f92-8181-683763e17445 (symlink /tmp/heap_profile-latest) |
| These can be viewed using pprof. Googlers: head to pprof/ and upload them. |
| ``` |
| |
| When you see _Profiling active_, play around with the phone a bit. When you are |
| done, press Ctrl-C to end the profile. For this tutorial, I opened a couple of |
| apps. |
| |
| ### Viewing the data |
| |
| Then upload the `raw-trace` file from the output directory to the |
| [Perfetto UI](https://ui.perfetto.dev) and click on the diamond marker that |
| appears. |
| |
|  |
| |
| The tabs that are available are: |
| |
| - **Unreleased malloc size**: how many bytes were allocated but not freed at |
| this callstack at the moment the dump was created. |
| - **Total malloc size**: how many bytes were allocated (including ones already |
| freed at the moment of the dump) at this callstack. |
| - **Unreleased malloc count**: how many allocations without matching frees were |
| done at this callstack. |
| - **Total malloc count**: how many allocations (including ones with matching |
| frees) were done at this callstack. |
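
To make the four metrics concrete, here is a toy model that derives them from a list of allocation and free events. It is only a sketch: the event tuples, addresses and callstack strings are invented, and the model ignores the sampling that a real profiler performs.

```python
from collections import defaultdict

def summarize(events):
    """events are ("malloc", address, size, callstack) or ("free", address).
    Returns {callstack: metrics} as of the end of the event list."""
    live = {}  # address -> (size, callstack) for allocations not yet freed
    stats = defaultdict(lambda: {"unreleased_size": 0, "total_size": 0,
                                 "unreleased_count": 0, "total_count": 0})
    for event in events:
        if event[0] == "malloc":
            _, address, size, callstack = event
            live[address] = (size, callstack)
            entry = stats[callstack]
            entry["total_size"] += size
            entry["total_count"] += 1
            entry["unreleased_size"] += size
            entry["unreleased_count"] += 1
        else:
            allocation = live.pop(event[1], None)
            if allocation is None:
                continue  # allocated before profiling started: not tracked
            size, callstack = allocation
            stats[callstack]["unreleased_size"] -= size
            stats[callstack]["unreleased_count"] -= 1
    return dict(stats)

# Hypothetical event log.
events = [
    ("malloc", 0x1000, 64, "main > parse"),
    ("malloc", 0x2000, 128, "main > parse"),
    ("malloc", 0x3000, 256, "main > render"),
    ("free", 0x1000),
    ("free", 0x9000),  # no matching malloc in the profile
]
summary = summarize(events)
for callstack, metrics in summary.items():
    print(callstack, metrics)
```

Here `main > parse` has a total malloc size of 192 bytes over two allocations, of which 128 bytes in one allocation are still unreleased.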
| |
| The default view will show you all allocations that were done while the profile |
| was running but that weren't freed (the **Unreleased malloc size** tab). |
| |
|  |
| |
| We can see that a lot of memory gets allocated in paths through |
| `AssetManager.applyStyle`. To get the total memory that was allocated this way, |
| we can enter "applyStyle" into the Focus textbox. This will only show callstacks |
| where some frame matches "applyStyle". |
| |
|  |
| |
| From this we have a clear idea where in the code we have to look. From the code |
| we can see how that memory is being used and if we actually need all of it. |
| |
| ## {#java-hprof} Analyzing the Java Heap |
| |
| **Java Heap Dumps require Android 11.** |
| |
| NOTE: For detailed instructions about capturing Java heap dumps and |
| troubleshooting see the |
| [Data sources > Java heap dumps](/docs/data-sources/java-heap-profiler.md) page. |
| |
| ### {#capture-profile-java} Dumping the Java heap |
| |
| We can get a snapshot of the graph of all the Java objects that constitute the |
| Java heap. We use the `tools/java_heap_dump` script. If you are having trouble, |
| make sure you are using the |
| [latest version](https://raw.githubusercontent.com/google/perfetto/main/tools/java_heap_dump). |
| |
| ```bash |
| $ tools/java_heap_dump -n com.android.systemui |
| |
| Dumping Java Heap. |
| Wrote profile to /tmp/tmpup3QrQprofile |
| This can be viewed using https://ui.perfetto.dev. |
| ``` |
| |
| ### Viewing the data |
| |
| Upload the trace to the [Perfetto UI](https://ui.perfetto.dev) and click on the |
| diamond marker that appears. |
| |
|  |
| |
| This will present a set of flamegraph views as explained below. |
| |
| #### "Size" and "Objects" tabs |
| |
|  |
| |
| These views show the memory attributed to the shortest path to a |
| garbage-collection root. In general an object is reachable by many paths; we |
| only show the shortest, as that reduces the complexity of the data displayed |
| and is generally the highest-signal. The rightmost `[merged]` stack is the sum |
| of all objects that are too small to be displayed. |
| |
| - **Size**: how many bytes are retained via this path to the GC root. |
| - **Objects**: how many objects are retained via this path to the GC root. |
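
The shortest-path attribution can be illustrated with a breadth-first search over a small object graph. This is a sketch of the idea, not the UI's actual implementation: it works per object rather than per class name, and the object names and sizes are hypothetical.

```python
from collections import deque

def shortest_path_tree(roots, references):
    """Breadth-first search from the GC roots. Returns {object: parent on a
    shortest path}, with None as the parent of each root."""
    parent = {root: None for root in roots}
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        for child in references.get(obj, []):
            if child not in parent:  # first visit is along a shortest path
                parent[child] = obj
                queue.append(child)
    return parent

def cumulative_sizes(parent, sizes):
    """Own size of each object plus everything attributed below it."""
    totals = dict.fromkeys(parent, 0)
    for obj in parent:
        node = obj
        while node is not None:  # add obj's size to itself and its ancestors
            totals[node] += sizes[obj]
            node = parent[node]
    return totals

# Hypothetical heap: Bitmap is reachable via Activity -> View and via Cache.
references = {
    "root": ["Activity", "Cache"],
    "Activity": ["View"],
    "View": ["Bitmap"],
    "Cache": ["Bitmap"],
}
sizes = {"root": 0, "Activity": 100, "View": 50, "Cache": 16, "Bitmap": 4000}

parent = shortest_path_tree(["root"], references)
totals = cumulative_sizes(parent, sizes)
print(parent["Bitmap"], totals["Cache"], totals["Activity"])
```

The bitmap is attributed to `Cache` because that path is shorter, even though `View` also references it, so `Cache` shows 4016 bytes and `Activity` only 150.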
| |
| As with native heap profiles, if we want to focus on some specific aspect of the |
| graph, we can use the Focus feature to filter by class name. For example, to |
| see everything related to notifications, we can put "notification" in the |
| Focus box. |
| |
|  |
| |
| We aggregate the paths per class name, so if there are multiple objects of the |
| same type retained by a `java.lang.Object[]`, we will show one element as its |
| child, as you can see in the leftmost stack above. This also applies to the |
| dominator tree paths as described below. |
| |
| #### "Dominated Size" and "Dominated Objects" tabs |
| |
|  |
| |
| Another way to present the heap graph as a flamegraph (a tree) is to show its |
| [dominator tree](/docs/analysis/stdlib-docs.autogen#memory-heap_graph_dominator_tree). |
| In a heap graph, an object `a` dominates an object `b` if `b` is reachable from |
| the root only via paths that go through `a`. The dominators of an object form a |
| chain from the root, and the object is exclusively retained by all objects on |
| this chain. For all reachable objects in the graph, those chains form a tree, |
| i.e. the dominator tree. |
| |
| We aggregate the tree paths per class name, and each element (tree node) |
| represents a set of objects that have the same class name and position in the |
| dominator tree. |
| |
| - **Dominated Size**: how many bytes are exclusively retained by the objects in |
| a node. |
| - **Dominated Objects**: how many objects are exclusively retained by the |
| objects in a node. |
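
The definition of dominance can be turned into a small computation. The sketch below uses the simple iterative set-intersection algorithm, which is adequate for a toy graph but not what you would use on a real heap dump; it assumes a single root, works per object rather than per class name, and all names and sizes are hypothetical.

```python
def dominators(root, references):
    """Return {object: set of its dominators, including itself} for every
    object reachable from root."""
    reachable, seen, stack = [], {root}, [root]
    while stack:
        obj = stack.pop()
        reachable.append(obj)
        for child in references.get(obj, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)

    predecessors = {obj: [] for obj in reachable}
    for obj in reachable:
        for child in references.get(obj, []):
            predecessors[child].append(obj)

    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {obj: set(reachable) for obj in reachable}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for obj in reachable:
            if obj == root:
                continue
            common = set.intersection(*(dom[p] for p in predecessors[obj]))
            updated = common | {obj}
            if updated != dom[obj]:
                dom[obj] = updated
                changed = True
    return dom

def dominated_size(obj, dom, sizes):
    """Bytes exclusively retained by obj: sizes of all objects it dominates."""
    return sum(sizes[other] for other in dom if obj in dom[other])

# Hypothetical heap: Bitmap is referenced by both View and Cache.
references = {
    "root": ["Activity", "Cache"],
    "Activity": ["View"],
    "View": ["Bitmap"],
    "Cache": ["Bitmap"],
}
sizes = {"root": 0, "Activity": 100, "View": 50, "Cache": 16, "Bitmap": 4000}

dom = dominators("root", references)
print(sorted(dom["Bitmap"]), dominated_size("Cache", dom, sizes))
```

Because the bitmap is reachable through two independent paths, neither `Cache` nor `View` dominates it. It is exclusively retained only by the root, so the dominated size of `Cache` is just its own 16 bytes.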