|  | # Tracing 101 | 
|  | *This page provides a bird's-eye view of performance analysis. | 
|  | The aim is to orient people who have no idea what "tracing" is.* | 
|  |  | 
|  | ## Introduction to... | 
|  | ### Performance | 
|  | Performance analysis is concerned with making software run *better*. | 
|  | The definition of *better* varies widely and depends on the situation. | 
|  | Examples include: | 
|  | * performing the same work using fewer resources (CPU, memory, | 
|  | network, battery, etc.) | 
|  | * increasing utilization of available resources | 
|  | * identifying and eliminating unnecessary work altogether | 
|  |  | 
|  | Much of the difficulty in improving performance comes from | 
|  | identifying the root cause of performance issues. Modern software systems are | 
|  | complicated, with many components and a web of cross-interactions. | 
|  | Techniques which help engineers understand the execution of a system | 
|  | and pinpoint issues are critical. | 
|  |  | 
|  | **Tracing** and **profiling** are two such widely-used techniques for | 
|  | performance analysis. **Perfetto** is an open-source suite of tools, combining | 
|  | tracing and profiling to give users powerful insights into their system. | 
|  |  | 
|  | ### Tracing | 
|  | **Tracing** involves collecting highly detailed data about the execution | 
|  | of a system. A single continuous session of recording is called a trace file | 
|  | or **trace** for short. | 
|  |  | 
|  | Traces contain enough detail to fully reconstruct the timeline of events. | 
|  | They often include low-level kernel events like scheduler context switches, | 
|  | thread wakeups, syscalls, etc. With the "right" trace, reproduction of a | 
|  | performance bug is not needed as the trace provides all necessary context. | 
|  |  | 
|  | Application code is also **instrumented** in areas of the program which are | 
|  | considered to be *important*. This instrumentation keeps track of what the | 
|  | program was doing over time (e.g. which functions were being run, or how long | 
|  | each call took) and context about the execution (e.g. what were the parameters | 
|  | to a function call, or why was a function run). | 
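
As a rough illustration of what instrumentation records, here is a minimal Python sketch. It is not the Perfetto API: `trace_slice`, `EVENTS` and `load_image` are hypothetical names invented for this example, and a real tracing framework would write to a low-overhead buffer rather than a Python list.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory event buffer; a real tracing framework would
# write to a shared-memory buffer or a file instead.
EVENTS = []


@contextmanager
def trace_slice(name, **args):
    """Record how long the wrapped block took, plus context about it."""
    start_ns = time.perf_counter_ns()
    try:
        yield
    finally:
        end_ns = time.perf_counter_ns()
        EVENTS.append({
            "name": name,                 # what the program was doing
            "ts_ns": start_ns,            # when it started
            "dur_ns": end_ns - start_ns,  # how long it took
            "args": args,                 # context, e.g. function parameters
        })


def load_image(path):
    # An "important" region of the program, wrapped in instrumentation.
    with trace_slice("load_image", path=path):
        return sum(range(10_000))  # stand-in for real work


load_image("a.png")
load_image("b.png")
```

After the two calls, `EVENTS` holds one entry per call, each with a name, a start timestamp, a duration and the `path` argument, which is enough to place both calls on a timeline.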
|  |  | 
|  | The level of detail in traces makes it impractical to read traces directly | 
|  | like a log file in all but the simplest cases. Instead, a combination of | 
|  | **trace analysis** libraries and **trace viewers** is used. Trace analysis | 
|  | libraries provide a way for users to extract and summarize trace events in | 
|  | a programmatic manner. Trace viewers visualize the events in a trace on a | 
|  | timeline, which gives users a graphical view of what their system was doing | 
|  | over time. | 
|  |  | 
|  | #### Logging vs tracing | 
|  | A good intuition is that logging is to functional testing what | 
|  | tracing is to performance analysis. Tracing is, in a sense, "structured" | 
|  | logging: instead of having arbitrary strings emitted from parts of the system, | 
|  | tracing reflects the detailed state of a system in a structured way to allow | 
|  | reconstruction of the timeline of events. | 
|  |  | 
|  | Moreover, tracing frameworks (like Perfetto) place heavy emphasis | 
|  | on having minimal overhead. This is essential so that the framework | 
|  | does not significantly disrupt whatever is being measured: modern frameworks | 
|  | are fast enough that they can measure execution at the nanosecond level | 
|  | without significantly impacting the execution speed of the program. | 
|  |  | 
|  | *Small aside: theoretically, tracing frameworks are powerful enough to act as | 
|  | a logging system as well. However, the utilization of each in practice is | 
|  | different enough that the two tend to be separate.* | 
|  |  | 
|  | #### Metrics vs tracing | 
|  | Metrics are numerical values which track the performance of a system over time. | 
|  | Usually metrics map to high-level concepts. Examples of metrics include: CPU | 
|  | usage, memory usage, network bandwidth, etc. Metrics are collected directly from | 
|  | the app or operating system while the program is running. | 
|  |  | 
|  | After glimpsing the power of tracing, a natural question arises: why bother | 
|  | with high-level metrics at all? Why not instead just use tracing and | 
|  | compute metrics on the resulting traces? In some settings, this may indeed be the | 
|  | right approach. In local and lab situations, using **trace-based metrics**, | 
|  | where metrics are computed from traces instead of being collected directly, | 
|  | is a powerful approach. If a metric regresses, it's easy to open the trace | 
|  | and find the root cause of the regression. | 
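
To make the idea of a trace-based metric concrete, here is a small Python sketch. The events are hand-written, hypothetical data (not a real Perfetto trace), and `total_duration_ms` and `slow_slices` are names made up for this illustration. The point is that the same events produce both the metric and the detail needed to explain it.

```python
from collections import defaultdict

# Hypothetical, hand-written trace events:
# (slice name, start time in ms, duration in ms).
trace = [
    ("draw_frame", 0, 12),
    ("draw_frame", 16, 30),
    ("gc", 20, 8),
    ("draw_frame", 50, 10),
]


def total_duration_ms(events):
    """Metric: total time spent in each kind of slice."""
    totals = defaultdict(int)
    for name, _start, dur in events:
        totals[name] += dur
    return dict(totals)


def slow_slices(events, name, threshold_ms):
    """Drill-down: the individual slices behind a regressed metric."""
    return [(n, s, d) for n, s, d in events if n == name and d > threshold_ms]


metrics = total_duration_ms(trace)          # {"draw_frame": 52, "gc": 8}
janky = slow_slices(trace, "draw_frame", 16)  # [("draw_frame", 16, 30)]
```

Here the slow `draw_frame` slice overlaps the `gc` slice in time, which is the kind of lead a trace can give and a directly collected metric cannot.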
|  |  | 
|  | However, trace-based metrics are not a universal solution. When running in | 
|  | production, the heavyweight nature of traces can make it impractical to collect | 
|  | them 24/7. Computing a metric with a trace can take megabytes of data vs bytes | 
|  | for direct metric collection. | 
|  |  | 
|  | Using metrics is the right choice when you want to understand the performance | 
|  | of a system over time but do not want to or cannot pay the cost of collecting | 
|  | traces. In these situations, traces should be used as a **root-causing** tool. | 
|  | When your metrics show there is a problem, targeted tracing can be rolled out | 
|  | to understand why the regression may have happened. | 
|  |  | 
|  | ### Profiling | 
|  | **Profiling** involves sampling a program's usage of some resource. | 
|  | A single continuous session of recording is known as a **profile**. | 
|  |  | 
|  | Each sample collects the function callstack (i.e. the line of code along with | 
|  | all calling functions). Generally this information is aggregated across the | 
|  | profile. For each seen callstack, the aggregation gives the percentage of usage | 
|  | of the resource by that callstack. By far the most common types of profiling are | 
|  | **memory profiling** and **CPU profiling**. | 
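
The aggregation step described above can be sketched in a few lines of Python. The samples are hypothetical and `aggregate` is a name invented for this example. Real profilers use more compact data structures, but the idea is the same.

```python
from collections import Counter

# Hypothetical CPU samples: each one is the callstack observed when the
# sample was taken, outermost function first.
samples = [
    ("main", "render", "draw_text"),
    ("main", "render", "draw_text"),
    ("main", "render", "draw_image"),
    ("main", "parse"),
]


def aggregate(samples):
    """Return the percentage of samples attributed to each distinct callstack."""
    counts = Counter(samples)
    total = len(samples)
    return {stack: 100.0 * n / total for stack, n in counts.items()}


profile = aggregate(samples)
for stack, pct in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{pct:5.1f}%  {' -> '.join(stack)}")
```

With these four samples, the `draw_text` callstack accounts for half of the profile and the other two callstacks for a quarter each.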
|  |  | 
|  | Memory profiling is used to understand which parts of a program are allocating | 
|  | memory on the heap. The profiler generally hooks into `malloc` (and `free`) | 
|  | calls of a native (C/C++/Rust/etc.) program to sample the callstacks | 
|  | calling `malloc`. Information about how many bytes were allocated is also | 
|  | retained. CPU profiling is used for understanding where the program is | 
|  | spending CPU time. The profiler captures the callstack running on a CPU | 
|  | over time. Generally this is done periodically (e.g. every 50ms), but it can | 
|  | also be done when certain events happen in the operating system. | 
|  |  | 
|  | #### Profiling vs tracing | 
|  | There are two main questions for comparing profiling and tracing: | 
|  | 1. Why profile my program statistically when I can just trace *everything*? | 
|  | 2. Why use tracing to reconstruct the timeline of events when profiling gives me | 
|  | the exact line of code using the most resources? | 
|  |  | 
|  | ##### When to use profiling over tracing | 
|  | Traces cannot feasibly capture the execution of extremely high-frequency | 
|  | events, e.g. every function call. Profiling tools fill this niche: by | 
|  | sampling, they can significantly cut down on how much information they store. | 
|  | The statistical nature of profilers is rarely a problem; the sampling | 
|  | algorithms for profilers are specifically designed to capture data which is | 
|  | highly representative of the real resource use. | 
|  |  | 
|  | *Aside: a handful of very specialized tracing tools exist which | 
|  | can capture every function call (e.g. | 
|  | [magic-trace](https://github.com/janestreet/magic-trace)) but they output | 
|  | **gigabytes** of data every second, which makes them impractical for anything | 
|  | beyond investigating tiny snippets of code. They also generally have higher | 
|  | overhead than general purpose tracing tools.* | 
|  |  | 
|  | ##### When to use tracing over profiling | 
|  | While profilers give callstacks where resources are being used, they lack | 
|  | information about *why* that happened. For example, why was `malloc` being called | 
|  | by function *foo()* so many times? All they say is that *foo()* allocated X bytes | 
|  | over Y calls to `malloc`. Traces are excellent at providing this exact context: | 
|  | application instrumentation and low-level kernel events together provide | 
|  | deep insight into why code was run in the first place. | 
|  |  | 
|  | NOTE: Perfetto supports collecting, analyzing and visualizing both profiles | 
|  | and traces at the same time so you can have the best of both worlds! | 
|  |  | 
|  | ## Perfetto | 
|  | Perfetto is a suite of tools for software performance analysis. Its purpose | 
|  | is to empower engineers to understand where resources are being used by their | 
|  | systems. It helps identify the changes they can make to improve performance | 
|  | and verify the impact of those changes. | 
|  |  | 
|  | NOTE: In Perfetto, since profiles and traces can be collected simultaneously, | 
|  | we call everything a "trace" even if it may contain (only) profiling data | 
|  | inside. | 
|  |  | 
|  | ### Recording traces | 
|  | Perfetto is highly configurable when it comes to recording traces. There are | 
|  | literally hundreds of knobs which can be tweaked to control what data is | 
|  | collected, how it should be collected, how much information a trace should | 
|  | contain, etc. | 
|  |  | 
|  | [Record traces on Linux quickstart](/docs/quickstart/linux-tracing.md) is | 
|  | a good place to start if you're unfamiliar with Perfetto. For Android | 
|  | developers, | 
|  | [Record traces on Android quickstart](/docs/quickstart/android-tracing.md) will | 
|  | be more applicable. The [trace configuration](/docs/concepts/config.md) page | 
|  | is also useful to consult as a reference. | 
|  |  | 
|  | The following sub-sections give an overview of various points worth considering | 
|  | when recording Perfetto traces. | 
|  |  | 
|  | #### Kernel tracing | 
|  | Perfetto integrates closely with the Linux kernel's | 
|  | [ftrace](https://www.kernel.org/doc/Documentation/trace/ftrace.txt) tracing | 
|  | system to record kernel events (e.g. scheduling, syscalls, wakeups). The | 
|  | [scheduling](/docs/data-sources/cpu-scheduling.md), | 
|  | [syscall](/docs/data-sources/syscalls.md) and | 
|  | [CPU frequency](/docs/data-sources/cpu-freq.md) data source pages give | 
|  | examples of configuring ftrace collection. | 
|  |  | 
|  | Natively supported ftrace events can be found in the fields of | 
|  | [this proto message](/docs/reference/trace-packet-proto.autogen#FtraceEvent). | 
|  | Perfetto also supports collecting ftrace events it does not natively understand | 
|  | (i.e. it does not have a protobuf message for) as | 
|  | ["generic"](/docs/reference/trace-packet-proto.autogen#GenericFtraceEvent) | 
|  | events. These events are encoded as key-value pairs, similar to a JSON | 
|  | dictionary. | 
|  |  | 
|  | Relying on generic events for production use cases is strongly discouraged: | 
|  | the inefficient encoding causes trace size bloat and the | 
|  | [trace processor](/docs/analysis/trace-processor.md) cannot parse them | 
|  | meaningfully. Instead, support for parsing important ftrace events should be | 
|  | added to Perfetto: a simple set of steps to follow can be found | 
|  | [here](/docs/contributing/common-tasks.md#add-a-new-ftrace-event). | 
|  |  | 
|  | #### Instrumentation with Perfetto SDK | 
|  | Perfetto has a [C++ SDK](https://perfetto.dev/docs/instrumentation/tracing-sdk) | 
|  | which can be used to instrument programs to emit tracing events. The SDK is | 
|  | designed to be very low-overhead and is distributed in an "amalgamated" form | 
|  | of one `.cc` and one `.h` file, making it easy to integrate into any build | 
|  | system. | 
|  |  | 
|  | A C SDK is under active development and should be available for general | 
|  | usage by Q2 2023. See [this doc](https://bit.ly/perfetto-c) for details (note | 
|  | that viewing this doc requires being a member of | 
|  | [this group](https://groups.google.com/forum/#!forum/perfetto-dev)). | 
|  |  | 
|  | A Java/Kotlin SDK for Android (as a | 
|  | [Jetpack library](https://developer.android.com/jetpack/androidx)) | 
|  | is also under development, but there is no set timescale for when an official | 
|  | release will happen. | 
|  |  | 
|  | ##### android.os.Trace (atrace) vs Perfetto SDK | 
|  | NOTE: This section is only relevant for Android platform developers or Android | 
|  | app developers with tracing experience. Other readers can safely skip this | 
|  | section. | 
|  |  | 
|  | Perfetto has significant advantages over atrace. Some of the biggest advantages | 
|  | include: | 
|  | * performance: tracing to Perfetto from system/app code requires just a memory | 
|  | write, which is far faster than the syscall latency imposed by atrace. This | 
|  | generally makes Perfetto 3-4x faster than atrace. | 
|  | * features: atrace's API is extremely limited, lacking support for debug | 
|  | arguments, custom clocks and flow events. Perfetto has a far richer API allowing | 
|  | natural representation of data-flow. | 
|  | * trace size: Perfetto supports various features (delta encoded timestamps, | 
|  | interned strings, protobuf encoding) which vastly reduce the size of trace | 
|  | files. | 
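
To give a feel for why delta encoded timestamps and interned strings shrink traces, here is a minimal Python sketch of both ideas. It only illustrates the general techniques and is not Perfetto's actual wire format. The function names and sample values are made up for this example.

```python
def delta_encode(timestamps):
    """Store each timestamp as the difference from the previous one."""
    deltas, prev = [], 0
    for ts in timestamps:
        deltas.append(ts - prev)
        prev = ts
    return deltas


def delta_decode(deltas):
    """Rebuild the absolute timestamps from the stored differences."""
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out


def intern_strings(names):
    """Replace repeated strings with small integer ids plus a lookup table."""
    table, ids = {}, []
    for name in names:
        # The first occurrence gets the next free id; repeats reuse it.
        ids.append(table.setdefault(name, len(table)))
    return ids, list(table)


timestamps = [1_000_000_000, 1_000_000_250, 1_000_000_900]
names = ["sched_switch", "sched_wakeup", "sched_switch"]

deltas = delta_encode(timestamps)   # [1000000000, 250, 650]
ids, table = intern_strings(names)  # [0, 1, 0], ["sched_switch", "sched_wakeup"]
```

Small deltas and small ids need far fewer bytes than full timestamps and repeated strings, especially under a variable-length integer encoding such as the one protobuf uses.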
|  |  | 
|  | Unfortunately, there are also some downsides: | 
|  | * dedicated thread: a thread dedicated to Perfetto is necessary for every | 
|  | process which wants to trace to Perfetto. | 
|  | * wakeups on tracing start: currently, when tracing starts, every process | 
|  | registered for tracing is woken up which significantly limits how many | 
|  | processes can be traced. This limitation should be removed in coming quarters. | 
|  |  | 
|  | For now, the recommendation from the Perfetto team is to continue utilizing | 
|  | atrace for most use cases: if you think you have a use case which would benefit | 
|  | from the SDK, please reach out to the team directly. By mid-2023, significant | 
|  | progress should be made in addressing the limitations of the current SDK, allowing | 
|  | more widespread adoption of the SDK. | 
|  |  | 
|  | <!-- | 
|  | TODO(lalitm): write the remainder of the doc using the following template | 
|  |  | 
|  | #### Native heap profiling | 
|  |  | 
|  | #### Java heap graphs | 
|  |  | 
|  | #### Callstack sampling | 
|  |  | 
|  |  | 
|  | #### Flight recorder tracing | 
|  | TODO(lalitm): write this. | 
|  |  | 
|  | ##### Field tracing | 
|  | TODO(lalitm): write this. | 
|  |  | 
|  | #### Clock sync | 
|  | TODO(lalitm): write this. | 
|  |  | 
|  |  | 
|  | #### Analysis | 
|  | TODO(lalitm): write this. | 
|  | * Trace processing | 
|  | * UI | 
|  | * httpd mode | 
|  | * metrics | 
|  | * Python | 
|  |  | 
|  |  | 
|  | The remainder of this | 
|  | page will focus on the applications of Perfetto to solve various performance | 
|  | related problems. | 
|  |  | 
|  | ## Solving problems with Perfetto | 
|  | TODO(lalitm): write this. | 
|  | * When to look into callstack sampling | 
|  | * When to use memory profiling | 
|  | * When to look at scheduling latency | 
|  |  | 
|  |  | 
|  | TODO(lalitm): write this. | 
|  |  | 
|  | --> |