docs/tracing-101.md - third_party/perfetto - Git at Google

 # Tracing 101
 *This page provides a birds-eye view of performance analysis.
 The aim is to orient people who have no idea what "tracing" is.*

 ## Introduction to...
 ### Performance
 Performance analysis is concerned with making software run *better*.
 The definition of *better* varies widely and depends on the situation.
 Examples include:
 * performing the same work using fewer resources (CPU, memory,
   network, battery, etc.)
 * increasing utilization of available resources
 * identifying and eliminating unnecessary work altogether

 Much of the difficulty in improving performance comes from
 identifying the root cause of performance issues. Modern software systems are
 complicated, having a lot of components and a web of cross-interactions.
 Techniques which help engineers understand the execution of a system
 and pinpoint issues that are critical.

 **Tracing** and **profiling** are two such widely-used techniques for
 performance analysis. **Perfetto** is an open-source suite of tools, combining
 tracing and profiling to give users powerful insights into their system.

 ### Tracing
 **Tracing** involves collecting highly detailed data about the execution
 of a system. A single continuous session of recording is called a trace file
 or **trace** for short.

 Traces contain enough detail to fully reconstruct the timeline of events.
 They often include low-level kernel events like scheduler context switches,
 thread wakeups, syscalls, etc. With the "right" trace, reproduction of a
 performance bug is not needed as the trace provides all necessary context.

 Application code is also **instrumented** in areas of the program which are
 considered to be *important*. This instrumentation keeps track of what the
 program was doing over time (e.g. which functions were being run, or how long
 each call took) and context about the execution (e.g. what were the parameters
 to a function call, or why was a function run).

 The level of detail in traces makes it impractical to read traces directly
 like a log file in all but the simplest cases. Instead, a combination of
 **trace analysis** libraries and **trace viewers** are used. Trace analysis
 libraries provide a way for users to extract and summarize trace events in
 a programmatic manner. Trace viewers visualize the events in a trace on a
 timeline which give users a graphical view of what their system was doing
 over time.

 #### Logging vs tracing
 A good intuition is that logging is to functional testing what
 tracing is to performance analysis. Tracing is, in a sense, "structured"
 logging: instead of having arbitrary strings emitted from parts of the system,
 tracing reflects the detailed state of a system in a structured way to allow
 reconstruction of the timeline of events.

 Moreover, tracing frameworks (like Perfetto) place heavy emphasis
 on having minimal overhead. This is essential so that the framework
 does not significantly disrupt whatever is being measured: modern frameworks
 are fast enough that they can measure execution at the nanosecond level
 without significantly impacting the execution speed of the program.

 *Small aside: theoretically, tracing frameworks are powerful enough to act as
 a logging system as well. However, the utilization of each in practice is
 different enough that the two tend to be separate.*

 #### Metrics vs tracing
 Metrics are numerical values which track the performance of a system over time.
 Usually metrics map to high-level concepts. Examples of metrics include: CPU
 usage, memory usage, network bandwidth, etc. Metrics are collected directly from
 the app or operating system while the program is running.

 After glimpsing the power of tracing, a natural question arises: why bother
 with high level metrics at all? Why not instead just use tracing and
 compute metrics on resulting traces? In some settings, this may indeed be the
 right approach. In local and lab situations using **trace-based metrics**,
 where metrics are computed from traces instead of collecting them directly,
 is a powerful approach. If a metric regresses, it's easy to open the trace
 to root cause why that happened.

 However, trace-based metrics are not a universal solution. When running in
 production, the heavyweight nature of traces can make it impractical to collect
 them 24/7. Computing a metric with a trace can take megabytes of data vs bytes
 for direct metric collection.

 Using metrics is the right choice when you want to understand the performance
 of a system over time but do not want to or can not pay the cost of collecting
 traces. In these situations, traces should be used as a **root-causing** tool.
 When your metrics show there is a problem, targeted tracing can be rolled out
 to understand why the regression may have happened.

 ### Profiling
 **Profiling** involves sampling some usage of a resource by
 a program. A single continuous session of recording is known as a **profile**.

 Each sample collects the function callstack (i.e. the line of code along with
 all calling functions). Generally this information is aggregated across the
 profile. For each seen callstack, the aggregation gives the percentage of usage
 of the resource by that callstack. By far the most common types of profiling are
 **memory profiling** and **CPU profiling**.

 Memory profiling is used to understand which parts of a program are allocating
 memory on the heap. The profiler generally hooks into `malloc` (and `free`)
 calls of a native (C/C++/Rust/etc.) program to sample the callstacks
 calling `malloc`. Information about how many bytes were allocated is also
 retained. CPU profiling is used for understanding where the program is
 spending CPU time. The profiler captures the callstack running on a CPU
 over time. Generally this is done periodically (e.g. every 50ms), but can be
 also be done when certain events happen in the operating system.

 #### Profiling vs tracing
 There are two main questions for comparing profiling and tracing:
 1. Why profile my program statistically when I can just trace *everything*?
 2. Why use tracing to reconstruct the timeline of events when profiling gives me
    the exact line of code using the most resources?

 ##### When to use profiling over tracing
 Traces cannot feasibly capture execution of extreme high frequency
 events e.g. every function call. Profiling tools fill this niche: by
 sampling, they can significantly cut down on how much information they store.
 The statistical nature of profilers are rarely a problem; the sampling
 algorithms for profilers are specifically designed to capture data which is
 highly representative of the real resource use.

 *Aside: a handful of very specialized tracing tools exist which
 can capture every function call (e.g.
 [magic-trace](https://github.com/janestreet/magic-trace)) but they output
 *gigabytes* of data every second which make them impractical for anything
 beyond investigating tiny snippets of code. They also generally have higher
 overhead than general purpose tracing tools.*

 ##### When to use tracing over profiling
 While profilers give callstacks where resources are being used, they lack
 information about *why* that happened. For example, why was malloc being called
 by function *foo()* so many times? All they say is *foo()* allocated X bytes
 over Y calls to `malloc`. Traces are excellent at providing this exact context:
 application instrumentation and low-level kernel events together provide
 deep insight into why code was run in the first place.

 NOTE: Perfetto supports collecting, analyzing and visualizing both profiles
 and traces at the same time so you can have the best of both worlds!

 ## Perfetto
 Perfetto is a suite of tools for software performance analysis. Its purpose
 is to empower engineers to understand where resources are being used by their
 systems. It helps identify the changes they can make to improve performance
 and verify the impact of those changes.

 NOTE: In Perfetto, since profiles and traces can be collected simultaneously,
 we call everything a "trace" even if it may contain (only) profiling data
 inside.

 ### Recording traces
 Perfetto is highly configurable when it comes to recording traces. There are
 literally hundreds of knobs which can be tweaked to control what data is
 collected, how it should be collected, how much information a trace should
 contain etc.

 [Record traces on Linux quickstart](/docs/quickstart/linux-tracing.md) is
 a good place to start if you're unfamiliar with Perfetto. For Android
 developers,
 [Record traces on Android quickstart](/docs/quickstart/android-tracing.md) will
 be more applicable. The [trace configuration](/docs/concepts/config.md) page
 is also useful to consult as a reference.

 The following sub-sections give an overview of various points worth considering
 when recording Perfetto traces.

 #### Kernel tracing
 Perfetto integrates closely with the Linux kernel's
 [ftrace](https://www.kernel.org/doc/Documentation/trace/ftrace.txt) tracing
 system to record kernel events (e.g. scheduling, syscalls, wakeups). The
 [scheduling](/docs/data-sources/cpu-scheduling.md),
 [syscall](/docs/data-sources/syscalls.md) and
 [CPU frequency](/docs/data-sources/cpu-freq.md) data source pages give
 examples of configuring ftrace collection.

 Natively supported ftrace events can be found in the fields of
 [this proto message](/docs/reference/trace-packet-proto.autogen#FtraceEvent).
 Perfetto also supports collecting ftrace events it does not natively understand
 (i.e. it does not have a protobuf message for) as a
 ["generic"](/docs/reference/trace-packet-proto.autogen#GenericFtraceEvent)
 events. These events are encoded as key-value pairs, similar to a JSON
 dictionary.

 It is strongly discouraged to rely on generic events for production use cases:
 the inefficient encoding causes trace size bloat and the
 [trace processor](/docs/analysis/trace-processor.md) cannot parse them
 meaningfully. Instead, support should be added for parsing important ftrace
 events to Perfetto:
 [here](/docs/contributing/common-tasks.md#add-a-new-ftrace-event) is a simple
 set of steps to follow which are found.

 #### Instrumentation with Perfetto SDK
 Perfetto has a [C++ SDK](https://perfetto.dev/docs/instrumentation/tracing-sdk)
 which can be used to instrument programs to emit tracing events. The SDK is
 designed to be very low-overhead and is distributed in an "amalgamated" form
 of a one `.cc` and one `.h` file, making it easy to integrate in any build
 system.

 A C SDK is under active development and should be available for general
 usage by Q2 2023. See [this doc](https://bit.ly/perfetto-c) for details (note
 viewing this doc requires being a member of
 [this group](https://groups.google.com/forum/#!forum/perfetto-dev))

 A Java/Kotlin SDK for Android (as a
 [JetPack library](https://developer.android.com/jetpack/androidx)).
 This is under development but there is no set timescale for when an official
 release will happen.

 ##### android.os.Trace (atrace) vs Perfetto SDK
 NOTE: This section is only relevant for Android platform developers or Android
 app developers with tracing experience. Other readers can safely skip this
 section.

 Perfetto has significant advantages over atrace. Some of the biggest advantages
 include:
 * performance: tracing to Perfetto from system/app code requires just a memory
   write which is far faster than the syscall latency imposed by atrace. This
   generally makes Perfetto anywhere from 3-4x faster than atrace
 * features: atrace's API is extremely limited, lacking support for debug
   arguments, custom clocks, flow events. Perfetto has a far richer API allowing
   natural representation of data-flow.
 * trace size: Perfetto supports various features (delta encoded timestamps,
   interned strings, protobuf encoding) which vastly reduce to size of trace
   files.

 Unfortunately, there are also some downsides:
 * dedicated thread: a thread dedicated to Perfetto is necessary for every
   process which wants to trace to Perfetto.
 * wakeups on tracing start: currently, when tracing starts, every process
   registered for tracing is woken up which significantly limits how many
   processes can be traced. This limitation should be removed in coming quarters.

 For now, the recommendation from the Perfetto team is to continue utilizing
 atrace for most usecases: if you think you have a usecase which would benefit
 from the SDK, please reach out to the team directly. By mid-2023, significant
 progress should be made addressing the limitations of the current SDK allowing
 more widespread adoption of the SDK.

 <!--
 TODO(lalitm): write the remainder of the doc using the following template

 #### Native heap profiling

 #### Java heap graphs

 #### Callstack sampling


 #### Flight recorder tracing
 TODO(lalitm): write this.

 ##### Field tracing
 TODO(lalitm): write this.

 #### Clock sync
 TODO(lalitm): write this.


 #### Analysis
 TODO(lalitm): write this.
 * Trace processing
 * UI
 * httpd mode
 * metrics
 * Python


 The remainder of this
 page will focus on the applications of Perfetto to solve various performance
 related problems.

 ## Solving problems with Perfetto
 TODO(lalitm): write this.
 * When to look into callstack sampling
 * When to use memory profiling
 * When to look at scheduling latency


 TODO(lalitm): write this.

 -->
	# Tracing 101
	*This page provides a birds-eye view of performance analysis.
	The aim is to orient people who have no idea what "tracing" is.*

	## Introduction to...
	### Performance
	Performance analysis is concerned with making software run better.
	The definition of better varies widely and depends on the situation.
	Examples include:
	* performing the same work using fewer resources (CPU, memory,
	network, battery, etc.)
	* increasing utilization of available resources
	* identifying and eliminating unnecessary work altogether

	Much of the difficulty in improving performance comes from
	identifying the root cause of performance issues. Modern software systems are
	complicated, having a lot of components and a web of cross-interactions.
	Techniques which help engineers understand the execution of a system
	and pinpoint issues that are critical.

	Tracing and profiling are two such widely-used techniques for
	performance analysis. Perfetto is an open-source suite of tools, combining
	tracing and profiling to give users powerful insights into their system.

	### Tracing
	Tracing involves collecting highly detailed data about the execution
	of a system. A single continuous session of recording is called a trace file
	or trace for short.

	Traces contain enough detail to fully reconstruct the timeline of events.
	They often include low-level kernel events like scheduler context switches,
	thread wakeups, syscalls, etc. With the "right" trace, reproduction of a
	performance bug is not needed as the trace provides all necessary context.

	Application code is also instrumented in areas of the program which are
	considered to be important. This instrumentation keeps track of what the
	program was doing over time (e.g. which functions were being run, or how long
	each call took) and context about the execution (e.g. what were the parameters
	to a function call, or why was a function run).

	The level of detail in traces makes it impractical to read traces directly
	like a log file in all but the simplest cases. Instead, a combination of
	trace analysis libraries and trace viewers are used. Trace analysis
	libraries provide a way for users to extract and summarize trace events in
	a programmatic manner. Trace viewers visualize the events in a trace on a
	timeline which give users a graphical view of what their system was doing
	over time.

	#### Logging vs tracing
	A good intuition is that logging is to functional testing what
	tracing is to performance analysis. Tracing is, in a sense, "structured"
	logging: instead of having arbitrary strings emitted from parts of the system,
	tracing reflects the detailed state of a system in a structured way to allow
	reconstruction of the timeline of events.

	Moreover, tracing frameworks (like Perfetto) place heavy emphasis
	on having minimal overhead. This is essential so that the framework
	does not significantly disrupt whatever is being measured: modern frameworks
	are fast enough that they can measure execution at the nanosecond level
	without significantly impacting the execution speed of the program.

	*Small aside: theoretically, tracing frameworks are powerful enough to act as
	a logging system as well. However, the utilization of each in practice is
	different enough that the two tend to be separate.*

	#### Metrics vs tracing
	Metrics are numerical values which track the performance of a system over time.
	Usually metrics map to high-level concepts. Examples of metrics include: CPU
	usage, memory usage, network bandwidth, etc. Metrics are collected directly from
	the app or operating system while the program is running.

	After glimpsing the power of tracing, a natural question arises: why bother
	with high level metrics at all? Why not instead just use tracing and
	compute metrics on resulting traces? In some settings, this may indeed be the
	right approach. In local and lab situations using trace-based metrics,
	where metrics are computed from traces instead of collecting them directly,
	is a powerful approach. If a metric regresses, it's easy to open the trace
	to root cause why that happened.

	However, trace-based metrics are not a universal solution. When running in
	production, the heavyweight nature of traces can make it impractical to collect
	them 24/7. Computing a metric with a trace can take megabytes of data vs bytes
	for direct metric collection.

	Using metrics is the right choice when you want to understand the performance
	of a system over time but do not want to or can not pay the cost of collecting
	traces. In these situations, traces should be used as a root-causing tool.
	When your metrics show there is a problem, targeted tracing can be rolled out
	to understand why the regression may have happened.

	### Profiling
	Profiling involves sampling some usage of a resource by
	a program. A single continuous session of recording is known as a profile.

	Each sample collects the function callstack (i.e. the line of code along with
	all calling functions). Generally this information is aggregated across the
	profile. For each seen callstack, the aggregation gives the percentage of usage
	of the resource by that callstack. By far the most common types of profiling are
	memory profiling and CPU profiling.

	Memory profiling is used to understand which parts of a program are allocating
	memory on the heap. The profiler generally hooks into `malloc` (and `free`)
	calls of a native (C/C++/Rust/etc.) program to sample the callstacks
	calling `malloc`. Information about how many bytes were allocated is also
	retained. CPU profiling is used for understanding where the program is
	spending CPU time. The profiler captures the callstack running on a CPU
	over time. Generally this is done periodically (e.g. every 50ms), but can be
	also be done when certain events happen in the operating system.

	#### Profiling vs tracing
	There are two main questions for comparing profiling and tracing:
	1. Why profile my program statistically when I can just trace everything?
	2. Why use tracing to reconstruct the timeline of events when profiling gives me
	the exact line of code using the most resources?

	##### When to use profiling over tracing
	Traces cannot feasibly capture execution of extreme high frequency
	events e.g. every function call. Profiling tools fill this niche: by
	sampling, they can significantly cut down on how much information they store.
	The statistical nature of profilers are rarely a problem; the sampling
	algorithms for profilers are specifically designed to capture data which is
	highly representative of the real resource use.

	*Aside: a handful of very specialized tracing tools exist which
	can capture every function call (e.g.
	[magic-trace](https://github.com/janestreet/magic-trace)) but they output
	gigabytes of data every second which make them impractical for anything
	beyond investigating tiny snippets of code. They also generally have higher
	overhead than general purpose tracing tools.*

	##### When to use tracing over profiling
	While profilers give callstacks where resources are being used, they lack
	information about why that happened. For example, why was malloc being called
	by function foo() so many times? All they say is foo() allocated X bytes
	over Y calls to `malloc`. Traces are excellent at providing this exact context:
	application instrumentation and low-level kernel events together provide
	deep insight into why code was run in the first place.

	NOTE: Perfetto supports collecting, analyzing and visualizing both profiles
	and traces at the same time so you can have the best of both worlds!

	## Perfetto
	Perfetto is a suite of tools for software performance analysis. Its purpose
	is to empower engineers to understand where resources are being used by their
	systems. It helps identify the changes they can make to improve performance
	and verify the impact of those changes.

	NOTE: In Perfetto, since profiles and traces can be collected simultaneously,
	we call everything a "trace" even if it may contain (only) profiling data
	inside.

	### Recording traces
	Perfetto is highly configurable when it comes to recording traces. There are
	literally hundreds of knobs which can be tweaked to control what data is
	collected, how it should be collected, how much information a trace should
	contain etc.

	[Record traces on Linux quickstart](/docs/quickstart/linux-tracing.md) is
	a good place to start if you're unfamiliar with Perfetto. For Android
	developers,
	[Record traces on Android quickstart](/docs/quickstart/android-tracing.md) will
	be more applicable. The [trace configuration](/docs/concepts/config.md) page
	is also useful to consult as a reference.

	The following sub-sections give an overview of various points worth considering
	when recording Perfetto traces.

	#### Kernel tracing
	Perfetto integrates closely with the Linux kernel's
	[ftrace](https://www.kernel.org/doc/Documentation/trace/ftrace.txt) tracing
	system to record kernel events (e.g. scheduling, syscalls, wakeups). The
	[scheduling](/docs/data-sources/cpu-scheduling.md),
	[syscall](/docs/data-sources/syscalls.md) and
	[CPU frequency](/docs/data-sources/cpu-freq.md) data source pages give
	examples of configuring ftrace collection.

	Natively supported ftrace events can be found in the fields of
	[this proto message](/docs/reference/trace-packet-proto.autogen#FtraceEvent).
	Perfetto also supports collecting ftrace events it does not natively understand
	(i.e. it does not have a protobuf message for) as a
	["generic"](/docs/reference/trace-packet-proto.autogen#GenericFtraceEvent)
	events. These events are encoded as key-value pairs, similar to a JSON
	dictionary.

	It is strongly discouraged to rely on generic events for production use cases:
	the inefficient encoding causes trace size bloat and the
	[trace processor](/docs/analysis/trace-processor.md) cannot parse them
	meaningfully. Instead, support should be added for parsing important ftrace
	events to Perfetto:
	[here](/docs/contributing/common-tasks.md#add-a-new-ftrace-event) is a simple
	set of steps to follow which are found.

	#### Instrumentation with Perfetto SDK
	Perfetto has a [C++ SDK](https://perfetto.dev/docs/instrumentation/tracing-sdk)
	which can be used to instrument programs to emit tracing events. The SDK is
	designed to be very low-overhead and is distributed in an "amalgamated" form
	of a one `.cc` and one `.h` file, making it easy to integrate in any build
	system.

	A C SDK is under active development and should be available for general
	usage by Q2 2023. See [this doc](https://bit.ly/perfetto-c) for details (note
	viewing this doc requires being a member of
	[this group](https://groups.google.com/forum/#!forum/perfetto-dev))

	A Java/Kotlin SDK for Android (as a
	[JetPack library](https://developer.android.com/jetpack/androidx)).
	This is under development but there is no set timescale for when an official
	release will happen.

	##### android.os.Trace (atrace) vs Perfetto SDK
	NOTE: This section is only relevant for Android platform developers or Android
	app developers with tracing experience. Other readers can safely skip this
	section.

	Perfetto has significant advantages over atrace. Some of the biggest advantages
	include:
	* performance: tracing to Perfetto from system/app code requires just a memory
	write which is far faster than the syscall latency imposed by atrace. This
	generally makes Perfetto anywhere from 3-4x faster than atrace
	* features: atrace's API is extremely limited, lacking support for debug
	arguments, custom clocks, flow events. Perfetto has a far richer API allowing
	natural representation of data-flow.
	* trace size: Perfetto supports various features (delta encoded timestamps,
	interned strings, protobuf encoding) which vastly reduce to size of trace
	files.

	Unfortunately, there are also some downsides:
	* dedicated thread: a thread dedicated to Perfetto is necessary for every
	process which wants to trace to Perfetto.
	* wakeups on tracing start: currently, when tracing starts, every process
	registered for tracing is woken up which significantly limits how many
	processes can be traced. This limitation should be removed in coming quarters.

	For now, the recommendation from the Perfetto team is to continue utilizing
	atrace for most usecases: if you think you have a usecase which would benefit
	from the SDK, please reach out to the team directly. By mid-2023, significant
	progress should be made addressing the limitations of the current SDK allowing
	more widespread adoption of the SDK.

	<!--
	TODO(lalitm): write the remainder of the doc using the following template

	#### Native heap profiling

	#### Java heap graphs

	#### Callstack sampling


	#### Flight recorder tracing
	TODO(lalitm): write this.

	##### Field tracing
	TODO(lalitm): write this.

	#### Clock sync
	TODO(lalitm): write this.


	#### Analysis
	TODO(lalitm): write this.
	* Trace processing
	* UI
	* httpd mode
	* metrics
	* Python


	The remainder of this
	page will focus on the applications of Perfetto to solve various performance
	related problems.

	## Solving problems with Perfetto
	TODO(lalitm): write this.
	* When to look into callstack sampling
	* When to use memory profiling
	* When to look at scheduling latency


	TODO(lalitm): write this.

	-->