docs/trace-processor.md - third_party/perfetto - Git at Google

 # Trace Processor

 The Trace Processor is a C++ library
 ([/src/trace_processor](/src/trace_processor)) that is able to ingest traces of
 various format and expose them in a massaged, higher level format, queryable
 through SQL queries. The trace processor is used:
 * By the [Perfetto UI](https://ui.perfetto.dev/), in the form of a
   Web Assembly module.
 * Standalone, using the `trace_processor_shell` target
   (`ninja -C out/xxx trace_processor_shell`).
 * In internal google3 pipelines for batch processing.

 Supported input formats:
  * Perfetto .proto traces
  * Chrome .json trace events [WIP]
  * [NOT IMPLEMENTED YET] ftrace format as per `/sys/kernel/debug/tracing/trace`.

 ![Trace Processor](https://storage.googleapis.com/perfetto/markdown_img/trace-processor-small.png)

 Rationale
 ---------
 Traces are raw because they are optimized for fast & low overhead writing.
 Despite being protos, their output is not ideal for being consumed by third
 parties as-is. Some data massaging is required.
 Examples:
 * Ftrace sched_switch events only provide thread names and thread IDs.
   In order to attribute execution times to the package/process that data needs
   to be joined with the process_tree events to join TIDs with their parent PID
   and process name.
 * Even after this join, sched_switch events produce two slices (one at the
   beginning, one at the end) per sched event. What most consumers want to see
   instead is one "interval" per thread execution time-slice.
 * Similarly ftrace ext4 events provide only inode numbers and those need to be
   joined with inode->path events.


 Schema
 ------

 ### sched table
 `ts`
 Timestamp of the scheduling event, in nanoseconds. This comes from the
 CLOCK_BOOTTIME, when available.

 `dur`
 Duration of the scheduling event, in nanoseconds.

 `utid`
 ID of the thread. This is NOT the UNIX pid/tid (see below).
 This can be used to joined the `thread` and `process` tables.

 `cpu`
 CPU number where the scheduling event happened.

 `cycles`
 [WIP] Number of CPU cycles. This keeps into account cpu frequency changes that
 happened during the `dur` interval. This is available only if the `cpufreq`
 ftrace category has been enabled in the [trace config](/docs/trace-config.md).


 ### process table
 `upid`
 Unique process ID. This is NOT the UNIX pid. This is a sequence number generated
 by the trace processor to uniquely identify a process in the trace. This is to
 deal with the fact that UNIX pids can be recycled and two distinct processes
 which lifetimes don't overlap can be assigned the same pid.

 `name`
 Process name, as per /proc/pid/cmdline.

 `pid`
 The UNIX pid (also known as Thread Group ID in Linux). This also matches the
 tid of the process' main thread.


 Example:

 | upid              |               name |                pid |
 |-------------------|--------------------|--------------------|
 |                 1 | /system/bin/logd   |                601 |
 |                 2 | rcu_preempt        |                  7 |
 |                 3 | rcuop/4            |                 44 |
 |                 4 | rcuop/6            |                 60 |

 ### thread table
 `utid`
 Unique thread ID. This is NOT the Linux pid or tid. Like the above, this is a
 sequence number generated by the trace processor to uniquely identify a thread
 in the trace.

 `upid`
 ID of the parent process in the `process` table.
 This can be used to JOIN a thread with its process.

 `name`
 Threaf name, as per /proc/pid/task/tid/comm.

 `tid`
 The Linux thread id (confusingly named "pid" in the Linux-world).
 For the process' main thread `tid` == `tgid` == `pid`.

 Example:

 | utid   | upid  | name             | tid  |
 |--------|-------|------------------|------|
 |      1 |     1 | logd.klogd       |  632 |
 |      2 |     2 | rcu_preempt      |    7 |
 |      3 |     4 | rcuop/6          |   60 |
 |      4 |     6 | rcuop/3          |   36 |
 |      5 |     8 | sugov:0          |  588 |
 |      6 |     9 | kworker/u16:6    | 9283 |
 |      7 |    12 | sensors@1.0-ser  | 1021 |
 |      8 |    12 | HwBinder:797_1   | 1626 |


 Sample queries for the `sched` (sched_switch events) table
 ----------------------------------------------------------

 ### Trace duration
 ``` sql
 select ((select max(ts) from sched) - (select min(ts) from sched)) / 1e9 as duration_sec
 ```

 ### Total CPU usage
 ``` sql
 select cpu, sum(dur)/1e9 as cpu_time_sec from sched group by cpu order by cpu
 ```

 ### List all processes
 ``` sql
 select process.name, pid from process limit 100
 ```

 ### List all processes and threads
 ``` sql
 select process.name as proc_name, pid, thread.name as thread_name, tid from thread left join process using(upid) limit 100
 ```

 ### CPU time for top 100 threads
 ``` sql
 select thread.name as thread_name, tid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid order by dur desc limit 100) inner join thread using(utid)
 ```

 With matching process names
 ``` sql
 select thread.name as thread_name, process.name as proc_name, tid, pid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid order by dur desc limit 100) left outer join thread using(utid) left outer join process using(upid)
 ```

 ### CPU time for top 100 processes
 ``` sql
 select proc_name, cpu_sec from (select process.name as proc_name, upid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid) left join thread using(utid) left join process using(upid)) group by upid order by cpu_sec desc limit 100
 ```

 ### CPU time for top 100 processes broken down by cpu
 ``` sql
 select proc_name, cpu, cpu_sec from (select process.name as proc_name, upid, cpu, cpu_sec from (select cpu, utid, sum(dur)/1e9 as cpu_sec from sched group by utid) left join thread using(utid) left join process using(upid)) group by upid, cpu order by cpu_sec desc limit 100
 ```
	# Trace Processor

	The Trace Processor is a C++ library
	([/src/trace_processor](/src/trace_processor)) that is able to ingest traces of
	various format and expose them in a massaged, higher level format, queryable
	through SQL queries. The trace processor is used:
	* By the [Perfetto UI](https://ui.perfetto.dev/), in the form of a
	Web Assembly module.
	* Standalone, using the `trace_processor_shell` target
	(`ninja -C out/xxx trace_processor_shell`).
	* In internal google3 pipelines for batch processing.

	Supported input formats:
	* Perfetto .proto traces
	* Chrome .json trace events [WIP]
	* [NOT IMPLEMENTED YET] ftrace format as per `/sys/kernel/debug/tracing/trace`.

	![Trace Processor](https://storage.googleapis.com/perfetto/markdown_img/trace-processor-small.png)

	Rationale
	---------
	Traces are raw because they are optimized for fast & low overhead writing.
	Despite being protos, their output is not ideal for being consumed by third
	parties as-is. Some data massaging is required.
	Examples:
	* Ftrace sched_switch events only provide thread names and thread IDs.
	In order to attribute execution times to the package/process that data needs
	to be joined with the process_tree events to join TIDs with their parent PID
	and process name.
	* Even after this join, sched_switch events produce two slices (one at the
	beginning, one at the end) per sched event. What most consumers want to see
	instead is one "interval" per thread execution time-slice.
	* Similarly ftrace ext4 events provide only inode numbers and those need to be
	joined with inode->path events.


	Schema
	------

	### sched table
	`ts`
	Timestamp of the scheduling event, in nanoseconds. This comes from the
	CLOCK_BOOTTIME, when available.

	`dur`
	Duration of the scheduling event, in nanoseconds.

	`utid`
	ID of the thread. This is NOT the UNIX pid/tid (see below).
	This can be used to joined the `thread` and `process` tables.

	`cpu`
	CPU number where the scheduling event happened.

	`cycles`
	[WIP] Number of CPU cycles. This keeps into account cpu frequency changes that
	happened during the `dur` interval. This is available only if the `cpufreq`
	ftrace category has been enabled in the [trace config](/docs/trace-config.md).


	### process table
	`upid`
	Unique process ID. This is NOT the UNIX pid. This is a sequence number generated
	by the trace processor to uniquely identify a process in the trace. This is to
	deal with the fact that UNIX pids can be recycled and two distinct processes
	which lifetimes don't overlap can be assigned the same pid.

	`name`
	Process name, as per /proc/pid/cmdline.

	`pid`
	The UNIX pid (also known as Thread Group ID in Linux). This also matches the
	tid of the process' main thread.


	Example:

	\| upid \| name \| pid \|
	\|-------------------\|--------------------\|--------------------\|
	\| 1 \| /system/bin/logd \| 601 \|
	\| 2 \| rcu_preempt \| 7 \|
	\| 3 \| rcuop/4 \| 44 \|
	\| 4 \| rcuop/6 \| 60 \|

	### thread table
	`utid`
	Unique thread ID. This is NOT the Linux pid or tid. Like the above, this is a
	sequence number generated by the trace processor to uniquely identify a thread
	in the trace.

	`upid`
	ID of the parent process in the `process` table.
	This can be used to JOIN a thread with its process.

	`name`
	Threaf name, as per /proc/pid/task/tid/comm.

	`tid`
	The Linux thread id (confusingly named "pid" in the Linux-world).
	For the process' main thread `tid` == `tgid` == `pid`.

	Example:

	\| utid \| upid \| name \| tid \|
	\|--------\|-------\|------------------\|------\|
	\| 1 \| 1 \| logd.klogd \| 632 \|
	\| 2 \| 2 \| rcu_preempt \| 7 \|
	\| 3 \| 4 \| rcuop/6 \| 60 \|
	\| 4 \| 6 \| rcuop/3 \| 36 \|
	\| 5 \| 8 \| sugov:0 \| 588 \|
	\| 6 \| 9 \| kworker/u16:6 \| 9283 \|
	\| 7 \| 12 \| sensors@1.0-ser \| 1021 \|
	\| 8 \| 12 \| HwBinder:797_1 \| 1626 \|


	Sample queries for the `sched` (sched_switch events) table
	----------------------------------------------------------

	### Trace duration
	``` sql
	select ((select max(ts) from sched) - (select min(ts) from sched)) / 1e9 as duration_sec
	```

	### Total CPU usage
	``` sql
	select cpu, sum(dur)/1e9 as cpu_time_sec from sched group by cpu order by cpu
	```

	### List all processes
	``` sql
	select process.name, pid from process limit 100
	```

	### List all processes and threads
	``` sql
	select process.name as proc_name, pid, thread.name as thread_name, tid from thread left join process using(upid) limit 100
	```

	### CPU time for top 100 threads
	``` sql
	select thread.name as thread_name, tid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid order by dur desc limit 100) inner join thread using(utid)
	```

	With matching process names
	``` sql
	select thread.name as thread_name, process.name as proc_name, tid, pid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid order by dur desc limit 100) left outer join thread using(utid) left outer join process using(upid)
	```

	### CPU time for top 100 processes
	``` sql
	select proc_name, cpu_sec from (select process.name as proc_name, upid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid) left join thread using(utid) left join process using(upid)) group by upid order by cpu_sec desc limit 100
	```

	### CPU time for top 100 processes broken down by cpu
	``` sql
	select proc_name, cpu, cpu_sec from (select process.name as proc_name, upid, cpu, cpu_sec from (select cpu, utid, sum(dur)/1e9 as cpu_sec from sched group by utid) left join thread using(utid) left join process using(upid)) group by upid, cpu order by cpu_sec desc limit 100
	```