blob: 5644457846f740f6bcf70c7e108fe4fdaf2f5e67 [file] [log] [blame] [view]
# Trace Processor
The Trace Processor is a C++ library
([/src/trace_processor](/src/trace_processor)) that is able to ingest traces of
various format and expose them in a massaged, higher level format, queryable
through SQL queries. The trace processor is used:
* By the [Perfetto UI](https://ui.perfetto.dev/), in the form of a
Web Assembly module.
* Standalone, using the `trace_processor_shell` target
(`ninja -C out/xxx trace_processor_shell`).
* In internal google3 pipelines for batch processing.
Supported input formats:
* Perfetto .proto traces
* Chrome .json trace events [WIP]
* [NOT IMPLEMENTED YET] ftrace format as per `/sys/kernel/debug/tracing/trace`.
![Trace Processor](https://storage.googleapis.com/perfetto/markdown_img/trace-processor-small.png)
Rationale
---------
Traces are raw because they are optimized for fast & low overhead writing.
Despite being protos, their output is not ideal for being consumed by third
parties as-is. Some data massaging is required.
Examples:
* Ftrace sched_switch events only provide thread names and thread IDs.
In order to attribute execution times to the package/process that data needs
to be joined with the process_tree events to join TIDs with their parent PID
and process name.
* Even after this join, sched_switch events produce two slices (one at the
beginning, one at the end) per sched event. What most consumers want to see
instead is one "interval" per thread execution time-slice.
* Similarly ftrace ext4 events provide only inode numbers and those need to be
joined with inode->path events.
Schema
------
### sched table
`ts`
Timestamp of the scheduling event, in nanoseconds. This comes from the
CLOCK_BOOTTIME, when available.
`dur`
Duration of the scheduling event, in nanoseconds.
`utid`
ID of the thread. This is NOT the UNIX pid/tid (see below).
This can be used to joined the `thread` and `process` tables.
`cpu`
CPU number where the scheduling event happened.
`cycles`
[WIP] Number of CPU cycles. This keeps into account cpu frequency changes that
happened during the `dur` interval. This is available only if the `cpufreq`
ftrace category has been enabled in the [trace config](/docs/trace-config.md).
### process table
`upid`
Unique process ID. This is NOT the UNIX pid. This is a sequence number generated
by the trace processor to uniquely identify a process in the trace. This is to
deal with the fact that UNIX pids can be recycled and two distinct processes
which lifetimes don't overlap can be assigned the same pid.
`name`
Process name, as per /proc/pid/cmdline.
`pid`
The UNIX pid (also known as Thread Group ID in Linux). This also matches the
tid of the process' main thread.
Example:
| upid | name | pid |
|-------------------|--------------------|--------------------|
| 1 | /system/bin/logd | 601 |
| 2 | rcu_preempt | 7 |
| 3 | rcuop/4 | 44 |
| 4 | rcuop/6 | 60 |
### thread table
`utid`
Unique thread ID. This is NOT the Linux pid or tid. Like the above, this is a
sequence number generated by the trace processor to uniquely identify a thread
in the trace.
`upid`
ID of the parent process in the `process` table.
This can be used to JOIN a thread with its process.
`name`
Threaf name, as per /proc/pid/task/tid/comm.
`tid`
The Linux thread id (confusingly named "pid" in the Linux-world).
For the process' main thread `tid` == `tgid` == `pid`.
Example:
| utid | upid | name | tid |
|--------|-------|------------------|------|
| 1 | 1 | logd.klogd | 632 |
| 2 | 2 | rcu_preempt | 7 |
| 3 | 4 | rcuop/6 | 60 |
| 4 | 6 | rcuop/3 | 36 |
| 5 | 8 | sugov:0 | 588 |
| 6 | 9 | kworker/u16:6 | 9283 |
| 7 | 12 | sensors@1.0-ser | 1021 |
| 8 | 12 | HwBinder:797_1 | 1626 |
Sample queries for the `sched` (sched_switch events) table
----------------------------------------------------------
### Trace duration
``` sql
select ((select max(ts) from sched) - (select min(ts) from sched)) / 1e9 as duration_sec
```
### Total CPU usage
``` sql
select cpu, sum(dur)/1e9 as cpu_time_sec from sched group by cpu order by cpu
```
### List all processes
``` sql
select process.name, pid from process limit 100
```
### List all processes and threads
``` sql
select process.name as proc_name, pid, thread.name as thread_name, tid from thread left join process using(upid) limit 100
```
### CPU time for top 100 threads
``` sql
select thread.name as thread_name, tid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid order by dur desc limit 100) inner join thread using(utid)
```
With matching process names
``` sql
select thread.name as thread_name, process.name as proc_name, tid, pid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid order by dur desc limit 100) left outer join thread using(utid) left outer join process using(upid)
```
### CPU time for top 100 processes
``` sql
select proc_name, cpu_sec from (select process.name as proc_name, upid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid) left join thread using(utid) left join process using(upid)) group by upid order by cpu_sec desc limit 100
```
### CPU time for top 100 processes broken down by cpu
``` sql
select proc_name, cpu, cpu_sec from (select process.name as proc_name, upid, cpu, cpu_sec from (select cpu, utid, sum(dur)/1e9 as cpu_sec from sched group by utid) left join thread using(utid) left join process using(upid)) group by upid, cpu order by cpu_sec desc limit 100
```