| # Trace Processor |
| |
| The Trace Processor is a C++ library |
| ([/src/trace_processor](/src/trace_processor)) that is able to ingest traces of |
| various format and expose them in a massaged, higher level format, queryable |
| through SQL queries. The trace processor is used: |
| * By the [Perfetto UI](https://ui.perfetto.dev/), in the form of a |
| Web Assembly module. |
| * Standalone, using the `trace_processor_shell` target |
| (`ninja -C out/xxx trace_processor_shell`). |
| * In internal google3 pipelines for batch processing. |
| |
| Supported input formats: |
| * Perfetto .proto traces |
| * Chrome .json trace events [WIP] |
| * [NOT IMPLEMENTED YET] ftrace format as per `/sys/kernel/debug/tracing/trace`. |
| |
|  |
| |
| Rationale |
| --------- |
| Traces are raw because they are optimized for fast & low overhead writing. |
| Despite being protos, their output is not ideal for being consumed by third |
| parties as-is. Some data massaging is required. |
| Examples: |
| * Ftrace sched_switch events only provide thread names and thread IDs. |
| In order to attribute execution times to the package/process that data needs |
| to be joined with the process_tree events to join TIDs with their parent PID |
| and process name. |
| * Even after this join, sched_switch events produce two slices (one at the |
| beginning, one at the end) per sched event. What most consumers want to see |
| instead is one "interval" per thread execution time-slice. |
| * Similarly ftrace ext4 events provide only inode numbers and those need to be |
| joined with inode->path events. |
| |
| |
| Schema |
| ------ |
| |
| ### sched table |
| `ts` |
| Timestamp of the scheduling event, in nanoseconds. This comes from the |
| CLOCK_BOOTTIME, when available. |
| |
| `dur` |
| Duration of the scheduling event, in nanoseconds. |
| |
| `utid` |
| ID of the thread. This is NOT the UNIX pid/tid (see below). |
| This can be used to joined the `thread` and `process` tables. |
| |
| `cpu` |
| CPU number where the scheduling event happened. |
| |
| `cycles` |
| [WIP] Number of CPU cycles. This keeps into account cpu frequency changes that |
| happened during the `dur` interval. This is available only if the `cpufreq` |
| ftrace category has been enabled in the [trace config](/docs/trace-config.md). |
| |
| |
| ### process table |
| `upid` |
| Unique process ID. This is NOT the UNIX pid. This is a sequence number generated |
| by the trace processor to uniquely identify a process in the trace. This is to |
| deal with the fact that UNIX pids can be recycled and two distinct processes |
| which lifetimes don't overlap can be assigned the same pid. |
| |
| `name` |
| Process name, as per /proc/pid/cmdline. |
| |
| `pid` |
| The UNIX pid (also known as Thread Group ID in Linux). This also matches the |
| tid of the process' main thread. |
| |
| |
| Example: |
| |
| | upid | name | pid | |
| |-------------------|--------------------|--------------------| |
| | 1 | /system/bin/logd | 601 | |
| | 2 | rcu_preempt | 7 | |
| | 3 | rcuop/4 | 44 | |
| | 4 | rcuop/6 | 60 | |
| |
| ### thread table |
| `utid` |
| Unique thread ID. This is NOT the Linux pid or tid. Like the above, this is a |
| sequence number generated by the trace processor to uniquely identify a thread |
| in the trace. |
| |
| `upid` |
| ID of the parent process in the `process` table. |
| This can be used to JOIN a thread with its process. |
| |
| `name` |
| Threaf name, as per /proc/pid/task/tid/comm. |
| |
| `tid` |
| The Linux thread id (confusingly named "pid" in the Linux-world). |
| For the process' main thread `tid` == `tgid` == `pid`. |
| |
| Example: |
| |
| | utid | upid | name | tid | |
| |--------|-------|------------------|------| |
| | 1 | 1 | logd.klogd | 632 | |
| | 2 | 2 | rcu_preempt | 7 | |
| | 3 | 4 | rcuop/6 | 60 | |
| | 4 | 6 | rcuop/3 | 36 | |
| | 5 | 8 | sugov:0 | 588 | |
| | 6 | 9 | kworker/u16:6 | 9283 | |
| | 7 | 12 | sensors@1.0-ser | 1021 | |
| | 8 | 12 | HwBinder:797_1 | 1626 | |
| |
| |
| Sample queries for the `sched` (sched_switch events) table |
| ---------------------------------------------------------- |
| |
| ### Trace duration |
| ``` sql |
| select ((select max(ts) from sched) - (select min(ts) from sched)) / 1e9 as duration_sec |
| ``` |
| |
| ### Total CPU usage |
| ``` sql |
| select cpu, sum(dur)/1e9 as cpu_time_sec from sched group by cpu order by cpu |
| ``` |
| |
| ### List all processes |
| ``` sql |
| select process.name, pid from process limit 100 |
| ``` |
| |
| ### List all processes and threads |
| ``` sql |
| select process.name as proc_name, pid, thread.name as thread_name, tid from thread left join process using(upid) limit 100 |
| ``` |
| |
| ### CPU time for top 100 threads |
| ``` sql |
| select thread.name as thread_name, tid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid order by dur desc limit 100) inner join thread using(utid) |
| ``` |
| |
| With matching process names |
| ``` sql |
| select thread.name as thread_name, process.name as proc_name, tid, pid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid order by dur desc limit 100) left outer join thread using(utid) left outer join process using(upid) |
| ``` |
| |
| ### CPU time for top 100 processes |
| ``` sql |
| select proc_name, cpu_sec from (select process.name as proc_name, upid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid) left join thread using(utid) left join process using(upid)) group by upid order by cpu_sec desc limit 100 |
| ``` |
| |
| ### CPU time for top 100 processes broken down by cpu |
| ``` sql |
| select proc_name, cpu, cpu_sec from (select process.name as proc_name, upid, cpu, cpu_sec from (select cpu, utid, sum(dur)/1e9 as cpu_sec from sched group by utid) left join thread using(utid) left join process using(upid)) group by upid, cpu order by cpu_sec desc limit 100 |
| ``` |