Docs: add Trace Processor docs

TBR: lalitm,taylori
Change-Id: I6c6688e0a6d4691fb2590988de97e38e89ff7edd
diff --git a/docs/trace-processor.md b/docs/trace-processor.md
new file mode 100644
index 0000000..5644457
--- /dev/null
+++ b/docs/trace-processor.md
@@ -0,0 +1,157 @@
+# Trace Processor
+
+The Trace Processor is a C++ library
+([/src/trace_processor](/src/trace_processor)) that ingests traces in
+various formats and exposes them in a massaged, higher-level form that can be
+queried with SQL. The trace processor is used:
+* By the [Perfetto UI](https://ui.perfetto.dev/), in the form of a
+  WebAssembly module.
+* Standalone, using the `trace_processor_shell` target
+  (`ninja -C out/xxx trace_processor_shell`).
+* In internal google3 pipelines for batch processing.
+
+Supported input formats:
+ * Perfetto .proto traces
+ * Chrome .json trace events [WIP]
+ * [NOT IMPLEMENTED YET] ftrace format as per `/sys/kernel/debug/tracing/trace`.
+
+![Trace Processor](https://storage.googleapis.com/perfetto/markdown_img/trace-processor-small.png)
+
+Rationale
+---------
+Traces are raw because they are optimized for fast, low-overhead writing.
+Despite being protos, they are not well suited to being consumed as-is by
+third parties: some data massaging is required.  
+Examples:
+* Ftrace sched_switch events only provide thread names and thread IDs.
+  To attribute execution time to a package/process, that data needs to be
+  joined with the process_tree events, which map each TID to its parent PID
+  and process name.
+* Even after this join, each thread execution time-slice is delimited by two
+  sched_switch events (one when the thread is switched in, one when it is
+  switched out). What most consumers want to see instead is one "interval" per
+  thread execution time-slice.
+* Similarly, ftrace ext4 events provide only inode numbers, which need to be
+  joined with inode->path events.
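+
+A minimal sketch of the second point, assuming each sched_switch event has been
+reduced to a hypothetical `(ts, cpu, prev_tid, next_tid)` record: a thread's
+interval starts at the switch that schedules it in (`next_tid`) and ends at the
+next switch on the same CPU. This is an illustration only, not the trace
+processor's actual implementation.
+
+```python
+from typing import Dict, List, NamedTuple, Tuple
+
+
+class SchedSwitch(NamedTuple):
+    """Hypothetical, simplified sched_switch event."""
+    ts: int        # timestamp in nanoseconds
+    cpu: int
+    prev_tid: int  # thread being switched out
+    next_tid: int  # thread being switched in
+
+
+class Slice(NamedTuple):
+    ts: int
+    dur: int
+    cpu: int
+    tid: int
+
+
+def switches_to_slices(events: List[SchedSwitch]) -> List[Slice]:
+    """Turns consecutive switches on the same CPU into one slice each."""
+    # Most recent switch-in on each CPU: cpu -> (ts, tid).
+    open_slice: Dict[int, Tuple[int, int]] = {}
+    slices: List[Slice] = []
+    for ev in sorted(events, key=lambda e: e.ts):
+        pending = open_slice.get(ev.cpu)
+        if pending is not None:
+            start_ts, tid = pending
+            # In a well-formed trace ev.prev_tid == tid: the thread switched
+            # in earlier runs until this switch.
+            slices.append(Slice(start_ts, ev.ts - start_ts, ev.cpu, tid))
+        open_slice[ev.cpu] = (ev.ts, ev.next_tid)
+    # Slices still open at the end of the trace are dropped in this sketch.
+    return slices
+```
+
+For example, switches at 100, 250 and 400 ns on CPU 0 (tid 0 -> 42 -> 7 -> 0)
+yield two slices: tid 42 for 150 ns starting at 100, then tid 7 for 150 ns
+starting at 250.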
+
+
+Schema
+------
+
+### sched table
+`ts`  
+Timestamp of the scheduling event, in nanoseconds. This comes from
+CLOCK_BOOTTIME, when available.
+
+`dur`  
+Duration of the scheduling event, in nanoseconds.
+
+`utid`  
+ID of the thread. This is NOT the UNIX pid/tid (see below).
+It can be used to join with the `thread` table and, through it, the `process`
+table.
+
+`cpu`  
+CPU number where the scheduling event happened.
+
+`cycles`  
+[WIP] Number of CPU cycles. This takes into account CPU frequency changes that
+happened during the `dur` interval. This is available only if the `cpufreq`
+ftrace category has been enabled in the [trace config](/docs/trace-config.md).
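+
+The `cycles` column is still WIP. As a rough sketch of the idea only, assuming
+a hypothetical list of `(ts, freq_khz)` frequency changes for one CPU (sorted by
+`ts`, each frequency holding until the next change), the cycles of a slice are
+the frequency integrated over `[ts, ts + dur)`:
+
+```python
+from typing import List, Tuple
+
+
+def slice_cycles(ts: int, dur: int, freq_changes: List[Tuple[int, int]]) -> float:
+    """Cycles executed in [ts, ts + dur) on one CPU (hypothetical helper).
+
+    freq_changes: (change_ts_ns, freq_khz) pairs sorted by change_ts_ns.
+    Time before the first change has an unknown frequency and is ignored.
+    """
+    end = ts + dur
+    cycles = 0.0
+    for i, (change_ts, freq_khz) in enumerate(freq_changes):
+        # This frequency holds until the next change (or the end of the slice).
+        next_ts = freq_changes[i + 1][0] if i + 1 < len(freq_changes) else end
+        overlap_ns = min(end, next_ts) - max(ts, change_ts)
+        if overlap_ns > 0:
+            # kHz * ns = (1e3 cycles/s) * (1e-9 s) = 1e-6 cycles.
+            cycles += freq_khz * overlap_ns / 1e6
+    return cycles
+```
+
+For example, a 2 ms slice that runs 1 ms at 1 GHz and 1 ms at 2 GHz accounts
+for 3,000,000 cycles.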
+
+
+### process table
+`upid`  
+Unique process ID. This is NOT the UNIX pid. It is a sequence number generated
+by the trace processor to uniquely identify a process in the trace. This deals
+with the fact that UNIX pids can be recycled: two distinct processes whose
+lifetimes don't overlap can be assigned the same pid.
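+
+A minimal sketch of the idea, using a hypothetical `ProcessTracker` and assuming
+the trace reports when a process starts: every start allocates a fresh `upid`,
+so a recycled pid maps to two distinct rows. This is not necessarily how the
+trace processor assigns IDs internally.
+
+```python
+from typing import Dict, List, Tuple
+
+
+class ProcessTracker:
+    """Hypothetical sketch: maps (possibly recycled) pids to unique upids."""
+
+    def __init__(self) -> None:
+        self._next_upid = 1
+        self._live: Dict[int, int] = {}  # pid -> upid of the process holding it
+        self.rows: List[Tuple[int, str, int]] = []  # (upid, name, pid)
+
+    def on_process_start(self, pid: int, name: str) -> int:
+        # A start always means a new process, even if the pid was seen before.
+        upid = self._next_upid
+        self._next_upid += 1
+        self._live[pid] = upid
+        self.rows.append((upid, name, pid))
+        return upid
+
+    def upid_for(self, pid: int) -> int:
+        # Resolves a pid to the process currently holding it.
+        return self._live[pid]
+```
+
+Starting two hypothetical processes one after the other with pid 601 yields
+upids 1 and 2, and later lookups of pid 601 resolve to upid 2.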
+
+`name`  
+Process name, as per /proc/pid/cmdline.
+
+`pid`  
+The UNIX pid (also known as Thread Group ID in Linux). This also matches the
+tid of the process' main thread.
+
+
+Example:  
+
+| upid              |               name |                pid |
+|-------------------|--------------------|--------------------|
+|                 1 | /system/bin/logd   |                601 |
+|                 2 | rcu_preempt        |                  7 |
+|                 3 | rcuop/4            |                 44 |
+|                 4 | rcuop/6            |                 60 |
+
+### thread table
+`utid`  
+Unique thread ID. This is NOT the Linux pid or tid. Like `upid` above, this is a
+sequence number generated by the trace processor to uniquely identify a thread
+in the trace.
+
+`upid`  
+ID of the parent process in the `process` table.
+This can be used to JOIN a thread with its process.
+
+`name`  
+Thread name, as per /proc/pid/task/tid/comm.
+
+`tid`  
+The Linux thread id (confusingly named "pid" inside the Linux kernel).
+For the process' main thread `tid` == `tgid` == `pid`.
+
+Example:  
+
+| utid   | upid  | name             | tid  |
+|--------|-------|------------------|------|
+|      1 |     1 | logd.klogd       |  632 |
+|      2 |     2 | rcu_preempt      |    7 |
+|      3 |     4 | rcuop/6          |   60 |
+|      4 |     6 | rcuop/3          |   36 |
+|      5 |     8 | sugov:0          |  588 |
+|      6 |     9 | kworker/u16:6    | 9283 |
+|      7 |    12 | sensors@1.0-ser  | 1021 |
+|      8 |    12 | HwBinder:797_1   | 1626 |
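+
+The `upid` join can be tried on a subset of the example rows above with
+Python's built-in `sqlite3` module. This only mimics the columns documented
+here; it is not the trace processor. Threads whose `upid` is missing from the
+process example come back with `None` (NULL) for the process columns.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.executescript("""
+    create table process (upid integer primary key, name text, pid integer);
+    create table thread (utid integer primary key, upid integer, name text, tid integer);
+    insert into process values
+        (1, '/system/bin/logd', 601), (2, 'rcu_preempt', 7),
+        (3, 'rcuop/4', 44), (4, 'rcuop/6', 60);
+    insert into thread values
+        (1, 1, 'logd.klogd', 632), (2, 2, 'rcu_preempt', 7),
+        (3, 4, 'rcuop/6', 60), (4, 6, 'rcuop/3', 36);
+""")
+
+# Same shape as the "List all processes and threads" query in this doc.
+rows = conn.execute("""
+    select process.name as proc_name, pid, thread.name as thread_name, tid
+    from thread left join process using(upid)
+    order by utid
+""").fetchall()
+for row in rows:
+    print(row)
+```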
+
+
+Sample queries for the `sched` (sched_switch events) table
+----------------------------------------------------------
+
+### Trace duration
+``` sql
+select ((select max(ts) from sched) - (select min(ts) from sched)) / 1e9 as duration_sec
+```
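+
+To see what the query computes, it can be run with Python's `sqlite3` module
+against a hypothetical three-row `sched` table. Note that it measures from the
+first to the last slice *start*, so the `dur` of the last slice is not included.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.execute("create table sched (ts integer, dur integer, utid integer, cpu integer)")
+# Hypothetical toy data, timestamps in nanoseconds.
+conn.executemany(
+    "insert into sched values (?, ?, ?, ?)",
+    [(1_000_000_000, 500_000_000, 1, 0),
+     (1_500_000_000, 250_000_000, 2, 0),
+     (3_000_000_000, 100_000_000, 1, 1)],
+)
+(duration_sec,) = conn.execute(
+    "select ((select max(ts) from sched) - (select min(ts) from sched)) / 1e9"
+    " as duration_sec"
+).fetchone()
+print(duration_sec)  # 2.0
+```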
+
+### Total CPU time per CPU
+``` sql
+select cpu, sum(dur)/1e9 as cpu_time_sec from sched group by cpu order by cpu
+```
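+
+A quick check of the per-CPU aggregation with Python's `sqlite3` module, on
+hypothetical toy slices (two on CPU 0, one on CPU 1):
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.execute("create table sched (ts integer, dur integer, utid integer, cpu integer)")
+# Hypothetical toy data, durations in nanoseconds.
+conn.executemany(
+    "insert into sched values (?, ?, ?, ?)",
+    [(0, 300_000_000, 1, 0), (300_000_000, 200_000_000, 2, 0),
+     (0, 750_000_000, 1, 1)],
+)
+rows = conn.execute(
+    "select cpu, sum(dur)/1e9 as cpu_time_sec from sched group by cpu order by cpu"
+).fetchall()
+print(rows)  # [(0, 0.5), (1, 0.75)]
+```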
+
+### List all processes
+``` sql
+select process.name, pid from process limit 100
+```
+
+### List all processes and threads
+``` sql
+select process.name as proc_name, pid, thread.name as thread_name, tid from thread left join process using(upid) limit 100
+```
+
+### CPU time for top 100 threads
+``` sql
+select thread.name as thread_name, tid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid order by cpu_sec desc limit 100) inner join thread using(utid) order by cpu_sec desc
+```
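+
+A sketch of the top-threads query on hypothetical toy data with Python's
+`sqlite3` module. It orders the inner query by the aggregated `cpu_sec`
+(ordering a grouped query by the bare `dur` column would use an arbitrary row's
+value in SQLite) and re-sorts after the join, since a join does not preserve the
+subquery's order.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.executescript("""
+    create table sched (ts integer, dur integer, utid integer, cpu integer);
+    create table thread (utid integer primary key, upid integer, name text, tid integer);
+    -- Hypothetical toy data.
+    insert into thread values (1, 1, 'main', 100), (2, 1, 'worker', 101), (3, 2, 'main', 200);
+    insert into sched values
+        (0, 100000000, 1, 0), (0, 900000000, 2, 1),
+        (100000000, 300000000, 1, 0), (500000000, 200000000, 3, 0);
+""")
+rows = conn.execute("""
+    select thread.name as thread_name, tid, cpu_sec
+    from (select utid, sum(dur)/1e9 as cpu_sec from sched
+          group by utid order by cpu_sec desc limit 100)
+    inner join thread using(utid)
+    order by cpu_sec desc
+""").fetchall()
+print(rows)  # [('worker', 101, 0.9), ('main', 100, 0.4), ('main', 200, 0.2)]
+```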
+
+With matching process names:
+``` sql
+select thread.name as thread_name, process.name as proc_name, tid, pid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid order by cpu_sec desc limit 100) left outer join thread using(utid) left outer join process using(upid) order by cpu_sec desc
+```
+
+### CPU time for top 100 processes
+``` sql
+select proc_name, sum(cpu_sec) as cpu_sec from (select process.name as proc_name, upid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid) left join thread using(utid) left join process using(upid)) group by upid order by cpu_sec desc limit 100
+```
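+
+A sketch of the per-process rollup on hypothetical toy data with Python's
+`sqlite3` module. Because the outer query groups several threads into one
+process, their per-thread `cpu_sec` values have to be summed; otherwise SQLite
+would return one arbitrary thread's value per process.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.executescript("""
+    create table sched (ts integer, dur integer, utid integer, cpu integer);
+    create table thread (utid integer primary key, upid integer, name text, tid integer);
+    create table process (upid integer primary key, name text, pid integer);
+    -- Hypothetical toy data: proc_a has two threads, proc_b has one.
+    insert into process values (1, 'proc_a', 100), (2, 'proc_b', 200);
+    insert into thread values (1, 1, 'proc_a', 100), (2, 1, 'worker', 101), (3, 2, 'proc_b', 200);
+    insert into sched values
+        (0, 250000000, 1, 0), (0, 500000000, 2, 1), (500000000, 500000000, 3, 0);
+""")
+rows = conn.execute("""
+    select proc_name, sum(cpu_sec) as cpu_sec
+    from (select process.name as proc_name, upid, cpu_sec
+          from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid)
+          left join thread using(utid)
+          left join process using(upid))
+    group by upid order by cpu_sec desc limit 100
+""").fetchall()
+print(rows)  # [('proc_a', 0.75), ('proc_b', 0.5)]
+```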
+
+### CPU time for top 100 processes broken down by cpu
+``` sql
+select proc_name, cpu, sum(cpu_sec) as cpu_sec from (select process.name as proc_name, upid, cpu, cpu_sec from (select cpu, utid, sum(dur)/1e9 as cpu_sec from sched group by utid, cpu) left join thread using(utid) left join process using(upid)) group by upid, cpu order by cpu_sec desc limit 100
+```
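+
+A sketch of the per-CPU breakdown on hypothetical toy data with Python's
+`sqlite3` module. The inner query groups by both `utid` and `cpu`, so a thread
+that ran on two CPUs contributes to both rows, and the outer query sums those
+contributions per process and CPU.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.executescript("""
+    create table sched (ts integer, dur integer, utid integer, cpu integer);
+    create table thread (utid integer primary key, upid integer, name text, tid integer);
+    create table process (upid integer primary key, name text, pid integer);
+    -- Hypothetical toy data: thread 2 of proc_a runs on both CPUs.
+    insert into process values (1, 'proc_a', 100), (2, 'proc_b', 200);
+    insert into thread values (1, 1, 'proc_a', 100), (2, 1, 'worker', 101), (3, 2, 'proc_b', 200);
+    insert into sched values
+        (0, 250000000, 1, 0), (250000000, 250000000, 2, 0),
+        (0, 750000000, 2, 1), (750000000, 125000000, 3, 1);
+""")
+rows = conn.execute("""
+    select proc_name, cpu, sum(cpu_sec) as cpu_sec
+    from (select process.name as proc_name, upid, cpu, cpu_sec
+          from (select cpu, utid, sum(dur)/1e9 as cpu_sec from sched group by utid, cpu)
+          left join thread using(utid)
+          left join process using(upid))
+    group by upid, cpu order by cpu_sec desc limit 100
+""").fetchall()
+print(rows)  # [('proc_a', 1, 0.75), ('proc_a', 0, 0.5), ('proc_b', 1, 0.125)]
+```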