Trace Processor report Subcommand

Authors: @lalitm

Status: Draft

Note: This document is a thought experiment exploring a possible future direction for trace_processor_shell. It is NOT a proposal for immediate implementation. The goal is to capture the design space and solicit feedback.

Motivation

Perfetto traces are rich, multi-dimensional datasets. Today, extracting a useful summary requires either:

  1. Loading the trace in the UI and clicking around.
  2. Writing ad-hoc SQL queries against trace_processor.
  3. Authoring TraceSummarySpec textprotos for the summarize subcommand.

None of these serve the “I just collected a trace, what's in it?” use case well. Users coming from perf report expect to point a tool at a data file and immediately see an opinionated, useful summary — no query authoring, no spec files, no UI.

This gap is especially felt by:

  • CLI power users who want quick triage without leaving the terminal.
  • AI tools that need structured trace summaries to reason about performance.
  • CI pipelines that want a human-readable (or machine-parseable) trace summary as a build artifact.

The Firefox Profiler project is exploring a similar direction with their experimental pq CLI tool (PR 5663 in the firefox-devtools/profiler repo), which provides opinionated per-dimension views of profiling data from the command line.

Decision

Pending

Design

Relationship to summarize

report is a higher-level, opinionated cousin of summarize:

  • summarize is the general-purpose engine — users author custom TraceSummarySpec protos to define exactly what to compute.
  • report ships built-in specs that produce useful defaults across known trace dimensions.

Under the hood, report is built entirely on top of the summarization machinery. Each dimension's report is a pre-authored TraceSummarySpec that gets fed into the same engine that summarize uses.

Built-in spec embedding

Report specs are authored as human-readable textproto files in the source tree (e.g. src/trace_processor/shell/report_specs/*.textproto). A build rule converts these to binary proto and embeds them as byte arrays in the binary, following the existing perfetto_cc_proto_descriptor pattern used for metric and trace descriptors. This means:

  • Zero file I/O at runtime — specs are baked into the binary.
  • Specs are human-editable in the source tree.
  • The same build infrastructure that handles descriptor embedding is reused.
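As a sketch, the generation half of this build rule could look like the following, where the function name and header shape are purely hypothetical (this is not the actual perfetto_cc_proto_descriptor implementation) and the spec is assumed to have already been compiled from textproto to binary proto bytes:

```python
# Hypothetical sketch of the binary-proto -> embedded byte array step.
# Assumes the textproto has already been compiled to binary proto bytes
# (e.g. via protoc); names and output shape are illustrative only.

def emit_byte_array_header(var_name: str, data: bytes) -> str:
    """Render binary proto bytes as a C++ header with an embedded array."""
    body = ", ".join(f"0x{b:02x}" for b in data)
    return (
        "// Auto-generated; do not edit.\n"
        f"constexpr unsigned char {var_name}[] = {{{body}}};\n"
        f"constexpr unsigned long {var_name}_size = {len(data)};\n"
    )

header = emit_byte_array_header("kSlicesReportSpec", b"\x0a\x03foo")
print(header)
```

At runtime the shell would then hand the embedded bytes straight to the summarization engine, with no file I/O.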

CLI surface

trace_processor_shell report [dimension] [FLAGS] <trace_file>

When no dimension is specified, produce an overview covering all applicable dimensions (skipping those with no data in the trace). When a dimension is specified, produce a detailed per-dimension report.
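The optional-dimension dispatch described above can be sketched with a small argument parser; all names here are illustrative, not the real trace_processor_shell flag-parsing code:

```python
# Hypothetical sketch of the report subcommand's dispatch: an optional
# positional dimension selects a built-in spec; omitting it selects the
# overview across all applicable dimensions.
import argparse

DIMENSIONS = ["slices", "stack-samples", "heap-profile", "heap-dump", "scheduling"]

def parse_report_args(argv):
    p = argparse.ArgumentParser(prog="trace_processor_shell report")
    p.add_argument("dimension", nargs="?", choices=DIMENSIONS,
                   help="Optional dimension; omit for the overview.")
    p.add_argument("trace_file")
    p.add_argument("--format", choices=["text", "json"], default="text")
    p.add_argument("--top", type=int, default=10)
    return p.parse_args(argv)

detail = parse_report_args(["slices", "trace.pftrace", "--top", "20"])
overview = parse_report_args(["trace.pftrace"])
# detail.dimension == "slices"; overview.dimension is None -> run all
```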

Dimensions

Dimension        Description
slices           Slice aggregations (wall duration, count, max)
stack-samples    CPU profiling samples (self/total time)
heap-profile     Heap allocation profiling (bytes, count)
heap-dump        Heap snapshot analysis (retained size, objects)
scheduling       Thread scheduling (CPU time, runnable, wait time)

Output format flags

--format text|json     Output format (default: text).
  • text: Human-readable tables, similar to perf report --stdio.
  • json: Structured JSON object, for tool/AI consumption.
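One way to keep the two formats in sync is to render both from the same row data, as in this sketch (an assumption about the internal data model, not the actual output code):

```python
# Sketch of dual-format output for one report section, assuming each
# section is a list of rows with a fixed column order.
import json

def render(columns, rows, fmt="text"):
    if fmt == "json":
        return json.dumps([dict(zip(columns, r)) for r in rows], indent=2)
    # text: pad each column to the widest cell, like perf report --stdio.
    widths = [max(len(str(v)) for v in [c] + [r[i] for r in rows])
              for i, c in enumerate(columns)]
    lines = ["  ".join(str(v).ljust(w) for v, w in zip(r, widths))
             for r in [columns] + rows]
    return "\n".join(lines)

cols = ["Name", "Count", "Total dur"]
rows = [["Choreographer#doFrame", "83.2k", "4.1s"], ["DrawFrame", "83.1k", "3.5s"]]
text_out = render(cols, rows)           # aligned text table
json_out = render(cols, rows, "json")   # JSON array of objects
```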

Scoping flags

These filter the report to a subset of the trace data:

--pid <pid>            Scope to a specific process ID.
--process <name>       Scope to a process by name.
--tid <tid>            Scope to a specific thread ID.
--thread <name>        Scope to a thread by name.
--track <name>         Scope to a track by name.
--cpu <cpu>            Scope to a specific CPU.
--time <start>,<end>   Scope to a time range.
                       Accepts raw nanoseconds or human-friendly
                       format (e.g. 2.7s,3.1s).

Scoping flags are translated into structured query filters and interval_intersect clauses in the underlying TraceSummarySpec, using the existing DSL primitives — no raw SQL WHERE clauses.
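The `--time` grammar (raw nanoseconds or a unit-suffixed form) could be parsed along these lines; the accepted unit set is an assumption, since the exact flag grammar is not yet specified:

```python
# Sketch of --time parsing: accepts raw nanoseconds or a human-friendly
# suffix form like "2.7s". The unit table is an assumption.
_UNITS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}

def parse_time_ns(token: str) -> int:
    # Try longer suffixes first so "ms" is not mistaken for "s".
    for suffix, scale in sorted(_UNITS.items(), key=lambda kv: -len(kv[0])):
        if token.endswith(suffix):
            return int(float(token[: -len(suffix)]) * scale)
    return int(token)  # bare nanoseconds

def parse_time_range(arg: str):
    start, end = arg.split(",")
    return parse_time_ns(start), parse_time_ns(end)

span = parse_time_range("2.7s,3.1s")  # (2700000000, 3100000000)
```

The resulting (start, end) pair would then feed an interval_intersect clause in the generated spec rather than a raw SQL predicate.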

Aggregation control

--top <N>              Number of entries per section (default: 10).

Overview output

When invoked without a dimension, the overview produces a one-line trace context followed by per-dimension aggregated highlights.

Example (--format text):

Trace: 12.3s | Android 14 | Pixel 7 Pro | 12 processes | 48 threads | 156 tracks

Slices (12.3M total):
  Name                          Count     Total dur   % of trace   Max dur
  Choreographer#doFrame         83.2k     4.1s        33.2%        128ms
  DrawFrame                     83.1k     3.5s        28.4%        96ms
  measure                       41.6k     890ms       7.2%         42ms
  layout                        41.6k     620ms       5.0%         38ms
  dequeueBuffer                 24.9k     310ms       2.5%         12ms
  eglSwapBuffers                24.9k     280ms       2.3%         8ms
  RenderThread::draw            24.9k     240ms       1.9%         6ms
  BinderTransaction             12.1k     180ms       1.5%         52ms
  animation                     8.3k      120ms       1.0%         4ms
  inflate                       2.1k      95ms        0.8%         18ms

Stack Samples (3.2k total):
  Function                                  Self%     Total%    Samples
  art::Thread::RunRootClock                 18.2%     42.1%     583
  __epoll_pwait                             12.1%     12.1%     387
  art::interpreter::Execute                 8.4%      31.2%     269
  ...

Scheduling:
  Thread                        CPU time   Runnable   Sleeping   % of trace
  RenderThread                  3.2s       120ms      8.9s       26.0%
  mali-cmar-backe               1.8s       45ms       10.4s      14.6%
  HeapTaskDaemon                890ms      12ms       11.3s      7.2%
  ...

Heap Profile: (not present in trace)
Heap Dump: (not present in trace)

Per-dimension detail

Per-dimension reports provide a deeper view. For example, tp report slices <trace> would show the same columns as the overview but with a higher default --top and potentially additional breakdowns (e.g. per-thread grouping).

The exact content of per-dimension reports is left as an open question for now. As noted below, the call-tree views for stack samples (top-down / bottom-up, as seen in perf report and the Firefox Profiler's pq tool) are a natural fit here but the exact interaction model needs more thought.

Per-dimension column definitions

Slices

Default aggregation key: slice name.

Column       Description
Name         Slice name
Count        Number of instances
Total dur    Sum of wall durations across all instances
% of trace   Total duration as percentage of trace duration
Max dur      Maximum single-instance duration (outlier detection)
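The semantics of these columns can be pinned down with a small sketch over flat (name, duration) pairs; in practice the report would compute this through the summarization DSL, not Python:

```python
# Sketch of the slice aggregation: group by name, then count, sum,
# max, and express the total as a percentage of trace duration.
from collections import defaultdict

def aggregate_slices(slices, trace_dur_ns, top=10):
    agg = defaultdict(lambda: {"count": 0, "total": 0, "max": 0})
    for name, dur in slices:
        row = agg[name]
        row["count"] += 1
        row["total"] += dur
        row["max"] = max(row["max"], dur)
    rows = [(name, r["count"], r["total"],
             100.0 * r["total"] / trace_dur_ns, r["max"])
            for name, r in agg.items()]
    rows.sort(key=lambda r: -r[2])  # highest total duration first
    return rows[:top]

rows = aggregate_slices(
    [("doFrame", 100), ("doFrame", 300), ("layout", 50)], trace_dur_ns=1000)
# [('doFrame', 2, 400, 40.0, 300), ('layout', 1, 50, 5.0, 50)]
```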

Stack Samples

Modeled after perf report --stdio.

Column     Description
Function   Function name (symbol)
Self%      Samples where this function is at the top of the stack
Total%     Samples where this function appears anywhere in the stack
Samples    Absolute sample count
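The Self%/Total% distinction is worth making concrete. In this sketch each sample is a stack given leaf-first (an assumption about representation, not the actual stack-sample schema):

```python
# Sketch of Self% vs Total%: a function gets "self" credit only when it
# is the leaf frame, but "total" credit for any sample it appears in.
from collections import Counter

def sample_percentages(stacks):
    total_samples = len(stacks)
    self_c, total_c = Counter(), Counter()
    for stack in stacks:
        self_c[stack[0]] += 1      # leaf frame gets self time
        for fn in set(stack):      # each function counted once per sample
            total_c[fn] += 1
    return {fn: (100.0 * self_c[fn] / total_samples,
                 100.0 * total_c[fn] / total_samples)
            for fn in total_c}

pct = sample_percentages([
    ["malloc", "Execute", "main"],
    ["Execute", "main"],
    ["malloc", "main"],
])
# pct["main"] == (0.0, 100.0): never the leaf, present in every sample.
```

Deduplicating frames within a sample (the `set(stack)`) keeps recursive functions from exceeding 100% Total.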

Heap Profile

Same shape as stack samples but with bytes instead of sample count.

Column        Description
Allocator     Allocation site / function
Self bytes    Bytes allocated directly by this function
Total bytes   Bytes allocated by this function and its callees
Count         Number of allocations
Avg size      Average allocation size

Heap Dump

Point-in-time memory snapshot.

Column          Description
Type/Alloc      Type or allocator
Retained size   Total retained memory
Live objects    Count of live objects

Scheduling

Per-thread scheduling summary.

Column       Description
Thread       Thread name
CPU time     Total time spent running on a CPU
Runnable     Total time in runnable state (waiting for CPU)
Sleeping     Total time sleeping
% of trace   CPU time as percentage of trace duration

Sources of inspiration

  • perf report (Linux perf): Opinionated defaults, hierarchical views, sort-by-overhead, --stdio output. The gold standard for “point at data, get useful summary.”
  • Firefox Profiler pq: CLI profile querying with per-dimension formatters, top-down/bottom-up call trees, scoping via time ranges, dual human/JSON output (PR 5663 in the firefox-devtools/profiler repo).
  • pprof (Go): -top, -text views for CPU/heap profiles. Ergonomic top-N function summaries.
  • heaptrack (KDE): CLI heap profile summaries — peak consumption, top allocators, leak candidates.

Alternatives considered

Ship report specs as external files

Pro:

  • Users can inspect and modify specs without rebuilding.

Con:

  • Requires distributing spec files alongside the binary.
  • File discovery and path resolution adds complexity.
  • Forgoes the embedded approach, which has zero runtime overhead and follows existing precedent (metric descriptors, trace descriptors).

Hardcode aggregation queries in C++

Pro:

  • No proto serialization overhead.

Con:

  • Loses the declarative nature of the summarization DSL.
  • Cannot be reused by the summarize subcommand.
  • Harder to maintain and review.

Combine with summarize

Pro:

  • One subcommand to learn.

Con:

  • summarize is for custom specs; overloading it with opinionated defaults muddies its purpose.
  • Different flag surfaces (scoping flags vs spec paths) would conflict.

Open questions

  • Per-dimension drill-down interaction model: For stack samples, top-down and bottom-up call trees (à la perf report and Firefox Profiler's pq tool) are a natural fit. Should these be sub-sub-commands (tp report stack-samples top-down <trace>), flags (--view top-down), or sections within the same output?
  • Exact per-dimension report content: The overview columns are defined above. The detailed per-dimension reports may include additional breakdowns (e.g. per-thread slice grouping, per-process scheduling). Exact content TBD.
  • Spec authoring: The built-in specs need to be written against the existing PerfettoSQL stdlib tables and modules. The exact table/module references for each dimension need to be determined.
  • Trace metadata extraction: The one-line context line (OS, device, duration, process/thread/track counts) may require queries outside the summarization DSL. How to handle this cleanly?