report Subcommand
Authors: @lalitm
Status: Draft
Note: This document is a thought experiment exploring a possible future direction for trace_processor_shell. It is NOT a proposal for immediate implementation. The goal is to capture the design space and solicit feedback.
Perfetto traces are rich, multi-dimensional datasets. Today, extracting a useful summary requires either:
- TraceSummarySpec textprotos for the summarize subcommand.

None of these serve the “I just collected a trace, what's in it?” use case well. Users coming from perf report expect to point a tool at a data file and immediately see an opinionated, useful summary: no query authoring, no spec files, no UI.
This gap is especially felt by:
The Firefox Profiler project is exploring a similar direction with their experimental pq CLI tool (PR 5663 in the firefox-devtools/profiler repo), which provides opinionated per-dimension views of profiling data from the command line.
Pending
report is a higher-level, opinionated cousin of summarize:
- summarize is the general-purpose engine: users author custom TraceSummarySpec protos to define exactly what to compute.
- report ships built-in specs that produce useful defaults across known trace dimensions.

Under the hood, report is built entirely on top of the summarization machinery. Each dimension's report is a pre-authored TraceSummarySpec that gets fed into the same engine that summarize uses.
Report specs are authored as human-readable textproto files in the source tree (e.g. src/trace_processor/shell/report_specs/*.textproto). A build rule converts these to binary proto and embeds them as byte arrays in the binary, following the existing perfetto_cc_proto_descriptor pattern used for metric and trace descriptors. This means:
trace_processor_shell report [dimension] [FLAGS] <trace_file>
When no dimension is specified, produce an overview covering all applicable dimensions (skipping those with no data in the trace). When a dimension is specified, produce a detailed per-dimension report.
| Dimension | Description |
|---|---|
| slices | Slice aggregations (wall duration, count, max) |
| stack-samples | CPU profiling samples (self/total time) |
| heap-profile | Heap allocation profiling (bytes, count) |
| heap-dump | Heap snapshot analysis (retained size, objects) |
| scheduling | Thread scheduling (CPU time, runnable, wait time) |
--format text|json Output format (default: text).
text: Human-readable tables, similar to perf report --stdio.
json: Structured JSON object, for tool/AI consumption.
These filter the report to a subset of the trace data:
--pid <pid> Scope to a specific process ID.
--process <name> Scope to a process by name.
--tid <tid> Scope to a specific thread ID.
--thread <name> Scope to a thread by name.
--track <name> Scope to a track by name.
--cpu <cpu> Scope to a specific CPU.
--time <start>,<end> Scope to a time range.
Accepts raw nanoseconds or human-friendly
format (e.g. 2.7s,3.1s).
Scoping flags are translated into structured query filters and interval_intersect clauses in the underlying TraceSummarySpec, using the existing DSL primitives — no raw SQL WHERE clauses.
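As a rough illustration of the `--time` flag described above, the sketch below parses a `<start>,<end>` value into a nanosecond pair. The function names, the set of accepted unit suffixes (`ns`, `us`, `ms`, `s`), and the error handling are assumptions made for this example; the document only specifies raw nanoseconds and the `2.7s,3.1s` form, and does not say whether timestamps are absolute or relative to trace start.

```python
import re
from decimal import Decimal

# Hypothetical unit table: only "s" and raw nanoseconds appear in the doc.
_UNIT_TO_NS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}
_TIME_RE = re.compile(r"^(\d+(?:\.\d+)?)(ns|us|ms|s)?$")


def parse_timestamp_ns(text: str) -> int:
    """Parse one timestamp: raw integer nanoseconds or a value with a unit."""
    match = _TIME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"invalid timestamp: {text!r}")
    value, unit = match.groups()
    if unit is None:
        # No suffix means raw nanoseconds, which must be a whole number.
        if "." in value:
            raise ValueError(f"raw nanoseconds must be an integer: {text!r}")
        return int(value)
    # Decimal avoids float rounding for inputs such as "2.7s".
    return int(Decimal(value) * _UNIT_TO_NS[unit])


def parse_time_range(arg: str) -> tuple[int, int]:
    """Parse '<start>,<end>' into a (start_ns, end_ns) pair."""
    parts = arg.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected <start>,<end>, got: {arg!r}")
    start, end = (parse_timestamp_ns(p) for p in parts)
    if start > end:
        raise ValueError(f"start is after end: {arg!r}")
    return start, end
```

The resulting pair is the kind of value that could then be turned into an interval filter in the underlying spec; how that translation is expressed is left to the design.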
--top <N> Number of entries per section (default: 10).
When invoked without a dimension, the overview produces a one-line trace context followed by per-dimension aggregated highlights.
Example (--format text):
Trace: 12.3s | Android 14 | Pixel 7 Pro | 12 processes | 48 threads | 156 tracks

Slices (12.3M total):
  Name                    Count   Total dur  % of trace  Max dur
  Choreographer#doFrame   83.2k   4.1s       33.2%       128ms
  DrawFrame               83.1k   3.5s       28.4%       96ms
  measure                 41.6k   890ms      7.2%        42ms
  layout                  41.6k   620ms      5.0%        38ms
  dequeueBuffer           24.9k   310ms      2.5%        12ms
  eglSwapBuffers          24.9k   280ms      2.3%        8ms
  RenderThread::draw      24.9k   240ms      1.9%        6ms
  BinderTransaction       12.1k   180ms      1.5%        52ms
  animation               8.3k    120ms      1.0%        4ms
  inflate                 2.1k    95ms       0.8%        18ms

Stack Samples (3.2k total):
  Function                   Self%   Total%  Samples
  art::Thread::RunRootClock  18.2%   42.1%   583
  __epoll_pwait              12.1%   12.1%   387
  art::interpreter::Execute  8.4%    31.2%   269
  ...

Scheduling:
  Thread           CPU time  Runnable  Sleeping  % of trace
  RenderThread     3.2s      120ms     8.9s      26.0%
  mali-cmar-backe  1.8s      45ms      10.4s     14.6%
  HeapTaskDaemon   890ms     12ms      11.3s     7.2%
  ...

Heap Profile: (not present in trace)
Heap Dump: (not present in trace)
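To make the dual `--format text|json` output concrete, here is a minimal sketch of rendering one section's rows in either format. The `render` function, the two-space column gap, and the JSON shape (a list of objects keyed by column name) are all assumptions for illustration; the document does not define a JSON schema.

```python
import json


def render(rows, columns, fmt="text"):
    """Render one report section as aligned text or as JSON.

    rows: list of tuples of already-formatted cell strings.
    columns: list of column header strings.
    """
    if fmt == "json":
        # Hypothetical schema: one object per row, keyed by column name.
        return json.dumps([dict(zip(columns, row)) for row in rows], indent=2)
    if fmt != "text":
        raise ValueError(f"unknown format: {fmt!r}")
    cells = [list(columns)] + [[str(c) for c in row] for row in rows]
    # Each column is as wide as its widest cell, header included.
    widths = [max(len(row[i]) for row in cells) for i in range(len(columns))]
    lines = [
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in cells
    ]
    return "\n".join(lines)
```

Keeping a single row model behind both renderers would mean the text and JSON outputs cannot drift apart, which matters if the JSON is consumed by other tools.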
Per-dimension reports provide a deeper view. For example, tp report slices <trace> would show the same columns as the overview but with a higher default --top and potentially additional breakdowns (e.g. per-thread grouping).
The exact content of per-dimension reports is left as an open question for now. As noted below, the call-tree views for stack samples (top-down / bottom-up, as seen in perf report and the Firefox Profiler's pq tool) are a natural fit here but the exact interaction model needs more thought.
Default aggregation key: slice name.
| Column | Description |
|---|---|
| Name | Slice name |
| Count | Number of instances |
| Total dur | Sum of wall durations across all instances |
| % of trace | Total duration as percentage of trace duration |
| Max dur | Maximum single-instance duration (outlier detection) |
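The columns above amount to a group-by on slice name. The sketch below shows that aggregation over in-memory `(name, dur_ns)` pairs; it is purely illustrative and is not how `report` would compute it (the document proposes a built-in TraceSummarySpec for that). The `SliceRow` type, the function name, and the sort order (total duration, descending, matching the overview example) are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SliceRow:
    name: str
    count: int
    total_dur_ns: int
    pct_of_trace: float
    max_dur_ns: int


def aggregate_slices(slices, trace_dur_ns, top=10):
    """Aggregate (name, dur_ns) pairs by name and keep the top entries."""
    if trace_dur_ns <= 0:
        raise ValueError("trace duration must be positive")
    acc = defaultdict(lambda: [0, 0, 0])  # name -> [count, total, max]
    for name, dur_ns in slices:
        entry = acc[name]
        entry[0] += 1
        entry[1] += dur_ns
        entry[2] = max(entry[2], dur_ns)
    rows = [
        SliceRow(name, count, total, 100.0 * total / trace_dur_ns, longest)
        for name, (count, total, longest) in acc.items()
    ]
    # Assumed ordering: largest total wall duration first.
    rows.sort(key=lambda r: r.total_dur_ns, reverse=True)
    return rows[:top]
```

Because wall durations are summed across every instance, slices that overlap (for example, the same name running on several threads) can push "% of trace" above 100%.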
Modeled after perf report --stdio.
| Column | Description |
|---|---|
| Function | Function name (symbol) |
| Self% | Percentage of samples where this function is at the top of the stack |
| Total% | Percentage of samples where this function appears anywhere in the stack |
| Samples | Absolute sample count |
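A small sketch of how Self% and Total% relate, assuming each sample is a list of function names ordered root-first with the leaf last. The function name and input shape are hypothetical. The sketch treats "Samples" as the self count, which is consistent with the overview example (583 of roughly 3.2k samples is 18.2%), though the document does not state this explicitly.

```python
from collections import Counter


def self_total_percentages(samples):
    """Return (function, self_pct, total_pct, self_samples) rows.

    samples: list of stacks, each a list of function names, leaf last.
    """
    stacks = [s for s in samples if s]
    self_counts = Counter()
    total_counts = Counter()
    for stack in stacks:
        self_counts[stack[-1]] += 1
        # set() counts a recursive function once per sample, not per frame.
        for fn in set(stack):
            total_counts[fn] += 1
    n = len(stacks)
    rows = [
        (fn, 100.0 * self_counts[fn] / n, 100.0 * total / n, self_counts[fn])
        for fn, total in total_counts.items()
    ]
    # Assumed ordering: highest self count first, name as tie-breaker.
    rows.sort(key=lambda r: (-r[3], r[0]))
    return rows
```

Deduplicating frames per sample is what keeps Total% at or below 100% for recursive functions.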
Same shape as stack samples but with bytes instead of sample count.
| Column | Description |
|---|---|
| Allocator | Allocation site / function |
| Self bytes | Bytes allocated directly by this function |
| Total bytes | Bytes allocated by this function and its callees |
| Count | Number of allocations |
| Avg size | Average allocation size |
Point-in-time memory snapshot.
| Column | Description |
|---|---|
| Type/Alloc | Type or allocator |
| Retained size | Total retained memory |
| Live objects | Count of live objects |
Per-thread scheduling summary.
| Column | Description |
|---|---|
| Thread | Thread name |
| CPU time | Total time spent running on a CPU |
| Runnable | Total time in runnable state (waiting for CPU) |
| Sleeping | Total time sleeping |
| % of trace | CPU time as percentage of trace duration |
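The scheduling columns can be viewed as a pivot of per-thread state intervals. The sketch below assumes the input is already a list of `(thread, state, dur_ns)` tuples with the hypothetical state labels `running`, `runnable`, and `sleeping`; real trace data uses different state encodings, and other states are simply ignored here.

```python
from collections import defaultdict

# Hypothetical state labels used only for this illustration.
_STATES = ("running", "runnable", "sleeping")


def summarize_scheduling(intervals, trace_dur_ns):
    """Pivot (thread, state, dur_ns) tuples into per-thread summary rows.

    Returns (thread, cpu_ns, runnable_ns, sleeping_ns, pct_of_trace) tuples,
    sorted by CPU time, descending.
    """
    if trace_dur_ns <= 0:
        raise ValueError("trace duration must be positive")
    per_thread = defaultdict(lambda: dict.fromkeys(_STATES, 0))
    for thread, state, dur_ns in intervals:
        if state in _STATES:  # states outside the three columns are skipped
            per_thread[thread][state] += dur_ns
    rows = [
        (
            thread,
            totals["running"],
            totals["runnable"],
            totals["sleeping"],
            100.0 * totals["running"] / trace_dur_ns,
        )
        for thread, totals in per_thread.items()
    ]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows
```

As in the overview example, "% of trace" here is CPU time divided by trace duration (3.2s of 12.3s is 26.0%).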
- perf report (Linux perf): Opinionated defaults, hierarchical views, sort-by-overhead, --stdio output. The gold standard for “point at data, get useful summary.”
- pq: CLI profile querying with per-dimension formatters, top-down/bottom-up call trees, scoping via time ranges, dual human/JSON output (PR 5663 in the firefox-devtools/profiler repo).
- pprof (Go): -top, -text views for CPU/heap profiles. Ergonomic top-N function summaries.
- heaptrack (KDE): CLI heap profile summaries: peak consumption, top allocators, leak candidates.

Pro:
Con:
Pro:
Con:
summarize subcommand.
summarize
Pro:
Con:
- summarize is for custom specs; overloading it with opinionated defaults muddies its purpose.

Call-tree views for stack samples (top-down / bottom-up, as seen in perf report and Firefox Profiler's pq tool) are a natural fit. Should these be sub-sub-commands (tp report stack-samples top-down <trace>), flags (--view top-down), or sections within the same output?