# pprof Support in Perfetto
_**Status:** COMPLETED **·** lalitm **·** 2025-09-30_
## Objective
Add support for importing pprof files into Perfetto Trace Processor and visualizing them with flame graphs in the Perfetto UI. This enables analysis, within the Perfetto ecosystem, of CPU/heap profiles from Go, C++, and any other toolchain that emits the pprof format.
## Overview
This feature extends Perfetto's trace analysis capabilities to include non-time-based aggregate profiling data. Unlike the existing profiling support, which is integrated with timeline-based traces, pprof support handles standalone aggregate samples that are independent of time.
```mermaid
graph LR
A[pprof file] --> B[PprofTraceReader]
B --> C[aggregate_profile table]
B --> D[aggregate_sample table]
B --> E[stack_profile_* tables]
C --> F[UI: Scope/Metric Selection]
D --> F
E --> F
F --> G[Interactive Flamegraph]
```
The implementation builds upon existing Perfetto infrastructure:
- **Database layer**: Adds new aggregate tables that reference the existing `stack_profile_*` tables
- **Import pipeline**: Follows the established `TraceType` + `TraceReader` pattern
- **UI layer**: Leverages existing flame graph visualization components
### Requirements
**Zero-setup analysis:** A pprof file can be analyzed with a single command or drag-and-drop.
**Full format support:** Support gzipped and uncompressed pprof protobuf files from any pprof-compatible tool.
**Multiple metrics per file:** Handle pprof files containing multiple value types (e.g., CPU samples + allocation counts) in a single visualization.
**Interactive flame graphs:** Provide full interactivity including zoom, search, and source location attribution where available.
**No timeline confusion:** Keep pprof data completely separate from time-based trace analysis to avoid user confusion.
## Detailed Design
### File Format Support
The implementation supports the standard pprof format as defined by [Google's pprof tool](https://github.com/google/pprof/blob/main/proto/profile.proto):
**Gzipped format:** Files compressed with gzip, as typically generated by most profiling tools.
**Raw protobuf:** Uncompressed protobuf files for development and testing.
**Profile structure:** Full support for the Profile protobuf message including:
- String table for deduplicated strings
- Sample data with location hierarchies
- Function and mapping metadata
- Multiple value types (CPU samples, allocations, etc.)
### Import Architecture
#### File Detection
The import pipeline automatically detects pprof files through a two-stage process:
1. **Gzip detection:** Recognize gzipped files by their magic bytes (`1f 8b`)
2. **Protobuf validation:** After decompression (when the input was gzipped), validate the pprof structure by checking for a Profile message with a `sample_type` field
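
The two checks above can be sketched as follows. This is a minimal illustration, not the Trace Processor implementation: the function names are hypothetical, and the protobuf check is a heuristic that assumes the serialized Profile starts with field 1 (`sample_type`, length-delimited), which gives the tag byte `0x0a`.

```typescript
// Hypothetical helpers for illustration; these are not Perfetto APIs.

// Gzip streams start with the two magic bytes 0x1f 0x8b.
function looksLikeGzip(bytes: Uint8Array): boolean {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}

// Heuristic (assumption): a serialized Profile usually begins with field 1
// (`sample_type`), a length-delimited field, so the first tag byte is
// (1 << 3) | 2 = 0x0a. A real importer would decode the message properly.
function looksLikePprofProto(bytes: Uint8Array): boolean {
  return bytes.length >= 1 && bytes[0] === 0x0a;
}

type Detected = 'gzip' | 'maybe-pprof' | 'unknown';

// Stage 1: gzip magic. Stage 2: protobuf heuristic on the raw bytes.
// For gzipped input, stage 2 would run again on the decompressed payload.
function detect(bytes: Uint8Array): Detected {
  if (looksLikeGzip(bytes)) return 'gzip';
  if (looksLikePprofProto(bytes)) return 'maybe-pprof';
  return 'unknown';
}
```

A first-byte check alone can match other protobuf formats, so a second, stricter validation step is still needed after decompression.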
#### PprofTraceReader
```cpp
class PprofTraceReader : public ChunkedTraceReader {
public:
explicit PprofTraceReader(TraceProcessorContext* context);
base::Status Parse(TraceBlobView blob) override;
base::Status NotifyEndOfFile() override;
private:
base::Status ParseProfile();
TraceProcessorContext* context_;
std::vector<uint8_t> buffer_;
};
```
The reader accumulates pprof data into an internal buffer and parses the complete protobuf message upon EOF notification.
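
The accumulate-then-parse pattern can be illustrated in a few lines. The sketch below is written in TypeScript for brevity and uses a hypothetical `BufferingReader` class; it mirrors the shape of `Parse` and `NotifyEndOfFile` above but is not the actual C++ reader.

```typescript
// Hypothetical stand-in for the chunked reader; illustration only.
class BufferingReader {
  private chunks: Uint8Array[] = [];
  private totalLength = 0;

  // Called once per incoming chunk; only stores the bytes.
  parse(chunk: Uint8Array): void {
    // Copy the chunk in case the caller reuses its buffer.
    this.chunks.push(chunk.slice());
    this.totalLength += chunk.length;
  }

  // Called once at EOF; returns the complete message for parsing.
  notifyEndOfFile(): Uint8Array {
    const whole = new Uint8Array(this.totalLength);
    let offset = 0;
    for (const chunk of this.chunks) {
      whole.set(chunk, offset);
      offset += chunk.length;
    }
    return whole;
  }
}
```

Deferring the parse until EOF keeps the logic simple: samples refer to locations, functions, and strings by ID or index, so they cannot be resolved until those tables have been read.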
### Database Schema
#### New Tables
The implementation introduces two new tables that integrate with existing stack profiling infrastructure:
```sql
-- Metadata for each profiling metric from pprof files
CREATE TABLE aggregate_profile (
id INTEGER PRIMARY KEY,
scope TEXT, -- file identifier (e.g., "cpu.pprof")
name TEXT, -- display name (e.g., "pprof cpu")
sample_type_type TEXT, -- pprof ValueType.type (e.g., "cpu")
sample_type_unit TEXT -- pprof ValueType.unit (e.g., "nanoseconds")
);
-- Sample values aggregated by callsite
CREATE TABLE aggregate_sample (
id INTEGER PRIMARY KEY,
aggregate_profile_id INTEGER, -- FK to aggregate_profile
callsite_id INTEGER, -- FK to stack_profile_callsite
value REAL -- sample count/value
);
```
#### Integration with Existing Infrastructure
- **stack_profile_frame:** Stores function name and source file information
- **stack_profile_callsite:** Maintains call stack hierarchy from root to leaf
- **stack_profile_mapping:** Contains binary/library mapping information
Each pprof location becomes a frame, callsites represent the full call chain from root to leaf, and samples aggregate values at each callsite.
### Data Processing Pipeline
#### Step 1: String Table Parsing
All pprof files use a string table for deduplication. The importer builds a vector of strings from the protobuf `string_table` field.
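
As an illustration of how the string table is used, the sketch below resolves a pprof `ValueType` (whose `type` and `unit` fields are indices into the table) to text. The helper names are hypothetical and the sample table is made up for the example.

```typescript
// Hypothetical helpers; not part of the Perfetto codebase.
interface ValueTypeIndices {
  type: number; // index into the string table
  unit: number; // index into the string table
}

// Looks up one string, rejecting indices outside the table.
function resolveString(stringTable: readonly string[], index: number): string {
  const value = stringTable[index];
  if (!Number.isInteger(index) || value === undefined) {
    throw new RangeError(`string index ${index} is out of range`);
  }
  return value;
}

// Resolves both fields of a ValueType to their string form.
function resolveValueType(
  stringTable: readonly string[],
  vt: ValueTypeIndices,
): {type: string; unit: string} {
  return {
    type: resolveString(stringTable, vt.type),
    unit: resolveString(stringTable, vt.unit),
  };
}
```

By pprof convention, index 0 of the string table holds the empty string, so an unset field resolves to `""` rather than failing.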
#### Step 2: Mapping and Function Creation
For each pprof `Mapping` and `Function`:
- Extract binary name, build ID, and memory ranges
- Create entries in `stack_profile_mapping` and populate frame metadata
- Build lookup tables for location resolution
#### Step 3: Location Processing
Each pprof `Location` represents a program counter with optional debug information:
- Map addresses to existing or dummy memory mappings
- Extract function names from associated line information
- Create `stack_profile_frame` entries with relative PCs
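
The address handling above can be sketched as follows. This is a simplified illustration under stated assumptions: all names are hypothetical, the relative PC is taken as `address - memoryStart` (whether the real importer also factors in the file offset is not specified here), and plain `number` is used even though real 64-bit addresses would need `bigint`.

```typescript
// Hypothetical types and helpers for illustration only.
interface Mapping {
  id: number;
  memoryStart: number; // inclusive
  memoryLimit: number; // exclusive
  name: string;
}

// Fallback for addresses that fall outside every known mapping.
const DUMMY_MAPPING: Mapping = {
  id: 0,
  memoryStart: 0,
  memoryLimit: Number.MAX_SAFE_INTEGER,
  name: '[unknown]',
};

// Returns the mapping containing `address`, or the dummy mapping.
function findMapping(mappings: readonly Mapping[], address: number): Mapping {
  for (const m of mappings) {
    if (address >= m.memoryStart && address < m.memoryLimit) return m;
  }
  return DUMMY_MAPPING;
}

// Assumption: relative PC is the offset from the start of the mapping.
function relativePc(mapping: Mapping, address: number): number {
  return address - mapping.memoryStart;
}
```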
#### Step 4: Sample Processing
For each pprof `Sample`:
- Build the complete callsite hierarchy from the location chain (reversing pprof's leaf-first order)
- Create aggregate entries for each value type in the sample
- Link samples to callsites through `aggregate_sample` table
```
Pprof Sample → Location IDs [3,2,1] (leaf first)
Perfetto Callsite hierarchy: 1 → 2 → 3 (root to leaf)
Multiple aggregate_sample entries (one per value type)
```
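
The reversal and hierarchy construction can be sketched as below. The `CallsiteTable` class is hypothetical and assumes that callsites are deduplicated by their `(parent, frame)` pair, so that stacks sharing a prefix share callsite rows; the real importer may differ in detail.

```typescript
// Hypothetical in-memory model of a callsite table; illustration only.
interface Callsite {
  id: number;
  parentId: number | null;
  frameId: number;
  depth: number;
}

class CallsiteTable {
  readonly rows: Callsite[] = [];
  private index = new Map<string, number>();

  // Returns the id for (parentId, frameId), creating the row if needed.
  intern(parentId: number | null, frameId: number, depth: number): number {
    const key = `${parentId === null ? 'root' : parentId}:${frameId}`;
    const existing = this.index.get(key);
    if (existing !== undefined) return existing;
    const id = this.rows.length;
    this.rows.push({id, parentId, frameId, depth});
    this.index.set(key, id);
    return id;
  }

  // Takes pprof's leaf-first frame chain and returns the leaf callsite id
  // (or null for an empty chain).
  internStack(leafFirstFrameIds: readonly number[]): number | null {
    let parentId: number | null = null;
    let depth = 0;
    // Reverse a copy so the walk goes root to leaf.
    for (const frameId of leafFirstFrameIds.slice().reverse()) {
      parentId = this.intern(parentId, frameId, depth);
      depth++;
    }
    return parentId;
  }
}
```

With this model, the stacks `[3, 2, 1]` and `[4, 2, 1]` share the callsites for frames 1 and 2 and differ only in their leaf row.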
### UI Implementation
#### PprofPage Component
The UI provides a dedicated page for pprof analysis accessible from the main navigation. The page automatically discovers available data and provides interactive controls.
#### Dynamic Data Discovery
Upon loading, the UI queries the database to discover:
1. **Available scopes** (typically one per imported pprof file)
2. **Available metrics** within each scope (CPU, allocations, etc.)
3. **Sample data** for the selected scope/metric combination
```typescript
// Discover available pprof data
const scopesResult = await trace.engine.query(`
SELECT DISTINCT scope FROM __intrinsic_aggregate_profile ORDER BY scope
`);
// Load metrics for selected scope
const metricsResult = await trace.engine.query(`
SELECT sample_type_type, sample_type_unit
FROM __intrinsic_aggregate_profile
WHERE scope = '${selectedScope}'
`);
```
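
The queries above interpolate `selectedScope` directly into the SQL text. Because a scope is derived from a file identifier, it may contain a single quote, which would break the query. One way to guard against this is to quote the value before interpolating it; the `sqlStringLiteral` helper below is a hypothetical sketch, not an existing Perfetto function.

```typescript
// Hypothetical helper: renders a JS string as a SQL string literal by
// wrapping it in single quotes and doubling any embedded single quote.
function sqlStringLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Builds the metrics query for one scope using the quoted literal.
function metricsQueryForScope(scope: string): string {
  return `
    SELECT sample_type_type, sample_type_unit
    FROM __intrinsic_aggregate_profile
    WHERE scope = ${sqlStringLiteral(scope)}
  `;
}
```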
#### Flamegraph Integration
The implementation reuses Perfetto's existing `QueryFlamegraph` component with dynamically generated metrics:
```typescript
const flamegraphMetrics = metricsFromTableOrSubquery(
`
WITH metrics AS MATERIALIZED (
SELECT
callsite_id,
sum(sample.value) AS self_value
FROM __intrinsic_aggregate_sample sample
JOIN __intrinsic_aggregate_profile profile
ON sample.aggregate_profile_id = profile.id
WHERE profile.scope = '${scope}'
AND profile.sample_type_type = '${metric}'
GROUP BY callsite_id
)
SELECT
c.id,
c.parent_id as parentId,
c.name,
c.mapping_name,
coalesce(m.self_value, 0) AS self_value
FROM _callstacks_for_stack_profile_samples!(metrics) AS c
LEFT JOIN metrics AS m USING (callsite_id)
`,
[{ name: 'Pprof Samples', unit: unit, columnName: 'self_value' }],
'include perfetto module callstacks.stack_profile'
);
```
This query uses the existing `_callstacks_for_stack_profile_samples!` macro (the `!` suffix marks a PerfettoSQL macro invocation) to build the complete flamegraph hierarchy while aggregating pprof sample values.
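
The query returns only a self value per callsite; a flame graph additionally needs the cumulative value of each node (its own value plus that of all descendants). The sketch below shows that relationship on a small in-memory tree. It is a conceptual illustration with hypothetical names, not the logic used by `QueryFlamegraph`, and it assumes the input forms a proper tree.

```typescript
// Hypothetical node shape mirroring the id / parentId / self_value columns.
interface FlameNode {
  id: number;
  parentId: number | null;
  selfValue: number;
}

// Returns a map from node id to cumulative value (self + descendants).
function cumulativeValues(nodes: readonly FlameNode[]): Map<number, number> {
  const totals = new Map<number, number>();
  const parentOf = new Map<number, number | null>();
  for (const n of nodes) {
    totals.set(n.id, 0);
    parentOf.set(n.id, n.parentId);
  }
  for (const n of nodes) {
    // Add this node's self value to itself and to every ancestor.
    let current: number | null = n.id;
    while (current !== null) {
      totals.set(current, (totals.get(current) || 0) + n.selfValue);
      const parent = parentOf.get(current);
      current = parent === undefined ? null : parent;
    }
  }
  return totals;
}
```

Walking to the root for every node costs time proportional to the number of nodes times the tree depth, which is fine for a sketch; a production implementation would more likely do a single bottom-up pass.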
### Usage
#### Command Line Analysis
```bash
# Analyze a pprof file directly
$ trace_processor_shell profile.pprof
# Query available metrics
> SELECT scope, sample_type_type, sample_type_unit
FROM __intrinsic_aggregate_profile;
# Examine sample data
> SELECT COUNT(*) FROM __intrinsic_aggregate_sample
WHERE aggregate_profile_id = 1;
```
#### Web UI Analysis
1. **File loading**: Drag and drop a pprof file into the Perfetto UI, or use the file picker
2. **Automatic detection**: Perfetto recognizes the pprof format and imports the data
3. **Navigation**: Open the "Pprof" page from the main navigation
4. **Interactive analysis**: Select a scope and metric, then explore the flame graph
#### Multi-metric Files
For pprof files containing multiple value types (e.g., CPU samples + heap allocations):
1. **Single import**: All metrics from one file are imported together under the same scope
2. **Metric switching**: A UI dropdown allows switching between metrics instantly
3. **Independent analysis**: Each metric is displayed as a separate flame graph
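
In the pprof format, the i-th entry of a sample's `value` array corresponds to the i-th entry of the profile's `sample_type` list. The sketch below shows how one sample could fan out into one `aggregate_sample` row per metric. The names are hypothetical, and it assumes one `aggregate_profile` row has already been created per sample type.

```typescript
// Hypothetical row shape mirroring the aggregate_sample columns.
interface AggregateSampleRow {
  aggregateProfileId: number;
  callsiteId: number;
  value: number;
}

// profileIds[i] is the aggregate_profile id for sample_type[i];
// values[i] is the sample's value for that same sample type.
function fanOutSample(
  profileIds: readonly number[],
  callsiteId: number,
  values: readonly number[],
): AggregateSampleRow[] {
  if (profileIds.length !== values.length) {
    throw new Error('sample must have one value per sample type');
  }
  const rows: AggregateSampleRow[] = [];
  profileIds.forEach((aggregateProfileId, i) => {
    const value = values[i];
    if (value !== undefined) {
      rows.push({aggregateProfileId, callsiteId, value});
    }
  });
  return rows;
}
```

Because every row carries its own `aggregate_profile_id`, switching metrics in the UI only changes the filter in the flamegraph query; no re-import is needed.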
## Design Principles
### Integration over Replacement
Rather than building a standalone pprof viewer, this feature integrates pprof analysis into Perfetto's existing infrastructure. This provides:
**Unified tooling:** Users can analyze pprof data alongside other trace formats using the same UI and SQL interface.
**Leveraged infrastructure:** Reuses existing flame graph rendering, call stack handling, and database optimization.
**Consistent UX:** Familiar Perfetto interface for users already using the platform.
### Separation of Concerns
**Timeline independence:** pprof data represents aggregate samples without a time dimension and is kept completely separate from timeline-based trace analysis.
**Static import model:** pprof files are imported once and stored in read-only tables, avoiding complex re-aggregation logic.
**Format-specific handling:** Dedicated importer handles pprof-specific concepts while mapping to Perfetto's general profiling abstractions.
### Minimal Overhead
**Zero cost when unused:** No impact on existing Perfetto functionality when pprof features are not used.
**Efficient storage:** Sample values are stored in aggregated form, avoiding redundant per-sample overhead.
**Query optimization:** Leverages existing database indices and table functions to keep queries fast.