Multi-machine recording

This document describes how to record a single Perfetto trace that captures events from two Linux machines simultaneously. It uses traced_relay on the second machine to forward producer IPC to a traced running on the first machine.

For background on what multi-machine tracing is and how it works under the hood, see Multi-machine architecture.

Use case

You have a workload split across two Linux machines — e.g. a client on machine A driving a server on machine B, or a host running a Linux VM — and you want a single trace covering both, so cross-machine causality is visible in one timeline and queryable in one trace file.

In the rest of this guide, host is the machine that will run traced and own the trace buffers, and guest is the second machine whose producers feed into the same trace via traced_relay. Replace <host-ip> with the IP address (or hostname) of host as reachable from guest.

Prerequisites

  • tracebox available on both machines. See Start Using Perfetto for how to obtain a binary.
  • A network path from guest to host on a chosen TCP port (e.g. port 20001). If there's a firewall between them, open the port.
  • No traced already running on either machine. On the guest, traced and traced_relay would compete for the same local producer socket; on the host, you want the traced you start below, not a system instance.
  • host and guest are separate OS instances, each with its own kernel: two physical machines, a host plus a VM, etc. Pointing both producers at the same kernel does not work.

NOTE: This guide uses ftrace events as the example data. On Linux, reading ftrace typically requires running traced_probes as root (or with CAP_SYS_ADMIN), which is why Steps 2 and 4 use sudo. The other commands (traced, traced_relay and the perfetto client) only do IPC and do not require root.

Usage

Step 1: Start traced on the host, listening on TCP

On host:

PERFETTO_PRODUCER_SOCK_NAME=0.0.0.0:20001 \
  tracebox traced --enable-relay-endpoint

PERFETTO_PRODUCER_SOCK_NAME rebinds the producer socket from the default UNIX path to a TCP listener that remote machines can reach. --enable-relay-endpoint makes that socket accept traced_relay connections in addition to ordinary local producers.

Leave this process running.
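To confirm that Step 1 took effect, you can inspect the host's TCP listeners from another shell. The sketch below is a hypothetical helper (listener_scope is not part of Perfetto); it assumes iproute2's ss is installed and that the fourth column of `ss -ltn` output is the local address:port, as in current iproute2 releases.

```shell
# Hypothetical helper: classify who can reach a TCP listener on PORT, based
# on `ss -ltn` output read from stdin. Prints one of:
#   all-interfaces    bound to 0.0.0.0, [::] or *   (reachable from guest)
#   specific-address  bound to one non-loopback address
#   loopback-only     bound to 127.0.0.1 or [::1]  (NOT reachable from guest)
#   not-listening     nothing is listening on PORT
listener_scope() {
  awk -v port="$1" '
    # Field 4 is "Local Address:Port"; the header line never matches.
    $4 ~ (":" port "$") {
      addr = substr($4, 1, length($4) - length(port) - 1)
      if (addr == "0.0.0.0" || addr == "[::]" || addr == "*") all = 1
      else if (addr == "127.0.0.1" || addr == "[::1]") lo = 1
      else other = 1
    }
    END {
      if (all) print "all-interfaces"
      else if (other) print "specific-address"
      else if (lo) print "loopback-only"
      else print "not-listening"
    }'
}
```

On host, run `ss -ltn | listener_scope 20001` while traced is up; with the Step 1 command you would expect all-interfaces. A loopback-only result means guest will not be able to connect.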

Step 2: Start traced_probes on the host

In a second shell on host:

PERFETTO_PRODUCER_SOCK_NAME=127.0.0.1:20001 \
  sudo -E tracebox traced_probes

The same env var that rebound traced's listener also tells local producers where to connect. Without it, traced_probes would still try the default UNIX socket and fail. sudo -E preserves the env var across the privilege escalation needed for ftrace; if your sudoers policy does not allow preserving the environment, sudo refuses to run the command.

Step 3: Start traced_relay on the guest

On guest:

PERFETTO_RELAY_SOCK_NAME=<host-ip>:20001 \
  tracebox traced_relay

traced_relay opens the standard local producer socket on guest and forwards every producer IPC frame to the host's relay endpoint. You should see a startup line of the form:

Started traced_relay, listening on /tmp/perfetto-producer, forwarding to <host-ip>:20001

(The listening path may instead be /run/perfetto/traced-producer.sock if the /run/perfetto directory exists. Both are valid Linux defaults.)
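If you script this setup, you may want to pull the listening path and forwarding target out of the startup line, for example to check that the relay points at the right host. A minimal sketch, assuming the log line has exactly the form shown above, possibly preceded by a timestamp or other log prefix; parse_relay_startup is a hypothetical name:

```shell
# Hypothetical helper: read traced_relay log output on stdin and print
# "<listening-path> <forwarding-target>" for each startup line found.
# Assumes the "Started traced_relay, listening on ..., forwarding to ..."
# format shown above; lines that do not match are ignored.
parse_relay_startup() {
  sed -n 's/^.*Started traced_relay, listening on \([^,]*\), forwarding to \([^ ]*\).*$/\1 \2/p'
}
```

Feed it a saved copy of the relay's log output; the second field should be <host-ip>:20001.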

Leave this process running.

Step 4: Start traced_probes on the guest

In a second shell on guest:

sudo tracebox traced_probes

No env var is needed: with PERFETTO_PRODUCER_SOCK_NAME unset, traced_probes connects to the default Linux producer socket — which is exactly the path traced_relay is listening on — so the two find each other automatically.
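When traced_probes on the guest cannot find the relay, it helps to know which path it will try. The sketch below encodes the rule this guide describes: the env var when set, otherwise /run/perfetto/traced-producer.sock if /run/perfetto exists, otherwise /tmp/perfetto-producer. It approximates Perfetto's behavior rather than reproducing its source; expected_producer_sock and its optional root-prefix argument are hypothetical, the latter only there to make the function testable.

```shell
# Hypothetical helper: print the producer socket that a Linux producer such
# as traced_probes would connect to, following the rule in this guide:
#   1. $PERFETTO_PRODUCER_SOCK_NAME, if set to a non-empty value;
#   2. /run/perfetto/traced-producer.sock, if /run/perfetto exists;
#   3. /tmp/perfetto-producer otherwise.
# The optional argument is a root prefix for the /run/perfetto check, used
# only for testing; leave it empty on a real machine.
expected_producer_sock() {
  root=${1:-}
  if [ -n "${PERFETTO_PRODUCER_SOCK_NAME:-}" ]; then
    printf '%s\n' "$PERFETTO_PRODUCER_SOCK_NAME"
  elif [ -d "$root/run/perfetto" ]; then
    printf '%s\n' /run/perfetto/traced-producer.sock
  else
    printf '%s\n' /tmp/perfetto-producer
  fi
}
```

On guest, with traced_relay running, `[ -S "$(expected_producer_sock)" ] && echo ok` should print ok. If it does not, the relay has exited or is listening on a different path.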

Step 5: Record a trace from the host

Multi-machine tracing requires an explicit TraceConfig — the tracebox perfetto -t 10s ... sched/sched_switch shorthand records on the host machine only (see Multi-machine architecture).

On host, write a config file:

cat > config.pbtx <<'EOF'
buffers {
  size_kb: 32768
  fill_policy: RING_BUFFER
}
trace_all_machines: true
data_sources {
  config {
    name: "linux.ftrace"
    ftrace_config {
      ftrace_events: "sched/sched_switch"
    }
  }
}
duration_ms: 10000
EOF

Then record:

tracebox perfetto --txt -c config.pbtx -o trace.pftrace
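To vary the duration or record more ftrace events without hand-editing the file, you can generate it. A minimal sketch: write_mm_config is a hypothetical helper that emits the same config as above, with one ftrace_events line per event argument (ftrace_events is a repeated field).

```shell
# Hypothetical helper: write a multi-machine TraceConfig (text format)
# equivalent to config.pbtx above, with a configurable duration and
# ftrace event list.
# Usage: write_mm_config OUT_FILE DURATION_MS EVENT [EVENT...]
write_mm_config() {
  out=$1
  duration_ms=$2
  shift 2
  {
    echo 'buffers {'
    echo '  size_kb: 32768'
    echo '  fill_policy: RING_BUFFER'
    echo '}'
    echo 'trace_all_machines: true'
    echo 'data_sources {'
    echo '  config {'
    echo '    name: "linux.ftrace"'
    echo '    ftrace_config {'
    for ev in "$@"; do
      # One ftrace_events entry per requested event.
      echo "      ftrace_events: \"$ev\""
    done
    echo '    }'
    echo '  }'
    echo '}'
    echo "duration_ms: $duration_ms"
  } > "$out"
}
```

For example, `write_mm_config config.pbtx 20000 sched/sched_switch sched/sched_waking` followed by the same `tracebox perfetto --txt -c config.pbtx -o trace.pftrace` records both events for 20 seconds.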

Step 6: Verify both machines are in the trace

Open trace.pftrace at https://ui.perfetto.dev. In the SQL query view, run:

SELECT id, raw_id, sysname, release, arch, num_cpus FROM machine;

Expect two rows. id = 0 is always the host; remote machines have a non-zero raw_id. See the machine table reference for the full set of columns.

To confirm that events from both machines made it into the trace, group ftrace events by machine. ftrace_event does not carry machine_id directly — each row references a cpu (via ucpu), and cpu carries the machine_id:

SELECT cpu.machine_id, COUNT(*) AS num_events
FROM ftrace_event
JOIN cpu USING (ucpu)
GROUP BY cpu.machine_id;

You should see one row per machine, each with a non-zero count. The same join pattern works against the thread or process tables to slice by machine through different dimensions.

Troubleshooting

  • Only one row in machine. The guest's data never reached the host, which usually means a connectivity problem. Check that <host-ip>:20001 is reachable from guest (e.g. with nc -zv <host-ip> 20001), that any firewall between them allows the port, and that traced on the host bound 0.0.0.0 (not 127.0.0.1).
  • traced_relay exits immediately and prints usage. PERFETTO_RELAY_SOCK_NAME is unset or empty — traced_relay has no host to forward to.
  • traced_probes on the guest fails with a connect error. Make sure traced_relay is running on the guest (Step 3) and that no stale traced is also running there contesting the producer socket.
  • Producers on the host fail to connect. Confirm traced started with PERFETTO_PRODUCER_SOCK_NAME=0.0.0.0:20001 (Step 1) and that the producers are pointed at the same address (Step 2).
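If nc is not installed on guest, bash's /dev/tcp redirection gives an equivalent reachability check. A minimal sketch; port_reachable is a hypothetical helper, it assumes bash and coreutils' timeout are available, and the 3-second limit is an arbitrary choice:

```shell
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be opened
# within 3 seconds. Relies on bash's /dev/tcp pseudo-device, so it invokes
# bash explicitly even when called from another shell. Inside the bash -c
# script, $0 is HOST and $1 is PORT.
port_reachable() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

On guest, `port_reachable <host-ip> 20001 && echo reachable || echo "not reachable"` tells you whether the host's relay endpoint accepts connections.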

Next steps