commit	9ed18981b172e0ef8f5f1a42f22733f4656138bd
author	Ryan <rsavitski@google.com>	Wed Jul 24 15:04:40 2019 +0100
committer	Ryan <rsavitski@google.com>	Wed Jul 24 15:04:40 2019 +0100
tree	561451a1059dc5ff141e6f935c3129abad1d98ad
parent	05c925d2e7f0ffd5ef6d603bcbda90398c48e9e2

traced_probes ftrace: switch back to single-threaded, nonblocking read(2)-only approach

Per the recent performance measurements (go/perfetto-bg-cpu), we no longer
think that the multi-threaded, blocking/nonblocking splice/read approach is
worthwhile in its current state. Not only is it highly complex, it also has an
unnecessarily high cpu% overhead.

The reading is now guided by a single repeating task that reads & parses the
contents of all per-cpu ftrace pipes.
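
As an illustration of the shape of that task (a minimal sketch, not the code in
this patch): one iteration visits each per-cpu descriptor in turn and issues
nonblocking read(2) calls until the descriptor has nothing more to give. The
names DrainAllCpusOnce, SetNonBlocking and DrainStats are hypothetical, and the
"parsing" here is reduced to counting bytes.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

// Hypothetical stand-in for the parser output: only counts what was read.
struct DrainStats {
  size_t bytes = 0;  // Total bytes consumed across all descriptors.
  size_t reads = 0;  // Number of read(2) calls that returned data.
};

// Puts a descriptor into nonblocking mode. Returns false on failure.
inline bool SetNonBlocking(int fd) {
  int flags = fcntl(fd, F_GETFL, 0);
  return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// One iteration of the repeating task: drain every per-cpu descriptor on the
// calling thread, one page-sized read at a time, until read(2) reports that
// no more data is available.
inline DrainStats DrainAllCpusOnce(const std::vector<int>& cpu_fds,
                                   size_t page_size) {
  assert(page_size > 0);
  std::vector<char> page(page_size);
  DrainStats stats;
  for (int fd : cpu_fds) {
    for (;;) {
      ssize_t n = read(fd, page.data(), page.size());
      if (n > 0) {
        // A real reader would parse the page contents here.
        stats.bytes += static_cast<size_t>(n);
        stats.reads++;
        continue;
      }
      if (n == -1 && errno == EINTR)
        continue;  // Interrupted by a signal: retry the same descriptor.
      break;  // EOF, EAGAIN or another error: move on to the next cpu.
    }
  }
  return stats;
}
```

A caller would invoke DrainAllCpusOnce from a periodic task, once per drain
period, with descriptors that were opened or switched to nonblocking mode
beforehand. Any nonblocking descriptor (a pipe, for example) can stand in for
the per-cpu ftrace pipes when experimenting with the loop.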

What we lose in this version:
* ability to sleep until a page of ftrace events is filled (with blocking
  splice). This would only make a difference for tracing sessions with truly
  low-frequency events (not important to optimize for at the moment).
* scalability for many-core machines. This version works well for an 8 core
  phone, but is likely to struggle on a 64 core workstation. Let's treat this
  patch as a reset for complexity, and reintroduce it only as necessary.
  Update: ran on my 72 core dev workstation as a smoke test, it kept up fine.
* possibly splice efficiency? Haven't tried a single-thread splicer, but the
  bigger immediate wins are probably in the parsing code (this version is at
  5:1 utime:stime ratio according to my measurements).

Rough measurements of traced_probes cpu% on a crosshatch (standalone ndk build
+ tmux script), with the methodology as in go/perfetto-bg-cpu:

tuned cfg: 32k page (i.e. chunk) size, 1s ftrace drain period.

idle device, default cfg: 2.4%
idle device, tuned cfg: <1%

video rec, default cfg: 13.5%
video rec, tuned cfg: 6%

So we're doing much better with a tuned config, waking up for 60ms to process
all cores once a second.

Unfortunately, this patch is lacking in programmatic tests. I'm not really sure
which ones would be worthwhile with the existing mock-heavy test setups. Making
the code more unit-testable will likely require a separate pass (in a separate
CL); it'd be nice to test the cpu_reader loop stop/continue logic.

Bug: 133312949
Change-Id: Ia79a267f43214f336b5396f4dd5789bc49ab1e67

15 files changed

README.md

Perfetto - Performance instrumentation and tracing

Perfetto is an open-source project for performance instrumentation and tracing of Linux/Android/Chrome platforms and user-space apps.

See www.perfetto.dev for docs.

Bugs

For bugs affecting Android or the tracing internals, use the internal bug tracker (go/perfetto-bugs).
For bugs affecting Chrome, use http://crbug.com, Component:Speed>Tracing label:Perfetto.