commit	ccd89612055ff167c73f4bbc374a38e86de65556
author	Ryan Savitski <rsavitski@google.com>	Mon Mar 09 18:31:47 2020 +0000
committer	Ryan Savitski <rsavitski@google.com>	Mon Mar 09 18:31:47 2020 +0000
tree	f3c46534cc56441a57514c27533dc5d780024237
parent	95f126da184ea1050e7644910521fde3008e7fcc

traced_perf: move unwinding to a dedicated thread

This splits the single-threaded traced_perf into two threads:
* a primary thread that does the initial kernel buffer reading, as well
  as the final interning/serialization, and IPC.
* an unwinder thread that sits in-between in terms of the dataflow.

The reasoning is that unwinding is very long-tailed, and we don't want
to starve the kernel buffer reading/IPC functions of the producer while
sitting in a 1s-long unwind on Android.

The unwinder uses a ring queue for the input samples (and is woken up by
the primary thread after it pushes a batch of samples). Once a sample is
unwound, it's posted directly back to the main thread (might need to
consider batching here as well if we don't want many staggered wakeups).
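To illustrate the input side described above, here is a minimal sketch (not the code in this patch) of a fixed-capacity single-producer/single-consumer ring queue, plus a batch push that wakes the consumer once per batch. ParsedSample, RingQueue, PushBatch and the wake_unwinder callback are all hypothetical names chosen for the example; the real types and wakeup mechanism in traced_perf may differ.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed sample (assumption, not the real type).
struct ParsedSample {
  int32_t pid = 0;
  std::vector<uint64_t> stack;  // placeholder for the copied stack contents
};

// Fixed-capacity ring queue for exactly one producer and one consumer thread.
// Indices grow monotonically; the slot is index % N.
template <typename T, size_t N>
class RingQueue {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

 public:
  // Producer thread only. Returns false (dropping |value|) if the queue is full.
  bool TryPush(T value) {
    size_t wr = write_.load(std::memory_order_relaxed);
    size_t rd = read_.load(std::memory_order_acquire);
    if (wr - rd == N)
      return false;
    slots_[wr % N] = std::move(value);
    write_.store(wr + 1, std::memory_order_release);
    return true;
  }

  // Consumer thread only. Returns nullopt if the queue is empty.
  std::optional<T> TryPop() {
    size_t rd = read_.load(std::memory_order_relaxed);
    size_t wr = write_.load(std::memory_order_acquire);
    if (rd == wr)
      return std::nullopt;
    T value = std::move(slots_[rd % N]);
    read_.store(rd + 1, std::memory_order_release);
    return value;
  }

 private:
  std::array<T, N> slots_{};
  std::atomic<size_t> write_{0};
  std::atomic<size_t> read_{0};
};

// Pushes as many samples as fit, then wakes the consumer at most once.
// Samples that do not fit are dropped. Returns the number enqueued.
template <size_t N>
size_t PushBatch(RingQueue<ParsedSample, N>& queue,
                 std::vector<ParsedSample> batch,
                 const std::function<void()>& wake_unwinder) {
  assert(wake_unwinder != nullptr);
  size_t pushed = 0;
  for (auto& sample : batch) {
    if (!queue.TryPush(std::move(sample)))
      break;
    ++pushed;
  }
  if (pushed > 0)
    wake_unwinder();  // one wakeup per batch rather than one per sample
  return pushed;
}
```

In this sketch the unwinder side would call TryPop() in a loop after each wakeup until it returns nullopt.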

Note on the unwinding queue: this approach is enqueueing parsed samples
(with complex types like the unique_ptr and vector). An alternative
considered was to make the queue entries have their original kernel
format (so a direct memcpy from the kernel ring buffer). I can see a
variety of pros and cons for both approaches (which I won't summarize
here), but ultimately decided on keeping the early parsing into a
"complex" type, and dealing with just that type post-EventReader.

The pid-tracking is done primarily on the primary thread, but a subset
(pretty much ready vs expired) of updates is replicated to the Unwinder
(which acts as a listener, without pushing any updates of its own).
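A rough sketch of that replication, under stated assumptions: the primary side owns the full per-pid state and forwards only the ready/expired transitions to a passive listener on the unwinder side. The class names, the PidStatus values other than ready/expired, and the direct method calls are hypothetical; in a real two-thread setup the notifications would be posted across threads rather than called inline.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical per-pid states. Only "ready" and "expired" are named in the
// commit message; the other values are assumptions for the example.
enum class PidStatus { kInitial, kResolving, kReady, kExpired };

// Unwinder-side view: a listener that never pushes updates of its own.
class UnwinderPidView {
 public:
  void OnReady(int32_t pid) { ready_.insert(pid); }
  void OnExpired(int32_t pid) { ready_.erase(pid); }
  bool CanUnwind(int32_t pid) const { return ready_.count(pid) > 0; }

 private:
  std::set<int32_t> ready_;
};

// Primary-side tracker: holds every state, replicates a subset.
class PrimaryPidTracker {
 public:
  explicit PrimaryPidTracker(UnwinderPidView* listener) : listener_(listener) {
    assert(listener_ != nullptr);
  }

  void Set(int32_t pid, PidStatus status) {
    states_[pid] = status;
    // Only these two transitions are forwarded to the unwinder.
    if (status == PidStatus::kReady)
      listener_->OnReady(pid);
    else if (status == PidStatus::kExpired)
      listener_->OnExpired(pid);
  }

  // Unknown pids are reported as kInitial.
  PidStatus Get(int32_t pid) const {
    auto it = states_.find(pid);
    return it == states_.end() ? PidStatus::kInitial : it->second;
  }

 private:
  std::map<int32_t, PidStatus> states_;
  UnwinderPidView* listener_;
};
```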

I considered two approaches for the unwinding queues: a single shared
queue (as posted), and per-DataSource queues that would be created by
the primary thread, and adopted by the unwinder. There's an argument
that separate queues would be more fair when there are concurrent data
sources, and the load is too high. On the other hand, we will still
ultimately want a process-wide cap on the amount of inflight samples, so
a single queue shortcuts to that (at the expense of some fairness).
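The trade-off can be sketched as follows (an illustration only, with hypothetical names and a mutex-guarded deque instead of the actual ring queue): with one shared queue, the capacity doubles as the process-wide cap on inflight samples, but nothing stops one busy data source from occupying every slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>

// Hypothetical queue entry, tagged with the data source it belongs to.
struct QueuedSample {
  uint32_t data_source_id;
  int32_t pid;
};

// One queue shared by all data sources. Its capacity is the process-wide cap.
class SharedUnwindQueue {
 public:
  explicit SharedUnwindQueue(size_t max_inflight)
      : max_inflight_(max_inflight) {
    assert(max_inflight_ > 0);
  }

  // Returns false once the cap is reached, regardless of which data source
  // filled the queue (this is the fairness cost).
  bool TryEnqueue(QueuedSample sample) {
    std::lock_guard<std::mutex> lock(mu_);
    if (queue_.size() >= max_inflight_)
      return false;
    queue_.push_back(sample);
    return true;
  }

  // Returns the oldest sample, or nullopt if the queue is empty.
  std::optional<QueuedSample> TryDequeue() {
    std::lock_guard<std::mutex> lock(mu_);
    if (queue_.empty())
      return std::nullopt;
    QueuedSample sample = queue_.front();
    queue_.pop_front();
    return sample;
  }

 private:
  std::mutex mu_;
  std::deque<QueuedSample> queue_;
  const size_t max_inflight_;
};
```

With per-DataSource queues, the same cap would need a separate shared counter on top of the queues, which is the extra step the single queue avoids.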

UnwindingHandle is a temporary copy-paste. I'm hoping to get rid of it
within a week (but it's a separate conversation on
base::ThreadTaskRunner API that I don't want blocking this patch).

Unwindstack caching is removed temporarily (since during reconnects, we
might be moving unwinding between threads while recreating the
Unwinder), will fix in a follow-up.

Note to reviewer: I'm not very confident about most of the file/class
naming choices. Please criticize the inconsistencies without reservation.

Bug: 144281346
Change-Id: I4f59d1b4d52cf589fbe60e78ad4c1ee0b9994c0a

12 files changed

README.md

Perfetto - Performance instrumentation and tracing

Perfetto is an open-source project for performance instrumentation and tracing of Linux/Android/Chrome platforms and user-space apps.

See www.perfetto.dev for docs.

Contributing

See /docs/contributing.md for instructions.

The source-of-truth repo is Android's Gerrit. The GitHub repo is a read-only mirror.

Bugs

For bugs affecting Android or the tracing internals use the internal bug tracker (go/perfetto-bugs).
For bugs affecting Chrome use http://crbug.com, Component:Speed>Tracing label:Perfetto.

Community

You can reach us on our Discord channel. If you prefer using IRC we have an experimental Discord <> IRC bridge synced with #perfetto-dev on Freenode.