commit | ccd89612055ff167c73f4bbc374a38e86de65556 | |
---|---|---|
author | Ryan Savitski <rsavitski@google.com> | Mon Mar 09 18:31:47 2020 +0000 |
committer | Ryan Savitski <rsavitski@google.com> | Mon Mar 09 18:31:47 2020 +0000 |
tree | f3c46534cc56441a57514c27533dc5d780024237 | |
parent | 95f126da184ea1050e7644910521fde3008e7fcc | |

traced_perf: move unwinding to a dedicated thread

This splits the single-threaded traced_perf into two threads:
* a primary thread that does the initial kernel buffer reading, as well as
  the final interning/serialization, and IPC.
* an unwinder thread that sits in between in terms of the dataflow.

The reasoning is that unwinding is very long-tailed, and we don't want to
starve the kernel buffer reading/IPC functions of the producer while sitting
in a 1s-long unwind on Android.

The unwinder uses a ring queue for the input samples (and is woken up by the
primary thread after it pushes a batch of samples). Once a sample is unwound,
it's posted directly back to the main thread (might need to consider batching
here as well if we don't want many staggered wakeups).

Note on the unwinding queue: this approach is enqueueing parsed samples (with
complex types like the unique_ptr and vector). An alternative considered was
to make the queue entries have their original kernel format (so a direct
memcpy from the kernel ring buffer). I can see a variety of pros and cons for
both approaches (which I won't summarize here), but ultimately decided on
keeping the early parsing into a "complex" type, and dealing with just that
type post-EventReader.

The pid-tracking is done primarily on the primary thread, but a subset
(pretty much ready vs expired) of updates is replicated to the Unwinder
(which acts as a listener, without pushing any updates of its own).

I considered two approaches for the unwinding queues: a single shared queue
(as posted), and per-DataSource queues that would be created by the primary
thread, and adopted by the unwinder. There's an argument that separate queues
would be more fair when there are concurrent data sources, and the load is
too high. On the other hand, we will still ultimately want a process-wide cap
on the amount of inflight samples, so a single queue shortcuts to that (at
the expense of some fairness).

UnwindingHandle is a temporary copy-paste. I'm hoping to get rid of it within
a week (but it's a separate conversation on base::ThreadTaskRunner API that I
don't want blocking this patch).

Unwindstack caching is removed temporarily (since during reconnects, we might
be moving unwinding between threads while recreating the Unwinder), will fix
in a follow-up.

Note to reviewer: I'm not very confident about most of the file/class naming
choices. Please criticize the inconsistencies without reservation.

Bug: 144281346
Change-Id: I4f59d1b4d52cf589fbe60e78ad4c1ee0b9994c0a
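
The dataflow described in the commit message can be illustrated with a minimal C++ sketch. It is not the code from this patch: `ParsedSample`, `SampleQueue`, and `RunPipeline` are hypothetical names, the queue is guarded by a mutex for simplicity (the real ring queue may be implemented differently), and "posting back to the primary thread" is modelled as appending to a shared vector. What it shows is a single fixed-capacity queue of parsed, move-only samples, one wakeup per pushed batch, and the capacity acting as a process-wide cap on in-flight samples.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed sample: a "complex" type owning memory.
struct ParsedSample {
  int32_t pid = 0;
  std::unique_ptr<std::vector<uint8_t>> stack;  // copied stack bytes
};

// Fixed-capacity ring queue for one producer and one consumer.
// N doubles as the cap on the number of in-flight samples.
template <size_t N>
class SampleQueue {
 public:
  // Returns false (the sample is dropped) if the queue is full.
  bool Push(ParsedSample s) {
    std::lock_guard<std::mutex> lock(mu_);
    if (size_ == N) return false;
    slots_[(head_ + size_) % N] = std::move(s);
    ++size_;
    return true;
  }
  // Wakes the consumer; intended to be called once per batch.
  void NotifyBatch() { cv_.notify_one(); }
  // Signals that no more samples will be pushed.
  void Close() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      closed_ = true;
    }
    cv_.notify_one();
  }
  // Blocks until a sample is available; nullopt once closed and drained.
  std::optional<ParsedSample> Pop() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return size_ > 0 || closed_; });
    if (size_ == 0) return std::nullopt;
    ParsedSample s = std::move(slots_[head_]);
    head_ = (head_ + 1) % N;
    --size_;
    return s;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::array<ParsedSample, N> slots_;
  size_t head_ = 0;
  size_t size_ = 0;
  bool closed_ = false;
};

// Pushes `total` samples in batches of `batch` from the calling ("primary")
// thread and returns how many samples the unwinder thread handed back.
size_t RunPipeline(size_t total, size_t batch) {
  SampleQueue<64> queue;
  std::mutex done_mu;
  std::vector<int32_t> done;  // models posting results back to the primary
  std::thread unwinder([&] {
    while (auto s = queue.Pop()) {
      // A real unwinder would walk the stack here, which can be slow.
      std::lock_guard<std::mutex> lock(done_mu);
      done.push_back(s->pid);
    }
  });
  size_t accepted = 0;
  for (size_t i = 0; i < total; ++i) {
    ParsedSample s;
    s.pid = static_cast<int32_t>(i);
    s.stack = std::make_unique<std::vector<uint8_t>>(16);
    if (queue.Push(std::move(s))) ++accepted;
    if ((i + 1) % batch == 0) queue.NotifyBatch();  // one wakeup per batch
  }
  queue.Close();
  unwinder.join();
  assert(done.size() == accepted);
  return done.size();
}
```

In this sketch a full queue drops the new sample rather than blocking the producer, which matches the stated goal of never stalling kernel buffer reading behind a slow unwind; whether the actual patch drops or handles overflow differently is not stated in the message.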