traced_perf: move unwinding to a dedicated thread
This splits the single-threaded traced_perf into two threads:
* a primary thread that does the initial kernel buffer reading, as well
as the final interning/serialization, and IPC.
* an unwinder thread that sits in-between in terms of the dataflow.
The reasoning is that unwinding is very long-tailed, and we don't want
to starve the kernel buffer reading/IPC functions of the producer while
sitting in a 1s-long unwind on Android.
The unwinder uses a ring queue for the input samples, and is woken up by
the primary thread after it pushes a batch of samples. Once a sample is
unwound, it's posted directly back to the primary thread (we might need
to consider batching here as well to avoid many staggered wakeups).
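For illustration only, the hand-off described above can be sketched as a
fixed-capacity single-producer/single-consumer ring queue, plus a helper
that pushes a batch and wakes the consumer once. This is not the code in
this patch: Sample, RingQueue, PushBatch and the wake callback are
hypothetical names, and the drop-when-full policy is an assumption.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed sample (not the real traced_perf type).
struct Sample {
  int32_t pid = 0;
  uint64_t timestamp = 0;
};

// Fixed-capacity queue, safe for exactly one producer thread and one
// consumer thread. Positions grow monotonically and are reduced modulo N.
template <typename T, size_t N>
class RingQueue {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

 public:
  // Producer side. Returns false (entry dropped) if the queue is full.
  bool Push(T value) {
    size_t wr = write_pos_.load(std::memory_order_relaxed);
    size_t rd = read_pos_.load(std::memory_order_acquire);
    if (wr - rd == N)
      return false;
    slots_[wr % N] = std::move(value);
    write_pos_.store(wr + 1, std::memory_order_release);
    return true;
  }

  // Consumer side. Returns nullopt if the queue is empty.
  std::optional<T> Pop() {
    size_t rd = read_pos_.load(std::memory_order_relaxed);
    size_t wr = write_pos_.load(std::memory_order_acquire);
    if (rd == wr)
      return std::nullopt;
    T value = std::move(slots_[rd % N]);
    read_pos_.store(rd + 1, std::memory_order_release);
    return value;
  }

 private:
  std::array<T, N> slots_{};
  std::atomic<size_t> write_pos_{0};
  std::atomic<size_t> read_pos_{0};
};

// Pushes as many samples as fit, then wakes the consumer at most once for
// the whole batch. Returns the number of samples actually enqueued.
template <size_t N>
size_t PushBatch(RingQueue<Sample, N>& queue,
                 const std::vector<Sample>& batch,
                 const std::function<void()>& wake_consumer) {
  size_t pushed = 0;
  for (const Sample& sample : batch) {
    if (!queue.Push(sample))
      break;  // Queue full: the rest of the batch is dropped.
    ++pushed;
  }
  if (pushed > 0)
    wake_consumer();
  return pushed;
}
```

In this sketch the wake callback would be something like posting a task
to the unwinder thread; a single wakeup per batch is what keeps the
primary thread from signalling once per sample.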
Note on the unwinding queue: this approach is enqueueing parsed samples
(with complex types like the unique_ptr and vector). An alternative
considered was to make the queue entries have their original kernel
format (so a direct memcpy from the kernel ring buffer). I can see a
variety of pros and cons for both approaches (which I won't summarize
here), but ultimately decided on keeping the early parsing into a
"complex" type, and dealing with just that type post-EventReader.
The pid-tracking is done primarily on the primary thread, but a subset
(pretty much ready vs expired) of updates is replicated to the Unwinder
(which acts as a listener, without pushing any updates of its own).
I considered two approaches for the unwinding queues: a single shared
queue (as posted), and per-DataSource queues that would be created by
the primary thread, and adopted by the unwinder. There's an argument
that separate queues would be more fair when there are concurrent data
sources, and the load is too high. On the other hand, we will still
ultimately want a process-wide cap on the amount of inflight samples, so
a single queue shortcuts to that (at the expense of some fairness).
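To make the trade-off concrete: with per-DataSource queues, the
process-wide cap would need separate bookkeeping shared by all queues,
roughly like the counter sketched below. This is a hypothetical
illustration of the alternative that was not taken; InflightBudget and
its methods are made-up names, not part of this patch.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical process-wide budget that several per-DataSource queues
// would share. A slot is acquired before enqueueing a sample and released
// once that sample has been unwound.
class InflightBudget {
 public:
  explicit InflightBudget(uint32_t cap) : cap_(cap) {}

  // Returns true if a slot was reserved, false if the cap is reached.
  bool TryAcquire() {
    uint32_t cur = inflight_.load(std::memory_order_relaxed);
    while (cur < cap_) {
      // On failure, compare_exchange_weak reloads |cur|, so the loop
      // re-checks the cap against the fresh value.
      if (inflight_.compare_exchange_weak(cur, cur + 1,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed)) {
        return true;
      }
    }
    return false;
  }

  // Returns a previously acquired slot to the budget.
  void Release() { inflight_.fetch_sub(1, std::memory_order_acq_rel); }

  uint32_t inflight() const {
    return inflight_.load(std::memory_order_acquire);
  }

 private:
  const uint32_t cap_;
  std::atomic<uint32_t> inflight_{0};
};
```

With a single shared queue, the queue capacity itself plays the role of
this cap, which is the shortcut mentioned above.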
UnwindingHandle is a temporary copy-paste. I'm hoping to get rid of it
within a week (but it's a separate conversation on
base::ThreadTaskRunner API that I don't want blocking this patch).
Unwindstack caching is removed temporarily (since during reconnects, we
might be moving unwinding between threads while recreating the
Unwinder), will fix in a follow-up.
Note to reviewer: I'm not very confident about most of the file/class
naming choices. Please criticize the inconsistencies without reservation.
Bug: 144281346
Change-Id: I4f59d1b4d52cf589fbe60e78ad4c1ee0b9994c0a
diff --git a/Android.bp b/Android.bp
index 6f56224..40057d4 100644
--- a/Android.bp
+++ b/Android.bp
@@ -5926,6 +5926,11 @@
],
}
+// GN: //src/profiling/perf:common_types
+filegroup {
+ name: "perfetto_src_profiling_perf_common_types",
+}
+
// GN: //src/profiling/perf:proc_descriptors
filegroup {
name: "perfetto_src_profiling_perf_proc_descriptors",
@@ -5971,6 +5976,9 @@
// GN: //src/profiling/perf:unwinding
filegroup {
name: "perfetto_src_profiling_perf_unwinding",
+ srcs: [
+ "src/profiling/perf/unwinding.cc",
+ ],
}
// GN: //src/profiling/symbolizer:symbolize_database
@@ -7239,6 +7247,7 @@
":perfetto_src_profiling_memory_scoped_spinlock",
":perfetto_src_profiling_memory_unittests",
":perfetto_src_profiling_memory_wire_protocol",
+ ":perfetto_src_profiling_perf_common_types",
":perfetto_src_profiling_perf_proc_descriptors",
":perfetto_src_profiling_perf_producer",
":perfetto_src_profiling_perf_producer_unittests",
@@ -7749,10 +7758,12 @@
":perfetto_src_profiling_common_interning_output",
":perfetto_src_profiling_common_proc_utils",
":perfetto_src_profiling_common_unwind_support",
+ ":perfetto_src_profiling_perf_common_types",
":perfetto_src_profiling_perf_proc_descriptors",
":perfetto_src_profiling_perf_producer",
":perfetto_src_profiling_perf_regs_parsing",
":perfetto_src_profiling_perf_traced_perf_main",
+ ":perfetto_src_profiling_perf_unwinding",
":perfetto_src_protozero_protozero",
":perfetto_src_tracing_common",
":perfetto_src_tracing_core_core",